Chain-of-Thought Reasoning is a Policy Improvement Operator.
Publication information:
H. Zhang and D. C. Parkes,
“Chain-of-Thought Reasoning is a Policy Improvement Operator,” Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023, 2023.