Chain-of-Thought Reasoning is a Policy Improvement Operator.

Publication information:

H. Zhang and D. C. Parkes,
“Chain-of-Thought Reasoning is a Policy Improvement Operator,” Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023, 2023.