Lawrence Tang comments on o1: A Technical Primer

Lawrence Tang 23 Feb 2025 0:24 UTC
The reinforcement learning is a train-time innovation, not a test-time one. This was not clear to me in your article. Few changes are made at test time: the model is simply allowed to keep outputting text and to decide for itself when to terminate, which 4o does not do.
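
To make the test-time point concrete, here is a toy sketch of that kind of decoding loop. It is not OpenAI's inference code: `toy_model`, `generate`, `EOS`, the stopping probability, and the `max_tokens` safety cap are all hypothetical names and values I made up for illustration.

```python
import random

EOS = "<eos>"  # hypothetical end-of-sequence marker


def toy_model(tokens, rng):
    """Stand-in for a language model: returns the next token.

    Purely illustrative: the chance of emitting EOS grows with the
    sequence length, so the "model" itself decides when to stop.
    """
    p_stop = min(1.0, len(tokens) / 20)
    return EOS if rng.random() < p_stop else f"tok{len(tokens)}"


def generate(model, prompt, max_tokens=1000, seed=0):
    """Keep sampling until the model emits EOS (or a safety cap is hit)."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = model(tokens, rng)
        if next_token == EOS:
            break  # termination is the model's own decision
        tokens.append(next_token)
    return tokens[len(prompt):]  # only the newly generated tokens


output = generate(toy_model, ["hi"])
print(len(output), "tokens generated before the model chose to stop")
```

Nothing in the loop depends on how the model was trained. In this sketch, any change in how long the output runs would have to come from the model passed in, which is where the train-time work shows up.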