Lawrence Tang comments on o1: A Technical Primer

Lawrence Tang 23 Feb 2025 0:31 UTC
What evidence is there that a model’s own labels can benefit its training? Or that an outcome reward model (“ORM”) or process reward model (“PRM”) can benefit an LLM? This is the big problem, and the article does not address it.