What evidence is there that a model’s own labels can benefit its training? Or that an “ORM” or “PRM” can benefit an LLM? This is the big problem, and the article does not address it.