CronoDAS comments on Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

CronoDAS 6 May 2025 7:33 UTC
2 points
0

So RL “narrows the reasoning boundary”: the region of problems the model can solve at least some of the time.

This seems useful if you don’t want your model answering questions about, say, how to make bombs.