So RL “narrows the reasoning boundary”, i.e., the region of problems the model can solve at least some of the time.
This seems useful if you don’t want your model answering questions about, say, how to make bombs.