I think the jaggedness of RL (in modern LLMs) is an obstruction that would need to be addressed explicitly; otherwise it won't fall to incremental improvements or scaffolding. There are two very different levels of capability, obtained in pretraining and in RLVR, and only pretraining is somewhat general. Even pretraining doesn't adapt to novel situations other than through in-context learning, which only expresses capabilities at the pretraining level, significantly weaker than RLVR-trained narrow capabilities.
Scaling will make pretraining stronger, but probably not enough to matter for this issue, and natural text data will only last for about one more step of improvement comparable to what happened in 2023-2025 (in pretraining alone, ignoring RLVR). If RL doesn't become more general, it'll probably remain useless for improving general capabilities beyond the specific skills trained with RLVR. Capabilities will remain jagged, with gaps that have to be patched manually by changing the training data.
This could change within a few years, possibly even faster if LLMs can be trained with RLVR to become capable of applying RLVR to themselves, though that won't necessarily work. Another route is next-token-prediction RLVR that makes pretraining stronger without requiring more natural text data, but this probably needs much more compute even if it works in principle, so it might also take 5-10 years, with uncertain results for the resulting capability level.
Under the belief vs. understanding distinction, open-mindedness is a virtue of understanding ideas you disbelieve (or purposes you don't endorse). It's not directly relevant to belief, but sometimes understanding is the bottleneck to belief, in which case more open-mindedness helps. Once you already understand an idea, open-mindedness is no longer relevant to it.
More to the point, understanding an idea shouldn't necessarily result in believing it, and high open-mindedness doesn't add hours to the day: learning any given piece of nonsense would still not be the best use of your time. But high open-mindedness should keep you from actively avoiding the low-hanging fruit of understanding when it's ripe for the taking, merely because it doesn't seem like the kind of thing you're likely to believe or endorse.