The impressive performance we have obtained is possible because supervised learning (in this case, technically "self-supervised" learning) is much easier than, e.g., reinforcement learning and other paradigms that naturally learn planning policies. We do not actually know how to overcome this barrier.
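One toy way to see the asymmetry being pointed at here: a self-supervised objective gets a labeled gradient at every token position, while a REINFORCE-style RL objective gets a single scalar reward for a whole sequence and must spread credit across it. The sketch below is purely illustrative and not from the original discussion; the single-layer stand-in "model", the random hidden states, and the match-the-target terminal reward are all made-up assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, hidden = 100, 16, 32
torch.manual_seed(0)

# Stand-in "model": one linear layer mapping a hidden state to token logits.
model = torch.nn.Linear(hidden, vocab_size)
states = torch.randn(seq_len, hidden)            # one hidden state per position
targets = torch.randint(0, vocab_size, (seq_len,))

# --- Self-supervised objective: dense feedback ---------------------------
# Every one of the seq_len positions contributes its own labeled gradient.
logits = model(states)                           # (seq_len, vocab_size)
supervised_loss = F.cross_entropy(logits, targets)
supervised_loss.backward()
print("supervised grad norm:", model.weight.grad.norm().item())

model.zero_grad()

# --- RL objective (REINFORCE): sparse feedback ---------------------------
# The model samples a whole sequence and receives one scalar reward at the
# end; credit assignment across positions is left to the estimator.
logits = model(states)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                          # sampled "tokens"
reward = (actions == targets).float().mean()     # toy terminal reward (assumption)
rl_loss = -(dist.log_prob(actions).sum() * reward.detach())
rl_loss.backward()
print("RL grad norm:", model.weight.grad.norm().item())
```

The supervised loss backpropagates seq_len labeled errors per update, while the RL loss backpropagates one (often zero or noisy) scalar; that difference in signal density is the "easier" being claimed above, not a difference in model capacity.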
What about current reasoning models trained using RL? (Do you think something like: we don't know, and won't easily figure out, how to make RL work well outside a narrow class of tasks that doesn't include 'anything important'?)
Edit: The class of tasks doesn’t include autonomously doing important things such as making discoveries. It does include becoming a better coding assistant.
Yes, that is what I think.