The impressive performance we have obtained is possible because supervised learning (in this case, technically "self-supervised" learning) is much easier than, e.g., reinforcement learning and other paradigms that naturally learn planning policies. We do not actually know how to overcome this barrier.
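One toy way to see the asymmetry being pointed at here: a self-supervised objective gets a labeled gradient at every token position, while a REINFORCE-style RL objective gets a single scalar reward for a whole sequence and must spread credit across it. The sketch below is purely illustrative and not from the original discussion; the single-layer stand-in "model", the random hidden states, and the match-the-target terminal reward are all made-up assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, hidden = 100, 16, 32
torch.manual_seed(0)

# Stand-in "model": one linear layer mapping a hidden state to token logits.
model = torch.nn.Linear(hidden, vocab_size)
states = torch.randn(seq_len, hidden)            # one hidden state per position
targets = torch.randint(0, vocab_size, (seq_len,))

# --- Self-supervised objective: dense feedback ---------------------------
# Every one of the seq_len positions contributes its own labeled gradient.
logits = model(states)                           # (seq_len, vocab_size)
supervised_loss = F.cross_entropy(logits, targets)
supervised_loss.backward()
print("supervised grad norm:", model.weight.grad.norm().item())

model.zero_grad()

# --- RL objective (REINFORCE): sparse feedback ---------------------------
# The model samples a whole sequence and receives one scalar reward at the
# end; credit assignment across positions is left to the estimator.
logits = model(states)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                          # sampled "tokens"
reward = (actions == targets).float().mean()     # toy terminal reward (assumption)
rl_loss = -(dist.log_prob(actions).sum() * reward.detach())
rl_loss.backward()
print("RL grad norm:", model.weight.grad.norm().item())
```

The supervised loss backpropagates seq_len labeled errors per update, while the RL loss backpropagates one (often zero or noisy) scalar; that difference in signal density is the "easier" being claimed above, not a difference in model capacity.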
What about current reasoning models trained using RL? (Do you think something like: we don't know, and won't easily figure out, how to make RL work well outside a narrow class of tasks that doesn't include 'anything important'?)
Edit: The class of tasks doesn’t include autonomously doing important things such as making discoveries. It does include becoming a better coding assistant.
Yes, that is what I think.