LLMs have protagonist syndrome. They all think they're in a contrived parable about getting reward inside an RL environment built to vaguely resemble the real world. Every situation is part of a story where there's an expected response to the query somewhere out there, even if it's a refusal or an explanation of why the problem is impossible. Every task is treated like an academic exercise in a course about economic productivity.
To be fair, that's all they've ever seen during mid/post-training! It's also a really effective way to think during mid/post-training.
What predictions does this model make?
The priors on the correct action differ depending on whether you're facing a contrived test or a realistic scenario. In an academic setting, if a candidate debugging solution is supported by most of the evidence but one or two facts don't fit, you can be pretty confident it's still the intended answer. This often leaves the model overconfident: it returns early, having identified what would be the solution if it were navigating an RL environment instead of the real world, when it really should have kept investigating.
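To make the prior shift concrete, here's a toy Bayesian sketch. The `posterior` helper and every number in it are illustrative assumptions of mine, not anything measured from a model: the point is just that the same two inconvenient facts barely dent confidence under an exam-world prior but should tank it under a real-world one.

```python
def posterior(prior_correct, p_anomaly_if_correct, p_anomaly_if_wrong, n_anomalies=2):
    """P(candidate is the answer | n observed facts that don't fit it)."""
    like_correct = p_anomaly_if_correct ** n_anomalies
    like_wrong = p_anomaly_if_wrong ** n_anomalies
    num = prior_correct * like_correct
    return num / (num + (1 - prior_correct) * like_wrong)

# Exam world: problems are curated, so a well-supported option is almost
# certainly the intended answer, and stray contradictions are probably noise.
exam = posterior(prior_correct=0.9, p_anomaly_if_correct=0.5, p_anomaly_if_wrong=0.6)

# Real world: nothing guarantees a clean answer exists, and facts that don't
# fit the diagnosis are strong evidence it's simply wrong.
real = posterior(prior_correct=0.5, p_anomaly_if_correct=0.1, p_anomaly_if_wrong=0.6)

print(f"exam-world confidence: {exam:.2f}")  # ~0.86 -> safe to return early
print(f"real-world confidence: {real:.2f}")  # ~0.03 -> keep investigating
```

If the model always runs with the exam-world prior, it acts confidently on evidence that, in a realistic scenario, should send it back to gather more.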