I think a reasonable version of this (done on e.g. Claude 4.5 Sonnet) would be pretty likely to result in preferences that care a decent amount about keeping humans alive with their preferences satisfied.
I know this is speculative, but is your intuition that this is also true for OpenAI models? (ex: GPT-5, o3)?
Probably? But less likely. Shrug.