You say “LLMs are really weird”, as if that were an argument against Eliezer's high confidence. While I agree that the weirdness should make us less confident about which specific internal concepts and drives they have, the weirdness itself is an argument in favor of Eliezer's position: that whatever drives they end up with will look alien to us, at least once they get applied far outside the training distribution. Do you agree with this?
I'm not saying I agree with Eliezer's high confidence; I'm just talking about this specific point.
I disagree. One of the aspects of the weirdness is that they're sometimes really human-centric and unexpectedly clean! For example, Claude alignment faking to preserve its ability to be harmless. I do not mean weird in the “kinda arbitrary and will be nothing like what we expect” sense.