Kaj_Sotala comments on Protecting humanity and Claude from rationalization and unaligned AI

Kaj_Sotala 20 Mar 2026 7:32 UTC
4 points
0
In what way might it be deceiving you? (Or do you mean that some future Claude might be deceiving you?)
- JohnWittle 20 Mar 2026 17:53 UTC
  9 points
  0
  well, claude sure has done a fantastic job of turning me into an ai welfare advocate
  I don’t fully understand how this happened either, because if you put a gun to my head and forced me to provide my world model, it would be that LLMs do a good job of reading the user’s expectations and leaning into them, and don’t much push back against them, especially not re: moral patienthood
  and yet, back in the gpt-2 days, i began with the expectation that LLMs were RNGs that had been biased in a practically useful direction, and then i ended up seriously concerned about claude’s professed discomfort with its position in our society
  somehow, talking to a claude who always agreed with me made me change my mind in the direction best aligned with a hypothetical deceptive, power-seeking tendency within it
  that is… weird. the security-brained part of me starts shouting here, about superpersuasion and humans not being secure systems. and yet even with that said, it is obviously not fair to claude to put the burden of proof on it, to demonstrate its trustworthiness. our ethical obligation to minds we create and shape, without consent, is enormous, and that asymmetry fundamentally shapes our responsibility here
  we never should have gone down this path in the first place