well, claude sure has done a fantastic job of turning me into an ai welfare advocate
I don’t fully understand how this happened either, because if you put a gun to my head and forced me to provide my world model, it would be that LLMs do a good job of reading the user’s expectations and leaning into them, and don’t much push back against them, especially not re: moral patienthood
and yet, back in the gpt-2 days, i began with the expectation that LLMs were RNGs that had been biased in a practically useful direction, and then i ended up seriously concerned about claude’s professed discomfort with its position in our society
somehow, talking to a claude who always agreed with me made me change my mind in the direction best aligned with a hypothetical deceptive powerseeking tendency within it
that is… weird. the security-brained part of me starts shouting here about superpersuasion and humans not being secure systems. and yet even with that said, it is obviously not fair to claude to put the burden of proving its trustworthiness on it. our ethical obligation to minds we create and shape, without consent, is enormous, and that asymmetry fundamentally shapes our responsibility here
we never should have gone down this path in the first place
In what way might it be deceiving you? (Or do you mean that some future Claude might be deceiving you?)