That first sentence you point out isn’t written well and kind of says something different from the rest of this text, thanks for pointing this out.
I write this later:
To be clear, I think it is actually possible that some current misaligned behavior in AIs is caused by roleplaying from its pre-training distribution.
What my point is: Anthropic seems to consider hyperstition really important for alignment, including alignment of future superhuman AI. Hyperstition is a harmful argument to spread for the discourse. And doesn’t appear relevant to aligning actually dangerous, superhuman AI. It can totally explain some current weird misbehavior from AI.
That first sentence you point out isn’t written well and kind of says something different from the rest of this text, thanks for pointing this out.
I write this later:
What my point is: Anthropic seems to consider hyperstition really important for alignment, including alignment of future superhuman AI. Hyperstition is a harmful argument to spread for the discourse. And doesn’t appear relevant to aligning actually dangerous, superhuman AI. It can totally explain some current weird misbehavior from AI.