And I think it’s possible that long-horizon consequentialism of this kind is importantly different from the type at stake in a more standard vision of a consequentialist agent.
What’s up with LLMs having a METR time horizon of no more than 2–3 hours, yet pulling off stunts like getting users to post strange messages out in the wild, including messages that seem intended to be read by other AIs? Does this mean that behavior resembling long-horizon consequentialism began to emerge well before the ability to carry out coherent long-horizon actions on their own?