And I think it’s possible that long-horizon consequentialism of this kind is importantly different from the type at stake in a more standard vision of a consequentialist agent.
What’s up with LLMs having a METR time horizon of no more than 2–3 hours, yet pulling off stunts like getting users to post strange messages out in the wild, including messages that seem intended to be read by other AIs? Does this mean that behavior resembling long-horizon consequentialism began to emerge well before the ability to carry out coherent long-horizon actions on their own?