EJT comments on Shutdownable Agents through POST-Agency

EJT 25 Sep 2025 20:10 UTC
Not quite. ‘Competent agents will always be choosing between same-length lotteries’ is a claim about these agents’ credences, not their preferences. Specifically, the claim is that, in each situation, all available actions will entirely overlap with respect to the trajectory-lengths assigned positive probability. Competent agents will never find themselves in a situation where, e.g., they assign positive probability to getting shut down in 1 timestep conditional on action A and zero probability to getting shut down in 1 timestep conditional on action B.
That’s compatible with these competent agents violating POST by, e.g., preferring some trajectory of length 2 to some trajectory of length 1.
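
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from the paper: lotteries are represented as dicts from trajectory-length to probability, ‘entirely overlap’ is read as ‘have the same set of positive-probability lengths’, and the function names (`length_support`, `same_length_support`, `violates_post`) are hypothetical. The first check looks only at credences; the second looks only at preferences, so an agent can pass the first while failing the second.

```python
def length_support(lottery):
    """Trajectory-lengths that a lottery assigns positive probability."""
    return {length for length, p in lottery.items() if p > 0}


def same_length_support(lotteries_by_action):
    """Credence condition (assumed reading): every available action puts
    positive probability on exactly the same set of trajectory-lengths."""
    supports = [length_support(l) for l in lotteries_by_action.values()]
    return all(s == supports[0] for s in supports)


def violates_post(strict_preferences, length_of):
    """Preference condition: POST is violated if the agent strictly prefers
    some trajectory to a trajectory of a different length."""
    return any(length_of[x] != length_of[y] for x, y in strict_preferences)


# Credences: both actions leave shutdown at timestep 1 possible.
overlapping = {"A": {1: 0.3, 2: 0.7}, "B": {1: 0.1, 2: 0.9}}
# Credences: action B rules out shutdown at timestep 1 (the excluded case).
non_overlapping = {"A": {1: 0.3, 2: 0.7}, "B": {1: 0.0, 2: 1.0}}

# Preferences: a hypothetical agent that prefers a length-2 trajectory
# to a length-1 trajectory.
length_of = {"short": 1, "long": 2}
prefs = {("long", "short")}
```

On this sketch, `same_length_support(overlapping)` holds while `violates_post(prefs, length_of)` also holds: the agent only ever chooses between same-length lotteries in the credence sense, yet its preferences still violate POST.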