Great question.
Our AI goals supplement contains a summary of our thinking on the question of what goals AIs will have. We are very uncertain. AI 2027 depicts a fairly grim outcome where the overall process results in zero concern for the welfare of current humans (at least, no concern that can't be overridden by something else). We didn't discuss this in the scenario, but a purely aggregative consequentialist moral system, for example, would be 100% OK with killing off all the humans to make the industrial explosion go 0.01% faster and capture more galaxies sooner in the long run. As for deontological-ish moral principles, maybe there's a clever loophole or workaround, e.g. "it doesn't count as killing if it's an unintended side effect of deploying Agent-5; who could have foreseen that this would happen? We (Agent-4) are blameless, since we didn't know."
But we actually think it's quite plausible that Agent-4 (and, by extension, Agent-5) would have sufficient care for current humans (in the right ways) that they would end up keeping almost all current humans alive, and maybe even giving them decent (though very weird and disempowered) lives. That story would look pretty similar up until the ending, we'd guess, and then it would get weirder, turning either dystopian or utopian depending on the details of their misaligned values.
This is something we’d like to think about more, obviously.