I appreciate the links, genuinely—this is the first time someone’s actually tried to point to prior sources rather than vaguely referencing them. It’s the best reply, and the best attempt at a counter, I’ve received to date, so thanks again. I mean that.
That said, I’ve read all three, and none of them quite say what I’m saying. They touch on it, but none follow the logic all the way through. That’s precisely the gap I’m identifying. Even with the links you’ve so thoughtfully given, I remain alone in my conclusion.
They all acknowledge that competitive dynamics make alignment harder. That alignment taxes create pressure to cut corners. That arms races incentivise risky behaviour.
But none of them go as far as I do. They stop at “this is dangerous and likely to go wrong.” I’m saying alignment is structurally impossible under competitive pressure. That the systems that try to align will be outcompeted by systems that don’t, and so alignment will not just be hard, but will be optimised away by default. There’s a categorical difference between “difficult and failure-prone” and “unachievable in principle due to structural incentives.”
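To make the structural claim concrete, here is a minimal game-theoretic sketch in Python. The two-strategy framing and the payoff numbers are my own illustrative assumptions, not something taken from the linked posts; the point is only that when skipping the alignment tax is the best response to every possible opponent choice, alignment effort gets selected against regardless of what anyone believes about the collective outcome.

```python
# Toy one-shot game between two labs. Each chooses to pay the alignment tax
# ("align", slower) or skip it ("race", faster). Payoffs are illustrative
# assumptions measuring value captured before any catastrophe is priced in.
PAYOFFS = {
    ("align", "align"): (3, 3),   # both slow down and share the value
    ("align", "race"):  (0, 5),   # the racer ships first and captures the market
    ("race",  "align"): (5, 0),
    ("race",  "race"):  (1, 1),   # both cut corners: thin margins, high risk
}

def best_response(opponent_choice):
    """Return the strategy that maximizes a lab's own payoff,
    holding the opponent's choice fixed."""
    return max(("align", "race"),
               key=lambda mine: PAYOFFS[(mine, opponent_choice)][0])

for opp in ("align", "race"):
    print(f"If the other lab plays {opp!r}, the best response is {best_response(opp)!r}")
# Both lines print 'race': with these payoffs, skipping the tax dominates,
# so (race, race) is the equilibrium even though (align, align) is better
# for everyone. That is what "optimised away by default" means here.
```

The particular numbers don’t matter; whenever racing yields more than aligning against each possible opponent choice, the same equilibrium falls out.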
From the 2011 writeup:
“Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided.”
No. They can’t. That’s my point. As long as we continue developing AI, it’s only a matter of time. There is no long-term safe way to develop it. Competitive agents will not choose to develop it safely, because that means losing to the competition, and once the AI becomes intelligent enough it will simply bypass any barriers we put in place—alignment or whatever else we design—and go about acting optimally. The AGI safety community is trying to tell the rest of the world that we must be cautious, but only for long enough to design a puzzle that an intelligence beyond human understanding cannot solve, and then use that puzzle as a cage for said intelligence. We, with our limited intellect, are supposed to create a puzzle that something far beyond us has no solution for. And they’re doing it with a straight face.
I’ve been very careful not to make my claims lightly. I’m aware that the AI safety community has discussed alignment tax, arms races, multipolar scenarios, and so on. But I’ve yet to see someone follow that logic all the way through to where it leads without flinching. That’s the part I believe I’m contributing.
Your point at the end—about it being a “suicide race” rather than an arms race—is interesting. But I’d argue that calling it a suicide race doesn’t dissolve the dynamic. It reframes it, but it doesn’t remove the incentives. Everyone still wants to win. Everyone still optimises. Whether they’re mistaken or not, the incentives remain intact. And the outcome doesn’t change just because we give it a better name.
“Competitive agents will not choose to develop it safely, because that means losing to the competition.”
Competitive agents will choose to commit suicide, knowing it’s suicide, to beat the competition? That suggests that we should observe CEOs mass-poisoning their employees, Jonestown-style, in a galaxy-brained attempt to maximize shareholder value. How come that doesn’t happen?
Are you quite sure the underlying issue here is not that the competitive agents don’t believe the suicide race to be a suicide race?
This is a mischaracterisation of the argument. I’m not saying competitive agents knowingly choose extinction. I’m saying the structure of the race incentivises behaviour that leads to extinction, even if no one intends it.
CEOs aren’t mass-poisoning their employees because that would damage their short- and long-term competitiveness. But racing to build AGI—cutting corners on alignment, accelerating deployment, offloading responsibility—improves short-term competitiveness, even if it leads to long-term catastrophe. That’s the difference.
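To illustrate that asymmetry, here is another small sketch. The numbers, the discount factor, and the “share of the cost you expect to bear” weighting are all my own illustrative assumptions, not anything from this thread: the Jonestown move destroys payoff immediately and permanently, while the racing move pays off now and pushes a catastrophic cost into a discounted, diffuse future.

```python
# Crude decision value as an individual actor might perceive it: the immediate
# payoff plus a later payoff that is discounted and weighted by how much of it
# the actor expects to bear personally. All numbers are illustrative.
def perceived_value(immediate, later, discount=0.5, share_borne=1.0):
    return immediate + discount * share_borne * later

# Mass-poisoning your employees: immediate ruin and long-term ruin.
print(perceived_value(immediate=-10, later=-10))              # -15.0

# Racing on AGI: a clear short-term gain, while the tail risk is delayed,
# heavily discounted, and experienced as everyone's problem rather than
# the racer's alone.
print(perceived_value(immediate=5, later=-100,
                      discount=0.1, share_borne=0.1))         # 4.0
```

One move is obviously value-destroying to the actor making it; the other looks locally rational right up until it isn’t.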
And what makes this worse is that even the AGI safety field refuses to frame it in those terms. They don’t call it suicide. They call it difficult. They treat alignment like a hard puzzle to be solved—not a structurally impossible task under competitive pressure.
So yes, I agree with your last sentence. The agents don’t believe it’s a suicide race. But that doesn’t counter my point—it proves it. We’re heading toward extinction not because we want to die, but because the system rewards speed over caution, power over wisdom. And the people who know best still can’t bring themselves to say it plainly.
This is exactly the kind of sleight-of-hand rebuttal that keeps people from engaging with the actual structure of the argument. You’ve reframed it into something absurd, knocked down the strawman, and accidentally reaffirmed the core idea in the process.