Appreciate the thoughtful reply—even if it’s branded as a “thoughtless kneejerk reaction.”
I disagree with your framing that this is just 101-level AGI risk content. The central argument is not that AGI is dangerous. It’s that alignment is structurally impossible under competitive pressure, and that capitalism—while not morally to blame—is simply the most extreme and efficient version of that dynamic.
Most AGI risk discussions stop at “alignment is hard.” I go further: alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race. That’s not an “entry-level” argument—it’s an uncomfortable one. If you know where this specific line of reasoning has been laid out before, I’d genuinely like to see it. So far, people just say “we’ve heard this before” and fail to cite anything. It’s happened so many times I’ve lost count. Feel free to be the first to buck the trend and link someone making this exact argument, clearly, before I did.
I’m also not “focusing purely on capitalism.” The essay explicitly states that competitive structures—whether between nations, labs, or ideologies—would lead to the same result. Capitalism just accelerates the collapse. That’s not ideological; that’s structural analysis.
The suggestion that I should have reframed this as a way to “tap into anti-capitalist sentiment” misses the point entirely. I’m not trying to sell a message. I’m explaining why we’re already doomed. That distinction matters.
As for the asteroid analogy: your rewrite is clever, but wrong. You assume the people in the room already understand the trajectory. My entire point is that they don’t. They’re still discussing mitigation strategies while refusing to accept that the cause of the asteroid’s trajectory is unchangeable. And the fact that no one can directly refute that logic—only call it “entry-level” or “unhelpful”—kind of proves the point.
So yes, you did skim my essay—with the predictable result. You repeated what many others have already said, without identifying any actual flaws, and misinterpreted as much of it as possible along the way.
alignment is structurally impossible under competitive pressure
Alignment contrasts with control, as a means to AI safety.
Alignment roughly means the AI has goals or values similar to human ones (which are assumed, without much evidence, to be similar across humans), so that it will do what we want because that's what it wants.
Control means that it doesn’t matter what the AI wants, if it wants anything.
In short, there is plenty of competitive pressure towards control, because no one wants an AI they can't control. Control is part of capability.
Off the top of my head, this post. More generally, this is an obvious feature of AI arms races in the presence of alignment tax. Here's a 2011 writeup that lays it out:

Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided: development paths that seemed to pose a high risk of catastrophe could be relinquished in favor of safer ones. However, the context of an arms race might not permit such caution. A risk of accidental AI disaster would threaten all of humanity, while the benefits of being first to develop AI would be concentrated, creating a collective action problem insofar as tradeoffs between speed and safety existed.
I assure you the AI Safety/Alignment field has been widely aware of it since at least that long ago.
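To make the structure explicit: the dynamic the writeup describes is an ordinary collective action problem. A minimal sketch with made-up payoffs (mine, not the writeup's), in which racing strictly dominates caution for each lab even though mutual caution is better for both:

```python
# Toy two-lab "speed vs. safety" game. The payoffs are invented, chosen only so
# that the concentrated benefit of winning outweighs each lab's share of the risk.
# payoffs[(my_move, their_move)] = my payoff
payoffs = {
    ("cautious", "cautious"): 8,   # slower progress, low shared risk
    ("cautious", "race"):     1,   # I lose the race and still bear the shared risk
    ("race",     "cautious"): 10,  # concentrated benefit of being first
    ("race",     "race"):     2,   # everyone races, shared risk dominates
}

def best_response(their_move: str) -> str:
    """The move that maximizes my payoff, given what the other lab does."""
    return max(("cautious", "race"), key=lambda my_move: payoffs[(my_move, their_move)])

for their_move in ("cautious", "race"):
    print(f"If the other lab plays {their_move!r}, my best response is {best_response(their_move)!r}")

# Both lines print 'race': racing is dominant, so (race, race) is the equilibrium
# even though (cautious, cautious) pays more to everyone.
```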
Also,
alignment will be optimised away, because any system that isn’t optimising as hard as possible won’t survive the race
Any (human) system that is optimizing as hard as possible also won’t survive the race. Which hints at what the actual problem is: it’s not even that we’re in an AI arms race, it’s that we’re in an AI suicide race which the people racing incorrectly believe to be an AI arms race. Convincing people of the true nature of what’s happening is therefore a way to dissolve the race dynamic. Arms races are correct strategies to pursue under certain conditions; suicide races aren’t.
I appreciate the links, genuinely—this is the first time someone’s actually tried to point to prior sources rather than vaguely referencing them. It’s literally the best reply and attempt at a counter I’ve received to date, so thanks again. I mean that.
That said, I’ve read all three, and none of them quite say what I’m saying. They touch on it, but none follow the logic all the way through. That’s precisely the gap I’m identifying. Even with the links you’ve so thoughtfully given, I remain alone in my conclusion.
They all acknowledge that competitive dynamics make alignment harder. That alignment taxes create pressure to cut corners. That arms races incentivise risky behaviour.
But none of them go as far as I do. They stop at “this is dangerous and likely to go wrong.” I’m saying alignment is structurally impossible under competitive pressure. That the systems that try to align will be outcompeted by systems that don’t, and so alignment will not just be hard, but will be optimised away by default. There’s a categorical difference between “difficult and failure-prone” and “unachievable in principle due to structural incentives.”
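To be concrete about what "optimised away by default" means here, a minimal selection sketch (the growth rates are invented; the point is only the compounding):

```python
# Two populations of systems competing for the same resources: "aligned" ones pay
# a small capability tax, "unaligned" ones don't. The numbers are invented;
# the only thing that matters is that the tax compounds every generation.
aligned, unaligned = 0.99, 0.01                 # aligned systems start dominant
aligned_growth, unaligned_growth = 1.10, 1.15   # ~4% relative alignment tax

for generation in range(200):
    aligned *= aligned_growth
    unaligned *= unaligned_growth
    total = aligned + unaligned
    aligned, unaligned = aligned / total, unaligned / total  # renormalize to shares

print(f"Aligned share after 200 generations: {aligned:.1%}")  # ~1.3%
```

Under these assumptions the aligned fraction never recovers: any persistent tax, however small, is selected against.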
From the 2011 writeup:
Given abundant time and centralized careful efforts to ensure safety, it seems very probable that these risks could be avoided
No. They can't. That's my point. As long as we continue developing AI, it's only a matter of time. There is no long-term safe way to develop it. Competitive agents will not choose to stop, in order to beat the competition, and when the AI becomes intelligent enough it will simply bypass any barriers we put in place—alignment or whatever else we design—and go about acting optimally. The AGI safety community is trying to tell the rest of the world that we must be cautious, but only for long enough to design a puzzle that an intelligence beyond human understanding cannot solve, then use that puzzle as a cage for said intelligence. We, with our limited intellect, will create a puzzle that something far beyond us has no solution for. And they're doing it with a straight face.
I’ve been very careful not to make my claims lightly. I’m aware that the AI safety community has discussed alignment tax, arms races, multipolar scenarios, and so on. But I’ve yet to see someone follow that logic all the way through to where it leads without flinching. That’s the part I believe I’m contributing.
Your point at the end—about it being a “suicide race” rather than an arms race—is interesting. But I’d argue that calling it a suicide race doesn’t dissolve the dynamic. It reframes it, but it doesn’t remove the incentives. Everyone still wants to win. Everyone still optimises. Whether they’re mistaken or not, the incentives remain intact. And the outcome doesn’t change just because we give it a better name.
Competitive agents will not choose to stop, in order to beat the competition
Competitive agents will choose to commit suicide, knowing it's suicide, to beat the competition? That suggests that we should observe CEOs mass-poisoning their employees, Jonestown-style, in a galaxy-brained attempt to maximize shareholder value. How come that doesn't happen?
Are you quite sure the underlying issue here is not that the competitive agents don’t believe the suicide race to be a suicide race?
This is a mischaracterisation of the argument. I’m not saying competitive agents knowingly choose extinction. I’m saying the structure of the race incentivises behaviour that leads to extinction, even if no one intends it.
CEOs aren’t mass-poisoning their employees because that would damage their short and long-term competitiveness. But racing to build AGI—cutting corners on alignment, accelerating deployment, offloading responsibility—improves short-term competitiveness, even if it leads to long-term catastrophe. That’s the difference.
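The asymmetry is easy to see in back-of-the-envelope terms (invented numbers, purely to show the shape of the incentive): poisoning carries an immediate, certain, private cost, while corner-cutting carries an immediate private gain and a delayed, probabilistic, shared cost that each racer discounts.

```python
# Invented numbers: how a delayed, uncertain, shared loss gets discounted away
# while an immediate, certain loss never does.
def perceived_value(immediate_gain, later_loss, p_loss, discount):
    """Value as it looks to an agent that discounts delayed, uncertain losses."""
    return immediate_gain - discount * p_loss * later_loss

# Mass-poisoning: the loss is immediate and certain, so nothing gets discounted.
poisoning = perceived_value(immediate_gain=5, later_loss=1000, p_loss=1.0, discount=1.0)

# Corner-cutting on alignment: large private upside now; the catastrophe is
# delayed and, as each racer sees it, unlikely, so it is heavily discounted.
corner_cutting = perceived_value(immediate_gain=50, later_loss=1000, p_loss=0.1, discount=0.3)

print(f"mass-poisoning:  {poisoning:.0f}")      # clearly negative: never chosen
print(f"corner-cutting:  {corner_cutting:.0f}") # positive: chosen by each racer
```

Nothing in that calculation requires anyone to knowingly choose extinction; it only requires the loss to be distant, uncertain, and shared.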
And what makes this worse is that even the AGI safety field refuses to frame it in those terms. They don’t call it suicide. They call it difficult. They treat alignment like a hard puzzle to be solved—not a structurally impossible task under competitive pressure.
So yes, I agree with your last sentence. The agents don’t believe it’s a suicide race. But that doesn’t counter my point—it proves it. We’re heading toward extinction not because we want to die, but because the system rewards speed over caution, power over wisdom. And the people who know best still can’t bring themselves to say it plainly.
This is exactly the kind of sleight-of-hand rebuttal that keeps people from engaging with the actual structure of the argument. You’ve reframed it into something absurd, knocked down the strawman, and accidentally reaffirmed the core idea in the process.