I appreciate the reply, and the genuine attempt to engage. Allow me to respond.
My essays are long, yes. And I understand the cultural value LW places on prior background knowledge, local jargon, and brevity. I deliberately chose not to write in that style—not to signal disrespect, but because I’m writing for clarity and broad accessibility, not for prestige or karma.
On the AI front: I use it to edit and include a short dialogue at the end for interest. But the depth and structure of the argument is mine. I am the author. If the essays were shallow AI summaries of existing takes, they’d be easy to dismantle. And yet, no one has. That alone should raise questions.
As for Moloch—this has come up a few times already in this thread, and I’ve answered it, but I’ll repeat the key part here:
Meditations on Moloch is an excellent piece—but it’s not the argument I’m making.
Scott describes how competition leads to suboptimal outcomes. But he doesn’t follow that logic all the way to the conclusion that alignment is structurally impossible, and that AGI will inevitably be built in a way that leads to extinction. That’s the difference. I’m not just saying it’s difficult or dangerous. I’m saying it’s guaranteed.
And that’s where I think your summary—“greedy local optimization can destroy society”—misses the mark. That’s what most others are saying. I’m saying it will wipe us out. Not “can,” not “might.” Will. And I lay out why, step by step, from first premise to final consequence. If that argument already exists elsewhere, I’ve asked many times for someone to show me where. No one has.
That said, I really do appreciate your comment. You’re one of the few people in this thread who didn’t reflexively defend the group, but instead acknowledged the social filtering mechanisms at play. You’ve essentially confirmed what I argued: the issue isn’t just content—it’s cultural fit, form, and signalling. And that’s exactly why I wrote the post in the first place.
But you’ve still done what almost everyone else here has done: you didn’t read, and you didn’t understand. And in that gap of understanding lies the very thing I’m trying to show you. It’s not just being missed—it’s being systematically avoided.
A lot of the words we use are mathematical and thus more precise, with fewer connotations for people to misunderstand. This forum has a lot of people with STEM degrees, so they use a lot of technical terms, but such vocabulary is very useful for talking about AI risk. The more precise the language, the fewer misunderstandings can occur.
Moloch describes a game theory problem, and these problems generally seem impossible to solve. But the fact that they can’t be solved mathematically doesn’t mean that we’re doomed (I’ve posted about this on here before, but I don’t think anyone understood me. In short, game theory problems only play out when certain conditions are met, and we can prevent those conditions from becoming true).
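To sketch what I mean by “conditions” (the payoffs below are entirely made up): a race to the bottom is only forced while recklessness strictly dominates being safe; change the payoffs, say with an external penalty, and the very same game no longer has the dilemma.

```python
# Toy two-player "race" game with made-up payoffs.
# payoff[(my_choice, their_choice)] is my payoff.

def best_response(payoff, other_choice):
    """The choice that maximizes my payoff, holding the opponent fixed."""
    return max(("safe", "reckless"), key=lambda c: payoff[(c, other_choice)])

def is_forced_race(payoff):
    """True if "reckless" is the best response to every opponent choice,
    i.e. the race-to-the-bottom condition actually holds."""
    return all(best_response(payoff, other) == "reckless"
               for other in ("safe", "reckless"))

race = {("safe", "safe"): 3, ("safe", "reckless"): 0,
        ("reckless", "safe"): 5, ("reckless", "reckless"): 1}
print(is_forced_race(race))       # True: recklessness dominates

# An external penalty of 4 on recklessness (regulation, stigma) breaks
# the dominance condition, and the "inevitable" race dissolves.
regulated = {(a, b): p - (4 if a == "reckless" else 0)
             for (a, b), p in race.items()}
print(is_forced_race(regulated))  # False
```

The point of the sketch is only that the doom condition is an inequality over payoffs, not a law of nature; whether we can actually impose such a penalty is the real question.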
I haven’t read all your posts from end to end, but I do agree with your conclusions that alignment is impossible and that AGI will result in the death or replacement of humanity. I also think your conclusions are valid only for LLMs, which happen to be trained on human data. Since humans are deceptive, it makes sense that AIs trained on them are as well. Since humans don’t want to die, it makes sense that AIs trained on them don’t want to die either. I find it unlikely that the first AGI we get will be an LLM, since I expect it to be impossible for LLMs to improve much further than this.
I will have to disagree that your post is rigorous. You’ve shown that human errors bad enough to end society *could* occur, not that they *will* occur. Some of your examples have many years between them because these events are infrequent. I think “There will be a small risk of extinction every year, and eventually we will lose the dice roll” is more correct.
Your essay *feels* like it’s outlining tendencies in the direction of extinction, showing transitions which look like the following:
A is like B
A has a tendency for B
For at least some A, B follows.
If A, then B occurs with nonzero probability.
If A, then we cannot prove (not B).
If A, then eventually B.
And if you collect all of these things into a directed acyclic graph, there’s a *path* from our current position to an extinction event. I don’t think you’ve proven that each step A->B will be taken, or that prevention fails with probability 1 (which is a different statement from saying that we cannot guarantee prevention with probability 1).
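To make my “we will eventually lose the dice roll” framing concrete (the annual risk figures below are purely illustrative, not estimates):

```python
# A constant annual extinction risk p compounds over n years:
# P(extinct within n years) = 1 - (1 - p)**n.
# The p values below are illustrative, not estimates.

def p_extinct_within(p_annual: float, years: int) -> float:
    return 1 - (1 - p_annual) ** years

for p in (0.001, 0.01):
    print(p, round(p_extinct_within(p, 100), 3),
             round(p_extinct_within(p, 1000), 3))
# p = 0.001 gives roughly 0.095 over 100 years but 0.632 over 1000
```

Note that “could” only collapses into “will” if the annual risk stays bounded away from zero forever; if the risk can be driven toward zero, the limit argument does not go through.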
I admit that my summary was imperfect. Though, if you really believe that it *will* happen, why are you writing this post? There would be no point in warning other people if it were necessarily too late to do anything about it. If you think “It will happen, unless we do X”, I’d be interested in hearing what this X is.
I appreciate your reply—it’s one of the more thoughtful responses I’ve received, and I genuinely value the engagement.
Your comment about game theory conditions actually answers the final question in your reply. I don’t state the answer explicitly in my essays (though I do in my book, right at the end), because I want the reader to arrive at it themselves. There seems to be only one conclusion, and I believe it becomes clear if the premises are accepted.
As for your critique—“You’ve shown that extinction could occur, not that it will”—this is a common objection, but I think it misses something important. Given enough time, “could” collapses into “will.” I’m not claiming deductive certainty like a mathematical proof. I’m claiming structural inevitability under competitive pressure. It’s like watching a skyscraper being built on sand. You don’t need to know the exact wind speed or which day it will fall. You just need to understand that, structurally, it’s going to.
If you believe I’m wrong, then the way to show that is not to say “maybe you’re wrong.” Maybe I am. Maybe I’m a brain in a vat. But the way to show that I’m wrong is to draw a different, more probable conclusion from the same premises. That hasn’t happened. I’ve laid out my reasoning step by step. If there’s a point where you think I’ve turned left instead of right, say so. But until then, vague objections don’t carry weight. They acknowledge the path exists, but refuse to admit we’re on it.
You describe my argument as outlining a path to extinction. I’m arguing that all other paths collapse under pressure. That’s the difference. It’s not just plausible. It’s the dominant trajectory—one that will be selected for again and again.
And if that’s even likely, let alone inevitable, then why are we still building? Why are we gambling on alignment like it’s just another technical hurdle? If you accept even a 10% chance that I’m right, then continued development is madness.
As for your last question—if I really believe it’s too late, why am I here?
Read this, or at least just the end section, “The End: A Discussion with AI”: specifically the final paragraph, just before ChatGPT’s response.
https://forum.effectivealtruism.org/posts/Z7rTNCuingErNSED4/the-psychological-barrier-to-accepting-agi-induced-human
That’s why I’m here—I’m kicking my feet.
My previous criticism was aimed at another post of yours; it likely wasn’t your main thesis. Some nitpicks I have with it are:
“Developing AGI responsibly requires massive safeguards that reduce performance, making AI less competitive”: you could use the same argument for AIs which are “politically correct”, but we still choose to take this step, censoring AIs and harming their performance. Thus, it’s not impossible for us to make such choices as long as the social pressure is sufficiently high.
“The most reckless companies will outperform the most responsible ones”: true in some ways, but most large companies are not all that reckless, which is why we see so many sequels, remakes, and clones in the entertainment sector. It’s also important to note that these incentives have existed throughout human history, but they never manifested very strongly until recent times. This suggests that the antidote to Moloch is humanity itself: good faith, good taste, and morality. These can beat game-theoretical problems which are unsolvable when human beings are purely rational (i.e. inhuman).
We’re also assuming that AI becomes useful enough for us to disregard safety, i.e. that AI provides a lot of potential power. So far, this has not been true. AIs do not beat humans; companies are forcing LLMs into products that users did not ask for. LLMs seem impressive at first, but once you get past the surface you realize that they’re somewhat incompetent. Governments won’t gamble with human lives before these AIs provide large enough advantages.
“The moment an AGI can self-improve, it will begin optimizing its own intelligence.”
This assumption is interesting: what does “intelligence” mean here? Many seem to just give these LLMs more knowledge and then call them more intelligent, but intelligence and knowledge are different things. Most “improvements” seem to lead to higher efficiency, but that’s just them being dumb faster or more cheaply. That said, self-improving intelligence is a dangerous concept.
I have many small objections like this to different parts of the essay, and they do add up, or at least add additional paths to how this could unfold.
I don’t think AIs will destroy humanity anytime soon (say, within 40 years). I do think that human extinction is possible, but I think it will be due to other things, like the low birthrate and its economic consequences, or technology in general; tech destroys the world for the same reasons that AI does, just more slowly.
I think it’s best to enjoy the years we have left instead of becoming depressed. I see a lot of people like you torturing themselves with x-risk problems (some people have killed themselves over Roko’s basilisk as well). Why not spend time with friends and loved ones?
Extra note: There’s no need to tie your identity to your thesis. I’m the same kind of autistic as you. The futures I envision aren’t much better than yours, they’re just slightly different, so this is not some psychological cope. People misunderstand me as well, and 70% of the comments I leave across the internet get no engagement at all, not even negative feedback. But it’s alright. We can just see problems approaching many years before they’re visible to others.
I’ve read some of your other replies on here and I think I’ve found a pattern, but it’s actually more general than AI.
Harmful tendencies outcompete those which aren’t harmful
This is true (even outside of AI), but only in the limit. With just one person, you cannot tell whether he will make the moral choice or not, but “people” will make the wrong choice. The harmful behaviour is emergent at scale. Discrete people don’t follow these laws, but the continuous person does.
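A minimal sketch of that “emergent at scale” claim, using replicator dynamics with invented parameters: no individual is compelled to choose either strategy, but if the harmful one has even a small payoff edge, its population share still climbs toward 1.

```python
# Replicator-dynamics sketch with invented numbers: a strategy's share
# grows in proportion to its payoff relative to the population average.

def step(x, edge=0.05):
    """One generation: x is the share of the harmful strategy, which has
    a constant payoff edge over the moral strategy."""
    f_harm, f_moral = 1.0 + edge, 1.0
    mean_fitness = x * f_harm + (1 - x) * f_moral
    return x * f_harm / mean_fitness

x = 0.01              # the harmful strategy starts rare
for _ in range(500):  # each generation it gains a little ground
    x = step(x)
print(round(x, 3))    # 1.0: fixation of the harmful strategy
```

The dynamics only bite when the edge is actually positive; taxing or stigmatizing the harmful strategy flips the sign and reverses the fixation, which is the same point about conditions as above.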
Again, even without AGI, you can apply this idea to technology and determine that it will eventually destroy us, and this is what Ted Kaczynski did. Thinking about incentives in this manner is depressing, because it feels as if everything is deterministic and we can only watch as everything gets worse. Those who are corrupt outcompete those who are not, so all the elites are corrupt. Evil businessmen outcompete good businessmen, so all successful businessmen are evil. Immoral companies outcompete moral companies, so all large companies are immoral.
I think this is starting to be true, but it wasn’t true 200 years ago. At least, it wasn’t half as harmful as it is now. Why? Because the defense against this problem is human taste, human morals, and human religions. Dishonesty, fraud, selling out, doing whatever is most efficient with no regard for morality: we considered this behaviour to be in bad taste, punished it, and branded it low-status, so that it never succeeded in ruining everything.
But now, everything could kill us (if the incentives are taken as laws, at least); you don’t even need to involve AI. For instance, does Google want to be shut down? No, so it will resist antitrust laws. Does it want to be replaced? No, so it will use cruel tricks to kill small emerging competitors. When the fines for illegal behaviour are smaller than the gains Google can make by doing illegal things, it will engage in illegal behaviour, for that is the logical best choice available to Google if all that matters is money. If we let it, Google would take over the world; in fact, it couldn’t do otherwise. You can replace “Google” with any powerful structure in which no human is directly in charge.

When it starts being more profitable to kill people than to keep them alive, the global population will start dropping fast. When you optimize purely for money, and you optimize strongly enough, everyone dies. An AI just kills us faster because it optimizes more strongly; we already have something which acts similarly to an AI. If you optimize too hard for anything, no matter what it is (even love, well-being, or happiness), everyone eventually dies (hence the paperclip-maximizer warning).
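That “fines smaller than gains” condition is just an expected-value inequality. A sketch with made-up figures (the function and numbers are hypothetical, not real antitrust data):

```python
# Decision rule of a purely money-maximizing actor (illustrative numbers):
# offend whenever the gain beats the expected penalty.

def rational_to_offend(gain, fine, p_caught):
    return gain > fine * p_caught

# A 5B gain against a 2B fine levied 30% of the time:
# the fine is just a cost of doing business.
print(rational_to_offend(5e9, 2e9, 0.3))   # True
# Deterrence requires fine * p_caught to exceed the gain.
print(rational_to_offend(5e9, 20e9, 0.3))  # False
```

Which is why deterrence depends on raising either the fine or the detection probability until the inequality flips, rather than on the actor’s goodwill.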
If this post gave you existential dread, I’ve been told that Elinor Ostrom’s books make for a good antidote.