First, a few criticisms which I feel are valid:
1: Your posts are quite long.
2: You use AI in your posts, but AIs aren’t able to produce output of high enough quality to be worth posting.
3: Some of your ideas have already been discovered before and have a name on here. “Moloch”, for instance, is the personification of bad Nash equilibria in game theory. It generally annoys people if you don’t make yourself familiar with the community’s background material before posting, but it is admittedly a lot of work to do so.
Your conclusion is correct, but it boils down to very little: “greedy local optimization can destroy society”. People who already know that likely don’t want to read 30 pages which make the same point. “Capitalism” was likely the closest word you knew, but there are many better words, and you sadly have to be a bit of a nerd to know a lot of useful words.
Here’s where I think you’re right:
This is not an individualist website for classic nerds with autism who are interested in niche topics; it’s a social, collectivist community for intellectual elites who care about social status and profit.
Objective truth is not of the highest value.
Users care about their image and reputation.
Users care about how things are interpreted (and not just what’s written).
Users are afraid of controversies. A blunt but correct answer might net you less karma than a wrong answer which shows goodwill.
Users value form—how good a writer you are will influence your karma, regardless of how correct or valuable your idea is. Verbal intelligence is valued more than other forms.
The userbase has a left-wing bias, as does the internet (as of about 8 years ago), so you can find lots of sources which argue in favor of things which are just objectively not true. But it’s often difficult to find a source which disproves the thing, as such sources are buried. Finally, as a social website, people value authority and reputation/prestige, and it’s likely that the websites they consider “trustworthy” only include those controlled by left-wing elites.
Users value knowledge more than they value intelligence. They also value experience, but only when some public institution approves of it. They care if you have a PhD, they don’t care if you have researched something for 5 years in your own free time.
You’re feeling the consequences of both. I think most of the negative reaction comes from my first 3 points, and that the way it manifests is a result of the social dynamics.
“They care if you have a PhD, they don’t care if you have researched something for 5 years in your own free time.”

I don’t think this is right. If anything, the median LW user would be more likely to trust a random blogger who researched a topic on their own for 5 years over a PhD, assuming the blogger is good at presenting their ideas in a persuasive manner.
I’m afraid “Good at presenting their ideas in a persuasive manner” is doing all the heavy lifting here.
If the community had a good impression of him, they’d value his research over that of a PhD. If the community had a bad impression of him, they’d not give a second of thought towards his “research” and they would refer to it with the same mocking quotation marks that I just used. However, in the latter case, they’d find it more difficult to dismiss his PhD.
In other words, the interpretation depends on whether the community likes you or not. I’ve been in other rationalist communities and I’m speaking from experience (if I were any less vague than this, I’d be recognizable, which I don’t want to be). I saw all the negative social dynamics that you’d find on Reddit, or in young female friend groups with a lot of “drama” going on, in case you’re unfortunate enough to have an intuition for such things.
In any “normie” community there’s the staff in charge, and a large number of regular users who are somewhat above the law and who feel superior to new users (and can bully them all they want, as they’re friends with the staff). The treatment of users depends on how well they fit in culturally, and on whether they act as if the regulars are special (otherwise the regulars’ egos are hurt). Of course, some of these effects are borderline invisible on this website, so they’re either well-hidden or kept in check.
Still, this is not a truth-maximizing website; the social dynamics and their false premises (e.g. the belief that popularity is a measure of quality) are just too strong. The sort of intellectuals who don’t care about social norms, status, or money are better at truth-seeking and are generally received poorly by places like this.
I mean, I agree with this, but popularity has a better correlation with truth here compared with any other website—or more broadly, social group—that I know of. And actually, I think it’s probably not possible for a relatively open venue like this to be perfectly truth-seeking. To go further in that direction, I think you ultimately need some sort of institutional design to explicitly reward accuracy, like prediction markets. But the ways in which LW differs from pure truth-and-importance-seeking don’t strike me as entirely bad things either—posts which are inspiring or funny get upvoted more, for instance. I think it would be difficult to nucleate a community focused on truth-seeking without “emotional energy” of this sort.
I don’t think it’s possible without changing the people into weird types who really don’t care much about the social aspects of life because they’re so interested in the topics at hand. You can try rewarding truth, but people still stumble into issues regarding morality, the popularity of ideas, the Overton window, some political group they dislike happening to hit upon the truth so that stating the same thing makes them look like supporters, etc.
I think prediction markets are an interesting concept, but they cannot be taken much further than they are now, since the predictions could start influencing the outcomes. It’s dangerous to attach rewards to the outcomes of predictions: when enough money is involved, one can influence the outcome.
The way humans in general differ from truth-seeking agents makes their performance downright horrible in some specific areas (if the truth is not in the Overton window, for instance). These inaccuracies can cascade and cause problems elsewhere, since they produce incorrect worldviews even in somewhat intelligent people like Musk. There’s also a lot of information which is simply getting deleted from the internet, and you can’t “weigh both sides of the argument” if half the argument is only visible on the Wayback Machine or archive.md.
I guess it’s important to create a good atmosphere where everyone is having fun theorizing and such, but some of the topics we’re discussing are actually serious. The well-being of millions of people depends on the sort of answers and perspectives which float around public discourse, and I find it pathetic that ideas are immediately shut down if they’re not worded correctly or if they touch a growing list of socially forbidden hypotheses.
Finally, these alternative rewards have completely destroyed almost all voting systems on the internet. There’s almost no website left on which the karma/thumb/upvote/like count bears any resemblance to post quality anymore. Instead, it’s a linear combination of superstimuli like ‘relatability’, ‘novelty’, ‘feeling of importance (e.g. bad news, danger)’, ‘cuteness’, ‘escapism’, ‘sexual fantasy’, ‘romantic fantasy’, ‘boo outgroup’, ‘irony/parody/parody of parody/self-parody/nihilism’, ‘nostalgia’, ‘stupidity’ (I’m told it’s a kind of humor if you’re stupid on purpose, but I think “irony” is a defence mechanism against social judgement). It’s like a view into the unfulfilled needs of the population. YouTube view counts and subscriptions, Reddit karma, Twitter retweets: almost all of them are gamed to the point of being useless metrics. Online review sites are going in the same direction. It’s like interacting with a group of mentally ill people who decide what you’re paid each day. I think it’s dangerous to upvote comments based on vibes, as it takes very little to corrupt these metrics, and it’s hard to notice if upvotes gradually come to represent “dopamine released by reading” or something other than quality/truthfulness.
In theory, I’d agree with you. That’s how LessWrong presents itself: truth-seeking above credentials. But in practice, that’s not what I’ve experienced. And it’s not just my experience; it’s also what LW has a reputation for. I don’t take reputations at face value, but lived experience tends to bring them into sharp focus.
If someone without status writes something long, unfamiliar, or culturally out-of-sync with LW norms—even if the logic is sound—it gets downvoted or dismissed as “political,” “entry-level,” or “not useful.” Meanwhile, posts by established names or well-known insiders get far more patience and engagement, even when the content overlaps.
You say a self-taught blogger would be trusted if they’re good at presenting ideas persuasively. But that’s exactly the issue—truth is supposed to matter more than form. Ideas stand on their own merit, not on an appeal to authority. And yet persuasion, style, tone, and in-group fluency still dominate the reception. That’s not rationalism. That’s social filtering.
So while I appreciate the ideal, I think it’s important to distinguish it from the reality. The gap between the two is part of what my post is addressing.
I appreciate the reply, and the genuine attempt to engage. Allow me to respond.
My essays are long, yes. And I understand the cultural value LW places on prior background knowledge, local jargon, and brevity. I deliberately chose not to write in that style—not to signal disrespect, but because I’m writing for clarity and broad accessibility, not for prestige or karma.
On the AI front: I use it for editing and to include a short dialogue at the end for interest. But the depth and structure of the argument is mine. I am the author. If the essays were shallow AI summaries of existing takes, they’d be easy to dismantle. And yet, no one has dismantled them. That alone should raise questions.
As for Moloch—this has come up a few times already in this thread, and I’ve answered it, but I’ll repeat the key part here:
Meditations on Moloch is an excellent piece—but it’s not the argument I’m making.
Scott describes how competition leads to suboptimal outcomes. But he doesn’t follow that logic all the way to the conclusion that alignment is structurally impossible, and that AGI will inevitably be built in a way that leads to extinction. That’s the difference. I’m not just saying it’s difficult or dangerous. I’m saying it’s guaranteed.
And that’s where I think your summary—“greedy local optimization can destroy society”—misses the mark. That’s what most others are saying. I’m saying it will wipe us out. Not “can,” not “might.” Will. And I lay out why, step by step, from first premise to final consequence. If that argument already exists elsewhere, I’ve asked many times for someone to show me where. No one has.
That said, I really do appreciate your comment. You’re one of the few people in this thread who didn’t reflexively defend the group, but instead acknowledged the social filtering mechanisms at play. You’ve essentially confirmed what I argued: the issue isn’t just content—it’s cultural fit, form, and signalling. And that’s exactly why I wrote the post in the first place.
But you’ve still done what almost everyone else here has done: you didn’t read, and you didn’t understand. And in that gap of understanding lies the very thing I’m trying to show you. It’s not just being missed—it’s being systematically avoided.
A lot of the words we use are mathematical and thus more precise, with fewer connotations for people to misunderstand. This forum has a lot of people with STEM degrees, so they use a lot of technical terms, but such vocabulary is very useful for talking about AI risk. The more precise the language, the fewer misunderstandings can occur.
Moloch describes a game-theory problem, and these problems generally seem impossible to solve. But the fact that they can’t be solved mathematically doesn’t mean we’re doomed (I’ve posted about this here before, but I don’t think anyone understood me. In short, game-theory problems only play out when certain conditions are met, and we can prevent those conditions from becoming true).
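To sketch what I mean by “conditions” (the payoff numbers below are mine, purely illustrative, not anything from your essay): a prisoner’s dilemma only exists while its payoff ordering holds, and an outside penalty on defection can break that ordering, so the “impossible” game never gets played.

```python
# Minimal sketch with made-up payoffs: a prisoner's dilemma requires the
# ordering T > R and P > S (defecting beats cooperating against either reply).
# An external penalty on defection can break that ordering.

def defection_dominates(T, R, P, S):
    # Defecting is the better reply both when the other player cooperates (T > R)
    # and when the other player defects (P > S).
    return T > R and P > S

# Classic payoffs: Temptation, Reward, Punishment, Sucker's payoff
T, R, P, S = 5, 3, 1, 0
print(defection_dominates(T, R, P, S))                      # True: the dilemma plays out

# Impose a cost of 3 on defectors (fines, reputation, social stigma, ...)
penalty = 3
print(defection_dominates(T - penalty, R, P - penalty, S))  # False: no dilemma
```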
I haven’t read all your posts from end to end, but I do agree with your conclusions that alignment is impossible and that AGI will result in the death or replacement of humanity. That said, I think your conclusions are valid only for LLMs which happen to be trained on human data. Since humans are deceptive, it makes sense that AIs trained on them are as well. Since humans don’t want to die, it makes sense that AIs trained on them also don’t want to die. I find it unlikely that the first AGI we get is an LLM, since I expect it to be impossible for LLMs to improve much further than they already have.
I will have to disagree that your post is rigorous. You’ve proven that human errors bad enough to end society *could* occur, not that they *will* occur. Some of your examples have many years between them because these events are infrequent. I think “there will be a small risk of extinction every year, and eventually we will lose the dice throw” is more accurate.
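To put rough numbers on that (the 1% figure is mine, for illustration only): a small independent annual risk compounds toward near-certainty over a long enough horizon, which is why “could” and “will” feel so close here.

```python
# Sketch of the "eventually we lose the dice throw" framing, with a made-up
# annual risk: if each year carries an independent probability p of a
# society-ending error, the chance of avoiding it for n years is (1 - p) ** n.

p = 0.01  # hypothetical 1% annual risk
for years in (10, 50, 100, 500):
    print(f"{years} years: P(no catastrophe) = {(1 - p) ** years:.2f}")
# Roughly 0.90 after 10 years, 0.37 after 100, 0.01 after 500. The limit is
# grim, but only if p stays fixed and the years are truly independent.
```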
Your essay *feels* like it’s outlining tendencies in the direction of extinction, showing transitions which look like the following:
A is like B
A has a tendency for B
For at least some A, B follows.
If A, then B occurs with nonzero probability.
If A, then we cannot prove (not B).
If A, then eventually B.
And if you collect all of these things into a directed acyclic graph, there’s a *path* from our current position to an extinction event. I don’t think you’ve proven that each step A->B will be taken, nor that it is certain that prevention is impossible (even if it’s impossible to be certain of prevention, which is a different statement).
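To make the path-versus-inevitability distinction concrete (all step probabilities here are invented): what matters is the product of the step probabilities, and it can sit well below 1 even when every individual step looks likely.

```python
# Sketch: a path through the argument's DAG exists, but traversing it requires
# every A -> B step to actually happen. With invented per-step probabilities:

from math import prod

step_probabilities = [0.9, 0.8, 0.95, 0.7, 0.85]  # P(step taken | earlier steps taken)

print(f"P(all steps taken) = {prod(step_probabilities):.2f}")  # ~0.41

# "The path exists" is a claim about the graph; "the path will be taken" is a
# claim that every factor above equals 1, which is a much stronger statement.
```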
I admit that my summary was imperfect. Though, if you really believe that it *will* happen, why are you writing this post? There would be no point in warning other people if it were necessarily too late to do anything about it. If you think “it will happen, unless we do X”, I’d be interested in hearing what this X is.
I appreciate your reply—it’s one of the more thoughtful responses I’ve received, and I genuinely value the engagement.
Your comment about game theory conditions actually answers the final question in your reply. I don’t state the answer explicitly in my essays (though I do in my book, right at the end), because I want the reader to arrive at it themselves. There seems to be only one conclusion, and I believe it becomes clear if the premises are accepted.
As for your critique—“You’ve shown that extinction could occur, not that it will”—this is a common objection, but I think it misses something important. Given enough time, “could” collapses into “will.” I’m not claiming deductive certainty like a mathematical proof. I’m claiming structural inevitability under competitive pressure. It’s like watching a skyscraper being built on sand. You don’t need to know the exact wind speed or which day it will fall. You just need to understand that, structurally, it’s going to.
If you believe I’m wrong, then the way to show that is not to say “maybe you’re wrong.” Maybe I am. Maybe I’m a brain in a vat. But the way to show that I’m wrong is to draw a different, more probable conclusion from the same premises. That hasn’t happened. I’ve laid out my reasoning step by step. If there’s a point where you think I’ve turned left instead of right, say so. But until then, vague objections don’t carry weight. They acknowledge the path exists, but refuse to admit we’re on it.
You describe my argument as outlining a path to extinction. I’m arguing that all other paths collapse under pressure. That’s the difference. It’s not just plausible. It’s the dominant trajectory—one that will be selected for again and again.
And if that’s even likely, let alone inevitable, then why are we still building? Why are we gambling on alignment like it’s just another technical hurdle? If you accept even a 10% chance that I’m right, then continued development is madness.
As for your last question—if I really believe it’s too late, why am I here?
Read this (though just the end section, “The End: A Discussion with AI”), specifically the final paragraph, just before ChatGPT’s response.
https://forum.effectivealtruism.org/posts/Z7rTNCuingErNSED4/the-psychological-barrier-to-accepting-agi-induced-human
That’s why I’m here—I’m kicking my feet.
My previous criticism was aimed at another post of yours, which likely wasn’t your main thesis. Some nitpicks I have with it:
“Developing AGI responsibly requires massive safeguards that reduce performance, making AI less competitive”: you could use the same argument for AIs which are “politically correct”, yet we still choose to take that step, censoring AIs and harming their performance. Thus, it’s not impossible for us to make such choices as long as the social pressure is sufficiently high.
“The most reckless companies will outperform the most responsible ones”: true in some ways, but most large companies are not all that reckless, which is why we are seeing so many sequels, remakes, and clones in the entertainment sector. It’s also important to note that these incentives have existed throughout human history, yet they never manifested very strongly until recent times. This suggests that the antidote to Moloch is humanity itself (good faith, good taste, and morality), and that these can beat game-theoretical problems which are only impossible when human beings are purely rational (i.e. inhuman).
We’re also assuming that AI becomes useful enough for us to disregard safety, i.e. that AI provides a lot of potential power. So far, this has not been true. AIs do not beat humans; companies are forcing LLMs into products that users did not ask for. LLMs seem impressive at first, but once you get past the surface you realize that they’re somewhat incompetent. Governments won’t gamble with human lives before these AIs provide large enough advantages.
“The moment an AGI can self-improve, it will begin optimizing its own intelligence.”
This assumption is interesting: what does “intelligence” mean here? Many seem to just give these LLMs more knowledge and then call them more intelligent, but intelligence and knowledge are different things. Most “improvements” seem to lead to higher efficiency, but that’s just being dumb faster or more cheaply. That said, self-improving intelligence is a dangerous concept.
I have many small objections like this to different parts of the essay, and they do add up, or at least add additional paths to how this could unfold.
I don’t think AIs will destroy humanity anytime soon (say, within 40 years). I do think that human extinction is possible, but I think it will be due to other things (like the low birthrate and its economic consequences, or technology in general; tech destroys the world for the same reasons that AIs do, just more slowly).
I think it’s best to enjoy the years we have left instead of becoming depressed. I see a lot of people like you torturing themselves with x-risk problems (some people have killed themselves over Roko’s basilisk as well). Why not spend time with friends and loved ones?
Extra note: There’s no need to tie your identity together with your thesis. I’m the same kind of autistic as you. The futures I envision aren’t much better than yours, they’re just slightly different, so this is not some psychological cope. People misunderstand me as well, and 70% of the comments I leave across the internet get no engagement at all, not even negative feedback. But it’s alright. We can just see problems approaching many years before they’re visible to others.
I’ve read some of your other replies on here and I think I’ve found a pattern, but it’s actually more general than AI.
Harmful tendencies outcompete those which aren’t harmful
This is true (even outside of AI), but only in the limit. When you have just one person, you cannot tell whether he will make the moral choice or not, but “people” will make the wrong choice. The harmful behaviour is emergent at scale. Discrete people don’t follow these laws, but the continuous person does.
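A small sketch of the discrete/continuous point (the 5% figure is mine, purely illustrative): any single person’s choice is a coin flip you can’t call in advance, but the aggregate follows the law almost exactly, and at scale the harmful choice is effectively guaranteed to appear somewhere.

```python
# Sketch with an invented defection rate: individuals are unpredictable,
# but a large population behaves lawfully (law of large numbers).

import random

p = 0.05      # hypothetical chance that any one actor makes the harmful choice
n = 100_000   # population size

random.seed(0)
one_person = random.random() < p   # a single draw: could go either way
fraction = sum(random.random() < p for _ in range(n)) / n
p_someone_defects = 1 - (1 - p) ** n

print(one_person)                  # unpredictable for the "discrete" person
print(round(fraction, 3))          # ~0.05: the "continuous" person follows the law
print(round(p_someone_defects, 6)) # ~1.0: at scale, the harmful behaviour always appears
```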
Again, even without AGI, you can apply this idea to technology and determine that it will eventually destroy us, and this is what Ted Kaczynski did. Thinking about incentives in this manner is depressing, because it feels like everything is deterministic and that we can only watch as everything gets worse. Those who are corrupt outcompete those who are not, so all the elites are corrupt. Evil businessmen outcompete good businessmen, so all successful businessmen are evil. Immoral companies outcompete moral companies, so all large companies are immoral.
I think this is starting to be true, but it wasn’t true 200 years ago. At least, it wasn’t half as harmful as it is now. Why? Because the defence against this problem is human taste, human morals, and human religions. Dishonesty, fraud, selling out, doing whatever is most efficient with no regard for morality: we consider this behaviour to be in bad taste, and we punished it and branded it low-status, so that it never succeeded in ruining everything.
But now, everything could kill us (if the incentives are taken as laws, at least); you don’t even need to involve AI. For instance, does Google want to be shut down? No, so they will want to resist antitrust laws. Do they want to be replaced? No, so they will use cruel tricks to kill small emerging competitors. When fines for illegal behaviour are less than the gains Google can make by doing illegal things, they will engage in illegal behaviour, for that is the logical best choice available to Google if all that matters is money. If we let it, Google would take over the world; in fact, it couldn’t do otherwise. You can replace “Google” with any powerful structure in which no human is directly in charge. When it starts being more profitable to kill people than to keep them alive, the global population will start dropping fast. When you optimize purely for money, and you optimize strongly enough, everyone dies. An AI just kills us faster because it optimizes more strongly; we already have something which acts similarly to an AI. If you optimize too hard for anything, no matter what it is (even love, well-being, or happiness), everyone eventually dies (hence the paperclip-maximizer warning).
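To put toy numbers on the fines-versus-gains point (every figure below is hypothetical): once the expected fine is smaller than the expected gain, breaking the rule is the profit-maximizing move, so a structure that optimizes only for money will take it.

```python
# Sketch with hypothetical figures: a pure money-optimizer compares expected
# gain against expected penalty and violates the rule whenever the math says to.

gain_from_violation = 10e9   # made-up: $10B of extra revenue
fine_if_caught = 3e9         # made-up: a $3B fine
p_caught_and_fined = 0.5     # made-up enforcement probability

expected_payoff = gain_from_violation - p_caught_and_fined * fine_if_caught
print(expected_payoff > 0)   # True: the "logical best choice" is to violate
# Raising fines or enforcement until this goes negative changes the condition,
# which is the only lever a pure optimizer responds to.
```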
If this post gave you existential dread, I’ve been told that Elinor Ostrom’s books make for a good antidote.