Matthew Barnett

Karma: 10,985

My Twitter: https://twitter.com/MatthewJBar

My Substack: https://matthewbarnett.substack.com/

Matthew Barnett 5 May 2026 6:38 UTC
45 points
4
on: Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI
The opposing faction is forced to that effort of hurried misinterpretation, because the actual impact of the concept of “we only get one shot at ASI” is so devastating to their position, in the presence of even a slight understanding of why there might be any noteworthy engineering difficulties whatsoever.
Honest question for anyone who agrees with this post: is there any extinction problem at all where you’d say we don’t only have one shot to solve the problem? If so, why?

Consider a few examples:

1. A giant asteroid is hurtling toward the planet, and will arrive very soon. If we mess up and fail to deflect the asteroid, then we all die. This is presumably a classic one-shot scenario, and perhaps few people disagree with that assessment, but I’m not sure.

2. Global warming, if continued for a very, very long time, could heat up the planet to catastrophic levels and eliminate the viability of agriculture, killing everyone. Do we also only get one shot to avoid extinction here?

3. Genetically engineered humans, if made much smarter than ordinary humans, and if they are accidentally created as psychopaths, could conceivably coordinate a genocide against ordinary humans. Is this a one-shot problem too?
One might say that what we mean by a problem being “one-shot” is that it needs to be solved on the first try. But what counts as a first try? Does deflecting a small asteroid count as a first try before we need to deflect a large one? If so, that would suggest that the asteroid scenario is not a one-shot problem, which seems wrong. I may as well claim that our first try surviving AI was in 2019 with GPT-2.
In each of the above cases, extinction is irreversible, so that’s another sense in which we only get “one-shot” to solve them. But irreversibility is a very weak condition since it applies to basically all extinction scenarios. If all we mean by saying that AI risk is a “one-shot” extinction problem is that the outcome is irreversible, then that label gives us no extra information beyond simply saying that it’s an extinction problem. Calling it “one-shot” is redundant.
If irreversibility is not what is meant by “one-shot”, then what is meant by that term? Suddenness? A discrete phase transition? Uniqueness? Simultaneity? Extreme difficulty? The problem is adversarial? I’m genuinely not sure how people are using this term.

Matthew Barnett 27 Apr 2026 22:14 UTC
61 points
38
on: In defense of parents
People should really clarify what ages they’re talking about when they argue about childhood independence and whether parents are too controlling. 15-year-olds are much more similar psychologically and biologically to adults than to children, and yet legally and culturally they’re considered children, with a range of restrictions on their behavior that would seem unthinkable if applied to 30-year-olds. This, in my view, is madness.
I think it’s clearly reasonable to liberate teenagers and give them adult-like freedoms (freedom to drop out of school, work, own property, etc.). But I don’t think that means we should necessarily let 5-year-olds control their medical decisions or even their bedtimes.

Matthew Barnett 2 Mar 2026 7:56 UTC
7 points
6
in reply to: bhauth’s comment on: OpenAI employees: Now is the time to stop doing good work.
Your points don’t support the claim I’m objecting to. I can consistently hold all of the following beliefs: having a bunker selfishly benefits Sam Altman, a bunker wouldn’t actually help him in a typical omnicidal AI scenario, and even if it did help him survive, Sam Altman would still suffer enormous personal costs from a global AI catastrophe. None of these claims contradict each other, but the latter two directly contradict what I interpreted you to be saying at the end of your post.

Matthew Barnett 2 Mar 2026 7:34 UTC
4 points
3
in reply to: bhauth’s comment on: OpenAI employees: Now is the time to stop doing good work.
I still don’t think this interpretation makes a lot of sense.
Imagine if you gained the option to live in a bunker. Would you suddenly realize that what happens in the rest of the world no longer matters, at least as far as you’re selfishly concerned, because even if there’s a mega catastrophe, you could always just retreat to your bunker?
Presumably not, because even granting that retreating to a bunker could allow you to survive such a catastrophe (and I don’t see much reason to believe that in the case of an AI omnicide), your quality of life would still substantially decline given the deaths of countless people you know, the collapse of the world economy and infrastructure, and new restrictions on your ability to travel freely and experience the world.

Matthew Barnett 2 Mar 2026 7:07 UTC
8 points
−3
on: OpenAI employees: Now is the time to stop doing good work.
I have to wonder what the advantage of the US “winning” an AI race is—especially since it’s fairly likely that everyone loses. [...] But of course, Altman isn’t worried. He has no kids. He’s having a bunker built. He’ll be OK.
This doesn’t make sense. If AI causes human extinction, Sam Altman would die along with everyone else. You can’t simultaneously claim that winning the AI race means everyone loses, and that Sam Altman is selfishly hastening an AI catastrophe because, unlike everyone else, he will survive and perhaps even thrive in a bunker.

Matthew Barnett 15 Feb 2026 9:00 UTC
2 points
0
in reply to: Nisan’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
How does it matter to my argument that, in your analogy, someone dies but we can’t sue the person responsible? I don’t see the relevance. My point is about whether the death constitutes a harm that we should try to mitigate, not about whether anyone can be held legally liable for it.
I concede that if policymakers pass regulations that delay medical progress and cause billions of deaths as a result, I won’t be able to sue them. I still intend to fight against those regulations.

Matthew Barnett 15 Feb 2026 8:19 UTC
5 points
−1
in reply to: Amalthea’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
I’ll freely admit that my case for acceleration depends in large part on the risk being low. But I want to separate two distinct arguments here. Many people have told me that acceleration would be unjustified even if the risk is low. Their reasoning is that the sheer number of potential future people creates an overwhelming moral obligation to prioritize bringing them into existence, and that this obligation outweighs the welfare interests of everyone alive today.
I think this longtermist moral argument fails on its own terms, independently of my views about risk. Giving each potential future person significant moral weight inevitably reduces the moral weight of every currently living person to something negligible, since >10^23 potential future people will always swamp anything on the other side of the equation. Billions of real, existing people effectively become a rounding error in the calculation. To me, any moral framework that treats the people alive right now as though they barely matter at all is not one worth taking seriously. It is a ghastly moral stance, and I would reject it even if I thought the risks of acceleration were higher than I actually believe them to be.

Matthew Barnett 15 Feb 2026 7:24 UTC
4 points
1
in reply to: Nisan’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
If someone physically held a gun to a surgeon’s head and stopped them from saving your life, would you consider that a harm? I would. In the same way, if the government forcibly prevents AI companies from accelerating medical breakthroughs through AI pause regulations, I consider that a harm too. This is fundamentally different from a situation where someone simply chooses not to advance medicine on their own. In one case, progress is being actively blocked by force; in the other, someone is merely declining to contribute. The distinction between coercively preventing progress and passively not pursuing it matters a lot for assigning blame and naming harm.

Matthew Barnett 15 Feb 2026 5:13 UTC
3 points
0
in reply to: Nisan’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
If I get cancer and someone intervenes to prevent the cure from being developed before the cancer kills me, I think they harmed me. Even if they choose to call it a “foregone benefit”, I still died. That’s what matters to me, not how they choose to describe it.

Matthew Barnett 14 Feb 2026 23:16 UTC
28 points
11
in reply to: habryka’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
Mostly people have some concern about “the future of humanity”, and that concern is really quite strong. I don’t think it’s particularly coherent (as practically no ethical behavior exhibited by broad populations is), but it clearly is quite strong.
How would we test the claim that people have a strong concern about the long-term future of humanity? Almost every way I can think of measuring this seems to falsify it.
The literature on time discounting and personal finance behavior doesn’t support it. Across the world people are having fewer children than ever, suggesting they are placing less and less priority on having a posterity at all. Virtually all political debate concerns the lives of currently living people rather than abstract questions about humanity’s distant future. The notable exceptions, such as climate change, seem to reinforce my point: climate concern has been consistently overshadowed by our material interest in cheap fossil fuels, as evidenced by the fact that emissions and temperatures keep rising every year despite decades of debate.
One might argue that in each of these cases people are acting irrationally, and that we should look at their stated values rather than their revealed behavior. But the survey data doesn’t clearly demonstrate that people are longtermists either. Schubert et al. asked people directly about existential risk, and one of their primary findings was: “Thus, when asked in the most straightforward and unqualified way, participants do not find human extinction uniquely bad. This could partly explain why we currently invest relatively small resources in reducing existential risk.” We could also look at moral philosophers, who have spent thousands of years debating what we should ultimately value, and among whom explicit support for longtermism remains a minority position. This fact is acknowledged by longtermist philosophers like Hilary Greaves and William MacAskill, who generally emphasize that longtermist priorities are “neglected”, both within their field and by society at large.
I acknowledge that most people have some concern for the future of humanity. But “some concern” is not what we’re arguing about here. This concern would need to be very strong to override people’s interests in their own lives, such as whether they will develop Alzheimer’s or whether their parents will die. Even if people do have strong feelings about the future of humanity upon reflection, that concern is not “clear” but rather speculative. How could we actually know what people ultimately value upon reflection? In any case, the strong concern people have for their actual, living family is already pretty clear given the ordinary behavior that they engage in: how they spend their money, how many children they have, etc.

Matthew Barnett 14 Feb 2026 22:49 UTC
7 points
6
in reply to: habryka’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
People do care about having children, and they care especially strongly about their living children. But their concern for future unborn descendants, particularly in the distant future, is typically weaker than their concern for everyone who is currently alive.

Matthew Barnett 14 Feb 2026 22:19 UTC
11 points
−15
in reply to: habryka’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
the paper is referencing a much narrower moral perspective (in which the only things that matter are the experiences, not the preferences of the people currently alive).
Note that you could hold the view that the vast majority of people care mostly, even if not entirely, about the lives of people who currently exist: themselves, their immediate family, their children, and their friends. This is highly plausible when you consider that birth rates are crashing worldwide. Most people clearly prioritize their family’s material well-being over maximizing their future descendants who will be born many decades or centuries from now. Most people are not longtermists, or total utilitarians.
If this is the case, and I believe it is, then the welfare version of person-affecting views and the preference version largely coincide.

Matthew Barnett 14 Feb 2026 21:47 UTC
3 points
0
in reply to: DaystarEld’s comment on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
If someone was actually making arguments specifically for the benefit of all the people currently alive today and next generation, I would expect very different ones from those in this paper. You could try to reasonably try to say that 96% chance of the world ending is acceptable from an 80 year old person who doesn’t care about their younger family or friends or others, but I don’t think it’s a serious argument.

For example, you would have to also do the math for the likelihood of biotech advancements that help currently living 40 year olds or 30 year olds hit the immortality event horizon, as an alternative scenario to “either race for AGI or everyone alive today dies.” If you don’t do things like that, then it doesn’t seem reasonable to argue that this is all in service of a perspective for those alive today vs “hypothetical people”… and of course the conclusion is going to be pretty badly lopsided toward taking high risks, if no other path to saving lives is seriously considered.
I suspect you either lack a clear understanding of the argument made in Bostrom’s post, or you are purposely choosing to not engage with its substance beyond the first thousand words or so.
Bostrom is not claiming that a 96% chance of catastrophe is acceptable as a bottom line. That figure came only from his simplest go/no-go model. The bulk of the post extends this model with diminishing marginal utility, temporal discounting, and other complications, which can push toward longer wait times and more conservative risk tolerance. Moreover, your specific objection, that he doesn’t consider alternative paths to life extension without AGI, is false. In fact, he addressed this objection directly in his “Shifting Mortality Rates” section, where he models scenarios in which non-AGI medical breakthroughs reduce background mortality before deployment, and shows this does lengthen optimal timelines. He also explicitly acknowledges in his distributional analysis that the argument differentially benefits the old and sick, and engages with that fact rather than ignoring it.
I find it frustrating when someone dismisses an argument as unserious while clearly not engaging with what was actually said. This makes productive dialogue nearly impossible: no matter how carefully a point is made, the other person ignores it and instead argues against a version they invented in their own head and projected onto the original author.

Matthew Barnett 14 Feb 2026 9:44 UTC
7 points
−47
on: Optimal Timing for Superintelligence: Mundane Considerations for Existing People
I appreciate this paper because—like what I suspect is true of Bostrom—I also put substantial weight on person-affecting views. In fact, I would go even further than Bostrom goes here. I think, in general, we should usually take actions that benefit the billions of people alive today, or people who will soon exist, rather than assuming that everyone alive today should get negligible weight in the utilitarian calculus because of highly speculative considerations about what might occur in millions of years.
I expect this argument will not be received well on LessWrong, because it violates a major taboo in the community. Specifically, it points out that pausing AI development would likely cause grave harm to billions of currently living people by delaying medical progress that advanced AI could otherwise accelerate. Those billions of people are not abstractions. They include the readers of LessWrong, their parents, and their other family members. Acknowledging this cost is uncomfortable, and the community tends to avoid giving it serious weight, but that does not make it any less real.
I have long appreciated Bostrom for prioritizing clear and careful analysis over merely providing superficial rationalizations of socially acceptable views, and I believe this paper is a good example of that.

Matthew Barnett 25 Dec 2025 4:34 UTC
15 points
8
on: Contradict my take on OpenPhil’s past AI beliefs
I’d like to point out that Ajeya Cotra’s report was about “transformative AI”, which had a specific definition:
I define “transformative artificial intelligence” (transformative AI or TAI) as “software” (i.e. a computer program or collection of computer programs) that has at least as profound an impact on the world’s trajectory as the Industrial Revolution did. This is adapted from a definition introduced by CEO Holden Karnofsky in a 2016 blog post.
How large is an impact “as profound as the Industrial Revolution”? Roughly speaking, over the course of the Industrial Revolution, the rate of growth in gross world product (GWP) went from about ~0.1% per year before 1700 to ~1% per year after 1850, a tenfold acceleration. By analogy, I think of “transformative AI” as software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it).
Currently, the world economy is growing at ~2-3% per year, so TAI must bring the growth rate to 20%-30% per year if used everywhere it would be profitable to use. This means that if TAI is developed in year Y, the entire world economy would more than double by year Y + 4. This is a very extreme standard—even 6% annual growth in GWP is outside the bounds of what most economists consider plausible in this century.
My personal belief is that a median timeline of ~2050 for this specific development is still reasonable, and I don’t think the timelines in the Bio Anchors report have been falsified. In fact, my current median timeline for TAI, by this definition, is around 2045.

Matthew Barnett 4 Nov 2025 18:10 UTC
−1 points
0
in reply to: J Bostock’s comment on: Comparative advantage & AI
I was approaching the mosquito analogy on its own terms but at this level of granularity it does just break down.
My goal in my original comment was narrow: to demonstrate that a commonly held model of trade is incorrect. This naive model claims (roughly): “Entities do not trade with each other when one party is vastly more powerful than the other. Instead, in such cases, the more powerful entity rationally wipes out the weaker one.” This model fails to accurately describe the real world. Despite being false, this model appears popular, as I have repeatedly encountered people asserting it, or something like it, including in the post I was replying to.
I have some interest in discussing how this analysis applies to future trade between humans and AIs. However, that discussion would require extensive additional explanation, as I operate from very different background assumptions than most people on LessWrong regarding what constraints future AIs will face and what forms they will take. I even question whether the idea of “an ASI” is a meaningful concept. Without establishing this shared context first, any attempt to discuss whether humans will trade with AIs would likely derail the narrow point I was trying to make.
If you don’t think an ASI could definitely make a profit from getting us out of the picture, then we just have extremely different pictures of the world.
Indeed, we likely do have extremely different pictures of the world.

Matthew Barnett 4 Nov 2025 17:13 UTC
4 points
−8
in reply to: J Bostock’s comment on: Comparative advantage & AI
Eradicating mosquitoes would be incredibly difficult from a logistical standpoint. Even if we could accomplish this goal, doing so would cause large harm to the environment, which humans would prefer to avoid. By contrast, providing a steady stored supply of blood to feed all the mosquitoes that would have otherwise fed on humans would be relatively easy for humans to accomplish. Note that, for most mosquito species, we could use blood from domesticated mammals like cattle or pigs, not just human blood.
When deciding whether to take an action, a rational agent does not merely consider whether that action would achieve their goal. Instead, they identify which action would achieve their desired outcome at the lowest cost. In this case, trading blood with mosquitoes would be cheaper than attempting to eradicate them, even if we assigned zero value to mosquito welfare. The reason we do not currently trade with mosquitoes is not that eradication would be cheaper. Rather, it is because trade is not feasible.
You might argue that future technological progress will make eradication the cheaper option. However, to make this argument, you would need to explain why technological progress will reduce the cost of eradication without simultaneously reducing the cost of producing stored blood at a comparable rate. If both technologies advance together, trade would remain relatively cheaper than extermination. The key question is not whether an action is possible. The key question is which strategy achieves our goal at the lowest relative cost.
If you predict that eradication will become far cheaper while trade will not become proportionally cheaper, thereby making eradication the rational choice, then I think you’d simply be making a speculative assertion. Unless it were backed up by something rigorous, this prediction would not constitute meaningful empirical evidence about how trade functions in the real world.

Matthew Barnett 4 Nov 2025 9:22 UTC
14 points
−3
on: Comparative advantage & AI
In the case of ants, we wouldn’t even consider signing a trade deal with them or exchanging goods. We just take their stuff or leave them alone.
Consider mosquitoes instead. Imagine how much better off both species would be if we could offer mosquitoes large quantities of stored blood in exchange for them never biting humans again. Both sides would clearly gain from such an arrangement. Mosquitoes would receive a large, reliable source of food, and humans would be freed from annoying mosquito bites and many tropical diseases.
The reason this trade does not happen is not that humans are vastly more powerful than mosquitoes, but that mutual coordination is impossible. Obviously, we cannot communicate with mosquitoes. Mosquitoes cannot make commitments or agreements. But if we could coordinate with them, we would—just as our bodies coordinate with various microorganisms, or many of the animals we domesticate.
The barrier to trade is usually not about power; it is about coordination. Trade occurs when two entities are able to coordinate their behavior for mutual benefit. That is the key principle.

Matthew Barnett 9 Oct 2025 23:13 UTC
9 points
−3
in reply to: loic’s comment on: Jan_Kulveit’s Shortform
What about a scenario where no laws are broken, but over the course of months to years large numbers of humans are unable to provide for themselves as a consequence of purely legal and non violent actions by AIs? A toy example would be AIs purchasing land used for agriculture for other means (you might consider this an indirect form of violence).
I’d consider it bad if AIs take actions that result in a large fraction of humans becoming completely destitute and dying as a result.
But I think such an outcome would be bad whether it’s caused by a human or an AI. The more important question, I think, is whether such an outcome is likely to occur if we grant AIs legal rights. The answer to this, I think, is no. I anticipate that AGI-driven automation will create so much economic abundance in the future that it will likely be very easy to provide for the material needs of all biological humans.
Generally I think biological humans will receive income through charitable donations, government welfare programs, in-kind support from family members, interest, dividends, by selling their assets, or by working human-specific service jobs where consumers intrinsically prefer hiring human labor (e.g., maybe childcare). Given vast prosperity, these income sources seem sufficient to provide most humans with an adequate, if not incredibly high, standard of living.

Matthew Barnett 8 Oct 2025 19:49 UTC
36 points
1
in reply to: Jan_Kulveit’s comment on: Jan_Kulveit’s Shortform
My views on AI have indeed changed over time, on a variety of empirical and normative questions, but I think you’re inferring larger changes than are warranted from that comment in isolation.
Here’s a comment from 2023 where I said:
The term “AI takeover” is ambiguous. It conjures an image of a violent AI revolution, but the literal meaning of the term also applies to benign scenarios in which AIs get legal rights and get hired to run our society fair and square. A peaceful AI takeover would be good, IMO.
In fact, I still largely agree with the comment you quoted. The described scenario remains my best guess for how things could go wrong with AI. However, I chose my words poorly in that comment. Specifically, I was not clear enough about what I meant by “disempowerment.”
I should have distinguished between two different types of human disempowerment. The first type is violent disempowerment, where AIs take power by force. I consider this morally bad. The second type is peaceful or voluntary disempowerment, where humans willingly transfer power to AIs through legal and economic processes. I think this second type will likely be morally good, or at least morally neutral.
My moral objection to “AI takeover”, both now and back then, applies primarily to scenarios where AIs suddenly seize power through unlawful or violent means, against the wishes of human society. I have, and had, far fewer objections to scenarios where AIs gradually gain power by obtaining legal rights and engaging in voluntary trade and cooperation with humans.
The second type of scenario is what I hope I am working to enable, not the first. My reasoning for accelerating AI development is straightforward: accelerating AI will produce medical breakthroughs that could save billions of lives. It will also accelerate dramatic economic and technological progress that will improve quality of life for people everywhere. These benefits justify pushing forward with AI development.
I do not think violent disempowerment scenarios are impossible, just unlikely. And I think that pausing AI development would not meaningfully reduce the probability of such scenarios occurring. Even if pausing AI did reduce this risk, I think the probability of violent disempowerment is low enough that accepting this risk is justified by the billions of lives that faster AI development could save.