The Lethal Reality Hypothesis
The epistemic status thing
Please read this section; it is not a disclaimer included for form's sake.
I think the model I describe in this post can easily be wrong, in its full form. However, I also think the probability of it being substantially correct is more than non-negligible, and that this alone warrants writing it up and thinking about it seriously.
Throughout this post, I will write “X is the case” rather than “it seems plausible that X might be the case, though I am not certain.” Every claim should be mentally prefixed with the appropriate hedge. I’m dropping the hedges not because I’m confident, but because a text where every sentence contains “it seems plausible that” becomes unreadable without actually making anyone more calibrated.
I should also flag that I’m writing this from a state of considerable frustration with the current state of affairs, and this frustration may be load-bearing in places where it shouldn’t be.
This is not about AI
AI risk is, at most, one illustration of the phenomenon I’m describing here, and not a particularly privileged one. Everything in this post would hold in a world where artificial intelligence had never been conceived of. It would hold in a world where nuclear weapons were never built or where the climate was perfectly stable and no one had ever heard of greenhouse gases.
The claim is about a structural property of civilizations under optimization pressure, not about any particular technology or threat vector. If you finish reading this and your takeaway is “oh, another AI doom argument,” then I have failed to communicate the central point, which is that the problem runs much deeper than any specific risk and would persist even if every currently-known risk were magically eliminated tomorrow.
The central claim: extinctive pressure
The core idea: extinction is the default outcome for any civilization, and it is a strong default. Not because of any particular threat, but because of a structural property which I will call extinctive pressure, which operates on every civilization simply by virtue of how optimization, competition, and the universe work together.
I want to unpack what I mean by calling it a “pressure” rather than just “a risk” or “a likely outcome”.
Consider a version of an old thought experiment. Suppose you learn that Cthulhu is real. He is sleeping somewhere in the Pacific, and in approximately 200 years he will wake up and destroy all of humanity, unless humanity coordinates to prevent it and spends significant but realistically achievable resources on it. You have excellent evidence for this. The evidence is publicly available and widely accepted as credible.
I claim that, almost certainly, humanity does not successfully fight Cthulhu.
Why? Roughly, because fighting Cthulhu is costly (in whatever sense), and the cost is immediate while the payoff is distant.[1] Every dollar and every unit of political capital spent on Cthulhu-proofing is a dollar and a unit of political capital not spent on things that yield competitive advantage right now. Every leader who commits their country’s resources to the Cthulhu project is outcompeted by a leader who instead commits those resources to economic growth, military strength, or popular welfare programs. Every researcher who works on Cthulhu defense could be working on something that produces more papers, grants, or products in their lifetime. The people who take Cthulhu seriously are, at every level of the competition, at a disadvantage relative to those who don’t, or who mouth the right words about taking it seriously while allocating resources elsewhere.
This is what I mean by “pressure”. I don’t mean “extinction is likely” as a passive observation, the way one might say “rain is likely tomorrow.” I mean there is an active force that pushes civilizations toward extinction and pushes back against attempts to resist. It is a force in the relevant sense that: (1) counteracting it requires continuous expenditure of energy and resources, (2) in the absence of such expenditure, the system drifts toward extinction by default, and (3) the agents who do spend energy fighting it are systematically outcompeted by those who do not, which means the system’s ability to fight it degrades over time even if some agents start out fighting it.
The analogy I find useful is sailing against the wind. It is not impossible and it does not violate any law of physics. Sailboats can and do sail against the wind. But it is costly, it is slower than sailing with the wind, and it requires continuous active effort and skill, and the moment you stop actively doing it, you drift back. And critically, in a race, the boat sailing with the wind will generally beat the boat sailing against it.
Now let me explain the entire chain of arguments.
Optimization does not target survival
Agents and civilizations are subject to optimization. Most of these optimization processes don’t have survival-of-the-civilization as their objective. They have local objectives: reproductive fitness, profit, political power, memetic spread. These objectives are sometimes loosely correlated with civilizational survival, in the same way that a company’s profitability is sometimes loosely correlated with its customers’ wellbeing. But “loosely correlated” is doing an enormous amount of work in that sentence, and for our purposes, the correlation is almost certainly insufficient, for two reasons:
The set of states in which a civilization survives long-term is astronomically small relative to the total state space.
The states toward which optimization actually drives us are in tension with survival states rather than merely uncorrelated with them.
The worst of the best possible worlds
Here I need to introduce a concept I’ll call the local environment.
You exist, right now, in an extraordinarily specific set of conditions. The fundamental physical constants of this universe permit complex chemistry, the Sun is a stable main-sequence star in a relatively quiet region of the galaxy, Earth has a magnetic field that deflects solar wind, plate tectonics that regulate the carbon cycle, a large moon that stabilizes axial tilt, oxygen-nitrogen atmosphere held at a temperature range compatible with liquid water, and so on.
All of this is what I mean by “local environment”: the specific bubble of conditions, bounded in space, time, and parameter space, within which our continued existence happens to be viable.
Now apply a combination of two principles. The anthropic principle tells us that we necessarily find ourselves in conditions compatible with our existence, so we should not be surprised that our local environment is friendly. The Copernican principle tells us that we are not in a special or privileged location in the space of possibilities; we should expect to be typical among observers.
Combining these yields a conclusion: we should expect to live in a minimally friendly universe, not a maximally or even an average-friendly one. The anthropic principle guarantees that our universe clears the bar for producing observers. The Copernican principle says we are typical among observer-containing universes. Since there are vastly more ways for a universe to barely clear the bar than to be friendly for humans on all levels and in all parts of the configuration space, typical means “barely clearing the bar.” We should expect our universe to be friendly enough to produce us and not much more than that.
This can be illustrated with the evidence on the life-permitting regions in the space of fundamental physical constants (but this is an illustration; I do not want to create the impression that the argument is only about fine-tuning of fundamental physical parameters). Barnes (2012) provides the most comprehensive review: in the space of possible physical laws, parameters and initial conditions, the set that permits the evolution of intelligent life is very small. The key point that matters for the argument here is the geometric fact that makes the “minimally friendly” conclusion follow from the Copernican principle: in a multi-dimensional parameter space where the viable region is a tiny sliver, most of the volume of that sliver is near its boundaries, not deep in its interior. Typical observers are near the edge of viability, not comfortably in the center.
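The “volume near the boundary” claim is just the geometry of high dimensions. A minimal numeric sketch, using a ball as a stand-in for the viable region:

```python
# In d dimensions, volume scales as r**d, so the core inside 99% of the radius
# holds only 0.99**d of a ball's volume; the rest sits in the outer 1% shell.
for d in (3, 10, 100, 1000):
    near_boundary = 1 - 0.99 ** d
    print(f"d = {d:>4}: fraction of volume within 1% of the boundary = {near_boundary:.4f}")
```

At d = 3 the near-boundary shell holds about 3% of the volume; at d = 100 it holds about 63%; at d = 1000, essentially everything. High-dimensional viable regions have almost no comfortable interior.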
In other words, the conditions for civilizational survival are not the default state of reality, they are a razor-thin exception, and we happen to be in the exception right now because that is the only place observers can be. Step outside the local environment, in any direction (deeper into space, further into the future, or into novel physical/technological/social configurations that our environment has not been tested against) and you should by default expect conditions to be lethal.[2]
Why, then, have we survived until now? Probably, because our local environment has been stable, at least on the timescale relevant to biological evolution and civilization development, and because of anthropic selection.
Local environments are not stable, and we are actively destabilizing ours
Even on the natural side, if we wait long enough, the environment will become lethal: the Sun’s luminosity will increase, asteroid impacts arrive stochastically, and eventually comes heat death.
But the more immediate concern is that powerful agents actively change their local environment, often dramatically and often in ways that are poorly understood at the time.
The entropy frame
Making everything go right is hard. Making something go wrong is easy. This is also, at root, an observation about the relative sizes of state spaces: the states in which a complex system continues to function are vastly outnumbered by the states in which it doesn’t, in the same way that the configurations of a watch that tell time are vastly outnumbered by the configurations that don’t.
The important corollary is that things go right only when there is a specific, powerful mechanism making them go right. Your body maintains homeostasis not by default but because billions of years of evolution built elaborate regulatory systems to keep temperature, pH, blood oxygen, and a thousand other parameters within viable ranges.
Where there is no such mechanism, entropy wins. The system drifts toward the vastly larger space of non-functional states. The question for civilizational survival is therefore: is there a specific, powerful mechanism that keeps civilization within the narrow band of survival-compatible states? Most likely, no.
Feedback loops for survival are not tight
A reader might object to the entropy argument as follows: “Sure, survival states are rare, but we have feedback mechanisms. When things start going wrong, we feel pain, we experience resource shortages, we notice and we correct.” This is true in many domains, and it is exactly why many complex systems remain functional despite entropy: they have tight feedback loops that detect deviation from the functional state and apply corrective pressure.
The problem is that for existential threats, the feedback loops are, generally, not tight at all.
Consider what a tight feedback loop for survival would look like. For every existential threat, there would need to be some proportionate preliminary signal: pain, resource loss, political instability, something that registers in the preferences and incentive structures of the agents who could actually do something about it (not merely in some “empirical signal”). The signal would need to be (a) early enough to allow corrective action, (b) strong enough to motivate costly corrective action, and (c) reliably connected to the actual severity of the threat rather than to some proxy that can be Goodharted. Expecting all three to hold, for every threat, is unrealistic.
Survival conflicts with optimization
The states that optimization actually pushes us toward are not merely uncorrelated with survival, they are actively in tension with it.
At every level of competition, agents face resource allocation decisions. Some resources can be directed toward survival-relevant activities. The same resources can alternatively be directed toward activities that improve the agent’s competitive fitness right now.[3]
There is always a margin at which you can reallocate from the first category to the second. And crucially, there is no strong feedback loop running in the reverse direction: investing in civilizational survival does not, in general, make you more competitive. Sometimes there is a weak correlation (a society that prepares for pandemics may have a healthier workforce), but the correlation is nowhere near tight enough to make survival-oriented behavior a winning strategy in the competitive landscape.
This means survival-oriented behavior has negative fitness in most competitive environments in the long run, because agents who skip it and reallocate those resources to direct competition will, all else being equal, outperform agents who don’t.
Now, there is one scenario in which survival-oriented behavior could be reinforced: if existential disasters happen frequently enough and are mild enough that agents who prepare for them consistently outperform agents who don’t. In this scenario, you get something like natural selection for survival-competence. Think of how organisms evolved immune systems: pathogens are frequent, individual infections are survivable, and organisms with better defenses reliably outcompete those without them. The feedback loop is tight, the disaster distribution is right, and survival-competence gets selected for.
But notice how specific the required conditions are:
(a) The disaster distribution must be very particular. The existential threats must be frequent enough to provide a training signal, but mild enough that failing to prepare is costly without being instantly terminal. If disasters are too rare, the agents who prepare for them waste resources that their competitors spend on winning the current round. If disasters are too severe, they simply wipe out everyone and there is no differential selection at all. The sweet spot, frequent mild existential threats that reward preparation, is a very narrow band, and there is no reason to expect our actual threat distribution to fall in it. In fact, the threats we actually face are characterized precisely by being rare and catastrophic rather than frequent and mild.
(b) Even if the distribution were right, calibration to local threats does not generalize. Suppose a civilization does develop good responses to the existential threats in its local environment. This gives you no guarantee whatsoever that those responses transfer to novel threats outside the training distribution. The “training distribution” of past existential near-misses is not representative of the full distribution of possible existential threats.
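The disaster-distribution point in (a) can be made concrete with a deliberately crude toy model (all numbers hypothetical): a “prepared” lineage pays a growth tax every period and survives disasters; a “free-rider” keeps the tax but is destroyed by the first disaster it meets.

```python
import math

def free_rider_outcome(disaster_rate, horizon, prep_cost=0.02):
    """P(free-rider survives the horizon), and its compounded resource edge
    over a prepared competitor if it does survive. Toy model, illustrative numbers."""
    p_survive = (1 - disaster_rate) ** horizon
    edge = math.exp(prep_cost * horizon)  # savings from skipping preparation, compounded
    return p_survive, edge

# Frequent, mild threats: free-riders are reliably weeded out; preparation is selected for.
p, edge = free_rider_outcome(disaster_rate=0.2, horizon=50)    # p ≈ 1.4e-5

# Rare, catastrophic threats: the free-rider almost always outlives any one
# competitive lifetime, and does so with ~2.7x the resources.
p, edge = free_rider_outcome(disaster_rate=0.001, horizon=50)  # p ≈ 0.95
```

In the second regime, selection sees the free-rider winning almost every observed round; the rare wipeouts arrive too infrequently to provide a training signal.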
The ability to foresee landscape changes does not reliably help
Suppose some agents are smart enough and can predict how the competitive landscape will shift. Can they use this foresight to prepare? In principle, yes. In practice, they are competing against agents who use their resources to win in the current landscape rather than preparing for the next one. An agent who correctly predicts that AI will reshape the competitive landscape in 10 years and diverts resources to prepare for that transition is, for the next 9 years, outcompeted by agents who use those resources to dominate in the present. By the time the foresighted agent’s predictions are vindicated, the shortsighted agents may have already accumulated enough power and resources to be the ones who determine how the transition goes. This is, incidentally, a recognizable dynamic in financial markets, where being right too early is operationally indistinguishable from being wrong.
Requirements for survival under optimization
We can now state fairly precisely what the optimization landscape would need to look like for a civilization to survive under optimization pressure:
Requirement 1: The current fitness level must be sufficient. Either the civilization’s present position in the competitive landscape must already be within the survival-compatible region, or the optimization dynamics must be carrying it toward the survival-compatible region fast enough that it arrives before an existential catastrophe occurs.
Requirement 2: The landscape must be sufficiently static. The competitive environment must not change so fast that adaptations become obsolete before they can accumulate. If the landscape shifts faster than the civilization can adjust, then even a civilization that is currently well-adapted is on borrowed time, because its adaptations are being invalidated faster than new ones can be developed.
So: how likely is it that both requirements are satisfied simultaneously?
The conjunction of these two requirements is not logically impossible. But it requires a very specific, very lucky configuration of the optimization landscape, and there is no known mechanism that selects for or maintains such a configuration.
The civilization path is not ergodic
You do not get to play multiple times. There is no ensemble of civilizations over which your results average out. You have one path through time, and extinction is an absorbing state: once you enter it, you do not leave.
Any per-period probability of extinction that is not infinitesimally small converges to certainty over a long enough time horizon. If you face a 1% probability of extinction per century, your probability of surviving 1,000 years is about 0.99^10 ≈ 0.90. Survivable, maybe. Your probability of surviving 10,000 years is about 0.99^100 ≈ 0.37. Uncomfortable. Your probability of surviving 100,000 years is about 0.99^1000 ≈ 0.00004. Effectively zero. And 100,000 years is nothing on a cosmological timescale.
To survive in the long run, you do not need to survive “on average” or “in expectation.” You need to survive almost surely, almost in the technical measure-theoretic sense: the probability of the extinction event must be driven so close to zero, in every period, that the infinite product of survival probabilities converges to something positive. This is an astronomically more demanding requirement than “keep the per-period risk reasonably low”.
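The arithmetic above can be reproduced in a few lines (a minimal sketch; the 1% figure is, as noted, purely illustrative):

```python
# Compound a constant 1%-per-century extinction risk over long horizons.
risk_per_century = 0.01
for centuries in (10, 100, 1000):
    survival = (1 - risk_per_century) ** centuries
    print(f"{centuries * 100:>7} years: P(survival) = {survival:.5f}")
# 1,000 years ≈ 0.90; 10,000 years ≈ 0.37; 100,000 years ≈ 0.00004
```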
Survival does not scale
A corollary: survival competence at one scale does not compose into survival at larger scales, neither in time nor in space.
Consider the temporal dimension. Suppose a society manages, through heroic effort, to maintain a serious focus on existential risk mitigation for the duration of one generation, roughly 30 to 50 years. Suppose every generation is like this: each inherits the commitment, maintains the institutions, and keeps the probability of extinction very low for its own period. Does this chain of responsible generations add up to long-term survival?
Not necessarily, and probably not. Per-period probabilities that look perfectly acceptable over a single generation’s lifetime multiply across generations, and the product implies extinction in the long run.
The spatial dimension is, if anything, worse. Suppose one country manages to implement strong pro-survival policies. This country’s survival still depends on what every other country does. Survival under existential risk is a weakest-link problem, more precisely a weakest-link problem across the entire globe and across all of future time, and the probability of every link holding in every period is the product of all the individual holding probabilities.
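To see how the spatial and temporal dimensions compound, a toy calculation (the numbers are hypothetical, chosen only to show the scaling):

```python
# Weakest-link survival: every country must hold, every year, for the whole horizon.
p_hold = 0.999      # hypothetical per-country, per-year probability of "holding"
countries, years = 50, 100
p_joint = p_hold ** (countries * years)
print(f"P(all {countries} links hold for {years} years) = {p_joint:.4f}")
```

Even a 99.9%-reliable link, replicated across 50 countries and 100 years, yields a joint survival probability under 1%.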
The process is fat-tailed
The non-ergodicity argument above assumed, generously, that the per-period extinction probability is roughly constant over time. The actual situation is almost certainly worse.
The stochastic process representing civilizational outcomes is very likely fat-tailed. That is, the distribution of possible shocks to the system has tails that are much heavier than a Gaussian or other thin-tailed distribution would suggest. Extreme events, especially technological ones, are not exponentially rare the way they would be under a thin-tailed model. They are polynomially rare at best, which means they are much more likely to occur on civilizational timescales than naive extrapolation from recent calm periods would suggest.
Under fat tails, the non-ergodicity problem gets dramatically worse. In a thin-tailed world, you can at least estimate the per-period risk reasonably well and plan around it. In a fat-tailed world, the per-period risk is dominated by events that are outside your current model, events that you cannot estimate because you have never observed anything like them and your historical data gives you almost no information about their probability. The “1% per century” estimate I used above was illustrative; in a fat-tailed world, the honest estimate is something closer to “we don’t know and we may be unable to know, because the tail events that dominate the risk have not happened yet and will look nothing like anything that has.”
However, there is even more to it.
In the absence of sufficiently strong corrective mechanisms, the multiplicative fat-tailed process drives the system to zero
Consider a multiplicative process: a system whose state at each time step is multiplied by some random factor (rather than having some random amount added). Civilizational “health” is plausibly multiplicative in this sense: a sufficiently bad shock doesn’t subtract a fixed amount from your prospects, it multiplies them by something close to zero. And extinction is the absorbing state at zero.
For the simplest case, geometric Brownian motion (a multiplicative process with Gaussian log-returns of mean μ and variance σ²), there is an exact and famous result: the time-average growth rate of a single path is not μ but μ − ½σ². This is the gap between the ensemble average and the time average, between what happens “on average across many parallel worlds” and what happens “in the one world you actually live in.” The correction term −½σ² is always negative, which means the single-path growth rate is always lower than the expected growth rate, and when σ² is large enough, it goes negative even if μ is positive. Your expected value grows, but you, on your one path, go to zero.
This is already an important result. It tells you that high volatility is not just unpleasant but existentially dangerous for any system that cannot restart after hitting zero: a single path through a sufficiently volatile multiplicative process will almost surely be destroyed even if the average outcome across all possible paths looks fine.
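The gap is easy to exhibit numerically. A minimal simulation (parameter values are arbitrary; what matters is that σ² is large relative to μ):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, steps, paths = 0.05, 0.4, 100, 2_000

# Gaussian log-returns with mean mu - sigma**2/2 give E[per-step factor] = exp(mu),
# so the ensemble average grows at rate mu while a typical single path
# grows at rate mu - sigma**2/2, which here is 0.05 - 0.08 = -0.03.
log_returns = rng.normal(mu - sigma ** 2 / 2, sigma, size=(paths, steps))
wealth = np.exp(log_returns.cumsum(axis=1))

ensemble_avg = wealth[:, -1].mean()      # pulled up by a handful of lucky paths
typical_path = np.median(wealth[:, -1])  # tracks exp((mu - sigma**2/2) * steps), well below 1
```

The median path ends deep underwater even though the expected value grows every step: the ensemble mean is carried by a vanishing fraction of extraordinarily lucky paths.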
Now consider what happens when the process is fat-tailed. Replace the Gaussian shocks with shocks drawn from an α-stable distribution with stability index α < 2. For α < 2, the theoretical variance is infinite (or undefined). The exact formula μ − ½σ² is derived for the Gaussian case specifically (via Itô's lemma), and does not formally carry over to Lévy processes with infinite variance; the correction terms take a different and more complex form involving the characteristic exponent of the process. However, there is an empirical regularity: if you compute the running sample variance up to each time point (which is always finite, being computed from finitely many observations) and plug it into the Gaussian formula μ − ½(σ_empirical)², you get a quantity that tracks the actual realized growth rate of the process quite well. And what you observe is that the running sample variance grows over time, punctuated by ever-larger spikes as new extreme observations arrive, because the sample variance of an infinite-variance distribution does not stabilize. Each new tail event revises the effective σ² upward, and the effective growth rate μ − ½(σ_empirical)² is dragged further and further down.[4]
The implication, stated loosely: for a multiplicative process with fat-tailed shocks, the effective volatility penalty grows without bound over time, eventually overwhelming any positive drift. The single path goes to zero not just with high probability, but with a kind of inevitability, as the accumulating tail events ratchet the effective growth rate ever more deeply negative.
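The non-stabilizing sample variance is itself easy to see. A sketch using Cauchy draws (an α-stable distribution with α = 1, so the theoretical variance is infinite):

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.standard_cauchy(100_000)  # alpha-stable with alpha = 1: infinite variance

# The running sample variance never converges; each new tail event spikes it upward.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: sample variance so far = {draws[:n].var():,.1f}")
```

However long the sample grows, the next extreme observation can still dwarf everything seen so far, so any σ_empirical estimated from history understates the tail.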
Now, this argument, taken at face value, seems to prove too much. If fat-tailed multiplicative dynamics inevitably drive systems to zero, and if they really describe reality precisely enough, then every organism, every species, every civilization is doomed regardless of anything it does. That conclusion is probably too strong even for this speculative post.
The resolution, I think, is that the real systems that persist do so because they are not undergoing unconstrained multiplicative random walks: either their walks are not really multiplicative, or not really fat-tailed.
So the mathematical argument does not prove that everything dies no matter what. What it shows is something like this: in the absence of sufficiently strong corrective mechanisms, the multiplicative fat-tailed dynamics dominate and the system goes to zero. Survival requires a corrective mechanism strong enough to counteract the ever-growing volatility penalty that fat-tailed shocks impose on single-path dynamics.
Instrumental convergence: the strongest counterargument, and why it probably fails in practice
Ok, but what about instrumental convergence?
We surely know: almost any goal you might have requires you to be alive to pursue it. A sufficiently intelligent agent, regardless of its terminal goals, should converge on self-preservation (and by extension, civilization-preservation, if the agent depends on civilization) as an instrumental subgoal.
I think this argument is correct in principle, and if it works in practice, it defeats the lethal reality hypothesis. If civilizations reliably produce agents (whether individual humans, institutions, or AI systems) that are smart enough to internalize the instrumental value of long-term survival and eventually act on that understanding effectively, then there is a mechanism that counteracts extinctive pressure, and the central thesis of this post is wrong.
However, the key word is “eventually”. The question is not whether this realization eventually occurs but whether it occurs soon enough, with sufficient force, in enough agents, to actually counteract the extinctive pressure before it is too late.
More specifically:
“Sufficiently smart” is a high bar, and we have not cleared it. Instrumental convergence is a theoretical property of sufficiently intelligent agents. Humans are intelligent, but apparently not sufficiently intelligent in the relevant sense. Humans understand, in the abstract, that they need to be alive to pursue their goals, yet in many cases they do not act accordingly, including at the civilizational level.
Theoretical acceptance is not behavioral compliance. Even among agents who understand and accept the instrumental convergence argument, there is a further gap between intellectual acceptance and actual behavior. Understanding that survival is instrumentally necessary does not automatically reconfigure your incentive structure, your discount rate, your competitive environment, or the institutions you operate within. This is, in a sense, just the core mechanism of extinctive pressure restated: even agents who understand the argument are outcompeted by agents who understand it equally well but choose to defect.
The race between understanding and environmental shift. A civilization needs to become smart enough to internalize the instrumental value of survival before it becomes powerful enough to alter its own local environment to a lethal state.
Competitive exclusion of those who understand. Suppose some agents in the civilization do clear the bar. These agents are now competing against equally intelligent agents who also understand the argument but who, for whatever reason, do not act on it. The agents who don’t act on it have more resources available for immediate competition, because they are not paying the survival tax. The agents who understood and acted are marginalized, not because they were wrong, but because being right was expensive and being wrong was free.
So it is not enough to be smart enough eventually. A civilization must be smart enough here and now, which is hard because:
Every emerging civilization is around the dumbest possible civilization
A civilization arises when a species crosses some threshold of cognitive and social capability. Below this threshold, you don’t get civilization at all. Above it, you do. The question is: how far above the threshold should we expect a newly-emerged civilization to be?
The answer, by a straightforward selection argument, is: barely above it. Civilizations emerge as soon as they can, because the components that give rise to civilization (intelligence, social complexity, tool use) are under positive selection pressure for reasons that have nothing to do with building civilizations per se. They are selected because they confer immediate competitive advantage. A species crosses the civilization threshold not because it was aiming for civilization but because the optimization pressures that were pushing intelligence upward for other reasons happened to push it past the threshold. And since it crosses as soon as it can, it crosses with approximately the minimum cognitive endowment required.
This is directly analogous to a point that is familiar in evolutionary biology: organisms tend to be minimally adapted to their niches, not maximally, because evolution is a satisficing process that stops optimizing a trait once it is “good enough” for the current competitive landscape (or more precisely, once the marginal fitness return of further improvement drops below the marginal cost). Civilizations, similarly, emerge at approximately the minimum viable intelligence, because there is no selection pressure that specifically pushes a pre-civilizational species past the minimum, and even if there were, it would operate on timescales far longer than the time it takes a civilization to emerge.[5]
So, we should expect a newly-emerged civilization to be roughly as dumb as a civilization can possibly be while still counting as a civilization.
Intelligence is not the bottleneck
It is true that, all else being equal, greater intelligence leads to better awareness of existential risks and perhaps even greater concern about them.
But “all else being equal” is never the actual situation. What actually happens when the general intelligence level of a civilization increases is that all agents get smarter, including the ones who are optimizing for things other than survival. And the ones optimizing for things other than survival still have a competitive advantage, because they are not paying the survival tax. A world with smarter humans is not a world where smart survival-oriented humans dominate, but a world where smart survival-oriented humans compete against equally smart growth-oriented, power-oriented, and profit-oriented humans, and lose, for exactly the same structural reasons they lose now, just at a higher cognitive level.
The state of affairs where you have high intelligence combined with thorough disregard for existential risk might seem like an unlikely or unnatural configuration. And in a sense it is: if you picked a truly random mind from the space of all possible minds with that intelligence level, it might be unlikely to have this particular pattern of concerns and blind spots. But we are not sampling randomly from mind-space. We are observing the output of an optimization process that specifically selects for competitive fitness, and competitive fitness is improved by intelligence while being unimproved or actively harmed by existential risk concern. The optimization process is, in effect, searching for exactly this “unlikely” configuration.
Caring about survival is not the same as surviving
I do not dispute that many people care about survival, on the psychological level.
However, there are three levels, and only the third matters:
Level 1: Caring about survival in the psychological sense. You feel concerned. You believe existential risk is real and important. You experience anxiety or urgency about it.
Level 2: Executing actions that are intended to be pro-survival. You donate to AI safety organizations. You write papers about existential risk. You lobby for policy changes. You work at an alignment lab. You feel, reasonably, that you are doing something about the problem.
Level 3: Executing actions that actually lead to survival. You are executing actions that causally contribute to the civilization actually not going extinct. This is an absurdly demanding criterion, and that is precisely the point.
The extinctive pressure is a claim about Level 3. It says that the optimization dynamics of civilizations make it extremely unlikely that agents execute Level 3 actions at sufficient scale. It doesn’t imply that agents don’t execute actions on Levels 1 and 2.
The progress dilemma: there are no good options
Option 1: Full speed ahead. This is approximately the status quo. Technological progress continues at whatever rate the competitive optimization landscape produces, which in practice means as fast as possible, because agents who develop technology faster outcompete those who develop it slower. The problem is obvious and already discussed at length.
Option 2: Halt progress. Stop developing new technology entirely. This avoids the problem of self-inflicted environmental destabilization, but it runs directly into the other horn of the dilemma: the natural environment is not static. On cosmic timescales, it implies certain extinction. Also, beware that competitive dynamics push toward resuming progress.
Option 3: Slow, cautious, survival-directed progress. This is the option that seems like it should work, and the reasons it doesn’t are the most instructive. The idea is: progress slowly and carefully, directing research and development specifically toward technologies that enhance civilizational resilience, while carefully managing the risks introduced by each new technology before proceeding to the next one.
The problem is that “slow, cautious, survival-directed progress” is not a natural attractor of any known optimization process. It is simply slow progress. A world that progresses slowly is not, by default, a world that directs its slow progress toward survival. It is a world where the same competitive dynamics operate at a slower rate. The agents who would need to redirect progress toward survival still face the same structural disadvantages.
For this to work, you would need a mechanism that not only slows progress but redirects it, that specifically channels civilizational resources toward survival-relevant capabilities rather than competitive-advantage-relevant capabilities. And this mechanism would need to be self-sustaining: it would need to maintain itself against the continuous pressure of competitive dynamics that reward defection from the survival-oriented program. This is, in effect, asking for a global coordination mechanism that persists indefinitely against strong incentives to defect.
Indirect supporting arguments
There is a variety of classical doomsday arguments in the philosophical literature. I do not build on them, nor are they required for the current line of reasoning. But their consistency with it is worth noting.
The most relevant is the Doomsday Argument proper, which in its simplest form observes: if humanity will eventually number trillions of people spread across millennia or galaxies, then your birth rank (roughly the 100-billionth human ever born) places you extraordinarily early in the species’ history, which is a priori unlikely. You would be in the first 0.001% or less of all humans who will ever live. Under a uniform prior over birth rank, this is much less likely than the alternative: that the total number of humans who will ever live is not astronomically large, which implies that humanity does not have a long or expansive future. The argument is, to a degree, controversial and there are well-known objections (the Self-Sampling Assumption vs. Self-Indication Assumption debate, the reference class problem), and I do not want to relitigate them here. The point is simply that if you take the Doomsday Argument seriously, it is telling you roughly the same thing as extinctive pressure: the far future with billions of years of flourishing human civilization is not the expected outcome.
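The birth-rank arithmetic above can be made concrete with a toy Bayes update. This is a minimal sketch only: the two candidate population totals and the 50/50 prior are illustrative assumptions, not figures from the anthropics literature.

```python
# Toy Bayes update for the simple Doomsday Argument described above.
# "Small future": ~2e11 humans ever live; "large future": ~1e14.
# Both totals and the even prior are illustrative assumptions.

RANK = 1e11  # roughly the 100-billionth human, as stated in the text

def posterior_small_future(n_small=2e11, n_large=1e14, prior_small=0.5):
    """P(small future | our birth rank), treating rank as uniform
    within each hypothesis, so the likelihood of any rank is 1/N."""
    like_small = 1.0 / n_small if RANK <= n_small else 0.0
    like_large = 1.0 / n_large if RANK <= n_large else 0.0
    numerator = prior_small * like_small
    return numerator / (numerator + (1 - prior_small) * like_large)

print(round(posterior_small_future(), 3))  # 0.998
```

Under these toy numbers the update toward the small-future hypothesis is overwhelming, which is exactly the (controversial) force of the argument; the objections mentioned above attack the uniform-sampling step, not the arithmetic.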
The Great Filter argument is perhaps even more directly relevant. For all I know, the Great Filter may very well be real, and the growing biological evidence, in my view, shifts the estimate of its location from the past toward the future. At the same time, I acknowledge that ASI can’t really be the Great Filter, because in that case the Universe around us would already have been conquered by some ASI. So there is a real tension here, in my view.[6]
In any case, the extinctive pressure hypothesis offers a specific mechanism for a filter that is ahead.
The observation selection effects literature, generally, supplies the tools for reasoning carefully about what we should expect to observe given that we are observers, and many of the conclusions in that literature point in the same direction. We should not be surprised that we find ourselves in a universe that permits observers, but we should also not take the fact that we are here as strong evidence that the future is likely safe.
And of course, there is a lot written on technogenic risks. Every major technology is, from the extinctive pressure perspective, a step outside the local environment into an untested region of parameter space. On the object level, obvious examples include ASI, nuclear war, global warming, bioweapons, and grey goo.
Summary: optimization-wise, you are punished for caring about survival
Agents who divert resources from competition toward survival pay an immediate cost and receive no commensurate competitive benefit, which means they are systematically outperformed by agents who do not pay this cost.
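This selection dynamic can be illustrated with a minimal two-strategy growth model. The growth rate and the size of the "survival tax" below are arbitrary illustrative assumptions; the point is only the direction of the drift.

```python
# Minimal sketch of the "survival tax": two strategies compound over
# generations; the cautious one grows slightly slower because it
# diverts resources toward survival. All numbers are arbitrary.

def cautious_share(generations, r=1.10, tax=0.02, start=0.5):
    """Population share of the cautious strategy after `generations`
    rounds of compounding growth, absent any disasters."""
    cautious = start * (r - tax) ** generations
    reckless = (1 - start) * r ** generations
    return cautious / (cautious + reckless)

print(cautious_share(0))              # 0.5: equal footing at the start
print(round(cautious_share(100), 2))  # 0.14: the cautious share collapses
```

Absent disasters frequent and survivable enough to cull the reckless strategy, even a small per-round tax compounds into near-total displacement, which is the restoring force described above.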
This is what I mean by calling it a “pressure”. When a system tries to fight extinction, there is a restoring force that pushes the system back toward the state of not fighting extinction, unless large resources are applied in the opposite direction.
The sailors who turn and run with the wind will always move faster, unless their competitors apply an additional, stronger source of power.
Which is not to say you shouldn’t care about survival.
- ^
This may read like an argument about preferences or utility functions over time, but it is not; let me be clear on that. I make this note because that sentence may create the wrong impression that the problem lies in time discounting. It does not, as the next passage makes clear.
- ^
This is very much in line with the Fragile World Hypothesis. See it for further discussion of this topic.
- ^
This line of reasoning, recurring throughout the entire post, relates closely to Inadequate Equilibria. I think it is correct to formulate what I write here in terms of the inadequate equilibria framework. It is just that “adequate equilibria” are very rare and are optimized against.
- ^
This probably sounds too technical, and not something I, as the author, should expect to be smoothly understood by the audience. For clarity, I suggest either asking AIs about alpha-stable processes, consulting my book on the subject, or asking AIs to summarize the results from my book on the empirics of alpha-stable processes.
- ^
Strictly speaking, there are plausible scenarios where the same pressures that produced civilization-level intelligence continue operating after the threshold is crossed (sexual selection for intelligence, for instance, doesn’t stop just because you’ve invented agriculture). The point, however, still holds in a weaker form: we should expect to be near the minimum (especially in the short timescales), not necessarily at it, and “near the minimum” is sufficient for everything that follows.
- ^
Obviously, I think ASI is powerful enough to escape extinctive pressure. So the absence of ASI in this region of the Universe looks like an argument for the Great Filter being behind us, but I am not sure. In any case, it is a separate big topic.
Civilization consists of some mixture of local myopic competition, and global organized agency.
Maximally local competition is bacteria. Each individual cell fighting for resources. And no cell can do anything else, or it will be outcompeted.
Maximally global organization is a singleton. An ASI or world government that can easily squash any subsystem that gets out of line.
The world is neither extreme, and fragments of both patterns can be found.
Look at the global moratorium on CFCs. That’s one of those large-scale, long-term global problems with a not-that-expensive solution. And it was met with a large-scale coordinated response.
Except, industrial civilization. The set of universes in which an industrial high-tech civilization can propagate itself seems to be larger than the set of universes in which hominids can evolve. The set of universes in which self-replicating nanotech can spread through the lightcone is FAR larger than the set of universes where the hunter-gatherers have access to the right kind of flint and tasty large herbivores.
Also, the idea that there are “vastly more ways to barely clear the bar than to be friendly” sounds like a complicated and nontrivial assumption. It might be true, but there is no obvious-to-me reason why it MUST be true.
I am not convinced by fine tuning. I don’t think we know enough physics to know whether or not nuclear-physics life exists on the surface of neutron stars in our universe. Let alone in some other universe. I expect that most of the universes with different constants have a complexity similar to that of this universe.
As a property of high dimensional spaces, Yes. However, being at the edge of that sliver doesn’t automatically mean we are “barely surviving” and risk extinction. We could be in a situation where, if the strong force was just marginally weaker, stars couldn’t exist. (And suppose for a moment that life without stars is impossible) That puts us near the boundaries of survivable-parameter space, but it doesn’t mean we are risking extinction.
Also, I’m not actually convinced that the viable region is a tiny sliver.
The thing is, mosquitos are also a complex system. Applying this logic, it should be really really easy to wipe them all out.
Ability for humans to self replicate and rebuild. Active adaptions, both evolved and intelligent. Decisions and actions, both individual and coordinated.
Every human starving to death because we just randomly decide we don’t want to eat, despite having food available. This is, in a sense, an existential threat. Unless a large fraction of humanity does some fairly specific actions of unwrapping, cooking and eating food, humanity goes extinct. But this isn’t on your list of x-risks, because this is an example where the feedback loop is tight.
Again, you seem to be assuming a world of perfect competition and zero foresight and planning. Also, the groups that prepare against pandemics get fewer pandemics.
So long as there is at least one source of disasters like that, this selects against anything too myopically competitive.
Granted. A high competence singleton is also somewhat of an absorbing state.
And a civilization that has maxed out the tech tree will probably have a low ongoing risk of extinction.
I don’t think that’s true. If x-risk reduction gets a constant fraction of resources, a richer civilization has more resources to throw at the problem.
You are taking a situation where 2 utility functions are mostly uncorrelated, and using “resources” to claim that the game is 0 sum. Uncorrelated != 0 sum. 2 agents with uncorrelated utility functions might find a way to achieve near maximum on both functions.
Generally you keep assuming near perfect competition, but also everyone has an end-the-universe button. This is quite an odd thing to assume. A world of perfect unrestricted military competition is one where various sides routinely throw nukes and bioweapons at each other. In this world, everyone has nuclear bunkers and bioweapon defenses.
It’s possible there is something that can destroy the whole world, but not be targeted to only destroy your enemies, but that’s a rather specific kind of thing.
It is quite possible that the techs necessary to come up with a solution to deal with the threat could end up having substantial spinoff applications, thus paying for itself quite comfortably.
Otherwise the society in question would have to be, among other things, a post-scarcity one where capitalism has been transcended and money no longer exists.
Thanks! That is a really sound, optimistic take, and I think there is real hope that things are more like you described than how they are described in the original post. So, almost everything you wrote goes, for me, into the category “can easily be correct in the sense that this is how things actually play out in our Universe”.
A couple of arguments fall outside this category and constitute actual disagreements:
As I wrote, “the feedback loops are, generally, not tight at all”. So the important word is generally—generally, existential threats don’t have this property. You gave an example of a threat which does have this property, but my point is that we don’t need all threats to lack tight loops for extinction to happen; a few are enough.
To be clear, I don’t think I am assuming the second thing, but now that you have said it explicitly, it does indeed look highly likely that as civilizations grow technologically, more and more agents have an end-the-universe button.
I may be wrong here, but “2 agents with uncorrelated utility functions might find a way to achieve near maximum on both functions” sounds very unlikely to me. I mean, even if they might find a way, why would they?
Well, the entire point of the post is that “x-risk reduction gets a constant fraction of resources” is unlikely. Now, I think you argued successfully elsewhere in your reply that this may not be the case, but here, if we accept the premise, then this particular argument should be correct.
It might be interesting to spend some time trying to construct systems that wouldn’t kill themselves because of the dynamics you describe, or that would kill themselves only extremely slowly (maybe it’s good to think of this in terms of how much stuff you can get done or how much tech development you can do without killing yourself, so that we don’t consider just slowing down the pace of everything uniformly a win). [1] In particular, I think it is interesting to consider configurations of the following form:
Some options of what to have for the initial “living” core:
an individual human
a small group including you and your most careful + [philosophically competent] + [generally competent] friends
current humanity but with everyone below 160 iq lifted to 160 iq
Some options of what artifacts to grant them:
all the usual technologies available in 2026, and textbooks explaining how to make these in detail
cures to like the 10000 most important diseases and kinds of aging, and understanding and methods for creating new cures if necessary
a device that lets you save a state of a human and then construct another copy with that state later (e.g. the individual human could save a copy at age 20 and have a policy of constructing a new young clone each time the previous one gets old)
lots of various resources in easily usable form
Some options for “semi-living” components to set up:
various choices of initial governance conventions: some sort of democracy, maintaining and enforcing some sort of system of principles
various choices of initial epistemic infrastructure: a forum, a prediction market system, educational institutions and practices
some convention for resolving disputes
some attempt to tie actions to the best available thinking more. implementing some stronger conservatism around technologies and other major actions maybe
some practices that are supposed to help with not going crazy and not getting depressed and not becoming a radical negative utilitarian and other mental health stuff
some mechanisms fighting against the system being subverted
My guess is that we can identify an initial configuration in this space such that the system probably doesn’t kill itself for a long time (like let’s say for at least a thousand years’ worth of getting stuff done or technological development at the 2025 pace [2] ).
Also see Yudkowsky’s world.
like for now i mean: “construct” conceptually, like “construct” in the mathematician’s sense, not in practice. though constructing such a system in practice is of course also very interesting and important
note that this is a decent amount of development/[doing stuff] despite corresponding to only 1000 years — plausibly more than the sum total of development/[doing stuff] in our galaxy’s history so far, given how much things have sped up
Thanks! These are interesting thoughts to think about indeed, and I will do that!
My two counterarguments go like this:
It proves too much. You talk about this in one sense. But what I specifically mean is: the anthropic principle tells us that we should expect ourselves to be alive. But there are other things than ourselves. Like other animals, other nations, other organizations, other people. And many of these have existed for millions of years, or currently exist together with us. The anthropic principle doesn’t rescue those entities, but they still persist.
You don’t really talk about nested structures of cooperation and competition incentivizing structures to be longer-lived and avoid going extinct. Like within a tribe, people could be fighting for their own short-term gains over the survival of the whole, but tribes where people do that go extinct. Same with species. And nations, probably. So it’s a problem we’ve faced before, and there is a strong incentive to avoid the extinction thing.
You kind of talk about it here:
> “Now, there is one scenario in which survival-oriented behavior could be reinforced: if existential disasters happen frequently enough and are mild enough that agents who prepare for them consistently outperform agents who don’t.”
But this makes it seem like this is a rare and unusual thing. It seems to me more like an omnipresent force. Bodies are destroyed, get sick. Nations die out. Species go extinct, etc.
I mean, this is true, but it doesn’t sound to me like something contradictory to the post? Except maybe the small remark that we may be alive for anthropic reasons, but that is quite secondary.
Agreed! Hopefully the forces you described prevail. Also very much in line with what Donald Hobson wrote in his comment.
The first one did seem pretty central to me.
Like the first thing the Lethal Reality hypothesis has to explain is “why are we alive at all”. And the argument given is anthropics.
I think the reason why “we figured out how to survive in tribes” doesn’t hold is that with climbing the tech tree you distribute actuators to individual agents that are complex enough to have much longer feedback loops. Acting uncooperatively in a tribe is trivially observable as bad through short feedback loops. Releasing a bioagent to hurt a rival nation or a larger interest group is not trivially traceable, and there are fewer historical analogues to extrapolate from.
Your point on an omnipresent force that selects for survival preparedness holds for things that have analogues in near-miss scenarios—e.g. pandemic preparedness. But I think the author’s central thesis is that near misses are the only thing driving survival preparedness. In fact, if we zoom in on AI, I think the best chance we have civilisationally is that we increase AI ability sufficiently to have a near-miss disaster of sufficient scale happen quickly enough to incentivise strong survival preparedness well ahead of ASI. This is what I see as the natural consequence of the slowdown: buy yourself time to experience a near miss at the social-dynamics feedback-loop level before you make the next ability jump. But the game is not progressing with that feedback loop in mind.
This is a good write up of an interesting, if pessimistic argument. I’m not sold that this happens on a timescale that falls within ordinary human planning timelines of a century or two, but I’m not totally convinced that it doesn’t, either.
I’ve actually seen a somewhat different argument about the dangers of optimization. This was made by Vernor Vinge in his fantastic novel, A Deepness in the Sky.
The central idea was that optimization gave you more resources to use, but that sufficient optimization also destroyed “slack”, your margin for dealing with emergencies. For example, a highly optimized “just in time” manufacturing system is more profitable than idle warehouses full of inventory. But if things went wrong, you had very little buffer to draw upon. And over generational time, if nothing else killed you, there was a temptation to optimize right up to the limits of your environment. This would mean that even small environmental shifts might cause cascading failures.
I’m not sure if this poses a true risk of civilizational collapse and large-scale disaster. But it has made me appreciate the idea of slack and redundancy in systems. I recall that Netflix, for example, used to run across 3 AWS regions when it only needed 2 of them. Which meant they could lose a region and keep operating.
And this is certainly a risk that ops people know: Running too close to 100% capacity for too long means that failures tend to cascade rapidly and dramatically.
I think this dynamic is true at different scales, not just humanity’s overall civilization.
The fundamental problem is that everyone’s locked in a prisoner’s dilemma with Darwinian evolution tacked on top, so that those who win one round get to duplicate and gain an advantage in the next round, so that everyone has to constantly defect to gain power. (This applies even to actors who want to optimize for cooperation in the world—their best strategy is ruthlessly gaining power first to gain the ability to use coercive strategies that force other people to cooperate! Note: in the real world these actors may have a way to avoid “defecting”, or doing non-cooperative actions, if they can creatively find a way around most competition and thus avoid the prisoner’s dilemma entirely.)
As a result you can see examples throughout history of organizations and societies failing to “stop Cthulhu” (as defined by this article) in various different ways. You can replace Cthulhu with climate change, Hitler, or the long-term innovation of companies that are required to stay relevant in a changing market 20 years down the line. People never start cooperating until they realize it’s almost too late to stop the issue, and thus that banding together and focusing efforts to solve the issue becomes the only possible good strategy for all actors. And sometimes they realize too late and the society or company never recovers. (On a global civilizational level, this kind of last-minute recovery gets harder and harder as technology improves. How WW2 played out would be almost certainly impossible to replicate if something like it were to happen again. But perhaps that would just push the “last-minute” threshold earlier instead so the world keeps being saved just before the problem gets out of control.)
The only way around these dynamics is for a leader or regulator to punish defectors and force everyone to cooperate towards solving the long-term problem, who is themselves benevolent (wants to solve said long-term problem) and beyond the ability of anyone else to compete against. Examples: visionary founders, shareholder-proof leaders of PBCs, national governments (relative to companies), multinational organizations like the EU and UN (to the extent they actually do something), and the US-led post-war world order more generally.
Otherwise the short-term optimizers will always bubble to the top and doom the society or organization as a whole.
The author says as much:
At corporate-sized organizational scales, there are obviously feasible solutions to the problem. In the civilization case, it is almost impossible without domination by a single leader, whether human or AI.
(I would say that this line of thinking, if extrapolated to a societal and global level, leads to some very troubling implications on what the best path of future civilization looks like.)
You provided many real-world illustrations of the idea, which are indeed relevant. I did not focus in the post itself on examples or on how the ideas manifest in history or economics, but there are many local examples.
Actually, one particular case that is very close to me as a Ukrainian is the Russo-Ukrainian war. I have been constantly thinking about this case while working on the idea. Ukraine, fortunately, survived, but at a great cost, and it would not have survived at its level of preparedness if Russia had not been so extremely incompetent. Pro-survival agents were largely outcompeted in Ukraine by survival-indifferent agents, and a competent adversary motivated to exterminate Ukraine would have succeeded rather easily. Later, because the war persisted for a long time and provided a tight feedback loop, society reorganized itself around being pro-survival, but that happened only because the initial Russian invasion failed. The pressure from the existential threat was slow enough to allow adjustment, and in a way it was a great stroke of luck that it was slow enough.
This is a great reframing. The concept of humanity likely being “minimally fit” for its niche is one I need to reflect more on.
“A civilization needs to become smart enough to internalize the instrumental value of survival before it becomes powerful enough to alter its own local environment to a lethal state”—this reminds me of Nick Land’s deterritorialisation argument.
won’t a society that reasons this way get “outcompeted” by one that makes better decisions, in the sense that the former society ends up eaten by fish people (or whatever the fate)?
as long as there is an era where the threats are local, selection should have enough feedback to teach the lesson.
There aren’t enough different societies for selection to have much effect at this level. This is the evolutionary theory of group selection, which I think seems to be mostly false.
Partial disagree. There are absolutely intrinsically high-trust and intrinsically low-trust societies, and we have seen this WRT global issues like the environment. In some places, things like littering and dumping in rivers is “just what’s done”, and in others it is “just not done”, despite every nation having access to the same information about how bad it is to pollute the water supply. Group selection kind-of-works for humans because human groups can police their own very effectively over many generations. Most high trust societies today have a long history of executing a decent share of the population for crime and dishonorable behavior every generation.
That said, AI is low-salience for most people, and I think a substantial share of the people that care believe that descriptions of a threat are overblown. Among the remainder, you generally see programmers, engineers, politicians, and military planners rather than ordinary people, and those groups are much more inclined towards logical game-theoretic arguments than moral ones, even if the rest of the population leans the other way, simply because they either start their problem-solving process by mathing it out (programmers, engineers) or because they got where they are by being pragmatists (politicians, military planners).
this has nothing to do with group-level gene selection: the learnings can be entirely cultural. i’m not arguing that we are genetically predisposed to consider tail risks, rather that existing societies have faced some pressure to [create cultural machinery that effectively aligns their constituents to] care about tail risks.
i don’t expect that many societies would be needed for this, as horizontal meme transfer is easy, and cultures can learn post-mortem from their missing neighbors. see for example the sentinelese, as an existence proof.
I see. I’d frame this more as individuals learning from past societal failures. But to what extent is this happening? The warning of Easter Island seems largely unheeded. Some individuals learn. Whether it’s enough is highly questionable. I don’t see societies putting much effort into this; I can’t even think of a single class on “why civilizations collapse”. Books exist, but that’s not much of a societal-level effort.
But I see what you mean and I agree. Individuals will see examples of this happening and be able to learn from them.
I think this is true in so far as there is selection pressure, in that such events are survivable, and in that they don’t require unified coordination of all agents to survive.
The Cthulhu example isn’t great, if only because the nature of the threat is pretty vague to most readers (at least to me).
A better example would be: a medieval society gets hit by a meteorite. Does this cause selection pressure for medieval societies to build meteorite-proof castles? Not if it just kills everyone.
Alternatively: an early-industrial-era society that notices an approaching comet might be able to coordinate to invent a redirect rocket, or nukes, or whatever, to save their society. But there is still no selection pressure, since the two possible results are that everyone survives or everyone dies. If anything, competitive pressure will punish anyone who spends resources on saving the world, since those resources benefit competitors who spend zero resources on the asteroid-redirect mission just as much as the ones who spent half their GDP to survive. Unless social pressure or something similar can effectively reward the heroic resource-sacrificing nations, they will be putting themselves at a huge disadvantage, and if GDP correlates with representation over time, you would expect the selfish nations to actually be the ones that are selected for.
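The free-rider structure of this comet scenario can be written down as a toy payoff function; the cost and impact figures below are arbitrary illustrative assumptions, not anything from the thread.

```python
# Hedged sketch of the comet free-rider game described above: the
# deflection succeeds if at least one nation funds it, but only the
# funders pay. COST and IMPACT are arbitrary illustrative numbers.

COST, IMPACT = 0.5, 10.0  # half of GDP vs. losing everything

def payoff(funds: bool, others_fund: bool) -> float:
    """Relative outcome for one nation given everyone's choices."""
    if not (funds or others_fund):
        return -IMPACT               # nobody acts: everyone loses big
    return -COST if funds else 0.0   # comet deflected; only funders pay

print(payoff(funds=False, others_fund=True))   # 0.0: free-riding wins
print(payoff(funds=True,  others_fund=True))   # -0.5
print(payoff(funds=False, others_fund=False))  # -10.0: mutual disaster
```

Whenever someone else is expected to pay, free-riding strictly dominates, which is exactly the selection against world-savers described in this comment.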
I get that you are arguing that a society which is this bad at reasoning in general should be outcompeted by a society that is better at reasoning, but we should expect both societies to be outcompeted by one that is capable of reasoning well when it is competitively valuable and of ignoring such reasoning when it is not. I think this might be a good description of the current United States, for example, which is great at listening to academics when it is profitable and ignoring them when it is inconvenient for business interests.
this is “chicken” in the theory, right? whoever swerves first (fends off the meteor) loses (pays the cost), but if nobody swerves, then everybody loses big (suffers the impact).
i agree that the decisions here are more complex than “always immediately fund the antimeteor kickstarter” or “always freeride”. both societies should lose to ones that are better at skills like coalition building, etc.
Ya, I agree this should be true in principle; I think given more time, there might be the opportunity for some sort of “Dath Ilan”-lite society to rise to the top.
I would add two points to refine your reasoning that intuitively seem to hold uncontroversially:
You walk back from the notion that everything dies and that no species can do anything to progress survival (paraphrase, apologies for reduction) - yet this is the exact thesis of Heat Death, and I would argue the natural consequence of a universe that is entropy-maximising. What are examples of naturally occurring complex systems that have achieved homeostasis that is stable with a high degree of certainty? Evolution is self-correcting, or was, until it yielded us. Whether it persists is up in the air. In fact, I would add another step to your reasoning to actually reinforce the conclusion that all species destroy themselves in the long term—as increasing the actuators of an organism (which technology expands) increases the variance of actions, and the more complex systems are in the action space, the more likely they are to hit an extinguishing state. I get that this is an uncomfortable conclusion, but I do not see how it doesn’t stem from your premises. In other words - on the basis of what is written, I think you walk back not because your reasoning doesn’t support this, but because there is an instinct (or bias) that walks you back.
You argue that instrumental convergence means survival actions would be selected for—but you fail to acknowledge that civilisational survival trades off against individual survival (at least in ASI x-risk settings). In fact, it is precisely because of instrumental convergence that you would expect a rational behaviour that chases progress at all costs to exist—because the human rational circuitry optimises for preservation of the brain instance to remain competitive, rather than preservation of the DNA sequences that yielded the brain instance (the evolutionary pressure), for which civilisational survival would be sufficient.
Small typo
Thanks, corrected.
A somewhat orthogonal hypothesis that I was thinking about for some time: if we develop a rigorous definition of intelligence (sounds plausible), it may be possible to prove mathematically that it is unstable. Or maybe to prove that it is stable. And I don’t mean just humans going extinct, but any possible intelligence, including ASI.
In other words, even in a maximally friendly universe p(doom) is exactly 1 (or 0, for the opposite result).
The trick is, of course, to create the math necessary to describe the system, beginning with definitions.