Unprecedented Catastrophes Have Non-Canonical Probabilities

The chance of a bridge failing, of an asteroid striking the Earth, of your child getting into Harvard for a special reason only you know, and of AI killing everyone can all be expressed as probabilities, but they are not all the same type of probability. There is a structural difference between the “probability” of a bridge collapsing and saying my P(doom) is 15%, or my P(utopia) is 2%.

I know what you are probably thinking: “we’ve heard this before... you are going to go down some rabbit hole about how P(doom) is tribally divisive, or that it distracts from gears-level models, or maybe another Popperian argument that it is philosophically meaningless to assign probabilities to unprecedented events.”

Don’t worry! This post is none of those things. (If you want to double-check, feel free to skip down to the Why This is Different section.)

And to be clear at the outset: I am not anti-Bayesian, and I definitely think quantifying uncertainty is generally a good idea. However, I am going to argue that probability estimates for unprecedented catastrophes are non-canonical[1], and I am going to do so within the confines of Bayesian machinery itself, using likelihood ratios, posterior convergence, and algorithmic information theory.

First, what do I mean by a canonical probability? A canonical probability is stable across reasonable scientific frameworks given the available evidence. For example, the probability of an extinction-level asteroid impact is canonical: mechanistic models of orbital dynamics are continuously updated by routine observations that bear directly on the impact parameter, and this forces all reasonable scientific frameworks to converge to a similarly narrow estimate. Note that this probability is canonical even though we have never seen such an asteroid hit Earth (though we obviously have evidence of past impacts).

However, genuinely unprecedented catastrophic risks are non-canonical because the available non-catastrophic data structurally refuses to wash out your priors. Without past events to sharply discriminate between competing models, the probability remains only partially identified, and your resulting estimate is a measurement of your chosen ontology rather than a measurement of the world.[2]

The illusion of precise catastrophic probabilities can be traced to two distinct mathematical failures: one in how we process evidence, and the other in how we define the event itself. Below, I will break down these two grounds of failure, using Nick Bostrom’s recent paper “Optimal Timing for Superintelligence” and Eliezer Yudkowsky’s asteroid rebuttal parable as examples.[3] Finally, I’ll offer a concrete suggestion of how to escape them and restore empirical rigor.

Data Doesn’t Resolve Ontology

Before showing where probabilities of unprecedented catastrophes break, let’s briefly return to the asteroid impact example to show when they do not break. The key idea with an asteroid is that we can gather non-event data that directly updates the impact parameter, washing out whatever prior we started with. Econometrics has a great way of expressing this concept: it is called being point-identified.[4] The data has forced the parameter to a specific, narrow point.
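
To make point identification concrete, here is a hedged toy sketch (the observation model and all numbers are invented; this is not real orbital mechanics): because each routine tracking observation is far more likely under one model than the other, the likelihood ratios compound, and analysts who start from wildly different priors are forced to the same narrow answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two candidate models of the "impact parameter": under the impact model an
# observation is usually "trajectory-consistent-with-impact"; under the miss
# model it usually is not.  (Illustrative numbers only.)
p_obs_given_impact = 0.9
p_obs_given_miss = 0.1

# Simulate 50 tracking observations from a world where the asteroid misses.
observations = rng.random(50) < p_obs_given_miss

def posterior_impact(prior, obs):
    """Sequential Bayesian update of P(impact) from binary observations."""
    post = prior
    for o in obs:
        like_impact = p_obs_given_impact if o else 1 - p_obs_given_impact
        like_miss = p_obs_given_miss if o else 1 - p_obs_given_miss
        post = post * like_impact / (post * like_impact + (1 - post) * like_miss)
    return post

# Analysts with wildly different priors are forced to the same answer:
for prior in (0.99, 0.50, 0.01):
    print(f"prior {prior:.2f} -> posterior {posterior_impact(prior, observations):.4f}")
```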

Now let’s turn to catastrophic probabilities that do break. The debate between Bostrom and Yudkowsky is a great example of genuinely different ontological starting points, though I don’t mean to make overarching claims about their complete intellectual positions: their actual views draw on multiple frameworks and are not formal description languages themselves.

Bostrom recently wrote a paper that framed building AGI as a “risky surgery for a condition that will otherwise prove fatal”. The crux of his argument is that if we want to maximize QALYs, high probabilities of catastrophe from AGI are worth accepting, because everyone alive today will die anyway if we do nothing. Within his framing, the surgery model is structurally native and simple to state.[5]

In response, Yudkowsky wrote a mocking parable on X called the “Asteroid of Immortality”. In Yudkowsky’s framework, AGI is like an incoming asteroid, and Bostrom’s reasoning about eternal life amounts to the logic of a cult rather than a surgeon. In Yud’s alignment-centric framework, “doom via misaligned optimization” is the natively simple, default prediction.

To understand where the probabilities attached to these views come from, we first need to formalize what a framework is. Think of a framework as the programming language of a worldview: it determines which explanations are naturally simple to express, and which are unnatural and complex. Imagine two ideal rational agents using two different Universal Turing Machines as their formal description languages (their “programming languages”). One machine natively encodes Yudkowsky’s alignment concepts; the other natively encodes Bostrom’s surgery-framing concepts.

There is a fundamental problem, however: the data we are gathering right now cannot help us choose which of these frameworks is likely to be right. This is quite counterintuitive, because when we do Bayesian updating we expect new data to wash out our subjective priors. But unprecedented catastrophes run into what I call the Evidential Screening Property. Another year without world-ending events gives us more data, but that data fits equally well into a model that predicts it’s all over and a model that predicts it’s all ok. Because both models perfectly predict the prefix of history (the track record of AI development leading up to the current moment), the likelihood ratio between them is approximately 1. The catastrophe parameter is entirely screened off from the evidence, leaving our prior beliefs untouched.

To put this in more precise terms, we are not getting more useful data over time, just more noise that fits both theories perfectly. Formally, the log-likelihood ratio is bounded by a tiny number of bits that grows excruciatingly slowly, if at all. Under Bayes’ rule, your posterior odds are your prior odds multiplied by the likelihood ratio, so if the likelihood ratio is structurally pinned near 1, the prior odds completely dominate the posterior.
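
To see the mechanics in miniature, here is a hedged toy calculation (the per-year probabilities are illustrative assumptions, not estimates): if both models assign nearly the same probability to another uneventful year, twenty such years contribute well under a bit of evidence, and the posterior stays glued to the prior.

```python
import numpy as np

# Both the "doom" and the "safe" model assign almost the same probability to
# yet another uneventful year, because a doom model need not predict doom
# *this* year.  (Both numbers are assumptions for illustration.)
p_quiet_year_doom = 0.98
p_quiet_year_safe = 0.99
years_observed = 20

log_lr_bits = years_observed * np.log2(p_quiet_year_doom / p_quiet_year_safe)
print(f"total evidence after {years_observed} quiet years: {log_lr_bits:.2f} bits")

for prior_doom in (0.02, 0.15, 0.90):
    prior_odds = prior_doom / (1 - prior_doom)
    post_odds = prior_odds * 2 ** log_lr_bits
    post = post_odds / (1 + post_odds)
    print(f"prior {prior_doom:.2f} -> posterior {post:.2f}")
# The posterior barely moves from the prior: the likelihood ratio is pinned
# near 1, so the prior odds dominate.
```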

This takes us back to econometrics and a concept called Partial Identification. I briefly mentioned point identification at the start of this section: data gathered about something like an extinction-class asteroid forces a sharp, single risk estimate. In contrast, the non-catastrophic prefix of AI history structurally fails to do this; the data only constrains the parameter to a hugely ambiguous region. This is what Manski calls being partially identified: that hugely ambiguous region is a prior-free identification region, and the choice of description language, with its induced algorithmic prior, is just an arbitrary coordinate selected in the void.

The Complexity Inversion and the Failure of Model Averaging

If you are familiar with Bayesian math, as many of you probably are, you may now be thinking something like this:

Ok, the log-odds between Bostrom and Yud’s specific and extreme models might shift wildly based on your ontology, but with Bayes we average over the mixture of all possible models, not just one. When we have a rich hypothesis space, a large, diffuse mass of moderate middle models will anchor the probability and damp the extreme swings. In other words, something like the aggregate P(doom) will remain stable.

This is a smart defense, but I think it fails for the models in use today under a specific condition I will call the Complexity Inversion: switching frameworks flips which types of futures are considered simple and which are considered complex.

Recall that earlier I likened frameworks to programming languages. In algorithmic information theory (“AIT”), Kolmogorov complexity is the rule that penalizes a theory based on how many lines of code it takes to express it: the longer the code, the lower the prior probability. When we switch frameworks, like switching between Yud’s and Bostrom’s, we aren’t just changing weights or adding noise; we are swapping which entire clusters of models (high-risk versus low-risk) get taxed by this complexity penalty.

Perhaps under Yudkowsky’s machine, a doom scenario routed through deceptive alignment is concise, and an entire cluster of high-risk models is algorithmically simple. Under Bostrom’s machine, in contrast, expressing mesa-optimization might require a massive, clunky translation, while a QALY-maximizing “institutional equilibrium” cluster is inherently much easier to state.

In AIT, the Invariance Theorem tells us that translating concepts between two Turing machines incurs a fixed overhead cost, measured in bits. Because in our example these frameworks use genuinely different ontological primitives, the translation cost is massive. When Bayes computes something like the total aggregate probability of doom, it is not taking a flat average; the clusters are weighted through a sigmoid of the log-odds. The sigmoid acts less like a balancing scale and much more like a tipping point: if the algorithmic advantage of one framework is large enough to beat the near-useless noise of the daily data, the curve snaps hard to that side, dragging the entire middle mass of moderate hypotheses along with it.
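
Here is a two-cluster cartoon of that tipping-point behaviour (a hedged sketch: the cluster probabilities, the 0.3 bits of usable evidence, and the `aggregate_p_doom` helper are all illustrative assumptions, not anything derived from real models). The posterior weight between the clusters is a sigmoid of the net advantage in bits, so a modest complexity advantage in either direction decides the aggregate.

```python
import numpy as np

def aggregate_p_doom(advantage_bits, p_high=0.90, p_low=0.05, evidence_bits=0.3):
    """Two-cluster cartoon of the Bayesian mixture.

    advantage_bits : how many fewer bits the high-risk cluster costs to
                     describe under the chosen framework (negative means the
                     low-risk cluster is the simpler one).
    evidence_bits  : how much the screened data can discriminate (near zero).
    The posterior weight of the high-risk cluster is a sigmoid of the net
    advantage measured in bits (log-odds, base 2).
    """
    net_bits = advantage_bits + evidence_bits
    w_high = 1.0 / (1.0 + 2.0 ** (-net_bits))
    return w_high * p_high + (1.0 - w_high) * p_low

for bits in (-20, -5, 0, 5, 20):
    print(f"complexity advantage {bits:+d} bits -> aggregate P(doom) = "
          f"{aggregate_p_doom(bits):.3f}")
# A modest complexity advantage in either direction snaps the aggregate toward
# one cluster's estimate; the screened evidence (0.3 bits) never competes.
```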

So consider the probability of doom within the high-risk cluster and the probability within the low-risk cluster. By the law of total probability, the cross-framework gap in the aggregate estimate is bounded below by an expression in those two cluster probabilities and the small residual mass outside the core model set (see the formal appendix for the full derivation).

The interesting thing about this bound is that as the translation advantage grows, the correction terms vanish and the difference in aggregate P(doom) approaches the full gap between the two cluster probabilities, which is the entire width of the probability space under dispute!

I should briefly pause again here: I love postmodern fiction, but I am not a postmodernist or a nihilist. I don’t mean to imply that math or data are fake or anything like that (which you might accidentally conclude if you read the implication as “just assign a random P(doom) based on your feelings, nothing matters”). I’m definitely not saying this; the divergence is mathematically capped by the translation cost between frameworks, via the Invariance Theorem.

The point isn’t that P(doom) can be anything; it is that P(doom) is underdetermined within a specific range, proportional to how different the competing scientific ontologies really are.

Differential Screening: The Random Walk of Doom

Let’s give room to the skeptic again: “Ok, I see the points above, but eventually, as clear and unambiguous evidence accumulates, the data will force convergence, right?”

Unfortunately not. This assumes a monotone accumulation of evidence: that every new datapoint pushes the probability in the exact same direction. But when frameworks carve up the hypothesis space along different causal joints, the same piece of evidence can cause them to update in opposite directions. I call this breakdown of agreement about what an observation means Differential Screening.

Returning to Yud and Bostrom (again, these are just hypothetical examples; I am not saying I speak for them and I don’t know all of their positions), say an AI company deploys a new model that happens to find a ton of zero-day exploits.

  • Under Yudkowsky’s machine (the alignment framework): this event is negative. It is read as evidence for a causal model in which the system’s capabilities are demonstrably outpacing our ability to control them. The Bayesian update is gap-widening and risk goes up.

  • Under Bostrom’s machine (the surgery-framing framework): the exact same event reads differently, say as a causal model in which this is the warning shot that wakes up government regulation and boosts safety ahead of AGI. The update is gap-widening in the opposite direction from Yud’s, and risk goes down.

Over time, the log-odds gap between these two frameworks does not steadily shrink, because a substantial fraction of new observations trigger updates in opposite directions. The cumulative differential accumulates mixed-sign increments rather than declining monotonically. The trajectory of our collective belief doesn’t converge to an asymptote; instead it behaves like a bounded-increment random walk.[6]

As a result, the expected time to canonical agreement scales non-linearly with the fraction of disputed evidence. It pushes the convergence timescale far, far beyond any policy-relevant horizon.
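
A small simulation makes the scaling visible (a hedged sketch: the starting gap, step size, and disputed-evidence fractions are arbitrary assumptions). When only a small fraction of observations are disputed the gap closes quickly, but as that fraction approaches one half the walk loses its drift and the time to consensus explodes; above one half the frameworks polarize.

```python
import numpy as np

rng = np.random.default_rng(1)

def time_to_consensus(frac_disputed, start_gap_steps=30, max_obs=20_000):
    """Log-odds gap modeled as a bounded-increment walk in unit steps.

    With probability frac_disputed an observation widens the gap (the two
    frameworks update in opposite directions); otherwise it narrows the gap.
    Returns the number of observations until the gap first closes.
    """
    gap = start_gap_steps
    for t in range(1, max_obs + 1):
        gap += 1 if rng.random() < frac_disputed else -1
        if gap <= 0:
            return t
    return max_obs  # censored: no consensus within the horizon

for q in (0.10, 0.40, 0.48, 0.55):
    mean_t = np.mean([time_to_consensus(q) for _ in range(100)])
    print(f"disputed fraction {q:.2f}: mean observations to consensus ~ {mean_t:,.0f}")
# Small q behaves like monotone evidence accumulation; near q = 0.5 the drift
# vanishes and consensus time blows up; above 0.5 the horizon is never reached.
```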

The Halting Problem of Permanent Loss

In the sections above we showed that even if we perfectly agreed on what “doom” meant, the evidence could not pin down the number. However, there is an even more pernicious problem with catastrophic event prediction: expansive formulations that leave the event itself ill-defined.

Think about a common definition of existential risk as something that leads to the “permanent loss of humanity’s potential.” If we want to calculate a real probability for this, we have to be precise about expressing it as a coherent, evaluable mathematical object in code. We need a computable classifier because, for computationally universal generative models (complex agent-based simulations or LLM rollouts over millions of tokens), the probability measure is not analytically integrable: it cannot be evaluated with a clean closed-form equation.

A computational agent can only estimate the probability of doom by running Monte Carlo rollouts of finite-time trajectories and scoring them. To score a rollout, the mushy philosophical concept of “permanent loss” must be operationalized into a computable, halting indicator function: an algorithm that can look at a simulated future and definitively output 1 (Doom) or 0 (Not Doom). Otherwise you cannot get a percentage out of the math.
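
For concreteness, here is what that Monte Carlo procedure looks like as a hedged sketch. Both `sample_trajectory` and `is_doom` are hypothetical stand-ins invented for illustration; the point is only that some halting scoring rule must be chosen before any percentage can come out.

```python
import random

def sample_trajectory(horizon=200, seed=None):
    """Stand-in generative model: a finite rollout of abstract 'world states'.
    In practice this would be an agent-based simulation or a long LLM rollout."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(horizon)]

def is_doom(trajectory, window=10, threshold=0.9):
    """Hypothetical halting indicator: outputs 1 ('doom') if some rolling
    average of the state variable exceeds the threshold, else 0.  Any such
    rule decides an asymptotic question from a finite prefix."""
    for i in range(len(trajectory) - window):
        if sum(trajectory[i:i + window]) / window > threshold:
            return 1
    return 0

n_rollouts = 2_000
estimate = sum(is_doom(sample_trajectory(seed=i)) for i in range(n_rollouts)) / n_rollouts
print(f"Monte Carlo P(doom) estimate: {estimate:.4f}")
```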

Even a set-theoretic Bayesian who doesn’t care about running code or doing Monte Carlo simulations, and just wants to define their event using ZFC, runs into the same problem. To actually compute their probability they must write down a formal logical sentence defining the event, but because defining permanent loss at the asymptote is so deeply tied to a specific ontology, two agents using different formalizations will inevitably compute probabilities for extensionally different sets. They end up doing math on two entirely different versions of the apocalypse.

Defining this event feels structurally cursed, and the reason comes down to the infinite nature of the environment we are trying to predict. Recent papers have suggested that the universe, or at least the post-AGI environment, is computationally universal,[7] so “permanent loss of potential” is not a localized event but an asymptotic property of trajectories. Formally, it has an exists-forever logical structure: there exists a time after which recovery never occurs.

The mathematical classification of such unsolvable problems is known as the arithmetical hierarchy, and the number of alternating quantifiers determines just how deeply uncomputable a problem is. The standard Halting Problem has only one: does there exist a step at which this program stops? That can sometimes be settled by just running the code and waiting. But the notion of permanent loss stacks two quantifiers: it requires that there exists a threshold followed by a forever state. Because you can never confirm a forever state just by watching a finite simulation (humanity could theoretically recover one step after you stop watching), this logical structure makes the problem strictly harder: the target event becomes a complete tail set at the second level of the arithmetical hierarchy. It is a tail set because its truth value depends entirely on the infinite tail-end of the timeline and completely ignores whatever happened in the finite prefix.

This matters because any practical probability estimate requires a total computable classifier: an algorithm you could actually write to evaluate whether a simulated future counts as “doom” or not. It must halt and output an answer (1 or 0), so it must make its decision after reading a finite prefix of the future.

The problem is that membership in a nontrivial tail set cannot be determined by a finite prefix. Therefore, any computable classifier must systematically misclassify some trajectories.
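
A toy illustration of why (the classifier rule, cutoff step, and trajectories are all invented for the example): any rule that commits after a finite prefix can be handed a trajectory that matches the prefix it read but has the opposite asymptotic behaviour.

```python
def prefix_classifier(trajectory, k=100, threshold=0.2):
    """A total computable classifier: it commits after reading the first k+1
    steps.  Illustrative rule: declare 'permanent loss' iff the state variable
    is below the threshold at step k."""
    return 1 if trajectory[k] < threshold else 0

# Adversarial trajectories that agree on the prefix the classifier reads but
# diverge in their (here merely very long) tails.
collapse_then_recover = [0.1] * 101 + [0.9] * 10_000  # classified 1, but recovers
thrive_then_collapse  = [0.9] * 101 + [0.0] * 10_000  # classified 0, but never recovers

print(prefix_classifier(collapse_then_recover))  # -> 1
print(prefix_classifier(thrive_then_collapse))   # -> 0
# Because "permanent loss" is a property of the infinite tail, every rule that
# halts after a finite prefix admits counterexamples like these; a more complex
# rule only moves the counterexamples around, it cannot eliminate them.
```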

To get around this, you might think you could write a more complex, better-specified classifier to reduce the error rate, but it is possible to prove that reducing this classification error requires algorithmically incompressible bits of information. You run into a Precision-Robustness Tradeoff: working in the Solomonoff mixture over Cantor space, the product of your classifier’s error and its complexity weight is bounded below by a positive constant (see the appendix).

This bound establishes a strict mathematical balancing act: to drive the error down, your code’s complexity must go up. But the trap is much worse than just having to write a longer program, because correctly classifying the low-complexity boundary cases requires deciding the halting problem. As you may expect, this is worrisome, because no general algorithm can solve the Halting Problem. Your classifier literally cannot compute the answers to these boundary cases; it must have the answers hardcoded into it by you, the programmer. To reduce your error, your classifier must physically encode initial segments of Chaitin’s constant relative to the halting oracle.

Slightly imprecisely: Chaitin’s constant is the solution to the Halting Problem expressed in binary digits. The sequence used here encodes answers to problems that are uncomputable even with a halting oracle, so it has a property called 2-randomness, which makes it indistinguishable from computationally irreducible static. Because these bits are 2-random they cannot be compressed, and every factor-of-two reduction in predicate error requires one additional bit of perfectly random, incompressible specification.

Now if you wanted those bits, where could they come from?

  1. Computation can’t generate them because they are uncomputable.

  2. Data can’t supply them because data only tells you which finite prefix you are on, not how to classify its asymptotic tail.

  3. Cross-agent consensus won’t sync them because the Invariance Theorem doesn’t force two different frameworks to choose extensionally identical classifiers.

It turns out you are leaning entirely on unforced, subjective taxonomy, which traps you in an Impossibility Triangle. You must do one of the following:

  • Accept Ambiguity: use a simple predicate with low algorithmic complexity. This will cause you to misclassify a vast number of simple, computable futures; you may end up calculating the probability of the wrong event.

  • Accept Fragility: use a highly complex event predicate to reduce misclassification. But the bits required to define it are incompressible and must come from your specific ontology, so your estimate becomes fiercely encoding-dependent.

  • Accept Both: pick a middle ground and suffer moderate ambiguity and moderate fragility at the same time.

The overall issue is this: if you reduce the predicate ambiguity, as we just discussed, you are forced deeper into framework-dependence, with all of the issues previously highlighted. We proved the data cannot correct your priors, and now we have proved you must inject uncomputable bias just to define the event. Together these guarantee that any precise percentage for these types of risks is a structural illusion. You are no longer measuring the “probability” of the event; you are only measuring the algorithmic cost of your own worldview.

Why This is Different

If you came from the introduction: welcome! And if you just read through the essay above, you might be tempted to map this argument onto existing debates. Because P(doom) is one example of an unprecedented-risk estimate, and I used it throughout, it is critical for me to explain why the formalism above puts us in a different bucket.

Usually critiques of P(doom) fall into one of three buckets:

  1. A Psychological Critique: something like “humans are bad at forecasting, poorly calibrated, and overly influenced by tribal vibes or sci-fi tropes.”

  2. A Complexity Critique: someone says that the world is too messy, the causal graphs are too dense, and our current models are simply not gears-level enough to produce a reliable number.

  3. An Epistemological Critique: usually a standard frequentist or Popperian objection that genuinely unprecedented events cannot be assigned probabilities because they lack reference classes, making the exercise philosophically moot.

This post is none of those because I am assuming the perspective of an ideal Bayesian agent. In doing this I am granting the agent infinite compute, perfect logical omniscience, and flawless Solomonoff induction. Even after doing all of this, the math shows that canonical point-estimates for unprecedented catastrophes are structurally blocked.

Specifically, here is what separates this essay from the usual discourse:

  • It is not just about “priors matter”; it is about Evidential Screening. The standard Bayesian defense is that priors are subjective for small samples, but enough data washes them out. However, the Evidential Screening Property proves that for unprecedented risks the data structurally cannot do this, because all available non-catastrophic data perfectly fits both the “safe” ontology and the “doom” ontology. This makes the catastrophe parameter mathematically screened off. We could observe a thousand years of safe AI prefix data and the priors would never wash out.

  • It proves that more data won’t save us. The intuitive hope is that as AI capabilities scale we will approach an asymptote of consensus. But Differential Screening proves that because different ontologies have misaligned causal joints, they interpret the exact same capabilities jump as evidence in opposite directions. Belief does not tend toward a narrowing asymptote; it follows a bounded-increment random walk, and more data can easily result in permanent polarization.

  • It proves that better definitions demand uncomputable magic. When faced with the imprecise nuance of expanded definitions of P(doom), a rationalist impulse is to operationalize the definition. The Precision-Robustness Tradeoff proves that this instinct is a trap. For asymptotic tail events like “permanent loss of potential”, driving down classification error mathematically requires incompressible, 2-random bits of Chaitin’s constant. This is an insurmountable problem: you cannot compute them and the data cannot supply them. The ambiguity isn’t a linguistic failure, but instead a strict boundary in the arithmetical hierarchy.

  • It is not epistemic nihilism. As I mentioned above, I am not throwing up my hands and saying, “We can’t know anything, so assign whatever number feels right.” The math strictly bounds the divergence by the translation cost between frameworks. I am tearing down the illusion of single point-estimates specifically to replace them with mathematically rigorous alternatives.

Most critiques of P(doom) attack the agent making the estimate. This framework attacks the information geometry of the problem itself. And it is a problem for P(utopia) as well!

How to Restore Canonical Risk

The probabilities of unprecedented catastrophes are non-canonical because their precise numerical value is a syntax error generated by compilation costs: trapped by evidential screening, only partially identified by the data, and choked by uncomputable predicates. So what are we supposed to do? Wait for the end while wringing our hands? No, but we have to be clearer about what survives the mathematics:

Find the Holy Grail: Agreed Precursors With Aligned Causal Joints

You can break the Evidential Screening Property if rival frameworks agree ex ante on an observable intermediate causal node. In other words, if Yudkowsky and Bostrom can agree that “doom is strictly preceded by an AI system autonomously acquiring $100M of illicit computing resources” or public health experts agree that “a novel pandemic is strictly preceded by sustained human-to-human transmission”, then we have an observable precursor.

There is a catch: it is not enough to simply agree that the precursor is necessary. Both frameworks must also explicitly agree ex ante on the directional update the observation triggers; that is, we need precursors where the causal joints align. This is not a new concept, but it is my best guess at a constructive recommendation. The screening property is what makes P(doom) non-canonical; agreed precursors with aligned derivatives are what break the screening property. When the joints do align, non-event data discriminates between models that predict different precursor rates, likelihood ratios can grow, prior gaps get washed out, and estimates converge! Before we can solve AI alignment, we should solve epistemic alignment among ourselves.
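
Here is a hedged sketch of why an agreed precursor restores discrimination (the precursor rates, the Poisson observation model, and the twenty-year window are all illustrative assumptions): once the frameworks predict different precursor base rates and agree on the direction of the update, even uneventful monitoring produces a likelihood ratio that moves the posterior.

```python
import numpy as np

rng = np.random.default_rng(2)

# Both frameworks agree ex ante that the precursor is observable and that more
# precursor events mean more risk; they disagree on how often a high-risk world
# produces them.  (Rates are illustrative assumptions.)
rate_if_high_risk = 0.30   # expected precursor events per year
rate_if_low_risk = 0.03

years = 20
observed = rng.poisson(rate_if_low_risk, size=years)  # simulated quiet monitoring

# Poisson log-likelihood ratio: the *absence* of precursors now discriminates.
log_lr = np.sum(observed * np.log(rate_if_high_risk / rate_if_low_risk)
                - (rate_if_high_risk - rate_if_low_risk))

prior_high_risk = 0.5
post_odds = (prior_high_risk / (1 - prior_high_risk)) * np.exp(log_lr)
print(f"P(high-risk world | {years} quiet years) = {post_odds / (1 + post_odds):.3f}")
```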

This points to a clear shift: progress does not come from debating unobservable asymptotic mechanisms (“is it a surgery or an asteroid?”) or refining subjective point-estimates. It comes from doing the difficult work of building cross-framework consensus on the observable causal nodes, and their directional updates, that necessarily precede the catastrophe. As treaties and agreements are of high interest to AI safety groups, this seems like a tractable area to focus on, and one that does not require nation-state agreements. It only requires rival labs and pundits to sit down and agree on what a specific test result will actually mean before the test is run.

The allure of a single, objective probability estimate is essentially a desire to outsource our existential fear to the comfort of a single number. It is unfortunate that the math refuses to cooperate. When dealing with unprecedented catastrophes, your framework is your fate. Until we find agreed precursors with aligned derivatives, we aren’t doing Bayesian updating; we are just arguing about which programming language is prettier.

Appendix: All of The Mathematics

This section goes deep into the mathematics behind the concepts above, with minimal explanation. I am working on a full paper that melds the narrative essay with the precise mathematics below; please let me know if you would like to see it. (I don’t think the full treatment would have fared well on LW for clarity.) There are, of course, numerous sources that support the bulk of the mathematics used here, because I am not proving any new computability theorems or doing anything very special. The small additions of my own (at least to my knowledge) are: the Mixture Probability Dispersion Theorem (Theorem 1) and its composite-cluster proof, the Precision-Robustness Tradeoff (Theorem 2) and its direct routing through the relativized halting probability, the Additive Decomposition (Identity 1) separating the two grounds, the formal bridge connecting evidential screening to Manski’s identification region, the Translation Barrier Corollary, and Conjecture 1. If anyone who is actually good at math could help prove or disprove Conjecture 1, I would be very grateful. I tried a bunch of different ways to figure it out, but I just couldn’t.

Formal Setting and Objects

Spaces and measures. The sample space is Cantor space . A computable measure is one where the map from finite strings to cylinder probabilities is a computable real uniformly in . Let denote the Dirac measure concentrating on the sequence .

Description languages. Let and denote optimal prefix-free Universal Turing Machines (UTMs). is the prefix-free Kolmogorov complexity of a string or measure relative to . Directional compilation cost is the length of the shortest prefix-free compiler from -programs to -programs. Symmetric translation cost is . By the Invariance Theorem: .

Mixtures. Solomonoff’s uncomputable prior assigns weight to each computable measure . The posterior probability of event after finite prefix is . Let denote the induced Solomonoff measure over sequences.

Classifiers and events. A total computable classifier is an oracle Turing machine that queries finitely many bits of its infinite input and halts. By standard computable analysis, such a functional is continuous in the product topology, meaning its output is determined by a finite prefix. A tail set is an event invariant under modification of finitely many initial coordinates: for any and , where overwrites the first bits.

Ground 1: Canonicality Failure via Evidential Screening

Proposition 1 (Encoding Swing). For any models , the posterior log-odds under and satisfy:

Proof. The posterior odds decompose as . The likelihood ratio is encoding-independent and cancels upon subtraction. Letting and , the absolute difference is . By one-sided invariance, each bracket lies in . Subtracting the two intervals yields the bound.

Definition 1 (Evidential Screening & Partial Identification). An estimation problem satisfies the evidential screening property with respect to a target event and a core subset of models (capturing of the posterior mass) if available evidence satisfies:

for all achievable , where .

The prior-free identification region for the parameter of interest is:

Denote this region . For canonical parameters, shrinks to a point as grows. For screened parameters, remains persistently wide.

Theorem 1 (Mixture Probability Dispersion). Let be partitioned into a high-risk cluster () and low-risk cluster (), with . Let composite models and be the normalized within-cluster mixtures (any convex combination of computable measures is itself a computable measure, so these are well-defined). Let and . Let the normalized weights of the two clusters under framework be and , summing to 1 - (where is the residual mass outside the core set). Let and . If screening holds such that and for then:

Proof. The total mixture probability under is , where is the residual contribution. Let . Substituting normalized weights:

By the odds decomposition, . Under the screening bound, , so . Symmetrically, so . Since is monotonically increasing:

Subtracting the symmetric expansion for and bounding the residuals:

Corollary (Translation Barrier Cap). By the Invariance Theorem, . Since and , we have . The non-canonical divergence is strictly capped by the translation barrier.

Definition 2 & Proposition (Differential Screening & Gap Dynamics). The log-odds gap is , where is the differential update. If frameworks decompose hypotheses along different causal joints, takes mixed signs. Modeling as a bounded-increment random walk with step size and fraction of gap-widening evidence:

(a) If , frameworks permanently polarize (probability of consensus decays exponentially in ).

(b) If , expected time to consensus by optional stopping is . As , convergence scales superexponentially.

Ground 2: The Event Predicate

Identity 1 (Additive Decomposition). The total cross-framework discrepancy structurally decomposes into Predicate Divergence and Model Divergence:

Assumption 1. Target catastrophe is a -complete nontrivial tail set.

Hypothesis (Computable Witnesses). contains computable , and contains computable . Let .

Lemma 1. For any computable sequence , .

Proof. A prefix-free program computing the Dirac measure can be constructed from a program computing the sequence by prepending a prefix-comparison logic wrapper. Normalization is absorbed into the additive constant.

Theorem 2 (Precision-Robustness Tradeoff). Let disagreement region be and error .

(a) Topological Lower Bound: For any total computable classifier , .

(b) Incompressibility: Specifying a classifier that correctly categorizes a sequence family up to complexity requires algorithmically incompressible bits.

Proof of (a). must halt on input outputting after reading finite prefix . If , define . Since and is a tail set, . But . If , define , but . In either case, . Generating requires simulating (requires bits) and appending the chosen witness tail (at most bits). Thus . By Lemma 1, .

Proof of (b). Let be the halting problem for prefix-free machines relative to the halting oracle, with halting probability .

By Assumption 1, is effectively -complete. By the definition of effective completeness, there exists a uniform computable map producing a computable sequence such that . Because is already a valid prefix-free program on , no logarithmic prefix-coding penalty is incurred. The uniform map adds only overhead, so .

Suppose a classifier correctly classifies versus for all prefix-free programs with . By simulating on , one computably decides for all such . Knowing which programs of length halt determines the partial sums of to within , recovering its first bits. Since is 2-random (Downey, Hirschfeldt, Nies & Terwijn 2006), these bits are algorithmically incompressible, enforcing .

Conjecture 1 (Extensional Divergence). For any -complete tail set and , with large , the optimal -bit computable classifiers and are extensionally distinct ().

Why I want it, why I can’t get it. The objective function is a complexity-weighted loss. Because and induce radically different mass distributions, the strict -bit budget forces each classifier to triage errors differently. While the Invariance Theorem allows to be described under in bits, this exceeds the -bit budget, so the -optimal classifier cannot simply import the -optimal one.

I cannot figure this out because correctly classifying -simple sequences requires incompressible bits correlated with , while -simple sequences require bits correlated with . These are distinct 2-random reals. A proof would require showing that a single -bit decision boundary cannot simultaneously serve both (formally, a mutual-information bound on finite decision trees relative to distinct halting probabilities). Let me know if you have any ideas.

  1. ^

    I am not talking about “non-canonical” probability in a Boltzmann sense. Only as defined here.

  2. ^

    A quick disclaimer: in this essay I formalize a number of concepts into mathematical terms. I want to be clear that I am not claiming your brain is literally running Solomonoff induction over Cantor space. But it is very useful to establish formal upper bounds on inference using algorithmic information theory. The argument is this: if a mathematically perfect, infinite-compute superintelligence cannot wash out its priors with non-catastrophic data, then our limited human heuristics certainly cannot either. This is not an argument about the literal mechanics of your cognition.

  3. ^

    To be clear: Bostrom and Yudkowsky aren’t running competing simulations of AI trajectories based on these works, but their frameworks (the QALY-maximizing surgery ontology versus the alignment-failure asteroid ontology) are good examples of the kind of deep ontological divergence that, when formalized into actual generative models, produces the encoding dependence I talk about in this essay.

  4. ^

    See Charles Manski’s work.

  5. ^

    Interestingly, Bostrom’s own analysis doesn’t make the mistake I mention here. He takes P(doom) as an input parameter and optimizes across the full range from 1% to 99%, rather than claiming to know the exact number. The divergence between his framework and Yudkowsky’s is not primarily about what P(doom) is; it is about what the right decision-theoretic framework for reasoning about it is, and that is itself an encoding dependence, just one operating at a meta-level: the choice of whether to frame the problem as “risky surgery” or “incoming asteroid” determines which decision-theoretic apparatus seems natural, which then determines what actions seem rational across the identification region.

  6. ^

    To get really rigorous: in the limiting case where computable priors natively exclude mechanisms outside their ontological primitives, they assign literally zero probability to each other’s core catastrophic mechanisms. Nielsen & Stewart proved that rational agents whose measures fail mutual absolute continuity don’t merely fail to converge in practice; they can permanently, rationally polarize on the exact same infinite stream of evidence.

  7. ^

    Memory-augmented LLMs are likely Turing-complete.