Everybody Wants to Rule the Future—Is Longtermism’s Mandate of Heaven by Arithmetic Justified?
Dnnn Uunnn, nnn nnn nnn nuh nuh nuh nuh, dnnn unnn nnn nnn nnn nuh nuh nuh NAH (Tears for Fears)
I was reading David Kinney’s interesting 2022 work “Longtermism and Computational Complexity”, in which he argues that longtermist effective altruism is not action-guiding because calculating the expected utility of events in the far future is computationally intractable. The crux of his argument is that longtermist reasoning requires probabilistic inference in causal models (Bayesian networks), and exact inference in such networks is NP-hard.[1]
This has important consequences for longtermism, as it is standardly utilized in the EA community, and especially for the works of Ord and MacAskill. Kinney suggests their framework cannot provide actionable guidance because mortal humans lack the computational bandwidth to do Bayesian updating. Therefore, the troubling conclusion is that utilizing this framework does not allow people to determine which interventions actually maximize expected value.
In this paper I want to show that even if we could magically solve Kinney’s inference problem (say a genie gives us perfect probability distributions over every possible future), we still could not make definitive expected value comparisons between many longtermist strategies, because doing so is an undecidable problem. Any intervention consists of a series of actions, and those actions act as a constraint on the strategies still available to you. When we compare interventions we are comparing classes of possible strategies and trying to determine which is superior in the long run (dominance of constrained optima).
Because I am going to talk a lot about expected value, I want to be clear that I am not claiming that using it as a private heuristic is bad. Rather, many Longtermists use it as a public justification engine: a machine that purports to show mathematically what is more correct and what you should therefore do. That use of EV is the focus of this essay.
I use some standard CS results from the 2000s to show that the retort of “can’t we just estimate it?” runs into guarantees that are NP-hard, undecidable, or outright uncomputable, depending on the restrictions imposed. This challenges a thread that continues to exist in the EA/Longtermist community in 2025. For example, MacAskill continues to make strong dominance claims in his Essays on Longtermism. Even with the hedging included in his arguments (not requiring optimal policies, approximations sufficing for large numbers, meta-options existing, etc.), serious computational roadblocks arise. For general policies the problem turns out to be undecidable. If you constrain yourself to memoryless stationary policies, then polynomial-time approximation is only possible if P = NP. And if we narrow further to the unobservable, average-reward case, no computable approximation exists at all.
EAs frequently draw on a sort of borrowed epistemic credibility earned on very finite and restricted projects (say, distributing malaria nets) and then extend it, unwarrantedly, into areas with extremely long (or infinite) timelines, where it can be shown that mathematical tractability ceases to exist (panspermia, AI safety, etc.) and that these interventions cannot reliably be compared against one another.
That said, not every Longtermist claim is this hard, and there are likely restricted domains in which interventions can be compared. As a general schema, however, it falls apart and cannot guarantee correctness. Longtermists who want to claim superiority via mathematical maximization must specify how they are simplifying their models and show why those simplified models have not defined away the critical features of the future that longtermists vaunt.
Context
Greaves and MacAskill claim EV-based dominance for moral action when they say:
“The potential future of civilisation is vast… [therefore] impact on the far future is the most important feature of our actions today”
which they then formalize as:
“Axiological strong longtermism (ASL): In the most important decision situations facing agents today… (ii) Every option that is near-best overall delivers much larger benefits in the far future than in the near future.”
This notion can be expressed as V*(A) > V*(B), with V*(A) representing the optimal expected value achievable under intervention A versus intervention B. Such a statement requires a methodological guarantee to gain authority as a ranking procedure (i.e. you need to be able to demonstrate why intervention A is superior to intervention B). Such claims are crucial to the justification of longtermism as a methodologically superior and more moral reasoning procedure for these questions.
When Kinney presented his results showing inference to be NP-hard, a standard response could be that bounded agents, which don’t require exact probabilities, are sufficient. So let us grant even more than a bounded agent: we allow an agent a perfect probabilistic representation of the world. For the model classes used by longtermists, the optimization (control) problem turns out to be a distinct, and undecidable, problem. In other words, even if some deus ex machina solved the inference problem, Longtermists still could not solve the control problem.
A Model of Interventions
To model these types of moral decisions about the far future of the real world, we should select a method that has action-conditioned dynamics (that is, a person or agent can influence the world) and one that is partially observable (we can’t know everything about the universe, only a limited slice of it). To achieve this it is sensible to use[2] a finite-description Partially Observable Markov Decision Process (POMDP), formally defined here as the tuple (S, A, O, T, Ω, R, γ, b0).
Here S, A, and O refer to the states, actions, and observations available to the agent. The function T is the transition function, giving the probability of a state change based on an action. Ω captures the observation probabilities and R is the reward function. γ is the discount applied to rewards based on how far in the future they occur, but note that the results below hold even if you remove discounting. Finally, b0 represents the initial probability distribution over states.
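To make the pieces concrete, here is a minimal sketch of how such a finite-description POMDP could be written down as a data structure. This is purely illustrative; the field names are mine and nothing in the results below depends on this particular encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative encoding of a finite-description POMDP (S, A, O, T, Omega, R, gamma, b0).
# Field names are mine, not taken from the cited papers.
@dataclass
class POMDP:
    states: List[str]                         # S
    actions: List[str]                        # A
    observations: List[str]                   # O
    T: Dict[Tuple[str, str, str], float]      # T[(s, a, s')] = P(s' | s, a)
    Omega: Dict[Tuple[str, str], float]       # Omega[(s', o)] = P(o | s')
    R: Dict[Tuple[str, str], float]           # R[(s, a)] = immediate reward
    gamma: float                              # discount; the results hold even without discounting
    b0: Dict[str, float]                      # initial probability distribution over states
```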
It is important to distinguish between the levels of control necessary for complex open-ended futures (General Policies, Π_gen), the limited capabilities of agents with bounded memory (Finite State Controllers, Π_FSC, i.e. bounded agents), and memoryless Stationary Policies (Π_stat), because the policy class assumed in the reasoning and the one invoked in the justification should mirror each other. For example, it is not coherent to assume access to general policies about the far future, but then retreat to bounded agents while still claiming that the math provably backs you up.
I am going to model an intervention as a constraint on the admissible policy set, because real-world interventions usually describe the initial step rather than the series of actions over all time. So you can do something like distribute malaria nets at t = 0, but then pursue a perfect strategy thereafter. Let Π_I be the set of policies consistent with intervention I and let V*(I) represent the maximum, or perfect, expected value of the intervention:
V*(I) = sup_{π ∈ Π_I} E[ Σ_t γ^t · R(s_t, a_t) | b0, π ]
We can then define the problem of determining the superior intervention, given interventions A and B, as deciding whether V*(A) > V*(B).
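As a toy illustration of what this comparison asks (on a model small enough to enumerate, where none of the hardness results bite), it looks like the following; `evaluate` stands in for computing a policy’s expected value, which is exactly the step that is not available in general.

```python
# Toy illustration only: on a tiny, fully enumerable model these maxima are the
# constrained optima V*(A) and V*(B). The point of the paper is that for the model
# classes longtermists need, no algorithm can compute or compare them in general.
def best_value(admissible_policies, evaluate):
    """V*(I): the best expected value over the policy set the intervention leaves open."""
    return max(evaluate(pi) for pi in admissible_policies)

def dominates(policies_A, policies_B, evaluate):
    """The dominance comparison on a toy instance: is V*(A) > V*(B)?"""
    return best_value(policies_A, evaluate) > best_value(policies_B, evaluate)
```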
There are three questions a Longtermist should be able to answer (I sketch them as algorithmic contracts below):
The Threshold Problem: is a specific standard of success mathematically achievable by some policy? Given a POMDP M and a rational threshold θ, does there exist a policy π in the policy class such that V^π(b0) > θ?
The Approximation Problem: can you output an estimated value that is within a specified error bound of the true optimal value V*? That is, can a bounded agent produce a value v such that |v − V*| ≤ ε·|V*| (multiplicative) or |v − V*| ≤ ε (additive)?
The Dominance Problem: given a formal model of cause prioritization, can you show the optimal value of A is strictly greater than the optimal value of B? Is V*(A) > V*(B)?
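Stated as hypothetical algorithmic contracts (my framing, not anything from the cited papers), the three questions look like this; the theorems in the next section say that no algorithm can fulfill these contracts for the general model classes at issue.

```python
# Hypothetical contracts, for framing only. The theorems below show these cannot
# be met for the general (or even modestly restricted) model classes discussed.

def threshold(model, policy_class, theta: float) -> bool:
    """Threshold Problem: does some policy in the class achieve value > theta?"""
    raise NotImplementedError("undecidable for general policies on infinite-horizon POMDPs")

def approximate(model, policy_class, epsilon: float) -> float:
    """Approximation Problem: return v with |v - V*| within the stated error bound."""
    raise NotImplementedError("NP-hard or uncomputable, depending on the restrictions")

def dominance(model, policies_A, policies_B) -> bool:
    """Dominance Problem: is V*(A) > V*(B)?"""
    raise NotImplementedError("at least as hard as the Threshold Problem (Lemma 1)")
```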
Three Theorems
To examine whether the three questions above are computationally tractable I am going to utilize results from Madani, Hanks, and Condon (2003)[3] and Lusena, Goldsmith, and Mundhenk (2001)[4]. Can an algorithm exist that takes a longtermist model and outputs answers to the Threshold Problem and Approximation Problem? After that I will examine the Dominance Problem.
Madani demonstrated that when the time horizon is infinite, trying to verify whether a specific value is achievable runs into a barrier of the same kind as the Halting Problem (of course Omega played a role in my thoughts on this project). I am evaluating the Threshold Problem for Π_gen (the broad policy class required to model open-ended future strategies).
The first Theorem comes from Madani, Hanks, and Condon (2003) and says that for finite-description, infinite-horizon POMDPs the Threshold Problem is undecidable, specifically under the total reward criterion or if a longtermist assumes the undiscounted ideal where γ = 1.
Theorem 1 (Madani, Hanks, and Condon, 2003, Theorems 4.2 and 4.4): for finite-description, infinite-horizon POMDPs, deciding whether some policy achieves expected reward greater than a given threshold θ is undecidable, in particular under the undiscounted total reward criterion.
This implies that for the general longtermism case (especially where future discounting is removed) no algorithm exists that can definitively answer “can we achieve this value?”
The second Theorem examines the Approximation Problem. A Longtermist may downgrade the agent and assume it uses a restricted policy class, such as Π_stat, whose policies are memoryless maps from observations to actions. However, Lusena demonstrated that these restrictions do not necessarily solve the tractability problem.
Theorem 2 (Lusena, Goldsmith, and Mundhenk, 2001, Theorem 6.2): unless P = NP, there is no polynomial-time algorithm achieving an ε-approximation of the optimal stationary policy’s value for infinite-horizon POMDPs.
This shows that for infinite-horizon POMDPs, under the discounted total reward or average reward criteria, calculating an ε-approximation of the value of the optimal stationary policy is NP-hard.
Using the same paper, Lusena shows that under the average reward criterion in an unobservable setting things get even worse: no computable algorithm can produce an approximation within an additive error ε.
Theorem 3 (Lusena, Goldsmith, and Mundhenk, 2001, Theorem 6.3): for unobservable POMDPs under average reward with time-dependent policies, no computable ε-approximation exists.
These three Theorems, all well-known results from these papers, show that for general policies the problem is undecidable, and that for restricted policies it is either NP-hard to approximate or not computably approximable at all.
Schema-Level Reduction
One criticism a Longtermist might raise is that it is easier to calculate a preference order (A is better than B) than an exact value (A scores 9.8, which is better than B at 6.7). However, it turns out that this is not the case for this class of problems, and I will show that the Dominance Problem is at least as hard as the Threshold Problem. To my knowledge, this very modest (and rather simplistic) reduction is my own.
Lemma 1: the Threshold Problem reduces to the Intervention Dominance Problem.
Proof by Construction: Let (M, θ) be an instance of the Threshold Problem with discount γ, where I want to determine whether V*_M(b0) > θ. First construct a new POMDP M' with a new initial state s_init that has only two actions: Enter, which causes a transition to state s with probability b0(s) (the initial distribution of M) for an immediate reward of 0, or Safe, which transitions deterministically to an absorbing state s_safe at time t = 1 for an immediate reward of 0.
Once an agent enters M via the Enter action, its rewards follow the original reward structure of M. If the agent chooses Safe it enters s_safe and receives a constant reward of θ(1 − γ) at every single time step forever.
Let’s now compare the optimal values of these two interventions starting at s_init. The value of Entering is discounted by one step because the agent enters M at t = 1. Since the transition probabilities match b0, the expected value from the next step onward is exactly the optimal value of M started from b0:
V_Enter = γ · V*_M(b0)
For the value of Safety, the agent enters s_safe at t = 1 and receives the constant reward θ(1 − γ) forever, a geometric series:
V_Safe = Σ_{t≥1} γ^t · θ(1 − γ) = γθ
So V_Enter > V_Safe  ⟺  γ·V*_M(b0) > γθ  ⟺  V*_M(b0) > θ.
This proves that V*(I_Enter) is strictly greater than V*(I_Safe) iff the original optimal value exceeds the threshold θ. Any algorithm that could solve the Dominance Problem could therefore solve the Threshold Problem, but Theorem 1 shows the Threshold Problem is undecidable, so the Dominance Problem is also undecidable.
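A quick numeric sanity check of the arithmetic in the construction (not a proof; the numbers are arbitrary): with the Safe reward set to θ(1 − γ), the comparison between the two branches tracks the threshold question exactly.

```python
# Arbitrary numbers; just checks that V_Enter > V_Safe tracks V*_M(b0) > theta.
gamma, theta = 0.95, 10.0
c = theta * (1 - gamma)                      # constant per-step reward in s_safe

# Value of Safe: sum_{t>=1} gamma^t * c = gamma * c / (1 - gamma) = gamma * theta
v_safe = gamma * c / (1 - gamma)
assert abs(v_safe - gamma * theta) < 1e-9

# Value of Enter: one step of delay, then whatever the original optimum V*_M(b0) is
for v_star in (9.0, 10.0, 11.0):
    v_enter = gamma * v_star
    print(v_star > theta, v_enter > v_safe)  # the two booleans always agree
```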
Bounded Agents and the Certification Gap
Another objection could take the form of: “we understand that finding the global optimum is undecidable, but as bounded agents we are optimizing over a more restricted class (say Π_FSC) using a heuristic solver (say something like SARSOP).” However, this retreat from maximizing optimality surrenders Dominance. If they claim Intervention A is better than Intervention B and use a heuristic solver, they only establish:
V_heur(A) > V_heur(B)
which is a statement about algorithms, not interventions. For A to actually permit better outcomes than B, you must assume the Certification Gap is small or bounded:
V*(I) − V_heur(I) ≤ ε for each intervention I being compared.
Unfortunately, this usually reduces to the Approximation Problem, and Lusena’s work demonstrates that even for restricted stationary policies, guaranteeing such an approximation is NP-hard. So the trade is undecidability for intractability, and this calculation of “EV” is not a normative one but rather an unverified hypothesis that the heuristic’s blind spots are distributed symmetrically across interventions. To verify this hypothesis we would have to solve the very problem we have shown is either undecidable or intractable.
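For concreteness, here is what a heuristic comparison would need in order to certify true dominance; the labels are mine, and `eps_B` is precisely the approximation guarantee that Theorem 2 says is NP-hard (or worse) to obtain.

```python
# Sketch of the certification logic, under my labels. v_heur_I is the value of the
# concrete policy the heuristic solver actually found (so it lower-bounds V*(I)),
# and eps_B is an assumed bound on the certification gap V*(B) - V_heur(B).
def certified_dominance(v_heur_A: float, v_heur_B: float, eps_B: float) -> bool:
    # If V*(A) >= v_heur_A and V*(B) <= v_heur_B + eps_B,
    # then v_heur_A > v_heur_B + eps_B implies V*(A) > V*(B).
    return v_heur_A > v_heur_B + eps_B
```

Without a computable eps_B there is nothing sound to call this with, which is the point.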
Conclusion
None of this work is meant to imply that I don’t think we should care about future lives or difficult long-term problems. I think these are enormously important topics to work on. I do, however, believe these results challenge the narrative that longtermists can rely on EV dominance as a source of normative authority.
For the broad model classes that are of critical importance to Longtermists, I have shown that it is undecidable whether one intervention is better than another (Lemma 1 via Theorem 1), and that even with significant restrictions, obtaining correct guarantees is NP-hard or worse (Theorems 2 and 3).
At times Longtermists will play a sophisticated game of kicking the can down the road on these questions. This is often expressed in the form of a “pause” or “moratorium” until they learn more. However, as we have shown, even if they were granted perfect knowledge, they would not be able to control their intervention over these long-duration events. That is a serious problem for the “delay” approach.
I think this leaves Longtermists with a much weaker case for why they should be the unique arbiters of long-term issues like AI-control, panspermia, etc. They simply don’t have compelling enough math, on its own, to argue for these cases, and it is often the math which is the bedrock of their spiritual authority.
Longtermists should specify the policy restrictions and approximation guarantees they are utilizing when relying on the authority of mathematical optimization. They should also shift from claiming “A is better than B” and instead reveal the heuristic being used, saying something like “Heuristic X prefers A to B.”
Finally, I would suggest that in making the restrictions necessary to argue about long-term dynamics, they will frequently end up defining away the very features they purport to value. It may be that other philosophical methods are necessary to help answer these questions.
At the top we asked “Is Longtermism’s Mandate of Heaven by Arithmetic Justified?” The good news is that a Mandate of Heaven in ancient China was only divine justification until something really bad came up. As soon as there was a famine, the Divine Mandate dried up and it was time for a new one. It might be that time for the core of Longtermism.
- ^
Scott Aaronson brought attention to computational complexity when discussing the problematic implications for an “ideal reasoner” given finite compute.
- ^
Suggested as sensible in a footnote of Askell & Neth’s “Longtermist Myopia”: “For example, it has been shown that if we assume an infinite time horizon, the problem of finding an optimal policy is undecidable (Madani, Hanks, and Condon 2003).” After review, it definitely seems to fit the bill.
- ^
Madani, O., Hanks, S., & Condon, A. (2003). “On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems.” Artificial Intelligence, 147(1-2): 5–34.
- ^
Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). “Nonapproximability results for partially observable Markov decision processes.” JAIR, 14:83–103.
Separate comment: the title doesn’t seem to connect well to the content, and it’d be nice if you were clearer about whether your theorems are partly original or simply lifts from the relevant texts that are justified by your modelling choices (I think the latter, given the absence of proofs, but “my first theorem” sorta confuses this).
Thank you for both of your comments. Your first comment deserves a thoughtful response, so it might take me until the weekend to reply.
I chose the title because:
“Everybody Wants to Rule the Future” refers to the notion that the future is hotly contested and longtermists specifically want to “rule” it with specific interventions whose normative authority rests on EV dominance. This is important and salient when billions of dollars are being allocated based on reasoning that has this at its core. (It’s also a playful reference to the Tears for Fears song “Everybody Wants to Rule the World”.)
Is Longtermism’s Mandate of Heaven by Arithmetic Justified? - this is the core claim being analyzed: is there something special about longtermist philosophy with their math that gives them a special priority over other frameworks?
I definitely didn’t mean to muddy the waters with the theorems, poor wording on my part in sections. They are lifts from relevant texts. (I tried to be clear with the references and citations, but I could improve.) The lemma is a reduction of my own (although nothing big or special, very standard). What is the most appropriate way to update this on LW? Is it ok to just edit it for clarity, or do I need to somehow make a version noting so that people can see the changes I made?
OK, I cleaned up the references. I think it is quite clear now (open to additional feedback if necessary).
A common longtermist model is one where there’s a transient period of instability (perhaps “around about now”) that settles into a stable state thereafter. This seems like it would be no harder than a finite-horizon problem terminating when stability is achieved. I haven’t looked into the results you quote and exactly what role the infinite horizon takes, but intuitively it seems correct that eternal instability (or even very long-lived instability) along any dimension of interest would make policy choice intractable while stability in the near future can make it fairly straightforward. Maybe there’s an issue where the highest value outcomes occur in the unstable regimes, which makes it hard to “bet on stability”, but I’d like to see it in maths + plausible examples.
Hey @David Johnston! I just realized there is a critical question I should ask you before responding: when you talk about the transient time of instability are you referring to an empirical claim or a formal claim?
Well I meant it as an empirical hypothesis and thought it may have formal implications (specifically, placing the problem in a smaller, more tractable class).
Thanks for the clarification!
I think that if a transient period of instability is exogenously bounded (like a snowstorm that passes regardless of our actions), fully observable (we can verify with certainty when the goal is achieved), and non-adversarial, then the modeling does become easier: it is just a temporary empirical phenomenon and doesn’t imply intractable planning problems.
I’m not familiar with all longtermist causes, but I could see something like asteroid impact mitigation, for a specific asteroid we’ve found, fitting quite cleanly into this category. The horizon is finite, the state is observable and the physics are non-adversarial.
However, the undecidability results do apply to any causes with features of endogenous termination, partial observability, adversarial dynamics, and semantic verification (we need to check behavioral properties of complex systems). Something like AI alignment has all four of these features to a tee.
To return to the notion of a transient period being empirically short (say 10 years): if we model it as a fixed horizon T I see a problem because the expected duration of a process is not the same as the termination condition of a control problem.
For a finite-horizon Markov decision process this would execute a for loop (“run for 10 years”) and termination is exogenous because T is a fixed parameter.
In comparison, longtermist causes have endogenous termination. They run a while loop (“run until alignment is guaranteed”), and the stopping time is determined by whether the trajectory reaches an absorbing state (say paperclips or utopia). Expecting that the period of instability lasts 10 years does not change the formal problem class, because we are still facing a decision problem about which policy lets us end up in S_utopia rather than S_paperclip. This is the goal-reachability structure, and Madani, Hanks, and Condon (2003) show that the decision problem “does there exist a policy reaching a goal state with probability > p?” is undecidable for POMDPs. The fact that the transient period will terminate at some point says nothing about whether we can verify which policy ensures we end in the desired state.
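To make the for-loop / while-loop contrast concrete (my framing, not anything from Madani et al.), a minimal sketch:

```python
# Exogenous termination: "run for 10 years". T is a fixed parameter of the problem.
def finite_horizon_rollout(policy, step, T=10):
    state, total = "start", 0.0
    for t in range(T):
        state, reward = step(state, policy(state, t))
        total += reward
    return total

# Endogenous termination: "run until we can verify we are in S_utopia".
# Whether any policy reaches the goal with probability > p is the undecidable
# question for POMDPs (Madani, Hanks, and Condon 2003), and the loop may never exit.
def goal_reachability_rollout(policy, step, is_goal):
    state, total, t = "start", 0.0, 0
    while not is_goal(state):
        state, reward = step(state, policy(state, t))
        total += reward
        t += 1
    return total
```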
While the planning problem is undecidable, a reasonable objection could be that we just want to verify when we have reached S_utopia and then we can stop. Unfortunately, the verification problem also runs into an undecidability barrier, because verifying S_utopia means checking a semantic property. Rice’s Theorem shows that non-trivial semantic properties are undecidable for systems of sufficient computational expressiveness. Several recent works suggest AI systems fit this bill: Schuurmans in 2023 proved that LLMs with external memory can “exactly simulate the execution of a universal Turing machine,” and Feng in 2024 showed that for any computable function there exists a prompt that causes a finite-size Transformer to compute it. So agentic AI systems are essentially Turing-complete and fall under Rice’s Theorem.
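A hedged sketch of why the verification step is hard, in the standard halting-problem style; `is_safe`, `bad_action`, and the wrapper are hypothetical names of mine, not anyone’s actual verification procedure.

```python
# If the system being verified can run arbitrary programs, then "does it ever take
# the bad action?" embeds the halting problem, which is the Rice's-theorem point above.
def is_safe(agent_program: str) -> bool:
    """Hypothetical verifier: returns True iff the agent never takes bad_action.
    If this were computable for arbitrary programs, we could decide halting:
    wrap any program P so that it takes bad_action exactly when P halts,
    then is_safe(wrapper_of(P)) answers whether P halts."""
    raise NotImplementedError("non-trivial semantic property of a universal system")
```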
(Also, I think the specific case of AI safety, from the longtermist perspective, suffers from being the hard case by default, as a misaligned AI is an optimizer searching for gaps in the verification procedure.)
Finally, if we decide to just arbitrarily cut the horizon down, say to 10 years, estimate a terminal value function V(T), and optimize against that, this merely relocates the undecidability. To compute V(T) accurately we must estimate the probability of being in a safe rather than a latently deceptive state at time T. Making this determination is the same undecidable problem; it now just lives in the terminal value function. I think we end up with truncation bias: any solver using a heuristic for V(T) will prefer premature stabilization (a suboptimal but observable state, like totalitarianism) to avoid the uncertainty of the while loop.
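As a sketch of the truncation move (again my own framing, with hypothetical names): all of the undecidable content gets packed into the guessed terminal value.

```python
# Truncate at T and bolt on a heuristic terminal value. Everything hard now lives
# in v_T_heuristic: estimating it accurately means deciding which (possibly
# latently deceptive) state we are actually in at T, which is the original problem.
def truncated_value(rewards_up_to_T, v_T_heuristic, gamma=0.95):
    T = len(rewards_up_to_T)
    running = sum(gamma**t * r for t, r in enumerate(rewards_up_to_T))
    return running + gamma**T * v_T_heuristic
```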
So overall, I think even if we grant the empirical hypothesis for the transient period of instability, the problem remains undecidable. In general, any risk that requires verifying a semantic property of a Turing-complete, or effectively universal system, faces the Rice’s Theorem barrier.
I believe longtermists are left with the same options I mentioned in the paper: prove the POMDPs they care about somehow fall into a decidable subclass or concede that they are just using heuristics that don’t grant normative authority.
While I think this is a broadly reasonable response, I’m curious what you think is able to provide better public justification than longtermism. These results seem to apply fairly broadly to any realistic EV-based justification for action given that partial observability is very much the rule.
I genuinely don’t know. It’s out of my depth to try to sensibly answer that. I think it’s sometimes easier to see the error in something than the solution.
All the same, I have a niggling fear that LTist reasoning as practiced by MacAskill and others rests on a base with very serious problems. That’s not minor when the future of the universe is being decided.
In contrast, I totally believe that EA efforts like distributing malaria nets are wonderful and sensible.
So in summary, not sure.