Decision theory and zero-sum game theory, NP and PSPACE
(Cross-posted from my blog)
At a rough level:
Decision theory is about making decisions to maximize some objective function.
Zero-sum game theory is about making decisions to optimize some objective function while someone else is making decisions to minimize this objective function.
These are quite different.
Decision theory and NP
Decision theory roughly corresponds to the NP complexity class. Consider the following problem:
Given a set of items, each of which has a integer-valued value and weight, does there exist a subset with total weight less than and total value at least ?
(It turns out that finding a solution is not much harder than determining whether there is a solution; if you know how to tell whether there is a solution to arbitrary problems of this form, you can in particular tell if there is a solution that uses any particular item.)
This is the knapsack problem, and it is in NP. Given a candidate solution, it is easy to check whether it actually is a solution: you just count the values and the weights. Since this solution would constitute a proof that the answer to the question is “yes”, and a solution exists whenever the answer is “yes”, this problem is in NP.
The following is a general form for NP problems:
where is a specification of a circuit (say, made of AND, OR, and NOT gates) that outputs a single Boolean value. That is, the problem is to decide whether there is some assignment of values to that outputs true on. This is a variant of the Boolean satisfiability problem.
In decision theory (and in NP), all optimization is in the same direction. The only quantifier is .
Zero-sum game theory and PSPACE
Zero-sum game theory roughly corresponds to the PSPACE complexity class. Consider the following problem:
Given a specification of a Reversi game state (on an arbitrarily-large square board), does there exists a policy for the light player that guarantees a win?
(It turns out that winning the game is not much harder than determining whether there is a winning policy; if you know how to tell whether there is a solution to arbitrary problems of this form, then in particular you can tell if dark can win given a starting move by light.)
This problem is in PSPACE: it can be solved by a Turing machine using a polynomial amount of space. This Turing machine works through the minimax algorithm: it simulates all possible games in a backtracking fashion.
The following is a general form for PSPACE problems:
where is a specification of a circuit (say, made of AND, OR, and NOT gates) that outputs a single Boolean value. That is, the problem is to determine whether it is possible to set the values interleaved with an opponent setting the values such that, no matter how the opponent acts, is true. This is a variant of the quantified Boolean formula problem. (Interpreting a logical formula containing and as a game is standard; see game semantics).
In zero-sum game theory, all optimization is in one of two completely opposite directions. There is literally no difference between something that is good for one player and something that is bad for the other. The opposing quantifiers and , representing decisions by the two opponents, are interleaved.
Different cognitive modes
The comparison to complexity classes suggests that there are two different cognitive modes for decision theory and zero-sum game theory, as there are two different types of algorithms for NP-like and PSPACE-like problems.
In decision theory, you plan with no regard to any opponents interfering with your plans, allowing you to plan on arbitrarily long time scales. In zero-sum game theory, you plan on the assumption that your opponent will interfere with your plans (your s are interleaved with your opponent’s s), so you can only plan as far as your opponent lacks the ability to interfere with these plans. You must have a short OODA loop, or your opponent’s interference will make your plans useless.
In decision theory, you can mostly run on naïve expected utility analysis: just do things that seem like they will work. In zero-sum game theory, you must screen your plans for defensibility: they must be resistant to possible attacks. Compare farming with border defense, mechanical engineering with computer security.
High-reliability engineering is an intermediate case: designs must be selected to work with high probability across a variety of conditions, but there is normally no intelligent optimization power working against the design. One could think of nature as an “adversary” selecting some condition to test the design against, and represent this selection by a universal quantifier; however, this is qualitatively different from a true adversary, who applies intentional optimization to break a design rather than haphazard selection of conditions.
Conclusion
These two types of problems do not cover all realistic situations an agent might face. Decision problems involving agents with different but not completely opposed objective functions are different, as are zero-sum games with more than two players. But realistic situations share some properties with each of these, and I suspect that there might actually be a discrete distinction between cognitive modes for NP-like decision theory problems and PSPACE-like zero-sum games.
What’s the upshot? If you want to know what is going on, one of the most important questions (perhaps the most important question) is: what kind of game are you playing? Is your situation more like a decision theory problem or a zero-sum game? To what extent is optimization by different agents going in the same direction, opposing directions, or orthogonal directions? What would have to change for the nature of the game to change?
Thanks to Michael Vassar for drawing my attention to the distinction between decision theory and zero-sum game theory as a distinction between two cognitive modes.
Related: The Face of the Ice
Joan Feigenbaum’s Games, Complexity Classes, and Approximation Algorithms may be a good citation for zero-sum game theory = PSPACE.
I’m not sure Decision Theory = NP makes sense. If a problem is in NP then you can produce a certificate (verifiable in polynomial time) showing that a solution is correct. If you’re iterating over possible actions to find which one has the highest expected utility, I don’t see what certificate you can output (in general) that would let someone check that the chosen action is the highest EU one, without going through the whole computation all over again. Perhaps I’m expecting the analogy to be tighter than intended?
The recent AI safety via debate paper gave a Supervised Learning = P, Reinforcement Learning = NP, DEBATE = PSPACE analogy (section 2.2), which does make sense to me.
Thanks for the links, those are relevant.
If you have an NP oracle, you can first ask it “is there an action with EU >= k” for some medium k value, then binary search until you find the highest k value (within epsilon) for which there is an action with EU >= k, assuming that EU can be computed in polynomial time. I agree that you can’t prove that a particular sequence of actions has the highest possible EU.
“what move should open with in reversi” would be considered as an admissible decision-theory problem by many people. Or in other words: Your argument that EU maximization is in NP only holds for utility functions that permit computation in P of expected utility given your actions. That’s not quite true in the real world.
I agree with some of the other commenters that the term “decision theory” ought to be reserved for the overarching problem of which decision algorithm to use, and that the distinction you’re referring to ought to be called something like “adversarial” vs “non-adversarial” or “rival” vs “non-rival”. Nonetheless, I think this is an interesting handle for thinking about human psychology.
If we view these as two separate modes in humans, and presume that there’s some kind of subsystem that decides which mode to use, then false positives of that subsystem look like potential explanations for things like The Imp of the Perverse, or paranoia.
Another analogy is that Bayesianism = decision theory and frequentism = game theory. Bayesians try to find the best response to a particular mixed strategy of nature (prior), while frequentists try to achieve “frequentist guarantees” which is like minimaxing against nature. Wald’s complete class theorems are pretty much about that.
Is there yet another cognitive mode for general-sum game theory? I suspect so. I think I’ve seen several times that the purely cooperative case (which might be like the “decision theory” case here) and the purely competitive case (“zero sum” here) have tractable analyses, but questions of general games are open. I ran into this in multi-agent RL (though I’m not sure if it is still the case—I was looking at older papers).
This makes sense: it is easier to assume someone is either with you or against you than to navigate complicated preferences.
In terms of computational complexity, finding Nash equilibria is PPAD complete, but I can’t say anything about what that means for the story here.
Mention a recent interesting work here: On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games gave a related analysis on the comuting of Markov PE for RL agents.
This post helped me understand NP noticeably more than the complexity course I was just examined on—thanks. I didn’t know there was such a thing as descriptive complexity theory (i.e. lists of correspondences between various subsets/closures of FOL/SOL for a bunch of difference classes). I’ll probably read about it a bunch this weekend.
It seems unfortunate that “zero sum” is the term of art. Utility functions are only meaningful up to (positive) scalar multiplication and a constant. So, zero-sum games can be constant-sum if we add a constant to someone’s utility function, or ever variable sum if we multiply one of the player’s utility functions by a constant. “adversarial games” or “(purely) competitive games” or something like that would be more descriptive.
Promoted to curated. Here are my thoughts:
I think this post hits a narrow target of taking a fairly high-level and complicated intuition, and successfully formalizing parts of it by putting it into the context of existing work in an established technical field. I think this kind of work is one of the most promising avenues for making progress in rationality, and has been one of the key drivers of progress historically (see as an example of this Eliezer’s Technical Explanation of a Technical Explanation).
Two things that seem like potentially great additions to this post are maybe one or two exercises that could test your understanding of the analogy and idea (one of the major benefits of technical posts is that you can often add exercises that have a clear-cut answer), and some more exploration of how this frame on decision theory and game theory interacts with existing writing on LW (i.e. more exploration of the form “this is what this implies about this thought experiment that was discussed previously”).
Note that imperfect information games are complete for EXP rather than PSPACE. (I’m surprised that I’d never run into this fact explicitly before. Scott Aaronson pointed it out to me today.)
And of course cooperative games with two non-communicating players are complete for NEXP.
ETA: also note that playing a “game against nature” is also PSPACE complete, if nature is stochastic (since IP=PSPACE).
Edit: I misunderstood. Left for historical reasons, but not important. Please ignore.
Wait—may I ask for some remedial instruction on why zero-sum game-theory isn’t a strict subset of decision theory, just with pessimistic assumptions about some of the variables? I can turn any decision problem into a game-theory problem by adding an opponent with opposite values from me, right? I strongly suspect that _both_ are NP generally, and I can give many specific examples of both that can be solved in polynomial time.
I’m confused by your “cognitive modes” idea as well—it may be true, but I don’t see how it follows, if decision theory is more difficult (NP) than game theory (P), that
This seems to lean in the opposite direction—in decision theory, you can easily answer as far out as you like, in zero-sum game-theory you’re far more limited. It is also more about certainty of game state than about motivation or existence of opponent. A plan with no opponents but also a bunch of randomness (or unknown starting state; same thing for these purposes) is MORE impossible to follow than a plan with known state and an active opponent.
I do somewhat agree with your final paragraph, but I don’t agree with your premise nor reasoning to get there.
Where did this come from? Did you confuse PSPACE with P?
If you add an opponent with opposite values from you into a decision theory problem, you are adding a component that is approximately as hard to compute the behavior of as it is to solve the problem as a whole (since it depends on a computation by the opponent which is of roughly the same form).
P and PSPACE are not the same.
My point is exactly that P and PSPACE are not the same. Addition of an opponent cannot turn NP into P, so “decision problem” vs “zero-sum game”cannot be the distinction which defines the problem space.
Why do you think zero-sum game theory is P? The quantified Boolean formula problem is PSPACE-complete.
Oh my—I completely misread the post, thinking it was comparing P vs PSPACE, when in fact it’s comparing NP to PSPACE. My point that competitive decisions are just a special-case of decision theory stands, but it’s far less important.
Nevermind!
Typo: xn, yn should be xk, yk
Fixed, thanks
Minor nit: I always thought the term “decision theory” referred to the meta-level task of formulating an algorithm which, given fully specified beliefs and values, tells you how to compare possible actions. On the contrary, when I see someone making a concrete decision using an EU calculation or some such, I don’t think of them as “doing decision theory”. So perhaps “decision making” or “positive sum game theory” rather than “decision theory”? It probably doesn’t matter much.
If you can perfectly simulate your opponent, then a zero sum game is also in NP. Of course, in practice you can’t, but you do have some reasonable beliefs about what the opponent will do, which, I think, moves most real life problems into the same gray-zone as high-reliability engineering.
Also, I suspect that real-life uncertainty about “f” in your NP problems likewise pushes them up in complexity, into that same gray-zone.
There may be two clear clusters in theory, but it’s not certain that they remain distinct in practice.