Decision theory and zero-sum game theory, NP and PSPACE

(Cross-posted from my blog)

At a rough level:

Decision theory is about making decisions to maximize some objective function.
Zero-sum game theory is about making decisions to optimize some objective function while someone else is making decisions to minimize this objective function.

These are quite different.

Decision theory and NP

Decision theory roughly corresponds to the NP complexity class. Consider the following problem:

Given a set of items, each of which has a integer-valued value and weight, does there exist a subset with total weight less than $w$ and total value at least $v$ ?

(It turns out that finding a solution is not much harder than determining whether there is a solution; if you know how to tell whether there is a solution to arbitrary problems of this form, you can in particular tell if there is a solution that uses any particular item.)

This is the knapsack problem, and it is in NP. Given a candidate solution, it is easy to check whether it actually is a solution: you just count the values and the weights. Since this solution would constitute a proof that the answer to the question is “yes”, and a solution exists whenever the answer is “yes”, this problem is in NP.

The following is a general form for NP problems:

$\exists x_{1} \in {0, 1} \exists x_{2} \in {0, 1} \dots \exists x_{k} \in {0, 1} f (x_{1}, . . ., x_{k})$

where $f$ is a specification of a circuit (say, made of AND, OR, and NOT gates) that outputs a single Boolean value. That is, the problem is to decide whether there is some assignment of values to $x_{1}, \dots, x_{k}$ that $f$ outputs true on. This is a variant of the Boolean satisfiability problem.

In decision theory (and in NP), all optimization is in the same direction. The only quantifier is $\exists$ .

Zero-sum game theory and PSPACE

Zero-sum game theory roughly corresponds to the PSPACE complexity class. Consider the following problem:

Given a specification of a Reversi game state (on an arbitrarily-large square board), does there exists a policy for the light player that guarantees a win?

(It turns out that winning the game is not much harder than determining whether there is a winning policy; if you know how to tell whether there is a solution to arbitrary problems of this form, then in particular you can tell if dark can win given a starting move by light.)

This problem is in PSPACE: it can be solved by a Turing machine using a polynomial amount of space. This Turing machine works through the minimax algorithm: it simulates all possible games in a backtracking fashion.

The following is a general form for PSPACE problems:

$\exists x_{1} \in {0, 1} \forall y_{1} \in {0, 1} \dots \exists x_{k} \in {0, 1} \forall y_{k} \in {0, 1} f (x_{1}, y_{1}, \dots, x_{k}, y_{k})$

where $f$ is a specification of a circuit (say, made of AND, OR, and NOT gates) that outputs a single Boolean value. That is, the problem is to determine whether it is possible to set the $x$ values interleaved with an opponent setting the $y$ values such that, no matter how the opponent acts, $f (x_{1}, y_{1}, \dots, x_{k}, y_{k})$ is true. This is a variant of the quantified Boolean formula problem. (Interpreting a logical formula containing $\exists$ and $\forall$ as a game is standard; see game semantics).

In zero-sum game theory, all optimization is in one of two completely opposite directions. There is literally no difference between something that is good for one player and something that is bad for the other. The opposing quantifiers $\exists$ and $\forall$ , representing decisions by the two opponents, are interleaved.

Different cognitive modes

The comparison to complexity classes suggests that there are two different cognitive modes for decision theory and zero-sum game theory, as there are two different types of algorithms for NP-like and PSPACE-like problems.

In decision theory, you plan with no regard to any opponents interfering with your plans, allowing you to plan on arbitrarily long time scales. In zero-sum game theory, you plan on the assumption that your opponent will interfere with your plans (your $\exists$ s are interleaved with your opponent’s $\forall$ s), so you can only plan as far as your opponent lacks the ability to interfere with these plans. You must have a short OODA loop, or your opponent’s interference will make your plans useless.

In decision theory, you can mostly run on naïve expected utility analysis: just do things that seem like they will work. In zero-sum game theory, you must screen your plans for defensibility: they must be resistant to possible attacks. Compare farming with border defense, mechanical engineering with computer security.

High-reliability engineering is an intermediate case: designs must be selected to work with high probability across a variety of conditions, but there is normally no intelligent optimization power working against the design. One could think of nature as an “adversary” selecting some condition to test the design against, and represent this selection by a universal quantifier; however, this is qualitatively different from a true adversary, who applies intentional optimization to break a design rather than haphazard selection of conditions.

Conclusion

These two types of problems do not cover all realistic situations an agent might face. Decision problems involving agents with different but not completely opposed objective functions are different, as are zero-sum games with more than two players. But realistic situations share some properties with each of these, and I suspect that there might actually be a discrete distinction between cognitive modes for NP-like decision theory problems and PSPACE-like zero-sum games.

What’s the upshot? If you want to know what is going on, one of the most important questions (perhaps the most important question) is: what kind of game are you playing? Is your situation more like a decision theory problem or a zero-sum game? To what extent is optimization by different agents going in the same direction, opposing directions, or orthogonal directions? What would have to change for the nature of the game to change?

Thanks to Michael Vassar for drawing my attention to the distinction between decision theory and zero-sum game theory as a distinction between two cognitive modes.

Related: The Face of the Ice