In Logical Time, All Games are Iterated Games
The main purpose of this post is to introduce the concept of logical time. The idea was mentioned in Scott’s post, Bayesian Probability is for things that are Space-like Separated from You. The term was first coined in a conference call with Daniel Demski, Alex Mennan, and perhaps Corey Staten and Evan Lloyd; I don’t remember exactly who was there, or who first used the term. Logical time is an informal concept which serves as an intuition pump for thinking about logical causality and phenomena in logical decision theory; don’t take it too seriously. In particular, I am not interested in anybody trying to formally define logical time (aside from formal approaches to logical causality). Still, it seems like useful language for communicating decision-theory intuitions.
Suppose you are playing chess, and you consider moving your bishop. You play out a hypothetical game which results in your loss in several moves. You decide not to move your bishop as a result of this. The hypothetical game resulting in your loss still exists within logic. You are logically later than it, in that the game you actually play depends on what happened in this hypothetical game.
Suppose you’re stuck in the desert in a Parfit’s Hitchhiker problem. Paul Ekman is reading your face, deciding whether you’re trustworthy. Paul Ekman does this based on experience, meaning that the computation which is you has a strong similarity with other computations. This similarity can be used to predict you fairly reliably, based on your facial expressions. What creates this similarity? According to the logical time picture, there is a logical fact much earlier in logical time, which governs the connection between facial expressions and behavior.
To the extent that agents are trying to predict the future, they can be thought of as trying to place themselves later in logical time than the events which they’re trying to predict. Two agents trying to predict each other are competing to see who can be later in logical time. This is not necessarily wise; in games like chicken, there is a sense in which you want to be earlier in logical time.
Traditional game theory, especially Nash equilibria, relies on what amounts to loopy logical causality to allow each agent to be after the other in logical time. Whether this is bad depends on your view on logical time travel. Perhaps there is a sense in which logical time can be loopy, due to prediction (which is like logical time travel). Perhaps logical time can’t be loopy, and this is a flaw in the models used by traditional game theory.
In logical time, all games are iterated games. An agent tries to forecast what happens in the decision problem it finds itself in by comparing it to similar decision problems which are small enough for it to look at. This puts it later in logical time than the small examples. “Similar games” includes the exact same game, but in which both players have had less time to think.
This means it is appropriate to use iterated strategies. Agents who are aware of logical time can play tit-for-tat in single-shot Prisoner’s Dilemma, and so, can cooperate with each other.
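As a rough illustration of the idea, here is a minimal Python sketch of tit-for-tat played across logical time rather than across rounds. Everything in it is an assumption made for illustration: the `depth` parameter stands in for "time to think", and the function names are hypothetical. An agent at depth n looks at the same game at depth n - 1 and copies what its opponent did there.

```python
def logical_tit_for_tat(opponent, depth):
    """Single-shot PD move, treating shallower copies of the game as earlier rounds.

    `opponent` is a function (its_opponent, depth) -> "C" or "D".
    `depth` is a stand-in for thinking time (an assumption of this sketch).
    """
    if depth == 0:
        # No earlier game to consult: open with cooperation, as tit-for-tat does.
        return "C"
    # Consult the same game with less thinking time and copy the opponent's move there.
    return opponent(logical_tit_for_tat, depth - 1)


def always_defect(opponent, depth):
    return "D"


def always_cooperate(opponent, depth):
    return "C"


def play(agent_a, agent_b, depth):
    """Return the pair of moves in a single-shot game at the given thinking depth."""
    return agent_a(agent_b, depth), agent_b(agent_a, depth)


print(play(logical_tit_for_tat, logical_tit_for_tat, 5))
print(play(logical_tit_for_tat, always_defect, 5))
print(play(logical_tit_for_tat, always_defect, 0))
```

Two copies of this agent cooperate at every depth, while against a defector the agent is exploited only at depth 0, where it has no earlier game to look at. The sketch sidesteps the hard part, which is how a real agent would find and evaluate the "similar, smaller" games.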
Iterated games are different in character from single-shot games. The folk theorem shows that almost any outcome is possible in iterated play (in a certain sense). This makes it difficult to avoid very bad outcomes, such as nearly always defecting in the Prisoner’s Dilemma, despite the availability of much better equilibria such as tit-for-tat. Intuitively, this is because (as Yoav Shoham et al. point out in If multi-agent learning is the answer, what is the question?) it is difficult to separate “teaching behavior” from “learning behavior”: as in the tit-for-tat strategy, it is generally wise to adopt behavior designed to shape the incentive gradient of the other player, in addition to improving your own score. Unfortunately, it is difficult to say what it means to pursue these two objectives simultaneously.
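To make the equilibrium selection problem concrete, here is a small Python simulation of the iterated Prisoner’s Dilemma. The payoff numbers (5, 3, 1, 0) and the ten-round horizon are assumptions chosen for illustration, not something taken from the folk theorem itself, and a finite simulation can only hint at what the theorem says about indefinitely repeated play.

```python
# Assumed payoffs (not from the post): (my move, their move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(own_history, other_history):
    # Cooperate first, then copy the other player's previous move.
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    return "D"


def play_iterated(strategy_a, strategy_b, rounds=10):
    """Play `rounds` rounds and return the total scores (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b


print(play_iterated(tit_for_tat, tit_for_tat))      # (30, 30)
print(play_iterated(always_defect, always_defect))  # (10, 10)
print(play_iterated(tit_for_tat, always_defect))    # (9, 14)
```

If each player may choose only between these two strategies, neither pairing gives anyone a reason to switch unilaterally: leaving mutual tit-for-tat drops a player from 30 to 14, and leaving mutual defection drops a player from 10 to 9. Both the good outcome and the bad one are stable in that restricted sense, which is the flavor of the problem described above.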
The subfield most related to this problem is multi-agent learning. Sadly, as discussed in the Shoham et al. paper cited in the preceding paragraph, multi-agent learning typically avoids the difficulty by focusing on learning single-shot equilibria via iterated play. Learning single-shot equilibria in an iterated setting is a somewhat weird thing to be doing (hence the title of the paper). However, it is understandable that people might avoid such a difficult problem. The folk theorem illustrates a severe equilibrium selection issue, meaning that traditional tools have little to say about rational play.
One might imagine learning to play single-shot games by playing them over and over. But what can you do to learn iterated games? You might imagine that you jump up a level again, experiencing the iterated version repeatedly to discover the optimal iterated strategy. However, iterating the game more doesn’t really escape the iterated setting; there is no further level!
(You might think meta-iteration involves making the other player forget what it learned in iterated play so far, so that you can re-start the learning process, but that doesn’t make much sense if you retain your own knowledge; and if you don’t, you can’t be learning!)
We can make pictures of logical time using phenomena which we understand more fully. One such picture is based on proofs. If we imagine a theorem prover proving every theorem in some order (such as an ordering based on proof length), we can think of logical time as time-of-proof. We can formulate counterfactuals consistent with this notion of logical time. (As I mentioned before, a picture of logical time is just a picture of logical causality / logical counterfactuals—the notion of logical time adds nothing, formally.)
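A toy version of the time-of-proof picture can be sketched in Python. The formal system used here is Hofstadter’s MIU system, which is my choice for illustration and not something the picture above depends on; the number of rule applications is used as a crude stand-in for proof length. Each theorem is stamped with the earliest step at which a breadth-first prover reaches it.

```python
def successors(s):
    """All strings derivable from `s` by one application of an MIU rule."""
    out = set()
    if s.endswith("I"):              # Rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):            # Rule 2: Mx -> Mxx
        out.add("M" + s[1:] * 2)
    for i in range(len(s) - 2):      # Rule 3: xIIIy -> xUy
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):      # Rule 4: xUUy -> xy
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out


def logical_times(axiom="MI", max_steps=4):
    """Map each theorem found within `max_steps` to its earliest derivation step."""
    time_of = {axiom: 0}
    frontier = [axiom]
    for step in range(1, max_steps + 1):
        next_frontier = []
        for s in frontier:
            for t in sorted(successors(s)):
                if t not in time_of:  # first time proved = its logical time
                    time_of[t] = step
                    next_frontier.append(t)
        frontier = next_frontier
    return time_of


times = logical_times(max_steps=3)
for theorem in ("MI", "MIU", "MIIII", "MUI"):
    print(theorem, times[theorem])
```

In this sketch "MUI" is logically later than "MIIII", because the shortest derivation of the former passes through the latter. A counterfactual consistent with this ordering would only be allowed to change theorems at or after the step where the counterfactual premise is settled.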
We can examine logical time travel in this kind of model by constructing predictors using stronger logics, which allows a predictor to find shorter proofs. This creates decision-theoretic puzzles, because the agent with the weaker logic can’t recognize the loopiness of the situation; it thinks it cannot influence the predictor, because (according to its weaker logic) the predictor has a short proof-length and is therefore earlier in logical time. We, on the other hand, can recognize that agents who act as if they control the predictor could do better in the decision problem.
This weirdness seems to only be possible because of the “two dimensional logical time” which exists in this toy model, in which we can vary both proof length and logical strength. One agent has access to arbitrarily long proofs via oracles, and so is “later” in the length dimension; the other has a stronger logic, and so is “later” in the strength dimension.
However, we can collapse the two dimensions into one via logical induction. Logical induction eventually learns to predict what stronger logics would predict, so computation time and logical strength are more or less the same.
You might expect that the loopy scenario in which an agent and a predictor accurately predict each other becomes impossible in logical induction, but it does not. Logical-induction agents can predict each other well by examining what similar agents do in similar situations. As a result, LIDT agents converge to playing correlated equilibria with each other, more or less. (This result ignores the iterated aspect of the games, just like the multi-agent learning approaches I was complaining about earlier; despite learning from all the nearby copies within logic, the LIDT agents think only of the utility for their one decision, which paradoxically results in poorer outcomes even for that decision. Asymptotic decision theory does better, but no nice results for game theory have come from it so far.)
So long as an agent eventually settles down to making some reliable pattern of decisions in a situation, there will be relatively young logical inductors which have learned enough to accurately forecast the decisions made by logical-induction agents who reason using much more computational power.
We can think of the purely logical case, with its apparent two dimensions of logical time, as being a degenerate extreme version of the phenomenon in logical induction. In logical induction, the early predictions may be quite accurate, but they are fallible; they always run the risk of being wrong, since we’re in the process of learning. In the pure logical case, we also run the risk of being wrong: using a stronger logic to make predictions runs the risk of introducing inconsistencies. This is easy to forget, since we are accustomed to the assumption that we can easily add axioms to a consistent system to get a stronger one.
An early predictor predicting a late agent must give up some accuracy: a prediction which relies on anything other than actually running the computation to be predicted has some chance of failure. This breaks the loopiness in logical time; the late agent always adds some small amount of information, even if its action is predicted with high reliability.