In Logical Time, All Games are Iterated Games

Logical Time

The main purpose of this post is to introduce the concept of logical time. The idea was mentioned in Scott's post, Bayesian Probability is for things that are Space-like Separated from You. The term was coined in a conference call with Daniel Demski, Alex Mennen, and perhaps Corey Staten and Evan Lloyd (I don't remember exactly who was there, or who first used the term). Logical time is an informal concept which serves as an intuition pump for thinking about logical causality and phenomena in logical decision theory; don't take it too seriously. In particular, I am not interested in anybody trying to formally define logical time (aside from formal approaches to logical causality). Still, it seems like useful language for communicating decision-theory intuitions.

Suppose you are playing chess, and you consider moving your bishop. You play out a hypothetical game which results in your loss in several moves. You decide not to move your bishop as a result of this. The hypothetical game resulting in your loss still exists within logic. You are logically later than it, in that the game you actually play depends on what happened in this hypothetical game.
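
As a concrete (and deliberately informal) illustration, here is a minimal sketch of that picture in Python. The Game interface (legal_moves, apply, is_over, winner) is invented for illustration, not any real chess library; the point is only that the move actually chosen depends on simulated games, which are therefore logically earlier than it.

```python
def play_out(game, first_to_move, second_to_move):
    """Simulate a hypothetical game to completion, alternating moves."""
    players = [first_to_move, second_to_move]
    turn = 0
    while not game.is_over():
        game = game.apply(players[turn % 2](game))
        turn += 1
    return game.winner()  # e.g. "me", "opponent", or "draw"


def choose_move(game, my_strategy, opponent_model):
    """Pick a move by first playing out the hypothetical game that follows it.

    Each rejected hypothetical is logically earlier than the move actually
    made: the real decision depends on how those simulated games turned out.
    """
    fallback = None
    for move in game.legal_moves():
        fallback = fallback or move
        # After my move, the opponent moves first in the hypothetical game.
        if play_out(game.apply(move), opponent_model, my_strategy) == "me":
            return move
    return fallback  # no winning line found; play something legal anyway
```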

Suppose you're stuck in the desert in a Parfit's Hitchhiker problem. Paul Ekman is reading your face, deciding whether you're trustworthy. Paul Ekman does this based on experience, meaning that the computation which is you has a strong similarity with other computations. This similarity can be used to predict you fairly reliably, based on your facial expressions. What creates this similarity? According to the logical time picture, there is a logical fact much earlier in logical time, which governs the connection between facial expressions and behavior.

To the extent that agents are trying to predict the future, they can be thought of as trying to place themselves later in logical time than the events which they're trying to predict. Two agents trying to predict each other are competing to see who can be later in logical time. This is not necessarily wise; in games like chicken, there is a sense in which you want to be earlier in logical time.

Traditional game theory, especially Nash equilibria, relies on what amounts to loopy logical causality to allow each agent to be after the other in logical time. Whether this is bad depends on your view on logical time travel. Perhaps there is a sense in which logical time can be loopy, due to prediction (which is like logical time travel). Perhaps logical time can't be loopy, and this is a flaw in the models used by traditional game theory.

Iterated Games

In logical time, all games are iterated games. An agent tries to forecast what happens in the decision problem it finds itself in by comparing it to similar decision problems which are small enough for it to look at. This puts it later in logical time than the small examples. "Similar games" includes the exact same game, but in which both players have had less time to think.

This means it is appropriate to use iterated strategies. Agents who are aware of logical time can play tit-for-tat in single-shot Prisoner's Dilemma, and so can cooperate with each other.
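
Here is a toy sketch of what that can look like (nothing here is a serious agent design; the "budget" parameter is my own stand-in for how much earlier in logical time a simulation sits). Each player simulates both players with less time to think, and treats those simulations as earlier rounds of an iterated game:

```python
def tit_for_tat_agent(budget, opponent):
    """Cooperate iff the opponent cooperated in the simulated 'previous' round."""
    if budget == 0:
        return "C"  # base case: start out cooperative
    # Re-run the same game with both players given less time to think; that
    # simulated game is logically earlier than this decision.
    opponents_earlier_move = opponent(budget - 1, tit_for_tat_agent)
    return "C" if opponents_earlier_move == "C" else "D"


def defect_bot(budget, opponent):
    return "D"  # ignores everything and always defects


print(tit_for_tat_agent(3, tit_for_tat_agent))  # "C": two such agents cooperate
print(tit_for_tat_agent(3, defect_bot))         # "D": it defects against a defector
```

The cooperation here bottoms out in the budget-zero simulation choosing to cooperate; that base case is doing real work, and is part of the intuition pump rather than a theorem.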

Iterated games are different in character from single-shot games. The folk theorem shows that almost any outcome is possible in iterated play (in a certain sense). This makes it difficult to avoid very bad outcomes, such as nearly always defecting in the Prisoner's Dilemma, despite the availability of much better equilibria such as tit-for-tat. Intuitively, this is because (as Yoav Shoham et al. point out in If multi-agent learning is the answer, what is the question?) it is difficult to separate "teaching behavior" from "learning behavior": as in the tit-for-tat strategy, it is generally wise to adopt behavior designed to shape the incentive gradient of the other player, in addition to improving your own score. Unfortunately, it is difficult to say what it means to pursue these two objectives simultaneously.
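
The folk theorem is a claim about the whole set of sustainable payoffs, so the following is only a quick numerical illustration of how far apart long-run outcomes can be, not a proof of anything; the payoff values (T=5, R=3, P=1, S=0) are the usual convention, chosen by me:

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(history):
    return history[-1][1] if history else "C"  # copy the opponent's last move


def always_defect(history):
    return "D"


def average_payoffs(strat_a, strat_b, rounds=1000):
    """Average per-round payoffs when two strategies play an iterated PD."""
    hist_a, hist_b = [], []  # each player sees (own move, other's move) pairs
    total_a = total_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_a), strat_b(hist_b)
        pa, pb = PAYOFF[(a, b)]
        total_a, total_b = total_a + pa, total_b + pb
        hist_a.append((a, b))
        hist_b.append((b, a))
    return total_a / rounds, total_b / rounds


print(average_payoffs(tit_for_tat, tit_for_tat))      # ~ (3.0, 3.0)
print(average_payoffs(always_defect, always_defect))  # ~ (1.0, 1.0)
print(average_payoffs(tit_for_tat, always_defect))    # ~ (1.0, 1.0) after round one
```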

The subfield most related to this problem is multi-agent learning. Sadly, as discussed in the Shoham et al. paper I cited parenthetically in the preceding paragraph, multi-agent learning typically avoids the difficulty by focusing on learning single-shot equilibria via iterated play. Learning single-shot equilibria in an iterated setting is a somewhat weird thing to be doing (hence the title of the paper). However, it is understandable that people might avoid such a difficult problem. The folk theorem illustrates a severe equilibrium selection issue, meaning that traditional tools have little to say about rational play.

One might imagine learning to play single-shot games by playing them over and over. But what can you do to learn iterated games? You might imagine that you jump up a level again, experiencing the iterated version repeatedly to discover the optimal iterated strategy. However, iterating the game more doesn't really escape the iterated setting; there is no further level!

(You might think meta-iteration involves making the other player forget what it learned in iterated play so far, so that you can re-start the learning process, but that doesn't make much sense if you retain your own knowledge; and if you don't, you can't be learning!)

Toy Models

We can make pictures of logical time using phenomena which we understand more fully. One such picture is based on proofs. If we imagine a theorem prover proving every theorem in some order (such as an ordering based on proof length), we can think of logical time as time-of-proof. We can formulate counterfactuals consistent with this notion of logical time. (As I mentioned before, a picture of logical time is just a picture of logical causality / logical counterfactuals; the notion of logical time adds nothing, formally.)
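
To make the time-of-proof picture a bit more concrete, here is a toy sketch: saturate a tiny modus-ponens-only system breadth-first and record the step at which each statement is first derived. The axioms and the "->" string encoding are invented for the example; a statement's "logical time" is just its derivation step.

```python
AXIOMS = {"A", "A->B", "B->C", "C->D"}


def modus_ponens(known):
    """Statements derivable in one more step from what is already known."""
    new = set()
    for s in known:
        if "->" in s:
            premise, conclusion = s.split("->", 1)
            if premise in known and conclusion not in known:
                new.add(conclusion)
    return new


def time_of_proof(axioms):
    """Map each derivable statement to the step at which it is first proved."""
    known, times, step = set(axioms), {s: 0 for s in axioms}, 0
    while True:
        new = modus_ponens(known)
        if not new:
            return times
        step += 1
        for s in new:
            known.add(s)
            times[s] = step


print(time_of_proof(AXIOMS))  # e.g. {'A': 0, 'A->B': 0, ..., 'B': 1, 'C': 2, 'D': 3}
```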

We can examine logical time travel in this kind of model by constructing predictors using stronger logics, which allows a predictor to find shorter proofs. This creates decision-theoretic puzzles, because the agent with the weaker logic can't recognize the loopiness of the situation; it thinks it cannot influence the predictor, because (according to its weaker logic) the predictor has a short proof-length and is therefore earlier in logical time. We, on the other hand, can recognize that agents who act as if they control the predictor could do better in the decision problem.

This weirdness seems to only be possible because of the "two-dimensional logical time" which exists in this toy model, in which we can vary both proof length and logical strength. One agent has access to arbitrarily long proofs via oracles, and so is "later" in the length dimension; the other has a stronger logic, and so is "later" in the strength dimension.

However, we can collapse the two dimensions into one via logical induction. Logical induction eventually learns to predict what stronger logics would predict, so computation time and logical strength are more or less the same.

You might expect that the loopy scenario in which an agent and a predictor accurately predict each other becomes impossible in logical induction, but it does not. Logical-induction agents can predict each other well by examining what similar agents do in similar situations. As a result, LIDT agents converge to playing correlated equilibria with each other, more or less. (This result ignores the iterated aspect of the games, just like the multi-agent learning approaches I was complaining about earlier; despite learning from all the nearby copies within logic, the LIDT agents think only of the utility for their one decision, which paradoxically results in poorer outcomes even for that decision. Asymptotic decision theory does better, but no nice results for game theory have come from it so far.)

So long as an agent eventually settles down to making some reliable pattern of decisions in a situation, there will be relatively young logical inductors which have learned enough to accurately forecast the decisions made by logical-induction agents who reason using much more computational power.

We can think of the purely logical case, with its apparent two dimensions of logical time, as being a degenerate extreme version of the phenomenon in logical induction. In logical induction, the early predictions may be quite accurate, but they are fallible; they always run the risk of being wrong, since we're in the process of learning. In the pure logical case, we also run the risk of being wrong: using a stronger logic to make predictions runs the risk of introducing inconsistencies. This is easy to forget, since we are accustomed to the assumption that we can easily add axioms to a consistent system to get a stronger one.

An early predictor predicting a late agent must give up on some accuracy: a prediction which relies on anything other than actually running the computation to be predicted has some chance of failure. This breaks the loopiness in logical time; the late agent always adds some small amount of information, even if its action is predicted with high reliability.