Timeless Decision Theory: Problems I Can’t Solve

Suppose you’re out in the desert, running out of water, and soon to die—when someone in a motor vehicle drives up next to you. Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what’s more, the driver is Paul Ekman, who’s really, really good at reading facial microexpressions. The driver says, “Well, I’ll convey you to town if it’s in my interest to do so—so will you give me $100 from an ATM when we reach town?”

Now of course you wish you could answer “Yes”, but as an ideal game theorist yourself, you realize that, once you actually reach town, you’ll have no further motive to pay off the driver. “Yes,” you say. “You’re lying,” says the driver, and drives off leaving you to die.

If only you weren’t so rational!

This is the dilemma of Parfit’s Hitchhiker, and the above is the standard resolution according to mainstream philosophy’s causal decision theory, which also two-boxes on Newcomb’s Problem and defects in the Prisoner’s Dilemma. Of course, any self-modifying agent who expects to face such problems—in general, or in particular—will soon self-modify into an agent that doesn’t regret its “rationality” so much. So from the perspective of a self-modifying-AI theorist, classical causal decision theory is a wash. And indeed I’ve worked out a theory, tentatively labeled “timeless decision theory”, which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without needing to explicitly consider such notions as “precommitment”. Unfortunately this “timeless decision theory” would require a long sequence to write up, and it’s not my current highest writing priority unless someone offers to let me do a PhD thesis on it.
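
To make the gap concrete, here is a minimal sketch in Python with made-up utilities (dying in the desert as -1,000,000, paying $100 as -100); the function names and numbers are purely illustrative, not part of any formal theory.

```python
# Illustrative utilities (assumptions, not canonical numbers):
# dying in the desert = -1_000_000, paying $100 in town = -100.

def cdt_pays_once_in_town() -> bool:
    # Evaluated from inside town, paying causes -100 and refusing causes 0,
    # so the causal decision theorist refuses to pay.
    return -100 > 0

def outcome(would_pay_in_town: bool) -> int:
    # The driver reads your intentions perfectly and gives you the ride
    # only if you would in fact pay on arrival.
    return -100 if would_pay_in_town else -1_000_000

print(outcome(cdt_pays_once_in_town()))    # -1000000: left in the desert
print(max(outcome(True), outcome(False)))  # -100: the policy you wish you had
```

The agent who evaluates policies, rather than actions considered from inside town, pays and lives; that is the answer a reflectively consistent theory is supposed to deliver without any explicit machinery for precommitment.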

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there’s a problem introduced to me by Gary Drescher’s marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher’s book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb’s Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

“I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads—can I have $1000?”

Obviously, the only reflectively consistent answer in this case is “Yes—here’s the $1000”, because if you’re an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers “Yes” to this sort of question—just like with Newcomb’s Problem or Parfit’s Hitchhiker.

But I don’t have a general theory which replies “Yes”. At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I’m not going to get the million. It seems like I want to decide “as if” I don’t know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads. But I don’t have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning on the probabilities I previously assigned to events whose outcomes I have since learned… Again, it’s easy to talk informally about why you have to reply “Yes” in this case, but that’s not the same as being able to exhibit a general algorithm.
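
For concreteness, here is the informal argument as a minimal sketch, assuming Omega’s prediction simply matches whatever policy you actually hold, and scoring each policy from the epistemic state held before the coin was revealed. It restates the informal reasoning; it is not the missing general algorithm.

```python
# Minimal sketch: score each policy from the *prior* epistemic state
# (before learning the coin), not from the posterior where heads is known.
# Assumes Omega's prediction simply matches the policy you hold.

P_HEADS = 0.5

def prior_expected_value(pays_when_asked: bool) -> float:
    # Heads: Omega asks for $1000, and you either pay it or you don't.
    heads_payoff = -1000 if pays_when_asked else 0
    # Tails: Omega pays $1,000,000 iff it predicted you would pay on heads.
    tails_payoff = 1_000_000 if pays_when_asked else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

print(prior_expected_value(True))   # 499500.0 -> the "Yes" policy
print(prior_expected_value(False))  # 0.0      -> the "No" policy
```

Run from the posterior where heads is already known, the same arithmetic says that paying loses $1000 and gains nothing, which is exactly the conflict between epistemic states that wants a formal treatment.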

Another stumper was presented to me by Robin Hanson at an OBLW meetup. Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. Let’s say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each. But then two of them think, “Hey, this leaves four agents out in the cold. We’ll get together with those four agents and offer to divide half the pie among the four of them, leaving one quarter apiece for the two of us. We get a larger share than one-sixth that way, and they get a larger share than zero, so it’s an improvement from the perspectives of all six of us—they should take the deal.” And those six then form a new coalition and redivide the pie. Then another two of the agents think: “The two of us are getting one-eighth apiece, while four other agents are getting zero—we should form a coalition with them, and by majority vote, give each of us one-sixth.”

And so it goes on: Every majority coalition and division of the pie is dominated by another majority coalition in which each agent of the new majority gets more pie. There does not appear to be any such thing as a dominant majority vote.
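
One way to see that the cycle never bottoms out: in any division, the six agents with the smallest shares hold strictly less than the whole pie between them (the four largest shares account for at least four-tenths of it), so handing the remainder to those six gives each of them strictly more. The sketch below does this with an equal-bonus rule, which is an arbitrary illustrative choice rather than the redistribution in the story.

```python
from fractions import Fraction

N = 10  # ten agents; a division needs a majority of six votes

def dominating_division(shares):
    """Given any division of the pie, construct another division that a
    majority (the six agents with the smallest current shares) strictly
    prefers -- so no division is undominated."""
    order = sorted(range(N), key=lambda i: shares[i])
    coalition = order[:6]                             # a majority
    leftover = 1 - sum(shares[i] for i in coalition)  # strictly positive
    bonus = leftover / 6                              # equal split of the gain
    new = [Fraction(0)] * N
    for i in coalition:
        new[i] = shares[i] + bonus                    # strictly more for each
    return new

# Start from the six-way split in the story and iterate a few rounds.
division = [Fraction(1, 6)] * 6 + [Fraction(0)] * 4
for _ in range(3):
    division = dominating_division(division)
    print([str(s) for s in division])
```

Starting from the story’s initial six-way split, this produces an endless chain of divisions, each preferred by some majority to the one before it, which is the sense in which no dominant majority vote exists.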

(Robin Hanson actually used this to suggest that if you set up a Constitution which governs a society of humans and AIs, the AIs will be unable to conspire among themselves to change the constitution and leave the humans out in the cold, because then the new compact would be dominated by yet other compacts and there would be chaos, and therefore any constitution stays in place forever. Or something along those lines. Needless to say, I do not intend to rely on such, but it would be nice to have a formal theory in hand which shows how ideal reflectively consistent decision agents will act in such cases (so we can prove they’ll shed the old “constitution” like used snakeskin.))

Here’s yet another problem whose proper formulation I’m still not sure of, and it runs as follows. First, consider the Prisoner’s Dilemma. Informally, two timeless decision agents with common knowledge of each other’s timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do. In general, on the True Prisoner’s Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate. And the other agent is reasoning similarly: they want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.

But there’s actually an infinite regress here which is being glossed over—you won’t cooperate just because you predict that they will cooperate; you will only cooperate if you predict they will cooperate if and only if you cooperate. So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate… (...only if they predict that you will cooperate, etcetera).

On the Prisoner’s Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own—in which case (C, C) is preferable to (D, D). But what if you’re facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents’ decisions depend on their own predicted decisions? Is there a general way of resolving the regress?
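
A minimal sketch of the symmetry shortcut, with the usual illustrative Prisoner’s Dilemma payoffs: if the other agent is running this same computation on the same problem, the only outcomes you can actually bring about are the diagonal ones, so you pick the move whose diagonal outcome you prefer.

```python
# Illustrative payoffs to "me" for (my_move, their_move) in a standard PD.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def symmetric_choice() -> str:
    # If the other agent runs this same algorithm on the same problem,
    # the reachable outcomes are only (C, C) and (D, D), so choose the
    # move whose diagonal outcome is better.
    return max(("C", "D"), key=lambda move: PAYOFF[(move, move)])

print(symmetric_choice())  # "C", since (C, C) beats (D, D)
```

The shortcut leans entirely on the symmetry; with many agents making asymmetrical choices there is no diagonal to restrict to, which is exactly the unresolved case.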

On Parfit’s Hitchhiker and Newcomb’s Problem, we’re told how the other agent behaves as a direct function of our own predicted decision—Omega rewards you if you (are predicted to) one-box, the driver in Parfit’s Hitchhiker saves you if you (are predicted to) pay $100 on reaching the city. My timeless decision theory only functions in cases where the other agents’ decisions can be viewed as functions of one argument, that argument being your own choice in that particular case—either by specification (as in Newcomb’s Problem) or by symmetry (as in the Prisoner’s Dilemma). If their decision is allowed to depend on how your decision depends on their decision—like saying, “I’ll cooperate, not ‘if the other agent cooperates’, but only if the other agent cooperates if and only if I cooperate—if I predict the other agent to cooperate unconditionally, then I’ll just defect”—then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry.
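
Here is a minimal sketch of the ‘functions of one argument’ case that is handled, using the standard Newcomb payoffs and assuming a perfect predictor; it shows only the shape of the required input, namely the other agent’s response given as a function of your choice alone.

```python
# Minimal sketch: the other agent's behavior is given directly as a
# function of my (predicted) choice, so I can simply maximize over my
# own options. Payoffs are the usual Newcomb numbers; the predictor is
# assumed to be perfect.

def omega_fills_box_b(my_choice: str) -> bool:
    # Omega puts $1,000,000 in box B iff it predicts I will one-box.
    return my_choice == "one-box"

def payoff(my_choice: str) -> int:
    box_b = 1_000_000 if omega_fills_box_b(my_choice) else 0
    box_a = 1_000
    return box_b if my_choice == "one-box" else box_b + box_a

best = max(["one-box", "two-box"], key=payoff)
print(best, payoff(best))  # one-box 1000000
```

If Omega’s behavior instead depended on how my decision depends on its decision, `omega_fills_box_b` would need my whole decision procedure as its argument, and that is the regress of conditionality described above.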

You perceive that there is a definite note of “timelessness” in all these problems.

Any offered solution may assume that a timeless decision theory for direct cases already exists—that is, if you can reduce the problem to one of “I can predict that if (the other agent predicts) I choose strategy X, then the other agent will implement strategy Y, and my expected payoff is Z”, then I already have a reflectively consistent solution which this margin is unfortunately too small to contain.

(In case you’re wondering, I’m writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always “No”, but this is actually the one exception that I can think of. (Anyone actually taking a shot at this should probably familiarize themselves with the existing literature on Newcomblike problems—the edited volume “Paradoxes of Rationality and Cooperation” should be a sufficient start (and I believe there’s a copy at the SIAI Summer Project house.)))