Exploiting EDT

The problem with EDT is, as David Lewis put it, its “irrational policy of managing the news” (Lewis, 1981): it chooses actions not only because of their effects on the world, but also because of what the fact that it’s taking these actions tells it about events the agent can’t affect at all. The canonical example is the smoking lesion problem.

I’ve long been uncomfortable with the smoking lesion problem as the case against EDT: an AI system would know its own utility function, and would therefore know whether or not it values “smoking” (presumably, in the AI case, some different goal); if it updates on this fact, it behaves correctly in the smoking lesion problem. (This is an AI-centric version of the “tickle defense” of EDT.) Nate and I have come up with a variant I find much more convincing: a way to get EDT agents to pay you for managing the news for them, which works by the same mechanism that makes these agents one-box in Newcomb’s problem. (It’s a variation of the thought experiment in my LessWrong post on “the sin of updating when you can change whether you exist”.)

Suppose that there’s this EDT agent around which plays the stock market. It’s pretty good at doing so, and has amassed a substantial net worth, but, unsurprisingly, it’s not perfect; there’s always a small chance of its investments going south. Currently, for example, there’s a bit of a hubbub around the CEO of one of the companies the agent has been investing in, and the agent assigns a 0.4% chance that there’s a scandal about to be revealed which will force that CEO to resign; if that’s the case, it expects to lose $150,000,000.
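For concreteness, the agent’s expected loss from the scandal scenario — which also bounds what straightforwardly learning the truth could be worth to it — comes out small relative to the sums in what follows (a back-of-the-envelope sketch using the numbers above):

```python
from fractions import Fraction

p_scandal = Fraction(4, 1000)   # the agent's 0.4% credence in a scandal
loss = 150_000_000              # market loss if the scandal breaks

# Knowing about the scandal in advance can at best avert that loss, so the
# expected loss bounds what the information is worth to the agent.
value_of_info_bound = p_scandal * loss
print(value_of_info_bound)  # 600000
```

So the information itself is worth at most about $600,000 to the agent — far less than what the researcher in the next paragraph extracts.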

Along comes a clever AI researcher, who is known to be extremely good at predicting how different kinds of AI systems will react in simple thought experiments, who has access to the source code of our agent, and who happens to be able to access information about whether or not there’s a scandal. The researcher could find out and sell the information to the agent, but since it’s a low-probability scenario, the value of that information wouldn’t be that large. Instead, the researcher has a better idea: they are going to

  1. find out whether or not there is a scandal;

  2. figure out whether or not the agent will pay them $100,000,000 if they ask it to in the next step;

  3. if either (a) there is no scandal and the researcher predicts that the agent will pay up, or (b) there is a scandal and they predict that the agent won’t pay up, then they will send the agent a pre-drafted e-mail that explains this whole procedure, explains that either case (a) or (b) turned out to be true (but not which one), and asks the agent to immediately pay the researcher $100,000,000.

We assume that the agent has sufficient information about the researcher to know that they are trustworthy and a good predictor of the agent’s behavior. Then, as an EDT agent, it’ll calculate expected utility by conditioning on its observations and on the action it’s considering taking, as follows:

  • Conditioning on receiving the e-mail and paying up, the AI is in case (a): it’s paying $100,000,000, but there is no scandal, so it’s not losing the $150,000,000.

  • Conditioning on receiving the e-mail and not paying up, the AI is in case (b): it’s not paying the researcher, but there’s a scandal, and it’s going to lose $150,000,000.

The first of these leads to higher conditional expected utility, so the AI pays up. The researcher decides to use the money to fund investigations into better decision theories.
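The two conditional expected utilities above can be checked with a small calculation. (The numbers are the ones from the setup; treating the researcher as a perfect predictor of the agent’s disposition, so that conditioning on the action determines which case obtains, is the simplifying assumption the thought experiment relies on.)

```python
from fractions import Fraction

P_SCANDAL = Fraction(4, 1000)   # 0.4% credence in a scandal
LOSS = 150_000_000              # market loss if the scandal breaks
DEMAND = 100_000_000            # researcher's demand

def email_sent(scandal, pays):
    # The researcher e-mails iff (a) no scandal and the agent pays,
    # or (b) scandal and the agent refuses.
    return (not scandal and pays) or (scandal and not pays)

def conditional_eu(pays):
    # EDT: expected utility conditional on receiving the e-mail and
    # taking the action `pays`, summing over consistent worlds.
    total_p = Fraction(0)
    total_u = Fraction(0)
    for scandal, p in [(True, P_SCANDAL), (False, 1 - P_SCANDAL)]:
        if email_sent(scandal, pays):
            u = -(DEMAND if pays else 0) - (LOSS if scandal else 0)
            total_p += p
            total_u += p * u
    return total_u / total_p

print(conditional_eu(True))   # -100000000: case (a), pay but no scandal
print(conditional_eu(False))  # -150000000: case (b), refuse and lose more
```

Conditioning on the e-mail, paying screens the agent into the no-scandal world, so the EDT comparison favors handing over $100,000,000 to avoid “news” of a $150,000,000 loss — even though the action has no effect on whether the scandal exists.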