Troll Bridge

All of the results in this post, and most of the informal observations/interpretations, are due to Sam Eisenstat.

Troll Bridge is a decision problem which has been floating around for a while, but which has lacked a good introductory post. The original post gives the essential example, but it lacks the “troll bridge” story, which (1) makes it hard to understand, since it is just stated in mathematical abstraction, and (2) makes it difficult to find if you search for “troll bridge”.

The basic idea is that you want to cross a bridge. However, there is a troll who will blow up the bridge with you on it, if (and only if) you cross it “for a dumb reason”—for example, due to unsound logic. You can get to where you want to go by a worse path (through the stream). This path is better than being blown up, though.

We apply a Löbian proof to show not only that you choose not to cross, but furthermore, that your counterfactual reasoning is confident that the bridge would have blown up if you had crossed. This is supposed to be a counterexample to various proposed notions of counterfactual, and to various proposed decision theories.

The pseudocode for the environment (more specifically, the utility gained from the environment) is as follows:
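The code block itself appears to have been lost in extraction; here is a minimal Python reconstruction of the environment as described in the next paragraph, with a boolean flag standing in for □⊥ (“PA proves an inconsistency”):

```python
def U(cross, pa_proves_inconsistency):
    """Utility from the environment (reconstruction of the missing pseudocode).

    cross: whether the agent crosses the bridge.
    pa_proves_inconsistency: stands in for the proposition □⊥,
    i.e. "PA proves an inconsistency".
    """
    if cross:
        # The troll blows up the bridge iff the agent is inconsistent.
        return -10 if pa_proves_inconsistency else 10
    return 0  # Taking the worse path through the stream.
```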

IE, if the agent crosses the bridge and is inconsistent, then U=-10. (□⊥ means “PA proves an inconsistency”.) Otherwise, if the agent crosses the bridge, U=+10. If neither of these (IE, the agent does not cross the bridge), U=0.

The pseudocode for the agent could be as follows:
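The agent’s code block also seems to have been dropped; a pseudocode sketch of what the next paragraph describes (proof search over “action implies utility” statements, with a fixed tie-breaking order):

```
A():
    best_action, best_utility = default, -infinity
    for action in [cross, not cross]:
        for utility in [-10, 0, +10]:
            if PA proves "A() = action  →  U() = utility":
                if utility > best_utility:
                    best_action, best_utility = action, utility
    return best_action
```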

This is a little more complicated, but the idea is supposed to be that you search for every “action implies utility” pair, and take the action for which you can prove the highest utility (with some tie-breaking procedure). Importantly, this is the kind of proof-based decision theory which eliminates spurious counterfactuals in 5-and-10 type problems. It isn’t that easy to trip up with Löbian proofs. (Historical/terminological note: This decision theory was initially called MUDT, and is still sometimes referred to in that way. However, I now often call it proof-based decision theory, because it isn’t centrally a UDT. “Modal DT” (MDT) would be reasonable, but the modal operator involved is the “provability” operator, so “proof-based DT” seems more direct.)

Now, the proof:

  • Reasoning within PA (ie, the logic of the agent):

    • Suppose the agent crosses.

      • Further suppose that the agent proves that crossing implies U=-10.

        • Examining the source code of the agent, because we’re assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.

        • So, either way, PA is inconsistent—by way of 0=-10 or +10=-10.

        • So the troll actually blows up the bridge, and really, U=-10.

      • Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.

      • By Löb’s theorem, crossing really implies U=-10.

      • So (since we’re still under the assumption that the agent crosses), U=-10.

    • So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.

  • Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
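In symbols, writing C for “the agent crosses” and □ for PA-provability, the derivation above establishes exactly the premise that Löb’s theorem needs:

```latex
% The bulleted proof establishes, within PA:
\vdash \; \Box\big(C \to U{=}{-}10\big) \;\to\; \big(C \to U{=}{-}10\big)
% Löb's theorem (from  PA ⊢ □P → P  conclude  PA ⊢ P),
% applied with  P := (C → U=-10),  then gives:
\vdash \; C \to U{=}{-}10
```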

The paradoxical aspect of this example is not that the agent doesn’t cross—it makes sense that a proof-based agent can’t cross a bridge whose safety is dependent on the agent’s own logic being consistent, since proof-based agents can’t know whether their logic is consistent. Rather, the point is that the agent’s “counterfactual” reasoning looks crazy. (However, keep reading for a version of the argument where it does make the agent take the wrong action.) Arguably, the agent should be uncertain of what happens if it crosses the bridge, rather than certain that the bridge would blow up. Furthermore, the agent is reasoning as if it can control whether PA is consistent, which is arguably wrong.

In a comment, Stuart points out that this reasoning seems highly dependent on the code of the agent; the “else” clause could be different, and the argument falls apart. I think the argument keeps its force:

  • On the one hand, it’s still very concerning if the sensibility of the agent depends greatly on which action it performs in the “else” case.

  • On the other hand, we can modify the troll’s behavior to match the modified agent. The general rule is that the troll blows up the bridge if the agent would cross for a “dumb reason”—the agent then concludes that the bridge would be blown up if it crossed. I can no longer complain that the agent reasons as if it were controlling the consistency of PA, but I can still complain that the agent thinks an action is bad because that action indicates its own insanity, due to a troublingly circular argument.

Analogy to Smoking Lesion

One interpretation of this thought experiment is that it shows proof-based decision theory to be essentially a version of EDT, in that it has EDT-like behavior in Smoking Lesion. The analogy to Smoking Lesion is relatively strong:

  • An agent is at risk of having a significant internal issue. (In Smoking Lesion, it’s a medical issue. In Troll Bridge, it is logical inconsistency.)

  • The internal issue would bias the agent toward a particular action. (In Smoking Lesion, the agent smokes. In Troll Bridge, an inconsistent agent crosses the bridge.)

  • The internal issue also causes some imagined practical problem for the agent. (In Smoking Lesion, the lesion makes one more likely to get cancer. In Troll Bridge, the inconsistency would make the troll blow up the bridge.)

  • There is a chain of reasoning which combines these facts to stop the agent from taking the action. (In Smoking Lesion, EDT refuses to smoke due to the correlation with cancer. In Troll Bridge, the proof-based agent refuses to cross the bridge because of a Löbian proof that crossing the bridge leads to disaster.)

  • We intuitively find the conclusion nonsensical. (It seems the EDT agent should smoke; it seems the proof-based agent should not expect the bridge to explode.)

Indeed, the analogy to Smoking Lesion seems to strengthen the final point—that the counterfactual reasoning is wrong.

Probabilistic Version

For the purely logical version, I said that we can’t fault the conclusion (because the agent can’t prove that it is safe to cross the bridge) – only the reasoning is being critiqued. We might hope that an agent who uses some uncertainty, rather than reasoning with logic alone, would decide to cross the bridge.

More specifically, suppose that we make the risk of crossing extremely small: rather than a −10 payoff for the case where the bridge is blown up, we get a very small negative payoff, -epsilon, which is barely worse than refusing to cross. This should make crossing the bridge worth the risk; very little to lose, and everything to gain.

Unfortunately, this is not how things work out.

Suppose that the agent uses a probability distribution which “respects logic” in the sense that it assigns probability zero to anything which is logically refutable. (This assumes logical omniscience. I’ll drop that assumption later.)

The agent is as follows: if P(cross)=0, cross; if P(¬cross)=0, don’t cross; else, take the action with highest expected utility, breaking ties by not crossing. The idea is that we are trying to take the highest-expected-utility action; however, we must handle cases where this isn’t well-defined. Choosing to take an action when the probability of the action is zero guarantees that the agent can never prove its own action; it is a version of the chicken rule. Breaking ties by not crossing is arbitrary, but gives the simplest version of the example, because this choice reduces the number of cases we need to check.
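The decision rule just described can be sketched in Python. This is a toy model only: it assumes the agent’s probability of crossing and its two expected utilities are simply handed to us as numbers, whereas in the actual setup they come from the agent’s own beliefs.

```python
def choose(p_cross, eu_cross, eu_dont):
    """Toy sketch of the probabilistic agent's decision rule.

    p_cross: the agent's probability that it crosses, P(cross)
             (so P(not cross) = 1 - p_cross).
    eu_cross, eu_dont: expected utilities of crossing / not crossing.
    """
    if p_cross == 0:
        return "cross"        # chicken rule: take the "impossible" action
    if p_cross == 1:          # i.e. P(not cross) = 0
        return "don't cross"
    # Otherwise maximize expected utility, breaking ties by not crossing.
    return "cross" if eu_cross > eu_dont else "don't cross"
```

Note that ties fall through to “don’t cross”, matching the arbitrary tie-breaking choice above.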

The troll is as follows: if the agent crosses the bridge due to the P(cross)=0 clause, blow up the bridge.

Reasoning in the logic of the agent:

  • Suppose □(A=cross → U=-10).

    • Suppose A=cross.

      • Examining the agent, either crossing had higher expected utility, or P(cross)=0. But we assumed □(A=cross → U=-10), and the agent is certain of things which are provable, so crossing could not have had the higher expected utility (it would have been −10, worse than the 0 for not crossing). So it must be the latter: P(cross)=0, and the agent crossed via its chicken-rule clause. So the bridge gets blown up.

    • Therefore, A=cross → U=-10.

  • Therefore, □(A=cross → U=-10) → (A=cross → U=-10).

Thus, by Löb’s theorem, we see that A=cross → U=-10.

Therefore, since the agent is certain of things which are provable, the agent assigns expected value −10 to crossing. Since the expected value of the other action will always be higher than this, and since we know the agent doesn’t ever really enter into the P=0 conditions, the agent will choose not to cross.

Notice that this reasoning did not depend much on the values 10, 0, and −10. The utility of the bridge getting blown up could be −0.000001, and the agent still won’t cross. It isn’t weighing the risk; it’s decided that the worst outcome is inevitable. In the case of proof-based agents, I said that the overall decision not to cross was understandable, since proof-based agents are unable to weigh the risks. A probabilistic agent, however, should intuitively be able to say “I don’t know whether the bridge will get blown up, because it involves reasoning about properties of myself which I’m fundamentally uncertain about; but, the odds look decent.” But that’s not what happens: instead, it is sure that crossing is unfavorable, no matter what overall probability it assigns to P(A=cross)=0.

So, in this case we conclude that the Troll Bridge example results in a chicken-rule-based agent taking the wrong action overall. The agent shouldn’t be sure that it would cross “for the right reason” (it should assign some probability to P(A=cross)=0, since it can’t know that its own logic is consistent). However, intuitively, it should be able to assign some probability to this, and balance the risks. If the downside risk is U=-0.000001, and the probability it assigns to its logic being consistent is not similarly small, it should cross—and in doing so, it would get +10.

As mentioned for the proof-based agent, the agent’s code is a bit arbitrary, and it is worth asking how important the details were. In particular, the default in the case of a tie was to not cross. What if the default in case of a tie were to cross?

We then modify the troll’s algorithm to blow up the bridge if and only if P(A=cross)=0 or there is a tie. The proof then goes through in the same way.

Perhaps you think that the problem with the above version is that I assumed logical omniscience. It is unrealistic to suppose that agents have beliefs which perfectly respect logic. (Un)Fortunately, the argument doesn’t really depend on this; it only requires that the agent respects proofs which it can see, and eventually sees the Löbian proof referenced. We can analyze this using logical inductors, with a version of LIDT which has a chicken rule. (You can use LIDT which plays epsilon-chicken, taking any action with probability less than epsilon; or, you can consider a version which takes actions which the deductive state directly disproves. Either case works.) We consider LIDT on a sequence of troll-bridge problems, and show that it eventually notices the Löbian proof and stops crossing. This is even more frustrating than the previous example, because the agent can cross for a long time, apparently learning that crossing is safe and reliably gets a +10 payoff. Then, one day, it suddenly sees the Löbian proof and stops crossing the bridge!

I leave that analysis as an exercise for the reader.


Conclusions

All of the examples have depended on a version of the chicken rule. This leaves us with a fascinating catch-22:

  • We need the chicken rule to avoid spurious proofs. As a reminder: spurious proofs are cases where an agent would reject an action if it could prove that it would not take that action. These actions can then be rejected by an application of Löb’s theorem. The chicken rule avoids this problem by ensuring that agents cannot know their own actions, since if they did then they’d take a different action from the one they know they’ll take (and they know this, conditional on their logic being consistent).

  • However, Troll Bridge shows that the chicken rule can lead to another kind of problematic Löbian proof.
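To make the first horn concrete, here is a sketch of the spurious-proof pattern the chicken rule is meant to block, for a hypothetical proof-based agent with no chicken rule. It mirrors the main proof above, but here the agent’s own sensitivity to proofs does the work the troll did there:

```latex
\begin{array}{l}
\text{Reasoning within PA: suppose } \Box(A{=}\text{cross} \to U{=}{-}10). \\
\quad \text{Seeing this proof, the agent rejects crossing, so } A \neq \text{cross}. \\
\quad \text{Hence } A{=}\text{cross} \to U{=}{-}10 \text{ holds vacuously.} \\
\text{So } \vdash \Box(A{=}\text{cross} \to U{=}{-}10) \to (A{=}\text{cross} \to U{=}{-}10), \\
\text{and L\"ob's theorem yields } \vdash A{=}\text{cross} \to U{=}{-}10 \\
\text{: a spurious proof that blocks crossing even when crossing is safe.}
\end{array}
```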

So, we might take Troll Bridge to show that the chicken rule does not achieve its goal, and therefore reject the chicken rule. However, this conclusion is very severe. We cannot simply drop the chicken rule and open the gates to the (much more common!) spurious proofs. We would need an altogether different way of rejecting the spurious proofs; perhaps a full account of logical counterfactuals.

Furthermore, it is possible to come up with variants of Troll Bridge which counter some such proposals. In particular, Troll Bridge was originally invented to counter proof-length counterfactuals (which essentially generalize chicken rules, and therefore lead to the same Troll Bridge problems).

Another possible conclusion could be that Troll Bridge is simply too hard, and we need to accept that agents will be vulnerable to this kind of reasoning.