Troll Bridge

All of the results in this post, and most of the informal observations/interpretations, are due to Sam Eisenstat. I think the Troll Bridge story, as a way to make the decision problem understandable, is due to Tsvi; but I’m not sure.

Pure Logic Version

Troll Bridge is a decision problem which has been floating around for a while, but which has lacked a good introductory post. The original post gives the essential example, but it lacks the “troll bridge” story. This (1) makes it hard to understand, since it is just stated in mathematical abstraction, and (2) makes it difficult to find if you search for “troll bridge”.

The basic idea is that you want to cross a bridge. However, there is a troll who will blow up the bridge with you on it, if (and only if) you cross it “for a dumb reason” (for example, due to unsound logic). You can get to where you want to go by a worse path (through the stream). This path is better than being blown up, though.

We apply a Löbian proof to show not only that you choose not to cross, but furthermore, that your counterfactual reasoning is confident that the bridge would have blown up if you had crossed. This is supposed to be a counterexample to various proposed notions of counterfactual, and to various proposed decision theories.

The environment (more specifically, the utility gained from the environment) works as follows.

If the agent crosses the bridge and □⊥ holds, then U=-10. (□⊥ means “PA proves an inconsistency”, i.e., the agent’s logic is inconsistent.) Otherwise, if the agent crosses the bridge, U=+10. If neither of these holds (IE, the agent does not cross the bridge), U=0.
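Here is a minimal Python sketch of this utility function. Provability is not computable, so the sketch takes the truth value of □⊥ as an explicit boolean argument. `pa_proves_false` and `troll_bridge_utility` are hypothetical names. Everything else follows the three cases just described.

```python
CROSS, STAY = "cross", "stay"


def troll_bridge_utility(action: str, pa_proves_false: bool) -> int:
    """Utility for the pure-logic Troll Bridge environment.

    pa_proves_false stands in for □⊥ ("PA proves an inconsistency"); it is
    passed in because no program can decide it in general.
    """
    if action == CROSS:
        # The troll blows up the bridge exactly when □⊥ holds.
        return -10 if pa_proves_false else 10
    return 0  # not crossing: take the stream, U = 0
```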

The agent could work as follows.

The idea is that you search for every “action implies utility” pair, and take the action for which you can prove the highest utility (with some tie-breaking procedure, and some default action in the “else” case where no such proof is found). Importantly, this is the kind of proof-based decision theory which eliminates spurious counterfactuals in 5-and-10 type problems. It isn’t that easy to trip up with Löbian proofs. (Historical/terminological note: This decision theory was initially called MUDT, and is still sometimes referred to in that way. However, I now often call it proof-based decision theory, because it isn’t centrally a UDT. “Modal DT” (MDT) would be reasonable, but the modal operator involved is the “provability” operator, so “proof-based DT” seems more direct.)
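Here is a minimal sketch of that decision rule. It assumes a hypothetical provability oracle, represented as a set `proved` of (action, utility) pairs for which PA proves “A=a → U=u”. Real proof search is where all the difficulty lives, so the sketch only shows how the agent acts once the proofs are in hand. Two details are assumptions of this sketch, not the post’s exact code:

- the tie-breaking order (not crossing is tried first);
- the default action in the “else” case.

```python
from typing import Set, Tuple

CROSS, STAY = "cross", "stay"
UTILITIES = (10, 0, -10)  # the payoffs in this problem, best first


def proof_based_agent(proved: Set[Tuple[str, int]],
                      actions: Tuple[str, ...] = (STAY, CROSS),
                      default: str = STAY) -> str:
    """Take the action with the highest provable utility.

    `proved` is a stand-in for a provability oracle: (a, u) is in it iff
    PA proves "A = a -> U = u". Utilities are tried from best to worst, and
    within one utility level the order of `actions` breaks ties.
    """
    for u in UTILITIES:
        for a in actions:
            if (a, u) in proved:
                return a
    return default  # the "else" clause: no such proof was found
```

For example, given `{(CROSS, -10), (STAY, 0)}` the rule returns `stay`. Those are the provable facts the argument below ends up with.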

Now, the proof:

  • Reasoning within PA (ie, the logic of the agent):

    • Suppose the agent crosses.

      • Further suppose that the agent proves that crossing implies U=-10.

        • Examining the source code of the agent, because we’re assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.

        • So, either way, PA is inconsistent, by way of 0=-10 or +10=-10.

        • So the troll actually blows up the bridge, and really, U=-10.

      • Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.

      • By Löb’s theorem, crossing really implies U=-10.

      • So (since we’re still under the assumption that the agent crosses), U=-10.

    • So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.

  • Since we proved all of this in PA, the agent proves it, and proves no better utility in addition (unless PA is truly inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
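Each use of Löb’s theorem in this post has the same shape. Below is the standard statement for PA, with □ as the provability predicate (the same □ as in □⊥), instantiated with the sentence from the argument above. It only writes out the standard theorem and adds no extra step to the proof.

```latex
% Löb's theorem for PA, writing \Box P for "PA proves P":
\text{if } \mathrm{PA} \vdash (\Box P \rightarrow P), \text{ then } \mathrm{PA} \vdash P.

% In the argument above:
P \;:=\; (A = \mathrm{cross} \rightarrow U = -10)
% "if the agent proves that crossing implies U=-10, then in fact crossing
% implies U=-10" is the hypothesis \Box P \rightarrow P, so PA \vdash P.
```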

The paradoxical aspect of this example is not that the agent doesn’t cross; it makes sense that a proof-based agent can’t cross a bridge whose safety is dependent on the agent’s own logic being consistent, since proof-based agents can’t know whether their logic is consistent. Rather, the point is that the agent’s “counterfactual” reasoning looks crazy. (However, keep reading for a version of the argument where it does make the agent take the wrong action.) Arguably, the agent should be uncertain of what happens if it crosses the bridge, rather than certain that the bridge would blow up. Furthermore, the agent is reasoning as if it can control whether PA is consistent, which is arguably wrong.

In a comment, Stuart points out that this reasoning seems highly dependent on the code of the agent; the “else” clause could be different, and the argument falls apart. I think the argument keeps its force:

  • On the one hand, it’s still very concerning if the sensibility of the agent depends greatly on which action it performs in the “else” case.

  • On the other hand, we can modify the troll’s behavior to match the modified agent. The general rule is that the troll blows up the bridge if the agent would cross for a “dumb reason”; the agent then concludes that the bridge would be blown up if it crossed. I can no longer complain that the agent reasons as if it were controlling the consistency of PA, but I can still complain that the agent thinks an action is bad because that action indicates its own insanity, due to a troublingly circular argument.

Analogy to Smoking Lesion

One interpretation of this thought-experiment is that it shows proof-based decision theory to be essentially a version of EDT, in that it has EDT-like behavior for Smoking Lesion. The analogy to Smoking Lesion is relatively strong:

  • An agent is at risk of having a significant internal issue. (In Smoking Lesion, it’s a medical issue. In Troll Bridge, it is logical inconsistency.)

  • The internal issue would bias the agent toward a particular action. (In Smoking Lesion, the agent smokes. In Troll Bridge, an inconsistent agent crosses the bridge.)

  • The internal issue also causes some imagined practical problem for the agent. (In Smoking Lesion, the lesion makes one more likely to get cancer. In Troll Bridge, the inconsistency would make the troll blow up the bridge.)

  • There is a chain of reasoning which combines these facts to stop the agent from taking the action. (In Smoking Lesion, EDT refuses to smoke due to the correlation with cancer. In Troll Bridge, the proof-based agent refuses to cross the bridge because of a Löbian proof that crossing the bridge leads to disaster.)

  • We intuitively find the conclusion nonsensical. (It seems the EDT agent should smoke; it seems the proof-based agent should not expect the bridge to explode.)

Indeed, the analogy to Smoking Lesion seems to strengthen the final point: that the counterfactual reasoning is wrong.

I’ve come to think of Troll Bridge as “the real Smoking Lesion”, since I’m generally not satisfied with how Smoking Lesion is set up.

But is proof-based decision theory really a version of EDT? I think there’s more to say about the analogy, but a simple explanation is this: both EDT and proof-based decision theory evaluate actions by adding them to the knowledge base and seeing what the world looks like under that additional assumption. Or, to put it differently, proof-based DT reasons about actions as if they’re observations. That’s the fundamental idea of evidential decision theory.
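To make “reasons about actions as if they’re observations” concrete, here is a minimal sketch of the EDT rule. It conditions a joint distribution on the action and takes the action with the best conditional expected utility. The Smoking Lesion numbers below are made up for illustration:

- a 50% lesion rate;
- lesion carriers smoke 90% of the time;
- smoking is worth +1 and cancer −100.

```python
from typing import List, Tuple

World = Tuple[float, str, float]  # (probability, action taken, utility)

# Hypothetical Smoking Lesion numbers: P(lesion) = 0.5; carriers smoke with
# probability 0.9 and get cancer; non-carriers smoke with probability 0.1
# and stay healthy. Utility = +1 for smoking, -100 for cancer.
WORLDS: List[World] = [
    (0.45, "smoke", 1 - 100),   # lesion, smokes, cancer
    (0.05, "abstain", -100),    # lesion, abstains, cancer
    (0.05, "smoke", 1),         # no lesion, smokes
    (0.45, "abstain", 0),       # no lesion, abstains
]


def conditional_eu(worlds: List[World], action: str) -> float:
    """E[U | A = action]: treat the action as an observation and condition."""
    mass = sum(p for p, a, _ in worlds if a == action)
    if mass == 0:
        raise ValueError("cannot condition on a probability-zero action")
    return sum(p * u for p, a, u in worlds if a == action) / mass


def edt_choice(worlds: List[World], actions=("smoke", "abstain")) -> str:
    """Pick the action with the highest conditional expected utility."""
    return max(actions, key=lambda a: conditional_eu(worlds, a))
```

In this toy model the lesion, not smoking, causes cancer. Conditioning still makes smoking look worse (−89 versus −10), so `edt_choice(WORLDS)` returns `abstain`. The `ValueError` branch is the probability-zero case that the chicken rule below is designed to avoid.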

Probabilistic Version

For the purely logical version, I said that we can’t fault the conclusion (because the agent can’t prove that it is safe to cross the bridge); only the reasoning is being critiqued. However, the probabilistic version better demonstrates the severity of the reasoning error; it illustrates an EDT agent making what seems to be an outright mistake.

It’s reasonable to avoid crossing the bridge if you’re only thinking about provable results. But we might expect that an agent who balances risks and rewards would decide to cross, so long as it believes the probability of PA being inconsistent is low. This turns out not to be the case.

Here’s the setup.

Suppose that the agent uses a probability distribution which “respects logic” in the sense that it assigns probability zero to anything which is logically refutable. (This assumes logical omniscience. I’ll drop that assumption later.)

The agent is as follows: if P(cross)=0, cross; if P(¬cross)=0, don’t cross; else, take the action with highest expected utility, breaking ties by not crossing. The idea is that we are trying to take the highest-expected-utility action; however, we must handle cases where this isn’t well-defined. Choosing to take an action when the probability of the action is zero guarantees that the agent can never prove its own action; it is a version of the chicken rule. Breaking ties by not crossing is arbitrary, but gives the simplest version of the example, because this choice reduces the number of cases we need to check.

The troll is as follows: if the agent crosses the bridge due to the P(cross)=0 clause, blow up the bridge.
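Here is a minimal sketch of this agent and troll. It assumes the agent’s beliefs arrive as plain numbers (`p_cross` and the two conditional expected utilities are hypothetical inputs), and it leaves out where those beliefs come from and how they come to respect proofs. The agent also reports which clause fired, because that is what the troll looks at.

```python
from typing import Tuple

CROSS, STAY = "cross", "stay"


def chicken_agent(p_cross: float, eu_cross: float,
                  eu_stay: float) -> Tuple[str, str]:
    """Return (action, clause) for the probabilistic agent described above."""
    if p_cross == 0.0:
        return CROSS, "chicken"        # P(cross) = 0: cross
    if 1.0 - p_cross == 0.0:
        return STAY, "chicken"         # P(not cross) = 0: don't cross
    if eu_cross > eu_stay:
        return CROSS, "max-eu"
    return STAY, "max-eu"              # ties go to not crossing


def troll_payoff(action: str, clause: str) -> int:
    """Blow up the bridge iff the agent crossed via the P(cross) = 0 clause."""
    if action != CROSS:
        return 0
    return -10 if clause == "chicken" else 10
```

Once the agent’s beliefs put E[U | cross] at −10 and E[U | ¬cross] at 0, `chicken_agent(p, -10.0, 0.0)` returns `stay` for any 0 < p < 1.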

Reasoning in the logic of the agent:

  • Suppose □(A=cross → U=-10).

    • Suppose A=cross.

      • Examining the agent, either crossing had higher expected utility, or P(cross)=0. But we assumed □(A=cross → U=-10), so it must be the latter. So the bridge gets blown up.

    • Therefore, A=cross → U=-10.

  • Therefore, □(A=cross → U=-10) → (A=cross → U=-10).

Thus, by Löb’s theorem, we see that □(A=cross → U=-10).

Therefore, since the agent is certain of things which are provable, the agent assigns expected value −10 to crossing. Since the expected value of the other action will always be higher than this, and since we know the agent doesn’t ever really enter into the P=0 conditions, the agent will choose not to cross.

Notice that this reasoning did not depend much on the values 10, 0, and −10. The utility of the bridge getting blown up could be −0.000001, and the agent still won’t cross. It isn’t weighing the risk; it’s decided that the worst outcome is inevitable. In the case of proof-based agents, I said that the overall decision not to cross was understandable, since proof-based agents are unable to weigh the risks. A probabilistic agent, however, should intuitively be able to say “I don’t know whether the bridge will get blown up, because it involves reasoning about properties of myself which I’m fundamentally uncertain about; but, the odds look decent.” But that’s not what happens: instead, it is sure that crossing is unfavorable, no matter what overall probability it assigns to P(A=cross)=0.

So, in this case we conclude that the Troll Bridge example results in a chicken-rule-based agent taking the wrong action overall. The agent shouldn’t be sure that it would cross “for the right reason” (it should assign some probability to P(A=cross)=0, since it can’t know that its own logic is consistent). However, intuitively, it should be able to assign some probability to this, and balance the risks. If the downside risk is U=-0.000001, and the probability it assigns to its logic being consistent is not similarly small, it should cross; and in doing so, it would get +10.
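To make the intended risk-balancing concrete, here is a minimal sketch of the calculation a probabilistic agent should intuitively be doing. It contrasts that calculation with the value the Löbian argument forces. The sketch makes one simplifying assumption: the bridge blows up with probability `p_inconsistent`, the agent’s credence that its own logic is inconsistent. The function names are hypothetical.

```python
STAY_VALUE = 0.0  # not crossing is provably worth U = 0


def naive_cross_value(p_inconsistent: float, reward: float = 10.0,
                      blowup: float = -0.000001) -> float:
    """Expected utility of crossing if the agent simply weighed the risk.

    Hypothetical simplification: the troll fires with probability
    p_inconsistent, and otherwise crossing pays `reward`.
    """
    return (1.0 - p_inconsistent) * reward + p_inconsistent * blowup


def lobian_cross_value(blowup: float = -0.000001) -> float:
    """What the Löbian argument forces instead: the agent's beliefs respect
    the proof of (cross -> U = blowup), so E[U | cross] is exactly `blowup`,
    whatever its credence in its own inconsistency."""
    return blowup


for p in (0.001, 0.1, 0.5):
    print(f"p_inconsistent={p}: naive EU(cross)={naive_cross_value(p):.4f}, "
          f"Löbian EU(cross)={lobian_cross_value()}, EU(stay)={STAY_VALUE}")
```

For any credence in inconsistency that is not close to 1, the naive value of crossing beats staying. The Löbian value never does, however small the downside is made.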

As mentioned for the proof-based agent, the agent’s code is a bit arbitrary, and it is worth asking how important the details were. In particular, the default in the case of a tie was to not cross. What if the default in case of a tie were to cross?

We then modify the troll’s algorithm to blow up the bridge if and only if P(A=cross)=0 or there is a tie. The proof then goes through in the same way.

Perhaps you think that the problem with the above version is that I assumed logical omniscience. It is unrealistic to suppose that agents have beliefs which perfectly respect logic. (Un)Fortunately, the argument doesn’t really depend on this; it only requires that the agent respects proofs which it can see, and eventually sees the Löbian proof referenced.

Random Exploration

The frustrating thing about Troll Bridge is that it seems like the agent could just cross the bridge, and things would be fine. The proof that things wouldn’t be fine relies on the fact that the agent accepts that very proof as sufficient reason; so can’t we just ignore that kind of proof somehow?

One thing you might try is to consider a learning agent, and force random exploration so the agent just crosses the bridge sometimes. If the agent crosses the bridge, it should be able to see that it’s safe, right?

However, we have to ask: what’s the appropriate version of Troll Bridge for the exploring agent? Remember I said that the basic idea of Troll Bridge is that the troll blows up the bridge if the agent crosses “for a dumb reason”, ie, for a reason other than “crossing is a good idea”. Random exploration falls into this category.

Furthermore, random exploration is really closely analogous to the chicken rule. Consider:

  • (A) The first chicken rule I mentioned: if you prove that you don’t take an action, you immediately take it.

  • (B) The second chicken rule mentioned: if P(A)=0, then do A.

  • (C) Epsilon-chicken: if P(A)<epsilon, then do A.

  • (D) Epsilon-exploration: randomly do A with probability epsilon.

(A) is very similar to (B) if the probability distribution respects proofs in the way I mentioned previously; if the probability distribution has the further property of assigning positive probability to anything not disproven, then they’re exactly the same. (C) is a natural generalization of (B). (B) tells us what to do when the EDT decision rule isn’t well-defined (we can’t condition on probability zero events). (C) generalizes this by treating small-probability events in the same way (we might expect our conditional beliefs to be bad for small-probability events, since we presumably don’t have much experience with them; so we treat them as if they were the undefined case). (B) ensures there are no zero-probability actions, which means we don’t actually ever have to deal with the EDT decision rule being ill-defined; similarly, (C) ensures that no action is taken only vanishingly rarely, which helps ensure that our conditional expectations for all actions are empirically grounded.

(C) and (D) are very similar; (C) just supplies pseudorandomness via the agent’s own probabilities. Whenever the agent is too confident about what it does, it does something else. This makes the randomization unpredictable to the agent. (D) supplies randomness from an external source instead.
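To see how close (C) and (D) are, here is a minimal sketch of both as pre-decision hooks. Each hook either forces an action or returns `None`, in which case the agent falls through to its ordinary expected-utility choice. The function names are hypothetical. Feeding the agent’s own probabilities into `epsilon_chicken` is where the pseudorandomness of (C) comes from; the sketch does not model how those probabilities are computed.

```python
import random
from typing import Dict, List, Optional

Action = str


def epsilon_chicken(p_action: Dict[Action, float],
                    epsilon: float) -> Optional[Action]:
    """Rule (C): if some action has probability below epsilon, take it.

    Returns None when no action qualifies, meaning the caller falls through
    to ordinary expected-utility maximization. Rule (B) is the special case
    that checks p == 0 instead of p < epsilon.
    """
    for action, p in p_action.items():
        if p < epsilon:
            return action
    return None


def epsilon_explore(actions: List[Action], epsilon: float,
                    rng: random.Random) -> Optional[Action]:
    """Rule (D): with probability epsilon, take a uniformly random action.

    Returns None on non-exploration rounds. The randomness comes from `rng`,
    an external source, rather than from the agent's own beliefs.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return None
```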

So we have a chain connecting the chicken rule to random exploration.

(Another way of thinking about the similarity: The chicken rule ensures that there are models of the agent’s beliefs in which each action is taken, so that the agent can have sensible logical reasoning about each alternative. Epsilon-exploration ensures that the empirical experiences of the agent include all of the actions, so that the agent can have reasonable empirical estimates of consequences of each action.)

OK. So Troll Bridge for an exploring agent should blow up the bridge on exploration, or if PA is inconsistent. An agent could look like this:

  • With small probability, explore.

    • If exploring, flip a coin to cross or not cross.

  • If P(cross)=0, cross.

  • If P(¬cross)=0, don’t cross.

  • Otherwise, take the max-expected-value action, breaking ties randomly.
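Here is a minimal sketch of one round of this agent and the corresponding troll. It again takes the agent’s beliefs as given numbers (`p_cross`, `eu_cross`, `eu_stay` are hypothetical inputs). It also assumes two actions, so P(¬cross) is computed as 1 − P(cross). “PA is inconsistent” is represented as an explicit flag because no program can check it.

```python
import random
from typing import Tuple

CROSS, STAY = "cross", "stay"


def exploring_agent(p_cross: float, eu_cross: float, eu_stay: float,
                    epsilon: float, rng: random.Random) -> Tuple[str, str]:
    """One round of the exploring agent; returns (action, reason).

    p_cross, eu_cross and eu_stay are stand-ins for the agent's beliefs,
    which this sketch takes as given rather than computing.
    """
    if rng.random() < epsilon:                     # explore with small probability
        return rng.choice([CROSS, STAY]), "explore"  # coin flip
    if p_cross == 0.0:                             # P(cross) = 0: cross
        return CROSS, "chicken"
    if 1.0 - p_cross == 0.0:                       # P(not cross) = 0: don't cross
        return STAY, "chicken"
    if eu_cross == eu_stay:                        # break ties randomly
        return rng.choice([CROSS, STAY]), "tie"
    return (CROSS if eu_cross > eu_stay else STAY), "max-eu"


def troll_payoff(action: str, reason: str, pa_inconsistent: bool = False) -> int:
    """Blow up the bridge if the agent crosses on an exploration round or if
    PA is inconsistent (passed in as a flag, since it cannot be computed)."""
    if action != CROSS:
        return 0
    return -10 if reason == "explore" or pa_inconsistent else 10
```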

I’m still assuming that the agent’s probability distribution respects proofs, as before. I’m also assuming this agent is playing the game repeatedly, and learning. I also must assume that the agent has found the proof that not crossing yields u=0, i.e., □(¬cross → u=0). Now, the agent reasons:

  • Suppose □(cross → u=-10) for a particular round.

    • Further suppose I crossed on that round.

      • By the first supposition, I knew the payout of crossing to be low; and I must also have known that the payout of not crossing is higher, since I can prove that. Since I can prove what both payouts are, the expected values must equal those, unless PA is inconsistent (in which case P(cross)=0 anyway, since my beliefs respect proofs). So I can only be crossing the bridge for two reasons: either this is an exploration round, or P(cross)=0.

      • In either case, crossing the bridge yields payout u=-10.

    • Therefore, cross → u=-10 in fact.

  • So □(cross → u=-10) → (cross → u=-10).

Since the agent proves that a proof of crossing being bad implies crossing is actually bad, the agent further must prove that crossing is bad in fact, by Löb.

I did this for the logically omniscient case again, but as before, I claim that you can translate the above proof to work in the case that the agent’s beliefs respect proofs it can find. That’s maybe a bit weird, though, because it involves a Bayesian agent updating on logical proofs; we know this isn’t a particularly good way of handling logical uncertainty.

We can use logical induction instead, using an epsilon-exploring version of LIDT. We consider LIDT on a sequence of troll-bridge problems, and show that it eventually notices the Löbian proof and starts refusing to cross. This is even more frustrating than the previous examples, because LIDT might successfully cross for a long time, apparently learning that crossing is safe and reliably getting a +10 payoff. Then, one day, it finds the Löbian proof and stops crossing the bridge!

That case is a little more complicated to work out than the Bayesian probability case, and I omit the proof here.

Non-examples: RL

On the other hand, consider an agent which uses random exploration but doesn’t do any logical reasoning, like a typical RL agent. Such an agent doesn’t need any chicken rule, since it doesn’t care about proofs of what it’ll do. It still needs to explore, though. So the troll can blow up the bridge whenever the RL agent crosses due to exploration.

This obviously messes with the RL agent’s ability to learn to cross the bridge. The RL agent might never learn to cross, since every time it tries it, it looks bad. So this is sort of similar to Troll Bridge.

However, I think this isn’t really the point of Troll Bridge. The key difference is this: the RL agent can get past the bridge if its prior expectation that crossing is a good idea is high enough. It just starts out crossing, and happily crosses all the time.
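Here is a minimal sketch of the RL non-example. It assumes a two-action epsilon-greedy learner with sample-average value estimates; that choice of learner belongs to this sketch, not to the post. The troll blows up the bridge exactly when the agent crosses on an exploration step. The prior values, epsilon and step counts are arbitrary illustrations.

```python
import random
from typing import Dict, Tuple

CROSS, STAY = "cross", "stay"


def run_rl_agent(prior_cross: float, epsilon: float, steps: int,
                 seed: int = 0) -> Tuple[Dict[str, float], int, float]:
    """Epsilon-greedy learner facing a troll that punishes exploration.

    Returns (final value estimates, number of greedy crossings, total reward).
    The prior value for crossing counts as one pseudo-observation; staying
    starts at 0. Greedy ties go to staying.
    """
    rng = random.Random(seed)
    value = {CROSS: prior_cross, STAY: 0.0}
    count = {CROSS: 1, STAY: 1}
    greedy_crossings = 0
    total = 0.0
    for _ in range(steps):
        exploring = rng.random() < epsilon
        if exploring:
            action = rng.choice([CROSS, STAY])
        else:
            action = CROSS if value[CROSS] > value[STAY] else STAY
            if action == CROSS:
                greedy_crossings += 1
        # The troll blows up the bridge whenever a crossing is exploratory.
        if action == CROSS:
            reward = -10.0 if exploring else 10.0
        else:
            reward = 0.0
        total += reward
        # Incremental sample-average update for the action just taken.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
    return value, greedy_crossings, total
```

With a pessimistic prior (say `prior_cross=-5.0`) every crossing the agent tries is exploratory and punished, so it never crosses greedily. With an optimistic prior (say `prior_cross=5.0`) it starts out crossing. Its estimate should then typically stay positive, because greedy +10s far outnumber the occasional exploratory −10.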

Troll Bridge is about the inevitable confidence that crossing the bridge is bad. We would be fine if an agent decided not to cross because it assigned high probability to PA being inconsistent. The RL example seems similar in that it depends on the agent’s prior.

We could try to alter the example to get that kind of inevitability. Maybe we argue it’s still “dumb” to cross only because you start with a high prior probability of it being good. Have the troll punish crossing unless the crossing is justified by an empirical history of crossing being good. Then RL agents do poorly no matter what: no one can get the good outcome in order to build up the history, since getting the good outcome requires the history.

But this still doesn’t seem so interesting. You’re just messing with these agents. It isn’t illustrating the degree of pathological reasoning which the Löbian proof illustrates; of course you don’t put your hand in the fire if you get burned every single time you try it. There’s nothing wrong with the way the RL agent is reacting!

So, Troll Bridge seems to be more exclusively about agents who do reason logically.


All of the examples have depended on a version of the chicken rule. This leaves us with a fascinating catch-22:

  • We need the chicken rule to avoid spurious proofs. As a reminder: spurious proofs are cases where an agent would reject an action if it could prove that it would not take that action. These actions can then be rejected by an application of Löb’s theorem. The chicken rule avoids this problem by ensuring that agents cannot know their own actions, since if they did then they’d take a different action from the one they know they’ll take (and they know this, conditional on their logic being consistent).

  • However, Troll Bridge shows that the chicken rule can lead to another kind of problematic Löbian proof.

So, we might take Troll Bridge to show that the chicken rule does not achieve its goal, and therefore reject the chicken rule. However, this conclusion is very severe. We cannot simply drop the chicken rule and open the gates to the (much more common!) spurious proofs. We would need an altogether different way of rejecting the spurious proofs; perhaps a full account of logical counterfactuals.

Furthermore, it is possible to come up with variants of Troll Bridge which counter some such proposals. In particular, Troll Bridge was originally invented to counter proof-length counterfactuals, which essentially generalize chicken rules, and therefore lead to the same Troll Bridge problems.

Another possible conclusion could be that Troll Bridge is simply too hard, and we need to accept that agents will be vulnerable to this kind of reasoning.