Troll Bridge

All of the results in this post, and most of the informal observations/interpretations, are due to Sam Eisenstat.

Troll Bridge is a decision problem which has been floating around for a while, but which has lacked a good introductory post. The original post gives the essential example, but it lacks the "troll bridge" story, which (1) makes it hard to understand, since it is just stated in mathematical abstraction, and (2) makes it difficult to find if you search for "troll bridge".

The basic idea is that you want to cross a bridge. However, there is a troll who will blow up the bridge with you on it, if (and only if) you cross it "for a dumb reason" (for example, due to unsound logic). You can get to where you want to go by a worse path (through the stream). This path is better than being blown up, though.

We apply a Löbian proof to show not only that you choose not to cross, but furthermore that your counterfactual reasoning is confident the bridge would have blown up if you had crossed. This is supposed to be a counterexample to various proposed notions of counterfactual, and to various proposed decision theories.

The pseudocode for the environment (more specifically, the utility gained from the environment) is as follows:
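
Something like the following Python sketch captures the idea (this is an illustrative reconstruction, not exact code; the □⊥ check is stubbed out as a hypothetical placeholder, pa_proves_inconsistency, since provability isn't something we can literally compute here):

    # Hypothetical placeholder for the □⊥ check ("PA proves an inconsistency").
    # A real treatment would involve a proof search; it is stubbed out here so the sketch runs.
    def pa_proves_inconsistency() -> bool:
        return False

    def utility(cross: bool) -> int:
        """Utility gained from the Troll Bridge environment."""
        if cross and pa_proves_inconsistency():
            return -10  # the troll blows up the bridge
        elif cross:
            return +10  # crossing safely
        else:
            return 0    # taking the worse path through the stream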

IE, if the agent crosses the bridge and PA is inconsistent, then U=-10. (□⊥ means "PA proves an inconsistency".) Otherwise, if the agent crosses the bridge, U=+10. If neither of these holds (IE, the agent does not cross the bridge), U=0.

The pseudocode for the agent could be as follows:
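
Again as an illustrative Python sketch (the proof search is a hypothetical stub, pa_proves, standing in for "PA proves that taking this action implies this utility"; the fallback and tie-breaking choices are just one possible convention):

    from itertools import product

    ACTIONS = ["cross", "not cross"]
    UTILITIES = [-10, 0, +10]

    # Hypothetical stub for a (bounded) proof search:
    # "does PA prove that taking `action` implies U = u?"
    def pa_proves(action: str, u: int) -> bool:
        return False

    def agent() -> str:
        """Take the action with the highest provable utility (ties broken by search order)."""
        best_action, best_u = None, None
        for action, u in product(ACTIONS, UTILITIES):
            if pa_proves(action, u) and (best_u is None or u > best_u):
                best_action, best_u = action, u
        # If no "action implies utility" statement is proved, default to not crossing
        # (one possible convention for this sketch).
        return best_action if best_action is not None else "not cross"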

This is a little more complicated, but the idea is supposed to be that you search for every "action implies utility" pair, and take the action for which you can prove the highest utility (with some tie-breaking procedure). Importantly, this is the kind of proof-based decision theory which eliminates spurious counterfactuals in 5-and-10 type problems. It isn't that easy to trip up with Löbian proofs. (Historical/terminological note: This decision theory was initially called MUDT, and is still sometimes referred to in that way. However, I now often call it proof-based decision theory, because it isn't centrally a UDT. "Modal DT" (MDT) would be reasonable, but the modal operator involved is the "provability" operator, so "proof-based DT" seems more direct.)

Now, the proof:

  • Reasoning within PA (ie, the logic of the agent):

    • Suppose the agent crosses.

      • Further suppose that the agent proves that crossing implies U=-10.

        • Examining the source code of the agent: because we're assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.

        • So, either way, PA is inconsistent (by way of 0=-10 or +10=-10).

        • So the troll actually blows up the bridge, and really, U=-10.

      • Therefore (popping out of the second assumption), if the agent proves that crossing implies U=-10, then in fact crossing implies U=-10.

      • By Löb's theorem (see the note following this proof), crossing really implies U=-10.

      • So (since we're still under the assumption that the agent crosses), U=-10.

    • So (popping out of the assumption that the agent crosses), the agent crossing implies U=-10.

  • Since we proved all of this in PA, the agent proves it, and proves no better utility for crossing (unless PA is actually inconsistent). On the other hand, it will prove that not crossing gives it a safe U=0. So it will in fact not cross.
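
To spell out the Löb step: write C for "the agent crosses" and □ for PA-provability, as above. The two nested suppositions amount to a PA proof of □(C → U=-10) → (C → U=-10). Löb's theorem says that if PA proves □X → X, then PA proves X; taking X to be (C → U=-10), PA proves C → U=-10, which is the fact used above.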

The paradoxical aspect of this example is not that the agent doesn't cross; it makes sense that a proof-based agent can't cross a bridge whose safety depends on the agent's own logic being consistent, since proof-based agents can't know whether their logic is consistent. Rather, the point is that the agent's "counterfactual" reasoning looks crazy. Arguably, the agent should be uncertain about what happens if it crosses the bridge, rather than certain that the bridge would blow up. Furthermore, the agent is reasoning as if it can control whether PA is consistent, which is arguably wrong.

Analogy to Smoking Lesion

One interpretation of this thought-experiment is that it shows proof-based decision theory to be essentially a version of EDT, in that its behavior here parallels EDT's behavior in Smoking Lesion. The analogy to Smoking Lesion is relatively strong:

  • An agent is at risk of having a significant internal issue. (In Smoking Lesion, it's a medical issue. In Troll Bridge, it is logical inconsistency.)

  • The internal issue would bias the agent toward a particular action. (In Smoking Lesion, the lesion makes the agent more likely to smoke. In Troll Bridge, an inconsistent agent crosses the bridge.)

  • The internal issue also causes some imagined practical problem for the agent. (In Smoking Lesion, the lesion makes one more likely to get cancer. In Troll Bridge, the inconsistency would make the troll blow up the bridge.)

  • There is a chain of reasoning which combines these facts to stop the agent from taking the action. (In Smoking Lesion, EDT refuses to smoke due to the correlation with cancer. In Troll Bridge, the proof-based agent refuses to cross the bridge because of a Löbian proof that crossing the bridge leads to disaster.)

  • We intuitively find the conclusion nonsensical. (It seems the EDT agent should smoke; it seems the proof-based agent should not expect the bridge to explode.)

Indeed, the analogy to Smoking Lesion seems to strengthen the final point: that the counterfactual reasoning is wrong.

I plan to revise this post at some point, because I'm leaving some stuff out. Planned addendums:

  • It might seem like the agent doesn't cross the bridge because of the severe risk aversion of proof-based DT. However, there is a version of the argument which works for agents with probabilistic reasoning, illustrating that the problem is not solved by assigning a low probability to the inconsistency of PA. Agents will not cross the bridge even if the cost of the explosion is extremely low, just slightly worse than not crossing. This makes the problem look much more severe, because it intuitively seems as if putting low probability on inconsistency should not only block the "weird" counterfactual, but actually allow an agent to cross the bridge.

  • Furthermore, there's a version which works for logical induction. This shows that the problem is not sensitive to the details of how one handles logical uncertainty, and indeed, is still present for the best theory of logical uncertainty we currently have.