Conditioning, Counterfactuals, Exploration, and Gears

The view of counterfactuals as just conditioning on low-probability events has a lot going for it. To begin with, in a Bayesian setting, updates are done by conditioning. A probability distribution conditioned on some event B (an imaginary update) and the probability distribution after actually seeing B (an actual update) will be identical.
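As a minimal illustration of that equivalence (the joint probabilities below are made up), here is a toy distribution where the conditional probability computed in advance matches what you get by actually observing the event and renormalizing:

```python
# Toy joint distribution over two binary variables A and B (made-up numbers).
joint = {
    (True, True): 0.02,   # P(A and B)
    (True, False): 0.48,
    (False, True): 0.03,
    (False, False): 0.47,
}

# "Imaginary update": the conditional probability P(A|B), computed in advance.
p_b = sum(p for (a, b), p in joint.items() if b)
p_a_given_b = sum(p for (a, b), p in joint.items() if a and b) / p_b

# "Actual update": throw away the worlds where B is false and renormalize.
posterior = {k: p / p_b for k, p in joint.items() if k[1]}
p_a_after_seeing_b = sum(p for (a, b), p in posterior.items() if a)

assert abs(p_a_given_b - p_a_after_seeing_b) < 1e-12  # the two updates agree
```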

There is an issue with conditioning on low-probability events, however. When B has a low probability, the conditional probability P(A|B) = P(A∧B)/P(B) involves division by a small number, which amplifies noise and small changes in the probability of the conjunction, so estimates of probability conditional on lower-probability events are more unstable. The worst-case version of this is conditioning on a zero-probability event, because the probability distribution after conditioning can be literally anything without affecting the original probability distribution. One useful intuition for this is that probabilities conditional on B are going to be less accurate when you've seen very few instances of B occurring, as the sample size is too small to draw strong conclusions.
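Here is a rough simulation of that instability, assuming A and B are independent coin flips with made-up probabilities: the empirical estimate of P(A|B) is far noisier when B is rare.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_spread(p_b, n_samples=10_000, n_trials=1_000):
    """Spread of empirical estimates of P(A|B) when the true value is 0.5."""
    estimates = []
    for _ in range(n_trials):
        b = rng.random(n_samples) < p_b
        a = rng.random(n_samples) < 0.5      # A independent of B, P(A) = 0.5
        if b.sum() == 0:
            continue                         # no B's observed: conditional undefined
        estimates.append(a[b].mean())        # empirical estimate of P(A | B)
    return np.std(estimates)

print("std of P(A|B) estimates, common B (p=0.5):  ", estimate_spread(0.5))
print("std of P(A|B) estimates, rare B   (p=0.001):", estimate_spread(0.001))
```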

However, in the logical inductor setting, it is possible to get around this with infinite exploration in the limit. If you act unpredictably enough to take bad actions with some (very small) probability, then in the limit, you'll experiment enough with bad actions to have well-defined conditional probabilities on taking actions you have (a limiting) probability 0 of taking. The counterfactuals of standard conditioning are those where the exploration step occurred, just as the counterfactuals of modal UDT are those where the agent's implicit chicken step went off because it found a spurious proof in a nonstandard model of PA.
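A toy sketch of that mechanism, with made-up rewards and a fixed (rather than limiting) exploration rate: with any positive exploration rate, the estimate conditional on the "bad" action converges, while with no exploration there is nothing to condition on.

```python
import random

random.seed(0)

def conditional_reward_estimate(epsilon, steps=100_000):
    """Estimate reward conditional on the 'bad' action, taken only via exploration."""
    rewards_given_bad = []
    for _ in range(steps):
        explore = random.random() < epsilon        # small chance of acting unpredictably
        action = "bad" if explore else "good"
        reward = 0.1 if action == "bad" else 1.0   # made-up payoffs
        if action == "bad":
            rewards_given_bad.append(reward)
    if not rewards_given_bad:
        return None                                # the conditional never becomes defined
    return sum(rewards_given_bad) / len(rewards_given_bad)

print(conditional_reward_estimate(epsilon=0.001))  # converges to 0.1
print(conditional_reward_estimate(epsilon=0.0))    # None: no exploration, no estimate
```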

Now, this notion of counterfactuals can have bad effects, because zooming in on the little slice of probability mass where you do X is different from the intuitive notion of counterfacting on doing X. Counterfactual on me walking off a cliff, I'd be badly hurt, but conditional on me doing so, I'd probably also have some sort of brain lesion. Similar problems exist with Troll Bridge, and this mechanism is the reason why logical inductors converge to not giving Omega money in a version of Newcomb's problem where Omega can't predict the exploration step. Conditional on them 2-boxing, they are probably exploring in an unpredictable way, which catches Omega unaware and earns more money.
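Here is a rough simulation of that Newcomb variant, using the standard box payoffs and a made-up exploration rate, and holding the deliberate policy fixed at 1-boxing: since Omega only predicts the deliberate policy, the conditional estimates end up favoring 2-boxing.

```python
import random

random.seed(1)

EPSILON = 0.01                       # exploration rate (made up)
payoffs = {"1box": [], "2box": []}

for _ in range(100_000):
    # The agent's deliberate policy is to 1-box; 2-boxing only happens on exploration steps.
    action = "2box" if random.random() < EPSILON else "1box"
    # Omega predicts the deliberate policy but can't see the exploration coin,
    # so the opaque box always contains the $1,000,000.
    payoffs[action].append(1_000_000 if action == "1box" else 1_000_000 + 1_000)

for action, values in payoffs.items():
    print("average payoff conditional on", action, ":", sum(values) / len(values))

# The conditional estimates say 2-boxing does better, so conditioning-based
# "counterfactuals" push the agent toward 2-boxing against this Omega.
```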

However, no better notion of counterfactuals currently exists, and in fully general environments, this is probably as well as you can do. In multi-armed bandit problems, there are many actions with unknown payoff, and the agent must converge to figuring out the best one. Pretty much all multi-armed bandit algorithms involve experimenting with actions that are worse than baseline, which is a pretty strong clue that exploration into bad outcomes is necessary for good performance in arbitrary environments. If you're in a world that will reward or punish you in arbitrary if-then fashion for selecting any action, then learning the reward given by three of the actions doesn't help you figure out the reward of the fourth action. Also, in a similar spirit as Troll Bridge, if there's a lever that shocks you, but only when you pull it in the spirit of experimentation, then if you don't have access to exactly how the lever is working, but just its external behavior, it's perfectly reasonable to believe that it just always shocks you (after all, it's done that all the other times it was tried).
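As a concrete sketch, here is a bare-bones epsilon-greedy bandit algorithm (the arm payoffs and exploration rate are made up). It only sorts out which arm is best because it keeps occasionally experimenting with arms its current estimates say are worse.

```python
import random

random.seed(2)

TRUE_MEANS = [0.2, 0.5, 0.9, 0.4]        # unknown to the agent (made-up values)
counts = [0] * 4
estimates = [0.0] * 4
EPSILON = 0.05

for t in range(50_000):
    if random.random() < EPSILON:
        arm = random.randrange(4)        # experiment, possibly with a worse-than-baseline arm
    else:
        arm = max(range(4), key=lambda i: estimates[i])
    reward = 1.0 if random.random() < TRUE_MEANS[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]   # running average

print("estimated arm values:", [round(e, 2) for e in estimates])
print("best arm by estimate:", max(range(4), key=lambda i: estimates[i]))
```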

And yet, despite these arguments, humans can make successful engineering designs operating in realms they don't have personal experience with. And humans don't seem to reason about what-ifs by checking what they think about the probability of A∧B and of B, and comparing the two. Even when thinking about stuff with medium-high probability, humans seem to reason by imagining some world where the thing is true, and then reasoning about the consequences of the thing. To put it another way, humans are using some notion of P(A | do(B)) in place of the conditional probability P(A | B).

Why can humans do this at all?

Well, physics has the nice property that if you know some sort of initial state, then you can make accurate predictions about what will happen as a result. And these laws have proven their durability under a bunch of strange circumstances that don't typically occur in nature. Put another way, in the multi-armed bandit case, knowing the output of three levers doesn't tell you what the fourth will do, while physics has far more correlation among the various levers/interventions on the environment, so it makes sense to trust the predicted output of pulling a lever you've never pulled before. Understanding how the environment responds to one sequence of actions tells you quite a bit about how things would go if you took some different sequence of actions. (Also, as a side note, conditioning-based counterfactuals work very badly with full trees of actions in sequence, due to combinatorial explosion and the resulting decrease in the probability of any particular action sequence.)
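A toy version of that contrast, with made-up observations: a compact model with one shared parameter (a = F/m) makes a confident prediction about an intervention that was never tried, where a pure if-then lookup table over tried interventions has nothing to say.

```python
import numpy as np

# Observed interventions: push a cart with various forces and record accelerations.
# A "gears" model assumes one shared law (a = F / m) rather than a per-lever lookup table.
forces = np.array([1.0, 2.0, 3.0])          # levers we have actually pulled
accels = np.array([0.52, 0.98, 1.51])       # noisy observations (made-up data)

# Fit the single unknown parameter 1/m by least squares.
inv_mass = np.linalg.lstsq(forces[:, None], accels, rcond=None)[0][0]

# Predict an intervention we have never tried.
print("predicted acceleration for F = 10:", inv_mass * 10)

# A pure if-then lookup table over the tried forces has no entry for F = 10 at all.
```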

The environment of math, and figuring out which algorithms you control when you take some action you don't actually take, appears to be intermediate between the case of fully general multi-armed bandit problems and physics, though I'm unsure of this.

Now, to take a detour to Abram's old post on gears, I'll excerpt a specific part:

Here, I'm siding with David Deutsch's account in the first chapter of The Fabric of Reality. He argues that understanding and predictive capability are distinct, and that understanding is about having good explanations. I may not accept his whole critique of Bayesianism, but that much of his view seems right to me. Unfortunately, he doesn't give a technical account of what "explanation" and "understanding" could be.

Well, if you already have maxed-out predictive capability, what extra thing does understanding buy you? What useful thing is captured by "understanding" that isn't captured by "predictive capability"?

I'd personally put it this way. Predictive capability is how accurate you are about what will happen in the environment. But when you truly understand something, you can use that to figure out actions and interventions to get the environment to exhibit weird behavior that wouldn't have precedent in the past sequence of events. You "understand" something when you have a compact set of rules and constraints telling you how a change in starting conditions affects some set of other conditions and properties, which feels connected to the notion of a gears-level model.
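A small sketch of that, with made-up dynamics: a compact rule for how a room's temperature responds to a heater both predicts the usual behavior and lets you search for an intervention that produces a state with no precedent in the observed history.

```python
# A compact "gears" model of a room: the next temperature depends on the heater
# setting and on heat loss to the outside (dynamics and constants are made up).
def step(temp, heater, outside=10.0):
    return temp + 0.5 * heater - 0.1 * (temp - outside)

def steady_state(heater, temp=15.0, steps=200):
    for _ in range(steps):
        temp = step(temp, heater)
    return temp

# Under the usual policy (heater = 1) the room settles near 15 degrees,
# so a temperature of 30 has no precedent in the observed history.
print("usual steady state:", round(steady_state(1), 1))

# The same compact rule lets us search for an intervention that produces
# that unprecedented behavior.
best = min(range(11), key=lambda h: abs(steady_state(h) - 30.0))
print("heater setting predicted to reach ~30 degrees:", best)
```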

To summarize, conditioning-counterfactuals are very likely the most general type, but when the environment (whether it be physics or math) has the property that the change induced by a different starting condition is describable by a much smaller program than an if-then table over all starting conditions, then it makes sense to call it a "legitimate counterfactual". The notion of there being something beyond epsilon-exploration is closely tied to having compact descriptions of the environment and how it behaves under interventions, instead of the max-entropy prior where you can't say anything confidently about what happens when you take a different action than you normally do, and this also seems closely tied to Abram's notion of a "gears-level model".

There are interesting parallels to this in the AIXI setting. The "models" would be the Turing machines that may be the environment, and the Turing machines are set up such that any action sequence can be input into them and they will behave in some predictable way. This attains the property of accurately predicting the consequences of various action sequences AIXI doesn't take, if the world it is interacting with is low-complexity, for much the same reason that humans can reason through consequences of situations they have never encountered using rules that accurately describe the situations they have encountered.
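Here is a toy stand-in for that setup, with a handful of made-up hypothesis programs instead of Turing machines and a uniform prior: after conditioning on the observed history, the surviving hypotheses still make predictions for action sequences that were never taken.

```python
# Toy stand-ins for environment Turing machines: each maps a full action sequence
# to a reward, so every hypothesis makes a prediction even for untried sequences.
def env_sum(actions):    return sum(actions)
def env_last(actions):   return actions[-1]
def env_parity(actions): return sum(actions) % 2

prior = {env_sum: 1/3, env_last: 1/3, env_parity: 1/3}   # uniform prior (made up)

# Observed history: the agent actually took (1, 0, 1) and got reward 2.
observed_actions, observed_reward = (1, 0, 1), 2
posterior = {h: p for h, p in prior.items() if h(observed_actions) == observed_reward}
total = sum(posterior.values())
posterior = {h: p / total for h, p in posterior.items()}

# Predict the reward of an action sequence the agent has never taken.
untried = (1, 1, 1)
prediction = sum(p * h(untried) for h, p in posterior.items())
print("posterior:", {h.__name__: round(p, 2) for h, p in posterior.items()})
print("predicted reward for untried sequence", untried, ":", prediction)
```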

However, if AIXI has some high-probability world (according to the starting distribution) where an action is very dangerous, it will avoid that action, at least until it can rule out that world by some other means. As Leike and Hutter entertainingly show, this "Bayesian paranoia" can make AIXI behave arbitrarily badly, just by choosing the universal Turing machine appropriately, to assign high probability to a world where AIXI goes to hell and gets 0 reward forever if it ever takes some particular action.
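A minimal sketch of that failure mode, with made-up payoffs and prior: the agent's expected values always favor the safe action, so it never gathers the evidence that would move its beliefs about the "hell" hypothesis.

```python
# Two hypotheses about what action 1 does, with the prior chosen adversarially
# (in the spirit of the Leike & Hutter construction): harmless, or "hell forever".
prior = {"harmless": 0.1, "hell_after_action_1": 0.9}

def expected_value(action, beliefs):
    # Made-up payoffs: action 0 always pays 1 per step; action 1 pays 2 per step
    # unless the hell hypothesis is true, in which case it pays 0 forever.
    if action == 0:
        return 1.0
    return beliefs["harmless"] * 2.0 + beliefs["hell_after_action_1"] * 0.0

beliefs = dict(prior)
for _ in range(10):
    action = max([0, 1], key=lambda a: expected_value(a, beliefs))
    # The agent always picks action 0, so it never gets evidence about action 1
    # and its belief in the "hell" hypothesis never moves.
print("action taken every step:", action, "| beliefs unchanged:", beliefs)
```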

This actually seems acceptable to me. Just don't be born with a bad prior. Or, at least, there may be some self-fulfilling prophecies, but it's better than having exploration into bad outcomes in every world with irreversible traps. In particular, note that exploration steps are reflectively inconsistent, because AIXI (when considering the future) will do worse (according to the current probability distribution over Turing machines) if it uses exploration, rather than acting on the current probability distribution. AIXI is optimal according to the environment distribution it starts with, while AIXI with exploration is not.
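As a quick illustration with made-up numbers: evaluated by the agent's current beliefs, a future self that commits to epsilon-exploration is worth strictly less than one that simply acts on those beliefs.

```python
# Under the agent's *current* beliefs, arm A is worth 1.0 and arm B is worth 0.2
# (made-up numbers). Compare the believed value of acting greedily with the
# believed value of committing to epsilon-exploration.
believed_value = {"A": 1.0, "B": 0.2}
epsilon = 0.01

greedy_value = max(believed_value.values())
exploring_value = (1 - epsilon) * greedy_value + epsilon * min(believed_value.values())

print("believed value of future self without exploration:", greedy_value)      # 1.0
print("believed value of future self with exploration:   ", exploring_value)   # 0.992
```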