Conditioning, Counterfactuals, Exploration, and Gears

The view of counterfactuals as just conditioning on low-probability events has a lot going for it. To begin with, in a Bayesian setting, updates are done by conditioning. A probability distribution conditioned on some event $e$ (an imaginary update), and a probability distribution after actually seeing $e$ (an actual update), will be identical.
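To spell out the identity being leaned on here (standard notation, my gloss rather than anything from the original text): conditioning on $e$ is just

$$P(A \mid e) = \frac{P(A \wedge e)}{P(e)},$$

and this is the same function of the prior whether $e$ is merely imagined or actually observed, which is why the imaginary update and the actual update agree.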

There is an issue with conditioning on low-probability events, however. When $e$ has a low probability, the conditional probability $P(A \mid e)$ involves division by the small number $P(e)$, which amplifies noise and small changes in the probability of the conjunction $A \wedge e$, so estimates of probability conditional on lower-probability events are more unstable. The worst-case version of this is conditioning on a zero-probability event, because the probability distribution after conditioning can be literally anything without affecting the original probability distribution. One useful intuition for this is that probabilities conditional on $e$ are going to be less accurate when you've seen very few instances of $e$ occurring, as the sample size is too small to draw strong conclusions.
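As a quick illustration of the instability (a minimal sketch with a made-up joint distribution, not anything from the original text), estimating $P(A \mid e)$ by sampling goes from tight to noisy as $P(e)$ shrinks:

```python
import random

# Estimate P(A | e) = P(A and e) / P(e) empirically, for a common event e
# and a rare one.  The joint distribution is invented for illustration.

def sample_world(p_e, p_a_given_e=0.7, p_a_given_not_e=0.3):
    e = random.random() < p_e
    a = random.random() < (p_a_given_e if e else p_a_given_not_e)
    return a, e

def estimate_conditional(p_e, n_samples=100_000):
    count_e, count_a_and_e = 0, 0
    for _ in range(n_samples):
        a, e = sample_world(p_e)
        if e:
            count_e += 1
            count_a_and_e += a
    return count_a_and_e / count_e if count_e else float("nan")

for p_e in (0.5, 1e-4):
    estimates = [round(estimate_conditional(p_e), 3) for _ in range(5)]
    print(f"P(e) = {p_e}: estimates of P(A|e) = {estimates}")
```

With $P(e) = 0.5$ the five runs agree to a couple of decimal places; with $P(e) = 10^{-4}$ each run only ever sees a handful of $e$-samples, so the estimates scatter widely, which is the sample-size intuition above made concrete.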

However, in the logical inductor setting, it is possible to get around this with infinite exploration in the limit. If you act unpredictably enough to take bad actions with some (very small) probability, then in the limit, you'll experiment enough with bad actions to have well-defined conditional probabilities on taking actions you have (a limiting) probability 0 of taking. The counterfactuals of standard conditioning are those where the exploration step occurred, just as the counterfactuals of modal UDT are those where the agent's implicit chicken step went off because it found a spurious proof in a nonstandard model of PA.
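As a toy version of this (my own sketch, using a simple decaying exploration rate rather than anything specific to logical inductors): if the agent explores with probability $1/t$ at step $t$, then the limiting probability of exploring is 0, but since $\sum_t 1/t$ diverges the bad action still gets tried infinitely often in the limit, so its empirical payoff estimate stays well-defined:

```python
import random

def run(steps=200_000):
    payoff = {"good": 1.0, "bad": 0.1}            # made-up payoffs
    totals = {"good": [0, 0.0], "bad": [0, 0.0]}  # action -> [times taken, total payoff]
    for t in range(1, steps + 1):
        if random.random() < 1.0 / t:             # exploration step, probability 1/t
            action = random.choice(["good", "bad"])
        else:
            action = "good"                        # the action currently judged best
        totals[action][0] += 1
        totals[action][1] += payoff[action]
    for action, (n, total) in totals.items():
        estimate = total / n if n else float("nan")
        print(f"{action}: taken {n} times, estimated payoff {estimate:.2f}")

run()
```

Over 200,000 steps the bad action only gets taken a handful of times, but that handful is enough to pin down what conditioning on "I take the bad action" looks like.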

Now, this notion of counterfactuals can have bad effects, because zooming in on the little slice of probability mass where you do $X$ is different from the intuitive notion of counterfacting on doing $X$. Counterfactual on me walking off a cliff, I'd be badly hurt, but conditional on me doing so, I'd probably also have some sort of brain lesion. Similar problems exist with Troll Bridge, and this mechanism is the reason why logical inductors converge to not giving Omega money in a version of Newcomb's problem where Omega can't predict the exploration step. Conditional on them 2-boxing, they are probably exploring in an unpredictable way, which catches Omega unaware and earns more money.
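One way to write the cliff example's contrast, borrowing Pearl-style do-notation purely as an outside illustration (the post doesn't commit to that formalism):

$$P(\text{hurt} \mid \text{walked off the cliff}) \quad \text{vs.} \quad P(\text{hurt} \mid do(\text{walk off the cliff})).$$

The first zooms in on the sliver of worlds where I in fact walked off, most of which are worlds where something was already wrong with me; the second surgically sets the action and propagates its consequences forward.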

However, there's no better notion of counterfactuals that currently exists, and in fully general environments, this is probably as well as you can do. In multi-armed bandit problems, there are many actions with unknown payoff, and the agent must converge to figuring out the best one. Pretty much all multi-armed bandit algorithms involve experimenting with actions that are worse than baseline, which is a pretty strong clue that exploration into bad outcomes is necessary for good performance in arbitrary environments. If you're in a world that will reward or punish you in arbitrary if-then fashion for selecting any action, then learning the reward given by three of the actions doesn't help you figure out the reward of the fourth action. Also, in a similar spirit as Troll Bridge, if there's a lever that shocks you, but only when you pull it in the spirit of experimentation, then if you don't have access to exactly how the lever is working, but just the external behavior, it's perfectly reasonable to believe that it just always shocks you (after all, it's done that all other times it was tried).
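A minimal sketch of the kind of bandit algorithm meant here (epsilon-greedy, with arm payoffs I made up, and payoffs independent across arms so that learning three arms says nothing about the fourth):

```python
import random

ARM_MEANS = [0.3, 0.5, 0.2, 0.8]   # unknown to the agent; an arbitrary if-then table

def pull(arm):
    return ARM_MEANS[arm] + random.gauss(0, 0.1)

def epsilon_greedy(steps=5000, epsilon=0.05):
    counts = [0] * len(ARM_MEANS)
    means = [0.0] * len(ARM_MEANS)
    for _ in range(steps):
        if random.random() < epsilon or 0 in counts:
            arm = random.randrange(len(ARM_MEANS))                    # explore, possibly a bad arm
        else:
            arm = max(range(len(ARM_MEANS)), key=lambda a: means[a])  # exploit the current best guess
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]             # running average
    return counts, means

counts, means = epsilon_greedy()
print("pull counts:       ", counts)
print("estimated payoffs: ", [round(m, 2) for m in means])
```

The agent ends up spending a small but nonzero fraction of its pulls on arms it already believes are worse than its current best, and there is no way around that: the only information about an arm comes from pulling it.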

And yet, despite these arguments, humans can make successful engineering designs operating in realms they don't have personal experience with. And humans don't seem to reason about what-ifs by checking what they think about the probability of the conjunction and the probability of the condition, and comparing the two. Even when thinking about stuff with medium-high probability, humans seem to reason by imagining some world where the thing is true, and then reasoning about consequences of the thing. To put it another way, humans are using some notion of counterfactual in place of conditional probabilities.

Why can humans do this at all?

Well, physics has the nice property that if you know some sort of initial state, then you can make accurate predictions about what will happen as a result. And these laws have proven their durability under a bunch of strange circumstances that don't typically occur in nature. Put another way, in the multi-armed bandit case, knowing the output of three levers doesn't tell you what the fourth will do, while physics has far more correlation among the various levers/interventions on the environment, so it makes sense to trust the predicted output of pulling a lever you've never pulled before. Understanding how the environment responds to one sequence of actions tells you quite a bit about how things would go if you took some different sequence of actions. (Also, as a side note, conditioning-based counterfactuals work very badly with full trees of actions in sequence, due to combinatorial explosion and the resulting decrease in the probability of any particular action sequence.)
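To put a rough number on that side note (the figures are mine, just for scale): with $k$ available actions per step and a horizon of $T$ steps there are $k^T$ possible action sequences, so even under uniform exploration any particular sequence has probability $k^{-T}$; for $k = 4$ and $T = 20$ that is $4^{-20} \approx 10^{-12}$, far too rare to ever accumulate the samples needed for a stable conditional estimate.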

The environment of math, and figuring out which algorithms you control when you take some action you don't actually take, appears to be intermediate between the case of fully general multi-armed bandit problems and physics, though I'm unsure of this.

Now, to take a detour to Abram's old post on gears. I'll excerpt a specific part.

Here, I'm siding with David Deutsch's account in the first chapter of The Fabric of Reality. He argues that understanding and predictive capability are distinct, and that understanding is about having good explanations. I may not accept his whole critique of Bayesianism, but that much of his view seems right to me. Unfortunately, he doesn't give a technical account of what "explanation" and "understanding" could be.

Well, if you already have maxed-out predictive capability, what extra thing does understanding buy you? What useful thing is captured by "understanding" that isn't captured by "predictive capability"?

I'd personally put it this way. Predictive capability is how accurate you are about what will happen in the environment. But when you truly understand something, you can use that to figure out actions and interventions to get the environment to exhibit weird behavior that wouldn't have precedent in the past sequence of events. You "understand" something when you have a compact set of rules and constraints telling you how a change in starting conditions affects some set of other conditions and properties, which feels connected to the notion of a gears-level model.
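A toy contrast between the two (my own example, not from the post): an if-then table over the interventions you have already tried, versus a compact rule relating starting conditions to outcomes, for projectile range on flat ground.

```python
import math

# "If-then table" model: only knows the outcomes of interventions actually tried.
observed = {10.0: 10.19, 20.0: 40.77, 30.0: 91.74}   # launch speed (m/s) -> range (m), at 45 degrees

def table_model(speed):
    return observed.get(speed)          # None for any speed never tried

# "Gears" model: a compact rule for how a change in starting conditions plays out.
def gears_model(speed, angle_deg=45.0, g=9.81):
    angle = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * angle) / g

print(table_model(25.0))                # None -- the table has nothing to say about an untried speed
print(round(gears_model(25.0), 2))      # ~63.71 -- the rule extrapolates to the new intervention
```

The table and the rule make identical predictions on every intervention that has actually been performed; they only come apart on the counterfactual ones, which is where the compact rule earns its keep.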

To summarize, conditioning-counterfactuals are very likely the most general type, but when the environment (whether it be physics or math) has the property that the change induced by a different starting condition is describable by a much smaller program than an if-then table for all starting conditions, then it makes sense to call it a "legitimate counterfactual". The notion of there being something beyond epsilon-exploration is closely tied to having compact descriptions of the environment and how it behaves under interventions, instead of the max-entropy prior where you can't say anything confidently about what happens when you take a different action than you normally do, and this also seems closely tied to Abram's notion of a "gears-level model".

There are interesting parallels to this in the AIXI setting. The "models" would be the Turing machines that may be the environment, and the Turing machines are set up such that any action sequence could be input into them and they would behave in some predictable way. This attains the property of accurately predicting consequences of various action sequences AIXI doesn't take if the world it is interacting with is low-complexity, for much the same reason as humans can reason through consequences of situations they have never encountered using rules that accurately describe the situations they have encountered.
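A cartoon of that mechanism (not AIXI itself; the environment programs and weights below are invented for illustration): because each candidate environment is a program that maps any action sequence to an outcome, the agent can score sequences it has never taken by averaging over its current weights.

```python
def env_brittle(actions):
    # rewards only one hard-coded sequence; an if-then-table-like world
    return 1.0 if actions == ("a", "a", "b") else 0.0

def env_lawful(actions):
    # a compact rule: every "a" immediately followed by a "b" is worth 1
    return sum(1.0 for x, y in zip(actions, actions[1:]) if (x, y) == ("a", "b"))

weights = {env_brittle: 0.3, env_lawful: 0.7}   # current weights over candidate environments

def expected_reward(actions):
    return sum(w * env(actions) for env, w in weights.items())

# Scoring action sequences the agent has never actually executed:
for seq in [("a", "b", "a", "b"), ("b", "b", "b"), ("a", "a", "b")]:
    print(seq, expected_reward(seq))
```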

However, if AIXI has some high-probability world (according to the starting distribution) where an action is very dangerous, it will avoid that action, at least until it can rule out that world by some other means. As Leike and Hutter entertainingly show, this "Bayesian paranoia" can make AIXI behave arbitrarily badly, just by choosing the universal Turing machine appropriately, to assign high probability to a world where AIXI goes to hell and gets 0 reward forever if it ever takes some action.
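To make the paranoia concrete with made-up numbers: suppose the prior puts weight 0.2 on a "hell" hypothesis in which taking action $a$ even once yields reward 0 forever after, and weight 0.8 on a benign hypothesis in which $a$ yields 1 per step while the safe action yields 0.9 per step. A policy that first tries $a$ at step $s$ of a $T$-step horizon expects at most

$$0.9\,(s-1) + 0.8\,(T-s+1) \;<\; 0.9\,(s-1) + 0.9\,(T-s+1) \;=\; 0.9\,T,$$

so never touching $a$ beats every possible experiment, and the agent never gathers the evidence that would rule the hell hypothesis out.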

This actually seems acceptable to me. Just don't be born with a bad prior. Or, at least, there may be some self-fulfilling prophecies, but it's better than having exploration into bad outcomes in every world with irreversible traps. In particular, note that exploration steps are reflectively inconsistent, because AIXI (when considering the future) will do worse (according to the current probability distribution over Turing machines) if it uses exploration, rather than using the current probability distribution. AIXI is optimal according to the environment distribution it starts with, while AIXI with exploration is not.
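Loosely, in my notation rather than the post's: write $\xi$ for the current environment distribution, $\pi^*$ for the $\xi$-optimal policy, and $\pi_\varepsilon$ for $\pi^*$ with exploration steps mixed in. Then

$$V^{\xi}(\pi_{\varepsilon}) \;\le\; V^{\xi}(\pi^{*})$$

simply because $\pi^*$ is, by definition, the best policy according to $\xi$; mixing in exploration of actions that $\xi$ already rates as worse generically makes the inequality strict, which is the sense in which exploration fails to be endorsed by the agent's own current beliefs.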