Comment on decision theory

A comment I made on social media last year about why MIRI cares about making progress on decision theory:

We aren’t working on decision theory in order to make sure that AGI systems are decision-theoretic, whatever that would involve. We’re working on decision theory because there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding of what high-quality reasoning is and how it works.

As an analogy: it might be possible to build a probabilistic reasoner without having a working understanding of classical probability theory, through sufficient trial and error. (Evolution “built” humans without understanding probability theory.) But you’d fundamentally be flying blind when it comes to designing the system — to a large extent, you couldn’t predict in advance which classes of design were likely to be most promising to consider, couldn’t look at particular proposed designs and make good advance predictions about safety/capability properties of the corresponding system, couldn’t identify and address the root causes of problems that crop up, etc.

The idea behind looking at (e.g.) counterfactual reasoning is that counterfactual reasoning is central to what we’re talking about when we talk about “AGI,” and going into the development process without a decent understanding of what counterfactual reasoning is and how it works means you’ll be flying blind to a significantly greater extent when it comes to designing, inspecting, and repairing your system. The goal is to put AGI developers in a position where they can make advance plans and predictions, shoot for narrow design targets, and understand what they’re doing well enough to avoid the kinds of kludgy, opaque, non-modular, etc. approaches that aren’t really compatible with how secure or robust software is developed.

Nate’s way of articulating it:

The reason why I care about logical uncertainty and decision theory problems is something more like this: The whole AI problem can be thought of as a particular logical uncertainty problem, namely, the problem of taking a certain function f : Q → R and finding an input that makes the output large. To see this, let f be the function that takes the AI agent’s next action (encoded in Q) and determines how “good” the universe is if the agent takes that action. The reason we need a principled theory of logical uncertainty is so that we can do function optimization, and the reason we need a principled decision theory is so we can pick the right version of the “if the AI system takes that action...” function.
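As a toy illustration of the framing in Nate's comment (not MIRI's method, and not a solution to either problem), the sketch below splits f into a hypothetical "what happens if the agent takes this action" model and a hypothetical "how good is that world" score, then searches a small finite set of rational-coded actions by brute force. Every name here (`toy_counterfactual`, `toy_value`, `make_f`, `choose_action`) is invented for illustration, and the objective is made up. The exhaustive evaluation of f is exactly the step a real agent cannot perform, which is where logical uncertainty enters; swapping in a different `counterfactual` function is the slot the comment assigns to decision theory.

```python
from fractions import Fraction
from typing import Callable, Iterable

# Hypothetical simplification: a "world" is summarized as a single float.
World = float


def toy_counterfactual(action: Fraction) -> World:
    """Hypothetical model of "what happens if the agent takes `action`".

    Choosing the right version of this function is the job the comment
    assigns to decision theory; this toy version just maps the action
    to a number.
    """
    return float(action)


def toy_value(world: World) -> float:
    """Hypothetical score of how good a world is (made up; peaks at 0.75)."""
    return -(world - 0.75) ** 2


def make_f(counterfactual: Callable[[Fraction], World],
           value: Callable[[World], float]) -> Callable[[Fraction], float]:
    """Compose the two pieces into f : Q -> R (actions encoded as rationals)."""
    return lambda action: value(counterfactual(action))


def choose_action(f: Callable[[Fraction], float],
                  candidates: Iterable[Fraction]) -> Fraction:
    """Return the candidate input that makes f's output largest.

    Brute-force search over a finite candidate set. This stands in for the
    hard part: a real agent cannot compute f exactly and would have to
    reason under logical uncertainty about its values.
    """
    return max(candidates, key=f)


f = make_f(toy_counterfactual, toy_value)
candidates = [Fraction(n, 8) for n in range(9)]  # actions 0, 1/8, ..., 1
best = choose_action(f, candidates)
print(best)  # 3/4
```

The split into two functions is only a presentational choice meant to mirror the two roles named in the comment; nothing in the source says an actual agent would be structured this way.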

The work you use to get to AGI presumably won’t look like probability theory, but it’s still the case that you’re building a system to do probabilistic reasoning, and understanding what probabilistic reasoning is is likely to be very valuable for doing that without relying on brute force and trial-and-error. Similarly, the work that goes into figuring out how to design a rocket, actually building one, etc. doesn’t look very much like the work that goes into figuring out that there’s a universal force of gravity that operates by an inverse square law; but you’ll have a vastly easier time approaching the rocket-building problem with foresight and an understanding of what you’re doing if you have a mental model of gravitation already in hand.

In pretty much the same way, developing an understanding of roughly what counterfactuals are and how they work won’t get you to AGI, and the work of implementing an AGI design won’t look like decision theory, but you want to have in mind an understanding of what “AGI-style reasoning” is (including “what probabilistic reasoning about empirical propositions is” but also “what counterfactual reasoning is”, “what probabilistic reasoning about mathematical propositions is”, etc.), and very roughly how/why it works, before you start making effectively irreversible design decisions.

Eliezer adds:

I do also remark that there are multiple fixpoints in decision theory. CDT does not evolve into FDT but into a weirder system, Son-of-CDT. So, as with utility functions, there are bits we want that the AI does not necessarily generate from self-improvement or local competence gains.