Comment on decision theory

A comment I made on social media last year about why MIRI cares about making progress on decision theory:


We aren’t working on decision theory in order to make sure that AGI systems are decision-theoretic, whatever that would involve. We’re working on decision theory because there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding of what high-quality reasoning is and how it works.

As an analogy: it might be possible to build a probabilistic reasoner without having a working understanding of classical probability theory, through sufficient trial and error. (Evolution “built” humans without understanding probability theory.) But you’d fundamentally be flying blind when it comes to designing the system — to a large extent, you couldn’t predict in advance which classes of design were likely to be most promising to consider, couldn’t look at particular proposed designs and make good advance predictions about safety/capability properties of the corresponding system, couldn’t identify and address the root causes of problems that crop up, etc.

The idea behind looking at (e.g.) counterfactual reasoning is that counterfactual reasoning is central to what we’re talking about when we talk about “AGI,” and going into the development process without a decent understanding of what counterfactual reasoning is and how it works means you’ll be flying blind, to a significantly greater extent, when it comes to designing, inspecting, repairing, etc. your system. The goal is to be able to put AGI developers in a position where they can make advance plans and predictions, shoot for narrow design targets, and understand what they’re doing well enough to avoid the kinds of kludgey, opaque, non-modular, etc. approaches that aren’t really compatible with how secure or robust software is developed.

Nate’s way of articulating it:

The reason why I care about logical uncertainty and decision theory problems is something more like this: The whole AI problem can be thought of as a particular logical uncertainty problem, namely, the problem of taking a certain function f : Q → ℝ and finding an input that makes the output large. To see this, let f be the function that takes the AI agent’s next action (encoded in Q) and determines how “good” the universe is if the agent takes that action. The reason we need a principled theory of logical uncertainty is so that we can do function optimization, and the reason we need a principled decision theory is so we can pick the right version of the “if the AI system takes that action...” function.
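As a toy illustration of that framing (a sketch for this write-up, not part of the original comment), the snippet below uses a tiny stand-in for the action encoding Q, a made-up scoring function f, and a brute-force search for an input that makes the output large; every name and number in it is invented:

```python
# Toy sketch of "given f : Q -> R, find an input with a large output".
# The action encoding and the scoring function are stand-ins for illustration.

ACTIONS = ["do_nothing", "fetch_coffee", "write_report"]  # a tiny stand-in for Q

def f(action: str) -> float:
    """Stand-in for 'how good is the universe if the agent takes this action?'
    In reality this function is implicit and far too expensive to evaluate
    exactly, which is where logical uncertainty comes in."""
    toy_scores = {"do_nothing": 0.0, "fetch_coffee": 1.0, "write_report": 2.5}
    return toy_scores[action]

# "Finding an input that makes the output large" -- brute force works here only
# because Q is tiny; the open problem is doing this in a principled way when it isn't.
best_action = max(ACTIONS, key=f)
print(best_action)  # -> write_report
```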

The work you use to get to AGI presumably won’t look like probability theory, but it’s still the case that you’re building a system to do probabilistic reasoning, and understanding what probabilistic reasoning is is likely to be very valuable for doing that without relying on brute force and trial-and-error. Similarly, the work that goes into figuring out how to design a rocket, actually building one, etc. doesn’t look very much like the work that goes into figuring out that there’s a universal force of gravity that operates by an inverse square law; but you’ll have a vastly easier time approaching the rocket-building problem with foresight and an understanding of what you’re doing if you have a mental model of gravitation already in hand.

In pretty much the same way, developing an understanding of roughly what counterfactuals are and how they work won’t get you to AGI, and the work of implementing an AGI design won’t look like decision theory, but you want to have in mind an understanding of what “AGI-style reasoning” is (including “what probabilistic reasoning about empirical propositions is” but also “what counterfactual reasoning is”, “what probabilistic reasoning about mathematical propositions is”, etc.), and very roughly how/why it works, before you start making effectively irreversible design decisions.


Eliezer adds:

I do also remark that there are multiple fixpoints in decision theory. CDT does not evolve into FDT but into a weirder system, Son-of-CDT. So, as with utility functions, there are bits we want that the AI does not necessarily generate from self-improvement or local competence gains.
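For background on why such fixpoints can differ, a toy Newcomb-style calculation helps: CDT-style counterfactuals hold the predictor’s already-made prediction fixed, while FDT-style counterfactuals treat the prediction as covarying with the agent’s decision procedure. The sketch below is illustrative only; the payoffs and predictor accuracy are assumptions made for this example.

```python
# Toy Newcomb-style comparison; payoff sizes and predictor accuracy are
# assumptions for illustration, not from the post.

SMALL = 1_000        # the transparent box always holds $1,000
BIG = 1_000_000      # the opaque box holds $1,000,000 iff one-boxing was predicted

def cdt_value(action: str, p_big_box_full: float) -> float:
    """CDT-style counterfactual: the prediction is already made, so hold the
    probability that the big box is full fixed while varying the action."""
    base = p_big_box_full * BIG
    return base + (SMALL if action == "two-box" else 0.0)

def fdt_value(action: str, accuracy: float = 0.99) -> float:
    """FDT-style counterfactual: the predictor runs (a model of) the same
    decision procedure, so the box's contents covary with the action chosen."""
    if action == "one-box":
        return accuracy * BIG
    return (1.0 - accuracy) * BIG + SMALL

# CDT prefers two-boxing no matter what it believes about the box's contents...
for p in (0.01, 0.5, 0.99):
    assert cdt_value("two-box", p) > cdt_value("one-box", p)

# ...while FDT-style reasoning prefers one-boxing against an accurate predictor.
assert fdt_value("one-box") > fdt_value("two-box")
```

That divergence is the sense in which the two theories sit at different fixpoints: a system that starts with CDT-style counterfactuals and improves itself does not automatically land on the FDT answer.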