What does it mean to apply decision theory?

Based on discussions with Stuart Armstrong and Daniel Kokotajlo.

There are two conflicting ways of thinking about foundational rationality arguments such as the VNM theorem.

  1. As direct arguments for normative principles. The axioms are supposed to be premises which you’d actually accept. The axioms imply theories of rationality such as probability theory and utility theory. These are supposed to apply in practice: if you accept the axioms, then you should be following them.

  2. As idealized models. Eliezer compares Bayesian reasoners to a Carnot engine: an idealized, thermodynamically perfect engine which can never be built. To the extent that any real engine works, it approximates a Carnot engine. To the extent that any cognition really works, it approximates Bayes. Bayes sets the bounds for what is possible.

The second way of thinking is very useful. Philosophers, economists, and others have made some real progress thinking in this way. However, I’m going to argue that we should push for the first sort of normative principle. We should not be satisfied with normative principles which remain as unachievable ideals, giving upper bounds on performance without directly helping us get there.

This implies dealing with problems of bounded rationality. But it’s not the sort of “bounded rationality” where we set out to explicitly model irrationality. We don’t want to talk about partial rationality; we want notions of rationality which bounded agents can fully satisfy.

Approximating Rationality

In order to apply an idealized rationality, such as Bayesian superintelligence, we need to have a concept of what it means to approximate it. This is more subtle than it may seem. You can’t necessarily try to minimize some notion of distance between your behavior and the ideal behavior. For one thing, you can’t compute the ideal behavior to find the distance! But, for another thing, simple imitation of the ideal behavior can go wrong. Adopting one part of an optimal policy without adopting all the other parts might put you in a much worse position than the one you started in.
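
To make that last point concrete, here is a toy illustration (my own construction, with invented values, not anything from the literature): a value function over two-component policies in which copying one component of the optimal policy, without the other, leaves the agent worse off than where it started.

```python
# Hypothetical values for each two-component policy (a, b).
# The numbers are invented purely to illustrate the point.
VALUE = {
    (0, 0): 5,   # the agent's current policy
    (1, 1): 10,  # the ideal policy
    (1, 0): 1,   # first component copied from the ideal, second kept
    (0, 1): 2,   # second component copied from the ideal, first kept
}

current, ideal = (0, 0), (1, 1)
partial = (ideal[0], current[1])  # imitate only part of the ideal policy

# Partial imitation lands below the starting point, even though
# full imitation would be a large improvement.
print(VALUE[partial] < VALUE[current] < VALUE[ideal])  # True
```

Nothing hangs on the specific numbers; the point is only that “distance to the ideal policy” measured component-by-component need not track value at all.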

Wei Dai discusses the problem in a post about Hanson’s pre-rationality concept:

[...] This is somewhat similar to the question of how do we move from our current non-rational (according to ordinary rationality) state to a rational one. Expected utility theory says that we should act as if we are maximizing expected utility, but it doesn’t say what we should do if we find ourselves lacking a prior and a utility function (i.e., if our actual preferences cannot be represented as maximizing expected utility).

The fact that we don’t have good answers for these questions perhaps shouldn’t be considered fatal to [...] rationality, but it’s troubling that little attention has been paid to them, relative to defining [...] rationality. (Why are rationality researchers more interested in knowing what rationality is, and less interested in knowing how to be rational? Also, BTW, why are there so few rationality researchers? Why aren’t there hordes of people interested in these issues?)

Clearly, we have some idea of which moves toward rationality are correct vs incorrect. Think about the concept of cargo-culting: pointless and ineffective imitation of a more capable agent. The problem is the absence of a formal theory.


One possible way of framing the problem: the VNM axioms, the Kolmogorov probability axioms, and/or other rationality frameworks give us a notion of consistency. We can check our behaviors and opinions for inconsistency. But what do we do when we notice an inconsistency? Which parts are we supposed to change?

Here are some cases where there is at least a tendency to update in a particular direction:

  • Suppose we value an event A at 4.2 expected utils. We then unpack A into two mutually exclusive sub-events, B and C. We notice that we value B at 1.1 utils and C at 3.4 utils. This is inconsistent with the evaluation of A. We usually trust the evaluation of A less than the unpacked version, and would reset the evaluation of A to 1.1 + 3.4 = 4.5.

  • Suppose we notice that we’re doing things in a way that’s not optimal for our goals. That is, we notice some new way of doing things which is better for what we believe our goals to be. We will tend to change our behavior rather than change our beliefs about what our goals are. (Obviously this is not always the case, however.)

  • Similarly, suppose we notice that we are acting in a way which is inconsistent with our beliefs. There is a tendency to correct the action rather than the belief. (Again, not as surely as my first example, though.)

  • If we find that a belief was subject to base-rate neglect, there is a tendency to multiply by base rates and renormalize, rather than adjust our beliefs about base rates to make them consistent.

  • If we notice that X and Y are equivalent, but we had different beliefs about X and Y, then we tend to pool information from X and Y such that, for example, if we had a very sharp distribution about X and a very uninformative distribution about Y, the sharp distribution would win.
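
The first bullet’s update rule is simple enough to write down directly. This sketch (my own illustration; the function name and tolerance are invented) resets a parent event’s value to the sum over its mutually exclusive sub-events whenever the two disagree:

```python
def reconcile(parent_value, subevent_values, tol=1e-9):
    """If mutually exclusive sub-events don't sum to the parent's
    evaluation, trust the unpacked version and reset the parent."""
    unpacked = sum(subevent_values)
    if abs(unpacked - parent_value) > tol:
        return unpacked  # inconsistency noticed: defer to the unpacking
    return parent_value

# The bullet's example: 4.2 conflicts with the unpacked sum, so the
# parent's evaluation is reset toward 4.5.
print(reconcile(4.2, [1.1, 3.4]))
```

The interesting question, of course, is *why* the unpacked version should win; the code only records the direction of the update, not a justification for it.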

If you’re like me, you might have read some of those and immediately thought of a Bayesian model of the inference going on. Keep in mind that this is supposed to be about noticing actual inconsistencies, and what we want is a model which deals directly with that. It might turn out to be a kind of meta-Bayesian model, where we approximate a Bayesian superintelligence by way of a much more bounded Bayesian view which attempts to reason about what a truly consistent view would look like. But don’t fool yourself into thinking a standard one-level Bayesian picture is sufficient, just because you can look at some of the bullet points and imagine a Bayesian way to handle it.

It would be quite interesting to have a general “theory of becoming rational” which had something to say about how we make decisions in cases such as those I’ve listed.

Logical Uncertainty

Obviously, I’m pointing in the general direction of logical uncertainty and bounded notions of rationality (i.e., notions of rationality which can apply to bounded agents). Particularly in the “noticing inconsistencies” framing, it sounds like this might entirely reduce to logical uncertainty. But I want to point at the broader problem, because (1) an example of this might not immediately look like a problem of logical uncertainty; (2) a theory of logical uncertainty, such as logical induction, might not entirely solve this problem; (3) logical uncertainty is an epistemic issue, whereas this problem applies to instrumental rationality as well; (4) even setting all that aside, it’s worth pointing at the distinction between ideal notions of rationality and applicable notions of rationality as a point in itself.

The Ideal Fades into the Background

So far, it sounds like my suggestion is that we should keep our idealized notions of rationality, but also develop “theories of approximation” which tell us what it means to approach the ideals in a good way vs a bad way. However, I want to point out an interesting phenomenon: sometimes, when you get a really good notion of “approximation”, the idealized notion of rationality you started with fades into the background.

Example 1: Logical Induction

Start with the Demski prior, which was supposed to be an idealized notion of rational belief much like the Solomonoff prior, but built for logic rather than computation. I designed the prior with approximability in mind, because I thought it should be a constraint on a normative theory that we actually be able to approximate the ideal. Scott and Benya modified the Demski prior to make it nicer, and noticed that when you do so, the approximation itself has a desirable property. The line of research called asymptotic logical uncertainty focused on such “good properties of approximations”, eventually leading to logical induction.

A logical inductor is a sequence of improving belief assignments. The beliefs do converge to a probability distribution, which will have some resemblance to the modified Demski prior (and to Solomonoff’s prior). However, the concept of logical induction gives a much richer theory of rationality, in which this limit plays a minor role. Furthermore, the theory of logical induction comes much closer to applying to realistic agents than “rational agents approximate Bayesian reasoning with [some prior]”.

Example 2: Game-Theoretic Equilibria vs MAL

Game-theoretic equilibrium concepts, such as Nash equilibrium and correlated equilibrium, provide a rationality concept for games: rational agents who know each other to be rational are supposed to be in equilibrium with each other. However, most games have multiple Nash equilibria, and even more correlated equilibria. How is a rational agent supposed to decide which of these to play? Assuming only the rationality of the other players is not enough to choose one equilibrium over another. If rational agents play an equilibrium, how do they get there?
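
The multiplicity is easy to exhibit concretely. As a sketch (my own illustration, using the standard “Battle of the Sexes” payoffs), a brute-force search finds two pure Nash equilibria, and the players’ rationality alone gives no way to choose between them:

```python
# "Battle of the Sexes" payoffs: entry (r, c) holds
# (row player's payoff, column player's payoff).
GAME = [[(2, 1), (0, 0)],
        [(0, 0), (1, 2)]]

def pure_nash(game):
    """Enumerate pure-strategy Nash equilibria of a 2x2 game by
    checking that neither player can gain by deviating unilaterally."""
    eq = []
    for r in range(2):
        for c in range(2):
            row_best = all(game[r][c][0] >= game[r2][c][0] for r2 in range(2))
            col_best = all(game[r][c][1] >= game[r][c2][1] for c2 in range(2))
            if row_best and col_best:
                eq.append((r, c))
    return eq

print(pure_nash(GAME))  # [(0, 0), (1, 1)]: two equilibria, no way to pick
```

Each player prefers a different one of the two equilibria, so even full common knowledge of rationality leaves the selection problem open.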

One approach to this conundrum has been to introduce refined equilibrium concepts, which admit some Nash equilibria and not others. Trembling-hand equilibrium is one such concept. This introduces a notion of “stable” equilibria, pointing out that it is implausible that agents play “unstable” equilibria. However, while this narrows things down to a single equilibrium solution in some cases, it does not do so in all cases. Other refined equilibrium concepts may leave no equilibria for some games. To get rid of the problem, one would need an equilibrium concept which (a) leaves one and only one equilibrium for every game, and (b) follows from plausible rationality assumptions. Such things have been proposed, most prominently in Harsanyi & Selten’s A General Theory of Equilibrium Selection in Games, but so far I find them unconvincing.

A very different approach is represented by multi-agent learning (MAL), which asks the question: can agents learn to play equilibrium strategies? In this version, agents must interact over time in order to converge to equilibrium play. (Or at least, agents simulate dumber versions of each other in an effort to figure out how to play.)

It turns out that, in MAL, there are somewhat nicer stories about how agents converge to correlated equilibria than there are about converging to Nash equilibria. For example, Calibrated Learning and Correlated Equilibrium (Foster & Vohra) shows that agents with a calibrated learning property will converge to correlated equilibrium in repeated play.
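
The calibrated-learning construction itself doesn’t fit in a few lines, but a closely related learning rule does: regret matching (Hart & Mas-Colell), a different algorithm whose time-averaged regret provably shrinks toward zero (its conditional-regret variant is the one that converges to the set of correlated equilibria; the plain version reaches the coarse-correlated set). This toy sketch (rock-paper-scissors; all names and constants are mine) just checks that average regret shrinks with repeated play:

```python
import random

# Row player's payoffs for rock-paper-scissors (zero-sum:
# the column player's payoff is the negation).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching_strategy(regret):
    """Play each action with probability proportional to its
    positive cumulative regret; uniform if no positive regret."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total == 0:
        return [1.0 / 3] * 3
    return [p / total for p in positive]

def sample(dist, rng):
    x, acc = rng.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if x < acc:
            return i
    return len(dist) - 1

def average_regret(rounds=20000, seed=0):
    rng = random.Random(seed)
    regret = [[0.0] * 3, [0.0] * 3]  # per player, per action
    for _ in range(rounds):
        a = sample(regret_matching_strategy(regret[0]), rng)
        b = sample(regret_matching_strategy(regret[1]), rng)
        for act in range(3):
            # "How much better would fixed action `act` have done?"
            regret[0][act] += PAYOFF[act][b] - PAYOFF[a][b]
            regret[1][act] += -PAYOFF[a][act] + PAYOFF[a][b]
    # Worst average regret per round, shrinking like O(1/sqrt(rounds)).
    return max(max(r) for r in regret) / rounds

print(average_regret())
```

The printed quantity is the kind of “dynamic” guarantee discussed below: it says something about finite-time play, not just about a fixed point.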

These new rationality principles, which come from MAL, are then much more relevant to the design and implementation of game-playing agents than the equilibrium concepts which they support. Equilibrium concepts, such as correlated equilibria, tell you something about what agents converge to in the limit; the learning principles which let them accomplish that, however, tell you about the dynamics—what agents do at finite times, in response to non-equilibrium situations. This is more relevant to agents “on the ground”, as it were.

And, to the extent that requirements like calibrated learning are NOT computationally feasible, this weakens our trust in equilibrium concepts as a rationality notion—if there isn’t a plausible story about how (bounded-) rational agents can get into equilibrium, why should we think of equilibrium as rational?

So, we see that the bounded, dynamic notions of rationality are more fundamental than the unbounded, fixed-point style equilibrium concepts: if we want to deal with realistic agents, we should be more willing to adjust/abandon our equilibrium concepts in response to how nice the MAL story is, than vice versa.

Counterexample: Complete Class Theorems

This doesn’t always happen. The complete class theorems give a picture of rationality in which we start with the ability and willingness to take Pareto improvements. Given this, we end up with an agent being classically rational: having a probability distribution, and choosing actions which maximize expected utility.

Given this argument, we become more confident in the usefulness of probability distributions. But why should this be the conclusion? A different way of looking at the argument could be: we don’t need to think about probability distributions. All we need to think about is Pareto improvements.
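
The contrast can be made concrete with a small predicate (my own sketch; the payoff vectors, indexed by state, are hypothetical). An agent that only ever needs to recognize Pareto improvements never has to weigh one state against another—and weighing states against each other is exactly where probabilities would enter:

```python
def pareto_improves(new, old):
    """True when policy `new` does at least as well as `old` in every
    state and strictly better in at least one (payoffs indexed by state)."""
    return (all(n >= o for n, o in zip(new, old))
            and any(n > o for n, o in zip(new, old)))

print(pareto_improves([3, 5, 2], [3, 4, 2]))  # True: a pure improvement
# False: states trade off, so ranking these policies requires
# weighing states against each other, i.e., something like probabilities.
print(pareto_improves([3, 5, 1], [3, 4, 2]))
```
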

Somehow, probability still seems very useful to think about. We don’t switch to the “dynamic” view of agents who haven’t yet constructed probabilistic beliefs, taking Pareto improvements on their way to reflective consistency. This just doesn’t seem like a realistic view of bounded agents. Yes, bounded agents are still engaged in a search for the best policy, which may involve finding new strategies which are strictly better along every relevant dimension. But bounded agency also involves making trade-offs, when no Pareto improvement can be found. This necessitates thinking of probabilities. So it doesn’t seem like we want to erase that from our picture of practical agency.

Perhaps this is because, in some sense, the complete class theorems are not very good—they don’t really end up explaining a less basic thing in terms of a more basic thing. After all, when can you realistically find a pure Pareto improvement?


I’ve suggested that we move toward notions of rationality that are fundamentally bounded (applying to agents who lack the resources to be rational in more classical senses) and dynamic (fundamentally involving learning, rather than assuming the agent already has a good picture of the world; breaking down equilibrium concepts such as those in game theory, and instead looking for the dynamics which can converge to equilibrium).

This gives us a picture of “rationality” which is more like “optimality” in computer science: in computer science, it’s more typical to come up with a notion of optimality which actually applies to some algorithms. For example, “optimal sorting algorithm” usually refers to big-O optimality, and many sorting algorithms are optimal in that sense. Similarly, in machine learning, regret bounds are mainly interesting when they are achievable by some algorithm. (Although, it could be interesting to know a lower bound on achievable regret guarantees.)
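
For instance, here is a minimal merge sort, one of the many algorithms that are optimal in the big-O sense just described, matching the Ω(n log n) comparison-sort lower bound:

```python
def merge_sort(xs):
    """Sort a list in O(n log n) comparisons: split, recurse, merge."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 3, 8, 1, 2]))  # [1, 2, 3, 5, 8]
```

The notion of optimality here is one that real, implementable algorithms fully satisfy—exactly the flavor of normative standard being asked for.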

Why should notions of rationality be so far from notions of optimality? Can we take a more computer-science flavored approach to rationality?

Barring that, it should at least be of critical importance to investigate in what sense idealized notions of rationality are normative principles for bounded agents like us. What constitutes cargo-culting rationality, vs really becoming more rational? What kind of adjustments should an irrational agent make when irrationality is noticed?