Decision theory: Why we need to reduce “could”, “would”, “should”

(This is the second post in a planned sequence.)

Let's say you're building an artificial intelligence named Bob. You'd like Bob to sally forth and win many utilons on your behalf. How should you build him? More specifically, should you build Bob to have a world-model in which there are many different actions he "could" take, each of which "would" give him particular expected results? (Note that e.g. evolution, rivers, and thermostats do not have explicit "could"/"would"/"should" models in this sense—and while evolution, rivers, and thermostats are all varying degrees of stupid, they all still accomplish specific sorts of world-changes. One might imagine more powerful agents that also simply take useful actions, without claimed "could"s and "woulds".)

My aim in this post is simply to draw attention to "could", "would", and "should", as concepts folk intuition fails to understand, but that seem nevertheless to do something important for real-world agents. If we want to build Bob, we may well need to figure out what the concepts "could" and "would" can do for him.*

Introducing Could/Would/Should agents:

Let a Could/Would/Should Algorithm, or CSA for short, be any algorithm that chooses its actions by considering a list of alternatives, estimating the payoff it "would" get "if" it took each given action, and choosing the action from which it expects the highest payoff.

That is: let us say that to specify a CSA, we need to specify:

  1. A list of alternatives a_1, a_2, …, a_n that are primitively labeled as actions it "could" take;

  2. For each alternative a_1 through a_n, an expected payoff U(a_i) that is labeled as what "would" happen if the CSA takes that alternative.


To be a CSA, the algorithm must search through the payoffs for each action, and must then trigger the agent to actually take the action a_i for which its labeled U(a_i) is maximal.
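To make the definition concrete, here is a minimal sketch in Python. The names (CSA, alternatives, payoff) are mine, purely for illustration: a CSA in this sense is nothing more than a primitively labeled list of "could"s, a table of labeled "would"s U(a_i), and an argmax over them.

```python
# A minimal sketch of a CSA as defined above. The class and variable
# names are illustrative assumptions, not anything from the post; the
# "could" list and the "would" table are simply handed in as data.
from typing import Dict, List


class CSA:
    """Picks, from a primitively labeled list of alternatives, the one
    whose labeled expected payoff U(a_i) is maximal."""

    def __init__(self, alternatives: List[str], payoff: Dict[str, float]):
        self.alternatives = alternatives  # actions the agent "could" take
        self.payoff = payoff              # U(a_i): what "would" happen for each

    def act(self) -> str:
        # Search through the labeled payoffs and take the argmax action.
        return max(self.alternatives, key=lambda a: self.payoff[a])
```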



Note that we can, by this definition of "CSA", create a CSA around any made-up list of "alternative actions" and corresponding "expected payoffs".
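Continuing the illustrative sketch above: because the definition places no constraint on where the labels come from, even a CSA built from invented labels counts.

```python
# The labels below are deliberately arbitrary; by the definition above,
# this still qualifies as a CSA, however useless its "woulds" may be.
made_up = CSA(
    alternatives=["hop on one foot", "recite the alphabet"],
    payoff={"hop on one foot": 7.0, "recite the alphabet": 3.0},
)
assert made_up.act() == "hop on one foot"
```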


The puzzle is that CSAs are common enough to suggest that they're useful—but it isn't clear why CSAs are useful, or quite which kinds of CSAs are useful in which ways. To spell out the puzzle:

Puzzle piece 1: CSAs are common. Humans, some (though far from all) other animals, and many human-created decision-making programs (game-playing programs, scheduling software, etc.) have CSA-like structure. That is, we consider "alternatives" and act out the alternative from which we "expect" the highest payoff (at least to a first approximation). The ubiquity of approximate CSAs suggests that CSAs are in some sense useful.

Puzzle piece 2: The naïve realist model of CSAs' nature and usefulness doesn't work as an explanation.

That is: many people find CSAs' usefulness unsurprising, because they imagine a Physically Irreducible Choice Point, where an agent faces Real Options; by thinking hard, and choosing the Option that looks best, naïve realists figure that you can get the best-looking option (instead of one of those other options that you Really Could have gotten).

But CSAs, like other agents, are deterministic physical systems. Each CSA executes a single sequence of physical movements, some of which we consider "examining alternatives", and some of which we consider "taking an action". It isn't clear why or in what sense such systems do better than deterministic systems built in some other way.

Puzzle piece 3: Real CSAs are presumably not built from arbitrarily labeled "coulds" and "woulds"—presumably, the "woulds" that humans and others use, when considering e.g. which chess move to make, have useful properties. But it isn't clear what those properties are, or how to build an algorithm to compute "woulds" with the desired properties.

Puzzle piece 4: On their face, all calculations of counterfactual payoffs ("woulds") involve asking questions about impossible worlds. It is not clear how to interpret such questions.

Determinism notwithstanding, it is tempting to interpret CSAs' "woulds"—our U(a_i)s above—as calculating what "really would" happen, if they "were" somehow able to take each given action.

But if agent X will (deterministically) choose action a_1, then when he asks what would happen "if" he takes alternative action a_2, he's asking what would happen if something impossible happens.

If X is to calculate the payoff "if he takes action a_2" as part of a causal world-model, he'll need to choose some particular meaning of "if he takes action a_2" – some meaning that allows him to combine a model of himself taking action a_2 with the rest of his current picture of the world, without allowing predictions like "if I take action a_2, then the laws of physics will have been broken".
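As an illustration only (not presented as the answer this sequence is building toward), here is one candidate meaning in the spirit of surgery on a causal model: copy the world-model, overwrite the action variable directly rather than deriving it, and propagate the rest of the model forward. The toy variables and payoffs below are invented for the sketch.

```python
# A toy sketch of one candidate meaning of "if I take a_2": surgically
# set the action node in a copy of the world-model, even though the
# model itself predicts the agent will take a_1, then compute the
# consequences forward. All variables and numbers here are invented.
def payoff_if(action: str) -> float:
    # Current (deterministic) world-model: it in fact predicts "a_1".
    world = {"weather": "rain", "agent_action": "a_1"}
    # The surgery: overwrite the action node instead of inferring it,
    # so nothing elsewhere in the model has to predict broken physics.
    world["agent_action"] = action
    # Propagate the rest of the toy causal model forward.
    if world["agent_action"] == "a_1":
        return 1.0 if world["weather"] == "rain" else 2.0
    return 3.0 if world["weather"] == "rain" else 0.0


print(payoff_if("a_1"), payoff_if("a_2"))  # 1.0 3.0
```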

We are left with several questions:

  • Just what are humans, and other common CSAs, calculating when we imagine what "would" happen "if" we took actions we won't take?

  • In what sense, and in what environments, are such "would" calculations useful? Or, if "would" calculations are not useful in any reasonable sense, how did CSAs come to be so common?

  • Is there more than one natural way to calculate these counterfactual "would"s? If so, what are the alternatives, and which alternative works best?


*A draft-reader suggested to me that this question is poorly motivated: what other kinds of agents could there be, besides "could"/"would"/"should" agents? Also, how could modeling the world in terms of "could" and "would" not be useful to the agent?

My impression is that there is a sort of gap in philosophical wariness here that is a bit difficult to bridge, but that one must bridge if one is to think well about AI design. I'll try an analogy. In my experience, beginning math students simply expect their nice-sounding procedures to work. For example, they expect to be able to add fractions straight across. When you tell them they can't, they demand to know why they can't, as though most nice-sounding theorems are true, and if you want to claim that one isn't, the burden of proof is on you. It is only after students gain considerable mathematical sophistication (or experience getting burned by expectations that don't pan out) that they place the burden of proof on the theorems, assume theorems false or unusable until proven true, and try to actively construct and prove their mathematical worlds.

Reaching toward AI theory is similar. If you don't understand how to reduce a concept—how to build circuits that compute that concept, and what exact positive results will follow from that concept and will be absent in agents which don't implement it—you need to keep analyzing. You need to be suspicious of anything you can't derive for yourself, from scratch. Otherwise, even if there is something of the sort that is useful in the specific context of your head (e.g., some sort of "could"s and "would"s that do you good), your attempt to re-create something similar-looking in an AI may well lose the usefulness. You get cargo cult could/woulds.

+ Thanks to Z M Davis for the above gorgeous diagram.