Decision theory: Why we need to reduce “could”, “would”, “should”

(This is the second post in a planned sequence.)

Let’s say you’re building an artificial intelligence named Bob. You’d like Bob to sally forth and win many utilons on your behalf. How should you build him? More specifically, should you build Bob to have a world-model in which there are many different actions he “could” take, each of which “would” give him particular expected results? (Note that e.g. evolution, rivers, and thermostats do not have explicit “could”/“would”/“should” models in this sense—and while evolution, rivers, and thermostats are all varying degrees of stupid, they all still accomplish specific sorts of world-changes. One might imagine more powerful agents that also simply take useful actions, without claimed “could”s and “woulds”.)

My aim in this post is simply to draw attention to “could”, “would”, and “should”, as concepts that folk intuition fails to understand, but that nevertheless seem to do something important for real-world agents. If we want to build Bob, we may well need to figure out what the concepts “could” and “would” can do for him.*

Introducing Could/Would/Should agents:

Let a Could/Would/Should Algorithm, or CSA for short, be any algorithm that chooses its actions by considering a list of alternatives, estimating the payoff it “would” get “if” it took each given action, and choosing the action from which it expects the highest payoff.

That is, to specify a CSA, we need:

  1. A list of alternatives a_1, a_2, …, a_n that are primitively labeled as actions it “could” take;

  2. For each alternative a_1 through a_n, an expected payoff U(a_i) that is labeled as what “would” happen if the CSA takes that alternative.


To be a CSA, the algorithm must then search through the payoffs for each action, and trigger the agent to actually take the action a_i for which its labeled U(a_i) is maximal.
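
To make the definition concrete, here is a minimal sketch of such an algorithm in Python. The function name, the type annotations, and the packaging as a single function are my own illustration, not part of the definition above:

```python
# A minimal Could/Would/Should Algorithm (CSA) sketch: pick, from a primitive
# list of "could"-labeled alternatives, the one whose "would"-labeled expected
# payoff U(a_i) is maximal.
from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")

def csa_choose(
    alternatives: Sequence[Action],              # actions the agent "could" take
    expected_payoff: Callable[[Action], float],  # U(a_i): what "would" happen if it took a_i
) -> Action:
    """Return the alternative whose labeled expected payoff is maximal."""
    return max(alternatives, key=expected_payoff)
```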

[Diagram by Z M Davis.]
Note that we can, by this definition of “CSA”, create a CSA around any made-up list of “alternative actions” and of corresponding “expected payoffs”.
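
For instance, continuing the sketch above with made-up labels and payoff numbers of my own:

```python
# Any labels at all will do; by the definition, this still counts as a CSA.
made_up_payoffs = {"sing": 7.0, "dig a hole": -3.0, "recite pi": 42.0}
chosen = csa_choose(list(made_up_payoffs), lambda a: made_up_payoffs[a])
print(chosen)  # "recite pi" -- the alternative with the maximal labeled payoff
```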


The puzzle is that CSAs are common enough to suggest that they’re useful—but it isn’t clear why CSAs are useful, or quite which kinds of CSAs are useful in which ways. To spell out the puzzle:

Puzzle piece 1: CSAs are common. Humans, some (though far from all) other animals, and many human-created decision-making programs (game-playing programs, scheduling software, etc.), have CSA-like structure. That is, we consider “alternatives” and act out the alternative from which we “expect” the highest payoff (at least to a first approximation). The ubiquity of approximate CSAs suggests that CSAs are in some sense useful.

Puzzle piece 2: The naïve realist model of CSAs’ nature and usefulness doesn’t work as an explanation.

That is: many people find CSAs’ usefulness unsurprising, because they imagine a Physically Irreducible Choice Point, where an agent faces Real Options; by thinking hard, and choosing the Option that looks best, naïve realists figure that you can get the best-looking option (instead of one of those other options that you Really Could have gotten).

But CSAs, like other agents, are deterministic physical systems. Each CSA executes a single sequence of physical movements, some of which we consider “examining alternatives”, and some of which we consider “taking an action”. It isn’t clear why or in what sense such systems do better than deterministic systems built in some other way.

Puzzle piece 3: Real CSAs are presumably not built from arbitrarily labeled “coulds” and “woulds”—presumably, the “woulds” that humans and others use, when considering e.g. which chess move to make, have useful properties. But it isn’t clear what those properties are, or how to build an algorithm to compute “woulds” with the desired properties.

Puzzle piece 4: On their face, all calculations of counterfactual payoffs (“woulds”) involve asking questions about impossible worlds. It is not clear how to interpret such questions.

Determinism notwithstanding, it is tempting to interpret CSAs’ “woulds”—our U(a_i)s above—as calculating what “really would” happen, if they “were” somehow able to take each given action.

But if agent X will (deterministically) choose action a_1, then when he asks what would happen “if” he takes alternative action a_2, he’s asking what would happen if something impossible happens.

If X is to calculate the payoff “if he takes action a_2” as part of a causal world-model, he’ll need to choose some particular meaning of “if he takes action a_2”: some meaning that allows him to combine a model of himself taking action a_2 with the rest of his current picture of the world, without allowing predictions like “if I take action a_2, then the laws of physics will have been broken”.
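
To make the difficulty concrete, here is a toy sketch of what goes wrong if “if he takes action a_2” is read as ordinary conditioning inside a world-model that contains a deterministic model of X. The names x_policy, PAYOFF, and naive_would, and the payoff numbers, are my own hypothetical illustration:

```python
# Toy model for puzzle piece 4 (illustrative only): X's world-model includes
# a deterministic model of X itself, which will in fact take a_1.
def x_policy() -> str:
    """X's own decision procedure, as represented inside X's world-model."""
    return "a_1"

PAYOFF = {"a_1": 5.0, "a_2": 10.0}  # made-up payoff numbers

def naive_would(action: str) -> float:
    """Read "what would happen if I took this action?" as ordinary conditioning:
    E[payoff | action] = E[payoff * 1{action}] / P(action)."""
    p_action = 1.0 if x_policy() == action else 0.0   # the model says P(a_2) = 0
    joint = PAYOFF[action] * p_action                  # E[payoff * 1{action}]
    return joint / p_action                            # 0/0 for the action X won't take

print(naive_would("a_1"))  # 5.0
# naive_would("a_2") raises ZeroDivisionError: the conditional is undefined,
# so some other reading of "if" is needed to supply U(a_2).
```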

We are left with several questions:

  • Just what are humans, and other common CSAs, calculating when we imagine what “would” happen “if” we took actions we won’t take?

  • In what sense, and in what environments, are such “would” calculations useful? Or, if “would” calculations are not useful in any reasonable sense, how did CSAs come to be so common?

  • Is there more than one natural way to calculate these counterfactual “would”s? If so, what are the alternatives, and which alternative works best?


*A draft-reader suggested to me that this question is poorly motivated: what other kinds of agents could there be, besides “could”/“would”/“should” agents? Also, how could modeling the world in terms of “could” and “would” not be useful to the agent?

My impression is that there is a sort of gap in philosophical wariness here that is a bit difficult to bridge, but that one must bridge if one is to think well about AI design. I’ll try an analogy. In my experience, beginning math students simply expect their nice-sounding procedures to work. For example, they expect to be able to add fractions straight across. When you tell them they can’t, they demand to know why they can’t, as though most nice-sounding theorems are true, and if you want to claim that one isn’t, the burden of proof is on you. It is only after students gain considerable mathematical sophistication (or experience getting burned by expectations that don’t pan out) that they place the burden of proof on the theorems, assume theorems false or unusable until proven true, and try to actively construct and prove their mathematical worlds.

Reaching toward AI theory is similar. If you don’t understand how to reduce a concept—how to build circuits that compute that concept, and what exact positive results follow from it that are absent in agents that don’t implement it—you need to keep analyzing. You need to be suspicious of anything you can’t derive for yourself, from scratch. Otherwise, even if there is something of the sort that is useful in the specific context of your head (e.g., some sort of “could”s and “would”s that do you good), your attempt to re-create something similar-looking in an AI may well lose the usefulness. You get cargo cult could/woulds.

+ Thanks to Z M Davis for the above gorgeous diagram.