Decision theory is not policy theory is not agent theory


There are several competing proposed theories of rationality. Some of them (causal decision theory (CDT), evidential decision theory (EDT), functional decision theory (FDT)) are referred to as decision theories, and there is endless debate over which decision theory is best. Then there is the long-prophesied, still-not-actually-discovered theory of embedded agency. Interestingly, everyone seems to pretty much agree that once this theory exists, it will be the actually-best theory of rationality, but they continue to argue about the best decision theory in the meantime.

I think that these theories of rationality really form a kind of ladder of increasing simplification. "Embedded agency" happens to be the lowest (say, deepest) rung on the ladder, making it the hardest to analyze. Causal decision theory is the top rung, making a lot of simplifications, and EDT is in some sense in between. This means that arguing over whether CDT or EDT is "better" is analogous to arguing about whether EDT or embedded agency is better: the answer involves a lot of complications and depends on what you want from your theory.

I will call the top rung simply “Decision Theory” because I believe that a theory of the value of decisions (reasonably equivalent to CDT) is the furthest simplification.

Decision Theory

Decision theory, with its origins arguably dating back to the 1700s, was invented by a host of mathematicians and economists, most notably von Neumann and Morgenstern (in the 1900s), to help humans win at gambling and make other choices. Because it's hard to mathematically formalize the problem "how should I human?" (or for that matter, "how can I build the most useful A.I.?", though no one was thinking about that in the 1700s), this theory ignores the internal details of the decision maker, walls them off from the rest of the universe, and asks what decision it would be best for them to make. This theory has been incredibly influential throughout economics, finance, A.I., and casino design. I wouldn't even know how to get started quantifying its effects on government policy, but it's certainly the basis for a mathematical justification of capitalism, an institution which has had some impact. All of reinforcement learning (RL) is based on decision theory. The framework is sufficiently general to encompass many problems we care about (winning at Go, driving a car, etc.) while still being sufficiently clean that some things have even been proven about it, including properties of some useful algorithms (though the general POMDP case is undecidable, and NP-hard cases pop up everywhere).

Of course, decision theory fails to take into account the fact that humans are actually a part of our universe. This makes it harder to talk about anthropic effects, or to account for the risk that one's later decisions won't be as good if one is sleep-deprived or poisoned (the latter two become a problem when decision theory is extended naively to the sequential case, which assumes one will have full control over one's later decisions). But all in all, decision theory has been a very successful paradigm.
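To make the framing concrete, here is a minimal sketch of the decision-theoretic setup (my own illustrative toy example, not anything from the literature): the decision maker is treated as external to the environment, each available action is scored by its expected utility under the agent's beliefs, and the highest-scoring action is chosen. The actions, probabilities, and payoffs below are arbitrary placeholders.

```python
# Minimal sketch of the decision-theoretic framing: the agent is walled off
# from the environment, and we simply ask which action maximizes expected
# utility under the agent's beliefs. (Illustrative toy example only.)

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

def best_action(beliefs):
    """beliefs: dict mapping each available action to its outcome distribution."""
    return max(beliefs, key=lambda a: expected_utility(beliefs[a]))

# A toy gamble: stake $10 on a slightly favorable coin flip, or keep the money.
beliefs = {
    "bet":  [(0.6, +10.0), (0.4, -10.0)],  # expected value +2
    "pass": [(1.0,   0.0)],                # expected value 0
}
print(best_action(beliefs))  # -> "bet"
```

Nothing here knows or cares what the decision maker is made of; that is exactly the simplification the top rung buys.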

Notable approaches: causal decision theory, sequential causal decision theory, and perhaps sequential action evidential decision theory

General Solution: Arguably AIXI

Policy Theory

I just invented the name Policy Theory. It corresponds to the case where one wants to design the best policy, under the assumption that some details about the policy may be "public information" or may leak out and affect the environment in some way other than through its decisions. For instance, in Newcomb's Problem, the entity Omega has some knowledge of the "player's" policy, allowing Omega to simulate it. This family of theories, including EDT and FDT, was invented in the late 1900s by philosophers and philosophically inclined computer scientists. Though there doesn't seem to be any overarching practical motivation for the area, Eliezer Yudkowsky has worked on it in hopes of aligning an A.I. In fact, speculative A.I. design seems to be the most practical purpose of this theory. It can be formalized in toy models, but I am not aware of any useful applications of this theory in industry. It has been suggested that humans should apply this theory instead of causal decision theory, because other humans have some limited ability to read our policies; for instance, one's spouse may be able to tell if one is dishonest or unfaithful. However, the lack of clearly useful applications makes me doubt how pressing this really is.
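Newcomb's Problem makes the split concrete. The sketch below is my own illustration, not anyone's official formalization: the payoffs are the standard ones ($1,000,000 in the opaque box if Omega predicts one-boxing, $1,000 always in the transparent box), and the 99% predictor accuracy is an arbitrary placeholder. It only shows how the recommendation flips when the environment is allowed to see the policy.

```python
# Newcomb's Problem sketch: how the recommendation changes when the environment
# (Omega) can read your policy. Payoffs are the standard ones; the 0.99
# predictor accuracy is an arbitrary illustrative choice.

BIG, SMALL = 1_000_000, 1_000
ACCURACY = 0.99  # probability Omega predicts your choice correctly

def decision_theory_value(action, prob_big_box_full):
    """Decision-level framing: box contents are fixed before you act."""
    base = prob_big_box_full * BIG
    return base + (SMALL if action == "two-box" else 0)

def policy_theory_value(policy):
    """Policy-level framing: Omega's prediction tracks the policy itself."""
    if policy == "one-box":
        return ACCURACY * BIG
    return (1 - ACCURACY) * BIG + SMALL

# Decision framing: whatever the contents, two-boxing adds $1,000.
for p in (0.0, 0.5, 1.0):
    assert decision_theory_value("two-box", p) > decision_theory_value("one-box", p)

# Policy framing: the one-boxing policy is worth far more.
print(policy_theory_value("one-box"))   # 990000.0
print(policy_theory_value("two-box"))   # 11000.0
```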

Orseau and Ring proposed an AIXI-like general solution to Policy Theory, for an environment interacting with an agent's policy in arbitrary ways, in their paper "Space-Time Embedded Intelligence." However, this formalization doesn't seem to have led to any approximation algorithms, and it seems conceivable that other "Policy Theorists" would prefer other formalizations, perhaps with more restrictions on the interaction between environment and policy.

Notable approaches: evidential decision theory, sequential policy evidential decision theory, functional decision theory...

General Solution: I don’t think one has been invented.

Agent Theory

I chose the name Agent Theory for this group of approaches because it is as general as the assumptions these theories make: an agent is just something that has an effect. In Agent Theory we focus on the design of a useful agent embedded, to some extent, in our universe. Within LessWrong this is usually called embedded agency and taken to describe how an agent that is part of its universe should act in general. Stuart Russell has a slightly more A.I.-centered approach called bounded optimality, which asks explicitly what is the best program to implement on a particular machine (arguably this provides some simplification by separating the machine from the rest of the environment, but it still acknowledges that the program must run on hardware). Agent theory was invented almost entirely to talk about building an agent, and it's discussed almost entirely by A.I. researchers. There doesn't seem to be a general formal theory at all (though Orseau and Ring took a swing at this too), so I don't think it makes sense to talk about applications in industry. It is true, however, that some "embeddedness"-themed research directions exist, such as embodied intelligence in robotics. Agent theory doesn't really seem to be focused on how humans should act at all, except for making the observation that it isn't wise to drop an anvil on your head, since your head won't work for making decisions after that. Obviously this isn't a useful theoretical result, because everyone knows not to drop an anvil on their head. In fact, humans don't have fine-grained write access to our own brains, so we aren't capable of applying bounded optimality results at all. The advantage of Agent Theory is that we really are beings embedded inside a universe, trying to build other beings inside a universe.

Notable approaches: embedded agency, bounded optimality

General Solution: Not in the least, no.

Conclusions

Some conclusions are obvious from the above list. Decision theory is a theory of the best decisions, Policy Theory of the best policies (which make decisions), and Agent Theory of the best agents (which implement policies).

The higher-rung theories tend to lay out normative decision making for humans, entities whose policies are already a little fixed by habit and whose programs are already mostly baked in by evolution. A theory of "the best possible agent" for humans might suggest a complete rewiring of the brain, which is not necessarily of any use to an existing human.

The lower rungs tend to focus on the optimal way to build an A.I.

The higher rungs are more simplified, and probably as a result have more elegant theory AND more practical results.

Now the debate over whether CDT or EDT is a better decision theory looks silly on many levels. CDT is a decision theory but EDT is a policy theory; EDT makes fewer assumptions, so it may be closer to reality, but it is much harder to use for anything. I think some of the confusion has arisen because EDT has two extensions to the sequential case (see "Sequential Extensions of Causal and Evidential Decision Theory" by Everitt, Leike, and Hutter), one of which is more of a decision theory and the other more of a policy theory. Ultimately, assuming that humans can in fact choose our policies (which is questionable), the "better" of CDT and EDT should be adjudicated by the more fundamental Agent Theory. Though EDT may seem to be closer to Agent Theory since it is further down the ladder, its assumptions actually seem pretty far-fetched (why can the environment see the policy but not the program?) and it's conceivably worse than CDT anyway, particularly if further arbitrary choices are made about how the environment uses the policy. Also, if the environment only runs simulations of the policy, CDT and EDT seem to collapse into each other; a causal decision theorist with reason to believe they may be in a simulation acts like an evidential decision theorist.

Incidentally, I would one-box in Newcomb's problem NOT because I endorse EDT but because there seems to be a 50% chance I'm in a simulation. If I knew I weren't in a simulation, I would two-box. But if Omega is capable of reading my neurons directly to determine whether I would two-box, considerations of simulation no longer apply. In a universe with many Omegas floating around presenting Newcomb's Problem, when one is designing an agent one should design it to one-box. But without experience with Omega, it's really unclear whether Omega would reward one pattern of neurons or another, which is why Agent Theory is so hard. In other words, the solution to Newcomb's problem really depends on what assumptions you are making. But practically speaking, one-box.
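The simulation argument above can be made into explicit arithmetic. The sketch below is one set of assumptions made concrete, not a definitive analysis: the simulated copy's choice causally fixes the opaque box's contents for the real player, both copies care about the real player's winnings, the real player (being a copy of me) is predicted to make the same choice I do, and if I am the real player the contents are already fixed at some probability q of being full, independent of my current choice. Under those assumptions, purely causal reasoning with 50% simulation credence favors one-boxing for every value of q.

```python
# Sketch of the "50% chance I'm the simulation" argument for one-boxing under
# purely causal reasoning. Assumptions (mine, for illustration): the simulated
# copy's choice causally fixes the opaque box's contents; both copies care only
# about the real player's winnings; the real player, being a copy of me, is
# predicted (not caused) to choose the same way I do; and if I'm the real
# player the contents are fixed at probability q of being full.

BIG, SMALL = 1_000_000, 1_000
P_SIM = 0.5  # credence that I am the copy Omega is simulating

def causal_ev(action, q):
    """Expected real-player winnings of `action`, given q = P(box full | I'm real)."""
    if action == "one-box":
        as_sim  = BIG              # my choice causes the box to be filled
        as_real = q * BIG          # contents already fixed; I take only the opaque box
    else:  # "two-box"
        as_sim  = SMALL            # my choice causes the box to be empty
        as_real = q * BIG + SMALL  # contents already fixed; I take both boxes
    return P_SIM * as_sim + (1 - P_SIM) * as_real

for q in (0.0, 0.5, 1.0):
    print(q, causal_ev("one-box", q) - causal_ev("two-box", q))
    # difference is +499,000 for every q, so one-boxing wins
```

If the simulation credence drops to zero (Omega reads neurons directly instead of running me), the first branch disappears and the calculation reverts to the ordinary two-boxing dominance argument, which is the point made above.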

The problem with pursuing Agent Theory is that “building the best possible thing” is very thorny. It seems likely to depend somewhat on the details of our universe, since that is after all where we are building stuff, so a solution as general as AIXI looks unlikely. Also, the best possible “Agent” to build isn’t even guaranteed to be smart. Averaged out over all universes, maybe the best thing to build is nothing. If the world is about to flood, maybe the best thing to build is a boat. Studying Agent Theory is not the same as studying Decision Theory, and any lessons learned won’t necessarily be useful for human applied rationality.

I expect that most of the time, the assumptions of Decision Theory are pretty good across universes similar to ours. So if you wanted to build an A.G.I., it would actually be a good idea to start with the simplifying assumptions of decision theory and then make corrections as necessary for the flaws in those assumptions. In fact, I think it is impossible to make progress in either A.G.I. design or A.I. alignment without leveraging some simplifying assumptions.