Crystal Healing — or the Origins of Expected Utility Maximizers
(Note: John discusses similar ideas here. We drafted this before he published his post, so some of the concepts might jar if you read it with his framing in mind.)
Traditionally, the focus in Agent Foundations has been on characterizing ideal agents, often via coherence theorems stating that an agent satisfying certain conditions that capture rational decision-making must behave as if it maximizes a utility function. In this post, we are not so much interested in characterizing ideal agents, at least not directly. Rather, we are interested in how true agents and not-so-true agents may be classified and taxonomized, how agents and pseudo-agents may be hierarchically aggregated and composed out of subagents, and how agents with different preferences may be formed and selected for in different training regimes. First and foremost, we are concerned with how unified goal-directed agents form from a not-quite-agentic substratum. In other words, we are interested in Selection Theorems rather than Coherence Theorems. (We take the point of view that a significant part of the content of the coherence theorems lies not so much in the theorems or rationality conditions themselves as in the money-pump arguments that are used to defend the conditions.)
This post concerns how expected utility maximizers may form from entities with incomplete preferences.
Incomplete preferences on the road to maximizing utility
The classical model of a rational agent assumes it has vNM-preferences. In particular, it assumes that the agent is complete, i.e., that for any options $A, B$, we have $A \succeq B$ or $B \succeq A$. However, in real-life agents, we often see incompleteness: a preference for default states, or maybe a path-dependent preference, or maybe a sense that a preference between two options is yet to be defined. We will leave the precise meaning of incompleteness somewhat open for the purposes of this post. The aim of this post is to understand the selection pressures that push agents towards completeness. Here are the main reasons we consider this important:
Understanding the pressures towards completeness from a plausibly incomplete state of nature can help us understand what kinds of intelligent optimization processes are likely to arise, and in which kinds of conditions / training regimes.
In particular, since something like completeness seems to be an important facet of consequentialism, understanding the pressures towards completeness helps us understand the pressures towards consequentialism.
Review of “Why Subagents”
Let’s review John Wentworth’s post “Why subagents?”. John observes that inexploitability (= not taking sure losses) is not sufficient to be a vNM expected utility maximizer. An agent with incomplete preferences can be inexploitable without being an expected utility maximizer.
Although these agents are inexploitable, they can have path-dependent preferences. If John had grown up in Manchester he’d be a United fan. If he had grown up in Liverpool he’d be cracking the skulls of Manchester United fans. We could model this as a dynamical change of preferences (“preference formation”). Alternatively, we can model this as John having incomplete preferences: if he grows up in Manchester he loves United and wouldn’t take the offer to switch to Liverpool. If he grows up in Liverpool he loves whatever the team in Liverpool is and doesn’t switch to United. In other words, incompleteness is a frame to look at preference formation.
John Wentworth observes that an agent with incomplete preferences can be seen as composed of a collection of pure vNM agents together with a default action; it changes that default action only when all vNM subagents unanimously agree. You could say that an incomplete egregore/agent is like a ‘vetocracy’ of vNM subagents.
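We can now look at the set $T$ of total orders $\preceq'$ such that $\preceq'$ extends $\preceq$, meaning that if $x \preceq y$ according to $\preceq$ then $x \preceq' y$ according to $\preceq'$. We know that $\preceq$ may be characterized as the intersection of all $\preceq'$ in $T$.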
A vetocracy is a pair of a pre-order $(X, \preceq)$ and a factorization $X \to \prod_i X_i$ into total orders $(X_i, \preceq_i)$, where we endow the product with the component-wise order: $x \preceq y$ if $x_i \preceq_i y_i$ for all $i$. We call these factors tribunes.
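To make the component-wise order concrete, here is a minimal Python sketch of a vetocracy. The option names, the two tribune rankings, and all function names are assumptions made up for illustration; they do not come from the post. Each tribune is modelled as a total order via a numeric rank, and the superagent weakly prefers one option to another only when every tribune does.

```python
from itertools import combinations

# Hypothetical options and tribune rankings (higher number = more preferred).
OPTIONS = ["A", "B", "C"]
TRIBUNES = [
    {"A": 2, "B": 1, "C": 0},  # tribune 1: A > B > C
    {"B": 2, "A": 1, "C": 0},  # tribune 2: B > A > C
]


def weakly_prefers(x, y, tribunes=TRIBUNES):
    """Component-wise order: x is at least as good as y iff every tribune agrees."""
    return all(rank[x] >= rank[y] for rank in tribunes)


def comparable(x, y, tribunes=TRIBUNES):
    """Two options are comparable if the superagent ranks them in either direction."""
    return weakly_prefers(x, y, tribunes) or weakly_prefers(y, x, tribunes)


def incomparable_pairs(options=OPTIONS, tribunes=TRIBUNES):
    """List the pairs on which the vetocracy has no preference (its incompleteness)."""
    return [(x, y) for x, y in combinations(options, 2)
            if not comparable(x, y, tribunes)]


if __name__ == "__main__":
    print(incomparable_pairs())
```

In this toy setup the tribunes agree that both A and B beat C, so those pairs are comparable, while A and B are incomparable because the tribunes disagree about them.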
In this model, a superagent is composed of subagents. Examples might be a company composed of its employees, or a market arising as the aggregation of many economic actors. Another example is an agent through time, as in the Steward of Myselves, or the human mind as seen in ‘society of mind’ models; see e.g. Multiagent Models of Mind.
Crystal Healing: how I stopped worrying and learned to accept Sure Gains 
An entity with incomplete preferences can be inexploitable (= does not take sure losses) but it generically leaves sure gains on the table.
As an example, consider Atticus the Agent. Atticus is a vetocracy made of two tribunes, one that prefers $A \succ B$ and the other that prefers $B \succ A$. Suppose it is offered (by a third party) to switch from $A$ to $B$-plus-1-dollar and then from $B$ to $A$-plus-1-dollar. If it took both trades, it would gain 2 dollars in total: a sure gain. However, since its sub-tribunes will veto, this doesn’t happen.
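The Atticus example can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the options are called A and B, each tribune values its favourite option at 10 and money at 1 per dollar, and each offered switch comes with a 1-dollar bonus. The sketch compares a vetocracy, which moves only on unanimous agreement, with a policy that accepts every offer.

```python
# Hypothetical utilities: each tribune values its favourite option at 10,
# the other option at 0, and money linearly (1 per dollar).
def tribune_1(option, dollars):
    return (10 if option == "A" else 0) + dollars


def tribune_2(option, dollars):
    return (10 if option == "B" else 0) + dollars


TRIBUNES = [tribune_1, tribune_2]

# Each trade is (option currently held, option offered, dollar bonus).
TRADES = [("A", "B", 1), ("B", "A", 1)]


def unanimous(current, offered, tribunes=TRIBUNES):
    """Vetocracy rule: leave the default only if no tribune is made worse off."""
    return all(u(*offered) >= u(*current) for u in tribunes)


def accept_all(current, offered):
    """Cooperative policy: the tribunes forgo their vetoes."""
    return True


def run_trades(start, trades, accept):
    """Apply each applicable trade in sequence, subject to the acceptance rule."""
    option, dollars = start
    for source, target, bonus in trades:
        if option != source:
            continue  # the offer does not apply to what is currently held
        offered = (target, dollars + bonus)
        if accept((option, dollars), offered):
            option, dollars = offered
    return option, dollars


if __name__ == "__main__":
    print(run_trades(("A", 0), TRADES, unanimous))   # vetocracy
    print(run_trades(("A", 0), TRADES, accept_all))  # cooperating tribunes
```

Under these assumed numbers the vetocracy stays at A with no money, while the cooperative policy ends at A with 2 dollars. Note that `unanimous(("A", 0), ("A", 2))` is true: both tribunes would approve the bundled trade, even though the first step alone gets vetoed.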
If we could get the tribunes to cooperate with one another, the superagent would learn to accept the sure gain and become more complete. Moreover, once complete, it is subject to money-pumps — there is now selection pressure for it to move towards becoming a goal-directed (approximate) utility maximizer.
Consider, as an example, the case where one is placed behind a veil of ignorance about the default option, and one is told that there will be a 50% chance of being placed in a world where $A$ is the default and one is offered to switch to $B$-plus-1-dollar, and a 50% chance of being placed in a world where $B$ is the default and one is offered to switch to $A$-plus-1-dollar. An analogous situation could also effectively happen without a veil of ignorance for a logical decision theorist (see the discussion of Löbian cooperation below). The strategy where both tribunes enforce their veto on the choices offered leads to an outcome distribution of $\tfrac{1}{2}A + \tfrac{1}{2}B$, whereas the strategy where both tribunes forgo their veto rights leads to the same distribution but with 1 dollar extra. Bilateral vetoing is thus Pareto dominated in this setup as well. Note the contrast with the sequential example above: there, if the agents foresee the pair of offers, they could just agree beforehand on a policy of accepting both, and this would simply make everyone better off. In this probabilistic case, there is no world in which everyone becomes better off without the vetoes, and yet there is still a Pareto sense in which an incomplete agent is leaving money on the table.
In terms of the payoffs, the above reduces to a Prisoner’s Dilemma where vetoing corresponds to defect and not vetoing corresponds to cooperate. The usual considerations of prisoner’s dilemma apply: the default is the Defect-Defect Nash equilibrium unless
the game is repeated with no horizon
there is a credible commitment device
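The Prisoner’s Dilemma structure of the veil-of-ignorance setup can be checked numerically. The following is a sketch under stated assumptions, none of which come from the post: options named A and B, a 50/50 chance for each default, each tribune valuing its favourite option at `V = 10` and money at 1 per dollar, and only the tribune that favours the default having a reason to veto the switch.

```python
from itertools import product

V = 10.0           # hypothetical value a tribune places on its favourite option
BONUS = 1.0        # dollars attached to each switch offer
P_A_DEFAULT = 0.5  # assumed probability that A is the default


def outcome(default, veto_1, veto_2):
    """Return the (option, dollars) reached in the world with the given default."""
    if default == "A":
        # Offer: switch to B plus bonus. Only tribune 1 (prefers A) would veto.
        return ("A", 0.0) if veto_1 else ("B", BONUS)
    # Offer: switch to A plus bonus. Only tribune 2 (prefers B) would veto.
    return ("B", 0.0) if veto_2 else ("A", BONUS)


def utility(tribune, option, dollars):
    favourite = "A" if tribune == 1 else "B"
    return (V if option == favourite else 0.0) + dollars


def expected_payoffs(veto_1, veto_2):
    """Expected utility of each tribune before the default is revealed."""
    worlds = (("A", P_A_DEFAULT), ("B", 1.0 - P_A_DEFAULT))
    return tuple(
        sum(p * utility(tribune, *outcome(default, veto_1, veto_2))
            for default, p in worlds)
        for tribune in (1, 2)
    )


# Payoff table keyed by (tribune 1 vetoes?, tribune 2 vetoes?).
PAYOFFS = {(v1, v2): expected_payoffs(v1, v2)
           for v1, v2 in product((True, False), repeat=2)}

if __name__ == "__main__":
    for strategy, payoff in PAYOFFS.items():
        print(strategy, payoff)
```

With these numbers, vetoing is the dominant strategy for each tribune, yet mutual vetoing yields (5.0, 5.0) while mutual forbearance yields (6.0, 6.0). The ordering temptation > reward > punishment > sucker holds whenever the assumed `V` exceeds the bonus.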
Remark. Some takeaways: consideration number 3 suggests superagents with more homogeneous subagents are more coherent. Consideration number 4 suggests that doing ‘internal conflict resolution and trust building’ makes the superagent more coherent. This is highly reminiscent of the praxis of Internal Family Systems; see especially Kaj Sotala’s sequence on IFS and multiagent models of minds.
Remark. Usually the modal Löbian cooperation is dismissed as not relevant for real situations, but it is plausible that Löbian cooperation extends far more broadly than what is currently proved. Löbian cooperation is stronger in cases where the players resemble each other and/or have access to one another’s blueprint. This is arguably only very approximately the case between different humans, but it is much closer to being the case when we consider different versions of the same human through time, as well as subminds of that human.
In the presentation above, we have to some extent operated within a framing where completeness must come about first before any other selection pressure toward consistent preferences can apply. A framing that is likely a better fit for realistic situations is the following:
Simultaneously, there is a pressure to become complete and a pressure to become consistent. The pressure to become complete is the pressure of it being suboptimal to leave money on the table — to lose out on the many opportunities presented to one by the world. Insofar as one’s computational bounds or other constraints prevent one from ensuring that one’s preferences all hang together consistently, this pressure is balanced by a pressure not to be money-pumpable, which plausibly incentivizes one to avoid forming preferences — forming a preference comes with the danger of closing a cycle along which one can be money-pumped.
In conclusion, virtue ethics is a weakness of the will. Bold actors jump headfirst into the degenerate day trading of life—forever betwixt the siren call of FOMO and the pain of true loss.
In the vNM-theorem options are assumed to be ‘lotteries’, i.e. probability distributions on outcomes.
If one were maximally glib, one could say that all of AGI alignment can be considered a question of path-dependent preferences.
Or, more precisely, in canonical such arguments, completeness is established first, and the justification of every other coherence property relies on completeness. We do not wish to claim that completeness is necessarily a prerequisite — in fact, we will later gesture at a different perspective which places all coherence properties on more of an equal footing.
In this post we mostly think of the set of options $X$ as finite for simplicity, although this is highly restrictive; recall that the vNM theorem assumes that we start with a preference ordering on $\Delta(\Omega)$, where $\Delta(\Omega)$ is the space of probability distributions (or ‘lotteries’) on some sample space $\Omega$.
In shard “theory” subagenty things are called shards. In IFS they are called parts. Part of the IFS dogma is having your different parts talk with one another and resolve conflict and thereby gain greater coherence of the superagent (you!). Clem von Stengel suggested the name crystal healing for this process as a joke. All blame for using it goes to us.
For further reflections on the ubiquity of Löbian cooperation, see here.