Reflectively stable consequentialists are expected utility maximisers
Epistemic status: Theorem.
When we look at the world we see a repeating pattern: people, corporations, bacteria, even some AIs; we call them agents. They take in sensory data and spit out actions that tend to move the world in a certain direction. Given the fact this pattern does exist, we might then ask: but what exactly is an agent? Is it a binary property? In principle, could we take a model of the world and have a computer program identify all the agents within? Why do they act the way they do? How could, in principle, such a thing behave? If we are agents, does our membership in this class, on its own, say something about how we ought to behave?
Arguably, the closest thing we have to an answer are the so-called coherence theorems, of which there are a few. They basically say that if some entity’s behaviour is reasonable, in some sense, it must act in some particular way. And that way is expected utility maximisation: given a set of possible actions, each with an associated probability distribution over outcomes, called a lottery, we must assign a number to each outcome, its utility, and pick the action with the highest utility in expectation.
Explaining how each coherence theorem works is beyond the scope of this post (for an introduction see e.g. this or this), but to see the issues with them, let’s start with what is perhaps the most famous one around here: the Von Neumann-Morgenstern (VNM) utility theorem. It talks about preferences over pairs of lotteries. It assumes that, for all lotteries $A$, $B$ and $C$:

- (Completeness) $A \preceq B$, or $B \preceq A$, or both.
- (Transitivity) If $A \preceq B$ and $B \preceq C$, then $A \preceq C$.
- (Continuity) If $A \preceq B \preceq C$, there exists some probability $p$ such that $B \sim p A + (1 - p) C$. Keep in mind that lotteries are probability distributions, which can be treated as vectors, so expressions like $p A + (1 - p) C$ are just weighted averages.
- (Independence) $A \preceq B$ if and only if, for every lottery $C$ and every $p \in (0, 1]$, $p A + (1 - p) C \preceq p B + (1 - p) C$.

If an agent’s preferences satisfy these, then the theorem says that we can assign a utility $u(o)$ to each possible outcome $o$ such that $A \preceq B$ if and only if $\mathbb{E}_A[u] \leq \mathbb{E}_B[u]$.
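To make that conclusion concrete, here is a minimal sketch of what an expected utility maximiser does, with made-up numbers (none of this is from the theorem itself):

```python
import numpy as np

# Each row is the lottery (probability distribution over 3 outcomes)
# of one action; `utilities` assigns a made-up number to each outcome.
utilities = np.array([1.0, 0.0, 3.0])
lotteries = np.array([[0.5, 0.5, 0.0],
                      [0.2, 0.2, 0.6],
                      [0.0, 1.0, 0.0]])

expected = lotteries @ utilities         # expected utility of each action
best_action = int(np.argmax(expected))   # picks action 1 here (E[u] = 2.0)
```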
And sure, these all seem reasonable, but must agents act that way? Is there really something wrong with, e.g., an agent whose preferences over outcomes cycle, preferring $A$ when given the pair of options $(A, B)$, $B$ when given $(B, C)$, and $C$ when given $(C, A)$? The standard response is a money pump: such an agent will pay a little at each step to trade around the cycle, ending up exactly where it started but strictly poorer.
We can also generalise the previous argument to work with lotteries, and (with some caveats) that kind of reasoning (also called a Dutch book argument) can be used to try to justify the other assumptions. In any case, the other assumptions are a bit more questionable, and have been endlessly questioned.
Savage’s theorem is similar to VNM’s, and other theorems are even worse: e.g. the complete class theorem and Peterson’s direct argument both start by assuming agents have a utility function, with bigger values always being better.
Here, I present a simple and arguably stronger case for expected utility maximisation, one that doesn’t start by basically assuming that what we have is already an optimiser. Since we are interested in a subset of the things in the world (agents), we need to put some bits into specifying which things those are. The assumption is, roughly speaking, that agents, if given the choice, choose to behave like they do behave. This can be turned into two formal criteria, which I’ll call axioms:
Axiom 1: Consequentialism
Let’s define a scenario to be an ordered list of lotteries, where:
As before, each lottery is a probability distribution over the set of possible outcomes.
The set of outcomes is finite and the same for all lotteries of all scenarios.
Each index in the list of lotteries corresponds to an action an agent can take. If that action is chosen, the outcome is sampled from the corresponding lottery.
In each scenario, the number of possible actions is always the same. This number is finite, but at least two. Actions are only identified by their index.

Faced with a scenario, an agent “outputs” a probability distribution over the actions. This distribution (like all others here) is allowed to be deterministic (gives probability 1 to some action). To get an outcome, we first sample the action distribution and then the outcome distribution of the associated lottery. If we weight each lottery (compute a weighted average of their vectors) by the probability of their associated action, we get a new lottery over outcomes. We say that the agent’s action distribution leads to this outcome distribution.
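As a quick sketch of that “leads to” operation (all numbers made up):

```python
import numpy as np

# Lottery of each action (rows) over 3 outcomes (columns).
lotteries = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])
action_dist = np.array([0.25, 0.75])     # the agent's output

# Weighted average of the lotteries: the distribution this leads to.
outcome_dist = action_dist @ lotteries   # -> [0.25, 0.125, 0.625]
```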
The behaviour of an agent is called consequentialist if it is a function from the list of lotteries of whichever scenario they face to a distribution over actions. Phrased another way, given a consequentialist agent and a scenario, the probability distribution over actions is completely determined by the (ordered) list of lotteries of that scenario.
For example, an agent that always picks the same action is a consequentialist. So is one that always picks an action uniformly at random. So is one that assigns numbers to outcomes and picks the action with the expected value closest to 12. Any function to valid probability distributions over actions is allowed, no matter how nonsensical.
If the action distributions depend on anything else, the behaviour is not consequentialist.
Axiom 2: Reflective stability
To motivate the definition, consider the leader of an organisation. They usually take actions, which then lead to consequences. But today they have to take a very special kind of action: they are retiring, and need to choose their successor. Their successor, whoever it may be, will face some scenario, unknown a priori, and will also have to take a (possibly stochastic) action, that will lead to consequences.

This can be formalised into a succession setup. It consists of the leader (an agent), a set of possible successor agents, a set of scenarios (always the same finite number, but at least two) and a probability distribution over those scenarios. It works as follows:
The leader produces a probability distribution over actions, each action associated with a possible successor.
An action is sampled and a successor is chosen.
The probability distribution over scenarios (independent of the chosen successor) is sampled, and a scenario is chosen.
The successor is faced with the scenario, and produces a probability distribution over possible actions.
An action is sampled and an outcome lottery is chosen.
The outcome lottery is sampled, leading to some concrete outcome.
At each node of the tree this creates, from the leaves to the root, the outcome distribution is computed as an appropriately weighted sum of the outcome distributions of each of the following nodes. The situation the leader faces when choosing a successor is itself a scenario (following the same rules as the others, which means the number of successors is the same as the number of actions), but for clarity won’t be referred to as such.
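Here is a minimal sketch of that leaves-to-root computation, with hypothetical names and shapes of my own (the post itself fixes none of this notation):

```python
import numpy as np

def root_outcome(scenarios, acts, scenario_probs, pick):
    """scenarios[s][a]: outcome lottery of action a in scenario s.
    acts[k][s]: successor k's action distribution in scenario s.
    scenario_probs[s]: probability of scenario s.
    pick[k]: the leader's probability of choosing successor k."""
    # Outcome distribution successor k leads to in scenario s.
    led = lambda k, s: np.asarray(acts[k][s]) @ np.asarray(scenarios[s])
    # Average over scenario uncertainty for each successor...
    per_successor = [sum(p * led(k, s) for s, p in enumerate(scenario_probs))
                     for k in range(len(acts))]
    # ...then over the leader's choice of successor.
    return sum(w * out for w, out in zip(pick, per_successor))
```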
For two scenarios, three available actions and three possible outcomes, the drawing above depicts what such a succession setup could look like.
Now, there are two kinds of leaders: those that pick successors that would do the same as they would, and those that don’t. Organisations whose leadership chooses successors that behave like they do can keep on behaving in whatever way they do, and those that don’t, well… don’t. This change in behaviour could in principle even mean losing the very property of choosing successors that behave differently. In a word, the behaviour of this second kind of organisation is unstable.
We could call a leader reflectively stable iff, on all succession setups where they are the leader, they pick a successor that on each scenario produces the same action distribution as they themselves would. Although ideally reflective stability would be defined for agents in general, for the sake of clarity this definition and the ones that follow only apply to consequentialist leaders, and the successors are characterised by the action distributions they produce on each of the available scenarios.
An obvious problem with this definition is that it could be the case that there is no such successor. So we can change the criterion to: whenever there is at least one successor that on each scenario produces the same action distribution as the leader would, the leader should pick one such successor.
Then, it could also be the case that on some scenario there exists some other action distribution that leads to the exact same outcome distribution (which could happen e.g. if two different actions have the exact same consequences). It doesn’t make much sense to require that the action distributions be exactly the same, if the outcomes are identical. Therefore, we can change the criterion again to be: whenever there is at least one successor that on each scenario (for that same successor) leads to the same outcome distribution as the leader would, the leader should pick one such successor.
Finally, to deal with the case where there are multiple acceptable successors, we can change the criterion to be: whenever there is at least one successor that on each scenario (for that same successor) leads to the same outcome distribution as the leader would (let’s call these valid successors), the outcome distribution at the root node should be identical to the outcome distribution that would result if the leader were to pick a valid successor (or, equivalently, themselves, if they were available). Notice that this is a weaker condition than the previous one, which is good if we want to prove its implications. If this sounds confusing[1], keep reading, and you will see that it’s geometrically intuitive.
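To pin the criterion down, here is a sketch of it as a predicate, under hypothetical names of my own (`leader` is the consequentialist’s function from an ordered list of lotteries to an action distribution):

```python
import numpy as np

def final_criterion_holds(leader, per_scenario, leader_per_scenario, probs):
    """per_scenario[k][s]: lottery successor k leads to in scenario s.
    leader_per_scenario[s]: lottery the leader would lead to in scenario s.
    probs[s]: probability of scenario s."""
    valid_exists = any(
        all(np.allclose(outs[s], leader_per_scenario[s])
            for s in range(len(probs)))
        for outs in per_scenario)
    if not valid_exists:
        return True  # the criterion only binds when a valid successor exists
    # The scenario the leader faces: each successor's interpolated lottery.
    mixed = [sum(p * np.asarray(outs[s]) for s, p in enumerate(probs))
             for outs in per_scenario]
    target = sum(p * np.asarray(leader_per_scenario[s])
                 for s, p in enumerate(probs))
    root = np.asarray(leader(mixed)) @ np.asarray(mixed)
    return np.allclose(root, target)
```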
And that’s it, no more assumptions are needed. These two combined give my theorem:
Theorem: Reflectively stable consequentialists are expected utility maximisers
Or, more precisely, among consequentialists, for some number of available actions and some set of outcomes, these two conditions are equivalent:
Being reflectively stable with respect to any choice of number of scenarios.
Being an expected utility maximiser.
I’ll leave exactly what counts as an expected utility maximiser to a later section, for reasons that will then be clear.
Proof
For visualisation purposes, let’s use 3 possible outcomes, so that each lottery is a vector of 3 probabilities.

Since the probabilities must sum to one, all the valid lotteries lie on the plane that passes through $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, restricted to non-negative components. This is the equilateral triangle with vertices at those three points.

Importantly for us, this set is convex, that is, given any pair of points within, the segment connecting them is completely within the set. This is the case for any number of outcomes. If we have some number $n$ of outcomes, the set of valid lotteries is the $(n - 1)$-dimensional simplex with one vertex per outcome.
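A tiny numeric check of that convexity claim (random lotteries, any mixing weight):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.dirichlet(np.ones(3))   # a random valid lottery
b = rng.dirichlet(np.ones(3))   # another one
for t in np.linspace(0.0, 1.0, 5):
    m = t * a + (1 - t) * b     # a point on the segment between them
    assert (m >= 0).all() and np.isclose(m.sum(), 1.0)  # still a lottery
```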

Now, consider two scenarios, each with 3 available actions:

Each vertex of the two inner triangles corresponds to the lottery of one of the three available actions in their associated scenario. Agents can choose any distribution over the actions available in each scenario, and therefore any point in the triangles formed by those actions. In general, an agent in a scenario can choose any point that lies in the convex hull[2] of the lotteries of the available actions. Also in general, multiple distributions over actions might lead to the same final lottery, but that won’t matter for the proof, and in the drawing above the correspondence between action distributions and outcome lotteries is one-to-one.
Then, consider three agents, one of which is a valid successor (which, remember, on each scenario leads to the same outcome distribution as the leader would). In each scenario they choose an outcome distribution:

It’s important to keep in mind that for consequentialists the order of the actions matters, which isn’t depicted in the drawings.
Immediately after the leader chooses a successor, we have a probability distribution over scenarios. This is the same for all agents, and serves to interpolate over their lotteries. In the case above, and removing the triangle of all lotteries to avoid clutter, it could look as follows:

The brightened area represents all the lotteries the leader could choose from by picking a probability distribution over successors. A consequentialist is reflectively stable if, when given the interpolated lotteries of each agent (in any order), its action distribution leads to picking the lottery produced by a valid successor.
Consider the lottery the leader would choose when faced with some scenario. There will be a set of directions in which we could move that point while remaining in the set of achievable lotteries. Let’s call these free directions, and the non-zero vectors from the choice of the agent to some other achievable point difference vectors, or $d$ vectors.

Then we have:
Lemma 1: For a reflectively stable consequentialist there is no pair of scenarios where in one a direction is free and in the other the opposite direction is also free
The proof will be by contradiction: If such free directions exist for a pair of scenarios, it is possible (details below) to create two succession setups where the scenario presented to the leader (choice of successor) is exactly the same, but they must pick a different outcome distribution in each to be reflectively stable. Since the action distribution of a consequentialist is a function of the available lotteries, so is the outcome distribution, and functions must produce the same outputs given the same inputs, therefore we have a contradiction.
Consider any pair of scenarios, let’s call them 1 and 2. They could even be identical, but the order of the actions matters.
Let’s call:

- $l_1$ the lottery a valid successor (or the leader, if placed there) leads to in scenario 1, and
- $l_2$ the lottery a valid successor leads to in scenario 2.

Assume it is possible for some action distributions and some (non-zero) vector $d$ to lead to:

- $l_1 + d$ in scenario 1, and
- $l_2 - d$ in scenario 2.
Then we can create the following succession setups:
Setup 1
Scenarios

- Scenario 1 (arbitrary), with probability $1/2$
- Scenario 2 (arbitrary), with probability $1/2$
- Other scenarios (arbitrary, if any exist), with probability $0$ each and $0$ in total (remember we are considering some fixed number of scenarios on all setups)

Agents (in this order)

- Valid successor
  - In scenario 1, leads to $l_1$
  - In scenario 2, leads to $l_2$
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $\frac{l_1 + l_2}{2}$. This is what the whole setup should lead to for the leader to be reflectively stable
- Agent 2
  - In scenario 1, leads to $l_1$
  - In scenario 2, leads to $l_2 - d$ (possible by assumption)
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $\frac{l_1 + l_2}{2} - \frac{d}{2}$
- Other agents (if any exist)
  - In scenario 1, they each lead to arbitrary lotteries (can differ between agents), let’s call them $a_i$ for agent $i$
  - In scenario 2, they each lead to $l_2 - d$ (possible by assumption)
  - In other scenarios (if any exist), they lead to arbitrary lotteries (that can differ between agents)
  - Given scenario uncertainty, this leads to $\frac{a_i + l_2 - d}{2}$ for each agent $i$
Visually, and drawing the outcome distributions of only the first two agents for the sake of clarity (even though that’s technically wrong for 3 actions), that looks like:

Setup 2
Scenarios

- Scenario 1 (same as before), with probability $1/2$
- Scenario 3 (all actions lead to $l_2 - d$), with probability $1/2$
- Other scenarios (arbitrary, if any exist), with probability $0$ each and $0$ in total

Agents (in this order)

- Agent 3
  - In scenario 1, leads to $l_1 + d$ (possible by assumption)
  - In scenario 3, leads to $l_2 - d$ (as that’s the only possibility)
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $\frac{(l_1 + d) + (l_2 - d)}{2} = \frac{l_1 + l_2}{2}$
- Valid successor
  - In scenario 1, leads to $l_1$
  - In scenario 3, leads to $l_2 - d$ (as that’s the only possibility)
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $\frac{l_1 + l_2}{2} - \frac{d}{2}$. This is what the whole setup should lead to for the leader to be reflectively stable
- Other agents (if any exist)
  - In scenario 1, they lead to the same lotteries per agent as in setup 1 ($a_i$ for agent $i$). In general they aren’t necessarily consequentialist, only the leader is, so this is just a choice made to make the setup work
  - In scenario 3, they lead to $l_2 - d$ (as that’s the only possibility)
  - In other scenarios (if any exist), they lead to arbitrary lotteries (that can differ between agents)
  - Given scenario uncertainty, this leads to $\frac{a_i + l_2 - d}{2}$ for each agent $i$
Visually, and again drawing the outcome distributions of only the first two agents, that looks like:

Consequentialism creates an information bottleneck that makes these two setups indistinguishable from the point of view of the behaviour of the leader. You can check that the lotteries they have available in each setup are exactly the same, but in setup 1 they should pick $\frac{l_1 + l_2}{2}$, while in setup 2 they should pick $\frac{l_1 + l_2}{2} - \frac{d}{2}$, which differs since $d \neq 0$. Contradiction.
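Here is a numeric instance of the two setups; the specific vectors are made up, subject only to the lemma’s assumption:

```python
import numpy as np

l1 = np.array([0.6, 0.3, 0.1])    # valid successor's lottery, scenario 1
l2 = np.array([0.2, 0.5, 0.3])    # valid successor's lottery, scenario 2
d  = np.array([0.1, -0.1, 0.0])   # the assumed free direction (sums to 0)

# Ordered root lotteries the leader sees (first two agents only).
setup1 = [(l1 + l2) / 2, (l1 + (l2 - d)) / 2]              # [successor, agent 2]
setup2 = [((l1 + d) + (l2 - d)) / 2, (l1 + (l2 - d)) / 2]  # [agent 3, successor]

# Identical inputs to the leader's function...
assert all(np.allclose(x, y) for x, y in zip(setup1, setup2))
# ...but reflective stability demands different outputs: a contradiction.
required1 = (l1 + l2) / 2
required2 = (l1 + l2) / 2 - d / 2
assert not np.allclose(required1, required2)
```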
We can generalise this to directions by realising that if we can move a non-zero amount in one direction in one scenario, and a non-zero amount in the opposite direction in the other scenario, those vectors are either exactly the reverse of each other (in which case the reasoning above directly applies), or one is shorter (which is well-defined because they are proportional), in which case we can scale back the longer one (and get a valid lottery thanks to convexity) to get two exactly opposite vectors and therefore get a contradiction.
This is already a quite powerful result. It implies e.g. that in each scenario leaders can only pick extreme points (vertices) of the set of achievable lotteries. But its true purpose is to make it possible to prove the next lemmas.
Lemma 2: For reflectively stable consequentialists the set of d vectors is convex

Consider again an arbitrary pair of scenarios, 1 and 2, where:

- $l_1$ is the lottery a valid successor (or the leader, if placed there) leads to in scenario 1, and
- $l_2$ the lottery a valid successor leads to in scenario 2.

Consider also any two (non-zero) $d$ vectors, $d_1$ and $d_2$, such that action distributions can lead to:

- $l_1 + d_1$ in scenario 1, and
- $l_2 + d_2$ in scenario 2.
Then, for any $t \in [0, 1]$, we can create the following succession setup:
Scenarios

- Scenario 1 (arbitrary), with probability $t$
- Scenario 2 (arbitrary), with probability $1 - t$
- Other scenarios (arbitrary, if any exist), with probability $0$ each and $0$ in total

Agents

- Valid successor
  - In scenario 1, leads to $l_1$
  - In scenario 2, leads to $l_2$
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $t \, l_1 + (1 - t) \, l_2$. This is what the whole setup should lead to for the leader to be reflectively stable
- Agent 2
  - In scenario 1, leads to $l_1 + d_1$ (possible by assumption)
  - In scenario 2, leads to $l_2 + d_2$ (possible by assumption)
  - In other scenarios (if any exist), they lead to arbitrary lotteries
  - Given scenario uncertainty, this agent leads to $t (l_1 + d_1) + (1 - t)(l_2 + d_2) = \big(t \, l_1 + (1 - t) \, l_2\big) + \big(t \, d_1 + (1 - t) \, d_2\big)$
- Other agents (if any exist) lead to arbitrary lotteries
If the leader is reflectively stable, they choose an action distribution that leads to $t \, l_1 + (1 - t) \, l_2$ in the scenario they face when choosing a successor.

And since agent 2’s interpolated lottery is also achievable in that scenario, the vector between the two, $t \, d_1 + (1 - t) \, d_2$, is itself a $d$ vector, which proves the lemma.
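A quick numeric sketch of that, again with made-up vectors:

```python
import numpy as np

l1, d1 = np.array([0.6, 0.3, 0.1]), np.array([ 0.1, -0.1, 0.0])
l2, d2 = np.array([0.2, 0.5, 0.3]), np.array([-0.1,  0.0, 0.1])
t = 0.25                                     # any weight in [0, 1]

valid = t * l1 + (1 - t) * l2                # what the leader must lead to
other = t * (l1 + d1) + (1 - t) * (l2 + d2)  # agent 2's interpolated lottery
assert np.allclose(other - valid, t * d1 + (1 - t) * d2)  # a d vector at the root
```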
Lemma 3: For a reflectively stable consequentialist there exists a single hyperplane passing through zero such that one of the resulting half-spaces does not contain d vectors

Consider the set formed by taking, for each $d$ vector, its reverse $-d$. Let’s call this the set of reverse vectors.
This and the set of $d$ vectors are disjoint: if some vector were in both, then it would be free in some scenario and its opposite free in some (possibly the same) scenario, which lemma 1 forbids.
By lemma 2, the set of d vectors is convex, and since the vectors interpolating any pair of reverse vectors are just the reverses of the vectors interpolating the corresponding pair of $d$ vectors, the set of reverse vectors is convex too.
Since the components of every lottery sum to one, the components of every difference of lotteries sum to zero, so both sets live in a subspace whose dimension is the number of outcomes minus one.
With these three facts, we can use the power of the hyperplane separation theorem, which roughly says (click the link for the full statement) that for any two disjoint convex sets there always exists a hyperplane separating them.
In the subspace, the lotteries the leader leads to correspond to the zero vector. The set of lotteries that action distributions can lead to in a scenario is always convex, because any one such lottery is a weighted sum (with weights between 0 and 1) over the action lotteries, and therefore any lottery in the segment connecting any pair of available outcome lotteries can be obtained by equivalently interpolating the action probabilities. Therefore, if we consider any of the $d$ vectors, its scalings by factors in $(0, 1]$ are also $d$ vectors, so both sets contain vectors arbitrarily close to zero, and the separating hyperplane must pass through it.
Considering a vector orthogonal to the hyperplane, pointing away from the side of the $d$ vectors (let’s call it the normal), we get that, in every scenario, the lottery the leader leads to is maximal along that direction among the achievable lotteries: moving to any other achievable lottery means adding a $d$ vector, which can never increase the dot product with the normal.
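For a pure expected utility maximiser this picture is easy to check numerically (hypothetical numbers of my own):

```python
import numpy as np

u = np.array([1.0, 0.0, -1.0])   # a direction; components sum to zero
vertices = [np.array(v, dtype=float) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
choice = max(vertices, key=lambda l: float(u @ l))   # the maximal lottery

for v in vertices:
    d = v - choice               # a d vector (or zero)
    assert u @ d <= 1e-12        # all d vectors on one side of the plane
```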

The only thing left to do to prove the lemma is showing that not only is there a direction in which the leader’s outcome lotteries are maximal, but that there is exactly one. This will be done by constructing a scenario where two different directions would imply different outcome distributions, while a consequentialist must have exactly one on each scenario.
Consider any lottery $l$ strictly inside the set of valid lotteries.

We can then construct a scenario where the first action leads to $l + v$ and the second action leads to $l - v$, for some small non-zero vector $v$ (with components summing to zero, so that both are valid lotteries) chosen below.

The Euclidean distance of a point along some direction is computed (up to an additive constant depending on where we start measuring from) by taking its dot product with the unit vector along that direction.
If we have some separating hyperplane with a unit-norm normal vector $n_1$, and some other candidate hyperplane with a different unit-norm normal $n_2$, then there must exist a vector $v$ in the subspace such that $n_1 \cdot v > 0$ while $n_2 \cdot v < 0$.

Since scaling $v$ down preserves those signs, we can make it small enough that $l + v$ and $l - v$ are both valid lotteries, and build the scenario above from them.

For the first hyperplane, $n_1 \cdot (l + v) > n_1 \cdot (l - v)$, so the leader must lead to $l + v$.

Since by assumption the second hyperplane also has the maximality property, $n_2 \cdot (l - v) > n_2 \cdot (l + v)$ means the leader must instead lead to $l - v$. A consequentialist leads to exactly one outcome distribution per scenario, so only one normal direction can exist.
Aside: Representation
For a lottery $l$ and an assignment of a number $u(o)$ to each outcome $o$, the expected utility is just the dot product $u \cdot l$, that is, (up to scale) the distance of $l$ along the direction of $u$.
When people talk about expected utility maximisers, they might say that their utility function is only unique up to additive constants and multiplication by positive scalars, because those transformations don’t change which lottery from some set has higher expected utility, and therefore the actions agents take. That might further be used to argue that it doesn’t make sense to compare the utilities given to outcomes by different agents.
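A one-line check of why those transformations don’t matter: for any lottery $l$ (whose components sum to one),

$$(\alpha u + \beta \mathbf{1}) \cdot l = \alpha \, (u \cdot l) + \beta, \qquad \alpha > 0,$$

so a positive affine transformation of $u$ rescales and shifts every expected utility by the same amounts, leaving all comparisons between lotteries unchanged.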
I just want to point out that this is a representational artifact. EU maximisers have, in the sense explained above, one and only one direction in which the lotteries they lead to are maximal. By trying to assign a number to each outcome we are adding degrees of freedom that weren’t originally there, and if you want to say that in order to compare different utility functions they must be normalised first, you have to argue for why you consider that representation the default instead of simply starting with directions. I don’t mean to imply that can’t be done (utilities make some sense if, e.g., we are talking about resources, and are useful for calculation), in fact I think that for real agents it probably should be done, but it’s something that needs to be argued for.
This also doesn’t mean that the utilities of different agents can’t be meaningfully compared or combined, only that we can’t simply do so by taking arbitrary functions compatible with behaviour. Utilities shouldn’t be compared or combined for the sake of it, but because we have some theory that implies that’s the right thing to do, and such a theory will then hopefully tell us the exact way we should do so.
It’s worth noting that in the case of the VNM theorem, while you might be aware of a constructive proof, it turns out to be possible to arrive at the same conclusion by also making use of the hyperplane separation theorem.
Arguably, an advantage of directions over utilities is that with utilities it is tempting to think that since they are numbers, they could always be bigger, so we can have unboundedly high utilities, or “infinite” utilities even. This then leads to paradoxes and inconsistencies, while with directions those either can’t happen or their premises become much more clearly absurd. When talking about utilities it’s worth keeping in mind which specific theorems you are relying on to justify their use. This one, for instance, relies on the hyperplane separation theorem, which only works in a finite number of dimensions, and therefore can’t deal with, for example, infinite ethics.
Lemma 4: Dealing with indifference
One way in which you might try to rationalise arbitrary behaviour is by saying that the agent’s utility function is constant, and that therefore any action they take is optimal. But when taking actions there’s no such thing as indifference, there’s only the choices that are actually made, even if they are stochastic. Arbitrary behaviour is obviously not reflectively stable.
By the reasoning of lemma 3, there must be a normal vector, and we know it is embedded in a subspace of vectors whose components sum to zero. If all the outcomes had identical utility, the only way of making the components of the normal sum to zero is if they are all exactly zero, but such a vector is not a valid normal. Still, looking at the drawing above of the hyperplanes on the scenario triangle(s), there is a clear edge case: the case where the hyperplane (here just a line) lands on an edge.

If there are only 3 possible outcomes (the set of valid lotteries is just a 2D triangle), it’s easy to figure out what happens: Whenever the line lands on an edge, the leader must always pick the same vertex of such edges. Any other behaviour would cause there to be a pair of opposite free $d$ vectors (within one scenario or across two), which lemma 1 forbids.

Note that any outcome lottery chosen along such an edge would result in there being $d$ vectors lying within the hyperplane itself.
To see what happens in higher dimensions, remember that the hyperplane separation theorem was applied to the subspace of vectors with components that sum to zero. We can apply it to lower-dimensional subspaces too, in particular, to the hyperplane itself. The sets of $d$ vectors and reverse vectors that lie within the hyperplane are still convex and disjoint, so we get a second separating hyperplane within the first, and with it a second normal direction.
With this we have two normal vectors, and we can then repeat this procedure as many times as it takes, each time considering the set of $d$ vectors that lie within all the previous hyperplanes, until no $d$ vectors are left.
Let’s look at a few examples for the case of 4 possible outcomes:

In the drawing above, the first direction is enough to single out the top vertex of the outcome distribution tetrahedron as the lottery the leader should choose. There are no $d$ vectors on the hyperplane itself, so no further directions are needed.

In the one above a whole face is on the plane, and a single direction, the second one, is enough to determine the chosen point.

And in this final drawing the top face of the tetrahedron is oriented in such a way that even the second direction is not enough to select a single point, so the third is used.
With lemma 4, we have half the proof: Reflectively stable consequentialists are associated with a unique series of orthogonal directions, as many as the number of possible outcomes minus one, and they choose an action distribution that leads to an outcome distribution maximal in the first direction, or if that’s not unique, in the second, third, etcetera, until we get a single outcome distribution. We can call this behaviour expected utility maximisation.
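As a sketch, here is that decision rule, restricted for simplicity to choosing among the actions’ own lotteries (the names and numbers are mine, not the post’s):

```python
import numpy as np

def choose(lotteries, directions, tol=1e-9):
    """Pick the lottery maximal along the first direction, breaking
    ties with the second, then the third, and so on."""
    candidates = [np.asarray(l, dtype=float) for l in lotteries]
    for u in directions:
        scores = [float(u @ l) for l in candidates]
        best = max(scores)
        candidates = [l for l, s in zip(candidates, scores) if s >= best - tol]
        if len(candidates) == 1:
            break
    return candidates[0]

# The first direction is indifferent between the first two actions,
# so the second (orthogonal) direction breaks the tie.
lotteries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u1 = np.array([0.5, 0.5, -1.0])     # components sum to zero
u2 = np.array([1.0, -1.0, 0.0])     # orthogonal to u1, sums to zero
print(choose(lotteries, [u1, u2]))  # -> [1. 0. 0.]
```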
Lemma 5: The converse
Among consequentialists, reflective stability implies expected utility maximisation. But that, on its own, does not mean that that is the only restriction such agents have on behaviour, nor that there even are any reflectively stable consequentialists. So we must prove the converse: among consequentialists, expected utility maximisation implies reflective stability.
The ordered basis of expected utility maximisers creates a lexicographic ordering on the set of outcome lotteries. The leader will pick, among all the outcome distributions they could lead to, the one that appears first in that ordering (sorting utilities from high to low). One such outcome distribution is the one that corresponds to deterministically choosing a valid successor. If we can prove that that distribution is always either identical to any other outcome distribution achievable by choosing some successor probability distribution, or comes before it in the ordering, we will have shown that the leader is reflectively stable.
It turns out that this can be simplified to proving that the outcome distribution of a valid successor is either identical to or comes before that of any other one agent (as opposed to that of a distribution among successors). This is because the outcome distribution produced by a distribution over agents is identical to that produced by an agent with action distributions on each scenario that are the succession-probability-weighted combinations of the action distributions of the original agents.
Here is an example of how that works for the case of collapsing two agents into a new one. Stare at the drawing until you understand it:

Let’s call:

- $U_1(l)$ the first expected utility of the lottery $l$
- $p_1$, $p_2$, $p_3$, etc. the probabilities of each scenario happening
- $v_1$, $v_2$, $v_3$, etc. the outcome distributions of a valid successor on each scenario ($v_1$ is for scenario 1, $v_2$ for scenario 2, etc.)
- $a_1$, $a_2$, $a_3$, etc. the outcome distributions of an arbitrary agent (possibly a valid successor) on each scenario
A valid successor will in each scenario choose action distributions that lead to outcome distributions that maximise the first utility. That is, for each scenario $i$:

$$U_1(v_i) \geq U_1(a_i)$$

Since probabilities are non-negative that means that:

$$\sum_i p_i \, U_1(v_i) \geq \sum_i p_i \, U_1(a_i)$$

And by the linearity of expected utilities:

$$U_1\!\left(\sum_i p_i \, v_i\right) \geq U_1\!\left(\sum_i p_i \, a_i\right)$$
Which means that the first expected utility of the outcome distribution of a valid successor is greater than or equal to that of any other agent.
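A small numeric check of the chain above, with hypothetical numbers satisfying the per-scenario inequality:

```python
import numpy as np

u1 = np.array([1.0, 0.0, -1.0])   # first utility direction
p = np.array([0.3, 0.7])          # scenario probabilities
v = [np.array([0.8, 0.1, 0.1]), np.array([0.5, 0.5, 0.0])]  # valid successor
a = [np.array([0.6, 0.2, 0.2]), np.array([0.4, 0.4, 0.2])]  # arbitrary agent

assert all(u1 @ vi >= u1 @ ai for vi, ai in zip(v, a))  # per-scenario optimality
mix = lambda outs: sum(pi * oi for pi, oi in zip(p, outs))
assert u1 @ mix(v) >= u1 @ mix(a)                       # follows by linearity
```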
If the inequality of at least one scenario with non-zero probability is strict, the expected utility of the valid successor will be greater than that of the other agent, and therefore the valid successor will come first in the ordering.
Otherwise, for scenarios with non-zero probability, we only have equalities, and therefore the expected utilities of both agents are identical. This means that on each of these scenarios, and before a scenario is sampled, the second expected utility of the valid successor is greater than or equal to that of the other agent. Then this reasoning applies again for each of the utilities, until we get that the outcome distribution of the valid successor comes first in the ordering, or that all of the expected utilities of the outcome distributions of the agents are identical. Since the utilities form an orthogonal basis with as many directions as the intrinsic dimensionality of the set of valid lotteries, being identical on each one of them means that the outcome distributions of the agents are also identical.
And finally, combining lemmas 4 and 5, we have the proof that, among consequentialists, reflective stability is equivalent to expected utility maximisation.
Thought and structure
A common argument against expected utility maximisation is that we empirically know (e.g. through the Allais and Ellsberg “paradoxes”) that people aren’t expected utility maximisers. But of course they aren’t, people are clearly and unambiguously not rational, especially about things as abstract as money. A theory that accurately describes human behaviour is a failure as a prescriptive theory of rationality, and vice versa.
In the limit, only behaviour matters and it can be directly characterised by theory. For finite things made of parts, structure drives behaviour, and attempting to describe behaviour directly is a fool’s errand. This is a problem if we want to take concepts derived from the analysis of idealised agents (like expected utility maximisation) and use them to study real ones, so we must create a link between them somehow.
Consider the theorem presented here: it can be viewed as being about a property that ideal agents must have when placed in this setup: they must choose successors that behave like themselves. But does it say anything about people? We aren’t faced with choosing our own successor any more often than we are faced with money pumps, so as nice as these might be as properties for agents to have, they are not the reason why we behave as we do.
However! This theorem applies to any structure isomorphic to a succession setup, specifically, the hope is that something like it applies to thought. Imagine: At some point in time there is a great number of different thoughts we could have, so we choose one (say, a plan) and place it somewhere (say, in the phonological loop), then we receive some sensory datum, unknown a priori but independent of what went on in our mind, and finally, we combine both of those pieces of information to choose an action to take, perhaps randomly. As long as thoughts can encode effectively arbitrary behaviour when taking them into account, we can treat the choice of thought as a choice of the future agent we will be.
In favour of this way of viewing things we have that the systems that let us produce thoughts and actual actions must have a lot in common (think e.g. about how you can probably switch between speaking mentally and out loud). Against it, we have, e.g. that thoughts serve to perform computation and that while we are computing, we aren’t rational, that we don’t actually consider the tree before choosing a thought, that we can have preferences over the thoughts themselves, that thought can be performed externally (e.g. on paper), that thinking allows us to make better predictions, so considering the observation probabilities as independent of thoughts might not be very helpful, that we don’t actually get lotteries as input, but must derive them instead, etc.
But the point isn’t that the setup applies perfectly, but that just as that idealised setup lets us derive expected utility maximisation, so might some structural version, yet to be discovered, let us derive the corresponding versions of concepts such as utility and what an agent even is, which could then serve as the foundation for the design of artificial altruism.
- ^
A cleaner definition would have been: whenever it is possible for the leader to lead to the same outcome distribution as they would by picking themselves, they should do so. But that “whenever” includes all the cases where there is at least one valid successor, so this condition is stronger and therefore worse.
- ^
The convex hull of a set of points is the minimum-hyper-volume convex set containing them: Imagine wrapping a balloon over the points and then removing as much air as possible without getting it to bend inwards or pass through the points. For a finite set of points, the result will have vertices that are among those points, but a point might not end up as a vertex if it is between (in the convex hull of) some other points.
- ^
The sets of $d$ vectors and reverse vectors are not closed, because their complements contain the zero vector (which is a limit of arbitrarily small $d$ vectors), but they aren’t open either. See the table on Wikipedia.
It looks like it’s been over 22 hours since the post was approved. Maybe it just takes some time, but it has received 3 votes, summing to zero. I would appreciate an explanation from whoever downvoted it, or advice from anyone on how I could have done better. I think it’s a nice post, and if someone else had written it I would have upvoted them.
I’ll read it soon. My suggestions: firstly, technical posts generally get less engagement, presumably because fewer people have the technical chops (even here), or because it’s more of a time and effort commitment.
Secondly, I think you should’ve had a better summary as your first paragraph. You could additionally try two summaries: one that’s like a “normal” summary, and one that gives a terse math overview, so that those that can understand the latter can quickly get the point.
Be patient. A post like this one requires most of us hours of mental labor to understand well enough to form an opinion on whether LW should have more posts like it or fewer posts like it (the standard we are encouraged to use for upvotes and downvotes).
Well, hello everyone, I hope you liked my post. I put quite a lot of effort into it. While reading other posts here you sometimes see people say something like how an advanced agent, if they don’t start as an expected utility maximiser, would realise that their behaviour is exploitable or otherwise deficient, and following the theorems change how they act (into being an EU maximiser) so that stops being the case. Now they can instead simply point here to show that something close to that is a beautiful theorem. I expect most people to disagree, but I think this is better than the VNM theorem and can serve as an alternative to it (which is why I called the definitions axioms).
There are many complaints you can have about it. Maybe you think that my definition of reflective stability is a bit contrived, or that some other decision theory is superior, or that the interpretation of probability is not completely clear, and that I should have derived it instead of taking it for granted, etc. This is not intended to be the be-all and end-all of the never-ending debate on the topic, but I think the assumptions here are good enough to have some bearing on the behaviour of real-life agents, and aren’t they nice assumptions to break?
Anyway, while it ended up being more interesting than I had expected, this post is an ad. And the main reason why I wrote it is to somehow get funding. For a long time I’ve been trying to get a job as a researcher, and at that I’ve had exactly zero success. I don’t know what “the reason” for that is, it’s not like funders bother to explain their rationale, but I can guess:
For a field where basically nothing happens (except some results in mechanistic interpretability) there sure is a lot of competition
I just haven’t applied to many funders and potential employers, and there just aren’t that many
Maybe they think I’m incompetent
Maybe they think my ideas just aren’t good enough
After talking to some people who are probably somewhat well informed about the funding situation, I arrived at the conclusion that there is basically no hope for me, and my only realistic option left was to become known, which sounds horrible, but if that’s what it takes to have a chance at success, I was willing to do it. My plan was to write a LW post under my actual name with some work on a topic that might interest funders who read the site, so I chose expected utility maximisation, because it’s a popular topic, working on it won’t advance capabilities, and it seemed plausible that I could produce some clean mathematical result. So here it is.
If you’re a potential funder or employer, I’m sorry to say that I don’t have much evidence of competence. It was never something I cared much about. I studied computer science (though they call it what in English would be “information engineering”) in Spain (next to that supercomputer in a chapel, though I never got to visit it) and then got a job (remotely, I currently live in the UK) at a software and consulting company, first as an intern, and then when the contract expired they offered to extend it by a year as an independent contractor. The new contract lasts until October 31st. I can leave early with a notice of 30 days, but in that case I plan to finish whatever projects I’m working on, to avoid causing them problems, which might take longer than that. I work as a programmer and consultant, with software that mostly does simulation (ODEs) and approximate Bayesian inference with time series data (in a very nice language called Julia), and I have learned a lot. I have never published any papers, but a few might be coming from the work I do there. The only other thing I have on LW under my name (because, if I remember correctly, it was required in order to get the money) that might be evidence I’m not incompetent is the results of a little online contest.
I don’t have much interest in working on LLMs or even deep learning. Those algorithms (the ones defined by the weights) are titanic pieces of spaghetti code, well beyond salvation. What they do is impressive, but they should never be the foundation of anything that matters. We should reverse-engineer what we can, and then throw them away and create proper software from scratch based on the insights we gain. I want to work on whatever theory is needed to create scalable robust artificial altruism, on designing AIs that aren’t unholy abominations, and possibly on mechanistic interpretability to support those two paths. I have little idea of how I’ll achieve that, the project will almost certainly fail, and if it does succeed it will take many years. No matter, it’s what must be done.
If you are interested, you can DM me or send an email to research♻️greatninja⚫mozmail⚫com (replacing the three emojis), and then we will probably take it to some private messaging app. If I don’t reply within a week at the very most, something went wrong, so please try the other way, or leave a comment below the post.
And if this fails, and nobody is interested or it can’t work out for some reason, I might write another post, but you should know that’s an extremely inefficient way of getting work out of me.
(By the way, it seems that new users are rate limited to three comments per day, I’m not sure whether this applies to comments on my own posts, but if it does, I might have issues replying here)
cc @Elliott Thornley (EJT)