[Request for Distillation] Coherence of Distributed Decisions With Different Inputs Implies Conditioning
There’s been a lot of response to the Call For Distillers, so I’m experimenting with a new post format. This post is relatively short and contains only a simple mathematical argument, with none of the examples, motivation, more examples, or context which would normally make such a post readable. My hope is that someone else will write a more understandable version.
Jacob is offering a $500 bounty on a distillation.
Goal: following the usual coherence argument setup, show that if multiple decisions are each made with different input information available, then each decision maximizes expected utility given its input information.
We’ll start with the usual coherence argument setup: a system makes a bunch of choices, aiming to be pareto-optimal across a bunch of goals (e.g. amounts of various resources) $u_1, \ldots, u_n$. Pareto optimality implies that, at the pareto-optimum, there exists some vector of positive reals $P = (P_1, \ldots, P_n)$ such that the choices maximize $\sum_j P_j u_j$. Note that $P$ can be freely multiplied by a positive constant, so without loss of generality we could either take $P$ to sum to $1$ (in which case we might think of $P$ as probabilities) or take $P_1$ to be $1$ where $u_1$ is amount of money (in which case $P$ is a marginal price vector).
When the goals are all “the same goal” across different “worlds” $X$, and we normalize $P$ to sum to $1$, $P$ is a probability distribution over worlds in the usual Bayesian sense. The system then maximizes (over its actions $A$) $\sum_X P[X] \, u(A, X) = E[u(A, X)]$, i.e. it’s an “expected utility maximizer”.
That’s the usual setup in a nutshell. Now, let’s say that the system makes multiple decisions $A = (A_1, \ldots, A_m)$ in a distributed fashion. Each decision is made with only limited information: $A_i$ receives $f_i(X)$ as input (and nothing else). The system then chooses the functions $A_i(f_i(X))$ to maximize $E[u(A, X)] = \sum_X P[X] \, u(A_1(f_1(X)), \ldots, A_m(f_m(X)), X)$.
Consider the maximization problem for just $A_i(f_i^*)$, i.e. the optimal action for choice $i$ given input $f_i^*$. Expanded out, the objective is $\sum_X P[X] \, u(A_1(f_1(X)), \ldots, A_m(f_m(X)), X)$.
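Note that the only terms in that sum which actually depend on $A_i(f_i^*)$ are those for which $f_i(X) = f_i^*$. So, for purposes of choosing $A_i(f_i^*)$ specifically, we can reduce the objective to $\sum_{X : f_i(X) = f_i^*} P[X] \, u(A, X)$
… which is equal to $P[f_i(X) = f_i^*] \, E[u(A, X) \mid f_i(X) = f_i^*]$. The multiplier $P[f_i(X) = f_i^*]$ is always positive and does not depend on $A_i$, so we can drop it without changing the optimal $A_i(f_i^*)$. Thus, action $A_i(f_i^*)$ maximizes the conditional expected value $E[u(A, X) \mid f_i(X) = f_i^*]$.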
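For readers who want the intermediate steps, here is one way to spell out the reduction. It is a sketch under the assumptions already in play: $u(A, X)$ is shorthand for $u(A_1(f_1(X)), \ldots, A_m(f_m(X)), X)$, and $f_i^*$ is an input value which occurs with positive probability.

```latex
\begin{aligned}
E[u(A, X)]
  &= \sum_{X : f_i(X) = f_i^*} P[X] \, u(A, X)
   + \sum_{X : f_i(X) \neq f_i^*} P[X] \, u(A, X) \\[4pt]
\sum_{X : f_i(X) = f_i^*} P[X] \, u(A, X)
  &= P[f_i(X) = f_i^*] \sum_{X} P[X \mid f_i(X) = f_i^*] \, u(A, X) \\
  &= P[f_i(X) = f_i^*] \; E[u(A, X) \mid f_i(X) = f_i^*]
\end{aligned}
```

The second sum in the first line contains no term in which $A_i$ is evaluated at $f_i^*$, so it is a constant as far as the choice of $A_i(f_i^*)$ is concerned. The second line uses $P[X] = P[f_i(X) = f_i^*] \, P[X \mid f_i(X) = f_i^*]$ for worlds with $f_i(X) = f_i^*$, with the conditional probability being zero for all other worlds.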
Returning to the optimization problem for all of the actions simultaneously: any optimum for all actions must also be an optimum for each action individually (otherwise we could change one action to get a better result), so each action $A_i(f_i^*)$ must maximize $E[u(A, X) \mid f_i(X) = f_i^*]$.
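The claim can be checked by brute force on a small problem. The sketch below is purely illustrative: the four worlds, the probabilities, the two input functions, and the utility are all made-up assumptions, not anything from the argument above. It enumerates every pair of action functions, picks one that maximizes expected utility, and then confirms that no single action can be changed to improve expected utility conditional on its own input.

```python
import itertools

# Hypothetical toy problem; every number here is an assumption for illustration.
worlds = [0, 1, 2, 3]
P = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # probability of each world
actions = [0, 1]                        # each decision picks 0 or 1

# Input functions: decision 0 sees only the high bit, decision 1 only the low bit.
f = [lambda x: x // 2, lambda x: x % 2]

# Utility: the sum of the actions should match a world-dependent target.
target = {0: 0, 1: 1, 2: 2, 3: 1}

def u(a, x):
    return -(sum(a) - target[x]) ** 2

def act(policy, x):
    """Actions taken in world x; policy[i] maps decision i's input to an action."""
    return tuple(policy[i][f[i](x)] for i in range(len(f)))

def expected_utility(policy, condition=lambda x: True):
    """E[u(A, X) | condition(X)] under P."""
    mass = sum(P[x] for x in worlds if condition(x))
    total = sum(P[x] * u(act(policy, x), x) for x in worlds if condition(x))
    return total / mass

def all_policies():
    """Every combination of action functions, one function per decision."""
    per_decision = []
    for fi in f:
        obs = sorted({fi(x) for x in worlds})
        per_decision.append([dict(zip(obs, choice))
                             for choice in itertools.product(actions, repeat=len(obs))])
    return [list(combo) for combo in itertools.product(*per_decision)]

def is_conditionally_optimal(policy, tol=1e-12):
    """True if no single action A_i(obs) can be changed to raise E[u | f_i(X) = obs]."""
    for i, fi in enumerate(f):
        for obs in policy[i]:
            cond = lambda x, fi=fi, obs=obs: fi(x) == obs
            current = expected_utility(policy, cond)
            for alt in actions:
                changed = [dict(p) for p in policy]
                changed[i][obs] = alt
                if expected_utility(changed, cond) > current + tol:
                    return False
    return True

best = max(all_policies(), key=expected_utility)
print(best, expected_utility(best), is_conditionally_optimal(best))
```

Brute-force enumeration is only feasible because the example is tiny; it is used here because it makes no assumptions about the structure of the utility.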
A few notes on this:
We’ve implicitly assumed that actions do not influence which information is available to other actions (i.e. the actions are “spacelike separated”). That can be relaxed: let $f_i$ depend on both $X$ and previous actions $A_{<i}$, and then $A_i(f_i^*)$ will maximize $E[u(A, X) \mid f_i(X, A_{<i}) = f_i^*]$; the general structure of the proof carries over.
We’ve implicitly assumed that the action $A_i(f_i^*)$ when $f_i^*$ is observed does not influence worlds where $f_i^*$ is not observed (i.e. no Newcomblike shenanigans). We can still handle Newcomblike problems if we use FDT, in which case the action function $A_i(f_i)$ would appear in more than one place.
As usual with coherence arguments, we’re establishing conditions which must be satisfied (by a pareto-optimal system with the given objectives); the conditions do not necessarily uniquely specify the system’s behavior. The classic example is that $P$ might not be unique. Once we have distributed decisions there may also be “local optima” such that each individual action is optimal but the actions are not jointly optimal; that’s another form of non-uniqueness.
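A minimal sketch of the “local optima” point, using a hypothetical coordination payoff that is not from the post. To keep it short, this is the degenerate case with a single world and no inputs, so the conditional expected utility is just $u$ itself. Two decisions each pick 0 or 1; matching is rewarded, and matching on 1 is rewarded more.

```python
import itertools

actions = [0, 1]

def u(a1, a2):
    # Hypothetical payoff: matching is good, matching on 1 is best.
    if a1 == a2:
        return 2 if a1 == 1 else 1
    return 0

def individually_optimal(a1, a2):
    """True if neither action can be changed alone to raise utility."""
    return (all(u(a1, a2) >= u(b, a2) for b in actions)
            and all(u(a1, a2) >= u(a1, b) for b in actions))

pairs = list(itertools.product(actions, repeat=2))
local = [pair for pair in pairs if individually_optimal(*pair)]
joint = max(pairs, key=lambda pair: u(*pair))

print(local)  # [(0, 0), (1, 1)]
print(joint)  # (1, 1)
```

Both (0, 0) and (1, 1) pass the per-action condition, but only (1, 1) is jointly optimal. So the per-action condition is necessary for a joint optimum but does not by itself pick one out.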