[Request for Distillation] Coherence of Distributed Decisions With Different Inputs Implies Conditioning

johnswentworth25 Apr 2022 17:01 UTC

LW: 22 AF: 11

Distillation & Pedagogy AI Coherence Arguments

There’s been a lot of response to the Call For Distillers, so I’m experimenting with a new post format. This post is relatively short and contains only a simple mathematical argument, with none of the examples, motivation, more examples, or context which would normally make such a post readable. My hope is that someone else will write a more understandable version.

Jacob is offering a $500 bounty on a distillation.

Goal: following the usual coherence argument setup, show that if multiple decisions are each made with different input information available, then each decision maximizes expected utility given its input information.

We’ll start with the usual coherence argument setup: a system makes a bunch of choices, aiming to be pareto-optimal across a bunch of goals (e.g. amounts of various resources) $u_{1} \dots u_{m}$ . Pareto optimality implies that, at the pareto-optimum, there exists some vector of positive reals $P_{1} \dots P_{m}$ such that the choices maximize $\sum_{i} P_{i} u_{i}$ . Note that $P$ can be freely multiplied by a constant, so without loss of generality we could either take $P$ to sum to $1$ (in which case we might think of $P$ as probabilities) or take $P_{1}$ to be $1$ where $u_{1}$ is amount of money (in which case $P$ is a marginal price vector).
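As a minimal sketch of the weighted-sum picture, the snippet below uses a hypothetical set of four options with two goals (the option names and numbers are made up for illustration). It only checks the easy direction, that a maximizer of $\sum_{i} P_{i} u_{i}$ with positive weights is not Pareto-dominated, and that rescaling $P$ by a constant leaves the chosen option unchanged.

```python
# Hypothetical finite set of options; each maps to a vector (u_1, u_2) of goal values.
options = {
    "a": (3.0, 1.0),
    "b": (2.0, 2.0),
    "c": (1.0, 3.0),
    "d": (1.0, 1.0),  # dominated by "b"
}

def dominates(v, w):
    """True if v is at least as good as w on every goal and strictly better on one."""
    return all(x >= y for x, y in zip(v, w)) and any(x > y for x, y in zip(v, w))

def pareto_optimal(name):
    """True if no other option dominates the named option."""
    return not any(dominates(options[o], options[name]) for o in options if o != name)

def weighted_best(P):
    """Option maximizing sum_i P_i * u_i for the weight vector P."""
    return max(options, key=lambda o: sum(p * u for p, u in zip(P, options[o])))

P = (0.7, 0.3)                                         # positive weights, normalized to sum to 1
best = weighted_best(P)
scaled_best = weighted_best(tuple(10 * p for p in P))  # same weights times a constant

print(best, scaled_best, pareto_optimal(best))
```

Different positive weight vectors pick out different points on the Pareto frontier; the dominated option "d" is never chosen.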

When the goals are all “the same goal” across different “worlds” $X$ , and we normalize $P [X]$ to sum to $1$ , $P [X]$ is a probability distribution over worlds in the usual Bayesian sense. The system then maximizes (over its actions $A$ ) $\sum_{X} P [X] u (A, X) = E_{X} [u (A, X)]$ , i.e. it’s an “expected utility maximizer”.

That’s the usual setup in a nutshell. Now, let’s say that the system makes multiple decisions $A = A_{1} \dots A_{n}$ in a distributed fashion. Each decision is made with only limited information: $A_{i}$ receives $f_{i} (X)$ as input (and nothing else). The system then chooses the functions $A_{i} (f_{i} (X))$ to maximize $E_{X} [u (A, X)]$ .

Consider the maximization problem for just $A_{i} (f_{i}^{*})$ , i.e. the optimal action for choice $i$ given input $f_{i}^{*}$ . Expanded out, the objective is $E_{X} [u (A, X)] = \sum_{X} P [X] u (A_{1} (f_{1} (X)), \dots, A_{i} (f_{i} (X)), \dots, A_{n} (f_{n} (X)), X)$ .

Note that the only terms in that sum which actually depend on $A_{i} (f_{i}^{*})$ are those for which $f_{i} (X) = f_{i}^{*}$ . So, for purposes of choosing $A_{i} (f_{i}^{*})$ specifically, we can reduce the objective to

$\sum_{X : f_{i} (X) = f_{i}^{*}} P [X] u (A, X)$

… which is equal to $P [f_{i} (X) = f_{i}^{*}] E [u (A, X) | f_{i} (X) = f_{i}^{*}]$ . The $P [f_{i} (X) = f_{i}^{*}]$ multiplier is always positive and does not depend on $A_{i}$ , so we can drop it without changing the optimal $A_{i}$ . Thus, action $A_{i} (f_{i}^{*})$ maximizes the conditional expected value $E [u (A, X) | f_{i} (X) = f_{i}^{*}]$ .
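Spelled out with the prior weights $P [X]$ written explicitly, that step uses only the definition of conditional probability, $P [X | f_{i} (X) = f_{i}^{*}] = P [X] / P [f_{i} (X) = f_{i}^{*}]$ for worlds with $f_{i} (X) = f_{i}^{*}$ (and assumes $P [f_{i} (X) = f_{i}^{*}] > 0$ ):

$\sum_{X : f_{i} (X) = f_{i}^{*}} P [X] u (A, X) = P [f_{i} (X) = f_{i}^{*}] \sum_{X : f_{i} (X) = f_{i}^{*}} \frac{P [X]}{P [f_{i} (X) = f_{i}^{*}]} u (A, X)$

$= P [f_{i} (X) = f_{i}^{*}] \sum_{X : f_{i} (X) = f_{i}^{*}} P [X | f_{i} (X) = f_{i}^{*}] u (A, X) = P [f_{i} (X) = f_{i}^{*}] E [u (A, X) | f_{i} (X) = f_{i}^{*}]$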

Returning to the optimization problem for all of the actions simultaneously: any optimum for all actions must also be an optimum for each action individually (otherwise we could change one action to get a better result), so each action $A_{i} (f_{i}^{*})$ must maximize $E [u (A, X) | f_{i} (X) = f_{i}^{*}]$ .
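The claim can be checked by brute force on a small case. The sketch below is illustrative only: the four worlds, the prior, the two input functions and the utility are all hypothetical choices, not part of the argument above. It enumerates every joint policy, picks one that maximizes $E_{X} [u (A, X)]$ , and then checks that each entry $A_{i} (f_{i}^{*})$ also maximizes the conditional expectation with the other entries held fixed.

```python
from itertools import product

# Hypothetical toy problem: worlds X = (x1, x2) with a prior P[X].
worlds = [(0, 0), (0, 1), (1, 0), (1, 1)]
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Decision 0 sees only x1; decision 1 sees only x2.
f = [lambda X: X[0], lambda X: X[1]]
inputs = [0, 1]    # possible values of each f_i(X)
actions = [0, 1]   # possible values of each A_i

def u(A, X):
    """Hypothetical utility: guess the coordinate you cannot see, plus a bonus for matching."""
    return float(A[0] == X[1]) + float(A[1] == X[0]) + 0.5 * float(A[0] == A[1])

def expected_utility(policy, condition=lambda X: True):
    """E[u(A, X) | condition(X)] under P, where A_i = policy[i][f_i(X)]."""
    mass = sum(P[X] for X in worlds if condition(X))
    total = sum(P[X] * u([policy[i][f[i](X)] for i in range(2)], X)
                for X in worlds if condition(X))
    return total / mass

# Each A_i is a lookup table from inputs to actions; enumerate all joint policies.
tables = [dict(zip(inputs, choice)) for choice in product(actions, repeat=len(inputs))]
best = max(product(tables, repeat=2), key=expected_utility)

def is_conditionally_optimal(policy, i, f_star):
    """Check that A_i(f_star) maximizes E[u | f_i(X) = f_star], other entries held fixed."""
    def value(a):
        changed = [dict(t) for t in policy]   # copy so the original tables are untouched
        changed[i][f_star] = a
        return expected_utility(changed, lambda X: f[i](X) == f_star)
    return value(policy[i][f_star]) >= max(value(a) for a in actions) - 1e-12

all_ok = all(is_conditionally_optimal(best, i, s) for i in range(2) for s in inputs)
print(best, all_ok)
```

The check covers only the direction shown above: a jointly optimal policy is conditionally optimal entry by entry.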

A few notes on this:

- We’ve implicitly assumed that actions do not influence which information is available to other actions (i.e. the actions are “spacelike separated”). That can be relaxed: let $f_{i}$ depend on both $X$ and previous actions $A_{< i}$ , and then $A_{i} (f_{i}^{*})$ will maximize $E [u (A, X) | f_{i} (A_{< i}, X) = f_{i}^{*}]$ ; the general structure of the proof carries over.
- We’ve implicitly assumed that the action $A_{i}$ when $f_{i}^{*}$ is observed does not influence worlds where $f_{i}^{*}$ is not observed (i.e. no Newcomblike shenanigans). We can still handle Newcomblike problems if we use FDT, in which case the action function would appear in more than one place.
- As usual with coherence arguments, we’re establishing conditions which must be satisfied (by a pareto-optimal system with the given objectives); the conditions do not necessarily uniquely specify the system’s behavior. The classic example is that $P [X]$ might not be unique. Once we have distributed decisions there may also be “local optima” such that each individual action is optimal but the actions are not jointly optimal; that’s another form of non-uniqueness.
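The “local optima” in the last note can be seen in a deliberately tiny sketch. The coordination payoff below is a hypothetical example, with a single world and no inputs so that all the non-uniqueness comes from the distributed structure: two action pairs pass the one-action-at-a-time test, but only one of them is jointly optimal.

```python
from itertools import product

actions = [0, 1]

def u(a1, a2):
    """Hypothetical coordination payoff: matching on 1 beats matching on 0; mismatches score 0."""
    if a1 != a2:
        return 0.0
    return 2.0 if a1 == 1 else 1.0

def individually_optimal(a1, a2):
    """True if neither action can improve u by changing alone."""
    return (u(a1, a2) >= max(u(b, a2) for b in actions)
            and u(a1, a2) >= max(u(a1, b) for b in actions))

# Pairs where each action is optimal given the other.
local_optima = [pair for pair in product(actions, repeat=2) if individually_optimal(*pair)]
# The pair that maximizes u over both actions at once.
joint_optimum = max(product(actions, repeat=2), key=lambda pair: u(*pair))

print(local_optima)   # [(0, 0), (1, 1)]
print(joint_optimum)  # (1, 1)
```

Here $(0, 0)$ satisfies the per-action optimality condition without being the joint optimum, so the condition is necessary but not sufficient.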


johnswentworth 25 Apr 2022 17:12 UTC
LW: 5 AF: 3
I haven’t put a distillation bounty on this, but if anyone else wants to do so, leave a comment and I’ll link to it in the OP.
- jacobjacob 26 Apr 2022 1:37 UTC
  LW: 4 AF: 2
  How long would it have taken you to do the distillation step yourself for this one? I’d be happy to post a bounty, but price depends a bit on that.
  - johnswentworth 26 Apr 2022 4:01 UTC
    LW: 2 AF: 2
    Short answer: about one full day.
    Longer answer: normally something like this would sit in my notebook for a while, only informing my own thinking. It would get written up as a post mainly if it were adjacent to something which came up in conversation (either on LW or in person). I would have the idea in my head from the conversation, already be thinking about how best to explain it, chew on it overnight, and then if I’m itching to produce something in the morning I’d bang out the post in about 3-4 hours.
    Alternative paths: I might need this idea as background for something else I’m writing up, or I might just be in a post-writing mood and not have anything more ready-to-go. In either of those cases, I’d be starting more from scratch, and it would take about a full day.
    - jacobjacob 26 Apr 2022 22:28 UTC
      LW: 5 AF: 4
      Cool, I’ll add $500 to the distillation bounty then, to be paid out to anyone you think did a fine job of distilling the thing :) (Note: this should not be read as my monetary valuation for a day of John work!)
      (Also, a cooler pay-out would be basis points, or less, of Wentworth impact equity)
      - johnswentworth 27 Apr 2022 21:19 UTC
        LW: 2 AF: 2
        “to be paid out to anyone you think did a fine job of distilling the thing”
        Needing to judge submissions is the main reason I didn’t offer a bounty myself. Read the distillation, and see if you yourself understand it. If “Coherence of Distributed Decisions With Different Inputs Implies Conditioning” makes sense as a description of the idea, then you’ve probably understood it.
        If you don’t understand it after reading an attempted distillation, then it wasn’t distilled well enough.
        jacobjacob 16 May 2022 1:18 UTC
        LW: 4 AF: 3
        An update on this: sadly I underestimated how busy I would be after posting this bounty. I spent 2h reading this and Thomas’ post the other day, but did not manage to get into the headspace of evaluating the bounty (i.e. making my own interpretation of John’s post, and then deciding whether Thomas’ distillation captured that). So I will not be evaluating this. (Still happy to pay if someone else I trust claims Thomas’ distillation was sufficient.) My apologies to John and Thomas about that.
      - Thomas Kwa 27 Apr 2022 20:31 UTC
        1 point
        I will attempt to fill this bounty. Does the fact that I’m on a grant preclude me from claiming it?
        jacobjacob 3 May 2022 19:01 UTC
        2 points
        Sorry for late reply: no, it does not.
Thomas Kwa 3 May 2022 20:39 UTC
3 points
I don’t understand why this setup needs multiple decisions (even after asking johnswentworth).
- Thomas: Why doesn’t this setup work with a single decision (say, a poker player imagining her opponent raising, calling, or folding?)
- John (as understood by me): If the agent only ever receives one piece of information, the sense in which it uses conditional probability is a bit trivial. Suppose the agent has an explicit world-model and $U \ni X$ is its set of all possible worlds. If the agent is only receiving a single piece of information $f (X)$ which constrains the set of worlds to $S \subseteq U$ , then the agent can have $U = S$ , being unable to imagine any world inconsistent with what it sees. For this agent, conditioning on $f$ is vacuous. But if the agent is making multiple decisions based on different information $f_{i}$ that constrain the possible worlds to different sets $S_{i}$ , it must be able to reason about a set of worlds larger than any particular $S_{i}$ .
- Thomas: But doesn’t the agent need to do this for a single decision, given that it could observe either $f^{*}$ or some other information ${f^{*}}^{'}$ ?
- Here I don’t know what to respond, nor does my model of John. Maybe the answer is it doesn’t have to construct a lookup table for $A (f^{*})$ and can just act “on the fly”? This doesn’t make sense, because it could do the same thing across multiple decisions. Also, there’s a weird thing going on where the math in the post is a behavioral claim: “we can model the agent as using conditional expected value”, but the interpretation, including the second bullet point, references the agent’s possible structure.
What links here?
- Deriving Conditional Expected Utility from Pareto-Efficient Decisions by Thomas Kwa (5 May 2022 3:21 UTC; 24 points)
- johnswentworth 3 May 2022 22:41 UTC
  3 points
  Yeah, my explanation of that wasn’t very good. Let me try again.
  If there’s just one decision, the agent maximizes $E [u (A, X) | f (X) = f^{*}]$ . But then we could define a behaviorally-equivalent utility function $u^{'} (A, f^{*}) = E [u (A, X) | f (X) = f^{*}]$ ; there isn’t necessarily a sense in which the agent cares about $X$ rather than $f^{*}$ .
  With many decisions, we could perform a similar construction to get a behaviorally-equivalent utility function $u^{'} (A, f_{1}^{*}, \dots, f_{n}^{*})$ . But if there are enough decisions with enough different inputs then $f_{1}^{*}, \dots, f_{n}^{*}$ may be bigger than $X$ , i.e. it may have more dimensions/more bits. Then representing all these different decision-inputs as being calculated from one “underlying world” $X$ yields a model which is “more efficient”, in some sense.
  Another way to put it: with just one decision, ~any $u^{'} (A, f^{*})$ should be behaviorally equivalent to a $E [u (A, X) | f (X) = f^{*}]$ -maximizer for some $u, f$ . But with many decisions, that should not be the case. (Though I have not yet actually come up with an example to prove that claim.)
  - Thomas Kwa 4 May 2022 2:48 UTC
    3 points
    edit: the numbers are wrong here; go see my distillation for the correct numbers
    Proposed example to check my understanding:
    Here, $X = (x_{1}, x_{2}) \in \mathcal{X}$ where $\mathcal{X}$ is the set of 10 black points representing possible worlds.
    We have three different observations $f_{1}, f_{2}, f_{3}$ , each of which has 4 possible outcomes and gives partial information about X. Call the set of combinations of observations $O$ .
    It seems that
    $| X | = | f (X) | = 10$ while $| O | = | f_{1} (X) \times f_{2} (X) \times f_{3} (X) | = 64$ : there are more combinations of partial observations than possible worlds.
    Therefore, storing a representation of possible values of $X$ might be simpler than storing a representation of possible values of $(f_{1}^{*}, f_{2}^{*}, f_{3}^{*})$ .
    Also, this notion of conditional expected utility actually constrains the behavior; for an action space $A$ not all of the $| A |^{64}$ policies which map $O \to A$ correspond to conditional expected utility maximization.
    If we were not conditioning, there would be only $| A \times X |$ policies that are expected utility maximization.
    If we are conditioning, it seems like there are $| A |^{\sum_{i} | f_{i} (X) |} = | A |^{12}$ such policies—the agent is able to make decisions given 3 types of possible information $i = 1, 2, 3$ , and each possible type of information i has $| f_{i} (X) | = 4$ .
    So by pigeonhole not every policy over distributed decisions is a conditional expected utility maximizer?
    - johnswentworth 4 May 2022 3:42 UTC
      2 points
      I didn’t check the math in your counting argument, but qualitatively that is correct.
johfst 2 May 2022 4:04 UTC
3 points
Registering that I will attempt this. Not sure if I will be able to produce something publishable in a reasonable amount of time, but I expect to learn from the attempt.

Also, I think some probabilities were left out of some sums there? Was that intentional, or a typo?
- johnswentworth 2 May 2022 4:38 UTC
  2 points
  Typo. Good catch, thanks.