Edouard Harris comments on When Most VNM-Coherent Preference Orderings Have Convergent Instrumental Incentives

Edouard Harris 20 Aug 2021 20:00 UTC
LW: 11 AF: 8
Thanks for writing this.
I have one point of confusion about some of the notation used to prove Lemma 3. Apologies for the detail, but the mistake could very well be on my end, so I want to make sure I lay everything out clearly.
First, $ϕ$ is being defined here as an outcome permutation. Presumably this means that 1) $ϕ(o_i) = o_j$ for some $o_i$, $o_j$; and 2) $ϕ$ admits a unique inverse $ϕ^{-1}(o_j) = o_i$. That makes sense.
We also define lotteries over outcomes, presumably as, e.g., $L = \sum_{i=1}^{n} ℓ_i o_i$, where $ℓ_i$ is the probability of outcome $o_i$. Of course we can interpret the $o_i$ geometrically as mutually orthogonal unit vectors, so this lottery defines a point on the $(n-1)$-simplex spanned by them. So far, so good.
But the thing that’s confusing me is what this implies for the definition of $ϕ^{-1}(L)$. Because $ϕ$ is defined as a permutation over outcomes (and not over probabilities of outcomes), we should expect this to be
$ϕ^{-1}(L) = ϕ^{-1}\left(\sum_{i=1}^{n} ℓ_i o_i\right) = \sum_{i=1}^{n} ℓ_i \, ϕ^{-1}(o_i)$
The problem is that this seems to give a different EV from the lemma:
$E_{o \sim ϕ^{-1}(L)}[u(o)] = \sum_{i=1}^{n} ℓ_i \, u(ϕ^{-1}(o_i)) = E_{o \sim L}[u(ϕ^{-1}(o))]$
(Note that I’m using $o$ as the dummy variable rather than $ℓ$, but the LHS above should correspond to line 2 of the proof.) Doing the same for the $M$ lottery gives an analogous result. Looking at the resulting inequality then suggests that Lemma 3 should actually read “$≺_{ϕ}$ induces $u(ϕ^{-1}(o_i))$” rather than “$≺_{ϕ}$ induces $u(ϕ(o_i))$”.
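A quick way to sanity-check the identity above is to enumerate every permutation of a small outcome space with exact arithmetic. The sketch below is a minimal Python check, assuming outcomes are indexed $0, \dots, n-1$, a permutation is stored as a list `perm` with `perm[i] = j` meaning $o_i \mapsto o_j$, and a lottery is its probability vector; the helper names (`pushforward`, `inverse`, `compose`) and the numbers are illustrative, not from the post.

```python
from fractions import Fraction
from itertools import permutations

def pushforward(perm, probs):
    """Lottery sum_i probs[i] * o_{perm[i]}: mass on o_i moves to o_{perm[i]}."""
    out = [Fraction(0)] * len(probs)
    for i, p in enumerate(probs):
        out[perm[i]] += p
    return out

def inverse(perm):
    """Inverse permutation: inverse(perm)[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv

def expected_utility(u, probs):
    return sum(p * ui for p, ui in zip(probs, u))

def compose(u, perm):
    """Utility vector of the function o -> u(perm(o))."""
    return [u[perm[i]] for i in range(len(u))]

# Illustrative lottery and utility over 4 outcomes.
probs = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
u = [Fraction(v) for v in (3, -1, 7, 2)]

for phi in permutations(range(4)):
    phi_inv = inverse(phi)
    lhs = expected_utility(u, pushforward(phi_inv, probs))  # E_{o ~ phi^{-1}(L)}[u(o)]
    rhs = expected_utility(compose(u, phi_inv), probs)      # E_{o ~ L}[u(phi^{-1}(o))]
    assert lhs == rhs
```

If `u` takes distinct values, then $u \circ ϕ$ and $u \circ ϕ^{-1}$ coincide exactly when $ϕ$ is an involution ($ϕ = ϕ^{-1}$). A discrepancy with the lemma as stated therefore only becomes visible for permutations such as 3-cycles.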
(As a concrete example, suppose we have a lottery $L = ℓ_1 o_1 + ℓ_2 o_2 + ℓ_3 o_3$ with the permutation $ϕ^{-1}(o_1) = o_2$, $ϕ^{-1}(o_2) = o_3$, $ϕ^{-1}(o_3) = o_1$. Then $ϕ^{-1}(L) = ℓ_1 o_2 + ℓ_2 o_3 + ℓ_3 o_1$ and our EV is
$E_{o \sim ϕ^{- 1} (L)} [u (o)] = ℓ_{1} u (o_{2}) + ℓ_{2} u (o_{3}) + ℓ_{3} u (o_{1}) = E_{o \sim L} [u (ϕ^{- 1} (o))]$
Yet $E_{o \sim L}[u(ϕ(o))] = ℓ_1 u(o_3) + ℓ_2 u(o_1) + ℓ_3 u(o_2) \neq E_{o \sim ϕ^{-1}(L)}[u(o)]$, which appears to contradict the lemma as stated.)
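The concrete example can be run with numbers plugged in. This sketch assumes illustrative values $ℓ = (1/2, 1/3, 1/6)$ and $u = (1, 10, 100)$ for $(o_1, o_2, o_3)$; neither is from the post. Outcomes are indexed from 0, and `phi` is the inverse of the stated $ϕ^{-1}$, so $ϕ(o_1) = o_3$, $ϕ(o_2) = o_1$, $ϕ(o_3) = o_2$.

```python
from fractions import Fraction

# Outcomes o_1, o_2, o_3 are indices 0, 1, 2.
phi_inv = [1, 2, 0]  # phi^{-1}(o_1) = o_2, phi^{-1}(o_2) = o_3, phi^{-1}(o_3) = o_1
phi = [2, 0, 1]      # inverse map: phi(o_1) = o_3, phi(o_2) = o_1, phi(o_3) = o_2

ell = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # illustrative probabilities
u = [1, 10, 100]                                       # illustrative utilities

def ev_of_composed(u, perm, probs):
    """E_{o ~ L}[u(perm(o))] = sum_i probs[i] * u[perm[i]]."""
    return sum(p * u[perm[i]] for i, p in enumerate(probs))

# phi^{-1}(L) puts mass ell[i] on outcome phi_inv[i].
pushed = [Fraction(0)] * 3
for i, p in enumerate(ell):
    pushed[phi_inv[i]] += p

ev_pushed = sum(p * ui for p, ui in zip(pushed, u))  # E_{o ~ phi^{-1}(L)}[u(o)]
print(ev_pushed, ev_of_composed(u, phi_inv, ell), ev_of_composed(u, phi, ell))
# prints: 77/2 77/2 52
```

The first two numbers agree, matching $E_{o \sim L}[u(ϕ^{-1}(o))]$; the third, computed with $u \circ ϕ$, does not.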
Note that even if this analysis is correct, it doesn’t invalidate your main claim. You only really care that a bijection exists, not which bijection it is: the fact that your outcome space is finite ensures that the proportion of orbit elements that incentivize power seeking remains the same either way. (It could have implications if you try to extend this to a metric space, though.)
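The bijection argument can be checked on a toy orbit. Because $ϕ \mapsto ϕ^{-1}$ is a bijection on the permutation group, enumerating $u \circ ϕ$ and $u \circ ϕ^{-1}$ over all $ϕ$ visits the same orbit elements with the same multiplicities. In this minimal sketch the utility vector and the predicate (“the best outcome is $o_1$”) are hypothetical stand-ins for “incentivizes power seeking”, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations

def inverse(perm):
    """Inverse permutation as a tuple."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return tuple(inv)

def compose(u, perm):
    """The utility function o -> u(perm(o)), as a tuple."""
    return tuple(u[perm[i]] for i in range(len(u)))

u = (0, 1, 1, 5)  # hypothetical utility over 4 outcomes
perms = list(permutations(range(len(u))))

orbit_phi = Counter(compose(u, p) for p in perms)
orbit_phi_inv = Counter(compose(u, inverse(p)) for p in perms)

# Inversion is a bijection on S_n, so both enumerations yield the same multiset.
assert orbit_phi == orbit_phi_inv

def frac_best_is_first(orbit):
    """Hypothetical predicate: fraction of orbit elements whose best outcome is o_1."""
    hits = sum(c for v, c in orbit.items() if v[0] == max(v))
    return hits / sum(orbit.values())

# Any proportion computed over the orbit is therefore the same either way.
assert frac_best_is_first(orbit_phi) == frac_best_is_first(orbit_phi_inv)
```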
Again, it’s also possible I’ve just misunderstood something here — please let me know if that’s the case!
- TurnTrout 22 Aug 2021 18:34 UTC
  LW: 3 AF: 3
  Thanks! I think you’re right. I should have defined $≻_{ϕ}$ differently, because when I write it out, it isn’t what I want. After working through a small example, intuitively $L ≻_{ϕ} M$ should hold iff $ϕ(L) ≻ ϕ(M)$, which also induces $u(ϕ(o_i))$, as we want.
  I’m not quite sure what the error was in the original proof of Lemma 3; I think it may be how I converted to and interpreted the vector representation. Probably it’s more natural to represent $E_{ℓ \sim ϕ^{-1}(L)}[u(ℓ)]$ as $u^{⊤}(P_{ϕ^{-1}} l) = (u^{⊤} P_{ϕ^{-1}}) \, l$, which makes your insight obvious.
  The post is edited and the issues should now be fixed.
  - Edouard Harris 24 Aug 2021 22:26 UTC
    LW: 1 AF: 1
    No problem! Glad it was helpful. I think your fix makes sense.
    > I’m not quite sure what the error was in the original proof of Lemma 3; I think it may be how I converted to and interpreted the vector representation.
    Yeah, I figured it might be because the dummy variable $ℓ$ was being used in the EV to sum over outcomes, while the vector $l$ was being used to represent the probabilities of those outcomes. Because $ℓ$ and $l$ look similar, it’s easy to conflate their meanings, and applying $ϕ$ to the wrong one by accident has the same effect as applying $ϕ^{-1}$ to the other one. In any case, though, the main result seems unaffected.
    Cheers!
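The revised definition, the matrix form $u^{⊤}(P_{ϕ^{-1}} l) = (u^{⊤} P_{ϕ^{-1}}) \, l$, and the $ℓ$-versus-$l$ slip discussed in this exchange can all be checked together in a short sketch. It assumes the convention that $P_{ψ}$ has a 1 in row $ψ(i)$, column $i$, so $P_{ψ} l$ is the probability vector of the pushed-forward lottery; that convention, the helper names, and the numbers are assumptions for illustration, not taken from the post.

```python
from fractions import Fraction
from itertools import permutations

def perm_matrix(perm):
    """P with P[perm[i]][i] = 1, so (P l) moves the mass l[i] on o_i to o_{perm[i]}."""
    n = len(perm)
    P = [[0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        P[j][i] = 1
    return P

def inverse(perm):
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv

def matvec(P, v):
    """Matrix-vector product P v."""
    return [sum(P[r][c] * v[c] for c in range(len(v))) for r in range(len(P))]

def vecmat(v, P):
    """Row vector times matrix, v^T P, returned as a flat list."""
    return [sum(v[r] * P[r][c] for r in range(len(v))) for c in range(len(P[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative lottery and utility over 4 outcomes, with exact arithmetic.
l = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
u = [3, -1, 7, 2]

for phi in permutations(range(4)):
    phi_inv = inverse(phi)
    P_inv = perm_matrix(phi_inv)

    # u^T (P_{phi^{-1}} l) == (u^T P_{phi^{-1}}) l, and the row vector is u o phi^{-1}.
    row = vecmat(u, P_inv)
    assert dot(u, matvec(P_inv, l)) == dot(row, l)
    assert row == [u[phi_inv[i]] for i in range(4)]

    # Revised definition: scoring phi(L) under u equals scoring L under u o phi.
    u_phi = [u[phi[i]] for i in range(4)]
    assert dot(u, matvec(perm_matrix(phi), l)) == dot(u_phi, l)

    # The l-versus-ell slip: reindexing the probability vector by phi,
    # l'_j = l_{phi(j)}, gives the same vector as pushing outcomes by phi^{-1}.
    assert [l[phi[j]] for j in range(4)] == matvec(P_inv, l)
```

Reading off the row vector $u^{⊤} P_{ϕ^{-1}}$ directly gives $u \circ ϕ^{-1}$, which is why the matrix form makes the original discrepancy easy to see.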