Edouard Harris comments on Utility Maximization = Description Length Minimization

Edouard Harris 15 Jul 2021 23:09 UTC
LW: 9 AF: 6
1
AF
Late comment here, but I really liked this post and want to make sure I’ve fully understood it. In particular there’s a claim near the end which says: if $H (X)$ is not fixed, then we can build equivalent models $M_{1}^{'}$ , $M_{2}^{'}$ for which it is fixed. I’d like to formalize this claim to make sure I’m 100% clear on what it means. Here’s my attempt at doing that:
For any pair of models $M_{1} (θ)$ , $M_{2}$ where $H (X_{0} | M_{1} (θ)) \neq H (X_{0} | M_{1} (θ^{'}))$ for some $θ$ , $θ^{'}$ , there exists a variable $X$ (of which $X_{0}$ is a subset) and a pair of models $M_{1}^{'} (θ)$ , $M_{2}^{'}$ such that 1) $H (X | M_{1}^{'} (θ)) = H (X | M_{1}^{'} (θ^{'}))$ for any $θ$ , $θ^{'}$ ; and 2) the behavior of the system is the same under $M_{1}^{'} (θ)$ , $M_{2}^{'}$ as it was under $M_{1} (θ)$ , $M_{2}$ .
To satisfy this claim, we construct our $X$ as the conjunction of $X_{0}$ and some “extra” component $X_{0}^{'}$ . E.g., $X_{0} \in \{heads, tails\}$ for a coin flip, $X_{0}^{'} \in \{1, 2, 3, 4, 5, 6\}$ for a die roll, and so $X = X_{0} X_{0}^{'} \in \{(heads, 1), (tails, 1), (heads, 2), \ldots\}$ is the conjunction of the coin flip and the die roll, and the domain of $X$ is the Cartesian product of the coin flip domain and the die roll domain.
Then we construct our $M_{1}^{'} (θ)$ by imposing 1) $P (X_{0} X_{0}^{'} | M_{1}^{'} (θ)) = P (X_{0} | M_{1}^{'} (θ)) P (X_{0}^{'} | M_{1}^{'} (θ))$ (i.e., $X_{0}$ , $X_{0}^{'}$ are logically independent given $M_{1}^{'} (θ)$ for every $θ$ ); and 2) $P (X_{0} | M_{1}^{'} (θ)) = P (X_{0} | M_{1} (θ))$ (i.e., the marginal prob given $M_{1}^{'} (θ)$ equals the original prob under $M_{1} (θ)$ ).
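One step that seems implicit in this construction, written out as a sketch (the constant $C$ is a symbol introduced here, not something from the post): since $X_{0}$ and $X_{0}^{'}$ are independent under $M_{1}^{'} (θ)$ , their entropies add, and condition 2) lets the first term be written in terms of the original model:

$$H (X | M_{1}^{'} (θ)) = H (X_{0} | M_{1} (θ)) + H (X_{0}^{'} | M_{1}^{'} (θ))$$

So holding the left-hand side at a fixed value $C$ for every $θ$ amounts to choosing the distribution of the extra component such that

$$H (X_{0}^{'} | M_{1}^{'} (θ)) = C - H (X_{0} | M_{1} (θ))$$

This is only possible if the right-hand side lies between $0$ and $\log | X_{0}^{'} |$ for every $θ$ , so the domain of $X_{0}^{'}$ presumably needs to be large enough to absorb the full variation in $H (X_{0} | M_{1} (θ))$ .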
Finally we construct $M_{2}^{'}$ by imposing the same two conditions we imposed on $M_{1}^{'}$ : 1) $P (X_{0} X_{0}^{'} | M_{2}^{'}) = P (X_{0} | M_{2}^{'}) P (X_{0}^{'} | M_{2}^{'})$ and 2) $P (X_{0} | M_{2}^{'}) = P (X_{0} | M_{2})$ . But we also impose the extra condition 3) $P (X_{0}^{'} | M_{2}^{'}) = \frac{1}{| X_{0}^{'} |}$ , where $| X_{0}^{'} |$ is the number of values $X_{0}^{'}$ can take (assuming finite sets, etc.).
We can always find $X$ , $M_{1}^{'} (θ)$ and $M_{2}^{'}$ that satisfy the above conditions, and with these choices we end up with $H (X | M_{1}^{'} (θ)) = H (X | M_{1}^{'} (θ^{'}))$ for all $θ$ , $θ^{'}$ (i.e., $H$ is fixed) and $E [- \log P (X | M_{2}^{'}) | M_{1}^{'} (θ)] = E [- \log P (X_{0} | M_{2}^{'}) | M_{1}^{'} (θ)] + \text{constant}$ (i.e., the system retains the same dynamics).
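A small numerical check of the two claims, as a sketch rather than anything from the post. Everything specific here is an assumption for illustration: a biased coin for $X_{0}$ with $θ$ as the heads probability, a six-valued padding variable for $X_{0}^{'}$ , the particular $M_{2}$ , the target total entropy `TOTAL`, and the helper `padding_with_entropy`, which picks a padding distribution by bisection on a mixture of a point mass and the uniform distribution.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

def cross_entropy(p, q):
    """E_p[-log q], in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def padding_with_entropy(target, k, iters=200):
    """Hypothetical helper: a distribution on k outcomes with entropy `target`.

    Mixes a point mass with the uniform distribution; entropy rises
    monotonically from 0 to log(k) as the weight t goes from 0 to 1,
    so bisection on t finds the requested value.
    """
    def mix(t):
        return [(1 - t) * (1.0 if i == 0 else 0.0) + t / k for i in range(k)]
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if entropy(mix(mid)) < target:
            lo = mid
        else:
            hi = mid
    return mix((lo + hi) / 2)

def product(p, q):
    """Joint distribution of two independent variables."""
    return [pi * qj for pi in p for qj in q]

K = 6                    # size of the padding domain (the die)
TOTAL = math.log(K)      # assumed fixed value for H(X | M1'(theta))
m2 = [0.6, 0.4]          # assumed original M2 over the coin
m2_prime = product(m2, [1 / K] * K)   # uniform over the padding

results = []
for theta in (0.5, 0.7, 0.9):
    m1 = [theta, 1 - theta]
    # Padding entropy is chosen to offset the coin's entropy.
    pad = padding_with_entropy(TOTAL - entropy(m1), K)
    m1_prime = product(m1, pad)
    shift = cross_entropy(m1_prime, m2_prime) - cross_entropy(m1, m2)
    results.append((entropy(m1_prime), shift))

for h, shift in results:
    print(f"H(X | M1') = {h:.6f}   cross-entropy shift = {shift:.6f}")
```

Under these assumptions the first column should stay at `TOTAL` for every $θ$ , and the second should stay at $\log 6$ , which is the additive constant in the expression above.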
Is this basically right? Or is there something I’ve misunderstood?
- johnswentworth 17 Jul 2021 12:42 UTC
  LW: 5 AF: 4
  0
  AF Parent
  The construction is correct.
  Note that for $M_{2}$ , conceptually we don’t need to modify it, we just need to use the original $M_{2}$ but apply it only to the subcomponents of the new $X$ -variable which correspond to the original $X$ -variable. Alternatively, we can take the approach you do: construct $M_{2}^{'}$ which has a distribution over the new $X$ , but “doesn’t say anything” about the new components, i.e. it’s just maxentropic over the new components. This is equivalent to ignoring the new components altogether.
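  Spelled out as a sketch, using the parent comment's notation ( $X_{0}$ for the original variable, $X_{0}^{'}$ for the new components, $| X_{0}^{'} |$ for the number of values they can take): if $M_{2}^{'}$ is independent and uniform over the new components, then for every outcome

  $$- \log P (X | M_{2}^{'}) = - \log P (X_{0} | M_{2}) + \log | X_{0}^{'} |$$

  The second term depends on neither the outcome nor $θ$ , so taking the expectation under any $M_{1}^{'} (θ)$ shifts the objective by the same additive constant and leaves the optimal $θ$ unchanged.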
  - Edouard Harris 19 Jul 2021 20:20 UTC
    LW: 4 AF: 4
    0
    AF Parent
    Ah yes, that’s right. Yeah, I just wanted to make this part fully explicit to confirm my understanding. But I agree it’s equivalent to just let $M_{2}^{'}$ ignore the extra $X_{0}^{'}$ (or whatever) component.
    Thanks very much!