Jon Zero comments on Utility Maximization = Description Length Minimization

Jon Zero 18 Feb 2021 21:55 UTC
4 points
If $M_{1}$ were so general that, by judicious choice of $\theta$, you could impose an arbitrary distribution on $X$, then you’d pick the distribution that has $P(X = x^{\star}) = 1$, where $x^{\star} = \operatorname{argmax}_{x} u(x)$. That is, a distribution where $H_{M_{1}}(X) = 0$.
For me, that detracts a little from the entropy + KL divergence decomposition as applied to your utility maximisation problem. No balance point is reached; it’s all about the entropy term. Contrast with the bias/variance trade-off (which also applies to the reference class problem), where the balance between the two parts of the decomposition is very important.
- johnswentworth 18 Feb 2021 22:02 UTC
  3 points
  Parent
  It’s not quite all about the entropy term; it’s the KL-div term that determines which value $x^{\star}$ is chosen. But you are correct insofar as this is not intended to be analogous to bias/variance tradeoff, and it’s not really about “finding a balance point” between the two terms.
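
To make the exchange above concrete, here is a minimal numerical sketch. It assumes a small finite $X$, a made-up utility `u`, and a reference model that assigns probability proportional to $e^{u(x)}$; that form, and all the names in the code, are illustrative assumptions rather than anything stated in the thread. It uses the standard identity cross-entropy = entropy + KL divergence, and compares the point-mass distributions that an unrestricted $M_{1}$ could choose.

```python
import math

def softmax_reference(u):
    """Reference distribution P2(x) proportional to exp(u(x)) (assumed form)."""
    z = sum(math.exp(v) for v in u.values())
    return {x: math.exp(v) / z for x, v in u.items()}

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def kl(p, q):
    """KL divergence D(p || q) in nats, skipping zero-probability outcomes of p."""
    return sum(pi * math.log(pi / q[x]) for x, pi in p.items() if pi > 0)

def point_mass(support, x0):
    """Distribution putting all probability on x0."""
    return {x: 1.0 if x == x0 else 0.0 for x in support}

# Hypothetical utility over three outcomes.
u = {"a": 0.0, "b": 1.0, "c": 3.0}
p2 = softmax_reference(u)

# (entropy, KL) for each point mass an unrestricted M1 could pick.
results = {
    x0: (entropy(point_mass(u, x0)), kl(point_mass(u, x0), p2))
    for x0 in u
}

# The point mass with the smallest entropy + KL (i.e. smallest cross-entropy).
best = min(results, key=lambda x0: sum(results[x0]))

for x0, (h, d) in results.items():
    print(f"x0={x0}: entropy={h:.3f}, KL={d:.3f}")
print("chosen:", best)
```

Under these assumptions every point mass has zero entropy, so the entropy term cannot distinguish between them; the KL term equals $-\log P_{2}(x_0)$ and is smallest at the utility-maximising outcome, which is what selects it.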