It’s worth emphasising just how closely related it is. Friston’s expected free energy of a policy is
$$G(\pi) = E_{Q(s_\tau \mid \pi)} D_{KL}\left[Q(s_\tau \mid \pi) \,\|\, Q(s_\tau \mid o_\tau)\right] - E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau),$$
where the first term is the expected information gained by following the policy and the second is the expected ‘extrinsic value’.
The extrinsic value term $-E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$, translated into John’s notation and setup, is precisely $E[-\log P(X \mid M_2) \mid M_1(\theta)]$. Where John has optimisers choosing $\theta$ to minimise the cross-entropy of $X$ under $M_2$ with respect to $X$ under $M_1$, Friston has agents choosing $\pi$ to minimise the cross-entropy of preferences ($P$) with respect to beliefs ($Q$).
What’s more, Friston explicitly thinks of the extrinsic value term $-E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$ as a way of writing expected utility (see the image below from one of his talks). In particular, $P$ is a way of representing real-valued preferences as a probability distribution. He often constructs $P$ by writing down a utility function and then taking a softmax (as in this rat T-maze example), which is exactly what John’s construction amounts to.
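To make the equivalence concrete, here is a minimal sketch (with made-up utilities and beliefs, not Friston’s actual T-maze numbers) of the softmax construction: turning a utility function $U$ into a preference distribution $P(o) \propto e^{U(o)}$, and checking that the cross-entropy of $P$ with respect to beliefs $Q$ is just negative expected utility plus a policy-independent constant $\log Z$.

```python
import numpy as np

# Hypothetical utilities over three outcomes (illustrative values only).
utilities = np.array([2.0, 0.0, -1.0])

# Softmax construction of preferences: P(o) is proportional to exp(U(o)).
Z = np.exp(utilities).sum()
P = np.exp(utilities) / Z

# Beliefs Q(o | pi) about outcomes under some policy (also made up).
Q = np.array([0.7, 0.2, 0.1])

# The extrinsic value term: cross-entropy of P with respect to Q.
cross_entropy = -(Q * np.log(P)).sum()

# Since log P(o) = U(o) - log Z, the cross-entropy equals
# -E_Q[U(o)] + log Z, so minimising it over policies is the same
# as maximising expected utility (log Z does not depend on the policy).
expected_utility = (Q * utilities).sum()
assert np.isclose(cross_entropy, -expected_utility + np.log(Z))
```

The assertion holds for any choice of utilities and beliefs, which is why representing preferences as a softmax makes the cross-entropy objective coincide with expected utility maximisation up to a constant.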
It seems that John is completely right when he speculates that he’s rediscovered an idea well-known to Karl Friston.