It’s worth emphasising just how closely related it is. Friston’s expected free energy of a policy is
$$G(\pi) = E_{Q(s_\tau \mid \pi)} D_{KL}\left[Q(s_\tau \mid \pi) \,\|\, Q(s_\tau \mid o_\tau)\right] - E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau),$$
where the first term is the expected information gained by following the policy and the second is the expected ‘extrinsic value’.
The extrinsic value term $-E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$, translated into John’s notation and setup, is precisely $E[-\log P(X \mid M_2) \mid M_1(\theta)]$. Where John has optimisers choosing $\theta$ to minimise the cross-entropy of $X$ under $M_2$ with respect to $X$ under $M_1$, Friston has agents choosing $\pi$ to minimise the cross-entropy of preferences ($P$) with respect to beliefs ($Q$).
What’s more, Friston explicitly thinks of the extrinsic value term $-E_{Q(s_\tau, o_\tau \mid \pi)} \ln P(o_\tau)$ as a way of writing expected utility (see the image below from one of his talks). In particular, $P$ is a way of representing real-valued preferences as a probability distribution. He often constructs $P$ by writing down a utility function and then taking a softmax (as in this rat T-maze example), which is exactly what John’s construction amounts to.
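To make the equivalence concrete, here is a minimal sketch (with made-up utilities and beliefs, not Friston’s actual T-maze numbers) of the softmax construction: turning a utility function $U$ into a preference distribution $P(o) \propto e^{U(o)}$, and checking that the cross-entropy of $P$ with respect to beliefs $Q$ is just negative expected utility plus a policy-independent constant $\log Z$.

```python
import numpy as np

# Hypothetical utilities over three outcomes (illustrative values only).
utilities = np.array([2.0, 0.0, -1.0])

# Softmax construction of preferences: P(o) is proportional to exp(U(o)).
Z = np.exp(utilities).sum()
P = np.exp(utilities) / Z

# Beliefs Q(o | pi) about outcomes under some policy (also made up).
Q = np.array([0.7, 0.2, 0.1])

# The extrinsic value term: cross-entropy of P with respect to Q.
cross_entropy = -(Q * np.log(P)).sum()

# Since log P(o) = U(o) - log Z, the cross-entropy equals
# -E_Q[U(o)] + log Z, so minimising it over policies is the same
# as maximising expected utility (log Z does not depend on the policy).
expected_utility = (Q * utilities).sum()
assert np.isclose(cross_entropy, -expected_utility + np.log(Z))
```

The assertion holds for any choice of utilities and beliefs, which is why representing preferences as a softmax makes the cross-entropy objective coincide with expected utility maximisation up to a constant.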
It seems that John is completely right when he speculates that he’s rediscovered an idea well-known to Karl Friston.