I think this is a really cool research agenda. I can also try to give my “skydiver’s perspective from 3000 miles in the air” overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.
In general, I think of free energy minimisation as a class of solutions for the problem of predicting complex systems behaviour, in line with other variational principles in physics. Thus, it is an attempt to use simple physical rules like “the ball rolls down the slope” to explain very complicated outcomes like “I decide to build a theme park with roller coasters in it”. In this case, the rule is “free energy is minimised”, but unlike a simple physical system whose dimensionality is very literally visible, VFE is minimised in high dimensional probability spaces.
Consider the concrete case below: there are five restaurants in a row and you have to pick one to go to. The intuitive physical interpretation is that you can be represented by a point particle moving to one of five coordinates, all relatively close by in the three dimensional XYZ coordinate space. However, if we assume that this is just some standard physical process you’ll end up with highly unintuitive behaviour (why does the particle keep drifting right and left in the middle of these coordinates, and then eventually go somewhere that isn’t the middle?). Instead we might say that in an RL sense there is a 5 dimensional action space and you must pick a dimension to maximise expected reward. Free energy minimisation is a rule that says that your action is the one that minimises variation between the predicted outcome your brain produces and the final outcome that your brain observes—which can happen either if your brain is very good at predicting the future or if you act to make your prediction come true. A preference in this case is a bias in the prediction (you can see yourself going to McDonald’s more, in some sense, and you feel some psychological aversion/repulsive force moving you away from Burger King) that is then satisfied by you going to the restaurant you are most attracted to. Of course this is just a single agent interpretation and with multiple subagents you can imagine valleys and peaks in the high dimensional probability space, which is resolved when you reach some minima that can be satisfied by action.
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly non linear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise