Yeah, Friston is a bit notorious for not explaining his ideas clearly enough for others to understand easily. It took me a while to wrap my head around what all his equations were up to and what exactly “active inference” entails, but the concepts are relatively straightforward once it all clicks.
You can think of “free energy” as the discrepancy between prediction and observation, like the potential energy of a spring stretched between them. Minimizing free energy is all about finding states with the highest probability and setting things up such that the highest probability states are those where your model predictions match your observations. In statistical mechanics, the probability of a particle occupying a particular state is proportional to the exponential of the negative potential energy of that state. That’s why air pressure exponentially drops off with altitude (to a first approximation, $p(h) \propto \exp\left(-\frac{mgh}{RT}\right)$). For a normal distribution:
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$
the energy is a parabola:
$$E(x) = -\log(p(x)) = \frac{1}{2}\frac{(x-\mu)^2}{\sigma^2} + C$$
This is exactly the energy landscape you see for an ideal Newtonian spring with rest length $\mu$ and spring constant $k = 1/\sigma^2$ (the precision). Physical systems always seek the configuration with the lowest free energy (e.g., a stretched spring contracting towards its rest length). In the context of mind engineering, $x$ might represent an observation, $\mu$ the prediction of the agent’s internal model of the world, and $1/\sigma^2$ the expected precision of that prediction. Of course, these are all high-dimensional vectors, so matrix math is involved (Friston always uses $\Pi$ for the precision matrix).
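To see the spring analogy in the multivariate case: the energy becomes the quadratic form $E(x) = \frac{1}{2}(x-\mu)^\top \Pi (x-\mu)$, with the precision matrix $\Pi$ playing the role of the stiffness. Here’s a minimal numerical sketch (plain NumPy; the function name and the toy numbers are my own, not anything from Friston):

```python
import numpy as np

def gaussian_energy(x, mu, Pi):
    """Negative log-density of a multivariate Gaussian, up to a constant:
    E(x) = 0.5 * (x - mu)^T Pi (x - mu), where Pi is the precision matrix."""
    err = x - mu
    return 0.5 * err @ Pi @ err

# Toy numbers, invented for illustration:
mu = np.array([1.0, 0.0])            # the model's prediction
x = np.array([1.5, -0.2])            # the observation
Pi = np.array([[4.0, 0.0],           # high precision (stiff spring) in dim 0
               [0.0, 1.0]])          # lower precision in dim 1

print(gaussian_energy(x, mu, Pi))    # positive: prediction and observation disagree
print(gaussian_energy(mu, mu, Pi))   # 0.0: no discrepancy, no "stretch"
```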
For rational agents, free energy minimization involves adjusting the hidden variables in an agent’s internal predictive model (perception) or adjusting the environment itself (action) until “predictions” and “observations” align to within the desired/expected precision. (For actions, “prediction” is a bit of a misnomer; it’s actually a goal or a homeostatic set point that the agent is trying to achieve. This is what “active inference” is all about, though, and has caused free energy people to talk about motor outputs from the brain as being “self-fulfilling prophecies”.)

The predictive models that the agent uses for perception are actually built hierarchically, with each level acting as a dynamic generative model making predictions about the level below. Higher levels send predictions down to compare with the “observations” (state) of the level below, and lower levels send prediction errors back up to the higher levels in order to adjust the hidden variables through something like online gradient descent. This process is called “predictive coding” and leads to the minimization of the free energy between all levels in the hierarchy.
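To make the “something like online gradient descent” step concrete, here is a minimal two-level sketch. This is entirely my own toy construction (the weights, precisions, and step size are made up): a hidden cause at the higher level generates a top-down prediction of the lower level’s state, and the belief about that cause is nudged down the gradient of the precision-weighted prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-level toy: a hidden cause v (higher level) generates a prediction W @ v
# of the lower level's state; the agent only sees obs and knows W and Pi.
W = rng.normal(size=(3, 2))              # top-down generative weights
Pi = 4.0 * np.eye(3)                     # expected precision of the observation
v_true = np.array([0.5, -1.0])           # the cause actually behind the data
obs = W @ v_true + rng.normal(scale=0.1, size=3)

v = np.zeros(2)                          # current belief about the hidden cause
lr = 0.02                                # step size for online gradient descent

for _ in range(500):
    pred = W @ v                         # top-down prediction
    err = obs - pred                     # bottom-up prediction error
    # Free energy here is 0.5 * err^T Pi err (up to constants);
    # its gradient w.r.t. v is -W^T Pi err, so perception descends it:
    v += lr * W.T @ (Pi @ err)

print("inferred cause:", v)
print("true cause:    ", v_true)
```

In a full hierarchy you’d stack several of these levels, with each level’s state serving as the “observation” for the level above, but the update at each interface has the same shape.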
My little limerick was alluding to the idea that you could build an AGI to include a generative model of human behavior, using predictive coding to find the goals, policies, instinctual drives, and homeostatic set points that best explain the human’s observed behavior. Then you could route these goals and policies to the AGI’s own teleological system. That is, make the human’s goals and drives, whatever it determines them to be using its best epistemological techniques, into its own goals and drives. Whether this could solve AI alignment would take some research to figure out. (Or just point out the glaring flaws in my reasoning here.)
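For what it’s worth, here is an extremely crude toy of that pipeline, purely my own illustration with a made-up behavioral model (the “human” just relaxes toward a hidden set point): the observer fits the set point by gradient descent on prediction error over the observed trajectory, and the inferred set point is what the AGI would then adopt as its own goal:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up behavioral model: the "human" relaxes its 2-D state toward a hidden
# set point (its goal) with gain 0.2, plus a little noise.
true_goal = np.array([2.0, -1.0])
state = np.zeros(2)
states = [state.copy()]
for _ in range(50):
    state = state + 0.2 * (true_goal - state) + rng.normal(scale=0.05, size=2)
    states.append(state.copy())
states = np.array(states)

# The observer assumes the same model form and fits the set point by gradient
# descent on the prediction error over the observed moves.
goal_est = np.zeros(2)
lr = 1.0
for _ in range(500):
    predicted_moves = 0.2 * (goal_est - states[:-1])
    observed_moves = states[1:] - states[:-1]
    err = observed_moves - predicted_moves
    goal_est += lr * 0.2 * err.mean(axis=0)   # descend the squared prediction error

print("inferred goal:", goal_est)             # what the AGI would adopt as its own
print("true goal:    ", true_goal)
```

Real human behavior is obviously nothing like linear relaxation toward a set point; the only point of the toy is the shape of the pipeline: generative model of the human, prediction errors, inferred goals and set points, and then those become the agent’s own.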