Under this formulation, FEP is very similar to RL-as-inference. But RL-as-inference is a generalization of a huge number of RL algorithms, from Q-learning to LLM fine-tuning. That does kind of make sense if we think of FEP as just a different way of looking at things, but it doesn’t really help us narrow down which algorithms the brain is actually using. Perhaps that’s actually all FEP is trying to do, though, and IIRC Friston has said things to that effect: that FEP is just a reframing/generalization and not an actual model of the underlying algorithms being employed.
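For reference, the standard RL-as-inference construction goes roughly like this (notation varies between papers, so treat it as a sketch): introduce a binary “optimality” variable $\mathcal{O}$ with $p(\mathcal{O}=1 \mid \tau) \propto \exp\big(\sum_t r(s_t, a_t)\big)$ over trajectories $\tau$, then maximize a variational lower bound on its log-evidence:

$$
\ln p(\mathcal{O}=1) \;\ge\; \mathbb{E}_{q(\tau)}\big[\ln p(\mathcal{O}=1 \mid \tau)\big] - D_{\mathrm{KL}}\big(q(\tau)\,\|\,p(\tau)\big) \;=\; \mathbb{E}_{q(\tau)}\Big[\sum_t r(s_t, a_t)\Big] - D_{\mathrm{KL}}\big(q(\tau)\,\|\,p(\tau)\big) + \text{const.}
$$

Maximizing this bound over $q$ has the same shape as minimizing variational free energy, with the reward-derived likelihood playing the role of the generative model’s preferences, which is why everything from soft Q-learning and SAC to KL-regularized LLM fine-tuning fits under the same umbrella.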
There are some conceptual differences. In RL, you have to define a reward over all possible states, and the agent then learns a value function over that whole state space. In active inference, you make the desirable sense data a priori likely. Sensory space is not only lower-dimensional than the (unobserved) state space, but you only need to specify a single point in it, rather than a function on the whole space. It’s often a much more natural way of defining goals and is more similar to control theory than to RL: you’re directly optimizing for a desired (and known) outcome rather than having to figure out what to optimize for by reinforcement. For example, if you want a robot to walk to some goal point, RL has to let the robot walk around for a while, discover that the goal point gives high reward, and only then exploit that knowledge (in a later rollout). In active inference (and control theory), the robot already knows where the goal point is (or rather, what the world looks like when standing at that point), and merely has to figure out a sequence of actions that gets it there.
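To make that concrete, the two ways of writing down the goal look roughly like this (a sketch; the exact form of the preference prior varies across the active-inference literature):

$$
\begin{aligned}
\text{RL:}\quad & r : \mathcal{S} \to \mathbb{R} \ \text{specified over every state}, \qquad V^{\pi}(s) = \mathbb{E}_{\pi}\Big[\sum_t \gamma^t r(s_t)\Big],\\[2pt]
\text{Active inference:}\quad & \tilde{p}(o) = \mathcal{N}\big(o;\, o^{\text{goal}},\, \Sigma\big), \qquad \pi^{*} = \arg\min_{\pi} D_{\mathrm{KL}}\big(q(o \mid \pi)\,\|\,\tilde{p}(o)\big),
\end{aligned}
$$

so the goal is literally one preferred observation $o^{\text{goal}}$ (what the robot’s sensors would read while standing at the target) rather than a function over the whole state space.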
Another difference is that active inference automatically balances exploration and exploitation, while in RL the balance is usually a hand-tuned hyperparameter (an ε-greedy rate, an entropy bonus, or the like). In RL, exploration tends to look like taking many random actions early on to figure out what gives reward, and later on taking actions that keep the agent in high-reward states. In control theory, exploration is more bespoke, built specifically for system identification (learning a model) or adaptive control (adjusting controller parameters on the fly based on observations). In active inference there’s no aimless flailing about, but the agent can run any kind of experiment that minimizes future uncertainty, testing which beliefs and actions are likely to achieve the desired sense data. Here’s a nice demo of that:
See https://arxiv.org/abs/2006.12964:

CAI augments the ‘natural’ probabilistic graphical model with exogenous optimality variables. In contrast, AIF leaves the structure of the graphical model unaltered and instead encodes value into the generative model directly. These two approaches lead to significant differences between their respective functionals. AIF, by contaminating the veridical generative model with value-imbuing biases, loses a degree of freedom compared to CAI which maintains a strict separation between the veridical generative model of the environment and its goals. In POMDPs, this approach results in CAI being sensitive to an ‘observation-ambiguity’ term which is absent in the AIF formulation. Secondly, the different methods for encoding the probability of goals – likelihoods in CAI and priors in AIF – lead to different exploratory terms in the objective functionals. Specifically, AIF is endowed with an expected information gain that CAI lacks. AIF approaches thus lend themselves naturally to goal-directed exploration whereas CAI mandates only random, entropy-maximizing exploration.

These different ways of encoding goals into probabilistic models also lend themselves to more philosophical interpretations. CAI, by viewing goals as an additional exogenous factor in an otherwise unbiased inference process, maintains a clean separation between veridical perception and control, thus maintaining the modularity thesis of separate perception and action modules (Baltieri & Buckley, 2018). This makes CAI approaches consonant with mainstream views in machine learning that see the goal of perception as recovering veridical representations of the world, and control as using this world-model to plan actions. In contrast, AIF elides these clean boundaries between unbiased perception and action by instead positing that biased perception (Tschantz, Seth, & Buckley, 2020) is crucial to adaptive action. Rather than maintaining an unbiased world model that predicts likely consequences, AIF instead maintains a biased generative model which preferentially predicts our preferences being fulfilled. Active inference thus aligns closely with enactive and embodied approaches (Baltieri & Buckley, 2019; Clark, 2015) to cognition, which view the action-perception loop as a continual flow rather than a sequence of distinct stages.
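To make the quoted contrast concrete, here’s a toy numpy sketch (mine, not from the paper; the matrices are made up) of scoring actions by expected free energy in a small discrete POMDP. The extrinsic term pulls predicted observations toward the preference prior, and the expected-information-gain term is the goal-directed exploration bonus the quote says CAI lacks; in CAI/max-entropy RL the exploration term would instead be an action-entropy bonus:

```python
# Toy expected-free-energy (EFE) scoring in a tiny discrete POMDP.
# Illustrative only: one-step lookahead, made-up matrices.
import numpy as np

def normalize(x, axis=None):
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_states, n_obs, n_actions = 4, 3, 2

A = normalize(rng.random((n_obs, n_states)) + np.eye(n_obs, n_states), axis=0)  # p(o|s), columns sum to 1
B = np.stack([normalize(rng.random((n_states, n_states)), axis=0)
              for _ in range(n_actions)])                                        # p(s'|s,a)
C = normalize(np.array([8.0, 1.0, 1.0]))                                         # preference prior p~(o): obs 0 is desired
q_s = normalize(np.ones(n_states))                                               # current belief q(s)

def expected_free_energy(a, eps=1e-16):
    q_s_next = B[a] @ q_s          # predicted next-state distribution q(s'|a)
    q_o = A @ q_s_next             # predicted observation distribution q(o|a)
    # Extrinsic (pragmatic) value: expected log-preference of predicted observations.
    extrinsic = q_o @ np.log(C + eps)
    # Epistemic value: expected information gain about s' from o, i.e. the
    # mutual information between s' and o under the predictive model.
    joint = A * q_s_next           # joint p(o, s'|a), shape (n_obs, n_states)
    info_gain = np.sum(joint * (np.log(joint + eps)
                                - np.log(q_o[:, None] + eps)
                                - np.log(q_s_next[None, :] + eps)))
    return -extrinsic - info_gain  # lower EFE = better action

scores = [expected_free_energy(a) for a in range(n_actions)]
print("EFE per action:", scores, "-> chosen:", int(np.argmin(scores)))
```

Exploration and exploitation are just two terms of the same objective here, so there is no separate ε schedule or temperature to tune.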
Nice, CAI is another similar approach, kind of in between the three already mentioned (RL, control theory, and active inference). I think “losing a degree of freedom” is very much a good thing, both computationally and functionally.
Yeah, my understanding is that FEP is meant to be quite general; the choice of P and Q is doing a lot of the theory’s work for it.
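For reference, the free energy in question is just

$$
F \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \;=\; D_{\mathrm{KL}}\big(q(s)\,\|\,p(s \mid o)\big) \;-\; \ln p(o),
$$

which says nothing about what $s$, $p$, and $q$ actually are. All the substantive commitments (what the generative model $p$ looks like, which family $q$ is drawn from, how preferences get baked into $p$) live in those choices, which is why the principle on its own pins down so little about the brain’s algorithms.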
Chapter 5 describes how you might apply it to the human brain in particular.