God Help Us, Let’s Try To Understand Friston On Free Energy

I’ve been trying to delve deeper into predictive processing theories of the brain, and I keep coming across Karl Friston’s work on “free energy”.

At first I felt bad for not understanding this. Then I realized I wasn’t alone. There’s an entire not-understanding-Karl-Friston internet fandom, complete with its own parody Twitter account and Markov blanket memes.

From the journal Neuropsychoanalysis (which based on its name I predict is a center of expertise in not understanding things):

At Columbia’s psychiatry department, I recently led a journal club for 15 PET and fMRI researchers, PhDs and MDs all, with well over $10 million in NIH grants between us, and we tried to understand Friston’s 2010 Nature Reviews Neuroscience paper – for an hour and a half. There was a lot of mathematical knowledge in the room: three statisticians, two physicists, a physical chemist, a nuclear physicist, and a large group of neuroimagers – but apparently we didn’t have what it took. I met with a Princeton physicist, a Stanford neurophysiologist, a Cold Springs Harbor neurobiologist to discuss the paper. Again blanks, one and all.

Normally this is the point at which I give up and say “screw it”. But almost all the most interesting neuroscience of the past decade involves this guy in one way or another. He’s the most-cited living neuroscientist, invented large parts of modern brain imaging, and received the prestigious Golden Brain Award for excellence in neuroscience, which is somehow a real thing. His Am I Autistic – An Intellectual Autobiography short essay, written in a weirdly lucid style and describing hijinks like deriving the Schrödinger equation for fun in school, is as consistent with genius as anything I’ve ever read.

As for free energy, it’s been dubbed “a unified brain theory” (Friston 2010), a key through which “nearly every aspect of [brain] anatomy and physiology starts to make sense” (Friston 2009), “[the source of] the ability of biological systems to resist a natural tendency to disorder” (Friston 2012), an explanation of how life “inevitably and emergently” arose from the primordial soup (Friston 2013), and “a real life version of Isaac Asimov’s psychohistory” (description here of Allen 2018).

I continue to hope some science journalist takes up the mantle of explaining this comprehensively. Until that happens, I’ve been working to gather as many perspectives as I can, to talk to the few neuroscientists who claim to even partially understand what’s going on, and to piece together a partial understanding. I am not at all the right person to do this, and this is not an attempt to get a gears-level understanding – just the kind of pop-science-journalism understanding that gives us a slight summary-level idea of what’s going on. My ulterior motive is to get to the point where I can understand Friston’s recent explanation of depression, relevant to my interests as a psychiatrist.

Sources include Dr. Alianna Maren’s How To Read Karl Friston (In The Original Greek), Wilson and Golonka’s Free Energy: How the F*ck Does That Work, Ecologically?, Alius Magazine’s interview with Friston, Observing Ideas, and the ominously named Wo’s Weblog.

From these I get the impression that part of the problem is that “free energy” is a complicated concept being used in a lot of different ways.

First, free energy is a specific mathematical term in certain Bayesian equations.

I’m getting this from here, which goes into much more detail about the math than I can manage. What I’ve managed to extract: Bayes’ theorem, as always, is the mathematical rule for determining how much to weigh evidence. The brain is sometimes called a Bayesian machine, because it has to create a coherent picture of the world by weighing all the different data it gets – everything from millions of photoreceptors’ worth of vision, to millions of cochlear receptors’ worth of hearing, to all the other senses, to logical reasoning, to past experience, and so on. But actually using Bayes on all this data quickly gets computationally intractable.

Free energy is a quantity used in “variational Bayesian methods”, a specific computationally tractable way of approximating Bayes’ Theorem. Under this interpretation, Friston is claiming that the brain uses this Bayes-approximation algorithm. Minimizing the free energy quantity in this algorithm is equivalent-ish to trying to minimize prediction error, trying to minimize the amount you’re surprised by the world around you, and trying to maximize the accuracy of mental models. This sounds in line with standard predictive processing theories. Under this interpretation, the brain implements predictive processing through free energy minimization.
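To make that first sense a little more concrete, here is a minimal toy sketch of variational free energy (the model and numbers are invented for illustration; this is the generic variational-Bayes quantity, not anything specific to Friston’s papers). The free energy of a candidate belief q is F = E_q[log q(z) − log p(x, z)]; the q that minimizes it is the exact Bayesian posterior, and at that minimum F equals the “surprise” −log p(x):

```python
import numpy as np

# Toy generative model: one binary hidden state z, one observed datum x.
# (All numbers invented for illustration.)
prior = np.array([0.5, 0.5])          # p(z)
likelihood = np.array([0.9, 0.2])     # p(x | z) for the particular x we saw

def free_energy(q):
    """Variational free energy F = E_q[ log q(z) - log p(x, z) ]."""
    joint = prior * likelihood        # p(x, z)
    return np.sum(q * (np.log(q) - np.log(joint)))

# Exact Bayes, for comparison.
posterior = prior * likelihood / np.sum(prior * likelihood)
surprise = -np.log(np.sum(prior * likelihood))        # -log p(x)

# Search over candidate beliefs q and keep the one with lowest free energy.
candidates = [np.array([a, 1 - a]) for a in np.linspace(0.01, 0.99, 99)]
best_q = min(candidates, key=free_energy)

print("exact posterior:       ", posterior)           # ~[0.82, 0.18]
print("free-energy minimizer: ", best_q)              # essentially the same
print("minimum F:", free_energy(best_q), "surprise:", surprise)  # nearly equal
```

So in this narrow sense, “minimizing free energy” really is just “doing approximate Bayesian inference”, and whatever free energy is left over is a measure of how surprised you are by the data.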

Second, free energy minimization is an algorithm-agnostic way of saying you’re trying to approximate Bayes as accurately as possible.

This comes from the same source as above. It also ends up equivalent-ish to all those other things like trying to be correct in your understanding of the world, and to standard predictive processing.

Third, free energy minimization is a claim that the fundamental psychological drive is the reduction of uncertainty.

I get this claim from the Alius interview, where Friston says:

If you subscribe to the premise that that creatures like you and me act to minimize their expected free energy, then we act to reduce expected surprise or, more simply, resolve uncertainty. So what’s the first thing that we would do on entering a dark room — we would turn on the lights. Why? Because this action has epistemic affordance; in other words, it resolves uncertainty (expected free energy). This simple argument generalizes to our inferences about (hidden or latent) states of the world — and the contingencies that underwrite those states of affairs.

The discovery that the only human motive is uncertainty-reduction might come as a surprise to humans who feel motivated by things like money, power, sex, friendship, or altruism. But the neuroscientist I talked to about this says I am not misinterpreting the interview. The claim really is that uncertainty-reduction is the only game in town.

In a sense, it must be true that there is only one human motivation. After all, if you’re Paris of Troy, getting offered the choice between power, fame, and sex – then some mental module must convert these to a common currency so it can decide which is most attractive. If that currency is, I dunno, dopamine in the striatum, then in some reductive sense, the only human motivation is increasing striatal dopamine (don’t philosophize at me, I know this is a stupid way of framing things, but you know what I mean). Then the only weird thing about the free energy formulation is identifying the common currency with uncertainty-minimization, which is some specific thing that already has another meaning.

I think the claim (briefly mentioned eg here) is that your brain hacks eg the hunger drive by “predicting” that your mouth is full of delicious food. Then, when your mouth is not full of delicious food, it’s a “prediction error”, it sets off all sorts of alarm bells, and your brain’s predictive machinery is confused and uncertain. The only way to “resolve” this “uncertainty” is to bring reality into line with the prediction and actually fill your mouth with delicious food. On the one hand, there is a lot of basic neuroscience research that suggests something like this is going on. On the other, Wo’s writes about this further:

The basic idea seems to go roughly as follows. Suppose my internal probability function Q assigns high probability to states in which I’m having a slice of pizza, while my sensory input suggests that I’m currently not having a slice of pizza. There are two ways of bringing Q in alignment with my sensory input: (a) I could change Q so that it no longer assigns high probability to pizza states, (b) I could grab a piece of pizza, thereby changing my sensory input so that it conforms to the pizza predictions of Q. Both (a) and (b) would lead to a state in which my (new) probability function Q’ assigns high probability to my (new) sensory input d’. Compared to the present state, the sensory input will then have lower surprise. So any transition to these states can be seen as a reduction of free energy, in the unambitious sense of the term.
Action is thus explained as an attempt to bring one’s sensory input in alignment with one’s representation of the world.
This is clearly nuts. When I decide to reach out for the pizza, I don’t assign high probability to states in which I’m already eating the slice. It is precisely my knowledge that I’m not eating the slice, together with my desire to eat the slice, that explains my reaching out.
There are at least two fundamental problems with the simple picture just outlined. One is that it makes little sense without postulating an independent source of goals or desires. Suppose it’s true that I reach out for the pizza because I hallucinate (as it were) that that’s what I’m doing, and I try to turn this hallucination into reality. Where does the hallucination come from? Surely it’s not just a technical glitch in my perceptual system. Otherwise it would be a miraculous coincidence that I mostly hallucinate pleasant and fitness-increasing states. Some further part of my cognitive architecture must trigger the hallucinations that cause me to act. (If there’s no such source, the much discussed “dark room problem” arises: why don’t we efficiently minimize sensory surprise (and thereby free energy) by sitting still in a dark room until we die?)
The second problem is that efficient action requires keeping track of both the actual state and the goal state. If I want to reach out for the pizza, I’d better know where my arms are, where the pizza is, what’s in between the two, and so on. If my internal representation of the world falsely says that the pizza is already in my mouth, it’s hard to explain how I manage to grab it from the plate.
A closer look at Friston’s papers suggests that the above rough proposal isn’t quite what he has in mind. Recall that minimizing free energy can be seen as an approximate method for bringing one probability function Q close to another function P. If we think of Q as representing the system’s beliefs about the present state, and P as a representation of its goals, then we have the required two components for explaining action. What’s unusual is only that the goals are represented by a probability function, rather than (say) a utility function. How would that work?
Here’s an idea. Given the present probability function Q, we can map any goal state A to the target function Q^A, which is Q conditionalized on A — or perhaps on certain sensory states that would go along with A. For example, if I successfully reach out for the pizza, my belief function Q will change to a function Q^A that assigns high probability to my arm being outstretched, to seeing and feeling the pizza in my fingers, etc. Choosing an act that minimizes the difference between my belief function and Q^A is then tantamount to choosing an act that realizes my goal.
This might lead to an interesting empirical model of how actions are generated. Of course we’d need to know more about how the target function Q^A is determined. I said it comes about by (approximately?) conditionalizing Q on the goal state A, but how do we identify the relevant A? Why do I want to reach out for the pizza? Arguably the explanation is that reaching out is likely (according to Q) to lead to a more distal state in which I eat the pizza, which I desire. So to compute the proximal target probability Q^A we presumably need to encode the system’s more distal goals and then use techniques from (stochastic) control theory, perhaps, to derive more immediate goals.
That version of the story looks much more plausible, and much less revolutionary, than the story outlined above. In the present version, perception and action are not two means to the same end — minimizing free energy. The free energy that’s minimized in perception is a completely different quantity than the free energy that’s minimized in action. What’s true is that both tasks involve mathematically similar optimization problems. But that isn’t too surprising given the well-known mathematical and computational parallels between conditionalizing and maximizing expected utility.
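Here is a toy sketch of the “two distributions” reading Wo describes above – beliefs about outcomes on one side, a goal-like distribution (his Q^A) on the other, with the action chosen to close the gap. The numbers are invented, and this is my cartoon of that reading rather than anything taken from Friston’s own formalism:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return np.sum(p * np.log(p / q))

# Outcomes: index 0 = "not eating pizza", index 1 = "eating pizza".
goal = np.array([0.05, 0.95])   # a prior *preference* over outcomes, not a belief

# Predicted outcome distribution under each candidate action (invented numbers).
predicted = {
    "sit_still":       np.array([0.98, 0.02]),
    "reach_for_pizza": np.array([0.10, 0.90]),
}

# Choose the action whose predicted outcomes diverge least from the preferred ones.
scores = {action: kl(p, goal) for action, p in predicted.items()}
print(scores)                         # sit_still ~2.8, reach_for_pizza ~0.02
print(min(scores, key=scores.get))    # -> "reach_for_pizza"
```

The only exotic thing here is exactly what Wo points out: the “goal” enters as a probability distribution over preferred outcomes rather than as a utility function.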

It’s tempting to throw this out entirely. But part of me does feel like there’s a weird connection between curiosity and every other drive. For example, sex seems like it should be pretty basic and curiosity-resistant. But how often do people say that they’re attracted to someone “because he’s mysterious”? And what about the Coolidge Effect (known in the polyamory community as “new relationship energy”)? After a while with the same partner, sex and romance lose their magic – only to reappear if the animal/person hooks up with a new partner. Doesn’t this point to some kind of connection between sexuality and curiosity?

What about the typical complaint of porn addicts – that they start off watching softcore porn, find after a while that it’s no longer titillating, move on to harder porn, and eventually have to get into really perverted stuff just to feel anything at all? Is this a sort of uncertainty reduction?

The only problem is that this is a really specific kind of uncertainty reduction. Why should “uncertainty about what it would be like to be in a relationship with that particular attractive person” be so much more compelling than “uncertainty about what the middle letter of the Bible is”, a question which almost no one feels the slightest inclination to resolve? The interviewers ask Friston something sort of similar, referring to some experiments where people are happiest not when given easy things with no uncertainty, nor confusing things with unresolvable uncertainty, but puzzles – things that seem confusing at first, but actually have a lot of hidden order within them. They ask Friston whether he might want to switch teams to support a U-shaped theory where people like being in the middle between too little uncertainty and too much uncertainty. Friston…does not want to switch teams.

I do not think that “different laws may apply at different levels”. I see a singular and simple explanation for all the apparent dialectics above: they are all explained by minimization of expected free energy, expected surprise or uncertainty. I feel slightly puritanical when deflating some of the (magical) thinking about inverted U curves and “sweet spots”. However, things are just simpler than that: there is only one sweet spot; namely, the free energy minimum at the bottom of a U-shaped free energy function […]
This means that any opportunity to resolve uncertainty itself now becomes attractive (literally, in the mathematical sense of a random dynamical attractor) (Friston, 2013). In short, as nicely articulated by (Schmidhuber, 2010), the opportunity to answer “what would happen if I did that” is one of the most important resolvers of uncertainty. Formally, the resolution of uncertainty (aka intrinsic motivation, intrinsic value, epistemic value, the value of information, Bayesian surprise, etc. (Friston et al., 2017)) corresponds to salience. Note that in active inference, salience becomes an attribute of an action or policy in relation to the lived world. The mathematical homologue for contingencies (technically, the parameters of a generative model) corresponds to novelty. In other words, if there is an action that can reduce uncertainty about the consequences of a particular behavior, it is more likely to be expressed.
Given these imperatives, then the two ends of the inverted U become two extrema on different dimensions. In a world full of novelty and opportunity, we know immediately there is an opportunity to resolve reducible uncertainty and will immediately embark on joyful exploration — joyful because it reduces uncertainty or expected free energy (Joffily & Coricelli, 2013). Conversely, in a completely unpredictable world (i.e., a world with no precise sensory evidence, such as a dark room) there is no opportunity and all uncertainty is irreducible — a joyless world. Boredom is simply the product of explorative behavior; emptying a world of its epistemic value — a barren world in which all epistemic affordance has been exhausted through information seeking, free energy minimizing action.
Note that I slipped in the word “joyful” above. This brings something interesting to the table; namely, the affective valence of shifts in uncertainty — and how they are evaluated by our brains.

The only thing at all I am able to gather from this paragraph – besides the fact that apparently Karl Friston cites himself in conversation – is the Schmidhuber reference, which is actually really helpful. Schmidhuber is the guy behind eg the Formal Theory Of Fun & Creativity Explains Science, Art, Music, Humor, in which all of these are some form of taking a seemingly complex domain (in the mathematical sense of complexity) and reducing it to something simple (discovering a hidden order that makes it more compressible). I think Friston might be trying to hint that free energy minimization works in a Schmidhuberian sense where it applies to learning things that suddenly make large parts of our experience more comprehensible at once, rather than just “Here are some numbers: 1, 5, 7, 21 – now you have less uncertainty over what numbers I was about to tell you, isn’t that great?”
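If that’s the right reading, the Schmidhuber version is easy to caricature in a few lines: the “fun” of discovering a hidden order is the number of bits of description length the discovery saves you. The sequence, the “rule”, and the bit costs below are all made up by me for illustration; only the general idea of compression progress as intrinsic reward is Schmidhuber’s:

```python
import numpy as np

# Schmidhuber-style caricature: intrinsic reward = compression progress,
# i.e. bits of description length saved by spotting a hidden regularity.
data = np.array([1, 5, 7, 35, 37, 185])    # generated by alternately *5 and +2

def bits_without_rule(xs):
    """Encode each number separately as a uniform draw from its range: no insight."""
    return len(xs) * np.log2(xs.max() - xs.min() + 1)

def bits_with_rule(xs):
    """Encode just the first number plus the rule itself (charged a hand-waved 8 bits)."""
    return np.log2(xs.max() + 1) + 8.0

progress = bits_without_rule(data) - bits_with_rule(data)
print(f"compression progress ~ {progress:.1f} bits of 'fun'")   # ~30 bits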

I agree this is one of life’s great joys, though maybe Karl Friston and I are not a 100% typical subset of humanity here. Also, I have trouble figuring out how to conceptualize other human drives like sex as this same kind of complexity-reduction joy.

One more concern here – a lot of the things I read about this equivocate between “model accuracy maximization” and “surprise minimization”. These end up meaning really different things. Model accuracy maximization sounds like curiosity – you go out and explore as much of the world as possible to get a model that precisely matches reality. Surprise minimization sounds like locking yourself in a dark room with no stimuli, then predicting that you will be in a dark room with no stimuli, and never being surprised when your prediction turns out to be right. I understand Friston has written about the so-called “dark room problem”, but I haven’t had a chance to look into it as much as I should, and I can’t find anything that takes one or the other horn of the equivocation and says “definitely this one”.
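For what it’s worth, the reply I’ve seen attributed to Friston (and Clark) on the dark room problem is that “surprise” is always measured against the organism’s own generative model, and creatures like us come equipped with priors under which dark, hungry, unstimulated states are themselves very improbable – so the dark room isn’t actually the low-surprise option. A toy illustration of that reply (my numbers and framing):

```python
import numpy as np

# Surprise is computed under the creature's own generative model, and a creature
# like us assigns very low prior probability to dark, hungry, unstimulated states.
# Observation types: 0 = dark/empty room, 1 = ordinary stimulation, 2 = fed & warm
p_obs = np.array([0.01, 0.60, 0.39])       # p(observation) under the creature's model

def surprise(obs):
    return -np.log(p_obs[obs])

print("surprise of the dark room:  ", surprise(0))   # ~4.6 nats -- not low at all
print("surprise of an ordinary day:", surprise(1))   # ~0.5 nats
```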

Fourth, okay, all of this is pretty neat, but how does it explain all biological systems? How does it explain abiogenesis? And when do we get to the real-world version of psychohistory? In his Alius interview, Friston writes:

I first came up with a prototypical free energy principle when I was eight years old, in what I have previously called a “Gerald Durrell” moment (Friston, 2012). I was in the garden, during a gloriously hot 1960s British summer, preoccupied with the antics of some woodlice who were frantically scurrying around trying to find some shade. After half an hour of observation and innocent (childlike) contemplation, I realized their “scurrying” had no purpose or intent: they were simply moving faster in the sun — and slower in the shade. The simplicity of this explanation — for what one could artfully call biotic self-organization — appealed to me then and appeals to me now. It is exactly the same principle that underwrites the ensemble density dynamics of the free energy principle — and all its corollaries.

How do the woodlice have anything to do with any of the rest of this?

As best I can understand (and I’m drawing from here and here again), this is an ultimate meaning of “free energy” which is sort of like a formalization of homeostasis. It goes like this: consider a probability distribution of all the states an organism can be in. For example, your body can be at (90 degrees F, heart rate 10), (90 degrees F, heart rate 70), (98 degrees F, heart rate 10), (98 degrees F, heart rate 70), or any of a trillion other different combinations of possible parameters. But in fact, living systems successfully restrict themselves to tiny fractions of this space – if you go too far away from (98 degrees F, heart rate 70), you die. So you have two probability distributions – the maximum-entropy one where you could have any combination of heart rate and body temperature, and the one your body is aiming for with a life-compatible combination of heart rate and body temperature. Whenever you have a system trying to convert one probability distribution into another probability distribution, you can think of it as doing Bayesian work and following free energy principles. So free energy seems to be something like just a formal explanation of how certain systems display goal-directed behavior, without having to bring in an anthropomorphic or teleological concept of “goal-directedness”.
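A tiny numerical illustration of that homeostatic reading (the discretization and numbers are mine): the set of states a living body actually occupies is a much lower-entropy distribution than the “anything goes” one, and staying alive amounts to keeping your long-run average surprise under that narrow distribution low:

```python
import numpy as np

# Body temperature in degrees F, coarsely discretized (a toy one-dimensional
# stand-in for the full space of physiological states).
temps = np.arange(80, 111).astype(float)           # 80 .. 110 F

# "Anything goes": maximum-entropy distribution over the whole range.
p_anything = np.ones_like(temps) / len(temps)

# States a living body actually occupies: tightly bunched around 98.6 F.
p_alive = np.exp(-0.5 * ((temps - 98.6) / 1.0) ** 2)
p_alive /= p_alive.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

print("entropy if any temperature were fine:   ", entropy(p_anything))  # ~3.4 nats
print("entropy of the states you actually visit:", entropy(p_alive))    # ~1.4 nats
# Staying alive = keeping yourself inside the narrow distribution, which is the
# same as keeping your long-run average surprise (-log p) low under it.
```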

Friston mentions many times that free energy is “almost tautological”, and one of the neuroscientists I talked to who claimed to half-understand it said it should be viewed more as an elegant way of looking at things than as a scientific theory per se. From the Alius interview:

The free energy principle stands in stark distinction to things like predictive coding and the Bayesian brain hypothesis. This is because the free energy principle is what it is — a principle. Like Hamilton’s Principle of Stationary Action, it cannot be falsified. It cannot be disproven. In fact, there’s not much you can do with it, unless you ask whether measurable systems conform to the principle.

So we haven’t got a real-life version of Asimov’s psychohistory, is what you’re saying?

But also:

The Bayesian brain hypothesis is a corollary of the free energy principle and is realized through processes like predictive coding or abductive inference under prior beliefs. However, the Bayesian brain is not the free energy principle, because both the Bayesian brain hypothesis and predictive coding are incomplete theories of how we infer states of affairs.
This missing bit is the enactive compass of the free energy principle. In other words, the free energy principle is not just about making the best (Bayesian) sense of sensory impressions of what’s “out there”. It tries to understand how we sample the world and author our own sensations. Again, we come back to the woodlice and their scurrying — and an attempt to understand the imperatives behind this apparently purposeful sampling of the world. It is this enactive, embodied, extended, embedded, and encultured aspect that is lacking from the Bayesian brain and predictive coding theories; precisely because they do not consider entropy reduction […]
In short, the free energy principle fully endorses the Bayesian brain hypothesis — but that’s not the story. The only way you can change “the shape of things” — i.e., bound entropy production — is to act on the world. This is what distinguishes the free energy principle from predictive processing. In fact, we have now taken to referring to the free energy principle as “active inference”, which seems closer to the mark and slightly less pretentious for non-mathematicians.

So maybe the free energy principle is the unification of predictive coding of internal models, with the “action in the world is just another form of prediction” thesis mentioned above? I guess I thought that was part of the standard predictive coding story, but maybe I’m wrong?

Overall, the best I can do here is this: the free energy principle seems like an attempt to unify perception, cognition, homeostasis, and action.

“Free energy” is a mathematical concept that represents the failure of some things to match other things they’re supposed to be predicting.

The brain tries to minimize its free energy with respect to the world, ie minimize the difference between its models and reality. Sometimes it does that by updating its models of the world. Other times it does that by changing the world to better match its models.

Perception and cognition are both attempts to create accurate models that match the world, thus minimizing free energy.

Homeostasis and action are both attempts to make reality match mental models. Action tries to get the organism’s external state to match a mental model. Homeostasis tries to get the organism’s internal state to match a mental model. Since even bacteria are doing something homeostasis-like, all life shares the principle of being free energy minimizers.

So life isn’t doing four things – perceiving, thinking, acting, and maintaining homeostasis. It’s really just doing one thing – minimizing free energy – in four different ways – with the particular way it implements this in any given situation depending on which free energy minimization opportunities are most convenient. Or something. All of this might be a useful thing to know, or it might just be a cool philosophical way of looking at things, I’m still not sure.
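As a final cartoon of that summary (mine, not Friston’s equations): there is one mismatch term, and two knobs that can shrink it – revise the model, or act on the world:

```python
# One mismatch, two knobs (a cartoon of the summary above, not Friston's equations).
predicted, sensed = 98.6, 92.0                 # e.g. expected vs. actual temperature

# Knob 1 -- perception: revise the prediction toward what the senses report.
p, s = predicted, sensed
for _ in range(100):
    p -= 0.1 * 2 * (p - s)                     # gradient step on the prediction
print("after perceiving:", round(p, 2), round(s, 2))   # prediction moved to 92.0

# Knob 2 -- action/homeostasis: change the world (shiver, put on a coat) to fit the model.
p, s = predicted, sensed
for _ in range(100):
    s += 0.1 * 2 * (p - s)                     # "act" so the sensed state moves
print("after acting:    ", round(p, 2), round(s, 2))   # world moved to 98.6

# Either loop drives the same quantity, the squared prediction error (p - s)**2, to zero.
```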

Or something like this? Maybe? Somebody please help?


Discussion question for those of you on the subreddit – if the free energy principle were right, would it disprove the orthogonality thesis? Might it be impossible to design a working brain with any goal besides free energy reduction? Would anything – even a paperclip maximizer – have to start by minimizing uncertainty, and then add paperclip maximization in later as a hack? Would it change anything if it did?