My computational framework for the brain

(See comment here for some updates and corrections and retractions. —Steve, 2022)

By now I’ve written a bunch of blog posts on brain architecture and algorithms, not in any particular order and generally interspersed with long digressions into Artificial General Intelligence. Here I want to summarize my key ideas in one place, to create a slightly better entry point, and something I can refer back to in certain future posts that I’m planning. If you’ve read every single one of my previous posts (hi mom!), there’s not much new here.

In this post, I’m trying to paint a picture. I’m not really trying to justify it, let alone prove it. The justification ultimately has to be: All the pieces are biologically, computationally, and evolutionarily plausible, and the pieces work together to explain absolutely everything known about human psychology and neuroscience. (I believe it! Try me!) Needless to say, I could be wrong in both the big picture and the details (or missing big things). If so, writing this out will hopefully make my wrongness easier to discover!

Pretty much everything I say here and its opposite can be found in the cognitive neuroscience literature. (It’s a controversial field!) I make no pretense to originality (with one exception noted below), but can’t be bothered to put in actual references. My previous posts have a bit more background, or just ask me if you’re interested. :-P

So let’s start in on the 7 guiding principles for how I think about the brain:

1. Two subsystems: “Neocortex” and “Subcortex”

(Update: I have a revised discussion of this topic at my later post Two Subsystems: Learning and Steering.)

This is the starting point. I think it’s absolutely critical. The brain consists of two subsystems. The neocortex is the home of “human intelligence” as we would recognize it—our beliefs, goals, ability to plan and learn and understand, every aspect of our conscious awareness, etc. etc. (All mammals have a neocortex; birds and lizards have a homologous and functionally-equivalent structure called the “pallium”.) Some other parts of the brain (hippocampus, parts of the thalamus & basal ganglia & cerebellum—see further discussion here) help the neocortex do its calculations, and I lump them into the “neocortex subsystem”. I’ll use the term subcortex for the rest of the brain (brainstem, hypothalamus, etc.).

  • Aside: Is this the triune brain theory? No. Triune brain theory is, from what I gather, a collection of ideas about brain evolution and function, most of which are wrong. One aspect of triune brain theory is putting a lot of emphasis on the distinction between neocortical calculations and subcortical calculations. I like that part. I’m keeping that part, and I’m improving it by expanding the neocortex club to also include the thalamus, hippocampus, lizard pallium, etc., and then I’m ignoring everything else about triune brain theory.

2. Cortical uniformity

I claim that the neocortex is, to a first approximation, architecturally uniform, i.e. all parts of it are running the same generic learning algorithm in a massively-parallelized way.

The two caveats to cortical uniformity (spelled out in more detail at that link) are:

  • There are sorta “hyperparameters” on the generic learning algorithm which are set differently in different parts of the neocortex—for example, different regions have different densities of each neuron type, different thresholds for making new connections (which also depend on age), etc. This is not at all surprising; all learning algorithms inevitably have tradeoffs whose optimal settings depend on the domain that they’re learning (no free lunch).

    • As one of many examples of how even “generic” learning algorithms benefit from domain-specific hyperparameters, if you’ve seen a pattern “A then B then C” recur 10 times in a row, you will start unconsciously expecting AB to be followed by C. But “should” you expect AB to be followed by C after seeing ABC only 2 times? Or what if you’ve seen the pattern ABC recur 72 times in a row, but then saw AB(not C) twice? What “should” a learning algorithm expect in those cases? The answer depends on the domain—how regular vs random are the environmental patterns you’re learning? How stable are they over time? The answer is presumably different for low-level visual patterns vs motor control patterns etc.

  • There is a gross wiring diagram hardcoded in the genome—i.e., a set of connections among different neocortical regions, and between those regions and other parts of the brain. These connections later get refined and edited during learning. The innate connections make the learning process faster and more reliable by bringing together information streams with learnable relationships—for example the wiring diagram seeds strong connections between toe-related motor output areas and toe-related proprioceptive (body position sense) input areas. We can learn relations between information streams without any help from the innate wiring diagram, by routing information around the cortex in more convoluted ways—see the Ian Waterman example here—but it’s slower, more limited, and may consume conscious attention. Related to this is a diversity of training signals: for example, different parts of the neocortex are trained to predict different signals, and also different parts of the neocortex get different dopamine training signals—or even none at all.
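To make the “A then B then C” hyperparameter example above a bit more concrete, here is a toy sketch in Python. It is not a claim about how the neocortex does it. The Beta-Bernoulli estimator, and the two knobs `prior_strength` and `decay`, are my own illustrative assumptions; the point is only that the same generic rule gives different answers under different settings.

```python
from dataclasses import dataclass


@dataclass
class SequencePredictor:
    """Toy estimate of how strongly 'A then B' predicts 'C'."""
    prior_strength: float  # hypothetical knob: pseudo-observations of initial skepticism
    decay: float           # hypothetical knob: 1.0 never forgets, <1.0 favors recent evidence
    hits: float = 0.0
    misses: float = 0.0

    def observe(self, c_followed: bool) -> None:
        # Discount old evidence, then count the new observation.
        self.hits *= self.decay
        self.misses *= self.decay
        if c_followed:
            self.hits += 1.0
        else:
            self.misses += 1.0

    def p_c_given_ab(self) -> float:
        # Posterior mean under a symmetric prior centered on 0.5.
        half = self.prior_strength / 2.0
        return (self.hits + half) / (self.hits + self.misses + self.prior_strength)


def run(predictor: SequencePredictor, history: list) -> float:
    for outcome in history:
        predictor.observe(outcome)
    return predictor.p_c_given_ab()


# ABC seen 72 times in a row, then AB(not C) twice.
history = [True] * 72 + [False] * 2

stable_world = run(SequencePredictor(prior_strength=2.0, decay=1.0), history)
volatile_world = run(SequencePredictor(prior_strength=2.0, decay=0.5), history)

print(f"setting tuned for stable patterns:   P(C | AB) = {stable_world:.2f}")
print(f"setting tuned for volatile patterns: P(C | AB) = {volatile_world:.2f}")
```

With no forgetting, the two recent exceptions barely dent the expectation; with heavy forgetting, they dominate it. Neither setting is “correct” in the abstract, which is the no-free-lunch point.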

3. Blank-slate neocortex

(...But not blank-slate subcortex! More on that below.)

(Update: To avoid confusion, I’ve more recently been calling this concept “learning-from-scratch”—see discussion in my later post “Learning from Scratch” in the brain.)

I claim that the neocortex (and the rest of the telencephalon and cerebellum) starts out as a “blank slate”: Just like an ML model initialized with random weights, the neocortex cannot make any correct predictions or do anything useful until it learns to do so from previous inputs, outputs, and rewards.

In more neuroscience-y (and maybe less provocative) terms, I could say instead: the neocortex is a memory system. It’s a really fancy memory system—it’s highly structured to remember particular kinds of patterns and their relationships, and it comes with a sophisticated query language and so on—but at the end of the day, it’s still a type of memory. And like any memory system, it is useless to the organism until it gradually accumulates information. (Suggestively, if you go far enough back, the neocortex and hippocampus evolved out of the same ancient substructure (ref).)

(By the way, I am not saying that the neocortex’s algorithm is similar to today’s ML algorithms. There’s more than one blank-slate learning algorithm! See image.)

A “blank slate” learning algorithm, as I’m using the term, is one that learns information “from scratch”—an example would be a Machine Learning model that starts with random weights and then proceeds with gradient descent. When you imagine a “blank slate” learning algorithm, you should not imagine an empty void that gets filled with data. You should imagine a machine that learns more and better patterns over time, and writes those patterns into a memory bank—and “blank slate” just means that the memory bank starts out empty. There are many such machines, and they will learn different patterns and therefore do different things. See next section, and see also the discussion of hyperparameters in the previous section.
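Here is a minimal sketch of what I mean by “starts with random weights and then proceeds with gradient descent”, using plain Python and a tiny logistic-regression learner. The task (label is 1 when the first input exceeds the second), the learning rate, and all names are made up for illustration, and I am not suggesting the neocortex uses this particular algorithm.

```python
import math
import random


def predict(weights, bias, x):
    """Probability that the label is 1, from a logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, data):
    correct = sum((predict(weights, bias, x) > 0.5) == bool(y) for x, y in data)
    return correct / len(data)


def train(weights, bias, data, lr=0.5, epochs=50):
    """Plain stochastic gradient descent on log loss."""
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y  # d(log loss)/dz
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


rng = random.Random(0)

# Hypothetical "environment": the pattern to be discovered is x[0] > x[1].
data = []
for _ in range(200):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((x, 1.0 if x[0] > x[1] else 0.0))

# Blank slate: the "memory bank" (weights) starts out as random noise.
weights = [rng.gauss(0, 1) for _ in range(2)]
bias = rng.gauss(0, 1)

acc_before = accuracy(weights, bias, data)
weights, bias = train(weights, bias, data)
acc_after = accuracy(weights, bias, data)

print(f"accuracy before learning: {acc_before:.2f}")
print(f"accuracy after learning:  {acc_after:.2f}")
```

Note that the machine itself (the model family, the update rule, the learning rate) is fully specified before any data arrives. Only the learned contents start out empty.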

Why do I think that the neocortex starts from a blank slate? Two types of reasons:

  • Details of how I think the neocortical algorithm works: This is the main reason for me.

    • For example, as I mentioned here, there’s a theory I like that says that all feedforward signals (I’ll define that in the next section) in the neocortex—which includes all signals coming into the neocortex from outside it, plus many cortex-to-cortex signals—are re-encoded into the data format that the neocortex can best process—i.e. a set of sparse codes, with low overlap, uniform distribution, and some other nice properties—and this re-encoding is done by a pseudorandom process! If that’s right, it would seem to categorically rule out anything but a blank-slate starting point.

    • More broadly, we know the algorithm can learn new concepts, and new relationships between concepts, without having any of those concepts baked in by evolution—e.g. learning about rocket engine components. So why not consider the possibility that that’s all it does, from the very beginning? I can see vaguely how that would work, why that would be biologically plausible and evolutionarily adaptive, and I can’t currently see any other way that the algorithm can work.

  • Absence of evidence to the contrary: I have a post Human Instincts, Symbol Grounding, and the Blank-Slate Neocortex where I went through a list of universal human instincts, and didn’t see anything inconsistent with a blank-slate neocortex. The subcortex—which is absolutely not a blank slate—plays a big role in most of those; for example, the mouse has a brainstem bird-detecting circuit wired directly to a brainstem running-away circuit. (More on this in a later section.) Likewise I’ve read about the capabilities of newborn humans and other animals, and still don’t see any problem. I accept all challenges; try me!
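To illustrate the pseudorandom re-encoding idea from the first bullet above, here is a toy sketch. I am swapping in a standard trick (a fixed random projection followed by k-winners-take-all) as a stand-in; I am not claiming this is the actual neocortical mechanism, and the sizes, seeds, and function names are all assumptions.

```python
import random


def make_encoder(n_inputs, n_units, seed):
    """Fixed pseudorandom projection. Nothing here encodes knowledge of the inputs."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_units)]


def encode(projection, x, k):
    """Sparse code: the indices of the k most strongly driven units."""
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in projection]
    ranked = sorted(range(len(activations)), key=activations.__getitem__, reverse=True)
    return frozenset(ranked[:k])


def overlap(code_a, code_b):
    return len(code_a & code_b)


rng = random.Random(1)
projection = make_encoder(n_inputs=20, n_units=500, seed=42)

pattern_a = [rng.gauss(0, 1) for _ in range(20)]
pattern_b = [rng.gauss(0, 1) for _ in range(20)]           # an unrelated input
pattern_a_noisy = [v + rng.gauss(0, 0.05) for v in pattern_a]  # nearly the same input

code_a = encode(projection, pattern_a, k=10)
code_b = encode(projection, pattern_b, k=10)
code_a_noisy = encode(projection, pattern_a_noisy, k=10)

print("active units per code:", len(code_a), "of 500")
print("overlap, A vs noisy A:", overlap(code_a, code_a_noisy))
print("overlap, A vs B:      ", overlap(code_a, code_b))
```

Each code is sparse (10 of 500 units), similar inputs share most of their active units, and unrelated inputs share few. Since the projection is arbitrary, whatever reads these codes has to learn their meaning from scratch, which is the connection to the blank-slate claim.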

4. What is the neocortical algorithm?

4.1. “Analysis by synthesis” + “Planning by probabilistic inference”

“Analysis by synthesis” means that the neocortex searches through a space of generative models for a model that predicts its upcoming inputs (both external inputs, like vision, and internal inputs, like proprioception and reward). “Planning by probabilistic inference” (term from here) means that we treat our own actions as probabilistic variables to be modeled, just like everything else. In other words, the neocortex’s output lines (motor outputs, hormone outputs, etc.) are the same type of signal as any generative model prediction, and processed in the same way.

Here’s how those come together. As discussed in Predictive Coding = RL + SL + Bayes + MPC, and shown in the figure below:

  • The neocortex favors generative models that have been making correct predictions, and discards generative models that have been making predictions that are contradicted by input data (or by other favored generative models).

  • And, the neocortex favors generative models which predict larger future reward, and discards generative models that predict smaller (or more negative) future reward.

This combination allows both good epistemics (ever-better understanding of the world), and good strategy (planning towards goals) in the same algorithm. This combination also has some epistemic and strategic failure modes—e.g. a propensity to wishful thinking—but in a way that seems compatible with human psychology & behavior, which is likewise not perfectly optimal, if you haven’t noticed. Again, see the link above for further discussion.
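As a toy sketch of how those two criteria can pull against each other, here is a small Python example. The scoring rule (exponentiated weighted sum of negative prediction error and predicted reward), the weights, and the three example models are all hypothetical, chosen only to show the wishful-thinking failure mode.

```python
import math
from dataclasses import dataclass


@dataclass
class GenerativeModel:
    name: str
    predicted_input: float   # what the model expects the next input to be
    predicted_reward: float  # how much future reward the model anticipates
    prominence: float = 1.0


def update_prominence(models, actual_input, accuracy_weight=1.0, reward_weight=0.5):
    """Favor models that predicted well and models that predict reward, then normalize."""
    for m in models:
        prediction_error = abs(m.predicted_input - actual_input)
        score = -accuracy_weight * prediction_error + reward_weight * m.predicted_reward
        m.prominence *= math.exp(score)
    total = sum(m.prominence for m in models)
    for m in models:
        m.prominence /= total


def make_models():
    return [
        GenerativeModel("accurate but gloomy", predicted_input=1.0, predicted_reward=0.0),
        GenerativeModel("slightly off but rosy", predicted_input=0.8, predicted_reward=1.0),
        GenerativeModel("badly wrong", predicted_input=0.0, predicted_reward=0.0),
    ]


def winner(models):
    return max(models, key=lambda m: m.prominence).name


epistemics_only = make_models()
update_prominence(epistemics_only, actual_input=1.0, reward_weight=0.0)

combined = make_models()
update_prominence(combined, actual_input=1.0, reward_weight=0.5)

print("accuracy criterion alone favors:", winner(epistemics_only))
print("accuracy plus reward favors:    ", winner(combined))
```

With the reward term switched off, the most accurate model wins. With it switched on, a slightly less accurate but more optimistic model can come out ahead, while the badly wrong model loses either way.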

Criteria by which generative models rise to prominence in the neocortex; see Predictive Coding = RL + SL + Bayes + MPC for detailed discussion. Note that (e) is implemented by a very different mechanism than the other parts.

  • Aside: Is this the same as Predictive Coding / Free-Energy Principle? Sorta. I’ve read a fair amount of “mainstream” predictive coding (Karl Friston, Andy Clark, etc.), and there are a few things about it that I like, including the emphasis on generative models predicting upcoming inputs, and the idea of treating neocortical outputs as just another kind of generative model prediction. It also has a lot of other stuff that I disagree with (or don’t understand). My account differs from theirs mainly by (1) emphasizing multiple simultaneous generative models that compete & cooperate (cf. “society of mind”, multiagent models of mind, etc.), rather than “a” (singular) prior, and (2) restricting discussion to the neocortex subsystem, rather than trying to explain the brain as a whole. In both cases, this may be partly a difference of emphasis & intuitions, rather than fundamental. But I think the core difference is that predictive coding / FEP takes some processes to be foundational principles, whereas I think that those same things do happen, but that they’re emergent behaviors that come out of the algorithm under certain conditions. For example, in Predictive Coding & Motor Control I talk about the predictive-coding story that proprioceptive predictions are literally exactly the same as motor outputs. Well, I don’t think they’re exactly the same. But I do think that proprioceptive predictions and motor outputs are the same in some cases (but not others), in some parts of the neocortex (but not others), and after (but not before) the learning algorithm has been running a while. So I kinda wind up in a similar place as predictive coding, in some respects.

4.2. Compositional generative models

Each of the generative models consists of predictions that other generative models are on or off, and/or predictions that input channels (coming from outside the neocortex—vision, hunger, reward, etc.) are on or off. (“It’s symbols all the way down.”) All the predictions are attached to confidence values, and both the predictions and confidence values are, in general, functions of time (or of other parameters—I’m glossing over some details). The generative models are compositional, because if two of them make disjoint and/or consistent predictions, you can create a new model that simply predicts that both of those two component models are active simultaneously. For example, we can snap together a “purple” generative model and a “jar” generative model to get a “purple jar” generative model. They are also compositional in other ways—for example, you can time-sequence them, by making a generative model that says “Generative model X happens and then Generative model Y happens”.
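Here is a rough data-structure sketch of the “snap together” idea, ignoring time-dependence entirely. The channel names, the confidence numbers, and the rule of keeping the larger confidence when two models agree on a channel are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class GenerativeModel:
    name: str
    # channel or model name -> (predicted to be on?, confidence in that prediction)
    predictions: dict


def compatible(a, b):
    """Two models can be co-active if their predictions are disjoint or agree."""
    shared = a.predictions.keys() & b.predictions.keys()
    return all(a.predictions[ch][0] == b.predictions[ch][0] for ch in shared)


def compose(a, b):
    """New model that predicts both components are active simultaneously."""
    if not compatible(a, b):
        raise ValueError(f"{a.name!r} and {b.name!r} make contradictory predictions")
    merged = dict(a.predictions)
    for channel, (is_on, confidence) in b.predictions.items():
        if channel in merged:
            # Assumption: when both agree, keep the more confident of the two.
            confidence = max(confidence, merged[channel][1])
        merged[channel] = (is_on, confidence)
    return GenerativeModel(f"{a.name} {b.name}", merged)


purple = GenerativeModel("purple", {"color:purple": (True, 0.9)})
green = GenerativeModel("green", {"color:purple": (False, 0.9)})
jar = GenerativeModel("jar", {"shape:jar": (True, 0.8), "graspable": (True, 0.6)})

purple_jar = compose(purple, jar)
print(purple_jar.name, "->", sorted(purple_jar.predictions))
print("purple and green can be co-active:", compatible(purple, green))
```

The “purple” and “jar” models make disjoint predictions, so they compose without fuss, whereas “purple” and “green” contradict each other on the same channel and cannot both be active.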

PGM-type message-passing: Among other things, the search process for the best set of simultaneously-active generative models involves something at least vaguely analogous to message-passing (belief propagation) in a probabilistic graphical model. Dileep George’s vision model is a well-fleshed-out example.

Hierarchies are part of the story but not everything: Hierarchies are a special case of compositional generative models. A generative model for an image of “85” makes a strong prediction that there is an “8” generative model positioned next to a “5” generative model. The “8” generative model, in turn, makes strong predictions that certain contours and textures are present in the visual input stream.

However, not all relations are hierarchical. The “is-a-bird” model makes a medium-strength prediction that the “is-flying” model is active, and the “is-flying” model makes a medium-strength prediction that the “is-a-bird” model is active. Neither is hierarchically above the other.
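The mutual, non-hierarchical support between “is-a-bird” and “is-flying” can be sketched as a two-variable graphical model. To keep it short I compute the answers by brute-force enumeration rather than actual message-passing, and the numbers (`UNARY_BIAS`, `MUTUAL_SUPPORT`) are invented for illustration.

```python
from itertools import product

# Hypothetical numbers: each model is rarely active on its own,
# and the pair is favored when both are active together.
UNARY_BIAS = {"is-a-bird": 0.10, "is-flying": 0.05}
MUTUAL_SUPPORT = 5.0


def unnormalized(bird, flying):
    weight = UNARY_BIAS["is-a-bird"] if bird else 1 - UNARY_BIAS["is-a-bird"]
    weight *= UNARY_BIAS["is-flying"] if flying else 1 - UNARY_BIAS["is-flying"]
    if bird and flying:
        weight *= MUTUAL_SUPPORT  # symmetric link; neither model sits above the other
    return weight


JOINT = {state: unnormalized(*state) for state in product([False, True], repeat=2)}


def probability(event, given=lambda bird, flying: True):
    """P(event | given), by summing over all four joint states."""
    numerator = sum(w for (b, f), w in JOINT.items() if event(b, f) and given(b, f))
    denominator = sum(w for (b, f), w in JOINT.items() if given(b, f))
    return numerator / denominator


p_bird = probability(lambda b, f: b)
p_bird_given_flying = probability(lambda b, f: b, given=lambda b, f: f)
p_flying = probability(lambda b, f: f)
p_flying_given_bird = probability(lambda b, f: f, given=lambda b, f: b)

print(f"P(bird) = {p_bird:.2f},   P(bird | flying) = {p_bird_given_flying:.2f}")
print(f"P(flying) = {p_flying:.2f}, P(flying | bird) = {p_flying_given_bird:.2f}")
```

Activating either model raises the probability of the other, and the link is the same object in both directions, so there is no sense in which one is the “parent”.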

As another example, the brain has a visual processing hierarchy, but as I understand it, studies show that the brain has loads of connections that don’t respect the hierarchy.

Feedforward and feedback signals: There are two important types of signals in the neocortex.

A “feedback” signal is a generative model prediction, attached to a confidence level, which includes all the following:

  • “I predict that neocortical input line #2433 will be active, with probability 0.6”.

  • “I predict that generative model #95738 will be active, with probability 0.4”.

  • “I predict that neocortical output line #185492 will be active, with probability 0.98”—and this one is a self-fulfilling prophecy, as the feedback signal is also the output line!

A “feedforward” signal is an announcement that a certain signal is, in fact, active right now, which includes all the following:

  • “Neocortical input line #2433 is currently active!”

  • “Generative model #95738 is currently active!”

There are about 10× more feedback connections than feedforward connections in the neocortex, I guess for algorithmic reasons I don’t currently understand.

In a hierarchy, the top-down signals are feedback, and the bottom-up signals are feedforward.
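As a sketch of the two signal types, here they are as plain Python data structures, reusing the example line numbers from above. The class names and the `prediction_error` helper are hypothetical, just one way to show how a feedback prediction gets compared against a feedforward announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """A prediction: 'target will be active, with this probability'."""
    target: str
    probability: float


@dataclass(frozen=True)
class Feedforward:
    """An announcement: 'source is, in fact, active right now'."""
    source: str


def prediction_error(feedback, announcements):
    """Gap between the predicted probability and what was actually announced."""
    observed = 1.0 if any(a.source == feedback.target for a in announcements) else 0.0
    return abs(observed - feedback.probability)


feedback_signals = [
    Feedback("input line #2433", 0.6),
    Feedback("generative model #95738", 0.4),
    Feedback("output line #185492", 0.98),
]
feedforward_signals = [
    Feedforward("input line #2433"),
    Feedforward("generative model #95738"),
]

for fb in feedback_signals:
    if fb.target.startswith("output line"):
        # Self-fulfilling prophecy: the prediction itself is what drives the output.
        print(f"{fb.target}: driven directly by the prediction ({fb.probability})")
    else:
        error = prediction_error(fb, feedforward_signals)
        print(f"{fb.target}: predicted {fb.probability}, error {error:.2f}")
```

Input lines and other generative models can contradict a prediction; an output line cannot, because nothing announces back on it.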

The terminology here is a bit unfortunate. In a motor output hierarchy, we think of information flowing “forward” from high-level motion plan to low-level muscle control signals, but that’s the feedback direction. The forward/back terminology works better for sensory input hierarchies. Some people say “top-down” and “bottom-up” instead of “feedback” and “feedforward” respectively, which is nice and intuitive for both input and output hierarchies. But then that terminology gets confusing when we talk about non-hierarchical connections. Oh well.

(I’ll also note here that “mainstream” predictive coding discussions sometimes talk about feedback signals being associated with confidence intervals for analog feedforward signals, rather than confidence levels for binary feedforward signals. I changed it on purpose. I like my version better.)

5. The subcortex steers the neocortex towards biologically-adaptive behaviors.

The blank-slate neocortex can learn to predict input patterns, but it needs guidance to do biologically adaptive things. So one of the jobs of the subcortex is to try to “steer” the neocortex, and the subcortex’s main tool for this task is its ability to send rewards to the neocortex at the appropriate times. Everything that humans reliably and adaptively do with their intelligence, from liking food to making friends, depends on the various reward-determining calculations hardwired into the subcortex.

6. The neocortex is a black box from the perspective of the subcortex. So steering the neocortex is tricky!

Only the neocortex subsystem has an intelligent world-model. Imagine you just lost a big bet, and now you can’t pay back your debt to the loan shark. That’s bad. The subcortex (hypothalamus & brainstem) needs to send negative rewards to the neocortex. But how can it know? How can the subcortex have any idea what’s going on? It has no concept of a “bet”, or “debt”, or “payment” or “loan shark”.

This is a very general problem. I think there are two basic ingredients in the solution.

Here’s a diagram to refer to, based on the one I put in Inner Alignment in the Brain:

Schematic illustration of some aspects of the relationship between subcortex & neocortex. See also my previous post Inner Alignment in the Brain for more on this. (Update June 2021: I would no longer draw the diagram this way, see here. The biggest difference is: I would not draw a direct line from neocortex to a hormone change (for example); instead the cortex would ask the subcortex (hypothalamus + brainstem) to make that hormone change, and then the subcortex might or might not comply with that recommendation. (I guess the way I drew it here is more like somatic marker hypothesis.))

6.1. The subcortex can learn what’s going on in the world via its own, parallel, sensory-processing system.

Thus, for example, we have the well-known visual processing system in our visual cortex, and we have the lesser-known visual processing system in our midbrain (superior colliculus). Ditto for touch, smell, proprioception, nociception, etc.

While they have similar inputs, these two sensory processing systems could not be more different!! The neocortex fits its inputs into a huge, open-ended predictive world-model, but the subcortex instead has a small and hardwired “ontology” consisting of evolutionarily-relevant inputs that it can recognize like faces, human speech sounds, spiders, snakes, looking down from a great height, various tastes and smells, stimuli that call for flinching, stimuli that one should orient towards, etc. etc., and these hardwired recognition circuits are connected to hardwired responses.

For example, babies learn to recognize faces quickly and reliably in part because the midbrain sensory processing system knows what a face looks like, and when it sees one, it will saccade to it, and thus the neocortex will spend disproportionate time building predictive models of faces.

...Or better yet, instead of saccading to faces itself, the subcortex can reward the neocortex each time it detects that it is looking at a face! Then the neocortex will go off looking for faces, using its neocortex-superpowers to learn arbitrary patterns of sensory inputs and motor outputs that tend to result in looking at people’s faces.
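Here is a toy version of that reward-for-faces story. The “subcortex” is a fixed rule that fires on faces, and the “neocortex” is a blank-slate value learner choosing where to look. The three gaze actions, their face probabilities, and the epsilon-greedy learner are all hypothetical stand-ins, far simpler than anything in a real brain.

```python
import random


def hardwired_face_detector(scene):
    """Stand-in for the midbrain circuit: a fixed rule, no learning involved."""
    return scene == "face"


def learn_where_to_look(steps=2000, lr=0.1, explore=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical world: each gaze action lands on a face with a different probability.
    p_face = {"look_left": 0.1, "look_center": 0.8, "look_right": 0.2}
    # Blank slate: the learner starts with no opinion about any action.
    value = {action: 0.0 for action in p_face}
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.choice(list(value))
        else:
            action = max(value, key=value.get)
        scene = "face" if rng.random() < p_face[action] else "wall"
        # The subcortex sends reward whenever its detector fires.
        reward = 1.0 if hardwired_face_detector(scene) else 0.0
        value[action] += lr * (reward - value[action])
    return value


value = learn_where_to_look()
preferred = max(value, key=value.get)
print("learned values:", {a: round(v, 2) for a, v in value.items()})
print("preferred action:", preferred)
```

The learner never gets told what a face is or where faces tend to be. It only gets reward, and ends up preferring whichever action tends to put a face in view.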

6.2. The subcortex can see the neocortex’s outputs—which include not only prediction but imagination, memory, and empathetic simulations of other people.

For example, if the neocortex never predicts or imagines any reward, then the subcortex can guess that the neocortex has a grim assessment of its prospects for the future—I’ll discuss that particular example much more in an upcoming post on depression. (Update: that was wrong; see better discussion here.)

To squeeze more information out of the neocortex, the subcortex can also “teach” the neocortex to reveal when it is thinking of one of the situations in the subcortex’s small hardwired ontology (faces, spiders, sweet tastes, etc.—see above). For example, if the subcortex rewards the neocortex for cringing in advance of pain, then the neocortex will learn to favor pain-prediction generative models that also send out cringe-motor-commands. And thus, eventually, it will also start sending weak cringe-motor-commands when imagining future pain, or when empathically simulating someone in pain—and the subcortex can detect that, and issue hardwired responses in turn.

(Update: I now think “the subcortex rewards the neocortex for cringing in advance of pain” is probably not quite the right mechanism, see here.)

See Inner Alignment in the Brain for more examples & discussion of all this stuff about steering.

Unlike most of the other stuff here, I haven’t seen anything in the literature that takes “how does the subcortex steer the neocortex?” to be a problem that needs to be solved, let alone that solves it. (Let me know if you have!) …Whereas I see it as The Most Important And Time-Sensitive Problem In All Of Neuroscience—because if we build neocortex-like AI algorithms, we will need to know how to steer them towards safe and beneficial behaviors!

7. The subcortical algorithms remain largely unknown

I think much less is known about the algorithms of the subcortex (brainstem, hypothalamus, amygdala, etc.) than about the algorithms of the neocortex. (Update: After further research I have promoted the amygdala up to the neocortex subsystem, see discussion here.) There are a couple of issues:

  • The subcortex’s algorithms are more complicated than the neocortex’s algorithms: As described above, I think the neocortex has more-or-less one generic learning algorithm. Sure, it consists of many interlocking parts, but it has an overall logic. The subcortex, by contrast, has circuitry for detecting and flinching away from an incoming projectile, circuitry for detecting spiders in the visual field, circuitry for (somehow) implementing lots of different social instincts, etc. etc. I doubt all these things strongly overlap each other, though I don’t know that for sure. That makes it harder to figure out what’s going on.

    • I don’t think the algorithms are “complicated” in the sense of “mysterious and sophisticated”. Unlike the neocortex’s algorithms, these algorithms are not, I think, doing anything where a machine learning expert couldn’t sit down and implement something functionally equivalent in PyTorch right now. I think they are complicated in that they have a complicated specification (this kind of input produces that kind of output, and this other kind of input produces this other kind of output, etc. etc. etc.), and this specification is what we need to work out.

  • Fewer people are working on subcortical algorithms than on the neocortex’s algorithms: The neocortex is the center of human intelligence and cognition. So very exciting! So very monetizable! By contrast, the midbrain seems far less exciting and far less practically useful. Also, the neocortex is nearest the skull, and thus accessible to some experimental techniques (e.g. EEG, MEG, ECoG) that don’t work on deeper structures. This is especially limiting when studying live humans, I think.
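To illustrate what I mean above by “a complicated specification” as opposed to a sophisticated algorithm, here is a deliberately dumb sketch: a flat table of detectors wired to responses. Every feature name, threshold, and response in it is invented; the real table is exactly the thing nobody has worked out yet.

```python
# Each entry: (label, hardwired detector, hardwired response).
# All feature names and thresholds below are hypothetical placeholders.
INNATE_CIRCUITS = [
    ("incoming projectile", lambda s: s.get("looming_rate", 0.0) > 0.5, "flinch"),
    ("spider", lambda s: s.get("leggy_blob", False), "startle"),
    ("face", lambda s: s.get("two_eyes_above_mouth", False), "orient toward"),
    ("great height", lambda s: s.get("drop_below_m", 0.0) > 3.0, "freeze"),
]


def subcortex_responses(stimulus):
    """Run every innate detector; return the responses whose detectors fired."""
    return [response for _label, detect, response in INNATE_CIRCUITS if detect(stimulus)]


print(subcortex_responses({"looming_rate": 0.9}))
print(subcortex_responses({"two_eyes_above_mouth": True, "drop_below_m": 10.0}))
print(subcortex_responses({"rocket_engine_schematic": True}))
```

No single row is hard to implement. The difficulty is that the list is long, the rows don’t share much structure, and anything outside the small hardwired ontology (like the rocket engine) simply produces no response.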

As mentioned above, I am very unhappy about this state of affairs. For the project of building safe and beneficial artificial general intelligence, I feel strongly that it would be better if we reverse-engineered subcortical algorithms first, and neocortical algorithms second.

(Edited to add: …if at all. Like, maybe, armed with a better understanding of how the subcortex steers the neocortex, we’ll realize that there’s just no way to keep a brain-like AGI under human control. Then we can advocate against people continuing to pursue the research program of reverse-engineering neocortical algorithms! Or conversely, if we have a really solid plan to build safe and beneficial brain-like AGIs, we could try to accelerate the reverse-engineering of the neocortex, as compared to other paths to AGI. This is a great example of how AGI-related technical safety research can be decision-relevant today even if AGI is centuries away.)


Well, my brief summary wasn’t all that brief after all! Congratulations on making it this far! I’m very open to questions, discussion, and criticism. I’ve already revised my views on all these topics numerous times, and expect to do so again. :-)