[Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain

2.1 Post summary /​ Table of contents

Part of the “Intro to brain-like-AGI safety” post series.

Having introduced the “brain-like-AGI safety” problem in the previous post, the next 6 posts (#2–#7) are primarily about neuroscience, building up to a more nuts-and-bolts understanding of what a brain-like AGI might look like (at least, in its safety-relevant aspects).

This post focuses on a concept that I call “learning from scratch”, and I will offer a hypothesized breakdown in which 96% of the human brain (including the neocortex) “learns from scratch”, and the other 4% (including the brainstem) doesn’t. This hypothesis is central to how I think the brain works, and hence will be a key prerequisite for the whole rest of the series.

  • In Section 2.2, I’ll define the concept of “learning from scratch”. As an example, if I assert that the neocortex “learns from scratch”, then I’m claiming that the neocortex starts out totally useless to the organism—outputting fitness-improving signals no more often than chance—until it starts learning things (within the individual’s lifetime). Here are a couple everyday examples of things that “learn from scratch”:

    • In most deep learning papers, the trained model “learns from scratch”—the model is initialized from random weights, and hence the model outputs are random garbage at first. But during training, the weights are updated, and the model outputs eventually become very useful.

    • A blank hard disk drive also “learns from scratch”—you can’t pull useful information out of it until after you’ve written information into it.

  • In Section 2.3, I will clarify some frequent confusions:

    • “Learning from scratch” is different from “blank slate”, because there is an innate learning algorithm, innate neural architecture, innate hyperparameters, etc.

    • “Learning from scratch” is different from “nurture-not-nature”, because (1) only some parts of the brain learn from scratch, while other parts don’t, and (2) the learning algorithms are not necessarily learning about the external environment—they could also be learning e.g. how to control one’s own body.

    • “Learning from scratch” is different from (and more specific than) “brain plasticity”, because the latter can also include (for example) a genetically-hardwired circuit with just one specific adjustable parameter, and that parameter changes semi-permanently under specific conditions.

  • In Section 2.4, I’ll propose my hypothesis that two major parts of the brain exist solely to run learning-from-scratch algorithms—namely, the telencephalon (neocortex, hippocampus, amygdala, most of the basal ganglia, etc.) and cerebellum. Together these comprise 96% of the volume of the human brain.

  • In Section 2.5, I’ll touch on four different lines of evidence concerning my hypothesis that the telencephalon and cerebellum learn from scratch: (1) big-picture thinking about how the brain works, (2) neonatal data, (3) a connection to the hypothesis of “cortical uniformity” and related issues, and (4) the possibility that a certain brain preprocessing motif—so-called “pattern separation”—involves randomization in a way that forces downstream algorithms to learn from scratch.

  • In Section 2.6, I’ll talk briefly about whether my hypothesis is mainstream vs idiosyncratic. (Answer: I’m not really sure.)

  • In Section 2.7, I’ll offer a little teaser of why learning-from-scratch is important for AGI safety—we wind up with a situation where the thing we want the AGI to be trying to do (e.g. cure Alzheimer’s) is a concept buried inside a big hard-to-interpret learned-from-scratch data structure. Thus, it is not straightforward for the programmer to write motivation-related code that refers to this concept. Much more on this topic in future posts.

  • Section 2.8 will be the first of three parts of my “timelines to brain-like AGI” discussion, focusing on how long it will take for future scientists to reverse-engineer the key operating principles of the learning-from-scratch part of the brain. (The remainder of the timelines discussion is in the next post.)

2.2 What is “learning from scratch”?

As in the intro above, I’m going to suggest that large parts of the brain—basically the telencephalon and cerebellum (see Section 2.4 below)—“learn from scratch”, in the sense that they start out emitting signals that are random garbage, not contributing to evolutionarily-adaptive behaviors, but over time become more and more immediately useful thanks to a within-lifetime learning algorithm.

Here are two ways to think about the learning-from-scratch hypothesis:

  • How you should think about learning-from-scratch (if you’re an ML reader): Think of a deep neural net initialized with random weights. Its neural architecture might be simple or might be incredibly complicated; it doesn’t matter. And it certainly has an inductive bias that makes it learn certain types of patterns more easily than other types of patterns. But it still has to learn them! If its weights are initially random, then it’s initially useless, and gets gradually more useful with training data. The idea here is that these parts of the brain (neocortex etc.) are likewise “initialized from random weights”, or something equivalent.

  • How you should think about learning-from-scratch (if you’re a neuroscience reader): Think of a memory-related system, like the hippocampus. The ability to form memories is a very helpful ability for an organism to have! …But it ain’t helpful at birth!![1] You need to accumulate memories before you can use them! My proposal is that everything in the telencephalon and cerebellum are in the same category—they’re kinds of memory modules. They may be very special kinds of memory modules! The neocortex, for example, can learn and remember a super-complex web of interconnected patterns, and comes with powerful querying features, and can even query itself in recurrent loops, and so on. But still, it’s a form of memory, and hence starts out useless, and gets progressively more useful to the organism as it accumulates learned content.

2.3 Three things that “learning from scratch” is NOT

2.3.1 Learning-from-scratch is NOT “blank slate”

I already mentioned this, but I want to be crystal clear: if the neocortex (for example) learns from scratch, that does not mean that there is no genetically-hardcoded information content in the neocortex. It means that the genetically-hardcoded information content is probably better thought of as the following:

  • Learning algorithm(s)—i.e., innate rules for semi-permanently changing the neurons or their connections, in a situation-dependent way.

  • Inference algorithm(s)—i.e., innate rules for what output signals should be sent right now, to help the animal survive and thrive. The actual output signals, of course, will also depend on previously-learned information.

  • Neural network architecture—i.e., an innate large-scale wiring diagram specifying how different parts of the learning module are connected to each other, and to input and output signals.

  • Hyperparameters—e.g., different parts of the architecture might innately have different learning rates. These hyperparameters can also change during development (cf. “sensitive periods”). There can also be an innate capacity to change hyperparameters on a moment-by-moment basis in response to special command signals (in the form of neuromodulators like acetylcholine).

Given all those innate ingredients, the learning-from-scratch algorithm is ready to receive input data and supervisory signals from elsewhere[2], and it gradually learns to do useful things.

This innate information is not necessarily simple. There could be 50,000 wildly different learning algorithms in 50,000 different parts of the neocortex, and that would still qualify as “learning-from-scratch” in my book! (I don’t think that’s the case though—see Section 2.5.3 on “uniformity”.)

When you imagine a learning-from-scratch algorithm, you should not imagine an empty void that gets filled with data. You should imagine an automaton that continually (1) writes information into a memory bank, and (2) performs queries on the current contents of the memory bank. “From scratch” just means that the memory bank starts out empty. There are many such automatons, each following a different procedure for exactly what to write and how to query. For example, a “lookup table” corresponds to a simple automaton that just records whatever it sees. Other automatons correspond to supervised learning algorithms, and reinforcement learning algorithms, and autoencoders, etc. etc.

2.3.2 Learning-from-scratch is NOT “nurture-over-nature”

There’s a tendency to associate “learning-from-scratch algorithms” with the “nurture” side of the “nature-vs-nurture” debate. I think that’s wrong. Quite the contrary: I think that the learning-from-scratch hypothesis is fully compatible with the possibility that evolved innate behaviors play a big role.

Two reasons:

First, some parts of the brain are absolutely NOT running learning-from-scratch algorithms! In this category is mainly the brainstem and hypothalamus (more about those below and in the next post). These non-learning-from-scratch parts of the brain would have to be fully responsible for any adaptive behavior at birth.[1] Is that plausible? I think so, given the impressive range of functionality in the brainstem. For example, the neocortex has circuitry for processing visual and other sensory data—but so does the brainstem! The neocortex has motor-control circuitry—but so does the brainstem! In at least some cases, full adaptive behaviors seem to be implemented entirely within the brainstem: for example, mice have a brainstem incoming-bird-detecting circuit wired directly to a brainstem running-away circuit. So my learning-from-scratch hypothesis is not making any blanket claims about what algorithms or functionalities are or aren’t present in the brain. It’s just a claim that certain types of algorithms are only in certain parts of the brain.

Second, “learning from scratch” is not the same as “learning from the environment”. Here’s a made-up example[3]. Imagine a bird brainstem is built with an innate capability to judge what a good birdsong should sound like, but lacks a recipe for how to produce a good birdsong. Well, a learning-from-scratch algorithm could fill in that gap—doing trial-and-error to get from the former to the latter. This example shows that learning-from-scratch algorithms can be in charge of behaviors that we would naturally and correctly describe as innate /​ “nature not nurture”.

2.3.3 Learning-from-scratch is NOT the more general notion of “plasticity”

“Plasticity” is a term for the brain semi-permanently changing itself, typically by changing the presence /​ absence /​ strength of neuron-to-neuron synapses, but also sometimes via other mechanisms, like changes of a neuron’s gene expression.

Any learning-from-scratch algorithm necessarily involves plasticity. But not all brain plasticity is part of a learning-from-scratch algorithm. A second possibility is what I call “individual innate adjustable parameters”. Here’s a table with both an example of each and general ways in which they differ:

Learning-from-scratch algorithmsIndividual innate adjustable parameters
Stereotypical example to keep in mind:Every deep learning paper: there’s a learning algorithm that gradually builds a trained model by adjusting lots of parameters.Some connection in the rat brain that strengthens when the rat wins a fight—basically, it’s a counter variable, tallying how many fights the rat has won over the course of its life. Then this connection is used to implement the behavior “If you’ve won lots of fights in your life, be more aggressive.” (ref)
Number of parameters that change based on input data (i.e. how many dimensions is the space of all possible trained models?)Maybe lots—hundreds, thousands, millions, etc.Probably few—even as few as one
If you could scale it up, would it work better after training?Yeah, probably.Huh?? WTF does “scale it up” mean?

I don’t think there’s a sharp line between these things; I think there’s a gray area where one blends into the other. Well, at least I think there’s a gray area in principle. In practice, I feel like it’s a pretty clean division—whenever I learn about a particular example of brain plasticity, it winds up being clearly in one category or the other.

My categorization here, by the way, is a bit unusual in neuroscience, I think. Neuroscientists more often focus on low-level implementation details: “Does the plasticity come from long-term synaptic change, or does it come from long-term gene expression change?” “What’s the biochemical mechanism?” Etc. That’s a totally different topic. For example, I’d bet that the exact same low-level biochemical synaptic plasticity mechanism can be involved in both a learning-from-scratch algorithm and an individual innate adjustable parameter.

Why do I bring this up? Because I’m planning to argue that the hypothalamus and brainstem have little or no learning-from-scratch algorithms, so far as I can tell. But they definitely have individual innate adjustable parameters.

To be concrete, here are three examples of “individual innate adjustable parameters” in the hypothalamus & brainstem:

  • I already mentioned the mouse hypothalamus circuit that says “if you keep winning fights, be more aggressive”—ref.

  • Here’s a rat hypothalamus circuit that says “if you keep getting dangerously salt-deprived, increase your baseline appetite for salt”.

  • The superior colliculus in the brainstem contains a visual map, auditory map, and saccade motor map, and it has a mechanism to keep all three lined up—so that when you see a flash or hear a noise, you immediately turn to look in exactly the right direction. This mechanism involves plasticity—it can self-correct in animals wearing prism glasses, for example. I’m not familiar with the details, but I’m guessing it’s something like: If you see a motion, and saccade to it, but the motion is not centered even after the saccade, then that generates an error signal that induces a corresponding incremental map shift. Maybe this whole system involves 8 adjustable parameters (scale and offset, horizontal and vertical, three maps to align), or maybe it’s more complicated—again, I don’t know the details.

See the difference? Go back to the table above if you’re still confused.

2.4 My hypothesis: the telencephalon and cerebellum learn from scratch, the hypothalamus and brainstem don’t

My hypothesis is that ~96% of the human brain by volume is running learning-from-scratch algorithms. The main exceptions are the brainstem and hypothalamus, which together are around the size of your thumb. Image source

Three claims:

First, I think the whole telencephalon learns from scratch (and is useless at birth[1]). The telencephalon (a.k.a. “cerebrum”) is mostly the neocortex in humans, plus the hippocampus, amygdala, most of the basal ganglia, and various more obscure bits and bobs.

Despite appearances, the model I like (due originally to the brilliant Larry Swanson) says that the whole telencephalon is organized into a nice three-layer structure (cortex, striatum, pallidum), and this structure aligns with a relatively small number of interconnected learning algorithms. See upcoming posts 5 & 6 for a bit more on that.

(UPDATE: After learning more, I want to revise this. I think that the whole “cortical mantle” and the whole “extended striatum” learn from scratch. (This includes things like the hippocampus, amygdala, lateral septum, etc.—which go together with the cortex and/​or striatum both embryologically and cytoarchitecturally). As for the pallidum, I think some parts of it are basically an extension of the brainstem RAS and thus definitely don’t belong in the learning-from-scratch bucket. Other parts of the pallidum could go either way, depending on some judgment calls about where to define the I/​O surface of certain learning algorithms. The pallidum is sufficiently small that I don’t need to change my previous volume estimates, including the “96%” figure. I’m not going to go through the whole series and change “telencephalon” to “cortical mantle & extended striatum” in a million places, sorry, you’ll just have to remember.)

The thalamus is technically outside the telencephalon, but most of it is intimately interconnected with the cortex—some researchers describe it as functionally like an “extra layer” of cortex. So I would lump that part in with the learning-from-scratch telencephalon too.

The telencephalon and thalamus together comprise ~86% the volume of the human brain (ref).

Second, I think the cerebellum also learns from scratch (and is likewise useless at birth). The cerebellum is ~10% of adult brain volume (ref). More on the cerebellum in Post #4.

Third, I think the hypothalamus and brainstem absolutely do NOT learn from scratch (and they are very active and useful right from birth). I think other parts of the diencephalon are in the same category too—e.g. the habenula and pineal gland.

OK, that’s my hypothesis.

I wouldn’t be surprised if there were minor exceptions to this picture. Maybe there’s some little nucleus somewhere in the telencephalon that orchestrates a biologically-adaptive behavior without first learning it from scratch. (Edited to add: Yup! See update above.) Sure, why not. But I currently think this picture is at least broadly right.

In the next two sections I’ll talk about some evidence related to my hypothesis, and then what others in the field think of it.

2.5 Evidence on whether the telencephalon & cerebellum learn from scratch

2.5.1 Big-picture-type evidence

I find from reading and talking to people that the biggest sticking points against believing that the telencephalon and cerebellum learn from scratch is overwhelmingly not detailed discussion of neuroscience data etc. but rather:

  1. failure to even consider this hypothesis as a possibility, and

  2. confusion about the consequences of this hypothesis, and in particular how to flesh it out into a sensible big picture of brain and behavior.

If you’ve read this far, then #1 should no longer be a problem.

What about #2? A central type of question is “If the telencephalon & cerebellum learn from scratch, then how do they do X?”—for various different X. If there’s an X for which we can’t answer this question at all, it suggests that the learning-from-scratch hypothesis is wrong. Conversely, if we can find really good answers to this question for lots of X, it would offer evidence (though not proof) that the learning-from-scratch hypothesis is right. The upcoming posts in this series will, I hope, offer some of this type of evidence.

2.5.2 Neonatal evidence

If the telencephalon & cerebellum cannot produce biologically-adaptive outputs except by learning to do so over time, then it follows that any biologically-adaptive neonatal[1] behavior would have to be driven by the brainstem & hypothalamus. Is that right? It seems like the kind of thing that should be experimentally measurable, right? This 1991 paper indeed says “evidence has been accumulating that suggests that newborn perceptuomotor activity is mainly controlled by subcortical mechanisms”. But I don’t know if anything has changed in the 30 years since that paper—let me know if you’ve seen other references on this.

Actually, it’s a harder question than it sounds. Suppose an infant does something biologically-adaptive…

  • The first question we need to ask is: really? Maybe it’s a bad (or wrongly-interpreted) experiment. For example, if an adult sticks his tongue out at a newborn infant human, will the infant stick out her own tongue as an act of imitation? Seems like a simple question, right? Nope, it’s a decades-long raging controversy. A competing theory is centered around oral exploration: “tongue protrusion seems to be a general response to salient stimuli and is modulated by the child’s interest in the stimuli”; a protruding adult tongue happens to elicit this response, but so do flashing lights and bursts of music. I’m sure some people know which newborn experiments are trustworthy, but I don’t, at least not at the moment. And I’m feeling awfully paranoid after seeing two widely-respected books in the field (Scientist in the Crib, Origin of Concepts) repeat that claim about newborn tongue imitation as if it’s a rock-solid fact.

  • The second question we need to ask is: is it the result of within-lifetime learning? Remember, even a 3-month-old infant has had 4 million waking seconds of “training data” to learn from. In fact, even zero-day-old infants could have potentially been running learning-from-scratch algorithms in the womb.[1]

  • The third question we need to ask is: what part of the brain is orchestrating this behavior? My hypothesis says that non-learned adaptive behaviors cannot be orchestrated by the telencephalon or cerebellum. But my hypothesis does allow such behaviors to be orchestrated by the brainstem! And figuring out which part of the neonatal brain is causally upstream of some behavior can be experimentally challenging.

2.5.3 “Uniformity” evidence

The “cortical uniformity” hypothesis says that every part of the neocortex runs a more-or-less similar algorithm. (…With various caveats, especially related to the non-uniform neural architecture and hyperparameters). Opinions differ on whether (or to what extent) cortical uniformity is true—I have a brief discussion of the evidence and arguments here (and links to more). I happen to think it’s very probably true, at least in the weak sense that a future researcher who has a really good nuts-and-bolts understanding of how Neocortex Area #147 works would be well on their way to understanding how literally any other part of the neocortex works. I won’t be diving into that here; I consider it generally off-topic for this series.

I bring this up because if you believe in cortical uniformity, then you should probably believe in cortical learning-from-scratch as well. The argument goes as follows:

The adult neocortex does lots of apparently very different things: vision processing, sound processing, motor control, language, planning, and so on. How would one reconcile this fact with cortical uniformity?

Learning-from-scratch offers a plausible way to reconcile them. After all, we know that a single learning-from-scratch algorithm, fed with very different input data and supervisory signals, can wind up doing very different things—consider how transformer-architecture deep neural networks can be trained to generate natural-language text, or images, or music, or robot motor control signals, etc.

By contrast, if we accept cortical uniformity but reject learning-from-scratch, well, umm, I can’t see any way to make sense of how that would work.

Analogously (but less often discussed than the neocortex case), should we believe in “allocortical uniformity”? As background, allocortex seems to be a simpler version of neocortex, with three layers instead of six; before the neocortex evolved, early amniotes are believed to have had 100% allocortex. Allocortex, like neocortex, does various different things: in adult humans, the hippocampus is involved in navigation and episodic memory, while the piriform cortex is involved in olfactory processing. So there’s a potential analogous argument for learning-from-scratch there.

Moving on, I mentioned above (and more in Big Picture of Phasic Dopamine, and also upcoming in Post #5, Section 5.4.1) the idea (due to Larry Swanson) that the whole telencephalon seems to be organized into three layers—“cortex”, “striatum”, and “pallidum”. I just talked about cortex; what about “striatal uniformity” and “pallidal uniformity”? Don’t expect to find a dedicated literature review—in fact, the previous sentence seems to be the first time those two terms have ever been written down. But there are in fact at least some commonalities across each of those layers—e.g., medium spiny neurons exist everywhere in the striatum layer, I think. And I continue to believe that the picture I outlined in Big Picture of Phasic Dopamine (and upcoming Posts #5-#6) is a reasonable first-pass reconciliation between “everything we know about the striatum and pallidum” on the one hand, and “several variations on a certain learning-from-scratch algorithm” on the other hand.

In the cerebellum case, there is at least some literature on the uniformity hypothesis (search for the term “universal cerebellar transform”), but again no consensus. The adult cerebellum is likewise involved in apparently-different functions like motor coordination, language, cognition, and emotions. I personally believe in uniformity there too, with details coming up in Post #4.

2.5.4 Locally-random pattern separation

This is another reason that I personally put a lot of stock in the learning-from-scratch telencephalon and cerebellum. It’s kinda specific, but very salient in my mind; see if you buy it. What is pattern separation?

There is a common motif in the brain called “pattern separation”. Let me explain what it is and why it exists.

Suppose you’re an ML engineer working for a restaurant chain. Your boss tells you to predict sales for different candidate franchise locations.

The first thing you might do is to gather a bunch of data-streams—local unemployment rate, local restaurant ratings, local grocery store prices, whether there happens to be a novel coronavirus spreading around the world right now, etc. I like to call these “context data”. You would use the context data as inputs to a neural network. The output of the network is supposed to be a prediction of the restaurant sales. You adjust the neural network weights (using supervised learning, with data from existing restaurants) to make that happen. No problem!

Pattern separation is when you add an extra step at the beginning. You take your various context data-streams, and randomly combine them in lots and lots of different ways. Then you sprinkle in some nonlinearity, and voilà! You now have way more context data-streams than you started with! Then those can be the inputs to the trainable neural net.[4]

Illustration of (part of) fruit fly sensory processing. The tall vertical gray bar in the center-left is the “pattern separation” layer; it takes the organized sensory signals coming from the left, and remixes them up into a large number of different, (locally) random combinations. These are then sent rightward, to serve as “context” inputs for the supervised learning modules. Image source: Li et al. 2020.

In ML terms, pattern separation is like adding a very wide hidden layer, at the input side, with fixed weights. If the layer is wide enough, you’ll find that some neurons in the layer are carrying useful representations, just by good luck. And then the next layer can use those useful neurons, and ignore the rest.

ML readers are now thinking to themselves: “OK, fine, but this is kinda dumb. Why add a extra-wide hidden layer at the beginning, with non-learnable weights? Why not just add a normal-sized hidden layer at the beginning, with learnable weights? Wouldn’t that be easier and better?” Umm, probably! At least, it would indeed probably be better in this particular example.[5]

So why add a pattern-separation layer, instead of an extra learnable layer? Well, remember that in biological neurons, doing backprop (or something equivalent) through multiple learnable layers is at best a complicated procedure, and at worst totally impossible. Or at least, that’s my current impression. Backprop as such is widely acknowledged to be impossible to implement in biological neurons (cf. “the weight transport problem”). Various groups in ML and neuroscience have taken that as a challenge, and devised roughly 7 zillion different mechanisms that are (allegedly) biologically plausible and (allegedly) wind up functionally similar to backprop for one reason or another.[6] I haven’t read all these papers. But anyway, even if it’s possible to propagate errors through 2 learnable layers of biological neurons (or 3 layers, or even N layers), let’s remember that it’s an absolute breeze to do error-driven learning with just one learnable layer, using biological neurons. (All it takes is a set of identical synapses, getting updated by a 3-factor learning rule. Details coming up in Post #4.) So it’s not crazy to think that evolution might settle on a solution like pattern separation, which gets some of the advantage of an extra learnable layer, but without the complication of actually propagating errors through an extra learnable layer. Where is pattern-separation?

Pattern separation is thought to occur in a number of places, particularly involving the tiny and numerous neurons called “granule cells”:

  • The cerebellum has pattern-separating granule cells in its “granular layer” (ref). And boy are there a lot of them! Adult humans have 50 billion of them—more than half the neurons in your entire brain.

  • The hippocampus has pattern-separating granule cells in its “dentate gyrus”.

  • The neocortex has pattern-separating granule cells in “layer 4”, its primary (feedforward) input layer. To be clear, some neocortex is called “agranular”, meaning that it lacks this granular layer. But that’s just because not all the neocortex is processing inputs of the type that gets pattern-separated. Some neocortex is geared to outputs instead (details here).

  • The fruit fly nervous system has a “mushroom body” consisting of “kenyon cells” which are also believed to be pattern-separators—see references here. Why does pattern separation suggest learning-from-scratch?

The thing is, pattern separation seems to be a locally random process. What does “locally” mean here? Well, it’s generally not true that any one input is equally likely to be mixed with any other input. (At least, not in fruit flies.) I only claim that it involves randomness at a small scale—like, out of this microscopic cluster of dozens of granule cells, exactly which cell connects with exactly which of the nearby input signals? I think the answer to those kinds of questions is: it’s random.

Why do I say that it’s probably (locally) random? Well, I can’t be sure, but I do have a few reasons.

  • From an algorithm perspective, (local) randomness seems like it would work, and indeed has some nice properties, like statistical guarantees about low overlap between sparse activation patterns.

  • From an information-theory perspective, if there are 50 billion granule cells in an adult cerebellum, I find it pretty hard to imagine that the exact connections to each of them is deterministically orchestrated by the <1GB genome, while still satisfying the various algorithmic and biological constraints.

  • From an experimental perspective, I’m not sure about vertebrates, but at least in the fruit fly case, genetically-identical fruit flies are known to have different kenyon cell connectivity (ref).

Anyway, if pattern-separation is a (locally) random process, then that means that you can’t do anything useful with the outputs of a pattern-separation layer, except by learning to do so. In other words, we wind up with a learning-from-scratch algorithm! (Indeed, one that would stay learning-from-scratch even in the face of evolutionary pressure to micromanage the initial parameters!)

2.5.5 Summary: I don’t pretend that I’ve proven the hypothesis of learning-from-scratch telencephalon and cerebellum, but I’ll ask you to suspend disbelief and read on

In my own head, when I mash together all the considerations I discussed above (big-picture stuff, neonatal stuff, uniformity, and locally-random pattern separation), I wind up feeling quite confident in my hypothesis of a learning-from-scratch telencephalon and cerebellum. But really, everything here is suggestive, not the kind of definitive, authoritative discussion that would convince every skeptic. The comprehensive scholarly literature review on learning-from-scratch telencephalon and cerebellum, so far as I know, has yet to be written.

Don’t get me wrong; I would love to write that! I would dive into all the relevant evidence, like everything discussed above, plus other things like experiments on decorticate rats, etc. That would be awesome, and I may well do that at some point in the future. (Or reach out if you want to collaborate!)

But meanwhile, I’m going to treat the hypothesis as if it were true. This is just for readability—the whole rest of the series will be exploring bigger-picture consequences of the hypothesis, and it would get really annoying if I put apologies and caveats in every other sentence.

2.6 Is my hypothesis consensus, or controversial?

Weirdly, I just don’t know! This is not a hot topic of discussion in neuroscience. I think most people haven’t even thought to formulate “what parts of the brain learn from scratch” as an explicit question, let alone a question of absolutely central importance.

(I heard from an old-timer in the field that the question “what parts of the brain learn from scratch?” smells too much like “nature vs nurture”. According to them, everyone had a fun debate about “nature vs nurture” in the 1990s, and then they got sick of it and moved on to other things! Indeed, I asked for a state-of-the art reference on the evidence for learning-from-scratch in the telencephalon and cerebellum, and they suggested a book from 25 years ago! It’s a good book—in fact I had already read it. But c’mon!! We’ve learned new things since 1996, right??)

Some data points:

  • Neuroscientist Randall O’Reilly explicitly endorses a learning-from-scratch neocortex (in agreement with me). He talks about it here (30:00), citing this paper on infant face recognition as a line of evidence. In fact, I think O’Reilly would agree with at least most of my hypothesis, and maybe all of it.

  • I’m also pretty confident that Jeff Hawkins and Dileep George would endorse my hypothesis, or at least something very close to it. More on them in the next post.

  • A commenter suggested the book Beyond Evolutionary Psychology by George Ellis & Mark Solms (2018), which (among other things) argues for something strikingly similar to my hypothesis—they list the brain’s “soft-wired domains” as consisting of the neocortex, the cerebellum, and “parts of the limbic system, for instance most of the hippocampus and amygdala, and large parts of the basal ganglia” (page 209). Almost a perfect match to my list! But their notion of “soft-wired domains” is defined somewhat differently than my notion of “learning from scratch”, and indeed I disagree with the book in numerous areas. But anyway, the book has lots of relevant evidence and literature. Incidentally, the book was mainly written as an argument against an “innate cognitive modules” perspective exemplified by Steven Pinker’s How the Mind Works (1994). So it’s no surprise that Steven Pinker would disagree with the claim that the neocortex learns from scratch (well, I’m 99% confident that he would)—see, for example, his book The Blank Slate (2003) chapter 5.

  • Some corners of computational neuroscience—particularly those with ties to the deep learning community—seem very enthusiastic about learning-from-scratch algorithms in the brain in general. But the discourse there doesn’t seem specific enough to answer my question. For example, I’m looking for statements like “Within-lifetime learning algorithms are a good starting point for understanding the neocortex, but a bad starting point for understanding the medulla.” I can’t find anything like that. Instead I see, for example, the paper “A deep learning framework for neuroscience” (by 32 people including Blake Richards and Konrad Kording), which says something like “Learning algorithms are very important for the brain, and sometimes those learning algorithms are within-lifetime learning algorithms, whereas other times the only learning algorithm is evolution.” But which parts of the brain are in which category? The paper doesn’t say.

  • My vague impression from sporadically reading papers with computational models of the neocortex, hippocampus, cerebellum, and striatum, from various different groups, is that the models are at least often learning-from-scratch models, but not always.

In summary, while I’m uncertain, there’s some reason to believe that my hypothesis is not too far outside the mainstream…

But it’s only “not too far outside the mainstream” in a kind of tunnel-vision sense. Almost nobody in neuroscience is taking the hypothesis seriously enough to grapple with its bigger-picture consequences. As mentioned above, if you believe (as I do) that “if the telencephalon or cerebellum perform a useful function X, they must have learned to perform that function, within the organism’s lifetime, somehow or other”, then that immediately spawns a million follow-up questions of the form: “How did it learn to do X? Is there a ground-truth that it’s learning from? What is it? Where does it come from?” I have a hard time finding good discussions of these questions in the literature. Whereas I’m asking these questions constantly, as you’ll see if you read on.

In this series of posts, I’m going to talk extensively about the bigger-picture framework around learning-from-scratch. By contrast, I’m going to talk relatively little about the nuts-and-bolts of how the learning algorithms work. That would be a complicated story which is not particularly relevant for AGI safety. And at least in some cases, nobody really knows the exact learning algorithms anyway.

2.7 Why does learning-from-scratch matter for AGI safety?

Much more on this later, but here’s a preview.

The single most important question in AGI safety is: Is the AGI trying to do something that we didn’t intend for it to be trying to do?

If no, awesome! This is sometimes called “intent alignment”. Granted, even with intent alignment, we can’t quite declare complete victory over accident risk—the AGI can still screw things up despite good intentions (see Post #11). But we’ve made a lot of progress, and probably averted the worst problems.

By contrast, if the AGI is trying to do something that we hadn’t intended for it to be trying to do, that’s where we get into the really bad minefield of catastrophic accidents. And as we build more and more capable AGIs over time, the accidents get worse, not better, because the AGI will become more skillful at figuring out how best to do those things that we had never wanted it to be doing in the first place.

So the critical question is: how does the AGI wind up trying to do one thing and not another? And the follow-up question is: if we want the AGI to be trying to do a particular thing X (where X is “act ethically”, or “be helpful”, or whatever—more on this in future posts), what code do we write?

Learning-from-scratch means that the AGI’s common-sense world-model involves one or more big data structures that are built from scratch during the AGI’s “lifetime” /​ “training”. The stuff inside those data structures is not necessarily human-interpretable[7]—after all, it was never put in by humans in the first place!

And unfortunately, the things that we want the AGI to be trying to do—“act ethically”, or “solve Alzheimer’s”, or whatever—are naturally defined in terms of abstract concepts. At best, those concepts are buried somewhere inside those big data structures. At worst (e.g. early in training), the AGI might not even have those concepts in the first place. So how do we write code such that the AGI wants to solve Alzheimer’s?

In fact, evolution has the same problem! Evolution would love to paint the abstract concept “Have lots of biological descendants” with positive valence, but thanks to learning-from-scratch, the genome doesn’t know which precise set of neurons will ultimately be representing this concept. (And not all humans have a “biological descendants” concept anyway.) The genome does other things instead, and later in the series I’ll be talking more about what those things are.

2.8 Timelines-to-brain-like-AGI part 1/​3: how hard will it be to reverse-engineer the learning-from-scratch parts of the brain, well enough for AGI?

This isn’t exactly on-topic, so I don’t want to get too deep into it. But in Section 1.5 of the previous post, I mentioned that there’s a popular idea that “brain-like AGI” (as defined in Section 1.3.2 of the previous post) is probably centuries away because the brain is so very horrifically complicated. I then said that I strongly disagreed. Now I can say a bit about why.

As context, we can divide the “build brain-like AGI” problem up into three pieces:

  1. Reverse-engineer the learning-from-scratch parts of the brain (telencephalon & cerebellum) well enough for AGI,

  2. Reverse-engineer everything else (mainly the brainstem & hypothalamus) well enough for AGI,

  3. Actually build the AGI—including hardware-accelerating the code, running model trainings, working out all the kinks, etc.

This section is about #1. I’ll get back to #2 & #3 in Sections 3.7 & 3.8 of the next post.

Learning-from-scratch is highly relevant here because reverse-engineering learning-from-scratch algorithms is a way simpler task than reverse-engineering trained models.

For example, think of the OpenAI Microscope visualizations of different neurons in a deep neural net. There’s so much complexity! But no human needed to design that complexity; it was automatically discovered by the learning algorithm. The learning algorithm itself is comparatively simple—gradient descent and so on.

Here are some more intuitions on this topic:

  • I think that learning-from-scratch algorithms kinda have to be simple, because they have to draw on broadly-applicable regularities—“patterns tend to recur”, and “patterns are often localized in time and space”, and “things are often composed of other things”, and so on.

  • Human brains were able to invent quantum mechanics. I can kinda see how a learning algorithm based on simple, general principles like “things are often composed of other things” (as above) can eventually invent quantum mechanics. I can’t see how a horrifically-complicated-Rube-Goldberg-machine of an algorithm can invent quantum mechanics. It’s just so wildly different from anything in the ancestral environment.

  • The learning algorithm’s neural architecture, hyperparameters, etc. could be kinda complicated. I freely admit it. For example, this study says that the neocortex has 180 architecturally-distinguishable areas. But on the other hand, future researchers don’t need to reinvent that stuff from scratch; they could also just “crib the answers” from the neuroscience literature. And also, not all that complexity is necessary for human intelligence—as we know from the ability of infants to (sometimes) fully recover from various forms of brain damage. Some complexity might just help speed the learning process up a bit, on the margin, or might help with unnecessary-for-AGI things like our sense of smell.

  • In Section 2.5.3, I discussed the “cortical uniformity” hypothesis and its various cousins. If true, it would greatly limit the potential difficulty of understanding how those parts of the brain work. But I don’t think anything I’m saying here depends on the “uniformity” hypotheses being true, let alone strictly true.

Going back to the question at issue. In another (say) 20 years, will we understand the telencephalon and cerebellum well enough to build the learning-from-scratch part of an AGI?

I say: I don’t know! Maybe we will, maybe we won’t.

There are people who disagree with me on that. They claim that the answer is “Absolutely 100% not! Laughable! How dare you even think that? That’s the kind of thing that only a self-promoting charlatan would say, as they try to dupe money out of investors! That’s not the kind of thing that a serious cautious neuroscientist would say!!!” Etc. etc.

My response is: I think that this is wildly-unwarranted overconfidence. I don’t see any good reason to rule out figuring this out in (say) 20 years, or even 5 years. Or maybe it will take 100 years! I think we should remain uncertain. As they say, “Predictions are hard, especially about the future.”

  1. ^

    I keep saying that “learning from scratch” implies “unhelpful for behavior at birth”. This is an oversimplification, because it’s possible for “within-lifetime learning” to happen in the womb. After all, there should already be plenty of data to learn from in the womb—interoception, sounds, motor control, etc. And maybe retinal waves too—those could be functioning as fake sensory data for the learning algorithm to learn from.

  2. ^

    Minor technicality: Why did I say the input data and supervisory signals for the neocortex (for example) come from outside the neocortex? Can’t one part of the neocortex get input data and/​or supervisory signals from a different part of the neocortex? Yes, of course. However, I would instead describe that as “part of the neocortex’s neural architecture”. By analogy, in ML, people normally would NOT say “ConvNet-layer-12 gets input data from ConvNet-layer-11”. Instead, they would be more likely to say “The ConvNet (as a whole) gets input data from outside the ConvNet”. This is just a way of talking, it doesn’t really matter.

  3. ^

    I’m framing this as a “made-up example” because I’m trying to make a simple conceptual point, and don’t want to get bogged down in complicated uncertain empirical details. That said, the bird song thing is not entirely made up—it’s at least “inspired by a true story”. See discussion here of Gadagkar 2016, which found that a subset of dopamine neurons in the songbird brainstem send signals that look like RL rewards for song quality, and those signals go specifically to the vocal motor system, presumably training it to sing better. The missing part of that story is: what calculations are upstream of those particular dopamine neurons? In other words, how does the bird brain judge its own success at singing? For example, does it match its self-generated auditory inputs to an innate template? Or maybe the template is generated in a more complicated way—say, involving listening to adult birds of the same species? Or something else? I’m not sure the details here are known—or at least, I don’t personally know them.

  4. ^

    Why is it called “pattern separation”? It’s kinda related to the fact that a pattern-separator has more output lines than input lines. For example, you might regularly encounter five different “patterns” of sensory data, and maybe all of them consist of activity in the same set of 30 input lines, albeit with subtle differences—maybe one pattern has such-and-such input signal slightly stronger than in the other patterns, etc. So on the input side, we might say that these five patterns “overlap”. But on the output side, maybe these five patterns would wind up activating entirely different sets of neurons. Hence, the patterns have been “separated”.

  5. ^

    In other examples, I think pattern separation is serving other purposes too, e.g. sparsifying the neuron activations, which turns out to be very important for various reasons, including not getting seizures.

  6. ^

    If you want to dive into the rapidly-growing literature on biologically-plausible backprop-ish algorithms, a possible starting point would be References #12, 14, 34–38, 91, 93, and 94 of A deep learning framework for neuroscience.

  7. ^

    There is a field of “machine learning interpretability”, dedicated to interpreting the innards of learned-from-scratch “trained models”—example. I (along with everyone else working on AGI safety) strongly endorse efforts to advance that field, including tackling much bigger models, and models trained by a wider variety of different learning algorithms. Also on this topic: I sometimes hear an argument that a brain-like AGI using a brain-like learning algorithm will produce a relatively more human-interpretable trained model than alternatives. This strikes me as maybe true, but far from guaranteed, and anyway “relatively more human-interpretable” is different than “very human-interpretable”. Recall that the neocortex has ~100 trillion synapses, and an AGI could eventually have many more than that.