Human instincts, symbol grounding, and the blank-slate neocortex

In­tro: What is Com­mon Cor­ti­cal Al­gorithm (CCA) the­ory, and why does it mat­ter for AGI?

As I discussed in “Jeff Hawkins on neuromorphic AGI within 20 years”, and as was discussed earlier on LessWrong in “The brain as a universal learning machine”, there is a theory, due originally to Vernon Mountcastle in the 1970s, that the neocortex[1] (75% of the human brain by weight) consists of ~150,000 interconnected copies of a little module, the “cortical column”, each of which implements the same algorithm. Following Jeff Hawkins, I’ll call this the “common cortical algorithm” (CCA) theory. (I don’t think that terminology is standard.)

So in­stead of say­ing that the hu­man brain has a vi­sion pro­cess­ing al­gorithm, mo­tor con­trol al­gorithm, lan­guage al­gorithm, plan­ning al­gorithm, and so on, in CCA the­ory we say that (to a first ap­prox­i­ma­tion) we have a mas­sive amount of “gen­eral-pur­pose neo­cor­ti­cal tis­sue”, and if you dump vi­sual in­for­ma­tion into that tis­sue, it does vi­sual pro­cess­ing, and if you con­nect that tis­sue to mo­tor con­trol path­ways, it does mo­tor con­trol, etc.

Whether and to what extent CCA theory is true is, I think, very important for AGI forecasting, strategy, and both technical and non-technical safety research directions; see my answer here for more details.

Should we be­lieve CCA the­ory?

CCA the­ory, as I’m us­ing the term, is a sim­plified model. There are al­most definitely a cou­ple caveats to it:

  1. There are sorta “hy­per­pa­ram­e­ters” on the generic learn­ing al­gorithm which seem to be set differ­ently in differ­ent parts of the neo­cor­tex. For ex­am­ple, some ar­eas of the cor­tex have higher or lower den­sity of par­tic­u­lar neu­ron types. There are other ex­am­ples too.[2] I don’t think this sig­nifi­cantly un­der­mines the use­ful­ness or cor­rect­ness of CCA the­ory, as long as these changes re­ally are akin to hy­per­pa­ram­e­ters, as op­posed to spec­i­fy­ing fun­da­men­tally differ­ent al­gorithms. So my read­ing of the ev­i­dence is that if you put, say, mo­tor nerves com­ing out of vi­sual cor­tex tis­sue, the tis­sue could do mo­tor con­trol, but it wouldn’t do it quite as well as the mo­tor cor­tex does.[3]

  2. There is almost definitely a gross wiring diagram hardcoded in the genome, i.e., a set of connections among different neocortical regions, and between those regions and other parts of the brain. These connections later get refined and edited during learning. Again, we can ask how much the existence of this innate gross wiring diagram undermines CCA theory. How complicated is the wiring diagram? Is it millions of connections among thousands of tiny regions, or just tens of connections among a few regions? Would the brain work at all if you started with a random wiring diagram? I don’t know for sure, but for various reasons, my current belief is that this initial gross wiring diagram is not carrying much of the weight of human intelligence, and thus that this point is not a significant problem for the usefulness of CCA theory. (This is a loose statement; of course it depends on what questions you’re asking.) I think of it more like: if it’s biologically important to learn a concept space that’s built out of associations between information sources X, Y, and Z, well, you just dump those three information streams into the same part of the cortex, and then the CCA will take it from there, and it will reliably build this concept space. So once you have the CCA nailed down, it kinda feels to me like you’re most of the way there....[4]

Go­ing be­yond these caveats, I found pretty helpful liter­a­ture re­views on both sides of the is­sue:

  • The ex­per­i­men­tal ev­i­dence for CCA the­ory: see chap­ter 5 of Re­think­ing In­nate­ness (1996)

  • The ex­per­i­men­tal ev­i­dence against CCA the­ory: see chap­ter 5 of The Blank Slate by Steven Pinker (2002).

I won’t go through the de­bate here, but af­ter read­ing both of those I wound up feel­ing that CCA the­ory (with the caveats above) is prob­a­bly right, though not 100% proven. Please com­ment if you’ve seen any other good refer­ences on this topic, es­pe­cially more up-to-date ones.

I some­times re­fer to the CCA the­ory as there be­ing a “blank-slate neo­cor­tex”. Here, “blank slate” does not mean “no in­duc­tive bi­ases”—of course there are in­duc­tive bi­ases! It means that the in­duc­tive bi­ases are suffi­ciently gen­eral and low-level that they work equally well for ex­tremely di­verse do­mains such as lan­guage, vi­sion, mo­tor con­trol, plan­ning, math home­work, and so on. I typ­i­cally think that the in­duc­tive bi­ases are at a very low level, things like “we should model in­puts us­ing a cer­tain type of data struc­ture in­volv­ing tem­po­ral se­quences and spa­tial re­la­tions”, and not higher-level se­man­tic knowl­edge like in­tu­itive biol­ogy or “when is it ap­pro­pri­ate to feel guilty?” or tool use etc. (I don’t even think ob­ject per­ma­nence or in­tu­itive psy­chol­ogy are built into the neo­cor­tex; I think they’re learned in early in­fancy. This is con­tro­ver­sial and I won’t try to jus­tify it here. Well, in­tu­itive psy­chol­ogy is a com­pli­cated case, see be­low.) Any­way, that brings us to...

CCA the­ory vs hu­man-uni­ver­sal traits and instincts

The main topic for this post is:

If Com­mon Cor­ti­cal Al­gorithm the­ory is true, then how do we ac­count for all the hu­man-uni­ver­sal in­stincts and be­hav­iors that evolu­tion­ary psy­chol­o­gists talk about?

In­deed, we know that there are a di­verse set of re­mark­ably spe­cific hu­man in­stincts and men­tal be­hav­iors evolved by nat­u­ral se­lec­tion. Again, Steven Pinker’s The Blank Slate is a pop­u­lariza­tion of this ar­gu­ment; it ends with Don­ald E. Brown’s gi­ant list of “hu­man uni­ver­sals”, i.e. be­hav­iors that are ob­served in ev­ery hu­man cul­ture.

Now, 75% of the hu­man brain (by weight) is the neo­cor­tex, but the other 25% con­sists of var­i­ous sub­cor­ti­cal (“old-brain”) struc­tures like the amyg­dala, and these struc­tures are perfectly ca­pa­ble of im­ple­ment­ing spe­cific in­stincts. But these struc­tures do not have ac­cess to an in­tel­li­gent world-model—only the neo­cor­tex does! So how can the brain im­ple­ment in­stincts that re­quire in­tel­li­gent un­der­stand­ing? For ex­am­ple, maybe the fact that “Alice got two cook­ies and I only got one!” is rep­re­sented in the neo­cor­tex as the ac­ti­va­tion of neu­ral firing pat­tern 7482943. There’s no ob­vi­ous mechanism to con­nect this ar­bi­trary, learned pat­tern to the “That’s so un­fair!!!” sec­tion of the amyg­dala. The neo­cor­tex doesn’t know about un­fair­ness, and the amyg­dala doesn’t know about cook­ies. Quite a co­nun­drum!

This is re­ally a sym­bol ground­ing prob­lem, which is the other rea­son this post is rele­vant to AI al­ign­ment. When the hu­man genome builds a hu­man, it faces the same prob­lem as a hu­man pro­gram­mer build­ing an AI: how can one point a goal sys­tem at things in the world, when the in­ter­nal rep­re­sen­ta­tion of the world is a com­pli­cated, idiosyn­cratic, learned data struc­ture? As we wres­tle with the AI goal al­ign­ment prob­lem, it’s worth study­ing what hu­man evolu­tion did here.

List of ways that hu­man-uni­ver­sal in­stincts and be­hav­iors can ex­ist de­spite CCA theory

Fi­nally, the main part of this post. I don’t know a com­plete an­swer, but here are some of the cat­e­gories I’ve read about or thought of, and please com­ment on things I’ve left out or got­ten wrong!

Mechanism 1: Sim­ple hard­coded con­nec­tions, not im­ple­mented in the neocortex

Ex­am­ple: En­joy­ing the taste of sweet things. This one is easy. I be­lieve the nerve sig­nals com­ing out of taste buds branch, with one branch go­ing to the cor­tex to be in­te­grated into the world model, and an­other branch go­ing to sub­cor­ti­cal re­gions. So the genes merely have to wire up the sweet­ness taste buds to the good-feel­ings sub­cor­ti­cal re­gions.

Mechanism 2: Subcortex-supervised learning

Ex­am­ple: Want­ing to eat choco­late. This is differ­ent than the pre­vi­ous item be­cause “sweet taste” refers to a spe­cific in­nate phys­iolog­i­cal thing, whereas “choco­late” is a learned con­cept in the neo­cor­tex’s world-model. So how do we learn to like choco­late? Be­cause when we eat choco­late, we en­joy it (Mechanism 1 above). The neo­cor­tex learns to pre­dict a sweet taste upon eat­ing choco­late, and thus paints the world-model con­cept of choco­late with a “sweet taste” prop­erty. The su­per­vi­sory sig­nal is mul­ti­di­men­sional, such that the neo­cor­tex can learn to paint con­cepts with var­i­ous la­bels like “painful”, “dis­gust­ing”, “com­fortable”, etc., and gen­er­ate ap­pro­pri­ate be­hav­iors in re­sponse. (Vaguely re­lated: the Deep­Mind pa­per Pre­frontal cor­tex as a meta-re­in­force­ment learn­ing sys­tem.)
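To make the division of labor in Mechanism 2 concrete, here is a minimal toy sketch in Python. Everything in it is an illustrative assumption rather than a claim about real neural circuitry: the names `subcortex_signal` and `ToyCortex`, the made-up pattern IDs, the three labels, and the simple delta-rule update are all hypothetical. The point is only that the innate part sees raw sensations, while the learned part attaches labels to arbitrary concept IDs.

```python
# Toy sketch; all names and numbers here are illustrative assumptions.
LABELS = ("sweet", "painful", "disgusting")


def subcortex_signal(raw_sensation):
    """Innate ground truth: sees only raw sensations, never learned concepts."""
    return {label: 1.0 if label in raw_sensation else 0.0 for label in LABELS}


class ToyCortex:
    """Learns which supervisory labels to expect for each learned concept."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.predictions = {}  # arbitrary concept ID -> {label: predicted value}

    def predict(self, concept):
        return self.predictions.get(concept, {label: 0.0 for label in LABELS})

    def update(self, concept, target):
        # Delta rule: nudge each prediction toward the signal that actually arrived.
        current = self.predict(concept)
        self.predictions[concept] = {
            label: current[label] + self.learning_rate * (target[label] - current[label])
            for label in LABELS
        }


cortex = ToyCortex()
for _ in range(10):
    # "pattern_0417" stands in for whatever firing pattern encodes chocolate.
    cortex.update("pattern_0417", subcortex_signal({"sweet"}))
    # "pattern_2291" stands in for the learned concept of a hot stove.
    cortex.update("pattern_2291", subcortex_signal({"painful"}))

chocolate = cortex.predict("pattern_0417")
print(round(chocolate["sweet"], 3), chocolate["painful"])
```

Note that `subcortex_signal` never refers to a pattern ID, and nothing in `ToyCortex` is specific to taste; the only thing connecting the two is the supervisory signal arriving at the same time as the concept is active.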

Mechanism 3: Same learn­ing al­gorithm + same world = same in­ter­nal model

Pos­si­ble ex­am­ple: In­tu­itive biol­ogy. In The Blank Slate you can find a dis­cus­sion of in­tu­itive biol­ogy /​ es­sen­tial­ism, which “be­gins with the con­cept of an in­visi­ble essence re­sid­ing in liv­ing things, which gives them their form and pow­ers.” Thus preschool­ers will say that a dog al­tered to look like a cat is still a dog, yet a wooden toy boat cut into the shape of a toy car has in fact be­come a toy car. I think we can ac­count for this very well by say­ing that ev­ery­one’s neo­cor­tex has the same learn­ing al­gorithm, and when they look at plants and an­i­mals they ob­serve the same kinds of things, so we shouldn’t be sur­prised that they wind up form­ing similar in­ter­nal mod­els and rep­re­sen­ta­tions. I found a pa­per that tries to spell out how this works in more de­tail; I don’t know if it’s right, but it’s in­ter­est­ing: free link, offi­cial link.

Mechanism 4: Human-universal memes

Ex­am­ple: Fire. I think this is pretty self-ex­plana­tory. Peo­ple learn about fire from each other. No need to talk about neu­rons, be­yond the more gen­eral is­sues of lan­guage and so­cial learn­ing dis­cussed be­low.

Mechanism 5: “Two-pro­cess the­ory”

Possible example: Innate interest in human faces.[5] The subcortex-supervised learning mechanism above (Mechanism 2) can be thought of more broadly as an interaction between a hardwired subcortical system that creates a “ground truth”, and a cortical learning algorithm that then learns to relate that ground truth to its complex internal representations. Johnson’s “two-process theory” for faces fits this same mold, but with a more complicated subcortical system for ground truth. In this theory, a subcortical system (ETA: specifically, the superior colliculus[6]) gets direct access to a low-resolution version of the visual field, and looks for a pattern with three blobs in locations corresponding to the eyes and mouth of a blurry face. When it finds such a pattern, it passes information to the cortex that this is a very important thing to attend to, and over time the cortex learns what faces actually look like (and suppresses the original subcortical template circuitry). Anyway, Johnson came up with this theory partly based on the observation that newborns are equally entranced by pictures of three blobs and by actual faces (both of which were much more interesting to them than other patterns), but after a few months the babies were more interested in actual face pictures than in the three-blob pictures. (Not sure what Johnson would make of this twitter account.)

(Other pos­si­ble ex­am­ples of in­stincts formed by two-pro­cess the­ory: fear of snakes, in­ter­est in hu­man speech sounds, sex­ual at­trac­tion.)

(Up­date: See my later post In­ner al­ign­ment in the brain for a more fleshed-out dis­cus­sion of this mechanism.)
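Here is a rough toy sketch of the two-process idea, under heavy assumptions: the 3x3 grid, the feature names, the `subcortical_face_detector` and `ToyFaceLearner` names, and the majority-count learning rule are all hypothetical simplifications I am introducing for illustration, not Johnson’s actual model. The innate detector only sees a blurry grid; the learner only sees detailed features, and only for stimuli the detector flagged.

```python
# Toy sketch; the grid, features, and learning rule are illustrative assumptions.
BLOB_POSITIONS = ((0, 0), (0, 2), (2, 1))  # two "eyes" and a "mouth" on a 3x3 grid


def subcortical_face_detector(low_res):
    """Innate template: fires when all three blob positions are dark (1)."""
    return all(low_res[row][col] == 1 for row, col in BLOB_POSITIONS)


class ToyFaceLearner:
    """Learns detailed features, but only from stimuli the detector flagged."""

    def __init__(self):
        self.feature_counts = {}
        self.examples_seen = 0

    def learn(self, features):
        self.examples_seen += 1
        for feature in features:
            self.feature_counts[feature] = self.feature_counts.get(feature, 0) + 1

    def face_score(self, features):
        """Fraction of the typical learned features present in this stimulus."""
        if self.examples_seen == 0:
            return 0.0
        typical = [f for f, n in self.feature_counts.items()
                   if n / self.examples_seen > 0.5]
        if not typical:
            return 0.0
        return sum(f in features for f in typical) / len(typical)


blurry_face = ((1, 0, 1), (0, 0, 0), (0, 1, 0))
real_face = (blurry_face, {"eyebrows", "nose", "skin_texture"})
three_blobs = (blurry_face, set())
checkerboard = (((1, 0, 1), (0, 1, 0), (1, 0, 1)), {"sharp_edges"})

# "Newborn": the innate detector cannot tell a real face from three blobs.
newborn_flags = [subcortical_face_detector(s[0])
                 for s in (real_face, three_blobs, checkerboard)]

# Months of experience: in the real world, flagged stimuli are mostly real faces.
learner = ToyFaceLearner()
for _ in range(20):
    for low_res, features in (real_face, checkerboard):
        if subcortical_face_detector(low_res):
            learner.learn(features)

older_scores = (learner.face_score(real_face[1]), learner.face_score(three_blobs[1]))
print(newborn_flags, older_scores)
```

The design choice worth noticing is that the genome only has to specify `BLOB_POSITIONS`; everything about what faces actually look like ends up in the learned counts.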

Mechanism 6: Time-windows

Examples: Filial imprinting in animals, incest repulsion (Westermarck effect) in humans. Filial imprinting is a famous result where newborn chicks (and many other species) form a permanent attachment to the most conspicuous moving object that they see in a certain period shortly after hatching. In nature, they always imprint on their mother, but in lab experiments, chicks can be made to imprint on a person, or even a box. As with other mechanisms here, time windows provide a nice solution to the symbol grounding problem, in that the genes don’t need to know what precise collection of neurons corresponds to “mother”; they only need to set up a time window and a way to point to “conspicuous moving objects”, which is presumably easier. The brain mechanism of filial imprinting has been studied in detail for chicks, and consists of the combination of time windows plus the two-process model (Mechanism 5 above). In fact, I think the two-process model was demonstrated in chick brains before it was postulated in human brains.

There like­wise seem to be var­i­ous time-win­dow effects in peo­ple, such as the Wester­marck effect, a sex­ual re­pul­sion be­tween two peo­ple raised to­gether as young chil­dren (an in­stinct which pre­sum­ably evolved to re­duce in­cest).
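As a toy illustration of why a time window sidesteps the symbol grounding problem, here is a minimal sketch. The class name `ToyChick`, the salience numbers, the window length, and the rule "keep the most salient moving object seen so far" are all assumptions made up for this example; real imprinting is surely messier.

```python
# Toy sketch; window length, salience values, and the update rule are assumptions.
class ToyChick:
    def __init__(self, window_end=3):
        self.window_end = window_end  # hypothetical sensitive period, in time steps
        self.age = 0
        self.imprinted_on = None
        self.best_salience = 0.0

    def observe(self, moving_objects):
        """moving_objects maps an arbitrary learned object ID to its salience."""
        if self.age < self.window_end and moving_objects:
            candidate = max(moving_objects, key=moving_objects.get)
            if moving_objects[candidate] > self.best_salience:
                self.imprinted_on = candidate
                self.best_salience = moving_objects[candidate]
        self.age += 1


chick = ToyChick()
chick.observe({"box": 0.9})               # lab experiment: a box moves past
chick.observe({"box": 0.8, "leaf": 0.2})
chick.observe({})
chick.observe({"hen": 1.0})               # the window has already closed
print(chick.imprinted_on)
```

Nothing innate in this sketch mentions "mother", "hen", or "box"; the hardcoded parts are just the window and the salience comparison, which is the sense in which the genes only need to point at "conspicuous moving objects".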

Mechanism 7 (speculative): Empathetic grounding of intuitive psychology

Pos­si­ble ex­am­ple: So­cial emo­tions (grat­i­tude, sym­pa­thy, guilt,...) Again, the prob­lem is that the neo­cor­tex is the only place with enough in­for­ma­tion to, say, de­cide when some­one slighted you, so there’s no “ground truth” to use for sub­cor­tex-su­per­vised learn­ing. At first I was think­ing that the two-pro­cess model for hu­man faces and speech could be play­ing a role, but as far as I know, deaf-blind peo­ple have the nor­mal suite of so­cial emo­tions, so that’s not it ei­ther. I looked in the liter­a­ture a bit and couldn’t find any­thing helpful. So, I made up this pos­si­ble mechanism (warn­ing: wild spec­u­la­tion).

Step 1 is that a baby’s neo­cor­tex builds a “pre­dict­ing my own emo­tions” model us­ing nor­mal sub­cor­tex-su­per­vised learn­ing (Mechanism 2 above). Then a nor­mal Heb­bian learn­ing mechanism makes two-way con­nec­tions be­tween the rele­vant sub­cor­ti­cal struc­tures (amyg­dala) and the cor­ti­cal neu­rons in­volved in this pre­dic­tive model.

Step 2 is that the neo­cor­tex’s uni­ver­sal learn­ing al­gorithm will, in the nor­mal course of de­vel­op­ment, nat­u­rally dis­cover that this same “pre­dict­ing my own emo­tions” model from step 1 can be reused to pre­dict other peo­ple’s emo­tions (cf. Mechanism 3 above), form­ing the ba­sis for in­tu­itive psy­chol­ogy. Now, be­cause of those con­nec­tions-to-the-amyg­dala men­tioned in step 1, the amyg­dala is in­ci­den­tally get­ting sig­nals from the neo­cor­tex when the lat­ter pre­dicts that some­one else is an­gry, for ex­am­ple.

Step 3 is that the amyg­dala (and/​or neo­cor­tex) some­how learns the differ­ence be­tween the in­tu­itive psy­chol­ogy model run­ning in first-per­son mode ver­sus em­pa­thetic mode, and can thus gen­er­ate ap­pro­pri­ate re­ac­tions, with one path­way for “be­ing an­gry” and a differ­ent path­way for “know­ing that some­one else is an­gry”.

So let’s now re­turn to my cookie puz­zle above. Alice gets two cook­ies and I only get one. How can I feel it’s un­fair, given that the neo­cor­tex doesn’t have a built-in no­tion of un­fair­ness, and the amyg­dala doesn’t know what cook­ies are? The an­swer would be: thanks to sub­cor­tex-su­per­vised learn­ing, the amyg­dala gets a mes­sage that one yummy cookie is com­ing, but the neo­cor­tex also thinks “Alice is even hap­pier”, and that thought also re­cruits the amyg­dala, since in­tu­itive psy­chol­ogy is built on em­pa­thetic mod­el­ing. Now the amyg­dala knows that I’m gonna get some­thing good, but that Alice is gonna get some­thing even bet­ter, and that com­bi­na­tion (in the cur­rent emo­tional con­text) trig­gers the amyg­dala to send out waves of jeal­ousy and in­dig­na­tion. This is then a new su­per­vi­sory sig­nal for the neo­cor­tex, which al­lows the neo­cor­tex to grad­u­ally de­velop a model of fair­ness, which in turn feeds back into the in­tu­itive psy­chol­ogy mod­ule, and thereby back to the amyg­dala, al­low­ing the amyg­dala to ex­e­cute more com­pli­cated in­nate emo­tional re­sponses in the fu­ture, and so on.

(Up­date: See my later post In­ner al­ign­ment in the brain for a slightly more fleshed-out dis­cus­sion of this mechanism.)
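The cookie story can be caricatured in a few lines of code. This is only a sketch of the speculation above, with invented names and numbers (`toy_amygdala`, the per-cookie reward of 1.0, the three reaction labels); I am not claiming the amygdala implements anything like this rule.

```python
# Toy sketch of the speculative cookie story; every name and number is made up.
def toy_amygdala(own_predicted_reward, other_predicted_reward):
    """Hypothetical innate rule over two emotion predictions; knows nothing of cookies."""
    if other_predicted_reward > own_predicted_reward > 0:
        return "indignation"
    if own_predicted_reward > 0:
        return "pleasure"
    return "neutral"


# Learned by the neocortex via subcortex-supervised learning (Mechanism 2).
predicted_reward_per_cookie = 1.0

# First-person prediction and empathetic prediction reuse the same learned model.
my_prediction = 1 * predicted_reward_per_cookie
alice_prediction = 2 * predicted_reward_per_cookie

reaction = toy_amygdala(my_prediction, alice_prediction)
print(reaction)
```

The innate function takes only two numbers, and the learned model contains no notion of fairness; the reaction comes from combining them.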

The special case of language

It’s tempting to put language in the category of memes (Mechanism 4 above), since we do generally learn language from each other, but it doesn’t really belong there, because apparently groups of kids can invent grammatical languages from scratch (e.g. Nicaraguan Sign Language). My current guess is that it combines three things: (1) a two-process mechanism (Mechanism 5 above) that makes people highly attentive to human speech sounds; (2) possibly “hyperparameter tuning” in the language-learning areas of the cortex, e.g. maybe to support taller compositional hierarchies than would be required elsewhere in the cortex; and (3) the fact that language can sculpt itself to the common cortical algorithm rather than the other way around, i.e., maybe “grammatical language” is just another word for “a language that conforms to the types of representations and data structures that are natively supported by the common cortical algorithm”.

By the way, lots of peo­ple (in­clud­ing Steven Pinker) seem to ar­gue that lan­guage pro­cess­ing is a fun­da­men­tally differ­ent and harder task than, say, vi­sual pro­cess­ing, be­cause lan­guage re­quires sym­bolic rep­re­sen­ta­tions, com­po­si­tion, re­cur­sion, etc. I don’t un­der­stand this ar­gu­ment; I think vi­sion pro­cess­ing needs the ex­act same things! I don’t see a fun­da­men­tal differ­ence be­tween the vi­sual-pro­cess­ing sys­tem know­ing that “this sheet of pa­per is part of my note­book”, and the gram­mat­i­cal “this prepo­si­tional phrase is part of this noun phrase”. Like­wise, I don’t see a differ­ence be­tween rec­og­niz­ing a back­ground ob­ject in­ter­rupted by a fore­ground oc­clu­sion, ver­sus rec­og­niz­ing a noun phrase in­ter­rupted by an in­ter­jec­tion. It seems to me like a similar set of prob­lems and solu­tions, which again strength­ens my be­lief in CCA the­ory.


When I ini­tially read about CCA the­ory, I didn’t take it too se­ri­ously be­cause I didn’t see how in­stincts could be com­pat­i­ble with it. But I now find it pretty likely that there’s no fun­da­men­tal in­com­pat­i­bil­ity. So hav­ing re­moved that ob­sta­cle, and also read the liter­a­ture a bit more, I’m much more in­clined to be­lieve that CCA the­ory is fun­da­men­tally cor­rect.

Again, I’m learn­ing as I go, and in some cases mak­ing things up as I go along. Please share any thoughts and poin­t­ers!

  1. I’ll be talk­ing a lot about the neo­cor­tex in this ar­ti­cle, but shout-out to the tha­la­mus and hip­pocam­pus, the other two pri­mary parts of the brain’s pre­dic­tive-world-model-build­ing-sys­tem. I’m just leav­ing them out for sim­plic­ity; this doesn’t have any im­por­tant im­pli­ca­tions for this ar­ti­cle. ↩︎

  2. More ex­am­ples of re­gion-to-re­gion vari­a­tion in the neo­cor­tex that are (plau­si­bly) ge­net­i­cally-coded: (1) Spin­dle neu­rons only ex­ist in a cou­ple spe­cific parts of the neo­cor­tex. I don’t re­ally know what’s the deal with those. Kurzweil claims they’re im­por­tant for so­cial emo­tions and em­pa­thy, if I re­call cor­rectly. Hmmm. (2) “Sen­si­tive win­dows” (see De­haene): Low-level sen­sory pro­cess­ing ar­eas more-or-less lock them­selves down to pre­vent fur­ther learn­ing very early in life, and cer­tain lan­guage-pro­cess­ing ar­eas lock them­selves down some­what later, and high-level con­cep­tual ar­eas don’t ever lock them­selves down at all (at least, not as com­pletely). I bet that’s ge­net­i­cally hard­wired. I guess psychedelics can un­der­mine this lock-down mechanism? ↩︎

  3. I have heard that the pri­mary mo­tor cor­tex is not the only part of the neo­cor­tex that emits mo­tor com­mands, but don’t know the de­tails. ↩︎

  4. Also, peo­ple who lose var­i­ous parts of the neo­cor­tex are of­ten ca­pa­ble of full re­cov­ery, if it hap­pens early enough in in­fancy, which sug­gests to me that the CCA’s wiring-via-learn­ing ca­pa­bil­ity is do­ing most of the work, and maybe the in­nate wiring di­a­gram is mostly just get­ting things set up more quickly and re­li­ably. ↩︎

  5. See Re­think­ing In­nate­ness p116, or bet­ter yet John­son’s ar­ti­cle ↩︎

  6. See, for ex­am­ple, Fast De­tec­tor/​First Re­spon­der: In­ter­ac­tions be­tween the Su­pe­rior Col­licu­lus-Pul­v­inar Path­way and Stim­uli Rele­vant to Pri­mates. Also, let us pause and re­flect on the fact that hu­mans have two differ­ent vi­sual pro­cess­ing sys­tems! Pretty cool! The most fa­mous con­se­quence is blind­sight, a con­di­tion where the sub­con­scious mid­brain vi­sion pro­cess­ing sys­tem (su­pe­rior col­licu­lus) is in­tact but the con­scious neo­cor­ti­cal vi­sual pro­cess­ing sys­tem is not work­ing. This study proves that blind­sighted peo­ple can rec­og­nize not just faces but spe­cific fa­cial ex­pres­sions. I strongly sus­pect blind­sighted peo­ple would re­act to snakes and spi­ders too, but can’t find any good stud­ies (that study in the pre­vi­ous sen­tence re­gret­tably used sta­tion­ary pic­tures of spi­ders and snakes, not videos of them scam­per­ing and slith­er­ing). ↩︎