How uniform is the neocortex?

The neo­cor­tex is the part of the hu­man brain re­spon­si­ble for higher-or­der func­tions like sen­sory per­cep­tion, cog­ni­tion, and lan­guage, and has been hy­poth­e­sized to be uniformly com­posed of gen­eral-pur­pose data-pro­cess­ing mod­ules. What does the cur­rently available ev­i­dence sug­gest about this hy­poth­e­sis?

“How uniform is the neo­cor­tex?” is one of the back­ground vari­ables in my frame­work for AGI timelines. My aim for this post is not to pre­sent a com­plete ar­gu­ment for some view on this vari­able, so much as it is to:

  • pre­sent some con­sid­er­a­tions I’ve en­coun­tered that shed light on this variable

  • in­vite a col­lab­o­ra­tive effort among read­ers to shed fur­ther light on this vari­able (e.g. by leav­ing com­ments about con­sid­er­a­tions I haven’t in­cluded, or point­ing out mis­takes in my analy­ses)

There’s a long list of differ­ent re­gions in the neo­cor­tex, each of which ap­pears to be re­spon­si­ble for some­thing to­tally differ­ent. One in­ter­pre­ta­tion is that these cor­ti­cal re­gions are do­ing fun­da­men­tally differ­ent things, and that we ac­quired the ca­pac­i­ties to do all these differ­ent things over hun­dreds of mil­lions of years of evolu­tion.

A rad­i­cally differ­ent per­spec­tive, first put forth by Ver­non Mount­cas­tle in 1978, hy­poth­e­sizes that the neo­cor­tex is im­ple­ment­ing a sin­gle gen­eral-pur­pose data pro­cess­ing al­gorithm all through­out. From the pop­u­lar neu­ro­science book On In­tel­li­gence, by Jeff Hawk­ins[1]:

[...] Mount­cas­tle points out that the neo­cor­tex is re­mark­ably uniform in ap­pear­ance and struc­ture. The re­gions of cor­tex that han­dle au­di­tory in­put look like the re­gions that han­dle touch, which look like the re­gions that con­trol mus­cles, which look like Broca’s lan­guage area, which look like prac­ti­cally ev­ery other re­gion of the cor­tex. Mount­cas­tle sug­gests that since these re­gions all look the same, per­haps they are ac­tu­ally perform­ing the same ba­sic op­er­a­tion! He pro­poses that the cor­tex uses the same com­pu­ta­tional tool to ac­com­plish ev­ery­thing it does.


Mount­cas­tle [...] shows that de­spite the differ­ences, the neo­cor­tex is re­mark­ably uniform. The same lay­ers, cell types, and con­nec­tions ex­ist through­out. [...] The differ­ences are of­ten so sub­tle that trained anatomists can’t agree on them. There­fore, Mount­cas­tle ar­gues, all re­gions of the cor­tex are perform­ing the same op­er­a­tion. The thing that makes the vi­sion area vi­sual and the mo­tor area mo­toric is how the re­gions of cor­tex are con­nected to each other and to other parts of the cen­tral ner­vous sys­tem.

In fact, Mount­cas­tle ar­gues that the rea­son one re­gion of cor­tex looks slightly differ­ent from an­other is be­cause of what it is con­nected to, and not be­cause its ba­sic func­tion is differ­ent. He con­cludes that there is a com­mon func­tion, a com­mon al­gorithm, that is performed by all the cor­ti­cal re­gions. Vi­sion is no differ­ent from hear­ing, which is no differ­ent from mo­tor out­put. He al­lows that our genes spec­ify how the re­gions of cor­tex are con­nected, which is very spe­cific to func­tion and species, but the cor­ti­cal tis­sue it­self is do­ing the same thing ev­ery­where.

If Mount­cas­tle is cor­rect, the al­gorithm of the cor­tex must be ex­pressed in­de­pen­dently of any par­tic­u­lar func­tion or sense. The brain uses the same pro­cess to see as to hear. The cor­tex does some­thing uni­ver­sal that can be ap­plied to any type of sen­sory or mo­tor sys­tem.

The rest of this post will re­view some of the ev­i­dence around Mount­cas­tle’s hy­poth­e­sis.

Cor­ti­cal func­tion is largely de­ter­mined by in­put data

When vi­sual in­puts are fed into the au­di­tory cor­tices of in­fant fer­rets, those au­di­tory cor­tices de­velop into func­tional vi­sual sys­tems. This sug­gests that differ­ent cor­ti­cal re­gions are all ca­pa­ble of gen­eral-pur­pose data pro­cess­ing.

Hu­mans can learn how to perform forms of sen­sory pro­cess­ing we haven’t evolved to perform—blind peo­ple can learn to see with their tongues, and can learn to echolo­cate well enough to dis­cern den­sity and tex­ture. On the flip side, forms of sen­sory pro­cess­ing that we did evolve to perform de­pend heav­ily on the data we’re ex­posed to—for ex­am­ple, cats ex­posed only to hori­zon­tal edges early in life don’t have the abil­ity to dis­cern ver­ti­cal edges later in life. This sug­gests that our ca­pac­i­ties for sen­sory pro­cess­ing stem from some sort of gen­eral-pur­pose data pro­cess­ing, rather than in­nate ma­chin­ery handed to us by evolu­tion.

Blind peo­ple who learn to echolo­cate do so with the help of re­pur­posed vi­sual cor­tices, and they can learn to read Braille us­ing re­pur­posed vi­sual cor­tices. Our vi­sual cor­tices did not evolve to be uti­lized in these ways, sug­gest­ing that the vi­sual cor­tex is do­ing some form of gen­eral-pur­pose data pro­cess­ing.

There’s a man who had the en­tire left half of his brain re­moved when he was 5, who has above-av­er­age in­tel­li­gence, and went on to grad­u­ate col­lege and main­tain steady em­ploy­ment. This would only be pos­si­ble if the right half of his brain were ca­pa­ble of tak­ing on the cog­ni­tive func­tions of the left half of the brain.

The patterns identified by the primary sensory cortices (for vision, hearing, and touch) overlap substantially with the patterns that numerous different unsupervised learning algorithms identify from the same data, suggesting that the different cortical regions (along with the different unsupervised learning algorithms) are all just doing some form of general-purpose pattern recognition on their input data.

Deep learn­ing and cor­ti­cal generality

The above evidence does not rule out the possibility that the cortex’s apparent adaptability stems from developmental triggers, rather than some capability for general-purpose data-processing. By analogy, stem cells all start out very similar, only to differentiate into cells with functions tailored to the contexts in which they find themselves. It’s possible that different cortical regions have hard-coded genomic responses for handling particular data inputs, such that the cortex gives one hard-coded response when it detects that it’s receiving visual data, another hard-coded response when it detects that it’s receiving auditory data, etc.

If this were the case, the cortex’s data-processing capabilities could best be understood as specialized responses to distinct evolutionary needs, and our ability to process data that we haven’t evolved to process (e.g. being able to look at a Go board and intuitively discern what a good next move would be) would most likely utilize a complicated mishmash of heterogeneous data-processing abilities acquired over evolutionary timescales.

Be­fore I learned about any of the ad­vance­ments in deep learn­ing, this was my most likely guess about how the brain worked. It had always seemed to me that the hard­est and most mys­te­ri­ous part of in­tel­li­gence was in­tu­itive pat­tern-recog­ni­tion, and that the var­i­ous forms of in­tu­itive pro­cess­ing that let us rec­og­nize images, say sen­tences, and play Go might be to­tally differ­ent and pos­si­bly ar­bi­trar­ily com­plex.

So I was very sur­prised when I learned that a sin­gle gen­eral method in deep learn­ing (train­ing an ar­tifi­cial neu­ral net­work on mas­sive amounts of data us­ing gra­di­ent de­scent)[2] led to perfor­mance com­pa­rable or su­pe­rior to hu­mans’ in tasks as dis­parate as image clas­sifi­ca­tion, speech syn­the­sis, and play­ing Go. I found su­per­hu­man Go perfor­mance par­tic­u­larly sur­pris­ing—in­tu­itive judg­ments of Go boards en­code dis­til­la­tions of high-level strate­gic rea­son­ing, and are highly sen­si­tive to small changes in in­put. Nei­ther of these is true for sen­sory pro­cess­ing, so my prior guess was that the meth­ods that worked for sen­sory pro­cess­ing wouldn’t have been suffi­cient for play­ing Go as well as hu­mans.[3]

This sug­gested to me that there’s noth­ing fun­da­men­tally com­plex or mys­te­ri­ous about in­tu­ition, and that seem­ingly-het­ero­ge­neous forms of in­tu­itive pro­cess­ing can re­sult from sim­ple and gen­eral learn­ing al­gorithms. From this per­spec­tive, it seems most par­si­mo­nious to ex­plain the cor­tex’s seem­ingly gen­eral-pur­pose data-pro­cess­ing ca­pa­bil­ities as re­sult­ing straight­for­wardly from a gen­eral learn­ing al­gorithm im­ple­mented all through­out the cor­tex. (This is not to say that I think the cor­tex is do­ing what ar­tifi­cial neu­ral net­works are do­ing—rather, I think deep learn­ing pro­vides ev­i­dence that gen­eral learn­ing al­gorithms ex­ist at all, which in­creases the prior like­li­hood on the cor­tex im­ple­ment­ing a gen­eral learn­ing al­gorithm.[4])
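To make “a single general method” slightly more concrete, here is a toy sketch in Python. It is not deep learning: the model is a single linear layer, and the function names, datasets, step count, and learning rate are all illustrative assumptions. The only point it is meant to show is that the training loop itself never needs to know what the task is; swapping the data and the output function is enough to move between a regression task and a classification task.

```python
import math

def identity(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, link, steps=5000, lr=0.1):
    """Full-batch gradient descent on a linear model followed by `link`.

    For the identity link with half squared error, and for the sigmoid link
    with cross-entropy, the gradient with respect to the weights has the
    same form: (prediction - target) * input. So one loop serves both.
    """
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in examples:
            pred = link(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        m = len(examples)
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

# Task 1 (regression): recover y = 2x + 1 from four points.
regression_data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
reg_w, reg_b = train(regression_data, identity)

# Task 2 (classification): learn the logical AND of two binary inputs.
and_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
            ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
and_w, and_b = train(and_data, sigmoid)

def classify(x):
    """Return True when the trained AND model predicts class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(and_w, x)) + and_b) > 0.5
```

Real deep learning systems replace the linear model with many stacked nonlinear layers and compute the gradients by backpropagation, but the overall recipe (data, objective, repeated gradient steps) has the same shape.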

The strength of this con­clu­sion hinges on the ex­tent to which the “ar­tifi­cial in­tu­ition” that cur­rent ar­tifi­cial neu­ral net­works (ANNs) are ca­pa­ble of is analo­gous to the in­tu­itive pro­cess­ing that hu­mans are ca­pa­ble of. It’s pos­si­ble that the “in­tu­ition” uti­lized by ANNs is deeply analo­gous to hu­man in­tu­ition, in which case the gen­er­al­ity of ANNs would be very in­for­ma­tive about the gen­er­al­ity of cor­ti­cal data-pro­cess­ing. It’s also pos­si­ble that “ar­tifi­cial in­tu­ition” is differ­ent in kind from hu­man in­tu­ition, or that it only cap­tures a small frac­tion of what goes into hu­man in­tu­ition, in which case the gen­er­al­ity of ANNs would not be very in­for­ma­tive about the gen­er­al­ity of cor­ti­cal data-pro­cess­ing.

It seems that experts are divided about how analogous these forms of intuition are, and I conjecture that this is a major source of disagreement about overall AI timelines. Shane Legg (a cofounder of DeepMind, a leading AI lab) was talking about how deep belief networks might be able to replicate the function of the cortex before deep learning took off, and he’s been predicting human-level AGI in the 2020s since 2009. Eliezer Yudkowsky has directly talked about AlphaGo providing evidence of “neural algorithms that generalize well, the way that the human cortical algorithm generalizes well” as an indication that AGI might be near. Rodney Brooks (the former director of MIT’s AI lab) has written about how deep learning is not capable of real perception or manipulation, and thinks AGI is over 100 years away. Gary Marcus has described deep learning as a “wild oversimplification” of the “hundreds of anatomically and likely functionally [distinct] areas” of the cortex, and estimates AGI to be 20-50 years away.

Canon­i­cal micro­cir­cuits for pre­dic­tive coding

If the cor­tex were uniform, what might it ac­tu­ally be do­ing uniformly?

The cortex has been hypothesized to consist of canonical microcircuits that implement predictive coding. In a nutshell, predictive coding (aka predictive processing) is a theory of brain function which hypothesizes that the cortex learns the hierarchical structure of the data it receives, and uses this structure to encode predictions about future sense inputs, resulting in “controlled hallucinations” that we interpret as direct perception of the world.

On In­tel­li­gence has an ex­cerpt that cleanly com­mu­ni­cates what I mean by “learn­ing hi­er­ar­chi­cal struc­ture”:

[...] The real world’s nested struc­ture is mir­rored by the nested struc­ture of your cor­tex.

What do I mean by a nested or hi­er­ar­chi­cal struc­ture? Think about mu­sic. Notes are com­bined to form in­ter­vals. In­ter­vals are com­bined to form melodic phrases. Phrases are com­bined to form melodies or songs. Songs are com­bined into albums. Think about writ­ten lan­guage. Let­ters are com­bined to form syl­la­bles. Syl­la­bles are com­bined to form words. Words are com­bined to form clauses and sen­tences. Look­ing at it the other way around, think about your neigh­bor­hood. It prob­a­bly con­tains roads, schools, and houses. Houses have rooms. Each room has walls, a ceiling, a floor, a door, and one or more win­dows. Each of these is com­posed of smaller ob­jects. Win­dows are made of glass, frames, latches, and screens. Latches are made from smaller parts like screws.

Take a mo­ment to look up at your sur­round­ings. Pat­terns from the retina en­ter­ing your pri­mary vi­sual cor­tex are be­ing com­bined to form line seg­ments. Line seg­ments com­bine to form more com­plex shapes. Th­ese com­plex shapes are com­bin­ing to form ob­jects like noses. Noses are com­bin­ing with eyes and mouths to form faces. And faces are com­bin­ing with other body parts to form the per­son who is sit­ting in the room across from you.

All ob­jects in your world are com­posed of sub­ob­jects that oc­cur con­sis­tently to­gether; that is the very defi­ni­tion of an ob­ject. When we as­sign a name to some­thing, we do so be­cause a set of fea­tures con­sis­tently trav­els to­gether. A face is a face pre­cisely be­cause two eyes, a nose, and a mouth always ap­pear to­gether. An eye is an eye pre­cisely be­cause a pupil, an iris, an eye­lid, and so on, always ap­pear to­gether. The same can be said for chairs, cars, trees, parks, and coun­tries. And, fi­nally, a song is a song be­cause a se­ries of in­ter­vals always ap­pear to­gether in se­quence.

In this way the world is like a song. Every ob­ject in the world is com­posed of a col­lec­tion of smaller ob­jects, and most ob­jects are part of larger ob­jects. This is what I mean by nested struc­ture. Once you are aware of it, you can see nested struc­tures ev­ery­where. In an ex­actly analo­gous way, your mem­o­ries of things and the way your brain rep­re­sents them are stored in the hi­er­ar­chi­cal struc­ture of the cor­tex. Your mem­ory of your home does not ex­ist in one re­gion of cor­tex. It is stored over a hi­er­ar­chy of cor­ti­cal re­gions that re­flect the hi­er­ar­chi­cal struc­ture of the home. Large-scale re­la­tion­ships are stored at the top of the hi­er­ar­chy and small-scale re­la­tion­ships are stored to­ward the bot­tom.

The de­sign of the cor­tex and the method by which it learns nat­u­rally dis­cover the hi­er­ar­chi­cal re­la­tion­ships in the world. You are not born with knowl­edge of lan­guage, houses, or mu­sic. The cor­tex has a clever learn­ing al­gorithm that nat­u­rally finds what­ever hi­er­ar­chi­cal struc­ture ex­ists and cap­tures it.

The clearest evidence that the brain is learning hierarchical structure comes from the visual system. The visual cortex is known to have edge detectors at the lowest levels of processing, and, at higher levels, neurons that fire when shown images of particular people, like Bill Clinton.

What does pre­dic­tive cod­ing say the cor­tex does with this learned hi­er­ar­chi­cal struc­ture? From an in­tro­duc­tory blog post about pre­dic­tive pro­cess­ing:

[...] the brain is a multi-layer pre­dic­tion ma­chine. All neu­ral pro­cess­ing con­sists of two streams: a bot­tom-up stream of sense data, and a top-down stream of pre­dic­tions. Th­ese streams in­ter­face at each level of pro­cess­ing, com­par­ing them­selves to each other and ad­just­ing them­selves as nec­es­sary.

The bot­tom-up stream starts out as all that in­com­pre­hen­si­ble light and dark­ness and noise that we need to pro­cess. It grad­u­ally moves up all the cog­ni­tive lay­ers that we already knew ex­isted – the edge-de­tec­tors that re­solve it into edges, the ob­ject-de­tec­tors that shape the edges into solid ob­jects, et cetera.

The top-down stream starts with ev­ery­thing you know about the world, all your best heuris­tics, all your pri­ors, [all the struc­ture you’ve learned,] ev­ery­thing that’s ever hap­pened to you be­fore – ev­ery­thing from “solid ob­jects can’t pass through one an­other” to “e=mc^2” to “that guy in the blue uniform is prob­a­bly a po­lice­man”. It uses its knowl­edge of con­cepts to make pre­dic­tions – not in the form of ver­bal state­ments, but in the form of ex­pected sense data. It makes some guesses about what you’re go­ing to see, hear, and feel next, and asks “Like this?” Th­ese pre­dic­tions grad­u­ally move down all the cog­ni­tive lay­ers to gen­er­ate lower-level pre­dic­tions. If that uniformed guy was a po­lice­man, how would that af­fect the var­i­ous ob­jects in the scene? Given the an­swer to that ques­tion, how would it af­fect the dis­tri­bu­tion of edges in the scene? Given the an­swer to that ques­tion, how would it af­fect the raw-sense data re­ceived?

As these two streams move through the brain side-by-side, they con­tinu­ally in­ter­face with each other. Each level re­ceives the pre­dic­tions from the level above it and the sense data from the level be­low it. Then each level uses Bayes’ The­o­rem to in­te­grate these two sources of prob­a­bil­is­tic ev­i­dence as best it can.


“To deal rapidly and fluently with an uncertain and noisy world, brains like ours have become masters of prediction – surfing the waves of noisy and ambiguous sensory stimulation by, in effect, trying to stay just ahead of them. A skilled surfer stays ‘in the pocket’: close to, yet just ahead of the place where the wave is breaking. This provides power and, when the wave breaks, it does not catch her. The brain’s task is not dissimilar. By constantly attempting to predict the incoming sensory signal we become able [...] to learn about the world around us and to engage that world in thought and action.”

The re­sult is per­cep­tion, which the PP the­ory de­scribes as “con­trol­led hal­lu­ci­na­tion”. You’re not see­ing the world as it is, ex­actly. You’re see­ing your pre­dic­tions about the world, cashed out as ex­pected sen­sa­tions, then shaped/​con­strained by the ac­tual sense data.

An illus­tra­tion of pre­dic­tive pro­cess­ing, from the same source:

This demon­strates the de­gree to which the brain de­pends on top-down hy­pothe­ses to make sense of the bot­tom-up data. To most peo­ple, these two pic­tures start off look­ing like in­co­her­ent blotches of light and dark­ness. Once they figure out what they are (spoiler) the scene be­comes ob­vi­ous and co­her­ent. Ac­cord­ing to the pre­dic­tive pro­cess­ing model, this is how we per­ceive ev­ery­thing all the time – ex­cept usu­ally the con­cepts nec­es­sary to make the scene fit to­gether come from our higher-level pre­dic­tions in­stead of from click­ing on a spoiler link.
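The step in the excerpt where “each level uses Bayes’ Theorem to integrate these two sources of probabilistic evidence” can be sketched for the simplest possible case. The sketch below assumes that both the top-down prediction and the bottom-up sense data are one-dimensional Gaussians, which is my simplifying assumption and not something the excerpt specifies; the function name and the numbers are likewise made up for illustration. Under that assumption, Bayes’ theorem reduces to a precision-weighted average: whichever stream is more certain pulls the result harder.

```python
def integrate(pred_mean, pred_var, sense_mean, sense_var):
    """Combine a Gaussian prediction (prior) with Gaussian sense data.

    Precision is the inverse of variance. The posterior precision is the
    sum of the two precisions, and the posterior mean is the
    precision-weighted average of the two means.
    """
    pred_precision = 1.0 / pred_var
    sense_precision = 1.0 / sense_var
    post_var = 1.0 / (pred_precision + sense_precision)
    post_mean = post_var * (pred_precision * pred_mean
                            + sense_precision * sense_mean)
    return post_mean, post_var

# Confident prediction (variance 1), noisy sense data (variance 4):
# the percept stays close to the prediction of 10.
strong_prior = integrate(10.0, 1.0, 14.0, 4.0)   # mean 10.8, variance 0.8

# Vague prediction (variance 4), clean sense data (variance 1):
# the percept moves most of the way to the observed 14.
strong_sense = integrate(10.0, 4.0, 14.0, 1.0)   # mean 13.2, variance 0.8
```

In this toy picture, the blotchy images correspond to very noisy sense data: with only a vague prediction the result is dominated by noise, and once a sharp top-down hypothesis is available it dominates the percept.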

Pre­dic­tive cod­ing has been hailed by promi­nent neu­ro­scien­tists as a pos­si­ble unified the­ory of the brain, but I’m con­fused about how much phys­iolog­i­cal ev­i­dence there is that the brain is ac­tu­ally im­ple­ment­ing pre­dic­tive cod­ing. It seems like there’s phys­iolog­i­cal ev­i­dence in sup­port of pre­dic­tive cod­ing be­ing im­ple­mented in the vi­sual cor­tex and in the au­di­tory cor­tex, and there’s a the­o­ret­i­cal ac­count of how the pre­frontal cor­tex (re­spon­si­ble for higher cog­ni­tive func­tions like plan­ning, de­ci­sion-mak­ing, and ex­ec­u­tive func­tion) might be uti­liz­ing similar prin­ci­ples. This pa­per and this pa­per re­view some phys­iolog­i­cal ev­i­dence of pre­dic­tive cod­ing in the cor­tex that I don’t re­ally know how to in­ter­pret.

My cur­rent take

I find the var­i­ous pieces of ev­i­dence that cor­ti­cal func­tion de­pends largely on data in­puts (e.g. the fer­ret rewiring ex­per­i­ment) to be pretty com­pel­ling ev­i­dence of gen­eral-pur­pose data-pro­cess­ing in the cor­tex. The suc­cess of sim­ple and gen­eral meth­ods in deep learn­ing across a wide range of tasks sug­gests that it’s most par­si­mo­nious to model the cor­tex as em­ploy­ing gen­eral meth­ods through­out, but only to the ex­tent that the ca­pa­bil­ities of ar­tifi­cial neu­ral net­works can be taken to be analo­gous to the ca­pa­bil­ities of the cor­tex. I cur­rently con­sider the anal­ogy to be deep, and in­tend to ex­plore my rea­sons for think­ing so in fu­ture posts.

I think the fact that pre­dic­tive cod­ing offers a plau­si­ble the­o­ret­i­cal ac­count for what the cor­tex could be do­ing uniformly, which can ac­count for higher-level cog­ni­tive func­tions in ad­di­tion to sen­sory pro­cess­ing, is it­self some ev­i­dence of cor­ti­cal unifor­mity. I’m con­fused about how much phys­iolog­i­cal ev­i­dence there is that the brain is ac­tu­ally im­ple­ment­ing pre­dic­tive cod­ing, but I’m very bullish on pre­dic­tive cod­ing as a ba­sis for a unified brain the­ory based on non-phys­iolog­i­cal ev­i­dence (like our sub­jec­tive ex­pe­riences mak­ing sense of the images of splotches) that I in­tend to ex­plore in a fu­ture post.

Thanks to Paul Kreiner, David Spi­vak, and Stag Lynn for helpful sug­ges­tions and feed­back, and thanks to Ja­cob Can­nell for writ­ing a post that in­spired much of my think­ing here.

  1. This blog post com­ment has some good ex­cerpts from On In­tel­li­gence. ↩︎

  2. Deep learning is a general method in the sense that most tasks are solved by utilizing a handful of basic tools from a standard toolkit, adapted for the specific task at hand. Once you’ve selected the basic tools, all that’s left is figuring out how to supply the training data, specifying the objective that lets the AI know how well it’s doing, throwing a lot of computation at the problem, and fiddling with details. My understanding is that there typically isn’t much conceptual ingenuity involved in solving the problems, that most of the work goes into fiddling with details, and that trying to be clever doesn’t lead to better results than using standard tricks with more computation and training data. It’s also worth noting that most of the tools in this standard toolkit have been around since the 1990s (e.g. convolutional neural networks, LSTMs, reinforcement learning, backpropagation), and that the recent boom in AI was driven by using these decades-old tools with unprecedented amounts of computation. ↩︎

  3. AlphaGo did simu­late fu­ture moves to achieve su­per­hu­man perfor­mance, so the di­rect com­par­i­son against hu­man in­tu­ition isn’t com­pletely fair. But AlphaGo Zero’s raw neu­ral net­work, which just looks at the “tex­ture” of the board with­out simu­lat­ing any fu­ture moves, can still play quite formidably. From the AlphaGo Zero pa­per: “The raw neu­ral net­work, with­out us­ing any looka­head, achieved an Elo rat­ing of 3,055. AlphaGo Zero achieved a rat­ing of 5,185, com­pared to 4,858 for AlphaGo Master, 3,739 for AlphaGo Lee and 3,144 for AlphaGo Fan.” (AlphaGo Fan beat the Euro­pean Go cham­pion 5-0.) ↩︎

  4. Eliezer Yud­kowsky has an in­sight­ful ex­po­si­tion of this point in a Face­book post. ↩︎
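As a rough aid for reading the Elo ratings quoted in footnote 3, the sketch below applies the standard Elo expected-score formula to them. It assumes that all the quoted ratings sit on one common scale and that the usual 400-point logistic convention applies; both are my assumptions, and the function and variable names are illustrative.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings as quoted from the AlphaGo Zero paper in footnote 3.
ratings = {
    "raw network": 3055,
    "AlphaGo Fan": 3144,
    "AlphaGo Lee": 3739,
    "AlphaGo Master": 4858,
    "AlphaGo Zero": 5185,
}

# The raw network, with no lookahead, against AlphaGo Fan: roughly 0.37.
raw_vs_fan = expected_score(ratings["raw network"], ratings["AlphaGo Fan"])
```

Under these assumptions, the raw network would be expected to score a bit over a third of the points against the version of AlphaGo that beat the European champion.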