Neural Categories

In Dis­guised Queries, I talked about a clas­sifi­ca­tion task of “bleggs” and “rubes”. The typ­i­cal blegg is blue, egg-shaped, furred, flex­ible, opaque, glows in the dark, and con­tains vana­dium. The typ­i­cal rube is red, cube-shaped, smooth, hard, translu­cent, un­glow­ing, and con­tains pal­la­dium. For the sake of sim­plic­ity, let us for­get the char­ac­ter­is­tics of flex­i­bil­ity/​hard­ness and opaque­ness/​translu­cency. This leaves five di­men­sions in thingspace: Color, shape, tex­ture, lu­mi­nance, and in­te­rior.

Sup­pose I want to cre­ate an Ar­tifi­cial Neu­ral Net­work (ANN) to pre­dict un­ob­served blegg char­ac­ter­is­tics from ob­served blegg char­ac­ter­is­tics. And sup­pose I’m fairly naive about ANNs: I’ve read ex­cited pop­u­lar sci­ence books about how neu­ral net­works are dis­tributed, emer­gent, and par­allel just like the hu­man brain!! but I can’t de­rive the differ­en­tial equa­tions for gra­di­ent de­scent in a non-re­cur­rent mul­ti­layer net­work with sig­moid units (which is ac­tu­ally a lot eas­ier than it sounds).

Then I might de­sign a neu­ral net­work that looks some­thing like this:


Net­work 1 is for clas­sify­ing bleggs and rubes. But since “blegg” is an un­fa­mil­iar and syn­thetic con­cept, I’ve also in­cluded a similar Net­work 1b for dis­t­in­guish­ing hu­mans from Space Mon­sters, with in­put from Aris­to­tle (“All men are mor­tal”) and Plato’s Academy (“A feather­less biped with broad nails”).

A neu­ral net­work needs a learn­ing rule. The ob­vi­ous idea is that when two nodes are of­ten ac­tive at the same time, we should strengthen the con­nec­tion be­tween them—this is one of the first rules ever pro­posed for train­ing a neu­ral net­work, known as Hebb’s Rule.

Thus, if you of­ten saw things that were both blue and furred—thus si­mul­ta­neously ac­ti­vat­ing the “color” node in the + state and the “tex­ture” node in the + state—the con­nec­tion would strengthen be­tween color and tex­ture, so that + col­ors ac­ti­vated + tex­tures, and vice versa. If you saw things that were blue and egg-shaped and vana­dium-con­tain­ing, that would strengthen pos­i­tive mu­tual con­nec­tions be­tween color and shape and in­te­rior.
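For the code-minded, here is a minimal sketch of what that rule might look like on the five-node network. The ordering of the observables, the +1/−1 coding, the learning rate, and the name `hebbian_weights` are all assumptions made for illustration, not part of any particular library or of the original design.

```python
# Order of observables is an assumption for this sketch.
FEATURES = ["color", "shape", "texture", "luminance", "interior"]

# Typical objects, coded +1 for the blegg-like value and -1 for the rube-like value.
BLEGG = [+1, +1, +1, +1, +1]  # blue, egg-shaped, furred, glows, vanadium
RUBE = [-1, -1, -1, -1, -1]   # red, cube-shaped, smooth, dark, palladium


def hebbian_weights(examples, rate=0.1):
    """Strengthen the link between two nodes whenever they are active together."""
    n = len(FEATURES)
    weights = [[0.0] * n for _ in range(n)]
    for activations in examples:
        for i in range(n):
            for j in range(n):
                if i != j:  # no node is connected to itself
                    weights[i][j] += rate * activations[i] * activations[j]
    return weights


# Ten bleggs and ten rubes coming off the conveyor belt.
weights = hebbian_weights([BLEGG, RUBE] * 10)
color, texture = FEATURES.index("color"), FEATURES.index("texture")
print(weights[color][texture])
```

Note that rubes strengthen the same positive connections as bleggs do, since two nodes that are both in the − state also count as agreeing under this coding.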

Let’s say you’ve already seen plenty of bleggs and rubes come off the con­veyor belt. But now you see some­thing that’s furred, egg-shaped, and—gasp!—red­dish pur­ple (which we’ll model as a “color” ac­ti­va­tion level of −2/​3). You haven’t yet tested the lu­mi­nance, or the in­te­rior. What to pre­dict, what to pre­dict?

What happens then is that the activation levels in Network 1 bounce around a bit. Positive activation flows to luminance from shape, negative activation flows to interior from color, negative activation flows from interior to luminance… Of course all these messages are passed in parallel!! and asynchronously!! just like the human brain...

Fi­nally Net­work 1 set­tles into a sta­ble state, which has high pos­i­tive ac­ti­va­tion for “lu­mi­nance” and “in­te­rior”. The net­work may be said to “ex­pect” (though it has not yet seen) that the ob­ject will glow in the dark, and that it con­tains vana­dium.
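A rough sketch of this settling process, under some loudly stated assumptions: every pair of nodes shares the same positive weight (about what Hebbian training on typical bleggs and rubes would give), unobserved nodes start at zero, each node squashes its summed input with tanh, and nodes are updated one at a time. The function name `settle` and all the constants are hypothetical.

```python
import math

FEATURES = ["color", "shape", "texture", "luminance", "interior"]

# Assumed: one shared positive weight between every pair of distinct nodes.
WEIGHT = 1.0


def settle(clamped, max_sweeps=100, tolerance=1e-9):
    """Update the unclamped nodes one at a time until nothing changes much."""
    state = {name: clamped.get(name, 0.0) for name in FEATURES}
    free = [name for name in FEATURES if name not in clamped]
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for name in free:
            # Sum the activation arriving from every other node.
            total = sum(WEIGHT * state[other] for other in FEATURES if other != name)
            new_value = math.tanh(total)
            biggest_change = max(biggest_change, abs(new_value - state[name]))
            state[name] = new_value
        if biggest_change < tolerance:
            return state, sweep
    return state, max_sweeps


# Furred, egg-shaped, and reddish purple (color = -2/3); the rest unobserved.
state, sweeps = settle({"color": -2 / 3, "shape": 1.0, "texture": 1.0})
print(state["luminance"], state["interior"], sweeps)
```

With these particular numbers the two unobserved nodes both climb to strongly positive values after a handful of sweeps. Nothing guarantees such good behavior for other weights or update rules.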

And lo, Net­work 1 ex­hibits this be­hav­ior even though there’s no ex­plicit node that says whether the ob­ject is a blegg or not. The judg­ment is im­plicit in the whole net­work!! Bleg­gness is an at­trac­tor!! which arises as the re­sult of emer­gent be­hav­ior!! from the dis­tributed!! learn­ing rule.

Now in real life, this kind of net­work de­sign—how­ever fad­dish it may sound—runs into all sorts of prob­lems. Re­cur­rent net­works don’t always set­tle right away: They can os­cillate, or ex­hibit chaotic be­hav­ior, or just take a very long time to set­tle down. This is a Bad Thing when you see some­thing big and yel­low and striped, and you have to wait five min­utes for your dis­tributed neu­ral net­work to set­tle into the “tiger” at­trac­tor. Asyn­chronous and par­allel it may be, but it’s not real-time.

And there are other prob­lems, like dou­ble-count­ing the ev­i­dence when mes­sages bounce back and forth: If you sus­pect that an ob­ject glows in the dark, your sus­pi­cion will ac­ti­vate be­lief that the ob­ject con­tains vana­dium, which in turn will ac­ti­vate be­lief that the ob­ject glows in the dark.

Plus if you try to scale up the Network 1 design, it requires O(N²) connections, where N is the total number of observables.

So what might be a more re­al­is­tic neu­ral net­work de­sign?

In this network (Network 2), a wave of activation converges on the central node from any clamped (observed) nodes, and then surges back out again to any unclamped (unobserved) nodes. Which means we can compute the answer in one step, rather than waiting for the network to settle, an important requirement in biology when the neurons only run at 20Hz. And the network architecture scales as O(N), rather than O(N²).
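Here is one way the one-step computation might look in code, together with the connection counts for the two designs. Treat it as a sketch: the unit connection strengths, the tanh squashing at the central node, and the names `predict` and `connection_counts` are my assumptions for illustration.

```python
import math

FEATURES = ["color", "shape", "texture", "luminance", "interior"]

# Assumed: each observable has a single connection of strength 1.0 to the central node.
TO_CENTER = {name: 1.0 for name in FEATURES}


def predict(observed):
    """One inward pass to the central node, then one outward pass."""
    central = math.tanh(sum(TO_CENTER[name] * value for name, value in observed.items()))
    return {name: TO_CENTER[name] * central for name in FEATURES if name not in observed}


def connection_counts(n):
    """Return (pairwise links in the all-to-all design, links in the central-node design)."""
    return n * (n - 1) // 2, n


prediction = predict({"color": -2 / 3, "shape": 1.0, "texture": 1.0})
print(prediction)
print(connection_counts(5), connection_counts(100))
```

There is no loop here and nothing to wait for: the same reddish-purple, furred, egg-shaped object yields positive expectations for luminance and interior in a single pass.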

Ad­mit­tedly, there are some things you can no­tice more eas­ily with the first net­work ar­chi­tec­ture than the sec­ond. Net­work 1 has a di­rect con­nec­tion be­tween ev­ery two nodes. So if red ob­jects never glow in the dark, but red furred ob­jects usu­ally have the other blegg char­ac­ter­is­tics like egg-shape and vana­dium, Net­work 1 can eas­ily rep­re­sent this: it just takes a very strong di­rect nega­tive con­nec­tion from color to lu­mi­nance, but more pow­er­ful pos­i­tive con­nec­tions from tex­ture to all other nodes ex­cept lu­mi­nance.

Nor is this a “spe­cial ex­cep­tion” to the gen­eral rule that bleggs glow—re­mem­ber, in Net­work 1, there is no unit that rep­re­sents blegg-ness; blegg-ness emerges as an at­trac­tor in the dis­tributed net­work.

So yes, those N² connections were buying us something. But not very much. Network 1 is not more useful on most real-world problems, where you rarely find an animal stuck halfway between being a cat and a dog.

(There are also facts that you can’t eas­ily rep­re­sent in Net­work 1 or Net­work 2. Let’s say sea-blue color and spheroid shape, when found to­gether, always in­di­cate the pres­ence of pal­la­dium; but when found in­di­vi­d­u­ally, with­out the other, they are each very strong ev­i­dence for vana­dium. This is hard to rep­re­sent, in ei­ther ar­chi­tec­ture, with­out ex­tra nodes. Both Net­work 1 and Net­work 2 em­body im­plicit as­sump­tions about what kind of en­vi­ron­men­tal struc­ture is likely to ex­ist; the abil­ity to read this off is what sep­a­rates the adults from the babes, in ma­chine learn­ing.)
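To see why extra nodes are needed, here is a small brute-force check. It assumes a simplified picture in which the evidence about the interior is just a weighted sum of "sea-blue present" and "spheroid present" (coded 1 or 0) with no bias term, and it searches a coarse grid of weights. The grid, the 0/1 coding, and the hypothetical "conjunction" node that fires only when both features are present are all illustrative assumptions.

```python
from itertools import product

# (sea_blue, spheroid) -> required sign of the evidence about the interior:
# +1 for vanadium, -1 for palladium.
CASES = {(1, 0): +1, (0, 1): +1, (1, 1): -1}

# Candidate weights from -5.0 to 5.0 in steps of 0.5.
GRID = [x / 2 for x in range(-10, 11)]


def fits(weights, with_conjunction):
    """Check whether the weighted sum has the required sign in every case."""
    for (blue, sphere), target in CASES.items():
        inputs = [blue, sphere]
        if with_conjunction:
            inputs.append(blue * sphere)  # extra node: active only when both are present
        total = sum(w * x for w, x in zip(weights, inputs))
        if total * target <= 0:
            return False
    return True


def search(with_conjunction):
    """List every weight setting on the grid that fits all three cases."""
    size = 3 if with_conjunction else 2
    return [w for w in product(GRID, repeat=size) if fits(w, with_conjunction)]


additive_solutions = search(with_conjunction=False)
extra_node_solutions = search(with_conjunction=True)
print(len(additive_solutions), len(extra_node_solutions))
```

The purely additive search comes up empty, and not because the grid is too coarse: two weights that are each positive cannot have a negative sum. Give the network one extra node for the conjunction and solutions appear, such as weights of 1, 1, and −3.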

Make no mistake: Neither Network 1 nor Network 2 is biologically realistic. But it still seems like a fair guess that however the brain really works, it is in some sense closer to Network 2 than Network 1. Fast, cheap, scalable, works well to distinguish dogs and cats: natural selection goes for that sort of thing like water running down a fitness landscape.

It seems like an or­di­nary enough task to clas­sify ob­jects as ei­ther bleggs or rubes, toss­ing them into the ap­pro­pri­ate bin. But would you no­tice if sea-blue ob­jects never glowed in the dark?

Maybe, if some­one pre­sented you with twenty ob­jects that were al­ike only in be­ing sea-blue, and then switched off the light, and none of the ob­jects glowed. If you got hit over the head with it, in other words. Per­haps by pre­sent­ing you with all these sea-blue ob­jects in a group, your brain forms a new sub­cat­e­gory, and can de­tect the “doesn’t glow” char­ac­ter­is­tic within that sub­cat­e­gory. But you prob­a­bly wouldn’t no­tice if the sea-blue ob­jects were scat­tered among a hun­dred other bleggs and rubes. It wouldn’t be easy or in­tu­itive to no­tice, the way that dis­t­in­guish­ing cats and dogs is easy and in­tu­itive.

Or: “Socrates is human, all humans are mortal, therefore Socrates is mortal.” How did Aristotle know that Socrates was human? Well, Socrates had no feathers, and broad nails, and walked upright, and spoke Greek, and, well, was generally shaped like a human and acted like one. So the brain decides, once and for all, that Socrates is human; and from there, infers that Socrates is mortal like all other humans thus far observed. It doesn’t seem easy or intuitive to ask how much wearing clothes, as opposed to using language, is associated with mortality. Just, “things that wear clothes and use language are human” and “humans are mortal”.

Are there bi­ases as­so­ci­ated with try­ing to clas­sify things into cat­e­gories once and for all? Of course there are. See e.g. Cul­tish Coun­ter­cultish­ness.

To be con­tinued...