Bridging syntax and semantics, empirically

EDIT: I’ve found that using humans to detect intruders is a more illustrative example than the temperature example of this post. The intruder example can be found in this post.

This is a series of posts with the modest goal of showing how you can get syntax from semantics, solve the grounding problem, and start looking for models within human brains.

I think much of the work in this area has been focusing on the wrong question, looking at how symbols might be grounded in theory, rather than whether a particular symbol is well grounded for a particular concept. When Searle argues against a thermostat having beliefs about temperature, what is actually happening is that the thermostat’s internal variables correlate poorly with temperature in general environments.

So, I’ll start by presenting a derisorily simple solution to the symbol grounding problem, and then see what this means in practice:

  • The variable $S_A$ within agent $A$ is a symbol for variable $V$ in the set of environments $E$, iff knowing $S_A$ allows one to predict $V$ well within $E$.

This could be measured, for example, by high mutual information between the variables, or low conditional entropy $H(V \mid S_A)$.
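As a concrete illustration, here is a minimal sketch (in Python, with a made-up toy dataset and a function name of my choosing, not anything from a particular library) of how that criterion could be checked empirically: estimate the mutual information and the conditional entropy $H(V \mid S_A)$ from paired observations of the agent’s variable and the environment variable.

```python
import math
from collections import Counter

def mi_and_cond_entropy(s_samples, v_samples):
    """Estimate I(S_A; V) and H(V | S_A), in bits, from paired samples of
    the agent's internal variable S_A and the environment variable V."""
    n = len(s_samples)
    count_s = Counter(s_samples)
    count_v = Counter(v_samples)
    count_sv = Counter(zip(s_samples, v_samples))

    # I(S_A; V) = sum over (s, v) of p(s, v) * log2[ p(s, v) / (p(s) p(v)) ]
    mi = sum((c / n) * math.log2(c * n / (count_s[s] * count_v[v]))
             for (s, v), c in count_sv.items())

    # H(V | S_A) = H(V) - I(S_A; V)
    h_v = -sum((c / n) * math.log2(c / n) for c in count_v.values())
    return mi, h_v - mi

# Toy check: if S_A tracks V perfectly in the sampled environments,
# mutual information is maximal and conditional entropy is zero,
# i.e. the symbol is well grounded there.
v = [0, 1, 0, 1, 1, 0, 1, 0]
s = list(v)                          # the agent's variable copies V exactly
print(mi_and_cond_entropy(s, v))     # -> (1.0, 0.0)
```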

Why do I mention the set $E$? It’s because any claim that $S_A$ is a symbol of $V$ will almost always include an example environment in which that is the case. Then those arguing against that claim will often produce another environment in which $S_A$ fails to correlate with $V$, thus showing that the agent didn’t have a “genuine” understanding of $V$. So lack of understanding is often demonstrated by error, which is an empirical standard. Thus keeping track of the environments that cause error—and those that don’t—is important.

Variables that always move together

If you trained a neural net on images of black cats versus white dogs, you might think you’re training an animal classifier, when you’re really training a colour classifier. According to the definition above, the output variable of the neural net, in the training environment, counts as both a “symbol” for “black” and a symbol for “cat”. But which is it?

That question has no real meaning in the training environment. We can label that variable “cat”, or “black”, or “mix of blackness and catness”, and all are equally good. This might seem like a cheat—but remember that within the training environment, there is no such thing as a non-black cat or a non-cat black object. Hence “cat” and “black” are synonyms within the training environment.
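To make this concrete, here is a toy sketch, with hypothetical data and a deliberately colour-only stand-in for the trained net (no real classifier is involved). Inside the training environment, the output agrees perfectly with both the “cat” label and the “black” label; the last line anticipates the next paragraph by adding a black dog and a white cat, which is enough to pull the labels apart.

```python
# Toy training environment: every cat is black and every dog is white,
# so "cat" and "black" pick out exactly the same images.
train = [("black", "cat")] * 50 + [("white", "dog")] * 50
# A slightly wider environment containing a black dog and a white cat.
extended = train + [("black", "dog"), ("white", "cat")]

def net_output(colour, species):
    """What the hypothetical net actually learned: a colour detector."""
    return colour == "black"

def agreement(env, target):
    """Fraction of examples where the net's output matches a candidate label."""
    return sum(net_output(c, s) == target(c, s) for c, s in env) / len(env)

is_cat = lambda colour, species: species == "cat"
is_black = lambda colour, species: colour == "black"

print(agreement(train, is_cat), agreement(train, is_black))        # 1.0 1.0
print(agreement(extended, is_cat), agreement(extended, is_black))  # ~0.98 1.0
```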

In order to separate the labels, we need to present the neural net with a black dog, a white cat, or something else that allows distinctions to be made. That’s the reason I talked about “extending definitions” and “web of connotations” in the previous post. Within the narrow setting of the training environment, “black” is in the web of connotations of “cat”. In the more general environment of the current real world, it is not, but “has paws” and “gives birth to live young” are in the web of connotations. As is, to a lesser extent, “has fur”.

Note that in the past, “has fur” was more strongly in the web of connotations of “cat”, but this connection has become weaker. Conversely, in the future, we may see things currently in the web moving out; for example, it’s perfectly plausible that within a century or so, most or all cats will be created in artificial wombs.

But, returning to the neural net example, the classification failures show that no neural net, today, has an internal variable that corresponds well with “cat” in the real-world environment.

Temperature

How well can agents represent temperature? Assume we have three “agents”: a thermostat, a human, and some idealised superintelligent robot that is highly motivated to record the correct temperature. We have four internal mental variables:

  • $S_t$, the value of the internal temperature variable in the thermostat, given by, say, a current coming in along a wire from a thermometer.

  • $S_{feel}$, the human value of “this feels hot here”.

  • $S_{est}$, the human variable that covers the estimate of a human who is highly motivated to report the correct temperature. They can make use of thermometers and similar tools.

  • $S_r$, the variable inside the robot that measures temperature.

Let $T$ be the average temperature around the agent. The first thing to note is that $S_{feel}$ is a poor predictor of $T$, in almost any set of environments. It lacks discrimination power, and it can easily be led astray by fever, or chills, or having just moved from a hot area to a cold one (or vice versa). Things like anger are enough to make our body temperature rise.

Why mention it at all, then? Because, in a sense, it is the original definition from which temperature itself derives. In the language of this post, the human feeling of heat was defined in typical environments, and temperature was a narrow extension of that definition—an extension that turned out to not map very well onto the original feeling, but has other things going for it, such as a wonderfully rigorous intensional definition.

What about the other variables? Well, let’s start by specifying a very narrow set of environments, $E_1$, maybe within a lab setting. In this set, all of $S_t$, $S_{est}$, and $S_r$ correspond to $T$.

Let’s generalise a bit more, to $E_2$, the set of all typical environments—environments which we wouldn’t find particularly unusual. The $S_{est}$ and the $S_r$ are still going fine—the $S_r$ is likely more precise than the $S_{est}$, but they’re both still pretty correlated with $T$—but $S_t$ can have some problems.

For example, the thermostat’s thermometer could be left in the sun, causing it to mis-read the temperature. If a human or robot were in charge of the thermometer, they could move it into the shade to get a correct reading, but the thermometer has no understanding of this, so it will read an overly high temperature. Similarly, if the wire into the thermostat were replaced by another wire, $S_t$ would diverge completely from $T$.

If we define $W$ as the variable denoting the current in the wire going into the thermostat, then the correlation between $S_t$ and $W$ is much higher, in $E_2$, than between $S_t$ and $T$. In $E_1$, both correlations were almost perfect, and $T$ and $W$ were within each other’s web of connotations. But they come apart in $E_2$, so we can say that the thermostat is not “really” measuring temperature: measuring current is a much better description.
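Here is a small simulation sketch of this point, with made-up numbers and with environment labels matching $E_1$ and $E_2$ above. The thermostat’s variable $S_t$ simply mirrors the wire current $W$, so its correlation with $W$ stays perfect in both environments, while its correlation with the true temperature $T$ drops once the thermometer can be left in the sun.

```python
import random
random.seed(0)

def observation(env):
    """One made-up reading: true temperature T, wire current W (carrying the
    thermometer's signal), and the thermostat's internal variable S_t."""
    T = random.uniform(10, 30)              # ambient temperature
    sun_bias = 0.0
    if env == "E2" and random.random() < 0.5:
        sun_bias = random.uniform(5, 15)    # thermometer left in the sun
    W = T + sun_bias                        # the current encodes the (possibly biased) reading
    S_t = W                                 # the thermostat only ever sees the wire
    return T, W, S_t

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

for env in ("E1", "E2"):
    T, W, S_t = zip(*(observation(env) for _ in range(10_000)))
    print(env, "corr(S_t, T) =", round(corr(S_t, T), 2),
               "corr(S_t, W) =", round(corr(S_t, W), 2))
# E1: both correlations are 1.0.  E2: corr(S_t, W) stays at 1.0,
# while corr(S_t, T) falls noticeably.
```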

For $E_3$, let’s allow for environments that are slightly adversarial: a rather dumb agent is trying to fool our agents. The human will generally try and protect the integrity of their measurements, as will the robot. The thermostat, on the other hand, is completely hopeless.

For $E_4$, we allow a very intelligent (but not superintelligent) adversary. We expect, at this point, that $S_{est}$ will become decorrelated from $T$, while the robot is sufficiently smart to see through the manipulations and keep $S_r$ close to $T$.

At this point, should we say that $E_4$ demonstrates that humans fail the symbol grounding problem—that we don’t understand what temperature really is? Perhaps. We certainly don’t understand enough about our adversary to undo their manipulations and return to the true temperature estimate, so we are failing to understand something. But maybe if we were given the right information, we could correct for this, whereas there is no “right information” that would make the thermostat behave correctly. The human mind is limited, though, and as the intelligence of the adversary increased, we would find it harder and harder to even understand the tricks it was playing on us. It’s possible that, in $E_4$, we truly don’t understand temperature.

What about the fully general set of all environments? Given that the robot is a computable agent, there is certainly some sort of no free lunch theorem here, so in some subsets the robot will fail; we have reached the limits of even superintelligent understanding of temperature.

Natural and Good

In the previous example, the environment variable was the precisely defined temperature $T$. It can be instructive to consider what happens when the environmental variable is itself more complicated to define:

  • $N$, whether a given object is natural.

  • $G_H$, whether a given human is good.

  • $G_A$, whether a given agent is good.

As articulated in a previous post, $N$ isn’t well defined outside of very narrow sets of environments. A failure to understand a concept that doesn’t make sense is not really a failure to understand.

The variable $G_H$ is actually well-defined in a large class of environments. As long as we restrict “human” to meaning something very close to current Homo sapiens, the web of connotations about “good” will be preserved, so we can try and classify behaviours that are good, kind, considerate, loyal, etc., and classify the human as good if they score highly on most of these. The picture will be similar to that for $T$, above, except that $G_H$ will be much more complicated than $T$, even if it doesn’t feel that this is the case. But, when using $G_H$ and when confronted by $E_4$, with a clever adversary, it feels more natural to say that humans just don’t have an understanding of “good” in this circumstance.
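As a rough sketch of what “scores highly on most of these” could mean (the traits, scores, and threshold below are all hypothetical, and nothing here is a serious proposal for measuring goodness):

```python
# Hypothetical per-trait scores in [0, 1], e.g. from separate behaviour
# classifiers over the web of connotations of "good".
CONNOTATIONS = ["kind", "considerate", "loyal", "honest", "generous"]

def classify_good(scores, fraction_needed=0.6):
    """Label a human 'good' if they score highly on most connotations."""
    high = sum(scores.get(trait, 0.0) > 0.5 for trait in CONNOTATIONS)
    return high / len(CONNOTATIONS) >= fraction_needed

print(classify_good({"kind": 0.9, "considerate": 0.8, "loyal": 0.7,
                     "honest": 0.4, "generous": 0.6}))   # True (4 of 5 traits high)
```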

The variable $G_A$ is more complicated; though its values are relatively clear when the agent is a human, when applying it to a general agent we have multiple choices about how to extend it. We could go for the narrow/distilled “EAGood” that moves “good” closer to a temperature, or for a broad “MundaneGood” that tries to preserve the web of connotations around good.

So before we claim that an agent doesn’t understand $V$ in an unusual environment, we should first check that $V$ is unambiguously defined in that environment.

Finding the variables

So far, I have assumed the variables were given; but what if all we have is the agent’s algorithm (or the agent itself), and we need to infer its internal variables? And what about biased/incorrect beliefs? I’ll look at those in a subsequent post.