Web of connotations: Bleggs, Rubes, thermostats and beliefs

This is a series of posts with the modest goal of showing how you can get syntax from semantics, solve the grounding problem, and start looking for models within human brains.

Before getting to the meat of the approach, I’ll have to detour, Eliezer-like, into a few background concepts. They’ll be presented in this post. The overall arc of the series is that, once we understand how syntax and semantics can fail to match up, then checking if they do becomes easy.

Old arguments: does a thermostat have beliefs?

I’d like to start with Searle. He’s one of the most annoying philosophers: he has genuine insights, but since he focuses mainly on the Chinese room thought experiment (an intuition pump with no real content behind it), everyone ignores those insights.

McCarthy argued:

Machines as simple as thermostats can be said to have beliefs, and having beliefs seems to be a characteristic of most machines capable of problem solving performance

Searle counter-argues:

Think hard for one minute about what would be necessary to establish that that hunk of metal on the wall over there had real beliefs: beliefs with direction of fit, propositional content, and conditions of satisfaction; beliefs that had the possibility of being strong beliefs or weak beliefs; nervous, anxious, or secure beliefs; dogmatic, rational, or superstitious beliefs; blind faiths or hesitant cogitations; any kind of beliefs. The thermostat is not a candidate.

This seems like it’s just a definition problem, similar to the discussion of whether something is a Rube or a Blegg. Searle lists the rich and varied properties of human beliefs, and contrasts them with McCarthy’s stripped-down interpretation of belief. It seems that we could just call the first “human-like beliefs”, and the second “simplified beliefs”, and we would have dissolved the question.

But I’d argue that the implicit question is actually more complicated than that, and that Searle was, in this instance, correct.

The implicit part of complex definitions

Bleggs and rubes are defined by only five characteristics: colour (blue vs red), shape (egg vs cube), texture (furred vs smooth), luminescence (glows in the dark vs dark), and interior metal (vanadium vs palladium).


Let’s have a look at two more complicated examples: is this object a chair? Is person X good (and what is goodness anyway)?


For chairs, we have a rough intensional definition that goes something like “something humans sit on”, and a huge number of mental examples of chairs from our experience (an extensional definition).

When I bring up the question of what is and what isn’t a chair, most people are uncertain, because they don’t often encounter edge cases. When I present these, e.g. by sitting on the edge of a table or asking whether a broken chair counts as a chair, they generally try to refine the intensional and the extensional definitions by adding more explicit characteristics, and by ruling that specific examples are or aren’t chairs.

There are many ways of doing so, but some ways are actually better than others. Scott Alexander’s post argues that definitions can’t be right or wrong, but I think in general they can be.

A definition of chair is wrong if it doesn’t approximate the boundaries of the mental set of examples of chairs that we have in our minds. If your definition answers “yes” to the question “is your mobile phone a chair if you sit on it?”, then this is a wrong/bad/incorrect definition.

A definition doesn’t have to be boolean, yes/no: you could say that a mobile phone being sat on is 5% a chair, while one being talked into is 0.1% a chair. And you have quite a lot of freedom when setting these percentages; different conversations with different people will result in quite different numbers. But you don’t have arbitrary freedom. Some definitions of chair are just bad.
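A graded definition like this can be sketched as a membership function. The features and scores below are invented for illustration (one arbitrary-but-not-too-arbitrary way of setting the percentages, not a serious theory of chairs, and the numbers won’t match the 5%/0.1% above):

```python
def chair_degree(obj):
    """Return a made-up degree of chair-ness in [0, 1]."""
    score = 0
    if obj.get("designed_for_sitting"):
        score += 60  # central examples of chairs are designed for sitting
    if obj.get("currently_sat_on"):
        score += 30  # actually being sat on counts for something
    if obj.get("has_legs"):
        score += 10
    return min(score, 100) / 100

# An office chair is fully a chair; a phone being sat on is partly one;
# a phone being talked into isn't one at all.
print(chair_degree({"designed_for_sitting": True, "currently_sat_on": True, "has_legs": True}))  # 1.0
print(chair_degree({"currently_sat_on": True}))  # 0.3
print(chair_degree({}))  # 0.0
```

Different people would pick different features and weights, which is the “quite a lot of freedom” above; a function that returned 1.0 for the phone would be one of the bad definitions.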

Impossible definitions

Note that it may be impossible in some cases to find a good general definition for a term. “Natural” seems to be one of those categories. There are clear examples of natural objects (trees, animals) and clear examples of non-natural objects (swords, iPhones).

Yet the definition seems to completely fall apart when we press it too hard. Human beings have had a huge effect on the world, so, for example, the “natural landscapes of England” are also highly artificial. Genetic engineering to produce a cow: non-natural. Selective breeding that produces the same cow: arguably natural. Natural selection producing the same cow: totally natural, as the name indicates. But what if the natural selection pressure was an unforeseen consequence of human action? Or a foreseen consequence? Or a deliberate one?

The point is, when we consider the set of all things and processes in the world, there does not seem to be a good definition of what is natural and what isn’t. Like the AI searching for the nearest unblocked strategy, it seems that most non-natural objects can be approximated by something arguably natural.

That doesn’t mean that the category “natural” is entirely vacuous. We could, for example, see it as the conjunction of more useful definitions (I would personally start by distinguishing between natural objects, and objects created by natural methods). Much philosophical debate about definitions is of this type.

Another thing to note is that “natural” can have a clear definition when applied only to a subset of things. In the set {oak tree, lice, aircraft-carrier, window}, it’s clear which objects are natural and which are not. Sports and games that humans play are similar. People play a well-defined game of football, despite the fact that the rules don’t cover what happens if you were to play in space, on Mars, or with jetpack-enhanced super-humans who could clone themselves in under ten seconds. It is plausible that “game of football” cannot be well-defined in general; however, within the limitations of typical human situations, it’s perfectly well defined.
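One way to picture a concept that is only well defined on a subset is as a partial function: it answers confidently inside its domain and refuses to answer outside it. This is a toy sketch using just the four objects from the example above:

```python
# "Natural" as a partial predicate, defined only on a known subset of things.
NATURAL = {"oak tree", "lice"}
ARTIFICIAL = {"aircraft-carrier", "window"}

def is_natural(thing):
    """Clear answer inside the subset; no answer at all outside it."""
    if thing in NATURAL:
        return True
    if thing in ARTIFICIAL:
        return False
    raise ValueError(f"'natural' is not defined for {thing!r}")

print(is_natural("oak tree"))  # True
print(is_natural("window"))    # False
# Outside the subset, the concept simply has no answer:
# is_natural("selectively bred cow") raises ValueError
```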

Extending definitions, AI, and moral realism

As we move to more and more novel situations, and discover new ideas, previously well-defined concepts become underdefined and have to be updated. For example, the definition of a Formula One racing car is constantly evolving and getting more complex, just to keep the sport somewhat similar to what it was originally (there are other goals to the changing definition, but preventing the sport from becoming “racing rockets vaguely around a track” is a major one).

This is a major challenge in building safe AI: when people periodically come up with ideas such as training an AI to be “respectful” or whatever, they are thinking of a set of situations within typical human situations, and assigning respectful/non-respectful labels to them.

They don’t generally consider how the boundary extends when we get powerful AIs capable of going radically beyond the limits of typical human situations. Very naive people suggest intensional definitions of respect that will explode into uselessness the moment we leave typical situations. More sophisticated people expect the AI to learn and extend the boundary itself. I expect this to fail, because I don’t see “respectful” as having any meaningful definition across all situations, but it’s much better than the intensional approach.

I’m strongly speculating here, but I suspect that the difference between moral realists and non-moral realists is mainly whether they expect moral category definitions to extend well or badly.

Goodness and the unarticulated web of connotations

Suppose someone has programmed an AI, and promised that that AI is “good”. They are trustworthy, and even Eliezer has been convinced that they’ve achieved that goal. Given that, the question is:

  • Should I let that AI take care of an aging relative?

The concept of “good” is quite ambiguous, but generally, if someone is a “good” human, then they’d also be loyal, considerate, charitable, kind, truthful, honourable, etc., in all typical human situations. These concepts of loyalty, consideration, and so on form a web of connotations around good. We can think of them as connotations that go along with “good”, as properties of central examples of “good” people, or as properties shared by most people in our mental set of “good” people.

Notice that unlike the blegg/rube example, I haven’t listed all the properties of “good”. Indeed, the first step of any philosophical investigation of “good” is to figure out and list these properties, starting from our mental example set.

There are two obvious ways to extend the concept of good to more general cases. First of all, following the example of Peter Singer, we could distill the concept by looking at properties that we might like for a definition of good, such as the distance to the victim not mattering. If we follow that route, we’d end up somewhere close to a preference or hedonistic utilitarian. Call this “EAGood”: the sort of definition of good that most appeals to effective altruists. Properties like loyalty, consideration, and so on get sacrificed to the notion of utilitarianism: an AI maximising EAGood would only have those properties when they helped it increase the EAGood.

Or we could go the other way, and aim to preserve all the things in the web of connotations. Formalise loyalty, consideration, charitability, and so on; combine all these definitions; and then generalise the combined definition. Call this broad definition “MundaneGood”. Notice that this is much harder, but it also means that our intuitions about “good” in typical situations are more portable to extreme situations. The analogy between “EAGood” and “MundaneGood” and bullet-dodgers and bullet-swallowers is quite clear.

Notice that in practice, most people who are “EAGood” are also “MundaneGood”; this is not unexpected, as these two definitions, by construction, overlap in the typical human situations.

In any case, it would be perfectly fine to leave an aging relative with a “MundaneGood” AI: it will do its best to help them, because a good human given that role would do so. It would be much more dubious to leave them with an “EAGood” AI: it would only help them if doing so would increase human utility, directly or indirectly (such as by causing me to trust the AI). It may well kill them quickly, if this increases the utility of the rest of humanity (on the other hand, the EAGood AI is more likely to already be helping my relative without me asking it to do so).

So if we started with a concept C, and had a narrow extension N of C, then predicting the behaviour of an N-maximiser in untypical situations is hard. However, if B is a broad extension of C, then our intuitions can be used to predict the behaviour of a B-maximiser in the same situations. In particular, a B-maximiser is likely to have all the implicit properties of C that we haven’t defined or even thought of.
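A toy numerical sketch of this, with all functions invented for illustration: two extensions of the same concept agree on the typical range of cases, but a maximiser of the narrow extension heads somewhere our intuitions about the original concept say nothing about.

```python
def narrow_good(x):
    # distilled extension: keeps one property and generalises it everywhere
    return x

def broad_good(x):
    # broad extension: also penalises leaving the typical range
    return x if 0 <= x <= 10 else -abs(x)

# The two extensions agree on all typical cases...
typical = range(0, 11)
assert all(narrow_good(x) == broad_good(x) for x in typical)

# ...but their maximisers behave very differently outside them.
candidates = range(-1000, 1001)
print(max(candidates, key=narrow_good))  # 1000: far outside typical situations
print(max(candidates, key=broad_good))   # 10: stays where our intuitions apply
```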

Back to the thermostat

All sorts of things have webs of connotations, not just ethical concepts. Objects have them as well (a car is a vehicle is a physical object; a car generally burns carbon-based fuels; a car generally has a speed below 150 km/h), as do animals and people. The Cyc project is essentially an attempt to formalise the entire web of connotations that goes with standard concepts.

In terms of “beliefs”, it’s clear that McCarthy is using a narrow/distilled extension of the concept: something like “a belief is something within the agent that co-varies with the truth of the proposition in the outside world”. So labeling something a “belief about temperature”, and attaching it to the correct wires that go to a thermometer, is enough to say that the thing has a belief.
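A minimal sketch of a belief in this narrow sense, assuming nothing beyond co-variation (the class and names are invented for illustration):

```python
class Thermostat:
    def __init__(self, target):
        self.target = target
        self.believed_temp = None  # the "belief about temperature"

    def sense(self, actual_temp):
        # the internal state co-varies with the outside world via the sensor wire
        self.believed_temp = actual_temp

    def heater_on(self):
        return self.believed_temp < self.target

t = Thermostat(target=20)
t.sense(15)
print(t.heater_on())  # True: it "believes" the room is cold
t.sense(25)
print(t.heater_on())  # False
```

On McCarthy’s reading, `believed_temp` qualifies as a belief; on Searle’s, it lacks the web of connotations entirely: it cannot be an anxious, dogmatic, or hesitant belief about the temperature.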

In contrast, Searle uses a broad definition of belief. He sees the thermostat failing to have beliefs because it fails to have the same web of connotations that human beliefs do.

The implicit question is:

  • Is McCarthy’s definition of belief sufficient to design an agent that can reason as a human can?

Note that neither Searle’s nor McCarthy’s definition addresses this question directly. This means that it’s an implicit property of the definitions. By the previous section, implicit properties are much more likely to be preserved in broad definitions than in narrow ones. Therefore McCarthy’s definition is unlikely to lead to a successful AI; the failure of Symbolic AI is a testament to this. Searle was not only right about this; he was right for (partially) the right reasons.

Sufficient versus necessary

Does this mean that a true AI would need to have all the properties that Searle mentioned? That it would need to have “strong beliefs or weak beliefs; nervous, anxious, or secure beliefs; dogmatic, rational, or superstitious beliefs; blind faiths or hesitant cogitations”? Of course not: it’s unlikely that the whole web of connotations would be needed. However, some of the web may be necessary; and the more of the web the agent has, the more likely it is to be able to reason as a human can.

Side-note about the Chinese Room

It’s interesting to note that in the Chinese Room thought experiment, Searle tries to focus attention on the man in the room, manipulating Chinese symbols. In contrast, many of the counter-arguments focus on the properties of the whole system.

These properties must include anything that humans display in their behaviour: for instance, having a (flawed) memory stored in complex ways, having processes corresponding to all the usual human emotions, periods of sleep, and so on. In fact, these properties include something for modelling the properties of beliefs, such as “beliefs with direction of fit, propositional content, and conditions of satisfaction; beliefs that had the possibility of being strong beliefs or weak beliefs; nervous, anxious, or secure beliefs; dogmatic, rational, or superstitious beliefs; blind faiths or hesitant cogitations; any kind of beliefs”.

Thus, the Chinese room is a broad model of a human mind, with a full web of connotations, but Searle’s phrasing encourages us to see it as a narrow one.