Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle

“Why didn’t you tell him the truth? Were you afraid?”

“I’m not afraid. I chose not to tell him, because I anticipated negative consequences if I did so.”

“What do you think ‘fear’ is, exactly?”

The Generalized Anti-Zombie Principle calls for us to posit “consciousness” as causally upstream of reports of phenomenological experience (even if the causal link might be complicated and we might be wrong about the details of what consciousness is). If you’re already familiar with conscious humans, then maybe you can specifically engineer a non-conscious chatbot that imitates the surface behaviors of humans talking about their experiences, but you can’t have a zombie that just happens to talk about being conscious for no reason.

A similar philosophical methodology may help us understand other mental phenomena that we cannot perceive directly, but infer from behavior. The Hansonian Generalized Anti-Zombie Principle calls for us to posit “intent” as causally upstream of optimized behavior (even if the causal link might be complicated and we might be wrong about the details of what intent is). You can’t have a zombie that just happens to systematically select actions that result in outcomes that rank high with respect to a recognizable preference ordering for no reason.


It’s tempting to think that consciousness isn’t part of the physical universe. Seemingly, we can imagine a world physically identical to our own—the same atom-configurations evolving under the same laws of physics—but with no consciousness, a world inhabited by philosophical “zombies” who move and talk, but only as mere automatons, without the spark of mind within.

It can’t actually work that way. When we talk about consciousness, we do so with our merely physical lips or merely physical keyboards. The causal explanation for talk about consciousness has to either exist entirely within physics (in which case anything we say about consciousness is causally unrelated to consciousness, which is absurd), or there needs to be some place where the laws of physics are violated as the immaterial soul is observed to be “tugging” on the brain (which is in-principle experimentally detectable). Zombies can’t exist.

But if consciousness exists within physics, it should respect a certain “locality”: if the configuration-of-matter that is you, is conscious, then almost-identical configurations should also be conscious for almost the same reasons. An artificial neuron that implements the same input-output relationships as a biological one would “play the same role” within the brain, which would continue to compute the same externally-observable behavior.

We don’t want to say that only externally-observable behavior matters and internal mechanisms don’t matter at all, because substantively different internal mechanisms could compute the same behavior. Prosaically, acting exists: even the best method actors aren’t really occupying the same mental state that the characters they portray would be in. In the limit, we could (pretend that we could) imagine an incomprehensibly vast Giant Lookup Table that has stored the outputs that a conscious mind would have produced in response to any input. Is such a Giant Lookup Table—an entirely static mapping of inputs to outputs—conscious? Really?

But this thought experiment requires us to posit the existence of a Giant Lookup Table that just happens to mimic the behavior of a conscious mind. Why would that happen? Why would that actually happen, in the real world? (Or the closest possible world large enough to contain the Giant Lookup Table.) “Just assume it happened by coincidence, for the sake of the thought experiment” is unsatisfying, because that kind of arbitrary miracle doesn’t help us understand what kind of cognitive work the ordinary simple concept of consciousness is doing for us. You can assume that a broken and scrambled egg will spontaneously reassemble itself for the sake of a thought experiment, but the interpretation of your thought-experimental results may seem tendentious given that we have Godlike confidence that you will never, ever see that happen in the real world.

The hard problem of consciousness is still confusing unto me—it seems impossible that any arrangement of mere matter could add up to the ineffable qualia of subjective experience. But the easier and yet clearly somehow related problem of how mere matter can do information-processing—can do things like construct “models” by using sensory data to correlate its internal state with the state of the world—seems understandable, and a lot of our ordinary use of the concept of consciousness necessarily deals with the “easy” problems, like how perception works or how to interpret people’s self-reports, even if we can’t see the identity between the hard problem and the sum of all the easy problems. Whatever the true referent of “consciousness” is—however confused our current concept of it may be—it’s going to be, among other things, the cause of our thinking that we have “consciousness.”

If I were to punch you in the face, I can anticipate the experience of you reacting somehow—perhaps by saying, “Ow, that really hurt! I’m perceiving an ontologically-basic quale of pain right now! I hereby commit to exact a costly revenge on you if you do that again, even at disproportionate cost to myself!” The fact that the human brain has the detailed functional structure to compute that kind of response, whereas rocks and trees don’t, is why we can be confident that rocks and trees don’t secretly have minds like ours.

We recognize consciousness by its effects because we can only recognize anything by its effects. For a much simpler example, consider the idea of sorting. Human alphabets aren’t just a set of symbols—we also have a concept of the alphabet coming in some canonical order. The order of the alphabet doesn’t play any role in the written language itself: you wouldn’t have trouble reading books from an alternate world where the order of the Roman alphabet ran KUWONSEZYFIJTABHQGPLCMVDXR, but all English words were the same—but you would have trouble finding the books on a shelf that wasn’t sorted in the order you’re used to. Sorting is useful because it lets us find things more easily: “The title I’m looking for starts with a P, but the book in front of me starts with a B; skip ahead” is faster than “look at every book until you find the one”.
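To make the “skip ahead” trick concrete, here is a minimal Python sketch of my own (the five-book shelf is made up for illustration): the linear scan may have to look at every title, whereas binary search on the sorted shelf needs only about log2 n comparisons.

```python
import bisect

# A made-up five-book shelf, kept in alphabetical order.
shelf = sorted(["Blindsight", "Godel, Escher, Bach", "Permutation City",
                "Probability Theory", "Rationality: AI to Zombies"])

def find_linear(title, books):
    """Look at every book until you find the one: up to len(books) comparisons."""
    for i, book in enumerate(books):
        if book == title:
            return i
    return None

def find_sorted(title, books):
    """Exploit the ordering to skip ahead: about log2(len(books)) comparisons."""
    i = bisect.bisect_left(books, title)
    return i if i < len(books) and books[i] == title else None

assert find_linear("Probability Theory", shelf) == find_sorted("Probability Theory", shelf) == 3
```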

In the days before computers, the work of sorting was always done by humans: if you want your physical bookshelf to be alphabetized, you probably don’t have a lot of other options than manually handling the books yourself (“This title starts with a Pl; I should put it … da da da here, after this title starting with Pe but before its neighbor starting with Po”). But the computational work of sorting is simple enough that we can program computers to do it and prove theorems about what is being accomplished, without getting confused about the sacred mystery of sorting-ness.

Very different systems can perform the work of sorting, but whether it’s a human tidying her bookshelf, or a punchcard-sorting machine, or a modern computer sorting in RAM, it’s useful to have a short word to describe processes that “take in” some list of elements and “output” a list with the same elements ordered with respect to some criterion, so that the theorems we prove about sorting-in-general will apply to any system that implements sorting. (For example, sorting processes that can only compare two items to check which is “greater” (as opposed to being able to exploit more detailed prior information about the distribution of elements) can expect to have to perform on the order of n log n comparisons, where n is the length of the list.)
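As an entirely illustrative sketch of that kind of theorem, here is a comparison-counting merge sort in Python. For a list of n = 1024 random elements it performs on the order of n·log2 n comparisons, close to the information-theoretic floor of log2(n!) comparisons that any comparison-only sorting process must pay in the worst case.

```python
import math, random

comparisons = 0

def merge_sort(xs):
    """Sort by splitting and merging, counting every element-to-element comparison."""
    global comparisons
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

n = 1024
data = random.sample(range(10 * n), n)
assert merge_sort(data) == sorted(data)
print(comparisons)                               # on the order of n*log2(n): roughly 9,000 here
print(math.ceil(math.log2(math.factorial(n))))   # ≈ 8,770: the floor for any comparison sort
```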

Someone who wasn’t familiar with computers might refuse to recognize sorting algorithms as real sorting, as opposed to mere “artificial sorting”. After all, a human sorting her bookshelf intends to put the books in order, whereas the computer is just an automaton following instructions, and doesn’t intend anything at all—a zombie sorter!

But this position is kind of silly, a gerrymandered concept definition. To be sure, it’s true that the internal workings of the human are very different from those of the computer. The human wasn’t special-purpose programmed to sort and is necessarily doing a lot more things. The whole modality of visual perception, whereby photons bouncing off a physical copy of Rationality: AI to Zombies and absorbed by the human’s retina are interpreted as evidence to construct a mental representation of the book in physical reality, whose “title” “begins” with an “R”, is much more complicated than just storing the bit-pattern 1010010 (the ASCII code for R) in RAM. Nor does the computer have the subjective experience of eagerly looking forward to how much easier it will be to find books after the bookshelf is sorted. The human also probably won’t perform the exact same sequence of comparisons as a computer program implementing quicksort—which also won’t perform the same sequence of comparisons as a different program implementing merge sort. But the comparisons—the act of taking two things and placing them somewhere that depends on which one is “greater”—need to happen in order to get the right answer.
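For instance (a toy sketch of my own, with quicksort and the bookshelf-style insertion sort standing in for any two implementations), the two programs below record different sequences of comparisons on the same input, yet agree on the sorted output:

```python
def quicksort(xs, log):
    """Pick the first element as pivot, partition the rest; append each comparison to `log`."""
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    lesser, greater = [], []
    for x in rest:
        log.append((x, pivot))
        (lesser if x < pivot else greater).append(x)
    return quicksort(lesser, log) + [pivot] + quicksort(greater, log)

def insertion_sort(xs, log):
    """The bookshelf strategy: slide each title along the sorted prefix; log each comparison."""
    out = []
    for x in xs:
        i = 0
        while i < len(out):
            log.append((x, out[i]))
            if x < out[i]:
                break
            i += 1
        out.insert(i, x)
    return out

titles = ["Permutation City", "Blindsight", "Rationality: AI to Zombies",
          "Godel, Escher, Bach", "Probability Theory"]
qs_log, ins_log = [], []
assert quicksort(titles, qs_log) == insertion_sort(titles, ins_log) == sorted(titles)
print(qs_log != ins_log)   # True: different comparison sequences, same sorted result
```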

The concept of “sorting into alphabetical order” may have been invented before our concept of “computers”, but the most natural concept of sorting includes computers performing quicksort, merge sort, &c., despite the lack of intent. We might say that intent is epiphenomenal with respect to sorting.

But even if we can understand sorting without understanding intent, intent isn’t epiphenomenal to the universe. Intent is part of the fabric of stuff that makes stuff happen: there are sensory experiences that will cause you to usefully attribute intent to some physical systems and not others.

Specifically, whatever “intent” is—however confused our current concept of it may be—it’s going to be, among other things, the cause of optimized behavior. We can think of something as an optimization process if it’s easier to predict its effects on the world by attributing goals to it, rather than by simulating its detailed actions and internal state. “To figure out a strange plot, look at what happens, then ask who benefits.”
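A toy illustration (my own, with a made-up objective): the hill-climber below keeps a random nudge only when it scores better. You can predict where it ends up, near the objective’s peak at 7, from the goal alone and from wildly different starting points, without simulating any of the intermediate steps.

```python
import random

def objective(x):
    """The 'goal': prefer x near 7 (an arbitrary target chosen for illustration)."""
    return -(x - 7.0) ** 2

def hill_climb(x, steps=10_000):
    """Repeatedly propose a random nudge and keep it only if it scores better."""
    for _ in range(steps):
        candidate = x + random.uniform(-0.5, 0.5)
        if objective(candidate) > objective(x):
            x = candidate
    return x

# From very different starting points, the endpoint is predictable from the goal alone:
print([round(hill_climb(x0), 2) for x0 in (-40.0, 0.0, 3.0, 95.0)])  # each ≈ 7.0
```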

Alex Flint identifies robustness to perturbations as another feature of optimizing systems. If you scrambled the books on the shelf while the human was taking a bathroom break away from sorting, when she came back she would notice the rearranged books, and sort them again—that’s because she intends to achieve the outcome of the shelf being sorted. Sorting algorithms don’t, in general, have this property: if you shuffle a subarray in memory that the operation of the algorithm assumes has already been sorted, there’s nothing in the code to notice or care that the “intended” output was not achieved.
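A minimal sketch of the contrast (illustrative code, not anyone’s actual implementation): the merge step below assumes its inputs are already sorted; shuffle one of them behind its back and it silently emits a jumbled result, whereas a crude stand-in for the human notices the disorder and restores it.

```python
import random

def merge(left, right):
    """Merge two lists, *assuming* each is already sorted; nothing checks the assumption."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

left, right = [1, 3, 5, 7], [2, 4, 6, 8]
random.shuffle(left)                 # perturb the subarray the algorithm assumes is sorted
print(merge(left, right))            # usually not in order; no error, nothing "cares"

def keep_sorted(shelf):
    """A crude stand-in for the human: notice disorder and restore it."""
    if shelf != sorted(shelf):
        shelf.sort()
    return shelf

print(keep_sorted([5, 1, 7, 3]))     # [1, 3, 5, 7]: the perturbation gets corrected
```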

Note that this is a “behaviorist”, “third person” perspective: we’re not talking about some subjective feeling of intending something, just systems that systematically steer reality into otherwise-improbable states that rank high with respect to some preference ordering.

Robin Hanson often writes about hidden motives in everyday life, advancing the thesis that the criteria that control our decisions aren’t the same as the high-minded story we tell other people, or even the story we represent to ourselves. If you take a strictly first-person perspective on intent, the very idea of hidden motives seems absurd—a contradiction in terms. What would it even mean, to intend something without being aware of it? How would you identify an alleged hidden motive?

The answer is that positing hidden motives can simplify our predictions of behavior. It can be easier to “look backwards” from what goals the behavior achieves, and continues to achieve in the presence of novel obstacles, than to “look forwards” from a detailed model of the underlying psychological mechanisms (which are typically unknown).

Hanson and coauthor Kevin Simler discuss the example of nonhuman primates grooming each other—manually combing each other’s fur to remove dirt and parasites. One might assume that the function of grooming is just what it appears to be: hygiene. But that doesn’t explain why primates spend more time grooming than they need to, why they predominantly groom others rather than themselves, and why the amount of time a species spends grooming is unrelated to the amount of hair it has to groom, but is related to the size of social groupings. These anomalies make more sense if we posit that grooming has been optimized for social-political functions, to provide a credible signal of trust.[1] (The signal has to cost something—in this case, time—in order for it to not be profitable to fake.) The hygienic function of grooming isn’t unreal—parasites do in fact get removed—but the world looks more confusing if you assume the behavior is optimized solely for hygiene.

This kind of multiplicity of purposes is ubiquitous: thus, nobody does the thing they are supposedly doing: politics isn’t about policy, school is not about learning, medicine is not about health, &c.

There are functional reasons for some of the purposes of social behavior to be covert, to conceal or misrepresent information that it wouldn’t be profitable for others to know. (And covert motivations might be a more effective design from an evolutionary perspective than outright lying if it’s too expensive to maintain two mental representations: the real map for ourselves, and a fake map for our victims.) This is sometimes explained as, “We self-deceive in order to better deceive others,” but I fear that this formulation might suggest more “central planning” on the cognitive side of the evolutionary–cognitive boundary than is really necessary: “self-deception” can arise from different parts of the mind working at cross-purposes.

Ziz discusses the example of a father attempting to practice nonviolent communication with his unruly teenage son: the father wants to have an honest and peaceful discussion of feelings and needs, but is afraid he’ll lose control and become angry and threatening.

But angry threats aren’t just a random mistake, in the way it’s a random mistake if I forget to carry the one while adding 143 + 28. Random mistakes don’t serve a purpose and don’t resist correction: there’s no plausible reason for me to want the incorrect answer 143 + 28 = 161, and if you say, “Hey, you forgot to carry the one,” I’ll almost certainly just say “Oops” and get it right the second time. Even if I’m more likely to make arithmetic errors when I’m tired, the errors probably won’t correlate in a way that steers the future in a particular direction: you can’t use information about what I want to make better predictions about what specific errors I’ll make, nor use observations of specific errors to infer what I want.

In contrast, the father is likely to “lose control” and make angry threats precisely when peaceful behavior isn’t getting him what he wants. That’s what anger is designed to do: threaten to impose costs or withhold benefits to induce conspecifics to place more weight on the angry individual’s welfare.

Another example of hidden motives: Less Wrong commenter Caravelle tells a story about finding a loophole in an online game, and being outraged to later be accused of cheating by the game administrators—only in retrospect remembering that, on first discovering the loophole, they had specifically told their teammates not to tell the administrators. The earlier Caravelle-who-discovered-the-bug must have known that the admins wouldn’t allow it (or else why instruct teammates to keep quiet about it?), but the later Caravelle-who-exploited-the-bug was able to protest with perfect sincerity that they couldn’t have known.

Another example: someone asks me an innocuous-as-far-as-they-know question that I don’t feel like answering. Maybe we’re making a cake, and I feel self-conscious about my lack of baking experience. You ask, “Why did you just add an eighth-cup of vanilla?” I initially mishear you as having said, “Did you just add …” and reply, “Yes.” It’s only a moment later that I realize that that’s not what you asked: you said “Why did you …”, not “Did you …”. But I don’t correct myself, and you don’t press the point. I am not a cognitive scientist and I don’t know what was really going on in my brain when I misheard you: maybe my audio processing is just slow. But it seems awfully convenient for me that I momentarily misheard your question specifically when I didn’t want to answer it and thereby reveal that I don’t know what I’m doing—almost as if the elephant in my brain bet that it could get away with pretending to mishear you, and the bet paid off.

Our existing language may lack the vocabulary to adequately describe optimized behavior that comes from a mixture of overt and hidden motives. Does the father intend to make angry threats? Did the gamer intend to cheat? Was I only pretending to mishear your question, rather than actually mishearing it? We want to say No—not in the same sense that someone consciously intends to sort her bookshelf. And yet it seems useful to have short codewords to talk about the aspects of these behaviors that seem optimized. The Hansonian Generalized Anti-Zombie Principle says that when someone “loses control” and makes angry threats, it’s not because they’re a zombie that coincidentally happens to do so when being nice isn’t getting them what they want.

As Jessica Taylor explains, when our existing language lacks the vocabulary to accommodate our expanded ontology in the wake of a new discovery, one strategy for adapting our language is to define new senses of existing words that metaphorically extend the original meaning. The statement “Ice is a form of water” might be new information to a child or a primitive AI who has already seen (liquid) water, and already seen ice, but didn’t know that the former turns into the latter when sufficiently cold.

The word water in the sentence “Ice is a form of water” has a different extensional meaning than the word water in the sentence “Water is a liquid”, but both definitions can coexist as long as we’re careful to precisely disambiguate which sense of the word is meant in contexts where equivocation could be deceptive.

We might wish to apply a similar linguistic tactic in order to be able to concisely talk about cases where we think someone’s behavior is optimized to achieve goals, but the computation that determines the behavior isn’t necessarily overt or conscious.

Algorithmic seems like a promising candidate for a disambiguating adjective to make it clear that we’re talking about the optimization criteria implied by a system’s inputs and outputs, rather than what it subjectively feels like to be that system. We could then speak of an “algorithmic intent” that doesn’t necessarily imply “(conscious) intent”, similarly to how ice is a form of “water” despite not being “(liquid) water”. We might similarly want to speak of algorithmic “honesty” (referring to signals selected on the criterion of making receivers have more accurate beliefs), “deception” (referring to signals selected for producing less accurate beliefs), or even “fraud” (deception that moves resources to the agent sending the deceptive signal).
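As a toy illustration of the behaviorist flavor of these definitions (everything here is made up for the example: two states of the world, two messages, a fixed receiver), we can classify a sender as algorithmically “honest” or “deceptive” purely from how the messages it selects move the receiver’s credence in the true state, without peeking at the sender’s internals:

```python
# Toy signaling setup: two states of the world, two possible messages.
# The receiver updates a prior using a fixed, publicly known interpretation of messages:
# message "a" is evidence for state 0, message "b" is evidence for state 1.
LIKELIHOOD = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.2, 1: 0.8}}

def posterior(prior, message, state):
    """Receiver's updated probability of `state` after hearing `message`."""
    num = LIKELIHOOD[message][state] * prior[state]
    denom = sum(LIKELIHOOD[message][s] * prior[s] for s in prior)
    return num / denom

def honest_sender(true_state, prior):
    """Selects the message that most *raises* the receiver's credence in the truth."""
    return max(LIKELIHOOD, key=lambda m: posterior(prior, m, true_state))

def deceptive_sender(true_state, prior):
    """Selects the message that most *lowers* the receiver's credence in the truth."""
    return min(LIKELIHOOD, key=lambda m: posterior(prior, m, true_state))

def classify(sender):
    """'Behaviorist' test: look only at (state, message) pairs and the receiver's update."""
    prior = {0: 0.5, 1: 0.5}
    shifts = [posterior(prior, sender(s, prior), s) - prior[s] for s in (0, 1)]
    return "algorithmically honest" if all(d > 0 for d in shifts) else "algorithmically deceptive"

print(classify(honest_sender))     # algorithmically honest
print(classify(deceptive_sender))  # algorithmically deceptive
```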

Some authors might admit the pragmatic usefulness of the metaphorical extension, but insist that the new usage be marked as “just a metaphor” with a prefix such as pseudo- or quasi-. But I claim that broad “algorithmic” senses of “mental” words like intent often are more relevant and useful for making sense of the world than the original, narrower definitions that were invented by humans in the context of dealing with other humans, because the universe in fact does not revolve around humans.

When a predatory Photuris firefly sends the mating signal of a different species of firefly in order to lure prey, I think it makes sense to straight-up call this deceptive (rather than merely pseudo- or quasi-deceptive), even though fireflies don’t have language with which to think the verbal thought, “And now I’m going to send another species’s mating signal in order to lure prey …”

When a generative adversarial network learns to produce images of realistic human faces or anime characters, it would in no way aid our understanding to insist that the system isn’t really “learning” just because it’s not a human learning the way a human would—any more than it would to insist that quicksort isn’t really sorting. “Using exposure to data as an input into gaining capabilities” is a perfectly adequate definition of learning in this context.

In a nearby possible future, when you sue a company for fraud because their advertising claimed that their product would disinfect wolf bites, but the product instead gave you cancer, we would hope that the court will not be persuaded if the company’s defense-lawyer AI says, “But that advertisement was composed by filtering GPT-5 output for the version that increased sales the most—at no point did any human form the conscious intent to deceive you!”

Another possible concern with this proposed language usage is that if it’s socially permissible to attribute unconscious motives to interlocutors, people will abuse this to selectively accuse their rivals of bad intent, leading to toxic social outcomes: there’s no way for negatively-valenced intent-language like “fraud” or “deception” to stably have denotative meanings independently of questions of who should be punished.

It seems plausible to me that this concern is correct: in a human community of any appreciable size, if you let people question the stories we tell about ourselves, you are going to get acrimonious and not-readily-falsifiable accusations of bad intent. (“Liar!” “Huh? You can argue that I’m wrong, but I actually believe what I’m saying!” “Oh, maybe consciously, but I was accusing you of being an algorithmic liar.”)

Unfortunately, as an aspiring epistemic rationalist, I’m not allowed to care whether some descriptions might be socially harmful for a human community to adopt; I’m only allowed to care about what descriptions shorten the length of the message needed to describe my observations.


  1. Robin Hanson and Kevin Simler, The Elephant in the Brain: Hidden Motives in Everyday Life, Ch. 1, “Animal Behavior” ↩︎