Humans interpreting humans

In a previous post, I showed how, given certain normative assumptions, one could distinguish agents for whom anchoring was a bias, from those for whom it was a preference.

But the agent for whom anchoring was a preference looks clearly ridiculous—how could anchoring be a preference? It makes no sense. And I agree with that assessment! That agent’s preferences make no sense—if we think of it as a human.
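
To make that distinction concrete, here is a minimal toy sketch of my own (not the actual model from the previous post; the 30% mixing weight and the function names are invented): two decompositions of the same anchored behaviour, one treating the anchor as a bias that corrupts a sensible preference, the other treating the anchored answer as the genuine preference of a perfectly rational agent.

```python
# Toy sketch (my own illustration, not the previous post's model): two
# decompositions of identical anchored behaviour.

def behaviour(anchor: float, true_value: float) -> float:
    """Observed estimate: pulled 30% of the way towards the anchor."""
    return 0.7 * true_value + 0.3 * anchor

def bias_agent_preferred_answer(anchor: float, true_value: float) -> float:
    """Decomposition 1 -- anchoring is a bias: the agent's true preference
    is to report true_value exactly; the anchor merely corrupts execution."""
    return true_value

def preference_agent_preferred_answer(anchor: float, true_value: float) -> float:
    """Decomposition 2 -- anchoring is a preference: the agent genuinely
    wants the anchored mixture, and is perfectly rational at producing it."""
    return 0.7 * true_value + 0.3 * anchor

# Both decompositions are consistent with the same observable behaviour:
for anchor, value in [(10.0, 50.0), (90.0, 50.0)]:
    assert behaviour(anchor, value) == preference_agent_preferred_answer(anchor, value)
    assert behaviour(anchor, value) != bias_agent_preferred_answer(anchor, value)
```

Behaviour alone cannot separate the two decompositions; that is why the previous post needed extra normative assumptions to call one of them a bias.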

Humans model each other in very similar ways

This is another way in which I think we can extract human preferences: using the fact that human models of each other, and self-models, are all incredibly similar. Consider the following astounding statements:

  • If somebody turns red, shouts at you, then punches you in the face, they are probably angry at you.

  • If somebody is drunk, they are less rational at implementing long-term plans.

  • If somebody close to you tells you an intimate secret, then they probably trust you.

Most people will agree with all those statements, to a large extent—including the “somebody” being talked about. But what is going on here? Have I not shown that you can’t deduce preferences or rationality from behaviour? It’s not like we’ve put the “somebody” in an fMRI scan to construct their internal model, so how do we know?

The thing is that natural selection is lazy, and a) different humans use the same type of cognitive machinery to assess each other, and b) individual humans tend to use their own self-assessment machinery to assess other humans. Consequently, there tends to be large agreement between our own internal self-assessment models, our models of other people, other people’s models of other people, and other people’s self-assessment models of themselves.
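
To give the “large agreement” claim a measurable form, here is a toy sketch of my own (the names and numbers are invented, not data from anywhere): treat model[i][j] as human i’s estimate of some preference parameter of human j, with the diagonal entries being self-models, and check how much everyone’s estimates of the same person spread out.

```python
# Toy sketch (hypothetical numbers): model[i][j] is human i's estimate of
# some preference parameter of human j; entries with i == j are self-models.
import statistics

model = {
    "alice": {"alice": 0.80, "bob": 0.70, "carol": 0.60},
    "bob":   {"alice": 0.75, "bob": 0.72, "carol": 0.62},
    "carol": {"alice": 0.78, "bob": 0.69, "carol": 0.58},
}

def disagreement_about(subject: str) -> float:
    """Spread of everyone's estimates of `subject`, self-model included.
    Small values mean the different models of this person largely agree."""
    estimates = [model[observer][subject] for observer in model]
    return statistics.pstdev(estimates)

for subject in model:
    print(subject, round(disagreement_about(subject), 3))
```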

This agreement is not perfect, by any means—I’ve mentioned that it varies from culture to culture, individual to individual, and even within the same individual. But even so, we can add the normative assumption:

  • If one human models another human’s preferences and rationality, then those models are informative about that other human’s actual preferences and rationality (a possible formalization is sketched just below).
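
One hedged way to cash out “informative” (a sketch of my own; the notation $H$, $\widehat{H}$, $p_H$, $R_H$, $M_{\widehat{H}\to H}$ is introduced here for illustration and is not from the post): write $H$ for the human being modelled, $\widehat{H}$ for the human doing the modelling, $(p_H, R_H)$ for $H$’s rationality and preferences, and $M_{\widehat{H}\to H}$ for $\widehat{H}$’s model of that pair. The assumption then says that conditioning on the model shifts our beliefs:

$$P\!\left(p_H, R_H \mid M_{\widehat{H}\to H}\right) \;\neq\; P\!\left(p_H, R_H\right).$$

That is, another human’s model of you counts as evidence about your preferences and rationality, rather than being dismissed as just more behaviour to explain.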

That explains why I said that the anchoring-as-bias agent was a human, while the anchoring-as-preference agent was not: my model of what a human would prefer in those circumstances was correct for the former, but not for the latter.

Implicit models

Note that this modelling is often carried out implicitly, through selecting the scenarios and tweaking the formal model so as to make the agent being assessed more human-like. With many variables to play with, it’s easy to restrict to a set that seems to demonstrate human-like behaviour (for example, using almost-rationality assumptions for agents with small action spaces but not for agents with large ones).
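
As a toy illustration of how that implicit projection can creep in (a sketch of my own; the scenario names and numbers are invented), consider scoring an agent’s “rationality” only on scenarios we have pre-selected:

```python
# Toy sketch (hypothetical numbers): how scenario selection can smuggle
# human-likeness into an assessment.  Each entry is the fraction of the
# available reward the agent actually obtains in that scenario.
observed = {
    "small_choice_A": 0.97,
    "small_choice_B": 0.95,
    "large_plan_A":   0.40,   # long-term planning: far from rational-looking
    "large_plan_B":   0.35,
}

def fitted_rationality(scenarios) -> float:
    """Average 'rationality' over whichever scenarios we chose to include."""
    return sum(observed[s] for s in scenarios) / len(scenarios)

# Restrict attention to small action spaces: almost-rationality looks fine.
print(fitted_rationality(["small_choice_A", "small_choice_B"]))  # ~0.96

# Include the large action spaces: the same assumption breaks down.
print(fitted_rationality(list(observed)))                        # ~0.67
```

Which scenarios make it into the assessment is itself a modelling choice, and it is exactly where our judgements about human-likeness enter.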

There’s nothing wrong with this approach, but it needs to be made clear that, when we are doing that, we are projecting our own assessments of human rationality onto the agent; we are not making “correct” choices as if we were dispassionately improving the hyperparameters of an image recognition program.