Gricean communication and meta-preferences

There’s a bit of an analogy between communication and value learning that I’ve been thinking about recently.

Communication:

By “Gricean communication,” I mean Paul Grice’s model of communication as actions based on recursive modeling.

A classic example is me flashing my headlights to communicate that you need to check yours.

Why do I take this weird action of flashing my headlights? Well, because I expect you to check yours. But I don’t expect this because I think you have some evolved, unconscious reflex to check your headlights when you see a flashing light; I expect you to model my behavior and deduce that I expect you to check your headlights.

So this simple act invokes several layers of recursive agent-shaped models:

  • I flash my headlights because I model you (layer 4)...

  • as modeling me as making a reasoned decision to flash my lights (layer 3)...

  • in which I model you (layer 2)...

  • as interpreting my actions in terms of a model (layer 1)...

  • where I flash my lights when yours are in an unusual state.

The layer 1 model is “Charlie thinks your lights should be checked, so he flashes his.”

Layer 2 is “You know that I’m flashing my lights because I think you should check yours.”

Layer 3 is “Charlie knows that you’ll interpret the flash as indicating that you should check your lights.”

Layer 4 is “You know that I’m flashing my lights because I expect you to interpret that as a signal about what I’m thinking.”

If we could only reason at layer 1, we would only be capable of reasoning about other people as stimulus-response machines: things in the environment with complicated behaviors that are worth learning about, but are more or less black boxes. As we increase the depth of these recursive models of our partner in communication, we’re committing ourselves to more informative priors about how we expect people to reason and behave.
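To make the layer structure concrete, here’s a minimal sketch (my own illustration, not anything from Grice or Dennett) of how these nested agent-shaped models might be represented, with the headlight example filled in. The AgentModel class and all the names are hypothetical stand-ins.

```python
# A toy representation of nested agent-shaped models (purely illustrative).
# Each layer is a model of the other driver that may itself contain a model
# of us, so "depth" counts how many agent-shaped models are stacked up.

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentModel:
    """A model of another agent, possibly containing a model of ourselves."""
    description: str
    # How this modeled agent interprets an observed action (e.g. a headlight flash).
    interpret: Callable[[str], str]
    # The modeled agent's own model of its partner; None means this is layer 1.
    model_of_other: Optional["AgentModel"] = None

    def depth(self) -> int:
        """Number of nested agent-shaped models (the 'layer' in the text)."""
        return 1 if self.model_of_other is None else 1 + self.model_of_other.depth()


layer1 = AgentModel(
    "Charlie as stimulus-response: he flashes when your lights look wrong",
    interpret=lambda action: "a flash happens when the other car's lights are in an unusual state",
)
layer2 = AgentModel(
    "You, interpreting Charlie's flash via the layer-1 model",
    interpret=lambda action: "Charlie flashed because he thinks I should check my lights",
    model_of_other=layer1,
)
layer3 = AgentModel(
    "Charlie, expecting you to interpret the flash that way",
    interpret=lambda action: "you will read the flash as 'check your lights'",
    model_of_other=layer2,
)
layer4 = AgentModel(
    "You, knowing Charlie flashed because he expects that interpretation",
    interpret=lambda action: "Charlie flashed because he expects me to read it as a signal",
    model_of_other=layer3,
)

print(layer4.depth())  # -> 4 nested agent-shaped models
```

The only point of the sketch is that each layer wraps the one below it, so committing to a deeper model means committing to a more informative prior about how the other person reasons.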

(See also previous similar discussion on LW.)

Now, if we wanted, we could keep adding layers in an infinite tower. But in practice we don’t need many layers in the real world, and so we don’t use them. As Dennett puts it in an excellent paper about Gricean communication among monkeys:

A fourth-order system might want you to think it understood you to be requesting that it leave. How high can we human beings go? “In principle,” forever, no doubt, but in fact I suspect that you wonder whether I realize how hard it is for you to be sure that you understand whether I mean to be saying that you can recognize that I can believe you to want me to explain that most of us can keep track of only about five or six orders, under the best of circumstances.

Values:

What’s the relation to value learning? Well, human values aren’t written down on a stone tablet somewhere; they’re inside humans. But our values aren’t written down in plain FORTRAN on the inside of our frontal lobe, either; they can be pointed to only as elements of some model of humans and the surrounding environment.

If values are supposed to live in a model of humans, this raises the question “which model of humans?” And the answer is “the one humans think is best.” Now we’re going to need those recursive agent-shaped models.

And why stop there? Why not locate the best way to model humans as having preferences about models of humans, by consulting a model of humans with preferences over preferences over models?

There are a couple different directions to go from here. One way is to try to collapse the recursion: find a single agent-shaped model of humans that is (or approximates) a fixed point of this model-ratification process (and also hopefully stays close to real humans by some metric), and use that model’s preferences. This is what I see as the endgame of the imitation / bootstrapping research.
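As a very rough sketch of what “collapsing the recursion” could mean, here is a toy fixed-point iteration (my own illustration; HumanModel, ratify, and the scoring scheme are hypothetical stand-ins, not anything from the imitation / bootstrapping literature):

```python
# Toy sketch of collapsing the recursion: keep asking the current model of
# humans which model of humans it endorses, until some model endorses itself.

class HumanModel:
    """Stand-in for an agent-shaped model of humans (purely illustrative)."""

    def __init__(self, name, endorsements=None):
        self.name = name
        # endorsements[other_name] = how strongly this model endorses `other`
        # as the right model of humans to read preferences off of.
        self.endorsements = endorsements or {}

    def score(self, other):
        return self.endorsements.get(other.name, 0.0)


def ratify(model, candidates):
    """Ask `model` which candidate model of humans it endorses most."""
    return max(candidates, key=model.score)


def collapse_recursion(initial, candidates, max_iters=100):
    """Iterate ratification until a model endorses itself (a fixed point)."""
    model = initial
    for _ in range(max_iters):
        ratified = ratify(model, candidates)
        if ratified is model:  # fixed point: this model picks itself
            return model
        model = ratified
    return model  # budget exhausted without finding a fixed point


# Tiny worked example: A endorses B, B endorses C, C endorses itself,
# so the iteration settles on C.
a = HumanModel("A", {"B": 1.0})
b = HumanModel("B", {"C": 1.0})
c = HumanModel("C", {"C": 1.0})
print(collapse_recursion(a, [a, b, c]).name)  # -> "C"
```

Note that nothing in this toy version keeps the fixed point close to actual humans; that constraint is doing all the real work.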

Another way might be to imitate communication, and find a way to use recursive models such that we can stop the recursion early without much loss in effectiveness. In communication, the innermost layer of the model can be quite simplistic, and then the next is more complicated by virtue of taking advantage of the first, and so on. At each layer you can abstract away some of the details of the previous layers, so by the time you’re at layer 4 maybe it doesn’t matter that layer 1 was just a crude facsimile of human behavior.
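Here is an equally toy sketch of this second option (again my own illustration, with made-up functions): the recursion bottoms out in a crude layer-1 behavioral model, and each higher layer only ever sees a coarse summary of the layer below it.

```python
# Toy sketch of stopping the recursion early: a fixed, small number of layers,
# where each layer reads only a summary of the layer below, so the crudeness
# of layer 1 is partly abstracted away by the time we reach the top.

from typing import Callable, Dict, List

Summary = Dict[str, float]  # e.g. rough value estimates for a couple of options


def crude_layer1(situation: str) -> Summary:
    """Layer 1: a crude facsimile of human behavior (hypothetical stand-in)."""
    return {"option_a": 0.4, "option_b": 0.6}


def make_layer(adjustments: Dict[str, float]) -> Callable[[str, Summary], Summary]:
    """A higher layer: nudges the lower layer's summary and passes it up."""
    def layer(situation: str, below: Summary) -> Summary:
        return {k: below.get(k, 0.0) + adjustments.get(k, 0.0) for k in below}
    return layer


def evaluate(situation: str, layers: List[Callable[[str, Summary], Summary]]) -> Summary:
    summary = crude_layer1(situation)  # the recursion stops here instead of going forever
    for layer in layers:               # each layer only sees the summary, not the details
        summary = layer(situation, summary)
    return summary


print(evaluate("pick a policy", [make_layer({"option_a": 0.3}), make_layer({"option_b": -0.1})]))
# -> {'option_a': 0.7, 'option_b': 0.5}
```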

Issues:

The main problem with both of these is the difficulty of converting human behavior, at any layer of recursion, into a format worthy of being called preferences. Humans are inconsistent on the object level, and incapable of holding entire models of themselves in their heads in order to have specific thoughts about them.

Of course, it’s fine if “preferences” aren’t a big lookup table, but are instead some computation that takes in options and outputs responses. But we don’t have good enough ways to think about what happens when the outputs are inconsistent.

The best humans can do, it often feels like, is to espouse mostly-consistent general principles that are insufficient to pin down any one preference exactly, and that might change day to day. If something can do complicated ratification of models of human preferences about models, that’s so not-human-like that I worry that something even more inhuman is going on under the hood.

This is why I’m interested in the analogy to Gricean communication, where you only need a few layers of models, and the more deeply nested a layer is, the simpler it can be. It sort of seems like what I do when I think about metaethics.

The analogy isn’t all that exact, though, because in communication the actual communicator contains the recursive model as a part of itself, and the model contains the deeper layers as strict sub-models. But it feels like in our current picture of meta-preferences, the actual preferrer is contained inside the meta-preferrer, which is contained inside the meta-meta-preferrer, and the recursion explodes rather than converging.

So yeah. If anyone else out there is thinking about technical ways to talk about meta-preferences, I’m interested in how you’re trying to think about the problem.
