Humans Consulting HCH

(See also: strong HCH.)

Consider a human Hugh who has access to a question-answering machine. Suppose the machine answers question Q by perfectly imitating how Hugh would answer question Q, if Hugh had access to the question-answering machine.

That is, Hugh is able to consult a copy of Hugh, who is able to consult a copy of Hugh, who is able to consult a copy of Hugh…

Let’s call this process HCH, for “Humans Consulting HCH.”
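The recursion can be made concrete with a small sketch. Everything here is an illustrative assumption rather than part of the definition: the human is modeled as a plain function `human(question, ask)`, the names `hch` and `toy_hugh` are hypothetical, and a finite `depth` budget stands in for the unbounded chain of copies so that the program terminates.

```python
from typing import Callable

# Assumed interfaces (not from the post): a machine maps a question to an
# answer, and a "human" answers a question given a machine to consult.
Ask = Callable[[str], str]
Human = Callable[[str, Ask], str]


def hch(human: Human, question: str, depth: int) -> str:
    """Answer `question` as `human` would when allowed to consult HCH.

    `depth` is an assumed cutoff: the idealized process has no such limit.
    """
    def machine(sub_question: str) -> str:
        if depth <= 0:
            raise RuntimeError("consultation depth exhausted")
        # The machine imitates the same human, who again has a machine.
        return hch(human, sub_question, depth - 1)

    return human(question, machine)


def toy_hugh(question: str, ask: Ask) -> str:
    """A stand-in for Hugh who only knows how to add two numbers.

    Questions look like "sum 1 2 3"; longer sums are delegated in halves.
    """
    numbers = question.split()[1:]
    if len(numbers) == 1:
        return numbers[0]
    mid = len(numbers) // 2
    left = ask("sum " + " ".join(numbers[:mid]))
    right = ask("sum " + " ".join(numbers[mid:]))
    return str(int(left) + int(right))


print(hch(toy_hugh, "sum 1 2 3 4 5", depth=10))  # prints 15
```

The toy Hugh never adds more than two numbers himself, yet the overall process answers a longer question, which is the sense in which consulting copies can extend what one human can do.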

I’ve talked about many variants of this process before, but I find it easier to think about with a nice handle. (Credit to Eliezer for proposing using a recursive acronym.)

HCH is easy to specify very precisely. For now, I think that HCH is our best way to precisely specify “a human’s enlightened judgment.” It has plenty of problems, but I don’t know of anything better.


We can define realizable variants of this inaccessible ideal:

  • For a particular prediction algorithm P, define HCHᴾ as:
    “P’s prediction of what a human would say after consulting HCHᴾ”

  • For a reinforcement learning algorithm A, define max-HCHᴬ as:
    “A’s output when maximizing the evaluation of a human after consulting max-HCHᴬ”

  • For a given market structure and participants, define HCHᵐᵃʳᵏᵉᵗ as:
    “the market’s prediction of what a human will say after consulting HCHᵐᵃʳᵏᵉᵗ”
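As a rough illustration of the first two definitions, here is a minimal sketch under stated assumptions. The interfaces `Predictor`, `Evaluator`, and `Proposer`, the function names, and the `depth` cutoff are all hypothetical. In particular, a brute-force argmax over a short list of proposed answers stands in for the reinforcement learner A, which a real learner would not resemble closely. The market variant is omitted.

```python
from typing import Callable, Iterable

Ask = Callable[[str], str]
# Assumed interfaces (not from the post):
Predictor = Callable[[str, Ask], str]         # predicted human answer
Evaluator = Callable[[str, str, Ask], float]  # human's rating of an answer
Proposer = Callable[[str], Iterable[str]]     # candidate answers to rate


def hch_p(predict: Predictor, question: str, depth: int) -> str:
    """P's prediction of what the human says after consulting HCH^P."""
    def machine(sub_question: str) -> str:
        if depth <= 0:
            raise RuntimeError("consultation depth exhausted")
        return hch_p(predict, sub_question, depth - 1)

    return predict(question, machine)


def max_hch(propose: Proposer, evaluate: Evaluator,
            question: str, depth: int) -> str:
    """The proposed answer rated highest by a human consulting max-HCH^A."""
    def machine(sub_question: str) -> str:
        if depth <= 0:
            raise RuntimeError("consultation depth exhausted")
        return max_hch(propose, evaluate, sub_question, depth - 1)

    # Argmax over candidates is a stand-in for the learner A.
    return max(propose(question),
               key=lambda answer: evaluate(question, answer, machine))


def toy_predictor(question: str, ask: Ask) -> str:
    """Predicts a human who counts letters one at a time: "count:abc"."""
    word = question.split(":", 1)[1]
    if word == "":
        return "0"
    return str(1 + int(ask("count:" + word[1:])))


def propose_parity(question: str) -> list:
    return ["even", "odd"]


def rate_parity(question: str, answer: str, ask: Ask) -> float:
    """Rates an answer to "parity:n" by consulting the machine about n - 1."""
    n = int(question.split(":", 1)[1])
    if n == 0:
        correct = "even"
    else:
        previous = ask(f"parity:{n - 1}")
        correct = "odd" if previous == "even" else "even"
    return 1.0 if answer == correct else 0.0


print(hch_p(toy_predictor, "count:abcd", depth=10))                 # prints 4
print(max_hch(propose_parity, rate_parity, "parity:3", depth=10))   # prints odd
```

In both functions the machine handed to the human is the variant itself, not the ideal HCH, which mirrors the self-reference in the quoted definitions.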

Note that e.g. HCHᴾ is totally different from “P’s prediction of HCH.” HCHᴾ will generally make worse predictions, but it is easier to implement.
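One way to write the distinction down, using notation that is an assumption of this sketch and not the post's: let H(Q; M) be the answer Hugh gives to Q when he can consult a machine M, and let P[X] be P's prediction of a quantity X.

```latex
\begin{align*}
\mathrm{HCH}(Q) &= H\big(Q;\ \mathrm{HCH}\big) \\
\mathrm{HCH}^{P}(Q) &= P\big[\, H\big(Q;\ \mathrm{HCH}^{P}\big) \,\big] \\
\big(\text{$P$'s prediction of HCH}\big)(Q) &= P\big[\, H\big(Q;\ \mathrm{HCH}\big) \,\big]
\end{align*}
```

In the second line the human being predicted consults the imperfect HCHᴾ, so the prediction error can enter at every level of the recursion. In the third line P is applied once, to the output of the ideal process, which requires P to reason about the whole idealized recursion.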


The best case is that HCHᴾ, max-HCHᴬ, and HCHᵐᵃʳᵏᵉᵗ are:

  • As capable as the underlying predictor, reinforcement learner, or market participants.

  • Aligned with the enlightened judgment of the human, e.g. as evaluated by HCH.

(At least when the human is suitably prudent and wise.)

It is clear from the definitions that these systems can’t be any more capable than the underlying predictor/learner/market. I honestly don’t know whether we should expect them to match the underlying capabilities. My intuition is that max-HCHᴬ probably can, but that HCHᴾ and HCHᵐᵃʳᵏᵉᵗ probably can’t.

It is similarly unclear whether the system continues to reflect the human’s judgment. In some sense this is in tension with the desire to be capable: the more guarded the human, the less capable the system but the more likely it is to reflect their interests. The question is whether a prudent human can achieve both goals.

This was originally posted here on 29th January 2016.


The next post in this sequence is ‘Corrigibility’ by Paul Christiano, which will be published on Tuesday 27th November.
