Approval-directed bootstrapping

Approval-directed behavior works best when the overseer is very smart. Where can we find a smart overseer?

One approach is bootstrapping. By thinking for a long time, a weak agent can oversee an agent (slightly) smarter than itself. Now we have a slightly smarter agent, who can oversee an agent which is (slightly) smarter still. This process can go on, until the intelligence of the resulting agent is limited by technology rather than by the capability of the overseer. At this point we have reached the limits of our technology.
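A rough sketch of that ladder, in Python. Everything here is a placeholder rather than anything from the post, just to make the shape of the loop concrete:

```python
# Hypothetical sketch of the bootstrapping ladder described above.
# `train_agent(overseer)` stands in for whatever training process produces an
# agent slightly more capable than its overseer; `combine(overseer, agent)`
# stands in for the overseer using that agent as an assistant.

def bootstrap(initial_overseer, train_agent, combine, steps):
    overseer = initial_overseer
    for _ in range(steps):
        # Train an agent against the current overseer's approval...
        agent = train_agent(overseer)
        # ...then let the overseer, assisted by that agent, serve as the
        # (slightly smarter) overseer for the next round.
        overseer = combine(overseer, agent)
    return overseer
```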

This may sound exotic, but we can implement it in a surprisingly straightforward way.

Suppose that we evaluate Hugh’s approval by predicting what Hugh would say if we asked him; the rating of action a is what Hugh would say if, instead of taking action a, we asked Hugh, “How do you rate action a?”
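In code, the idea is just to score actions by a prediction of Hugh’s answer. A minimal sketch (nothing here is from the original post; `predicted_hugh_rating` stands in for a model of what Hugh would say):

```python
from typing import Callable, Iterable

def choose_action(actions: Iterable[str],
                  predicted_hugh_rating: Callable[[str], float]) -> str:
    """Approval-directed choice: take the action whose *predicted* rating
    from Hugh ("How do you rate action a?") is highest, rather than the
    action predicted to have the best long-run consequences."""
    return max(actions, key=predicted_hugh_rating)
```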

Now we get bootstrapping almost for free. In the process of evaluating a proposed action, Hugh can consult Arthur. This new instance of Arthur will, in turn, be overseen by Hugh—and in this new role Hugh can, in turn, be assisted by Arthur. In principle we have defined the entire infinite regress before Arthur takes his first action.
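One way to picture the regress is as a recursion, truncated at some depth. This is only a toy sketch with made-up callables (`hugh(action, advice)` for Hugh’s judgment, `arthur(action, overseer)` for Arthur acting to earn the given overseer’s approval):

```python
def hugh_rating(action, hugh, arthur, depth):
    """Hugh's rating of `action`, with Arthur's help (toy sketch).

    At each level Hugh may consult an instance of Arthur, and that Arthur
    is in turn overseen by Hugh-with-assistance, one level further down.
    """
    if depth == 0:
        return hugh(action, advice=None)  # Hugh judging unaided
    advice = arthur(
        action,
        overseer=lambda a: hugh_rating(a, hugh, arthur, depth - 1),
    )
    return hugh(action, advice)  # Hugh's judgment after consulting Arthur
```

A real system wouldn’t literally fix a recursion depth in advance; the point is only that the whole tower is defined by the two pieces, Hugh and Arthur.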

We can even learn this function by examples — no elaborate definitions necessary. Each time Arthur proposes an action, we actually ask Hugh to evaluate the action with some probability, and we use our observations to train a model for Hugh’s judgments.
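As a sketch of that training loop (again with hypothetical interfaces: `model.predict` and `model.update` are stand-ins for whatever learner Arthur uses, and `ask_hugh` actually poses the question to Hugh):

```python
import random

def propose_and_maybe_train(actions, model, ask_hugh, query_prob=0.1):
    """One step of learning Hugh's judgments from examples (toy sketch)."""
    action = max(actions, key=model.predict)   # Arthur's proposal, by predicted rating
    if random.random() < query_prob:           # occasionally query Hugh for real
        rating = ask_hugh(action)              # "How do you rate this action?"
        model.update(action, rating)           # new training data for the predictor
    return action
```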

In practice, Arthur might not be such a useful assistant until he has acquired some training data. As Arthur acquires training data, the Hugh+Arthur system becomes more intelligent, and so Arthur acquires training data from a more intelligent overseer. The bootstrapping unfolds over time as Arthur adjusts to increasingly powerful overseers.


This was originally posted here on 21st December 2014.

Tomorrow’s AI Alignment Forum sequences will take a break, and tomorrow’s post will be Issue #34 of the Alignment Newsletter.

The next post in this sequence is ‘Humans consulting HCH’, also released today.
