Thoughts on Quantilizers

A putative new idea for AI control; index here.

This post will look at some of the properties of quantilizers: when they succeed and how they might fail.

Roughly speaking, let V be some true objective function that we want to maximise. We haven't been able to specify it fully, so we have instead a proxy function U. There is a cost function C = U − V which measures how much V falls short of U. Then a quantilizer will choose actions (or policies) randomly from the top q% of the actions available, ranking those actions according to U.
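As a concrete sketch, a q-quantilizer over a finite action set might look like the following Python (the function name and toy action set are my own illustration, not part of the original proposal):

```python
import random

def quantilize(actions, proxy_u, q):
    """Pick an action uniformly at random from the top q fraction
    of actions, ranked by the proxy objective U."""
    ranked = sorted(actions, key=proxy_u, reverse=True)
    k = max(1, int(len(ranked) * q))  # size of the top-q% slice
    return random.choice(ranked[:k])

# Toy usage: 1000 actions, proxy score = the action's index.
# A 5% quantilizer picks uniformly among the 50 highest-scoring actions.
actions = list(range(1000))
choice = quantilize(actions, lambda a: a, q=0.05)
assert 950 <= choice <= 999
```

The point of the random choice, rather than taking the argmax, is that the quantilizer avoids concentrating all its probability on the extreme tail where U and V are most likely to come apart.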

It is plausible that for standard actions or policies, U and V are pretty similar. But when we push towards maximising U, the tiny details where U and V differ will balloon, and the cost C can grow very large indeed.

This could be illustrated roughly by figure I, where U and C are plotted against each other; imagine that C is on a log scale.

The blue areas are possible actions that can be taken. Note a large bunch of actions that are not particularly good for U but have low cost, a thinner tail of more optimised actions that have higher U and still have low cost, and a much thinner tail that has even higher U but high cost. The U-maximising actions with maximal cost are represented by the red star.

Figure I thus shows a situation ripe for some form of quantilization. But consider figure II:

In figure II, the only way to get high U is to have a high C. The situation is completely unsuited for quantilization: any maximiser, even a quantilizer, will score terribly under V. But that means mainly that we have chosen a terrible U.

Now, back to figure I, where quantilization might work, at least in principle. The ideal would be situation Ia; here blue represents actions below the top q% cut-off, green those above (which include the edge-case red-star actions, as before):

Here the top q% of actions all score a good value under U, and yet most of them have low cost.

But even within the broad strokes of figure I, quantilization can fail. Figure Ib shows a first type of failure:

Here the problem is that the quantilizer lets in too many mediocre actions, so the expectation of U (and V) is mediocre; with a smaller q, the quantilizer would do better.

Another failure mode is figure Ic:

Here q is too low: all the quantilized solutions have high cost.
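These two failure modes can be made concrete with a small numerical sketch. The population of actions below is entirely invented for illustration, and it assumes the true value of an action is V = U − C:

```python
# Toy population of (U, C) pairs: mostly mediocre low-cost actions,
# a band of good low-cost actions, and a thin tail of top-U but
# high-cost actions (the red-star region).
mediocre = [(u, 0.0) for u in range(0, 50)]      # low U, low cost
good     = [(u, 0.0) for u in range(50, 95)]     # high U, low cost
edge     = [(u, 100.0) for u in range(95, 100)]  # top U, huge cost
population = mediocre + good + edge

def expected_v(q):
    """Expected true value V = U - C of a q-quantilizer on this population."""
    ranked = sorted(population, key=lambda uc: uc[0], reverse=True)
    top = ranked[:max(1, int(len(ranked) * q))]
    return sum(u - c for u, c in top) / len(top)

print(expected_v(0.90))  # Ib: mediocre actions drag the average down
print(expected_v(0.30))  # mostly the good band: high expected V
print(expected_v(0.05))  # Ic: only the red-star tail is left -> -3.0
```

In this toy, shrinking q from 90% to 30% helps (failure mode Ib), but shrinking it all the way to 5% leaves only the high-cost edge cases (failure mode Ic).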

Another quantilizer design

An idea I had some time ago was that, instead of taking the top q% of the actions, the quantilizer instead chooses among the actions that are within ε of the top U-maximising action. Such a design would be less likely to encounter situations like Ib, but more likely to face situations like Ic.
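A sketch of this alternative design, assuming ε measures closeness in proxy score (the function name and toy numbers are my own):

```python
import random

def quantilize_within_eps(actions, proxy_u, eps):
    """Variant design: choose uniformly among all actions whose proxy
    score is within eps of the best available U-score."""
    best = max(proxy_u(a) for a in actions)
    near_top = [a for a in actions if proxy_u(a) >= best - eps]
    return random.choice(near_top)

# Toy usage: with scores 0..999 and eps = 10, the candidate set is
# the eleven actions scoring 989..999.
choice = quantilize_within_eps(range(1000), lambda a: a, eps=10)
assert 989 <= choice <= 999
```

Unlike the top-q% version, the size of the candidate set here depends on how densely actions cluster near the optimum, which is why this design trades failure mode Ib against Ic.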

What can be done?

So, what can be done to improve quantilizers? I'll be posting some thoughts as they develop, but there are two ideas that spring to mind immediately. First of all, we can use CUA oracles to investigate the shape of the space of actions, at least from the perspective of C (C, like V, cannot be calculated explicitly).

Secondly, there's an idea that I had around low-impact AIs. Basically, it was to ensure that there was some action the AI could take that could easily reach some approximation of its goal. For instance, have a utility function that encourages the AI to build one paperclip, and cap that utility at one. Then scatter around some basic machinery to melt steel, stretch it, give the AI some manipulator arms, etc… The idea is to ensure there is at least one safe policy that gives the AI some high expected utility. If there is one such policy, there are probably a large number of similar policies in its vicinity: safe policies with high expectation. Then it seems that quantilization should work, probably best in its ‘within ε of the maximal policy’ version (working well because we know the cap of the utility function, hence have a cap on the maximal policy).

Now, how do we know that a safe policy exists? We have to rely on human predictive abilities, which can be flawed. But the reason we're reasonably confident in this scenario is that we believe that we could figure out how to build a paperclip, given the stuff the AI has lying around. And the AI would presumably do better than us.
