Weak HCH accesses EXP

This post is a follow-up to my “Alignment proposals and complexity classes” post. Thanks to Sam Eisenstat for helping with part of the proof here.

Previously, I proved that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling access PSPACE while AI safety via market making accesses EXP. At the time, I wasn’t sure whether my market making proof would generalize to the others, so I just published it with the PSPACE proofs instead. However, I have since become convinced that the proof does generalize, and that it generalizes for all of the proposals I mentioned, such that imitative amplification with weak HCH, approval-based amplification, and recursive reward modeling all actually access EXP. This post attempts to prove that.

Updated list of proposals by complexity class

P: Imitation learning (trivial)

PSPACE: AI safety via debate (proof)

EXP: AI safety via market making (proof), Imitative amplification with weak HCH (proof below), Approval-based amplification (proof below), Recursive reward modeling (proof below)

NEXP: Debate with cross-examination (proof)

R: Imitative amplification with strong HCH (proof), AI safety via market making with pointers (proof)

Proofs

Imitative amplification with weak HCH accesses EXP

The proof here is similar in structure to my previous proof that weak HCH accesses PSPACE, so I’ll only explain where this proof differs from that one.

First, since L ∈ EXP, we know that for any x, M(x) halts in O(2^(p(n))) steps for some polynomial p, where n = |x|. Thus, we can construct a function f(n) = 2^(a n^b) such that for all x, M(x) halts in less than or equal to f(n) steps by picking a and b large enough that a n^b dominates all other terms in the polynomial for all n. Note that f(n) is then computable in time polynomial in n, since its binary representation is just a one followed by a n^b zeros.
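To see why computing f(n) is cheap even though f(n) itself is exponentially large, here is a quick sketch (the particular values of a and b are arbitrary placeholders for this illustration):

```python
# f(n) = 2^(a n^b), whose binary representation is a 1 followed by a*n**b
# zeros. a and b are placeholder values chosen purely for illustration.
a, b = 2, 3

def f_binary(n: int) -> str:
    """Binary representation of f(n), produced in time polynomial in n."""
    return "1" + "0" * (a * n ** b)

# The string has a*n**b + 1 bits and really does encode 2^(a n^b).
assert len(f_binary(4)) == a * 4 ** b + 1
assert int(f_binary(4), 2) == 2 ** (a * 4 ** b)
```

So f(n) takes only polynomially many bits to write down, even though it bounds an exponential number of steps.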

Second, let H’s new strategy be as follows:

  1. Given x, let (q, s) = H((x, f(n))), where a question of the form (x, t) asks for the state of M on x after t steps together with the symbol under its head at that point. Then, return accept/reject based on whether q is an accept or reject state (it will always be one or the other).

  2. Given (x, 0), return (q_0, ⊔) where q_0 is the starting state and ⊔ is the empty tape symbol (taking the convention that the input is written in the cells immediately to the right of the head, so that the head begins over an empty cell).

  3. Given (x, t) for t > 0, let (q, s) = H((x, t - 1)). Then, return (q', s') where q' is the next state of M from state q over symbol s (where if we’re already in an accept/reject state we just stay there) and s' is the new tape symbol at the head (which can be determined by asking the subquestion (x, t, 0)).

  4. Given (x, 0, i), return x_i if 1 ≤ i ≤ n, otherwise ⊔, where a question of the form (x, t, i) asks for the symbol in the cell i squares to the right of M’s head after t steps (with negative i counting leftward).

  5. Given (x, t, i) for t > 0, let (q, s) = H((x, t - 1)). Then, let Δ be the amount by which M in state q over symbol s changes the head position (such that Δ ∈ {-1, 0, +1}). If i + Δ = 0, let s' be the symbol M writes in that situation (also determined by q and s), otherwise let s' = H((x, t - 1, i + Δ)). Return s'.
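As a purely illustrative sanity check, the five steps above can be sketched in code, with every subquestion answered by direct recursion rather than by a trained model, and with a toy Turing machine (one that accepts iff its input contains a 1) standing in for M. All names and conventions here are my own:

```python
# Hypothetical sketch of the five-step strategy above. Subquestions are
# answered by direct recursion (memoized, so each distinct subquestion is
# resolved once) instead of by a trained model. The toy machine accepts
# iff its input contains a '1'.
from functools import lru_cache

BLANK = "_"
ACCEPT, REJECT = "accept", "reject"

# delta[(state, symbol)] = (next state, symbol written, head movement)
delta = {
    ("start", BLANK): ("scan", BLANK, +1),   # input sits to the right
    ("scan", "0"): ("scan", "0", +1),
    ("scan", "1"): (ACCEPT, "1", 0),
    ("scan", BLANK): (REJECT, BLANK, 0),
}

def step(q, s):
    """One transition; accept/reject states are absorbing (step 3's caveat)."""
    if q in (ACCEPT, REJECT):
        return q, s, 0
    return delta[(q, s)]

@lru_cache(maxsize=None)
def H_state(x, t):
    """Answer to (x, t): the pair (state, symbol under the head) at step t."""
    if t == 0:
        return ("start", BLANK)              # step 2: (q_0, blank)
    q, s = H_state(x, t - 1)
    q2, _, _ = step(q, s)                    # step 3: next state...
    return (q2, H_tape(x, t, 0))             # ...and new symbol at the head

@lru_cache(maxsize=None)
def H_tape(x, t, i):
    """Answer to (x, t, i): symbol i cells right of the head at step t."""
    if t == 0:                               # step 4: input at offsets 1..n
        return x[i - 1] if 1 <= i <= len(x) else BLANK
    q, s = H_state(x, t - 1)
    _, w, d = step(q, s)                     # step 5: head moved by d
    return w if i + d == 0 else H_tape(x, t - 1, i + d)

def H_top(x, steps):
    """Step 1: inspect the state after enough steps for M to have halted."""
    q, _ = H_state(x, steps)
    return q == ACCEPT

print(H_top("001", 8), H_top("000", 8))   # True False
```

Note how tracking the tape relative to the head lets step 5 decide locally whether the queried cell was just written (i + Δ = 0) or should simply be looked up one step earlier at the shifted offset.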

Note that this strategy precisely replicates the strategy used in my proof that market making accesses EXP for inputs of the form (x, t) and (x, t, i). Thus, I’ll just defer to that proof for why the strategy works and is polynomial time on those inputs. Note that this is where the bound f(n) becomes essential, as it guarantees that t and i can be represented in polynomial space.

Then, the only difference between this strategy and the market making one is for the base input x. On x, given that the model being consulted works, H((x, f(n))) will always return a state from after M has halted, which will always be an accept state if M accepts and a reject state if it rejects. Furthermore, since f(n) has only polynomially many digits and is computable in polynomial time, the strategy for x is polynomial, as desired.

Since this procedure works for all L ∈ EXP, we get that amplification with weak HCH accesses EXP, as desired.

Approval-based amplification accesses EXP

The proof here is almost precisely the same as my previous proof that approval-based amplification accesses PSPACE, with the only modification being that we need to verify that the questions and answers are still polynomial in size for the new imitative amplification strategy given above. However, the fact that t and i are bounded above in magnitude by f(n) means that they are representable in polynomial space, as desired.

Recursive reward modeling accesses EXP

Again, the proof here is almost precisely the same as my previous proof that recursive reward modeling accesses PSPACE, with the only modification being the same as in the above proof that approval-based amplification accesses EXP.
