[Question] How does iterated amplification exceed human abilities?

When I first started learning about IDA, I thought that agents trained using IDA would be human-level after the first stage, i.e. that Distill(H) would be human-level. As I've written about before, Paul later clarified this, so my new understanding is that after the first stage, the distilled agent will be super-human in some respects and infra-human in others, but wouldn't be "basically human" in any sense.

But IDA is aiming to eventually be super-human in almost every way (because it's aiming to be competitive with unaligned AGI), so that raises some new questions:

  1. If IDA isn't going to be human-level after the first stage, then at what stage does IDA become at-least-human-level in almost every way?

  2. What exactly is the limitation that prevents the first stage of IDA from being human-level in almost every way?

  3. When IDA eventually does become at-least-human-level in almost every way, how is the limitation from (2) avoided?

That brings me to Evans et al., which contains a description of IDA in section 0. The way IDA is set up in this paper leads me to believe that the answer to (2) above is that the human overseer cannot provide a sufficient number of demonstrations for the most difficult tasks. For example, maybe the human can provide enough demonstrations for the agent to learn to answer very simple questions (the easiest class of tasks in the paper), but it's too time-consuming for the human to answer enough complicated questions (say, tasks in one of the harder classes). My understanding is that IDA gets around this by having an amplified system that is itself automated (i.e. does not involve humans in a major way, so cannot be bottlenecked on the slowness of humans); this allows the amplified system to provide a sufficient number of demonstrations for the distillation step to work.
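The "demonstration bottleneck" reading above can be made concrete with a toy sketch. Everything here is an illustrative assumption, not notation or code from Evans et al.: `human_demonstrate`, `distill`, and `amplify` are stubs standing in for human labeling, supervised imitation, and automated task decomposition respectively.

```python
def human_demonstrate(task):
    """A human answers a task directly. In the view described above, this is
    feasible only for easy tasks, and only for a limited number of them."""
    return f"human answer to {task}"

def distill(demonstrations):
    """Train a fast agent to imitate (task, answer) pairs.
    A lookup table stands in for supervised learning."""
    table = dict(demonstrations)
    return lambda task: table.get(task, "best guess")

def amplify(agent, task):
    """Answer a harder task by decomposing it into subtasks and combining
    the agent's answers. The decomposition here is a placeholder."""
    subtasks = [f"{task}/sub{i}" for i in range(2)]
    return "combined(" + ", ".join(agent(s) for s in subtasks) + ")"

# Stage 0: seed the process with human demonstrations on easy tasks.
easy_tasks = ["easy-1", "easy-2"]
agent = distill([(t, human_demonstrate(t)) for t in easy_tasks])

# Later stages: the automated amplified system, not the human, supplies
# demonstrations for harder tasks, so throughput is no longer human-limited.
hard_tasks = ["hard-1", "hard-2"]
agent = distill([(t, amplify(agent, t)) for t in hard_tasks])
```

The point of the sketch is only the control flow: the human appears once, at stage 0, and every subsequent batch of demonstrations is produced by `amplify`, which runs without a human in the loop.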

So in the above view, the answer to (2) is that the limitation is the number of demonstrations the human can provide, and the answer to (3) is that the human can seed the IDA process with sufficient demonstrations of easy tasks, after which the (automated) amplified system can provide sufficient demonstrations of the harder tasks. The answer to (1) is kind of vague: it's just the smallest stage whose class of learnable tasks contains almost all tasks a human can do.

But the above view seems to conflict with what's in the IDA post and the IDA paper. In both of those, the amplified system is described as a human doing the decompositions (so it will be slow, or else one would need to argue that the slowness of humans decomposing tasks doesn't meaningfully restrict the number of demonstrations). Also, the main benefit of amplification is described not as the ability to provide more demonstrations, but rather to provide demonstrations for more difficult tasks. Under this alternative view, the answers to questions (1), (2), (3) aren't clear to me.

Thanks to Vipul Naik for reading through this question and giving feedback.
