Supervising strong learners by amplifying weak experts

Link post


Many real-world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017b), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors.
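The core loop described above — answer a hard task by combining a model's answers to easier subtasks, then train the model to imitate those amplified answers — can be illustrated with a toy sketch. This is not the paper's implementation; the task (computing 1 + 2 + … + n by decomposing into n and the subtask n−1), and the names `Model`, `decompose`, `amplify`, and `iterated_amplification` are all illustrative assumptions. A lookup table stands in for a neural network so the progressive build-up of the training signal is easy to see.

```python
class Model:
    """Stand-in for a learned model: memorizes amplified answers.

    In the paper this would be a neural network trained by
    supervised learning; a dict keeps the sketch self-contained.
    """
    def __init__(self):
        self.table = {}

    def predict(self, task):
        # Unknown tasks default to 0 (an untrained model's guess).
        return self.table.get(task, 0)

    def train(self, task, target):
        self.table[task] = target


def decompose(task):
    """Split a task into strictly easier subtasks.

    Toy task n = "compute 1 + 2 + ... + n"; its one subtask is n - 1.
    """
    return [task - 1] if task > 1 else []


def amplify(model, task):
    """Overseer step: answer a task by combining the model's answers to
    easier subtasks -- no external reward function is consulted."""
    if task <= 1:
        return task  # base case a weak expert can answer directly
    sub_answers = [model.predict(sub) for sub in decompose(task)]
    return task + sum(sub_answers)  # combine subtask answers


def iterated_amplification(max_task, rounds):
    model = Model()
    for _ in range(rounds):
        # Distill: train the model to imitate its own amplified answers.
        # Descending order means each round propagates exactly one more
        # level of decomposition, so the signal builds up progressively.
        for task in range(max_task, 0, -1):
            model.train(task, amplify(model, task))
    return model


model = iterated_amplification(10, rounds=10)
print(model.predict(10))  # -> 55, i.e. 1 + 2 + ... + 10
```

After each round the model answers one more level of decomposition correctly, mirroring how the training signal for difficult problems is built from solutions to easier subproblems.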

Tomorrow’s AI Alignment Forum sequences post will be ‘AI safety without goal-directed behavior’ by Rohin Shah, in the sequence on Value Learning.

The next post in this sequence on Iterated Amplification will be ‘AlphaGo Zero and capability amplification’, by Paul Christiano, on Tuesday 8th January.
