My use of the phrase “Super-Human Feedback”

I’ve taken to calling Debate, Amplification, and Recursive Reward Modeling “Super-Human Feedback” (SHF) techniques. The point of this post is just to introduce that terminology and explain a bit about why I like it and what I mean by it.

By calling something SHF I mean that it aims to outperform a single, unaided human H at the task of providing feedback about H’s intentions for training an AI system. I like thinking of it this way because it makes clear that these three approaches are naturally grouped together, and it might inspire us to consider what else could fall into that category (a simple example is just using a team of humans).
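As a toy illustration of the “team of humans” example, here is a minimal sketch (all names and numbers are illustrative assumptions, not from the post): if each human gives the correct feedback label independently with some fixed probability, a majority vote over a small team already outperforms a single, unaided human.

```python
import random

def simulate(accuracy=0.7, team_size=5, trials=10_000, seed=0):
    """Estimate how often a single labeler vs. a majority vote of
    `team_size` independent labelers gives the correct feedback label.

    `accuracy` is an assumed per-human probability of being correct;
    the specific numbers here are illustrative, not from the post.
    """
    rng = random.Random(seed)
    single_correct = 0
    team_correct = 0
    for _ in range(trials):
        # One vote per team member: True means "gave correct feedback".
        votes = [rng.random() < accuracy for _ in range(team_size)]
        single_correct += votes[0]                     # a lone human
        team_correct += sum(votes) > team_size / 2     # majority vote
    return single_correct / trials, team_correct / trials

single, team = simulate()
print(f"single human: {single:.3f}, team majority: {team:.3f}")
```

With five independent 70%-accurate labelers, the majority vote is correct roughly 84% of the time, so even this crude aggregation already counts as “super-human” feedback in the sense above.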

I think this is very similar to “scalable oversight” (as discussed in Concrete Problems), but maybe different because:

1) It doesn’t imply that the approach must be scalable.

2) It doesn’t require that feedback be expensive, i.e. it applies to settings where human feedback is cheap, but SHF can still do better than the cheap human feedback.
