Input a corpus of text (could be multiple posts) describing technical approaches to align a powerful AI. Split this into a finite number of items that are relatively short (such as paragraphs). Ask the oracle to choose the part that is most worth spending more time on. (For example, there might be a paragraph with a dangerous hidden assumption in an otherwise promising approach, and thinking more about it might reveal that and lead to conceptual progress.)
Have a team of researches look into it for an adequate amount of time which is fixed (and told to the oracle) in advance (maybe three months?) After the time is over, have them rate the progress they made compared to some sensible baseline. Use this as the oracle’s reward.
Of course this has the problem of maximizing for apparent insight rather than actual insight.
Submission for LBO:
Input a corpus of text (could be multiple posts) describing technical approaches to align a powerful AI. Split this into a finite number of items that are relatively short (such as paragraphs). Ask the oracle to choose the part that is most worth spending more time on. (For example, there might be a paragraph with a dangerous hidden assumption in an otherwise promising approach, and thinking more about it might reveal that and lead to conceptual progress.)
Have a team of researches look into it for an adequate amount of time which is fixed (and told to the oracle) in advance (maybe three months?) After the time is over, have them rate the progress they made compared to some sensible baseline. Use this as the oracle’s reward.
Of course this has the problem of maximizing for apparent insight rather than actual insight.
Until we can measure actual insight, this will remain a problem ^_^