Vaniver comments on An overview of 11 proposals for building safe advanced AI

Vaniver 24 Oct 2022 16:04 UTC
4 points
You can try to collect feedback separately on the ‘ultimate desirability’ of consequences and the ‘practical usefulness’ of actions, building the consequence-prediction model solely from experimental data and the value-estimation model solely from human feedback. I think this runs into serious issues: in practice humans have to solve the mixed problem (judging an action by its likely consequences and by how desirable those consequences are, all at once), not the split problem, so it will be difficult for them to give well-split training data.
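As a rough illustration of the split described above, here is a toy tabular sketch in Python. Everything in it is hypothetical: the function names, the data format, and the choice to score unrated outcomes as 0.0 are assumptions made for the example, not part of any proposal. It also assumes the human ratings reflect only desirability, which is exactly the assumption doubted above.

```python
from collections import Counter, defaultdict


def fit_consequence_model(experiments):
    """Estimate P(outcome | action) from (action, outcome) experimental records."""
    counts = defaultdict(Counter)
    for action, outcome in experiments:
        counts[action][outcome] += 1
    return {
        action: {o: n / sum(c.values()) for o, n in c.items()}
        for action, c in counts.items()
    }


def fit_value_model(feedback):
    """Average the human desirability ratings given to each outcome."""
    ratings = defaultdict(list)
    for outcome, rating in feedback:
        ratings[outcome].append(rating)
    return {o: sum(r) / len(r) for o, r in ratings.items()}


def best_action(consequence_model, value_model):
    """Pick the action with the highest expected desirability."""
    def expected_value(action):
        # Assumption: outcomes with no human rating count as 0.0.
        return sum(
            p * value_model.get(o, 0.0)
            for o, p in consequence_model[action].items()
        )
    return max(consequence_model, key=expected_value)


# Experimental data only: which outcomes followed which actions.
experiments = [("a", "good"), ("a", "bad"), ("b", "good"), ("b", "good")]
# Human feedback only: how desirable each outcome is, with no mention of actions.
feedback = [("good", 1.0), ("bad", -1.0), ("good", 0.5)]

world = fit_consequence_model(experiments)
values = fit_value_model(feedback)
choice = best_action(world, values)
```

The two models never see each other's training data; they only meet in `best_action`. If raters' scores for an outcome quietly encode how easy it is to reach, the value model inherits that mixture and the clean split no longer holds.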
Also, having a solution that’s “real but expensive” would be a genuine step up from having no solution!