The biggest problem I have with a lot of these is that they require human feedback. Imagine a chess AI that receives human feedback on each move trying to compete with AlphaZero's self-play RL system, which beat every other chess engine and every human after just 72 hours of training. I just don't see how human-feedback systems can possibly compete.
You can try to separate the feedback: judge the 'ultimate desirability' of consequences apart from the 'practical usefulness' of actions, building the consequence-prediction model solely from experimental data and the value-estimation model solely from human feedback (a rough sketch of this split is below). I think this runs into serious issues: humans have to solve the mixed problem, not the split problem, so it will be difficult for them to give well-split training data.
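To make the proposed split concrete, here is a minimal Python sketch of one way it could be wired up. All names here are hypothetical illustrations, not an existing system: a consequence model fit only to experimental rollouts, a value model fit only to human ratings of outcomes, and an agent that composes them at decision time.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = int


@dataclass
class SplitAgent:
    # Trained only on observed (state, action) -> next-state data; no human input.
    consequence_model: Callable[[State, Action], State]
    # Trained only on human ratings of outcomes; no environment dynamics.
    value_model: Callable[[State], float]

    def choose(self, state: State, actions: List[Action]) -> Action:
        # Compose the two models: predict each action's outcome, then score it.
        return max(
            actions,
            key=lambda a: self.value_model(self.consequence_model(state, a)),
        )


if __name__ == "__main__":
    agent = SplitAgent(
        consequence_model=lambda s, a: (s[0] + a,),  # stand-in dynamics model
        value_model=lambda s: -abs(s[0] - 3.0),      # stand-in learned human preference
    )
    # Picks the action whose *predicted* outcome the value model scores highest.
    print(agent.choose((0.0,), actions=[0, 1, 2, 3]))  # -> 3
```

The worry above is about the training data for `value_model`: when humans rate an outcome "in isolation," their judgments tend to be entangled with how they would act to reach it, so the labels leak mixed-problem information back into the supposedly split component.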
As well, having a solution that's "real but expensive" would be a genuine step up from having no solution!