I work on AI safety via learning from human feedback. In response to your three ideas:
Uniformly random human noise actually isn’t much of a problem. It becomes a problem when the human noise is systematically biased in some way, and the AI doesn’t know exactly what that bias is. Another core problem (which overlaps with the human bias), is that the AI must use a model of human decision-making to back out human values from human feedback/behavior/interaction, etc. If this model is wrong, even slightly (for example, the AI doesn’t realize that the noise is biased along one axis), the AI can infer incorrect human values.
I’m working on it, stay tuned.
Our most capable AI systems require a LOT of training data, and it’s already expensive to generate enough human feedback for training. Limiting the pool of human teachers to trusted experts, or providing pre-training to all of the teachers, would make this even more expensive. One possible way out of this is to train AI systems themselves to give feedback, in imitation of a small trusted set of human teachers.
I work on AI safety via learning from human feedback. In response to your three ideas:
Uniformly random human noise actually isn’t much of a problem. It becomes a problem when the human noise is systematically biased in some way, and the AI doesn’t know exactly what that bias is. Another core problem (which overlaps with the human bias), is that the AI must use a model of human decision-making to back out human values from human feedback/behavior/interaction, etc. If this model is wrong, even slightly (for example, the AI doesn’t realize that the noise is biased along one axis), the AI can infer incorrect human values.
I’m working on it, stay tuned.
Our most capable AI systems require a LOT of training data, and it’s already expensive to generate enough human feedback for training. Limiting the pool of human teachers to trusted experts, or providing pre-training to all of the teachers, would make this even more expensive. One possible way out of this is to train AI systems themselves to give feedback, in imitation of a small trusted set of human teachers.