I know that AI alignment researchers don’t aim to hand-code human values into AI systems, and most aim to ‘implicitly describe human values’. Agreed.
The issue is, which human values are you trying to implicitly incorporate into the AI system?
I guess if you think that all human values are generic, computationally interchangeable, extractable (from humans) by the same methods, and incorporable into AIs by the same methods, then that could work, in principle. But if we don’t explicitly consider the whole range of human value types, how would we even test whether our generic methods would work for all relevant value types?