Hi Charlie, thanks for your comment.
Just to clarify: I agree that there would be no point in an AI flagging different value types with a little metadata flag saying ‘religious taboo’ vs ‘food preference’ unless that metadata was computationally relevant to the kinds of learning, inference, generalization, and decision-making that the AI did. But my larger point was that humans treat these value types very differently in terms of decision-making (especially in social contexts), so true AI alignment would require that AI systems do too.
I wasn’t picturing human programmers designing value representations by hand for each value type. I don’t know how to take the heterogeneity of value types seriously when developing AI systems. I was just arguing that we need to solve that problem somehow, if we actually want AI to act in accordance with the way that humans treat different types of values differently.