I checked out some of your posts (haven’t read 100% of them): Learning Normativity: A Research Agenda and Non-Consequentialist Cooperation?
You draw a distinction between human values and human norms. For example, an AI can respect someone’s autonomy before it gets to know their values and exactly how much autonomy they want.

I draw a similar distinction, but at a more abstract level: between human values and properties of any system or task. An AI can commit to keeping certain properties of its reward system intact before it gets to know human values.

I think an AI could learn important properties of systems even in very simple games, and that this would significantly help it respect human values.
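To make the idea concrete, here is a minimal toy sketch (entirely my own construction, not from your posts; all names and the "tamper" action are invented for illustration): an agent in a simple game that preserves one property of its reward system, namely "never take an action that rewrites the reward function", even though its current estimate of human values is crude and possibly wrong.

```python
# Toy sketch: the agent filters out reward-tampering actions *before*
# maximizing, so the invariant holds no matter how bad its value
# estimate is. "tamper" stands in for any action that would overwrite
# the reward system.

ACTIONS = ["left", "right", "tamper"]

def tampers_with_reward(action):
    """Property check: does this action modify the reward system?"""
    return action == "tamper"

def choose_action(estimated_reward):
    """Pick the best action among those that keep the reward system intact."""
    safe_actions = [a for a in ACTIONS if not tampers_with_reward(a)]
    return max(safe_actions, key=estimated_reward)

# Even though the (deliberately wrong) estimate assigns huge value to
# tampering, the agent never picks "tamper".
estimate = lambda a: {"left": 0.1, "right": 0.3, "tamper": 99.0}[a]
print(choose_action(estimate))  # -> right
```

The point of the sketch is just that the invariant is enforced structurally (by filtering the action set) rather than learned from the reward signal itself, which is what lets it hold before the AI knows the true values.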