It seems worth reflecting on the fact that the foundational LW material on utility functions was meant to make people better at reasoning about AI behavior, not about human behavior.
For the value extrapolation problem, you need to consider both what an AI could do with a goal (how to use it, what kind of thing it is) and which goal represents humane values (how to define it).
I still think there’s too much confusion here between ethics-for-AI and ethics-for-humans discussions. There’s no particular reason that a conceptual apparatus suited to the former should also be suited to the latter.
Yep. Particularly as humans are observably not human-friendly. (Even in terms of preserving human notions of value: plenty of humans go dangerously nuts.)