Agreed. The problem is with AI designs which don’t do that. It seems to me like this perspective is quite rare. For example, my post Policy Alignment was about something similar to this, but I got a ton of pushback in the comments—it seems to me like a lot of people really think the AI should use better AI concepts, not human concepts. At least they did back in 2018.
As you mention, this is partly due to overly reductionist world-views. If tables/happiness aren’t reductively real, the fact that the AI is using those concepts is evidence that it’s dumb/insane, right?
Illustrative excerpt from a comment there:
From an “engineering perspective”, if I was forced to choose something right now, it would be an AI “optimizing human utility according to AI beliefs” but asking for clarification when such choice diverges too much from the “policy-approval”.
Probably most of the problem was that my post didn’t frame things that well—I was mainly talking in terms of “beliefs”, rather than emphasizing ontology, which makes it easy to imagine AI beliefs are about the same concepts but just more accurate. John’s description of the pointers problem might be enough to re-frame things to the point where “you need to start from human concepts, and improve them in ways humans endorse” is bordering on obvious.
(Plus I arguably was too focused on giving a specific mathematical proposal rather than the general idea.)