I think the capabilities of the AI matter a lot for alignment strategies, and that’s why I’m asking you about it and why I need you to answer that question.
A subhuman intelligence would rely on humans to make most of the decisions. It would order human-designed furniture types through human-created interfaces and receive human-fabricated furniture. At each of those steps, it delegates an enormous number of decisions to humans, which makes those decisions automatically end up reasonably aligned, but also prevents the AI from doing optimization over them. In the particular case of human-designed interfaces, they tend to automatically expose information about the things that humans care about, and eliciting human preferences can be shortcut by focusing on these dimensions.
But a superhuman intelligence would solve tasks by taking actions independently of humans, since that allows it to optimize the outcomes more highly. And a solution for alignment that relies on humans making most of the decisions would presumably not generalize to this case, where the AI makes most of the decisions.
I think there are intermediate cases—delegating some but not all decisions—that require this sort of tooling. See, e.g., this paper from today, which focuses on how to learn intent: http://ai.googleblog.com/2022/04/simple-and-effective-zero-shot-task.html