I think that right now we don’t know how to bridge the gap between the thing that presses the buttons on the computer and a fuzzy specification of a human as a macroscopic physical object. So if you define “human” as the thing that presses the buttons, and you can take actions that fully control which buttons get pressed, it makes sense that there isn’t necessarily a well-defined answer to what this “human” wants.
If we actually start bridging the gap, though, I think it makes lots of sense for the AI to start building up a model of the human-as-physical-object which also takes into account button presses, and in that case I’m not too pessimistic about regularization.
I think of the example as illustrative, but the real power of the argument comes from the planner+reward formalism and the associated impossibility theorem. The fact that Kolmogorov complexity doesn’t help is worrying. It’s possible that other regularization techniques work where Kolmogorov complexity doesn’t, but that raises the question of what is so special about those other regularization techniques.
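To make the degeneracy concrete, here is a toy sketch (everything in it — the three-action policy, the reward function — is invented for illustration, not taken from the formalism’s actual statement): the same observed policy can be decomposed as a rational planner maximizing a reward R, or as an anti-rational planner minimizing the negated reward, and the second decomposition is barely more complex than the first, which is roughly why a simplicity prior fails to pick out the intended one.

```python
# Toy illustration of the planner+reward degeneracy.
# A "policy" maps an observation to an action; we try to explain it
# as planner(reward), and find two incompatible decompositions that
# both reproduce it exactly.

def policy(obs):
    # the observed behavior we want to decompose (arbitrary toy rule)
    return max(range(3), key=lambda a: (obs + a) % 3)

# Decomposition 1: a rational planner maximizing reward R.
def R(obs, a):
    return (obs + a) % 3

def rational(reward):
    return lambda obs: max(range(3), key=lambda a: reward(obs, a))

# Decomposition 2: an anti-rational planner minimizing the negated reward.
def neg_R(obs, a):
    return -R(obs, a)

def anti_rational(reward):
    return lambda obs: min(range(3), key=lambda a: reward(obs, a))

# Both decompositions reproduce the same policy on every input,
# yet they attribute opposite "desires" to the agent.
for obs in range(10):
    assert rational(R)(obs) == policy(obs) == anti_rational(neg_R)(obs)
```

The point of the sketch is that `neg_R` plus "minimize" costs only a sign flip over `R` plus "maximize", so any regularizer that penalizes description length nearly equally penalizes both decompositions.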
Suppose we start our AI off with the intentional stance, where we have a high-level description of these human objects as agents with desires and plans, beliefs and biases and abilities and limitations.
What I mean by needing to “bridge the gap” is this: if we knew what we were doing, we could stipulate that some sets of human button-presses are more aligned than others with some complicated object “hDesires”, and that the robot should care about hDesires — where hDesires is the part of the intentional-stance description of the physical human that plays the functional role of desires.