I think of the example as illustrative, but the real power of the argument comes from the planner+reward formalism and the associated impossibility theorem. The fact that Kolmogorov complexity doesn’t help is worrying. It’s possible that other regularization techniques work where Kolmogorov complexity doesn’t, but that raises the question of what is so special about those other techniques.
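To make the degeneracy concrete, here is a toy one-step sketch of my own, not the formalism itself. All names in it (`ACTIONS`, `rational_planner`, `anti_rational_planner`, `make_indifferent_planner`) are hypothetical. It shows three different planner+reward pairs that all produce the same observed button-press, so the behavior alone cannot tell you which reward is the "real" one.

```python
# Toy, one-step illustration (an assumption-laden sketch, not the actual theorem):
# several (planner, reward) decompositions reproduce the same observed behavior.

ACTIONS = ["press_a", "press_b"]

def reward(action):
    # Hypothetical "true" reward: the human prefers press_a.
    return {"press_a": 1.0, "press_b": 0.0}[action]

def neg_reward(action):
    # The exact opposite preferences.
    return -reward(action)

def zero_reward(action):
    # No preferences at all.
    return 0.0

def rational_planner(r):
    # Picks the action that maximizes the given reward.
    return max(ACTIONS, key=r)

def anti_rational_planner(r):
    # Picks the action that minimizes the given reward.
    return min(ACTIONS, key=r)

def make_indifferent_planner(fixed_action):
    # Ignores the reward entirely and always emits the same action.
    def planner(r):
        return fixed_action
    return planner

observed = "press_a"  # the behavior we actually see

decompositions = {
    "rational + R": rational_planner(reward),
    "anti-rational + (-R)": anti_rational_planner(neg_reward),
    "indifferent + 0": make_indifferent_planner(observed)(zero_reward),
}

for name, action in decompositions.items():
    print(f"{name}: {action}")
```

All three pairs are also about equally short to write down, which is roughly why a simplicity prior has a hard time preferring the first one.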
Suppose we start our AI off with the intentional stance, where we have a high-level description of these human objects as agents with desires and plans, beliefs and biases and abilities and limitations.
When I say we need to “bridge the gap,” I mean that if we knew what we were doing, we could stipulate that some set of human button-presses is more aligned than not with some complicated object “hDesires,” and that the robot should care about hDesires, where hDesires is the part of the intentional-stance description of the physical human that plays the functional role of desires.