Thanks! Pulling on that thread a bit more, compare:
My goal is that the human overseer achieves her goals. To accomplish this, I need to observe and interact with the human to understand her better—what kind of food she likes, how she responds to different experiences, etc. etc.
My goal is to maximize the speed of this racecar. To accomplish this, I need to observe and interact with the racecar to understand it better—how its engine responds to different octane fuels, how its tires respond to different weather conditions, etc. etc.
To me, they don’t seem that different on a fundamental level. But they do have the super-important practical difference that the first one doesn’t seem to have problematic instrumental subgoals.
(I think I’m just agreeing with your comment here?)
Thanks! Pulling on that thread a bit more, compare:
My goal is that the human overseer achieves her goals. To accomplish this, I need to observe and interact with the human to understand her better—what kind of food she likes, how she responds to different experiences, etc. etc.
My goal is to maximize the speed of this racecar. To accomplish this, I need to observe and interact with the racecar to understand it better—how its engine responds to different octane fuels, how its tires respond to different weather conditions, etc. etc.
To me, they don’t seem that different on a fundamental level. But they do have the super-important practical difference that the first one doesn’t seem to have problematic instrumental subgoals.
(I think I’m just agreeing with your comment here?)
Yeah, I think that’s basically right.