I think there is a bit of a rhetorical issue here with the necessity argument: I agree that a powerful program aligned to a person would have an accurate internal model of that person, but I think this is true by default whenever a powerful, goal-seeking program interacts with a person. It's just one of the default instrumental subgoals, not alignment-specific.
There's a difference between building a model of a person and using that model as a core element of your decision-making algorithm. So what you're describing seems even weaker than weak necessity.
However, I agree that some of the ideas I've sketched are pretty loose. I'm only trying to provide a conceptual frame and work out some of its implications.