My conclusion, then, would be that the intermediate version is true but the strong version is false. Would you say that’s an accurate summary?
I’m not sure I fully follow the conclusion, but I’ll take a shot at answering. Correct me if it seems like I’m talking past you.
Taking Y to be some notion of human values, I think it’s true both that Y actively decreases and that Y becomes harder for us to optimize. I think both are caused by the agent’s drive to take power and resources from us. If agents had no such drive, we might expect only “evil” objectives to induce catastrophically bad outcomes.
I should’ve specified that the strong version is “Y decreases relative to a world where neither X nor Y is being optimized”. Am I right that this version is not true?
I don’t immediately see why this wouldn’t be true as well as the “intermediate version”. Can you expand?
If X is “number of paperclips” and Y is something arbitrary that nobody optimizes, such as the ratio of the number of bicycles on the moon to the number of flying horses, optimizing X should be equally likely to increase or decrease Y in expectation. Otherwise “1-Y” would move in the opposite direction, and since “1-Y” is just as arbitrary as Y, that can’t be true by symmetry. But if Y is something like “number of happy people”, Y will probably decrease, because the world is already set up to keep Y high and a misaligned agent could disturb that state.
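To make the two cases concrete, here is a rough toy simulation. It is only a sketch under strong assumptions that are not part of the argument above: each candidate world gets independent Gaussian scores for X and Y, the X-optimizer simply picks the candidate with the highest X, and a “maintained” Y is modeled by starting from the candidate with the highest Y. The function name `mean_change_in_y` and its parameters are made up for illustration.

```python
import random
import statistics


def mean_change_in_y(y_is_maintained, trials=2000, candidates=50, seed=0):
    """Average change in Y when an agent optimizes X and ignores Y.

    Toy assumption: X and Y scores are independent standard Gaussians
    across candidate worlds.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(trials):
        # Each candidate world is an (x, y) pair with independent scores.
        worlds = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(candidates)]
        if y_is_maintained:
            # The status quo is already arranged to keep Y high.
            baseline_y = max(y for _, y in worlds)
        else:
            # Nobody optimizes Y, so the status quo is an arbitrary world.
            baseline_y = worlds[0][1]
        # The X-optimizer picks the world with the highest X, ignoring Y.
        _, optimized_y = max(worlds, key=lambda world: world[0])
        changes.append(optimized_y - baseline_y)
    return statistics.mean(changes)


if __name__ == "__main__":
    print("arbitrary Y: ", mean_change_in_y(y_is_maintained=False))
    print("maintained Y:", mean_change_in_y(y_is_maintained=True))
```

Under these assumptions the average change for an arbitrary Y should hover near zero, while a maintained Y should drop sharply, since the optimizer moves the world away from a state that was selected for high Y. The toy model says nothing about how hard Y becomes to optimize afterwards.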
That makes sense, thanks. I then agree that it isn’t always true that Y actively decreases, but it should generally become harder for us to optimize. This is the difference between a utility decrease and an attainable utility decrease.