TurnTrout comments on Seeking Power is Often Convergently Instrumental in MDPs

TurnTrout 5 Dec 2019 22:01 UTC

My conclusion would be that the intermediate version is true but the strong version false then. Would you say that’s an accurate summary?

I’m not totally sure I fully follow the conclusion, but I’ll take a shot at answering—correct me if it seems like I’m talking past you.

Taking $Y$ to be some notion of human values, I think it’s both true that $Y$ actively decreases and $Y$ becomes harder for us to optimize. Both of these are caused, I think, by the agent’s drive to take power / resources from us. If this weren’t true, we might expect to see only “evil” objectives inducing catastrophically bad outcomes.
- SoerenMind 6 Dec 2019 12:13 UTC
  I should’ve specified that the strong version is “Y decreases relative to a world where neither X nor Y is being optimized”. Am I right that this version is not true?
  - TurnTrout 6 Dec 2019 14:01 UTC
    I don’t immediately see why this wouldn’t be true as well as the “intermediate version”. Can you expand?
    - SoerenMind 7 Dec 2019 15:35 UTC
      If X is “number of paperclips” and Y is something arbitrary that nobody optimizes, such as the ratio of the number of bicycles on the moon to the number of flying horses, optimizing X should be as likely to increase Y as to decrease it, leaving Y unchanged in expectation. Otherwise “1-Y”, which is just as arbitrary, would move in the opposite direction in expectation, which can’t be true by symmetry. But if Y is something like “number of happy people”, Y will probably decrease, because the world is already set up to keep Y up and a misaligned agent could disturb that state.
      - TurnTrout 7 Dec 2019 17:04 UTC
        That makes sense, thanks. I then agree that it isn’t always true that $Y$ actively decreases, but it should generally become harder for us to optimize. This is the difference between a utility decrease and an attainable utility decrease.
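
The symmetry argument about an arbitrary Y, and the distinction between a utility decrease and an attainable utility decrease, can be made concrete with a toy Monte Carlo sketch. Every modelling choice below is an illustrative assumption, not the formal MDP model of the post: outcomes get independent uniform random scores for X and Y, the X-optimizer simply picks the outcome with the highest X, "the world is already set up to keep Y up" is modelled as a baseline that lands on the best-Y of several outcomes, and "taking power from us" is modelled as leaving us able to reach only a few outcomes afterwards. The function `simulate` and its parameters (`n_outcomes`, `n_left_for_us`, `baseline_tries`) are hypothetical names.

```python
import random
import statistics


def simulate(n_outcomes=20, n_left_for_us=3, baseline_tries=1, trials=20_000, seed=0):
    """Toy model; every modelling choice here is a hypothetical assumption.

    Returns the mean change, relative to a baseline world where nobody
    optimizes X, in Y, in 1 - Y, and in attainable Y (the best Y we can
    still reach).
    """
    rng = random.Random(seed)
    outcomes = range(n_outcomes)
    d_y, d_not_y, d_attainable = [], [], []
    for _ in range(trials):
        # X and Y are independent, i.i.d. uniform scores over outcomes.
        x = [rng.random() for _ in outcomes]
        y = [rng.random() for _ in outcomes]
        # Baseline world. baseline_tries == 1: an arbitrary outcome (nobody
        # optimizes Y). baseline_tries > 1: the status quo is "set up to keep
        # Y up" and lands on the best-Y of several distinct outcomes.
        baseline = max(rng.sample(outcomes, baseline_tries), key=y.__getitem__)
        # The X-optimizer steers the world to the outcome with the highest X.
        chosen = max(outcomes, key=x.__getitem__)
        d_y.append(y[chosen] - y[baseline])
        d_not_y.append((1 - y[chosen]) - (1 - y[baseline]))
        # Attainable Y. Baseline: we keep the resources to reach any outcome.
        # After the agent takes them, we can reach only `chosen` plus
        # n_left_for_us - 1 other outcomes.
        others = rng.sample([i for i in outcomes if i != chosen], n_left_for_us - 1)
        best_after = max(y[i] for i in [chosen, *others])
        d_attainable.append(best_after - max(y))
    return (
        statistics.mean(d_y),
        statistics.mean(d_not_y),
        statistics.mean(d_attainable),
    )


if __name__ == "__main__":
    print("Y arbitrary:        ", simulate(baseline_tries=1))
    print("status quo favors Y:", simulate(baseline_tries=5))
```

Under these assumptions, with `baseline_tries=1` the mean change in Y should come out near zero and the change in 1 - Y is its exact negation, yet attainable Y still drops (in expectation by 20/21 - 3/4, about 0.2, for the defaults). With `baseline_tries=5`, Y itself also decreases on average, in expectation by 5/6 - 1/2 = 1/3.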