Problem: not only will such an AI not resist its utility function being altered by you, it will also not resist its utility function being altered by a saboteur or by accident. I don’t think I’d want to call this proposal a form of value learning, since it does not involve the AI trying to learn values, and instead just makes the AI hold still while values are force-fed to it.
The AI will not resist its values being changed in the particular way that is specified to trigger a U transition. It will resist other changes of value.
That’s true; it will resist changes to its “outer” utility function U. But it won’t resist changes to its “inner” utility function v, which still leaves a lot of flexibility, even though that isn’t its true utility function in the VNM sense. That restriction isn’t strong enough to avoid the problem I pointed out above.
I will only allow v to change if that change triggers the “U adaptation” (the adding and subtracting of constants). You have to specify which processes count as U adaptations (certain types of conversations with certain people, e.g.) and which don’t.
Oh, I see. So the AI simply losing the memory that v was stored in and replacing it with random noise shouldn’t count as something it will be indifferent about? How would you formalize this so that arbitrary changes to v don’t trigger the indifference?
By specifying what counts as an allowed change in U, and making the agent into a U maximiser. Then, just as standard maximisers defend their utilities, it should defend U (including the allowed update, and only that update).
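The “adding and subtracting of constants” mechanism can be sketched concretely. Note that in the actual proposal the compensating constant equates *expected* utilities at the moment of transition; this toy version equates utilities at a single reference outcome for brevity, and every name in it is hypothetical:

```python
# Sketch of utility indifference via compensating constants: when the
# inner utility v is replaced through an *allowed* channel, the outer
# utility U absorbs a constant so that U is unchanged by the switch.
# The agent is therefore indifferent to allowed changes, while any other
# change to v simply lowers U and would be resisted by a U-maximiser.

class IndifferentAgent:
    def __init__(self, v):
        self.v = v          # current inner utility function
        self.offset = 0.0   # accumulated compensation constants

    def U(self, outcome):
        # Outer utility: inner utility plus the compensation offset.
        return self.v(outcome) + self.offset

    def allowed_update(self, new_v, reference_outcome):
        # Triggered only by whitelisted processes (e.g. certain
        # conversations with certain people). The constant is chosen so
        # that U(reference_outcome) is identical before and after.
        self.offset += self.v(reference_outcome) - new_v(reference_outcome)
        self.v = new_v


v1 = lambda o: 2.0 * o       # old inner utility
v2 = lambda o: 10.0 - o      # new inner utility

agent = IndifferentAgent(v1)
before = agent.U(3.0)                          # 2*3 + 0 = 6.0
agent.allowed_update(v2, reference_outcome=3.0)
after = agent.U(3.0)                           # (10-3) + (-1) = 6.0
```

The point of the sketch is that the `offset` leaves the agent nothing to gain or lose from an allowed v transition, which is exactly why (as the objections in this thread note) it also has nothing to lose from a v transition pushed through the allowed channel by an attacker.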
I think there is a genuine problem here… the AI imposes no obstacle to “trusted programmers” changing its utility function. But apart from the human difficulties (the programmers could be corrupted by power, make mistakes, etc.), what stops the AI manipulating the programmers into changing its utility function, e.g. changing a hard-to-satisfy v into some w which is very easy to satisfy and gives it a very high score?
You can’t always solve human problems with AI design.
I’m not sure what you mean. The problem I was complaining about is an AI design problem, not a human problem.
No, I would say that if you start entering false utility data into the AI and it believes you, because after all it was programmed to be indifferent to new utility data, that’s your problem.
If the AI’s utility function changes randomly for no apparent reason because the AI has literally zero incentive to make sure that doesn’t happen, then you have an AI design problem.
It didn’t change for no reason. It changed because someone fed new data into the AI’s utility-learning algorithm which made it change. Don’t give people root access if you don’t want them using it!
Being changed by an attacker is only one of the scenarios I was suggesting. And even then, presumably you would want the AI to help prevent them from hacking its utility function if they aren’t supposed to have root access, but it won’t.
Anyway, that problem is just a little bit stupid. But you can also get really stupid problems, like the AI wants more memory, so it replaces its utility function with something more compressible so that it can scavenge from the memory where its utility function was stored.