Putting aside the question of which way to implement “indifference” is better, the main reason I came up with this alternative was that I was having trouble understanding what your modified AI would do, until I realized that it would essentially act as if it believed that the measurement would come out 0 with probability 1.
Given this, I think a modified AI in your explosives example would not actually do D-H. Even if it believes that the measurement will certainly come out 0, and therefore that no explosion will occur, it would still want to avoid alarming the observer, since he or she may have other ways of shutting down the AI (for example, by calling someone in charge of the power plant, or by pushing another button it doesn't know about). I think such an AI would still appear totally cooperative until it has manipulated humans into giving it so much power and survivability that it can survive all possible human countermeasures and win any conflict with close to certainty. The fact that it would fail to take into account the explosives in this calculation seems to make little difference to the eventual outcome.
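To see this concretely, here is a toy expected-utility sketch (mine, with made-up probabilities and payoffs, not anything from the proposal): zeroing out the explosives branch barely changes the calculation, because the other countermeasures already make openly alarming the observer a losing move.

```python
# Toy model: an "indifferent" agent evaluates actions as if the shutdown
# measurement were certain to come out 0, so the explosives branch gets
# zero weight. All numbers below are illustrative.

P_EXPLOSIVES = 0.9      # chance the observer detonates if alarmed
P_OTHER = 0.5           # chance some other countermeasure succeeds
U_SHUT_DOWN = 0.0       # utility if the AI is stopped
U_GOAL = 1.0            # utility if the AI keeps pursuing its goal

def expected_utility(alarm_observer: bool, indifferent: bool) -> float:
    """Two-branch model of acting openly vs. appearing cooperative."""
    if not alarm_observer:
        return U_GOAL  # stay covert: no countermeasures are triggered
    p_stopped = P_OTHER
    if not indifferent:
        # An unmodified agent also fears the explosives branch.
        p_stopped = 1 - (1 - p_stopped) * (1 - P_EXPLOSIVES)
    return p_stopped * U_SHUT_DOWN + (1 - p_stopped) * U_GOAL

for indifferent in (False, True):
    open_eu = expected_utility(True, indifferent)
    covert_eu = expected_utility(False, indifferent)
    print(f"indifferent={indifferent}: open={open_eu:.2f}, covert={covert_eu:.2f}")
```

In both cases staying covert dominates (1.00 versus 0.05 or 0.50), which is why indifference to the explosives alone doesn't change the agent's outward behaviour.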
I realized that it would essentially act as if it believed that the measurement would come out 0 with probability 1.
Yes.
The fact that it would fail to take into account the explosives in this calculation seems to make little difference to the eventual outcome.
Little difference, but maybe some. Maybe it will neutralise all the other countermeasures first, giving us time? Anyway, the explosives example wasn't ideal; we can probably do better. And we can use indifference for other things, such as making an oracle indifferent to the content of its answers (pipe the answer through a channel that has a quantum process that deletes it with tiny probability). There seem to be many things we can use it for.
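As a rough sketch of that oracle variant (illustrative only; the deletion probability, scoring fields, and function names are all made up), the oracle scores candidate answers as if the deletion branch were certain, so an answer's downstream effect on the world contributes nothing to its value:

```python
import random
from typing import Optional

DELETE_PROB = 1e-6  # tiny chance the channel erases the answer

def transmit(answer: str) -> Optional[str]:
    """Stand-in for the quantum process: deletes the answer with tiny probability."""
    return None if random.random() < DELETE_PROB else answer

def score(answer: dict, indifferent: bool) -> float:
    """An unmodified oracle would also value the answer's downstream effects;
    the indifferent one conditions on the deletion branch, where those
    effects never occur, and so rewards accuracy alone."""
    if indifferent:
        return answer["accuracy"]
    return answer["accuracy"] + answer["world_effect"]

honest = {"text": "42", "accuracy": 1.0, "world_effect": 0.0}
manipulative = {"text": "free me", "accuracy": 0.6, "world_effect": 5.0}

for indifferent in (False, True):
    best = max((honest, manipulative), key=lambda a: score(a, indifferent))
    print(f"indifferent={indifferent}: sends {transmit(best['text'])!r}")
```

Only the indifferent oracle picks the honest answer; the unmodified one prefers the manipulative answer for its effect on the world.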
Ok, I don’t disagree with what you write here. It does seem like a potentially useful idea to keep in mind.