Rob Bensinger comments on The Hidden Complexity of Wishes

Rob Bensinger 31 Aug 2013 19:30 UTC
2 points
A relatively non-scary possibility: The AI destroys itself, because that’s the best way to ensure it doesn’t positively ‘affect’ others in the intuitive sense you mean. (Though that would still of course have effects, so this depends on reproducing in AI our intuitive concept of ‘side-effect’ vs. ‘intended effect’....)

Scarier possibilities, depending on how we implement the goal:
- the AI doesn’t kill you and then simulate you; rather, it kills you and then simulates a single temporally locked frame of you, to minimize the possibility that it (or anything) will change you.
- the AI just kills everyone, because a large and drastic change now reduces to ~0 the probability that it will cause any larger perturbations later (e.g., when humans might have a big galactic civilization that it would be a lot worse to perturb).
- the AI has a model of physics on which all of its actions (eventually) have a roughly equal effect on the atoms that at present compose human beings. So it treats all its possible actions (and inactions) as equivalent, and ignores your restriction in making decisions.
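
A toy sketch of that last scenario, under loudly stated assumptions: the action names, utilities, and impact scores below are invented for illustration, and `choose` is a hypothetical helper, not anyone's proposed design. It only shows that a penalty which comes out equal for every action cannot change which action wins.

```python
def choose(actions, utility, impact, weight=1.0):
    """Pick the action maximizing utility minus a weighted impact penalty."""
    return max(actions, key=lambda a: utility[a] - weight * impact[a])

# Hypothetical actions and made-up numbers, purely for illustration.
actions = ["do_nothing", "cure_disease", "seize_resources"]
utility = {"do_nothing": 0.0, "cure_disease": 5.0, "seize_resources": 9.0}

# A penalty that differs between actions can change the choice...
graded_impact = {"do_nothing": 0.0, "cure_disease": 1.0, "seize_resources": 8.0}
# ...but a penalty the AI's physics model makes uniform cannot.
uniform_impact = {a: 3.0 for a in actions}

unrestricted = choose(actions, utility, impact={a: 0.0 for a in actions})
with_graded = choose(actions, utility, graded_impact)
with_uniform = choose(actions, utility, uniform_impact)

print(unrestricted, with_graded, with_uniform)
```

With the uniform penalty every score shifts by the same constant, so the ranking of actions is the same as with no restriction at all.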
- Kawoomba 31 Aug 2013 20:19 UTC
  0 points
  Yes, implementing such a goal is not easy and has pitfalls of its own. However, it’s probably easier than the alternative, since a metric for “no large scale effects” seems easier to formalize than “human friendliness”, where we have little idea of what that is even supposed to mean.