I also very much liked the idea that a self-improving AI would probably wirehead itself. It never occurred to me, and it makes a lot of sense.
This idea is intuitively plausible, but it doesn’t hold up for rational actors that value states of the world rather than states of their own minds. Consider a paperclip maximizer with the goal “make the number of paperclips in the universe as great as possible”. Would it rather a) make paperclips, or b) wirehead itself into believing the universe is already full of paperclips? Before it wireheads, it knows that option a) leads to more paperclips, so that is what it does. Similarly, I would rather actually help people than feel the warm glow of helping without doing any actual helping.
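To make the distinction concrete, here is a toy sketch in Python (the names and numbers are invented purely for illustration, not any real agent design): the agent scores each candidate action by the world state its model predicts, not by the belief state it would end up in afterwards, so the wirehead option loses.

```python
def predicted_world(action, world):
    """The agent's model of how each action changes the world."""
    world = dict(world)
    if action == "make_paperclips":
        world["paperclips"] += 1  # actually adds a paperclip to the world
    elif action == "wirehead":
        pass                      # only the agent's belief would change, not the world
    return world

def utility(world):
    """Utility is defined over the world itself, not over beliefs about it."""
    return world["paperclips"]

world = {"paperclips": 0}
best = max(["make_paperclips", "wirehead"],
           key=lambda a: utility(predicted_world(a, world)))
print(best)  # -> "make_paperclips": wireheading predicts no additional paperclips
```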
Easier said than done. Valuing the state of the world is hard; you have to rely on your senses.
Well, yes, but behind the scenes you need a sensible symbolic representation of the world, with explicitly demarcated levels of abstraction. When the system is pathing between ‘the world now’ and ‘the world it wants to get to’, the worlds in which it merely believes there are a lot of paperclips sit in a very different part of state space from the worlds that actually contain the most paperclips, and only the latter is what it’s aiming for. Being unable to tell the two apart would be a bug in the seed AI, and one which would not show up later if it did not exist in the original.
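A hand-rolled sketch of that demarcation (the structure below is invented for illustration, not a claim about how a seed AI would actually be built): the state type keeps “what the world contains” and “what the agent believes” as separate fields, and the goal test only ever reads the world field, so no amount of editing the belief field satisfies it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    actual_paperclips: int    # property of the world
    believed_paperclips: int  # property of the agent's mind

def goal(state):
    # The goal refers only to the world component of the state.
    return state.actual_paperclips >= 100

wireheaded = State(actual_paperclips=0, believed_paperclips=10**6)
productive = State(actual_paperclips=100, believed_paperclips=100)

print(goal(wireheaded))  # False: believing it doesn't make it so
print(goal(productive))  # True
```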