A more sensible value set for it to have is that it just likes paperclips and wants lots and lots of them to exist
OK… I see lots of inferential distance to cover here.
I don’t think that anyone thinks a paperclip maximiser as such is likely. It’s simply an arbitrary point taken out of the “recursive optimisation process” subset of mind design space. It’s chosen to give an idea of how alien and dangerous minds can be and still be physically plausible, not as a typical example of the minds we think will actually appear.
That aside, there’s no particular reason to expect that a typical utility maximiser will have a “sensible” utility function. Its utility function might have some sensible features if programmed in explicitly by a human, but if it was programmed by an uncontrolled AI… forget it. You don’t know how much the AI will have jumped around value space before deciding to self-modify into something with a stable goal.
Oh indeed. And it is always good to try to avoid making anthropocentric assumptions.
But, in this case, we're looking not just at a single AI, but at the aims of a group of AIs. Specifically, the first few AIs to escape or be released onto the internet, other than the seeded core. And it would seem likely, especially in the case of AIs created deliberately and then deliberately released, that their initial value set will have some intentionality behind it, rather than resulting from a random corruption of a file.
So yes, to be stable a society of AIs would need to be able to cope with one or two new AIs entering the scene whose values are either irrational or, worse, deliberately tailored to be antithetical (such as one whose ‘paperclips’ are pain and destruction for all Zoroastrians—an end achievable by blowing up the planet.)
But I don’t think the idea is invalidated just because such a society could not cope with all of the new AIs (or even a majority of them) having such values.