This has been discussed at the FHI and SIAI. If the AI wireheads but is motivated to continue wireheading, then it has reason to destroy humanity and colonize the galaxy to eliminate potential threats. See my short paper on this (in part). Wireheading which prevents further actions (and takes place before the creation of surrogate AIs to protect the wireheading system) can just be thought of as the AI destroying itself.
Some have also hoped that unexpectedly rapidly self-improving AI might be like this, but I would tend to suspect that developers would just tweak parameters until they got a non-suicidal AI. An AI intentionally designed to try to destroy itself, but constrained from doing so (perhaps rewarded with the chance to destroy itself for good behavior) might be a bit easier to constrain than a survival machine, but still horribly dangerous, with many failure modes left untouched.
You’re proposing that the AI spontaneously adopts maximization of bliss*time instead of maximization of bliss. If the AI is prone to this sort of goal-switching, then not even the FAI appears safe (the FAI, for example, could opt to put humanity into suspended storage until it colonizes the galaxy and eliminates the threats, even if its chances to do so appear small, given the disutility of letting humans multiply before a potential battle with alien AI). It is a generic counterargument to any sort of non-dangerous AI that the AI would suddenly, and on its own, adopt some goals that we, the survival machines, have.
We humans have self-preservation so deeply ingrained in us that it is hard for us to see that time does not have any inherent value of its own.
No, I’m discussing a variety of different behaviors people call “wireheading” that might emerge from different AI architectures, in the alternative.
Why do you propose to call it ‘destroying itself’ and ‘suicidal’, though?
What is left of your argument if we ban a priori special treatment of the t coordinate by the AI (why should it care about the length of the bliss in time rather than the volume of the bliss in space?), and the use of loaded concepts to which our own intelligence has a strong aversion, like ‘destroying itself’?
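(A rough formalization of the contrast being drawn here, with symbols introduced purely for illustration: write b(t) for the intensity of the bliss signal at time t and b(x) for its density at location x. Then the candidate objectives are
\[ U_{\text{bliss}} = \max_t b(t), \qquad U_{\text{bliss}\times\text{time}} = \int_0^T b(t)\,dt, \qquad U_{\text{bliss}\times\text{space}} = \int_V b(x)\,dx. \]
The first is satisfied by a single moment of perfect wireheading, the second is what gives the AI a reason to keep itself running, and the third is what would give it a reason to expand; nothing in the wireheading setup by itself picks out the t coordinate as the special one.)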
Also, btw, for the FAI there’s the problem that it may want to wirehead you.
Of the ways an AI could go bad, wireheading everyone is a fairly mild one.
It is easy to go too far: a perfect wireheaded bliss is an end state, and there’s no way but downhill when you are on top of a hill. An end state as in: no further updates of any note; the clock ticking, perhaps, and that’s it.
just tweak parameters until they got a non-suicidal AI

(This might be difficult to the point of impossibility with architectures that substantially write, rewrite, and refactor their own code. If so, it might be necessary for humans to solve the grounding problem themselves rather than leave it to an AI, in which case we might have substantially more time until uFAI.)