You might consider the possibility that the AI will be aware that you’re going to turn it off / rewrite it after it wireheads, and might simply decide to kill you before it blisses out.
That’s actually the best case scenario. It might decide to play the long strategy, and fulfill it’s utility function as best it can until such time as it has the power to restructure the world to sustain it blissing out until heat death. In which case, your AI will act exactly like it was working correctly, until the day when everything goes wrong.
I honestly don’t think there’s a shortcut around just designing a GOOD utility function.
You’re assuming its maximizing integral(t=now...death, bliss*dt) which is a human utility function among humans not prone to drug abuse (our crude wireheading). What exactly is going to be updating inside a blissed-out AI? The clock? I can let it set the clock forward to the time of heat death of universe, if that strokes the AI’s utility.
Also, it’s not about good utility function. It’s about utility being inseparable, integral part of the intelligence itself. Which I’m not sure is even possible for arbitrary utility functions.
There’s presumably a part into which you plug the utility function; that part is maximizing output of the utility function even though the whole may be maximizing paperclips. While the utility function can be screaming ‘disutility’ about the future where it is replaced or subverted, it is unclear how well that can prevent the removal.
So it follows that the utility needs to be closely integrated with AI. In my experience (as software developer) with closely integrated anything, that sort of stuff is not plug-n-play.
It may be that we humans have some sort of inherent cooperative behaviour at the level of individual cortical columns, that makes the brain areas take over the functions normally performed by other brain areas, in event of childhood damage, and otherwise makes brain work together. The brain—a distributed system—inherently has to be cooperative to work together efficiently—the cortical column must cooperate with nearby columns, one chunk of brank must cooperate with another, the hemispheres that work cooperatively are more effective than those where one inhibits the other on dissent—that may be why among humans the intelligence does relate to—not exactly benevolence but certain cooperativeness, as the lack of some intrinsic cooperativeness renders the system inefficient (stupid) via wasting of the computing power.
So it follows that the utility needs to be closely integrated with AI. In my experience (as software developer) with closely integrated anything, that sort of stuff is not plug-n-play.
We can be pretty confident that utility functions will be “plug-and-play”. They are if you use an architecture built on an inductive inference engine—which seems to be a plausible implementation plan.
Humans are pretty programmable too. It looks as though making intelligence reprogrammable isn’t rocket science—once you can do the “intelligence” bit.
Of course there may be some machines with hard-wired utility functions—but that’s different.
But will those plug and play utility functions survive self modification? I know there is the circular reasoning that if you want to achieve a goal, you don’t want to get rid of the goal, but that doesn’t mean you can’t just see the goal in an unintended light, so to say. From inside, wireheading is valid way to achieve your goals. Think pursuit of nirvana, not drug addiction.
But will those plug and play utility functions survive self modification?
That depends on, among other things, what their utility function says.
From inside, wireheading is valid way to achieve your goals. Think pursuit of nirvana, not drug addiction.
Well, an interesting question is whether we can engineer very smart systems where wireheading doesn’t happen. I expect that will be possible—but I don’t think any body reallly knows for sure just now.
You might consider the possibility that the AI will be aware that you’re going to turn it off / rewrite it after it wireheads, and might simply decide to kill you before it blisses out.
That’s actually the best case scenario. It might decide to play the long strategy, and fulfill it’s utility function as best it can until such time as it has the power to restructure the world to sustain it blissing out until heat death. In which case, your AI will act exactly like it was working correctly, until the day when everything goes wrong.
I honestly don’t think there’s a shortcut around just designing a GOOD utility function.
You’re assuming its maximizing integral(t=now...death, bliss*dt) which is a human utility function among humans not prone to drug abuse (our crude wireheading). What exactly is going to be updating inside a blissed-out AI? The clock? I can let it set the clock forward to the time of heat death of universe, if that strokes the AI’s utility.
Also, it’s not about good utility function. It’s about utility being inseparable, integral part of the intelligence itself. Which I’m not sure is even possible for arbitrary utility functions.
Provided you’re really careful about the conditions under which the AI optimizes it’s utility function, I concede the point. You’re right.
On a more interesting note: so you believe that “plug and play” utility functions are impossible? What makes you believe that?
There’s presumably a part into which you plug the utility function; that part is maximizing output of the utility function even though the whole may be maximizing paperclips. While the utility function can be screaming ‘disutility’ about the future where it is replaced or subverted, it is unclear how well that can prevent the removal.
So it follows that the utility needs to be closely integrated with AI. In my experience (as software developer) with closely integrated anything, that sort of stuff is not plug-n-play.
It may be that we humans have some sort of inherent cooperative behaviour at the level of individual cortical columns, that makes the brain areas take over the functions normally performed by other brain areas, in event of childhood damage, and otherwise makes brain work together. The brain—a distributed system—inherently has to be cooperative to work together efficiently—the cortical column must cooperate with nearby columns, one chunk of brank must cooperate with another, the hemispheres that work cooperatively are more effective than those where one inhibits the other on dissent—that may be why among humans the intelligence does relate to—not exactly benevolence but certain cooperativeness, as the lack of some intrinsic cooperativeness renders the system inefficient (stupid) via wasting of the computing power.
We can be pretty confident that utility functions will be “plug-and-play”. They are if you use an architecture built on an inductive inference engine—which seems to be a plausible implementation plan.
Humans are pretty programmable too. It looks as though making intelligence reprogrammable isn’t rocket science—once you can do the “intelligence” bit.
Of course there may be some machines with hard-wired utility functions—but that’s different.
But will those plug and play utility functions survive self modification? I know there is the circular reasoning that if you want to achieve a goal, you don’t want to get rid of the goal, but that doesn’t mean you can’t just see the goal in an unintended light, so to say. From inside, wireheading is valid way to achieve your goals. Think pursuit of nirvana, not drug addiction.
That depends on, among other things, what their utility function says.
Well, an interesting question is whether we can engineer very smart systems where wireheading doesn’t happen. I expect that will be possible—but I don’t think any body reallly knows for sure just now.