Can you expand on why you don’t like the random mindspace argument? I’m curious to hear that. I don’t think anyone making that argument is arguing that the first strong AIs will hit completely random points in mindspace, and I don’t think anyone is arguing that they have a precise probability measure or notion of metrics on mindspace. The argument is purely that mindspace seems to be large and that points in mindspace very close to humans could easily be highly inimical to our value system. In that context, what is your objection?
The argument is purely that mindspace seems to be large and that points in mindspace very close to humans could easily be highly inimical to our value system.
Considering the diversity of human values, I think other humans are already a working demonstration of a “point in mindspace very close” to ours that is nevertheless “highly inimical to our value system”.
The argument is purely that mindspace seems to be large and that points in mindspace very close to humans could easily be highly inimical to our value system. In that context, what is your objection?
That argument seems to be true—but insignificant. Similarly, programs with a small Hamming distance from Microsoft Windows crash when executed. So what? That doesn’t mean the operating system itself is unlikely to work.
This sort of statistic is just not very relevant—unless the aim is to sound scary.
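To make the Hamming-distance point concrete: copy any working executable, flip a handful of random bits, and try to run the result. A rough sketch, assuming a Unix-like system; the paths below are just stand-ins for a throwaway copy:

```python
# Toy experiment: copy a working program, flip a few random bits in the copy,
# and see whether the mutant still runs. SOURCE and MUTANT are stand-in paths.
import os
import random
import subprocess

SOURCE = "/bin/true"          # any small program that normally exits cleanly
MUTANT = "/tmp/true_mutant"   # throwaway copy we are willing to break

data = bytearray(open(SOURCE, "rb").read())
for _ in range(5):                           # flip 5 random bits: a tiny Hamming distance
    i = random.randrange(len(data))
    data[i] ^= 1 << random.randrange(8)

with open(MUTANT, "wb") as f:
    f.write(data)
os.chmod(MUTANT, 0o755)

try:
    print("exit status:", subprocess.run([MUTANT]).returncode)
except OSError as err:
    print("would not even load:", err)       # e.g. a corrupted ELF header
```

Some flips land in padding and change nothing; others corrupt the header or the code, and the mutant refuses to load, crashes, or exits abnormally. That is the sense in which many near neighbours of a working program fail, without it saying anything about whether the original works.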
It’s not the risk of an AI crashing that is worrying. To continue your analogy:
Programs a small distance from correct IDE drivers have overwritten large chunks of a couple of my hard drives with garbage, leaving data irrecoverable. These programs had all the code in them to do low-level edits to hard drives, so a slight error simply caused them to write horribly wrong things.
Programs a small distance from correct video drivers have put garbage on my computer monitor. This one is so common that I can recall randomly colored ASCII text, stretched and distorted versions of the correct image, clips of data that had been “freed” but not yet overwritten by other programs using video memory, large blocks of color… In each case the driver had all the code in it to edit the image on the screen, and many different bugs led to it writing various sorts of grossly incorrect images.
So if we write a program which has all the code in it to try to edit the universe according to its values, and there’s a bug in the part which tells it that its values are our values, what do we expect to happen?
And unless people are all quite paranoid, there will be lots of bugs. Windows XP SP2 included over a thousand bug fixes. I agree that our first AGIs are likely to be as correct as our first operating systems. This is not reassuring.
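As a purely illustrative sketch of that last point (every name here is invented for the example), consider a greedy optimizer pointed at a value function that differs from the intended one by a single dropped character:

```python
import random

# The "world" is just a list of numbers; the intended value function rewards
# moving every number close to 1.
def intended_value(world):
    return -sum((x - 1.0) ** 2 for x in world)

# The buggy version differs by one dropped minus sign.
def buggy_value(world):
    return sum((x - 1.0) ** 2 for x in world)

def hill_climb(value, world, steps=2000, step_size=0.1):
    """Greedy optimizer: nudge one coordinate at a time, keep any change that raises value."""
    world = list(world)
    for _ in range(steps):
        i = random.randrange(len(world))
        candidate = list(world)
        candidate[i] += random.choice([-step_size, step_size])
        if value(candidate) > value(world):
            world = candidate
    return world

start = [0.0] * 5
print("intended value function:", [round(x, 2) for x in hill_climb(intended_value, start)])
print("buggy value function:   ", [round(x, 2) for x in hill_climb(buggy_value, start)])
# The first run settles near [1, 1, 1, 1, 1]; the second runs away from that
# target as far as its step budget allows, and a stronger optimizer would run further.
```

The intact optimization machinery does all the work; the one-character bug only decides what that work is aimed at.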
It’s not the risk of an AI crashing that is worrying.
That wasn’t really the point of the analogy. The idea was that of a target representing success surrounded by a larger space of failure. The seriousness of the failure was intended to be an incidental aspect of the analogy.
See P4, third paragraph, here.
Your link seems to address only a restricted case of the random mindspace argument, where an AI is given a correctly specified goal but insufficient constraints on its behavior with respect to resources. There, the randomness is not in what it principally values (e.g., paperclips) but in what else it values. A complete counterargument should address the case where, say, we try to create a paperclip maximizer and end up creating a staple maximizer.