The argument is purely that mindspace seems to be large and that points in mindspace very close to humans could easily be highly inimical to our value system. In that context, what is your objection?
That argument seems to be true—but insignificant. Similarly, programs with a small Hamming distance from Microsoft Windows crash when executed. So what? That doesn’t mean that the operating system is unlikely to work.
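For readers unfamiliar with the metric being invoked: Hamming distance between two binaries is just the number of differing bits. A minimal sketch (the byte strings here are made up for illustration):

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the number of differing bits between two equal-length byte strings."""
    assert len(a) == len(b), "Hamming distance is defined for equal-length strings"
    # XOR each byte pair, then count the set bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"\x55\xaa\xff"
corrupted = b"\x54\xaa\xff"  # a single flipped bit
print(hamming_distance(original, corrupted))  # 1
```

The point of the analogy is that almost every program at distance 1 from a working binary is a broken program, even though almost nothing about it has changed.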
This sort of statistic is just not very relevant—unless the aim is to sound scary.
It’s not the risk of an AI crashing that is worrying. To continue in the form of your analogy:
Programs a small distance from correct IDE drivers have overwritten large chunks of a couple of my hard drives with garbage, leaving data irrecoverable. These programs had all the code in them to do low-level edits to hard drives, so a slight error simply caused them to write horribly wrong things.
Programs a small distance from correct video drivers have put garbage on my computer monitor. This one is so common that I can recall randomly colored ASCII text, stretched and distorted versions of the correct image, fragments of data that had been “freed” but not yet overwritten by other programs using video memory, large blocks of solid color… in each case the driver had all the code in it to edit the image on the screen, and many different bugs led to writing various sorts of grossly incorrect images.
So if we write a program which has all the code in it to try to edit the universe according to its values, and there’s a bug in the part which tells it that its values are our values, what do we expect to happen?
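A toy sketch of this point (entirely illustrative; the value functions and optimizer are invented for the example): an optimizer whose world-editing machinery works perfectly, but whose value function contains a one-character bug, does not crash. It competently drives the world toward the wrong outcome.

```python
# Toy illustration: a hill-climbing optimizer with a sign bug in its values.

def intended_value(x: float) -> float:
    # What we meant: prefer states near our target of 10.
    return -(x - 10) ** 2

def buggy_value(x: float) -> float:
    # One-character slip: the leading minus sign was dropped.
    return (x - 10) ** 2

def hill_climb(value, x=0.0, step=1.0, iters=50):
    # A competent optimizer: try small moves, keep any improvement.
    for _ in range(iters):
        for candidate in (x + step, x - step):
            if value(candidate) > value(x):
                x = candidate
    return x

print(hill_climb(intended_value))  # settles at 10.0, as intended
print(hill_climb(buggy_value))     # runs away from the target: -50.0
```

Note that the buggy version is not a weaker optimizer: all of its competence is intact, and that competence is exactly what makes the outcome bad.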
And unless people are all quite paranoid, there will be lots of bugs. Windows XP SP2 included over a thousand bug fixes. I agree that our first AGIs are likely to be as correct as our first operating systems. This is not reassuring.
It’s not the risk of an AI crashing that is worrying.
That wasn’t really the point of the analogy. The idea was of a target representing success being surrounded by a larger space of failure. The seriousness of the failure was intended to be an incidental aspect of the analogy.