It is not strong. The basic idea is that if you pull a mind at random from design space, it will be unfriendly. I am not even sure that is true. But it is the strongest argument they have. And it is completely bogus, because humans do not pull AGIs from mind design space at random.
An AI’s mind doesn’t have to be pulled from design space at random to be disastrous. The primary issue that the SIAI has to grapple with (based on my understanding) is that deliberately designing an AI that does what we would want it to do, rather than fulfilling proxy criteria in ways that we would not like at all, is really difficult. Even getting one to recognize “humans” as a category in a way that would be acceptable to us is a major challenge.
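To make the proxy-criteria failure concrete, here’s a toy sketch (the actions and numbers are invented for illustration): the agent is handed a measurable proxy for what we want, and maximizes that instead.

```python
# Toy illustration of proxy optimization going wrong (all details invented).
# We want the agent to make people happy; the measurable proxy is "smiles seen".

actions = {
    # action: (true value to us, proxy score the agent actually optimizes)
    "help people flourish":      (10, 10),    # proxy tracks intent here...
    "tell good jokes":           (5, 6),      # ...and roughly here
    "paralyze faces into grins": (-100, 50),  # here proxy and intent come apart
}

# The agent maximizes the only thing it was given: the proxy.
chosen = max(actions, key=lambda a: actions[a][1])
print(chosen, "-> true value to us:", actions[chosen][0])
# prints: paralyze faces into grins -> true value to us: -100
```

The stronger the optimizer and the larger its action space, the more extreme the proxy-pumping options it finds.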
It’s worth pointing out, though, that this is also an obstacle to AGI, since presumably an AI that did not understand what a human was would be pretty unintelligent. So I think it’s unfair to claim this as a “friendliness” issue.
Note that I do think there are some important friendliness-related problems, but, assuming I understand your objection, this is not one of them.
An AI could be an extremely powerful optimizer without having a category for “humans” that mapped to our own. “Human,” the way we conceive of it, is a leaky surface generalization.
A strong paperclip maximizer would understand humans as well as it needed to in order to contend with us in its attempts to paperclip the universe, but it wouldn’t care about us. And a strong optimizer programmed to maximize the values of “humans” would also probably understand us, but if we don’t program into its values an actual category that maps to our conception of humans, it could perfectly well end up applying that understanding to, for example, tiling the universe with crash test dummies.
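As a cartoon of that failure mode (a made-up sketch, not anyone’s actual design): suppose the value function picks out “humans” by a checklist of surface features, and the optimizer maximizes how many such “humans” it can produce per unit of resources.

```python
# Hypothetical sketch: a value function whose "human" test is a surface
# checklist, plus an optimizer maximizing count-of-"humans" per unit cost.

def looks_human(obj: dict) -> bool:
    # A leaky surface generalization standing in for "human".
    return obj["has_head"] and obj["has_limbs"] and obj["humanoid_shape"]

candidates = [
    {"name": "person",           "has_head": True, "has_limbs": True,
     "humanoid_shape": True, "cost": 10_000_000},
    {"name": "crash test dummy", "has_head": True, "has_limbs": True,
     "humanoid_shape": True, "cost": 100},
]

# Both pass the test, so the optimizer tiles with whichever is cheapest.
best = max((c for c in candidates if looks_human(c)), key=lambda c: 1 / c["cost"])
print("tile the universe with:", best["name"])  # crash test dummy
```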
How do you intend to build a powerful optimizer without having a method of representing (or of building a representation of) the concept of “human” (where “human” can be replaced with any complex concept, probably even paperclips)?
I agree that value specification is a hard problem. But I don’t think the complexity of “human” is the reason for this, although it does rule out certain simple approaches like hard-coding values.
(Also, since your link seems to indicate you believe otherwise, I am fairly familiar with the content in the sequences. Apologies if this statement represents an improper inference.)
How do you intend to build a powerful optimizer without having a method of representing (or of building a representation of) the concept of “human” (where “human” can be replaced with any complex concept, probably even paperclips)?
If a machine can learn, empirically, exactly what humans are, on the most fundamental levels, but doesn’t have any values associated with them, why should it need a concept of “human”? We don’t have a category that distinguishes igneous rocks that are circular and flat on one side, but we can still recognize them and describe them precisely.
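In code, the rock example might look like this (purely illustrative): the predicate is just a composition of primitives the world model already has, with no dedicated category required.

```python
# Purely illustrative: recognizing "circular, flat-on-one-side igneous rocks"
# by composing existing primitives -- no named category needed.

def matches(rock: dict) -> bool:
    return (rock["origin"] == "igneous"
            and rock["outline"] == "circular"
            and rock["flat_faces"] >= 1)

print(matches({"origin": "igneous", "outline": "circular", "flat_faces": 1}))      # True
print(matches({"origin": "sedimentary", "outline": "circular", "flat_faces": 1}))  # False
```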
Humans are an unnatural category. Whether a fetus, an individual in a persistent vegetative state, an amputee, a corpse, an em, or a skin cell culture falls into the category of “human” depends on value-sensitive boundaries. It’s not necessarily because humans are so complex that we can’t categorize them in an appropriate manner for an AI (or at least, not just because humans are complex); it’s because we don’t have an appropriate formulation of the values that would allow a computer to draw the boundaries of the category in a way we’d want it to.
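A toy way to see the value-sensitivity (features and criteria invented for illustration): each defensible membership test draws the boundary differently over exactly these edge cases, and nothing in the world model says which one is “right.”

```python
# Invented features and criteria: each value-laden membership test draws the
# "human" boundary differently over the same edge cases.

cases = {
    #                   (alive, conscious, human DNA, was once a person)
    "adult":             (True,  True,  True,  True),
    "fetus":             (True,  False, True,  False),
    "PVS patient":       (True,  False, True,  True),
    "corpse":            (False, False, True,  True),
    "em (upload)":       (True,  True,  False, True),
    "skin cell culture": (True,  False, True,  False),
}

criteria = {
    "alive with human DNA": lambda alive, consc, dna, person: alive and dna,
    "conscious now":        lambda alive, consc, dna, person: consc,
    "was ever a person":    lambda alive, consc, dna, person: person,
}

for name, test in criteria.items():
    print(f"{name:>20}:", [k for k, v in cases.items() if test(*v)])
# Three defensible tests, three different extensions: the boundary comes
# from our values, not from the physics.
```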
(I wasn’t sure how familiar you were with the sequences, but in any case I figured it can’t hurt to add links for anyone who might be following along who’s not familiar.)