But you’re being vague otherwise. Name a crazy or unfounded belief.
Holden asked me something similar today via mail. Here is what I replied:
You wrote in ‘Other objections to SI’s views’:
Unlike the three objections I focus on, these other issues have been discussed a fair amount, and if these other issues were the only objections to SI’s arguments I would find SI’s case to be strong (i.e., I would find its scenario likely enough to warrant investment in).
It is not strong. The basic idea is that if you pull a mind at random from design space then it will be unfriendly. I am not even sure if that is true. But it is the strongest argument they have. And it is completely bogus because humans do not pull AGIs from mind design space at random.
Further, the whole case for AI risk is based on the idea that there will be a huge jump in capability at some point, which I think is at best good science fiction, like faster-than-light propulsion or antimatter weapons (even where it is doubtful that they are possible in principle).
The basic fact that an AGI will most likely need something like advanced nanotechnology to pose a risk, nanotechnology being an existential risk in its own right, hints at a conjunction fallacy. We do not need AGI to then use nanotechnology to wipe us out; nanotechnology is already enough, if it is possible at all.
Anyway, it feels completely ridiculous to talk about it in the first place. There will never be a mind that can quickly and vastly improve itself and then invent all kinds of technological magic to wipe us out. Even most science fiction books avoid that because it sounds too implausible.
I have written thousands of words about all this and never got any convincing reply. So if you have any specific arguments, let me know.
They say that what I write is unconvincing. But given the amount of vagueness they use to protect their beliefs, my specific criticisms basically amount to a reductio ad absurdum. I don’t even need to criticize them; they would have to support their extraordinary beliefs first, or make them more specific. Yet I am able to come up with a lot of arguments that speak against the possibility they envision, without any effort and with no knowledge of the relevant fields like complexity theory.
Here is a comment I received lately:
…in defining an AGI we are actually looking for a general optimization/compression/learning algorithm which, when fed itself as an input, outputs a new algorithm that is better by some multiple. Surely this is at least an NP-complete problem, if not harder. It may improve for a little bit and then hit a wall where the search space becomes intractable. It may use heuristics and approximations and what not, but each improvement will be very hard won and expensive in terms of energy and matter. But no matter how much it tried, the cold hard reality is that you cannot compute an exponential-time algorithm in polynomial time unless P = EXPTIME :S. A “no self-recursive exponential intelligence” theorem would fit in with all the other limitations (speed, information density, Turing, Gödel, uncertainties, etc.) that the universe imposes.
If you were to turn IBM Watson gradually into a seed AI, at which point would it become an existential risk and why? They can’t answer that at all. It is pure fantasy.
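END OF EMAIL

For more see the following posts:

Is an Intelligence Explosion a Disjunctive or Conjunctive Event?
Risks from AI and Charitable Giving
Why I am skeptical of risks from AI
Implicit constraints of practical goals (including the follow-up comments that I posted.)

Some old posts:

Should I believe what the SIAI claims?
What I would like the SIAI to publish
SIAI’s Short-Term Research Program

See also:

Interview series on risks from AI
We are SIAI. Argument is futile.

If you believe I don’t understand the basics, see:

A Primer On Risks From AI

Also:

Open Problems in Ethics and Rationality
Objections to Coherent Extrapolated Volition

There is a lot more, especially in the form of comments where I talk about specifics.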
The basic idea is that if you pull a mind at random from design space then it will be unfriendly. I am not even sure if that is true. But it is the strongest argument they have. And it is completely bogus because humans do not pull AGIs from mind design space at random.
I don’t have the energy to get into an extended debate, but the claim that this is “the basic idea” or that this would be “the strongest argument” is completely false. A far stronger basic idea is the simple fact that nobody has yet figured out a theory of ethics that would work properly, which means that even AGIs that were specifically designed to be ethical are most likely to lead to bad outcomes. And that’s presuming that we even knew how to program them exactly.
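This isn’t even something that you’d need to read a hundred blog posts for; it’s well discussed in both “The Singularity and Machine Ethics” and “Artificial Intelligence as a Positive and Negative Factor in Global Risk”. “Complex Value Systems are Required to Realize Valuable Futures”, too.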
I did skim through the last paper. I am going to review it thoroughly at some point.
At first sight, one of the problems is the whole assumption of AI drives. On the one hand you claim that an AI is going to follow its code, is its code (as if anyone would doubt causality). On the other hand you talk about the emergence of drives like unbounded self-protection. And if someone says that unbounded self-protection does not need to be part of an AGI, you simply claim that your definition of AGI will have those drives, which allows you to arrive at your desired conclusion that AGI is an existential risk.
Another problem is the idea that an AGI will be a goal executor (I can’t help but interpret that to be your position), when I believe that the very nature of artificial general intelligence implies the correct interpretation of “Understand What I Mean”, and that “Do What I Mean” is the outcome of virtually any research. Only if you were to pull an AGI at random from mind design space could you possibly arrive at “Understand What I Mean” without “Do What I Mean”.
To see why, look at any software product or complex machine. Those products are continuously improved, where “improved” means that they become better at “Understand What I Mean” and “Do What I Mean”.
There is no good reason to believe that at some point that development will suddenly turn into “Understand What I Mean” and “Go Batshit Crazy And Do What I Do Not Mean”.
There are other problems with the paper. I hope I will find some time to write a review soon.
One problem for me with reviewing such papers is that I doubt a lot of the underlying assumptions, such as that there exists a single principle of general intelligence. As I see it, there will never be any sudden jump in capability. I also think that intelligence and complex goals are fundamentally interwoven. An AGI will have to be hardcoded, or learn, to care about a manifold of things. No simple algorithm, given limited computational resources, will give rise to the drives that are necessary to undergo strong self-improvement (if that is possible at all).
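Not saying I particularly disagree with your other premises, but saying something can’t be true because it sounds implausible is not a valid argument.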
It is not strong. The basic idea is that if you pull a mind at random from design space then it will be unfriendly. I am not even sure if that is true. But it is the strongest argument they have. And it is completely bogus because humans do not pull AGIs from mind design space at random.
An AI’s mind doesn’t have to be pulled from design space at random to be disastrous. The primary issue that the SIAI has to grapple with (based on my understanding) is that deliberately designing an AI that does what we would want it to do, rather than fulfilling proxy criteria in ways that we would not like at all, is really difficult. Even getting one to recognize “humans” as a category in a way that would be acceptable to us is a major challenge.
Although it’s worth pointing out that this is also an obstacle to AGI, since presumably an AI that did not understand what a human was would be pretty unintelligent. So I think it’s unfair to claim this as a “friendliness” issue.
Note that I do think there are some important friendliness-related problems, but, assuming I understand your objection, this is not one of them.
An AI could be an extremely powerful optimizer without having a category for “humans” that mapped to our own. “Human,” the way we conceive of it, is a leaky surface generalization.
A strong paperclip maximizer would understand humans as well as it needed to in order to contend with us in its attempts to paperclip the universe, but it wouldn’t care about us. And a strong optimizer programmed to maximize the values of “humans” would also probably understand us, but if we don’t program into its values an actual category that maps to our conception of humans, it could perfectly well end up applying that understanding to, for example, tiling the universe with crash test dummies.
How do you intend to build a powerful optimizer without having a method of representing (or of building a representation of) the concept of “human” (where “human” can be replaced with any complex concept, probably even paperclips)?
I agree that value specification is a hard problem. But I don’t think the complexity of “human” is the reason for this, although it does rule out certain simple approaches like hard-coding values.
(Also, since your link seems to indicate you believe otherwise, I am fairly familiar with the content in the sequences. Apologies if this statement represents an improper inference.)
How do you intend to build a powerful optimizer without having a method of representing (or of building a representation of) the concept of “human” (where “human” can be replaced with any complex concept, probably even paperclips)?
If a machine can learn, empirically, exactly what humans are, on the most fundamental levels, but doesn’t have any values associated with them, why should it need a concept of “human?” We don’t have a category that distinguishes igneous rocks that are circular and flat on one side, but we can still recognize them and describe them precisely.
Humans are an unnatural category. Whether a fetus, an individual in a persistent vegetative state, an amputee, a corpse, an em or a skin cell culture falls into the category of “human” depends on value-sensitive boundaries. It’s not necessarily because humans are so complex that we can’t categorize them in an appropriate manner for an AI (or at least, not just because humans are complex); it’s because we don’t have an appropriate formulation of the values that would allow a computer to draw the boundaries of the category in a way we’d want it to.
(I wasn’t sure how familiar you were with the sequences, but in any case I figured it can’t hurt to add links for anyone who might be following along who’s not familiar.)
I’ve read most of that now, and have subscribed to your newsletter.
Reasonable people can disagree about the difficulty of AI and about the visibility/pace of AI progress (is it like hunting for a single breakthrough and then FOOM? etc.).
I find all of your “it feels ridiculous” arguments by analogy to existing things interesting but unpersuasive.
Anyway, it feels completely ridiculous to talk about it in the first place. There will never be a mind that can quickly and vastly improve itself and then invent all kinds of technological magic to wipe us out. Even most science fiction books avoid that because it sounds too implausible.
Says the woolly mammoth, circa 100,000 BC.
Sounding silly and low status and science-fictiony doesn’t actually make it unlikely to happen in the real world.
Especially when not many people want to read a science fiction book where humanity gets quickly and completely wiped out by a superior force. Even works where humans slowly die off due to their own problems (e.g. On the Beach) are uncommon.
Anyway, it feels completely ridiculous to talk about it in the first place. There will never be a mind that can quickly and vastly improve itself and then invent all kinds of technological magic to wipe us out. Even most science fiction books avoid that because it sounds too implausible.
Do you acknowledge that:

1. We will some day make an AI that is at least as smart as humans?
2. Humans do try to improve their intelligence (rationality/memory training being a weak example, cyborg research being a better example, and I’m pretty sure we will soon design physical augmentations to improve our intelligence)?

If you acknowledge 1 and 2, then that implies there can (and probably will) be an AI that tries to improve itself.
I think you missed the “quickly and vastly” part as well as the “and then invent all kinds of technological magic to wipe us out”. Note I still think XiXiDu is wrong to be as confident as he is (assuming “there will never” implies >90% certainty), but if you are going to engage with him then you should engage with his actual arguments.