In many places in his review he [...] criticizes the book as not making the case that extreme pessimism is warranted.
I think this is a valid criticism, which I share. My main criticism of IABIED was that it didn’t argue for its title claim. See the 1,700-word section of my review, “IABIED does not argue for its thesis.” (I didn’t cross-post my review to LW or anywhere because I didn’t like that I was just complaining about the book being disappointing when I had such high hopes for it, but if anyone reading this thinks it’s worthwhile to post to LW, say so and I’ll listen.)
By default, it’s reasonable for readers of a book with the title IABIED to expect that the book will at least attempt to explain why, if anyone builds ASI anytime soon, it is almost certain that ASI will cause human extinction.
If the book merely explains why ASI might cause human extinction if anyone builds ASI anytime soon, then I think it is reasonable for readers to criticize this.
BB seems to say that IABIED does argue for its title thesis with the analogy to evolution, and just says that the argument is not decisive because it doesn’t address the “disanalogies between evolution and reinforcement learning.”
Whether one takes BB’s view that the book did argue for its title thesis and just didn’t do a very good (or complete?) job of it, or my view that Y&S largely just didn’t attempt to explain why they put such high credence in their title claim, I think your response to BB on this topic is missing something, which is why I’m commenting.
You continue:
I think this is a basic misunderstanding of the book’s argument. IABIED is not arguing for the thesis that “you should believe ‘if anyone builds superintelligence with modern methods, everyone will die’ with >90% probability”, which is a meta-level point about confidence, and instead the thesis is the object-level claim that “if anyone builds ASI with modern methods, everyone will die.”
I agree with you that the book was not and should not have been attempting to raise the reader’s credence in the title thesis to >90%. As you said:
I kinda think anyone (who is not an expert) who reads IABIED and comes away with a similar level of pessimism as the authors is making an error. If you read any single book on a wild, controversial topic, you should not wind up extremely confident!
(Only disagreement: I think even experts shouldn’t read IABIED and update their credence in the title claim above 90% if it was previously below 90%.)
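(A rough way to put numbers on this, offered only as an illustrative sketch: in odds form, Bayes’ rule says a reader’s posterior odds are their prior odds multiplied by the likelihood ratio of whatever evidence the book presents,

$$\text{posterior odds} = \text{likelihood ratio} \times \text{prior odds}.$$

Starting from an even 50% prior, reaching 90% requires a likelihood ratio of 9, while reaching 99% requires a ratio of 99, which is a lot of evidential weight to attribute to any single book on a contested question.)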
Given that a short, accessible book written for the general public could not possibly provide all the evidence that the authors have seen over the years that has led to them being so confident in their title thesis, what should the book do instead?
The suggestion I gave in my review was that the authors should have provided a disclaimer in the Introduction, such as the following:
By the way, it is impossible for us to provide a complete account here of why we are almost certain that if anyone builds ASI anytime soon, everyone will die. We have been researching this question for decades and there are simply far too many considerations for us to address in this short book that we are trying to make accessible to a wide audience. Consequently, we are only going to lay out basic arguments for considerations that are particularly concerning to us. If after reading the book you think, ‘I can see why ASI might cause human extinction, but I don’t understand why the authors think it is inevitable that ASI would cause human extinction if built soon,’ then we have accomplished what we set out to do. If you feel we left you hanging about why we are so confident, all we can say is that we warned you, and we encourage you to read our online resources and other materials to begin to understand our high confidence.
Such a disclaimer would be sufficient to pre-empt the criticism that the book does not actually argue for its title thesis that if anyone builds ASI anytime soon, then it is almost certain that ASI will cause human extinction.
But the book could do more beyond this if it wanted to. In addition, it could say, “While we know we can’t possibly convey all the evidence we have that led to us having such high credences in our title claim, we can at least provide a summary of what led us to be so confident. While we don’t necessarily think this summary should update anyone’s credence in the title claim, it will at least give interested readers an idea of what led us to become so confident.” But Y&S did not provide any such summary in the book.
Such a summary is actually what I was hoping for. I’ve been curious about this for years, and once, at a conference, I even asked Eliezer why his credence in existential catastrophe from AI was so high (his answer, which was about rockets, didn’t seem like an explanation to me). To this day, if someone were to ask me why Eliezer is so much more confident in the IABIED claim than Paul Christiano or Daniel Kokotajlo or whoever, I still don’t have an answer that doesn’t make it sound like Eliezer’s reasons are obviously bad.
The cached explanation that comes to mind when I ask myself this question is “Well he’s been thinking about it for years and has become convinced that every alignment proposal he has seen fails.” But there are a lot of smart researchers who also aren’t aware of any alignment proposal that they think works, but that’s obviously not sufficient for their credence to be ~99%, so clearly Eliezer must have some other reasons that I’m not aware of. But what are those reasons? I don’t know, and IABIED didn’t give me any hints.
But there are a lot of smart researchers who also aren’t aware of any alignment proposal that they think works, but that’s obviously not sufficient for their credence to be ~99%, so clearly Eliezer must have some other reasons that I’m not aware of. But what are those reasons?
I think that, in such cases, Eliezer is simply not making a mistake that those other researchers are making, where they have substantial hope in unknown unknowns (some of which are in fact known, but maybe not to them).
I’m also a little confused by why you expect such a summary to exist. Or, rather, why the section titles from The Problem are insufficient:
There isn’t a ceiling at human-level capabilities.
ASI is very likely to exhibit goal-oriented behavior.
ASI is very likely to pursue the wrong goals.
It would be lethally dangerous to build ASIs that have the wrong goals.
If it’s because you think one or more of those steps aren’t obviously true and need more justification, well, you’re not alone, and many people think different parts of it need more justification, so there is no single concise summary that satisfies everyone.[1]
Though some summaries probably satisfy some people.
I think that, in such cases, Eliezer is simply not making a mistake that those other researchers are making, where they have substantial hope in unknown unknowns (some of which are in fact known, but maybe not to them).
Eliezer has phrased this as:
You don’t get to adopt a prior where you have a 50-50 chance of winning the lottery “because either you win or you don’t”; the question is not whether we’re uncertain, but whether someone’s allowed to milk their uncertainty to expect good outcomes.
Rob Bensinger quoted an exchange on this topic between Eliezer and Aryeh Englander. When I first read it years ago, I recall thinking that Eliezer was wrong in the exchange and being confused about why Rob was quoting it in apparent endorsement.
Reading your version of it now, it still seems to me like the point is just wrong. Updating to 99% because none of the alignment proposals you’ve considered seem like they would work just seems like overconfidence. Saying ‘no, you should update to 99% if you’ve considered as many alignment proposals as Eliezer has, and remaining less confident is the mistake of milking uncertainty into expecting good outcomes’ seems like the real mistake.
Does Eliezer really not have other reasons beyond this epistemological view that he ought to update to ~99% based on his own inability to find a potentially-promising solution to the alignment problem over the course of his career? I’ve long assumed that there was more to it than this, but maybe this epistemological point is actually just a major crux between Yudkowsky and others with significantly lower credences of extinction from AI.
I’m also a little confused by why you expect such a summary to exist. Or, rather, why the section titles from The Problem are insufficient
In short, I think they’re not sufficient because a person can agree with all those statements and also rationally think the title claim is >>1% likely to be false.
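(As a toy illustration, under the assumption that the four steps are treated as a chain of roughly independent claims:

$$0.9 \times 0.9 \times 0.9 \times 0.9 \approx 0.66,$$

so even a reader who finds each step 90% convincing can coherently end up far below 99% on the conjunction.)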
And also because e.g. saying you’re 99% confident that building ASIs with the wrong goals would lead to human extinction because “It would be lethally dangerous to build ASIs that have the wrong goals” is circular and doesn’t actually explain why you’re so confident. The layperson writing a book report doesn’t have anything to point to as the reason why you’re 99% confident while researchers like Christiano are much less confident (20% on extinction within 10 years of powerful AI being built).
Does Eliezer really not have other reasons beyond this epistemological view that he ought to update to ~99% based on his own inability to find a potentially-promising solution to the alignment problem over the course of his career?
I don’t really understand what kinds of reasons you think would justify having 99% confidence in an outcome. 99% is not very high confidence, in log-odds—I am much more than 99% confident in many claims. But, that aside, he has written millions of words on the subject, explaining his views in detail, including describing much of the enormous amount of evidence that he believes bears on this question. It is difficult to compress that evidence into a short summary. (Though there have been numerous attempts.)
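(To make the log-odds point concrete, a quick back-of-the-envelope sketch:

$$\operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad \operatorname{logit}(0.99) = \ln 99 \approx 4.6 \text{ nats} \approx 20 \text{ dB},$$

whereas 99.9999% confidence sits at about 13.8 nats (60 dB) and a 20% credence sits at about −1.4 nats, so on this scale 99% is actually closer to 20% than it is to 99.9999%.)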
And also because e.g. saying you’re 99% confident that building ASIs with the wrong goals would lead to human extinction because “It would be lethally dangerous to build ASIs that have the wrong goals” is circular and doesn’t actually explain why you’re so confident.
I mean, yes, I was trying to demonstrate that a short summary will obviously fail to convey information that most readers would find necessary to carry the argument (and that most readers would want different additional pieces of information from each other). However, “It would be lethally dangerous to build ASIs that have the wrong goals” is not circular. You might say it lacks justification, but many people have background beliefs such that a statement like that requires little or no additional justification[1].
Thanks for the replies.
99% is not very high confidence, in log-odds—I am much more than 99% confident in many claims.
I am too. But for how many of those beliefs that you’re 99+% sure of can you name several people like Paul Christiano who think you’re on the wrong side of maybe? For me, not a single example comes to mind.
However, “It would be lethally dangerous to build ASIs that have the wrong goals” is not circular. You might say it lacks justification
I agree that’s not circular. I meant that the full claim “building ASIs with the wrong goals would lead to human extinction because ‘It would be lethally dangerous to build ASIs that have the wrong goals’” is circular. “Lacks justification” would have been clearer.
For example, if they believe both that Drexlerian nanotechnology is possible and that the ASI in question would be able to build it.
I hold this background belief but don’t think that it means the original claim requires little additional justification. But getting into such details is beyond the scope of this discussion thread. (Brief gesture at an explanation: Even though humans could exterminate all the ants in a backyard when they build a house, they don’t. It similarly seems plausible to me that ASI could start building its factories on Earth, to enable it to build von Neumann probes and begin colonizing the universe, all without killing all humans on Earth. Maybe it would drive humanity extinct by boiling the oceans, as mentioned in IABIED, but I have enough doubt in these sorts of predictions to remain <<99% confident in the ‘It would be lethally dangerous [i.e. it’d lead to extinction] to build ASIs that have the wrong goals’ claim.)
Even though humans could exterminate all the ants in a backyard when they build a house, they don’t.
They do, however, exterminate all the ants (and many other species) in a lot more than a backyard when they use pesticides on a farm. Or members of a lot more species in an even wider area when they build a hydroelectric dam.
True, and humans do cause the global extinction of some species too, not just their local extermination on farms. But notably, humans don’t cause the extinction of most species, so the humans-and-animals analogy doesn’t work as a reason to expect ASI to be 99% likely to drive humanity extinct. The analogy is merely suggestive of risk.