Feature suggestion: Allow extension users to cast agree/disagree votes on the “corrections”, and give users the option to hide any corrections that others have disagreed with (or let them set the threshold at which corrections are shown).
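A minimal sketch of what that threshold option could look like, assuming hypothetical `Correction` and `UserSettings` shapes and an agree-ratio cutoff (none of this reflects any existing extension code):

```typescript
// Hypothetical data shapes; a real extension's schema would likely differ.
interface Correction {
  id: string;
  text: string;
  agreeVotes: number;
  disagreeVotes: number;
}

interface UserSettings {
  // Hide corrections whose share of agree votes falls below this value (0..1).
  minAgreementRatio: number;
}

// Return only the corrections the user has chosen to see,
// given their agreement-ratio threshold.
function visibleCorrections(
  corrections: Correction[],
  settings: UserSettings
): Correction[] {
  return corrections.filter((c) => {
    const total = c.agreeVotes + c.disagreeVotes;
    if (total === 0) return true; // unvoted corrections stay visible by default
    return c.agreeVotes / total >= settings.minAgreementRatio;
  });
}
```

Setting `minAgreementRatio` to 0 would show every correction, while a user who only wants broadly endorsed corrections could raise it toward 1.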
WilliamKiely
I haven’t signed the statement mostly because I snagged on the second bit about public support and wanted to think through in more detail, and ideally to write up/explain, what kind of public support I think it makes sense to condition otherwise safe forms of technological development on (especially insofar as the support in question is supposed to go beyond what’s at stake in e.g. standard democratic decision-making). And I haven’t had a chance to do this yet. That said, I may still sign. And per my comments in the post (see screenshot), I do support the right kind of global prohibition on developing superintelligence until we have a vastly better understanding of how to do so safely—though “the right kind” is important here, and (as I expect you agree) I also think that there are a lot of important downside risks in this vicinity.
Originally the plan was to also analyze optimal timing from an impersonal (xrisk-minimization) perspective; but to prevent the text from ballooning even more, that topic was set aside for future work (which might never get done).
That’s unfortunate. It seems like it would have been better for you to start with the optimal timing analysis from an impersonal perspective, since an impersonal perspective seems much more plausible than a person-affecting perspective.
Do you think your analysis of optimal timing from the person-affecting view is useful even if person-affecting views are wrong?
Do you think an analysis of optimal timing from an all-things-considered moral parliamentary perspective that only gives some weight to person-affecting views as appropriate would come to a similar conclusion about optimal timing?
“Don’t build it” is short for “Don’t build it yet” or “Don’t build it before it’s proven that doing so will not cause extinction” or something else, right? That is, Y&S say in IABIED that they prefer that ASI gets built eventually. I’m not sure if this nuance should be included on the landing page, but if there’s a simple way to include it then it probably should be added.
I suspect that you’d get more people to pledge to attend if the messaging encouraged everyone who agrees with “Don’t Build It” to pledge, rather than just the small subset that believes IABIED. (I pledged despite disagreeing with IABIED, but I suspect that “we believe” IABIED messaging would discourage many who otherwise favor not building ASI from pledging.)
True, and humans do cause the extinction of some species globally too, not just in certain farm fields. But notably, humans don’t cause the extinction of most species, so using the humans-and-animals analogy as a reason to expect ASI to be 99% likely to drive humanity extinct doesn’t work. The analogy is merely suggestive of risk.
I was recently reminded of the 2023 conversation between Aryeh Englander and Eliezer Yudkowsky quoted at the end of this post about model uncertainty. I re-read it today, as well as all of the other comments on Aryeh’s Facebook post, and still think that Aryeh’s perspective seems reasonable while Eliezer-and-Rob’s perspective seems to be lacking justification. That is, despite the conversation, it doesn’t seem like Eliezer’s comments about milking uncertainty into expecting good outcomes are actually an adequate answer to Aryeh’s question about why Eliezer is so confident that his model is correct and that everyone else’s models (of those with much lower p(doom from AI)) are wrong.
When I first read the quoted conversation a few years ago I didn’t think it was a major crux, but now I’m leaning toward thinking that this epistemological point is probably a major factor in why Eliezer’s credence that if anyone builds ASI anytime soon then everyone will die is ~99% while my credence is much lower. (My p(doom from AI) is ~65%, my p(extinction from AI by 2100) is ~20%, and my p(doom from AI by 2100) is ~35%). Just wanted to note that I’ve updated on this point being a major crux.
Thanks for the replies.
99% is not very high confidence, in log-odds—I am much more than 99% confident in many claims.
I am too. But for how many of those beliefs that you’re 99+% sure of can you name several people like Paul Christiano who think you’re on the wrong side of maybe? For me, not a single example comes to mind.
However, “It would be lethally dangerous to build ASIs that have the wrong goals” is not circular. You might say it lacks justification
I agree that’s not circular. I meant that the full claim “building ASIs with the wrong goals would lead to human extinction because ‘It would be lethally dangerous to build ASIs that have the wrong goals’” is circular. “Lacks justification” would have been clearer.
For example, if they believe both that Drexlerian nanotechnology is possible and that the ASI in question would be able to build it.
I hold this background belief but don’t think that it means the original claim requires little additional justification. But getting into such details is beyond the scope of this discussion thread. (Brief gesture at an explanation: Even though humans could exterminate all the ants in a backyard when they build a house, they don’t. It similarly seems plausible to me that ASI could start building its factories on Earth to enable it to build von Neumann probes to begin colonizing the universe, all without killing all humans on Earth. Maybe it’d extinct humanity by boiling the oceans as mentioned in IABIED, but I have enough doubt in these sorts of predictions to remain <<99% confident in the ‘It would be lethally dangerous [i.e. it’d lead to extinction] to build ASIs that have the wrong goals’ claim.)
I think that, in such cases, Eliezer is simply not making a mistake that those other researchers are making, where they have substantial hope in unknown unknowns (some of which are in fact known, but maybe not to them).
Eliezer has phrased this as:
You don’t get to adopt a prior where you have a 50-50 chance of winning the lottery “because either you win or you don’t”; the question is not whether we’re uncertain, but whether someone’s allowed to milk their uncertainty to expect good outcomes.
Rob Bensinger quoted an exchange on this topic between Eliezer and Aryeh Englander. When I first read it years ago I recall thinking that Eliezer was wrong in the exchange and was confused why Rob was quoting it in apparent endorsement.
Reading your version of it now, it still seems to me like the point is just wrong. Updating to 99% because none of the alignment proposals you’ve considered seem like they would work just seems like overconfidence. Saying ‘no, you should update to 99% if you’ve considered as many alignment proposals as Eliezer has, and remaining less confident is the mistake of milking uncertainty into expecting good outcomes’ seems like the real mistake.
Does Eliezer really not have other reasons beyond this epistemological view that he ought to update to ~99% based on his own inability to find a potentially-promising solution to the alignment problem over the course of his career? I’ve long assumed that there was more to it than this, but maybe this epistemological point is actually just a major crux between Yudkowsky and others with significantly lower credences of extinction from AI.
I’m also a little confused by why you expect such a summary to exist. Or, rather, why the section titles from The Problem are insufficient
In short, I think they’re not sufficient because a person can agree with all those statements and also rationally think the title claim is >>1% likely to be false.
And also because e.g. saying you’re 99% confident that building ASIs with the wrong goals would lead to human extinction because “It would be lethally dangerous to build ASIs that have the wrong goals” is circular and doesn’t actually explain why you’re so confident. The layperson writing a book report doesn’t have anything to point to as the reason why you’re 99% confident while researchers like e.g. Christiano are much less confident (20% extinction within 10 years of powerful AI being built).
In many places in his review he [...] criticizes the book as not making the case that extreme pessimism is warranted.
I think this is a valid criticism, which I share. My main criticism of IABIED was that it didn’t argue for its title claim. See the 1,700-word section of my review, “IABIED does not argue for its thesis.” (I didn’t cross-post my review to LW or anywhere because I didn’t like that I was just complaining about the book being disappointing when I had such high hopes for it, but if anyone reading this thinks it’s worthwhile to post to LW, say so and I’ll listen.)
By default, it’s reasonable for readers of a book with the title IABIED to expect that the book will at least attempt to explain why if anyone builds ASI anytime soon, then it is almost certain that ASI will cause human extinction.
If the book merely explains why ASI might cause human extinction if anyone builds ASI anytime soon, then I think it is reasonable for readers to criticize this.
BB seems to say that IABIED does argue for its title thesis with the analogy to evolution, and just says that the argument is not decisive because it doesn’t address the “disanalogies between evolution and reinforcement learning.”
Whether one takes BB’s view that the book did argue for its title thesis and just didn’t do a very good (or complete?) job, or whether one takes my view that Y&S largely just didn’t attempt to explain their reasons for why they put such high credence in their title claim, I think your response to BB on this topic is missing something, which is why I’m commenting.
You continue:
I think this is a basic misunderstanding of the book’s argument. IABIED is not arguing for the thesis that “you should believe ‘if anyone builds superintelligence with modern methods, everyone will die’ with >90% probability”, which is a meta-level point about confidence, and instead the thesis is the object-level claim that “if anyone builds ASI with modern methods, everyone will die.”
I agree with you that the book was not and should not have been attempting to raise the reader’s credence in the title thesis to >90%. As you said:
I kinda think anyone (who is not an expert) who reads IABIED and comes away with a similar level of pessimism as the authors is making an error. If you read any single book on a wild, controversial topic, you should not wind up extremely confident!
(Only disagreement: I think even experts shouldn’t read IABIED and update their credence in the title claim above 90% if it was previously below 90%.)
Given that a short, accessible book written for the general public could not possibly provide all the evidence that the authors have seen over the years that has led to them being so confident in their title thesis, what should the book do instead?
The suggestion I gave in my review was that the authors should have provided a disclaimer in the Introduction, such as the following:
By the way, it is impossible for us to provide a complete account here of why we are almost certain that if anyone builds ASI anytime soon, everyone will die. We have been researching this question for decades and there are simply far too many considerations for us to address in this short book that we are trying to make accessible to a wide audience. Consequently, we are only going to lay out basic arguments for considerations that are particularly concerning to us. If after reading the book you think, ‘I can see why ASI might cause human extinction, but I don’t understand why the authors think it is inevitable that ASI would cause human extinction if built soon,’ then we have accomplished what we set out to do. If you feel we left you hanging about why we are so confident, all we can say is that we warned you, and we encourage you to read our online resources and other materials to begin to understand our high confidence.
Such a disclaimer would be sufficient to pre-empt the criticism that the book does not actually argue for its title thesis that if anyone builds ASI anytime soon, then it is almost certain that ASI will cause human extinction.
But the book could do more beyond this if it wanted to. In addition, it could say, “While we know we can’t possibly convey all the evidence that led to us having such high credences in our title claim, we can at least provide a summary of what led us to be so confident. While we don’t necessarily think this summary should update anyone’s credence in the title, it will at least give interested readers an idea of what led us to become so confident.” But Y&S did not provide any such summary in the book.
Such a summary is actually what I was hoping for. I’ve been curious about this for years and even asked Eliezer at a conference once why his credence in existential catastrophe from AI was so high (his answer, which was about rockets, didn’t seem like an explanation to me). To this day, if someone were to ask me why Eliezer is so much more confident in the IABIED claim than Paul Christiano or Daniel Kokotajlo or whoever, I still don’t have an answer that doesn’t make it sound like Eliezer’s reasons are obviously bad.
The cached explanation that comes to mind when I ask myself this question is “Well he’s been thinking about it for years and has become convinced that every alignment proposal he has seen fails.” But there are a lot of smart researchers who also aren’t aware of any alignment proposal that they think works, but that’s obviously not sufficient for their credence to be ~99%, so clearly Eliezer must have some other reasons that I’m not aware of. But what are those reasons? I don’t know, and IABIED didn’t give me any hints.
But that’s passing the buck… where to find the trustworthy commenters?
My idea for this has been that rather than require that all users use and trust the extension’s single foxy aggregation / deference algorithm, the tool instead ought to give users the freedom to choose between different aggregation mechanisms, including being able to select which users to epistemically trust or not. In other words, it could almost be like an epistemic social network where users can choose whose judgment they respect and have their aggregation algorithm give special weight to those users (as well as to users those users say they respect the judgment of).
Perhaps this would lead to some users using the system to support their own tribalism or whatever and have their personalized aggregation algorithm spit out poor judgments, but I think it’d allow users like those on LW to use the tool and become more informed as a result.
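To make the “epistemic social network” idea a bit more concrete, here is a minimal sketch of a trust-weighted aggregation. The types and weights (1.0 for directly trusted users, 0.5 for users they trust, 0.1 for everyone else) are made up purely for illustration:

```typescript
// Hypothetical types and weights; purely an illustrative sketch.
type UserId = string;

interface Vote {
  userId: UserId;
  believesTrue: boolean; // did this user mark the claim as true?
}

interface TrustNetwork {
  trusted: Set<UserId>;          // users the reader trusts directly
  trustedByTrusted: Set<UserId>; // users trusted by those trusted users
}

// Weighted share of "true" votes from one reader's point of view.
function trustWeightedBelief(votes: Vote[], net: TrustNetwork): number {
  let weightTrue = 0;
  let weightTotal = 0;
  for (const v of votes) {
    const w = net.trusted.has(v.userId)
      ? 1.0
      : net.trustedByTrusted.has(v.userId)
      ? 0.5
      : 0.1; // small default weight for everyone else
    weightTotal += w;
    if (v.believesTrue) weightTrue += w;
  }
  return weightTotal === 0 ? 0.5 : weightTrue / weightTotal; // 0.5 = no information
}
```

Each user could plug their own trust list into something like this, so two readers of the same article might see different aggregate judgments—which is the point, and also the tribalism risk mentioned above.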
Another solution could be to let every user specify whom they trust, and show the opinions of your friends more visibly than the opinion of randos. So you would get mostly good results if you import the list of rationalists; and everyone else, uhm, will use the tool to reinforce the bubble they are already in.
Yeah, exactly.
I think it’d be a valuable tool despite the challenges you mentioned.
I think the main challenge would be getting enough people to give the tool/extension enough input epistemic data; making the outputs based on that input data valuable enough to be informative to users seems to me like the lesser challenge.
And to solve this problem, I imagine the developers would have to come up with creative ways to make giving the tool epistemic data fast and low friction (though maybe not: e.g. is submitting Community Notes fast or low friction? (IDK, but) perhaps not necessarily, and maybe some users do it anyway because they value the exposure and impact their note may have if approved).
And perhaps it would also mean making sure that the way users provide the input data allows that data to be aggregated by some algorithm. E.g. it’s easier to aggregate submissions claiming a sentence is true or false, but what if a user just wants to flag a claim as misleading? Do you need a more creative way to capture that data if you want to be able to communicate to other users the manner in which it is misleading, rather than just a “misleading” tag?

I haven’t thought through these sorts of questions, but I strongly suspect that there is some MVP version of the extension that I, at the very least, would value as an end user and would also be happy to contribute to, even if only a few people I know would be seeing my data/notes when reading the same content as me after the fact. Though of course the more people who use the tool and see the data, the more willing I’d be to contribute, assuming some small time cost of contributing data. I already spend time leaving comments on things to point out mistakes, and I imagine such a tool would just reduce the friction of providing such feedback.
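Regarding the question above about capturing a “misleading” flag alongside plain true/false votes, one possible (entirely hypothetical) data shape is a tagged union, where structured verdicts aggregate numerically and misleading-notes carry free-text explanations that get surfaced rather than averaged:

```typescript
// Hypothetical submission shapes; just a sketch of how structured votes
// and free-form "misleading" notes could coexist in one system.
type Verdict = {
  kind: "verdict";
  claimId: string;
  userId: string;
  verdict: "true" | "false";
};

type MisleadingNote = {
  kind: "misleading";
  claimId: string;
  userId: string;
  explanation: string; // how/why the claim is misleading
};

type Submission = Verdict | MisleadingNote;

// Verdicts reduce to a fraction; misleading notes are kept as text
// so other users can see *how* something is claimed to be misleading.
function summarize(subs: Submission[]) {
  const verdicts = subs.filter((s): s is Verdict => s.kind === "verdict");
  const notes = subs.filter((s): s is MisleadingNote => s.kind === "misleading");
  const trueCount = verdicts.filter((v) => v.verdict === "true").length;
  return {
    fractionTrue: verdicts.length > 0 ? trueCount / verdicts.length : null,
    misleadingExplanations: notes.map((n) => n.explanation),
  };
}
```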
You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed.
I’ve long thought that it’s also true that an entrepreneur could build a tool that allows people to easily see whether virtually everything they read or see on the internet is true.
On LessWrong if a reader thinks something someone says in a post is false they can highlight the sentence and Disagree-react it. Then everyone else reading the post can see that the sentence is highlighted and see who said they disagreed with it. This is great for epistemics.
I envision a system (could be as simple as a browser extension) that allows users to frictionlessly report their feedback/beliefs when reading any content online, noting when things they read seem true or false or definitely false, etc. The system crowdsources all of this epistemic feedback and then uses the data to estimate whether things actually are true or false, and shares this insight with other users.
Then no longer will someone have to read a news article or post that 100 or more other people have already read and be left to their own devices to determine what parts are true or not.
Perhaps some users might not trust the main algorithm’s judgment and would prefer to choose a set of other users who they trust have good judgment, and have their personalized algorithm give these people’s epistemic feedback extra weight. Great, the system should have this feature.
Perhaps some users mark something as false and later other users come along and show that it is true. Then perhaps the first users should have an epistemic score that goes down as a consequence of their mistaken/bad epistemic feedback.
Perhaps the system should track how good of judgment users have over time to ascertain which users give reliable feedback and which users have bad epistemics and largely just contribute noise.
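As a rough illustration of that kind of reliability tracking, here is one sketch: each user’s score starts neutral and moves toward 1 when their feedback matches the eventual consensus verdict on a claim, and toward 0 when it doesn’t. The starting value and learning rate are invented for the example:

```typescript
// Hypothetical reliability tracking; constants are illustrative only.
interface Reviewer {
  userId: string;
  reliability: number; // in [0, 1], starts at 0.5 (no track record)
}

const LEARNING_RATE = 0.1; // how fast the score moves after each resolved claim

function updateReliability(
  reviewer: Reviewer,
  saidTrue: boolean,      // what this user originally claimed
  consensusTrue: boolean  // what the claim was later resolved to
): Reviewer {
  const target = saidTrue === consensusTrue ? 1 : 0;
  return {
    ...reviewer,
    reliability:
      reviewer.reliability + LEARNING_RATE * (target - reviewer.reliability),
  };
}
```

Contributors who mostly add noise would drift toward a low score and could then be down-weighted or ignored by whatever aggregation mechanism the user has chosen.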
There are a lot of features that could be added to such a system. But the point is that I’ve read far too many news articles and posts on the broader internet and had the experience of noting a mistake or inaccuracy or outright falsehood, and then moved on without sharing the insight with anyone due to there being no efficient way to do so.
Surely there are also many inaccuracies that I miss, and I’d benefit from being informed by others who did catch them, in a way that I could just believe as a non-expert on the claim.
First, environment: if you want to believe true things, try not to spend too much time around people who are going to sneeze false information or badly reasoned arguments into your face. You can’t fact check everything you hear and read; you literally don’t have the time, energy, or knowledge needed. Cultivate a social network that cares about true things.
This is good advice, but I really wish (and think it possible) that some competent entrepreneurs would make it much less needed by creating epistemic tools that enhance the ability of anyone to discern what’s true out in the wild, where people do commonly sneeze false information in your face.
What’s more, I think no private company should be in a position to impose this kind of risk on every living human, and I support efforts to make sure that no company ever is.
I don’t see your name on the Statement on Superintelligence when I search for it. Assuming you didn’t sign it, why not? Do you disagree with it?
It seems like an effort to make sure that no company is in the position to impose this kind of risk on every living human:
We call for a prohibition on the development of superintelligence, not lifted before there is
1. broad scientific consensus that it will be done safely and controllably, and
2. strong public buy-in.
(Several Anthropic, OpenAI, and Google DeepMind employees signed.)
Chapter 5, “Its Favorite Things,” starts with Yudkowsky’s “Correct-Nest parable” about intelligent aliens who care a lot about the exact number of stones found in their nests.
Immediately after the parable, on page 82:
Most alien species, if they evolved similarly to how known biological evolution usually works, and if given a chance to have things the way they liked them most, probably would not choose a civilization where all their homes contained a large prime number of stones. There are just a lot of other ways to be; there are a lot of other directions one could steer. Much like predicting that your next lottery ticket won’t be a winning one, this is an easy call.
Similarly, most powerful artificial intelligences, created by any method remotely resembling the current methods, would not choose to build a future full of happy, free people. We aren’t saying this because we get a kick out of being bleak. It’s just that those powerful machine intelligences will not be born with preferences much like ours.
This is just a classic “counting argument” against alignment efforts being successful, right?
I recall Alex Turner (TurnTrout) arguing that (at least some) counting arguments (that are often made) are wrong (Many arguments for AI x-risk are wrong) and quoting Nora Belrose and Quintin Pope arguing the same (Counting arguments provide no evidence for AI doom). Some people in the comments, such as Evan Hubinger, seem to disagree, but as a layperson the discussion became too technical for me to understand.
In any case, the version of the counting argument in the book seems simple enough that as a layperson I can tell that it’s wrong. To me it seems like it clearly proves too much.
Insofar as Yudkowsky and Soares are saying here that an ASI created by any method remotely resembling the current method will likely not choose to build a future full of happy, free people because there are many more possible preferences that an ASI could have than the narrow subset of preferences that would lead to it building a future of happy, free people, then I think the argument is wrong.
It seems like this counting observation is a reason to think (so maybe I think the “no evidence” in the above linked post title is too strong) that the preferences an ASI ends up having might not be the preferences its creators try training into it, because the target preferences are indeed a narrow target and narrow targets are easier to miss than broad targets. But surely this counting observation is not sufficient to conclude that ASI creators will fail to hit their narrow target. It seems like you would need more reasons to conclude that.
IABIED Misc. Discussion Thread
Agreed that current models fail badly at alignment in many senses.
I still feel like the bet that OP offered Collier (in response to her stating that currently available techniques do a reasonably good job of making potentially alien and incomprehensible jealous ex-girlfriends like “Sydney” very rare) was inappropriate, as the bet was clearly about a different claim than her claim about the frequency of Sydney-like behavior.
A more appropriate response from OP would have been to say that while current techniques may have successfully reduced the frequency of Sydney-like behavior, they’re still failing badly in other respects, such as your observation with Claude Code.
But the way you are reading it seems to mean her “strawmann[ed]” point is irrelevant to the claim she made!
I agree.
(I only skimmed your review / quickly read about half of it. I agree with some of your criticisms of Collier’s review and disagree with others. I don’t have an overall take.)
One criticism of Collier’s review you appeared not to make that I would make is the following.
Collier wrote:
By far the most compelling argument that extraordinarily advanced AIs might exist in the future is that pretty advanced AIs exist right now, and they’re getting more advanced all the time. One can’t write a book arguing for the danger of superintelligence without mentioning this fact.
I disagree. I think it was clear decades before the pretty advanced AIs of today existed that extraordinarily advanced AIs might exist (and indeed probably would exist) eventually. As such, the most compelling argument that extraordinarily advanced AIs might or probably will exist in the future is not that pretty advanced AIs exist today, but the same argument one could have made (and some did make) decades ago.
One version of the argument is that the limits of how advanced AI could be in principle seem extraordinarily advanced (human brains are an existence proof, and human brains have known limitations relative to machines) and it seems unlikely that AI progress would permanently stall before getting to a point where there are extraordinarily advanced AIs.
E.g. I.J. Good foresaw superintelligent machines, and I don’t think he was just getting lucky to imagine that they might or probably would come to exist at some point. I think he had access to compelling reasons.
The existence of pretty advanced AIs today is some evidence and allows us to be a bit more confident that extraordinarily advanced AIs will eventually be built, but their existence is not the most compelling reason to expect significantly more capable AIs to be created eventually.
Bernie Sanders quoted the March 2023 Pause Giant AI Experiments Open Letter’s language “governments should step in and institute a moratorium” in a video today as justification for his legislation calling for a moratorium on the construction of new data centers, even though a moratorium on new data centers is not the kind of moratorium that the letter called for.
Bernie quotes the pause letter at 7:12: