Rob Bensinger
Communications @ MIRI. Unless otherwise indicated, my posts and comments here reflect my own views, and not necessarily my employer’s. (Though we agree about an awful lot.)
It’s a bit complicated, but after looking into this and weighing this against other factors, MIRI and our publisher both think that the best option is for people to just buy it when they think to buy it—the sooner, the better.
Whether you’re buying on Amazon or elsewhere, on net I think it’s a fair bit better to buy now than to wait.
Yeah, I think the book is going to be (by a very large margin) the best resource in the world for this sort of use case. (Though I’m potentially biased as a MIRI employee.) We’re not delaying; this is basically as fast as the publishing industry goes, and we expected the audience to be a lot smaller if we self-published. (A more typical timeline would have put the book another 3-20 months out.)
If Eliezer and Nate could release it sooner than September while still gaining the benefits of working with a top publishing house, doing a conventional media tour, etc., then we’d definitely be releasing it immediately. As is, our publisher has done a ton of great work already and has been extremely enthusiastic about this project, in a way that makes me feel way better about this approach. “We have to wait till September” is a real cost of this option, but I think it’s a pretty unavoidable cost given that we need this book to reach a lot of people, not just the sort of people who would hear about it from a friend on LessWrong.
I do think there are a lot of good resources already online, like MIRI’s recently released intro resource, “The Problem”. It’s a very different beast from If Anyone Builds It, Everyone Dies (mainly written by different people, and independent of the whole book-writing process), and once the book comes out I’ll consider the book strictly better for anyone willing to read something longer. But I think “The Problem” is a really good overview in its own right, and I expect to continue citing it regularly, because having something shorter and free-to-read does matter a lot.
Some other resources I especially like include:
Gabriel Alfour’s Preventing Extinction from Superintelligence, for a quick and to-the-point overview of the situation.
Ian Hogarth’s We Must Slow Down the Race to God-Like AI (requires Financial Times access), for an overview with a bit more discussion of recent AI progress.
The AI Futures Project’s AI 2027, for a discussion focused on very near-term disaster scenarios. (See also a response from Max Harms, who works at MIRI.)
MIRI’s AGI Ruin, for people who want a more thorough and (semi)technical “why does AGI alignment look hard?” argument. This is a tweaked version of the LW AGI Ruin post, with edits aimed at making the essay more useful to share around widely. (The original post kinda assumed you were vaguely in the LW/EA ecosystem.)
In my experience, “normal” folks are often surprisingly open to these arguments, and I think the book is remarkably normal-person-friendly given its topic. I’d mainly recommend telling your friends what you actually think, and using practice to get better at it.
Context: One of the biggest bottlenecks on the world surviving, IMO, is the amount (and quality!) of society-wide discourse about ASI. As a consequence, I already thought one of the most useful things most people can do nowadays is to just raise the alarm with more people, and raise the bar on the quality of discourse about this topic. I’m treating the book as an important lever in that regard (and an important lever for other big bottlenecks, like informing the national security community in particular). Whether you have a large audience or just a network of friends you’re talking to, this is how snowballs get started.
If you’re just looking for text you can quote to get people interested, I’ve been using:
As the AI industry scrambles to build increasingly capable and general AI, two researchers speak out about a disaster on the horizon.
In 2023, hundreds of AI scientists and leaders in the field, including the three most cited living AI scientists, signed an open letter warning that AI poses a serious risk of causing human extinction. Today, however, the AI race is only heating up. Tech CEOs are setting their sights on smarter-than-human AI. If they succeed, the world is radically unprepared for what comes next.
In this book, Eliezer Yudkowsky and Nate Soares explain the nature of the threat posed by smarter-than-human AI. In a conflict between humans and AI, a superintelligence would win, as easily as modern chess AIs crush the world’s best humans at chess. The conflict would not be close, or even especially interesting.
The world is racing to build something truly new under the sun. And if anyone builds it, everyone dies.
Stephen Fry’s blurb from Nate’s post above might also be helpful here:
The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. Their brilliant gift for analogy, metaphor and parable clarifies for the general reader the tangled complexities of AI engineering, cognition and neuroscience better than any book on the subject I’ve ever read, and I’ve waded through scores of them. We really must rub our eyes and wake the **** up!
If your friends are looking for additional social proof that this is a serious issue, you could cite things like the Secretary-General of the United Nations:
Alarm bells over the latest form of artificial intelligence, generative AI, are deafening. And they are loudest from the developers who designed it. These scientists and experts have called on the world to act, declaring AI an existential threat to humanity on par with the risk of nuclear war. We must take those warnings seriously.
(This is me spitballing ideas; if a bunch of LWers take a crack at figuring out useful things to say, I expect at least some people to have better ideas.)
You could also try sending your friends an online AI risk explainer, e.g., MIRI’s The Problem or Ian Hogarth’s We Must Slow Down the Race to God-Like AI (requires Financial Times access) or Gabriel Alfour’s Preventing Extinction from Superintelligence.
There’s a professional Russian translator lined up for the book already, though we may need volunteer help with translating the online supplements. I’ll keep you (and others who have offered) in mind for that—thanks, Tapatakt. :)
Yep! This is the first time I’m hearing the claim that hardcover matters more for bestseller lists; but I do believe hardcover preorders matter a bit more than audiobook preorders (which matters a bit more than ebook preorders). I was assuming the mechanism for this is that they provide different amounts of evidence about print demand, and thereby influence the print run a bit differently. AFAIK all the options are solidly great, though; mostly I’d pick the one(s) that you actually want the most.
I didn’t cross-post it, but I’ve poked EY about the title!
I feel pretty frustrated at how rarely people actually bet or make quantitative predictions about existential risk from AI. EG my recent attempt to operationalize a bet with Nate went nowhere. Paul trying to get Eliezer to bet during the MIRI dialogues also went nowhere, or barely anywhere—I think they ended up making some random bet about how long an IMO challenge would take to be solved by AI. (feels pretty weak and unrelated to me. lame. but huge props to Paul for being so ready to bet, that made me take him a lot more seriously.)
This paragraph doesn’t seem like an honest summary to me. Eliezer’s position in the dialogue, as I understood it, was:
The journey is a lot harder to predict than the destination. Cf. “it’s easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be”. Eliezer isn’t claiming to have secret insights about the detailed year-to-year or month-to-month changes in the field; if he thought that, he’d have been making those near-term tech predictions already back in 2010, 2015, or 2020 to show that he has this skill.
From Eliezer’s perspective, Paul is claiming to know a lot about the future trajectory of AI, and not just about the endpoints: Paul thinks progress will be relatively smooth and continuous, and thinks it will get increasingly smooth and continuous as time passes and more resources flow into the field. Eliezer, by contrast, expects the field to get choppier as time passes and we get closer to ASI.
A way to bet on this (which Eliezer repeatedly proposed, but mostly couldn’t get Paul to take him up on) would be for Paul to list out a bunch of concrete predictions that Paul sees as “yep, this is what smooth and continuous progress looks like”. Then, even though Eliezer doesn’t necessarily have a concrete “nope, the future will go like X instead of Y” prediction, he’d be willing to bet against a portfolio of Paul-predictions: when you expect the future to be more unpredictable, you’re willing to at least weakly bet against any sufficiently ambitious pool of concrete predictions.
(Also, if Paul generated a ton of predictions like that, an occasional prediction might indeed make Eliezer go “oh wait, I do have a strong prediction on that question in particular; I didn’t realize this was one of our points of disagreement”. I don’t think this is where most of the action is, but it’s at least a nice side-effect of the person-who-thinks-this-tech-is-way-more-predictable spelling out predictions.)
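To make the “bet against a portfolio” logic concrete, here’s a toy sketch. This is purely illustrative (the specific numbers and the fair-odds betting scheme are my assumptions, not anything from the dialogue): someone who merely expects the future to be choppier can have positive expected value betting against a pool of confident concrete predictions, without offering any counter-predictions of their own.

```python
# Toy illustration: betting against a portfolio of confident predictions at
# odds implied by the predictor's own stated confidence.

def ev_of_betting_against(their_confidences, your_probabilities, stake=1.0):
    """Expected profit from betting `stake` against each prediction at odds
    that would be fair if the predictor's stated confidence were correct."""
    total = 0.0
    for q, p in zip(their_confidences, your_probabilities):
        payout_if_wrong = stake * q / (1 - q)  # fair odds at confidence q
        # You win `payout_if_wrong` if the prediction fails (prob 1 - p by
        # your lights), and lose `stake` if it comes true (prob p).
        total += (1 - p) * payout_if_wrong - p * stake
    return total

# The predictor assigns 80% to each of ten concrete predictions; the skeptic,
# expecting choppier progress, assigns only 65% to each.
print(ev_of_betting_against([0.80] * 10, [0.65] * 10))  # ≈ +7.5 stakes
```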
Eliezer was also more interested in trying to reach mutual understanding of the views on offer, as opposed to “let’s just bet on things immediately, never mind the world-views”. But insofar as Paul really wanted to have the bets conversation instead, Eliezer sank an awful lot of time into trying to find operationalizations Paul and he could bet on, over many hours of conversation.
If your end-point take-away from that (even after actual bets were in fact made, and tons of different high-level predictions were sketched out) is “wow how dare Eliezer be so unwilling to make bets on anything”, then I feel a lot less hope that world-models like Eliezer’s (“long-term outcome is more predictable than the detailed year-by-year tech pathway”) are going to be given a remotely fair hearing.
(Also, in fairness to Paul, I’d say that he spent a bunch of time working with Eliezer to try to understand the basic methodologies and foundations for their perspectives on the world. I think both Eliezer and Paul did an admirable job going back and forth between the thing Paul wanted to focus on and the thing Eliezer wanted to focus on, letting us look at a bunch of different parts of the elephant. And I don’t think it was unhelpful for Paul to try to identify operationalizations and bets, as part of the larger discussion; I just disagree with TurnTrout’s summary of what happened.)
If I was misreading the blog post at the time, how come it seems like almost no one ever explicitly predicted at the time that these particular problems were trivial for systems below or at human-level intelligence?!?
Quoting the abstract of MIRI’s “The Value Learning Problem” paper (emphasis added):
Autonomous AI systems’ programmed goals can easily fall short of programmers’ intentions. Even a machine intelligent enough to understand its designers’ intentions would not necessarily act as intended. We discuss early ideas on how one might design smarter-than-human AI systems that can inductively learn what to value from labeled training data, and highlight questions about the construction of systems that model and act upon their operators’ preferences.
And quoting from the first page of that paper:
The novelty here is not that programs can exhibit incorrect or counter-intuitive behavior, but that software agents smart enough to understand natural language may still base their decisions on misrepresentations of their programmers’ intent. The idea of superintelligent agents monomaniacally pursuing “dumb”-seeming goals may sound odd, but it follows from the observation of Bostrom and Yudkowsky [2014, chap. 7] that AI capabilities and goals are logically independent. Humans can fully comprehend that their “designer” (evolution) had a particular “goal” (reproduction) in mind for sex, without thereby feeling compelled to forsake contraception. Instilling one’s tastes or moral values into an heir isn’t impossible, but it also doesn’t happen automatically.
I won’t weigh in on how many LessWrong posts at the time were confused about where the core of the problem lies. But “The Value Learning Problem” was one of the seven core papers in which MIRI laid out our first research agenda, so I don’t think “we’re centrally worried about things that are capable enough to understand what we want, but that don’t have the right goals” was in any way hidden or treated as minor back in 2014-2015.
I also wouldn’t say “MIRI predicted that NLP will largely fall years before AI can match e.g. the best human mathematicians, or the best scientists”, and if we saw a way to leverage that surprise to take a big bite out of the central problem, that would be a big positive update.
I’d say:
MIRI mostly just didn’t make predictions about the exact path ML would take to get to superintelligence, and we’ve said we didn’t expect this to be very predictable because “the journey is harder to predict than the destination”. (Cf. “it’s easier to use physics arguments to predict that humans will one day send a probe to the Moon, than it is to predict when this will happen or what the specific capabilities of rockets five years from now will be”.)
Back in 2016-2017, I think various people at MIRI updated to median timelines in the 2030-2040 range (after having had longer timelines before that), and our timelines haven’t jumped around a ton since then (though they’ve gotten a little bit longer or shorter here and there).
So in some sense, qualitatively eyeballing the field, we don’t feel surprised by “the total amount of progress the field is exhibiting”, because it looked in 2017 like the field was just getting started, there was likely an enormous amount more you could do with 2017-style techniques (and variants on them) than had already been done, and there was likely to be a lot more money and talent flowing into the field in the coming years.
But “the total amount of progress over the last 7 years doesn’t seem that shocking” is very different from “we predicted what that progress would look like”. AFAIK we mostly didn’t have strong guesses about that, though I think it’s totally fine to say that the GPT series is more surprising to the circa-2017 MIRI than a lot of other paths would have been.
(Then again, we’d have expected something surprising to happen here, because it would be weird if our low-confidence visualizations of the mainline future just happened to line up with what happened. You can expect to be surprised a bunch without being able to guess where the surprises will come from; and in that situation, there’s obviously less to be gained from putting out a bunch of predictions you don’t particularly believe in.)
Pre-deep-learning-revolution, we made early predictions like “just throwing more compute at the problem without gaining deep new insights into intelligence is less likely to be the key thing that gets us there”, which was falsified. But that was a relatively high-level prediction; post-deep-learning-revolution we haven’t claimed to know much about how advances are going to be sequenced.
We have been quite interested in hearing from others about their advance prediction record: it’s a lot easier to say “I personally have no idea what the qualitative capabilities of GPT-2, GPT-3, etc. will be” than to say ”… and no one else knows either”, and if someone has an amazing track record at guessing a lot of those qualitative capabilities, I’d be interested to hear about their further predictions. We’re generally pessimistic that “which of these specific systems will first unlock a specific qualitative capability?” is particularly predictable, but this claim can be tested via people actually making those predictions.
But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn’t you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well?
MIRI still sponsors some alignment research, and I expect we’ll sponsor more alignment research directions in the future. I’d say MIRI leadership didn’t have enough aggregate hope in Agent Foundations in particular to want to keep supporting it ourselves (though I consider its existence net-positive).
My model of MIRI is that our main focus these days is “find ways to make it likelier that a halt occurs” and “improve the world’s general understanding of the situation in case this helps someone come up with a better idea”, but that we’re also pretty open to taking on projects in all four of these quadrants, if we find something that’s promising and that seems like a good fit at MIRI (or something promising that seems unlikely to occur if it’s not housed at MIRI):
1. AI alignment work that’s high-EV absent a pause
2. AI alignment work that’s high-EV given a pause
3. Non-alignment work that’s high-EV absent a pause
4. Non-alignment work that’s high-EV given a pause
one positive feature it does have, it proposes to rely on a multitude of “limited weakly-superhuman artificial alignment researchers” and makes a reasonable case that those can be obtained in a form factor which is alignable and controllable.
I don’t find this convincing. I think the target “dumb enough to be safe, honest, trustworthy, relatively non-agentic, etc., but smart enough to be super helpful for alignment” is narrow (or just nonexistent, using the methods we’re likely to have on hand).
Even if this exists, verification seems extraordinarily difficult: how do we know that the system is being honest? Separately, how do we verify that its solutions are correct? Checking answers is sometimes easier than generating them, but only to a limited degree, and alignment seems like a case where checking is particularly difficult.
It’s also important to keep in mind that on Leopold’s model (and my own), these problems need to be solved under a ton of time pressure. To maintain a lead, the USG in Leopold’s scenario will often need to figure out some of these “under what circumstances can we trust this highly novel system and believe its alignment answers?” issues in a matter of weeks or perhaps months, so that the overall alignment project can complete in a very short window of time. This is not a situation where we’re imagining having a ton of time to develop mastery and deep understanding of these new models. (Or mastery of the alignment problem sufficient to verify when a new idea is on the right track or not.)
You and Leopold seem to share the assumption that huge GPU farms or equivalently strong compute are necessary for superintelligence.
Nope! I don’t assume that.
I do think that it’s likely the first world-endangering AI is trained using more compute than was used to train GPT-4; but I’m certainly not confident of that prediction, and I don’t think it’s possible to make reasonable predictions (given our current knowledge state) about how much more compute might be needed.
(“Needed” for the first world-endangeringly powerful AI humans actually build, that is. I feel confident that you can in principle build world-endangeringly powerful AI with far less compute than was used to train GPT-4; but the first lethally powerful AI systems humans actually build will presumably be far from the limits of what’s physically possible!)
But what would happen if one effectively closes that path? There will be huge selection pressure to look for alternative routes, to invest more heavily in those algorithmic breakthroughs which can work with modest GPU power or even with CPUs.
Agreed. This is why I support humanity working on things like human enhancement and (plausibly) AI alignment, in parallel with working on an international AI development pause. I don’t think that a pause on its own is a permanent solution, though if we’re lucky and the laws are well-designed I imagine it could buy humanity quite a few decades.
I hope people will step back from solely focusing on advocating for policy-level prescriptions (as none of the existing policy-level prescriptions look particularly promising at the moment) and invest some of their time in continuing object-level discussions of AI existential safety without predefined political ends.
FWIW, MIRI does already think of “generally spreading reasonable discussion of the problem, and trying to increase the probability that someone comes up with some new promising idea for addressing x-risk” as a top organizational priority.
The usual internal framing is some version of “we have our own current best guess at how to save the world, but our idea is a massive longshot, and not the sort of basket humanity should put all its eggs in”. I think “AI pause + some form of cognitive enhancement” should be a top priority, but I also consider it a top priority for humanity to try to find other potential paths to a good future.
As a start, you can prohibit sufficiently large training runs. This isn’t a necessary-and-sufficient condition, and doesn’t necessarily solve the problem on its own, and there’s room for debate about how risk changes as a function of training resources. But it’s a place to start, when the field is mostly flying blind about where the risks arise; and choosing a relatively conservative threshold makes obvious sense when failing to leave enough safety buffer means human extinction. (And when algorithmic progress is likely to reduce the minimum dangerous training size over time, whatever it is today—which is also a reason the cap will likely need to be lowered over time to some extent, until we’re out of the lethally dangerous situation we currently find ourselves in.)
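To illustrate why the cap would need to drop over time, here’s a purely illustrative sketch. The 2e25 FLOP starting cap and 16-month halving time below are made-up numbers for the example, not a proposal or an empirical claim:

```python
# Illustrative only: if algorithmic progress means the compute needed to reach
# a fixed capability level halves every H months, then a training-compute cap
# must shrink on roughly the same schedule to preserve the same safety buffer.

def equivalent_cap(initial_cap_flop: float, months_elapsed: float,
                   halving_time_months: float = 16.0) -> float:
    """Cap (in training FLOP) that buys the same buffer as `initial_cap_flop`
    did at month 0, under the assumed rate of algorithmic progress."""
    return initial_cap_flop * 0.5 ** (months_elapsed / halving_time_months)

for year in range(6):
    print(year, f"{equivalent_cap(2e25, 12 * year):.2e}")
# Under these made-up numbers, after ~5 years the cap would need to be roughly
# 13x lower to buy the same safety buffer it bought at the start.
```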
Alternatively, they either don’t buy the perils, or believe there’s a chance the other side may not?
If they “don’t buy the perils”, and the perils are real, then Leopold’s scenario is falsified and we shouldn’t be pushing for the USG to build ASI.
If there are no perils at all, then sure, Leopold’s scenario and mine are both false. I didn’t mean to imply that our two views are the only options.
Separately, Leopold’s model of “what are the dangers?” is different from mine. But I don’t think the dangers Leopold is worried about are dramatically easier to understand than the dangers I’m worried about (in the respective worlds where our worries are correct). Just the opposite: the level of understanding you need to literally solve alignment for superintelligences vastly exceeds the level you need to just be spooked by ASI and not want it to be built. Which is the point I was making; not “ASI is axiomatically dangerous”, but “this doesn’t count as a strike against my plan relative to Leopold’s, and in fact Leopold is making a far bigger ask of government than I am on this front”.
Nuclear war essentially has a localized p(doom) of 1
I don’t know what this means. If you’re saying “nuclear weapons kill the people they hit”, I don’t see the relevance; guns also kill the people they hit, but that doesn’t make a gun strategically similar to a smarter-than-human AI system.
Yep, I had in mind AI Forecasting: One Year In.
Why? 95% risk of doom isn’t certainty, but seems obviously more than sufficient.
For that matter, why would the USG want to build AGI if they considered it a coinflip whether this will kill everyone or not? The USG could choose the coinflip, or it could choose to try to prevent China from putting the world at risk without creating that risk itself. “Sit back and watch other countries build doomsday weapons” and “build doomsday weapons yourself” are not the only two options.
Leopold’s scenario requires that the USG come to deeply understand all the perils and details of AGI and ASI (since it otherwise doesn’t have a hope of building and aligning a superintelligence), but then choose to gamble its hegemony, its very existence, and the lives of all its citizens on a half-baked mad science initiative, when it could simply work with its allies to block the tech’s development and maintain the status quo at minimal risk.
Success in this scenario requires a weird combination of USG prescience with self-destructiveness: enough foresight to see what’s coming, but paired with a weird compulsion to race to build the very thing that puts its existence at risk, when it would potentially be vastly easier to spearhead an international alliance to prohibit this technology.
Responding to Matt Reardon’s point on the EA Forum:
Leopold’s implicit response as I see it:
Convincing all stakeholders of high p(doom) such that they take decisive, coordinated action is wildly improbable (“step 1: get everyone to agree with me” is the foundation of many terrible plans and almost no good ones)
Still improbable, but less wildly, is the idea that we can steer institutions towards sensitivity to risk on the margin and that those institutions can position themselves to solve the technical and other challenges ahead
Maybe the key insight is that both strategies walk on a knife’s edge. While Moore’s law, algorithmic improvement, and chip design hum along at some level, even a little breakdown in international willpower to enforce a pause/stop can rapidly convert to catastrophe. Spending a lot of effort to get that consensus also has high opportunity cost in terms of steering institutions in the world where the effort fails (and it is very likely to fail). [...]
Three high-level reasons I think Leopold’s plan looks a lot less workable:
It requires major scientific breakthroughs to occur on a very short time horizon, including unknown breakthroughs that will manifest to solve problems we don’t understand or know about today.
These breakthroughs need to come in a field that has not been particularly productive or fast in the past. (Indeed, forecasters have been surprised by how slowly safety/robustness/etc. have progressed in recent years, and simultaneously surprised by the breakneck speed of capabilities.)
It requires extremely precise and correct behavior by a giant government bureaucracy that includes many staff who won’t be the best and brightest in the field — inevitably, many technical and nontechnical people in the bureaucracy will have wrong beliefs about AGI and about alignment.
The “extremely precise and correct behavior” part means that we’re effectively hoping to be handed an excellent bureaucracy that will rapidly and competently solve a thirty-digit combination lock requiring the invention of multiple new fields and the solving of a variety of thorny and poorly-understood technical problems — in many cases, on Leopold’s view, in a space of months or weeks. This seems… not like how the real world works.
It also separately requires that various guesses about the background empirical facts all pan out. Leopold can do literally everything right and get the USG fully on board and get the USG doing literally everything correctly by his lights — and then the plan ends up destroying the world rather than saving it because it just happened to turn out that ASI was a lot more compute-efficient to train than he expected, resulting in the USG being unable to monopolize the tech and unable to achieve a sufficiently long lead time.
My proposal doesn’t require qualitatively that kind of success. It requires governments to coordinate on banning things. Plausibly, it requires governments to overreact to a weird, scary, and publicly controversial new tech to some degree, since it’s unlikely that governments will exactly hit the target we want. This is not a particularly weird ask; governments ban things (and coordinate or copy-paste each other’s laws) all the time, in far less dangerous and fraught areas than AGI. This is “trying to get the international order to lean hard in a particular direction on a yes-or-no question where there’s already a lot of energy behind choosing ‘no’”, not “solving a long list of hard science and engineering problems in a matter of months and getting a bureaucracy to output the correct long string of digits to nail down all the correct technical solutions and all the correct processes to find those solutions”.
The CCP’s current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides. Governments around the world currently, to the best of my knowledge, are nowhere near advancing any frontiers in ML.
From my perspective, Leopold is imagining a future problem into being (“all of this changes”) and then trying to find a galaxy-brained incredibly complex and assumption-laden way to wriggle out of this imagined future dilemma, when the far easier and less risky path would be to not have the world powers race in the first place, have them recognize that this technology is lethally dangerous (something the USG chain of command, at least, would need to fully internalize on Leopold’s plan too), and have them block private labs from sending us over the precipice (again, something Leopold assumes will happen) while not choosing to take on the risk of destroying themselves (nor permitting other world powers to unilaterally impose that risk).
(Though he also has an incentive to not die.)
As is typical for Twitter, we also signal-boosted a lot of other people’s takes. Some non-MIRI people whose social media takes I’ve recently liked include Wei Dai, Daniel Kokotajlo, Jeffrey Ladish, Patrick McKenzie, Zvi Mowshowitz, Kelsey Piper, and Liron Shapira.
Yep, this counts! :)