Bostrom Goes Unheard

[Editor’s Note: This post is split off from AI #38 and only on LessWrong because I want to avoid overloading my general readers with this sort of thing at this time, and also I think it is potentially important we have a link available. I plan to link to it from there with a short summary.]

Nick Bostrom was interviewed on a wide variety of questions on UnHerd, primarily on existential risk and AI. I found it thoughtful throughout. In it, he spent the first 80% of the time talking about existential risk. Then in the last 20% he expressed the concern that it was unlikely but possible we would overshoot our concerns about AI and never build AGI at all, which would be a tragedy.

How did those who would dismiss AI risk and build AGI as fast as possible react?

About how you would expect. This is from a Marginal Revolution links post.

Tyler Cowen: Nick Bostrom no longer the Antichrist.

The next link in that post was to the GPT-infused version of Rohit Krishnan’s book about AI, entitled Creating God (should I read it?).

What exactly changed? Tyler links to an extended tweet from Jordan Chase-Young, mostly a transcript from the video, with a short introduction.

Jordan Chase-Young: FINALLY: AI x-risker Nick Bostrom regrets focusing on AI risk, now worries that our fearful herd mentality will drive us to crush AI and destroy our future potential. (from an UnHerd podcast today).

In other words, Nick Bostrom previously focused on the fact that AI might kill everyone, thought that was bad actually, and attempted to prevent it. But now the claim is that Bostrom regrets this—he repented.

The context is that Peter Thiel, who warns that those warning about existential risk have gone crazy, has previously on multiple occasions referred seemingly without irony to Nick Bostrom as the Antichrist. So perhaps now Peter and others who agree will revise their views? And indeed, there was much ‘one of us’ talk.

Frequently those who warn of existential risk from AI are told they are saying something religious, are part of a cult, or are pattern matching to the Christian apocalypse, usually as justification for dismissing our concerns without argument.

The recent exception on the other side that proves the rule was Byrne Hobart, author of the excellent blog The Diff, who unlike most of those concerned about existential risk is explicitly religious, and who gave a talk about this at a religious conference. Dr. Jonathan Askonas, who gave a talk as well, notes he is an optimist skeptical of AI existential risk, yet also draws the parallels and talks about ‘the rationality of the Antichrist’s agenda.’

Note who actually uses such language, and both the symmetries and asymmetries.

Was Jordan’s statement a fair description of what was said by Bostrom?

Mu. Both yes and no would be misleading answers.

His statement is constructed so as to imply something stronger than is present. I would not go so far as to call it ‘lying,’ but I understand why so many responses labeled it that. I would instead call the description highly misleading, especially in light of the rest of the podcast and sensible outside context. But yes, under the rules of Bounded Distrust, this is a legal move one can make, based on the text quoted. You are allowed to be this level of misleading. And I thank him for providing the extended transcript.

Similarly and reacting to Jordan, here is Louis Anslow saying Bostrom has ‘broken ranks,’ and otherwise doing his best to provide a maximally sensationalist reading (scare words in bold red!) while staying within the Bounded Distrust rules. Who are the fearmongers, again?

Jordan Chase-Young then quotes at length from the interview; bold is his throughout.

To avoid any confusion, and because it was a thoughtful discussion worth reading, I will quote the whole section he quoted, and recommend those interested read (or listen to) the whole thing.

I will also note that the chosen title for this talk, ‘Nick Bostrom: How AI Will Lead to Tyranny,’ seems to go off the rails in the other direction and is very much not a central description, while again being something he does mention as a possibility. There is a section where Bostrom discusses that AI could enshrine permanent power structures including a tyranny, and make surveillance more effective, but he is not saying it will lead to tyranny, nor is that discussion central to the interview.

Headlines are often atrocious even when the article or discussion is quite good.

What Bostrom Centrally Said Was Mostly Not New or Controversial

Nick Bostrom says four central things in the quoted text.

  1. Never building AGI would be a tragedy.

  2. There is now a chance, although unlikely, that we might overshoot, and do that.

  3. He regrets focusing exclusively on the risk side, although I am confused as to why.

  4. He notes we face potential existential risks from biology or nanotech, which AGI could help prevent.

Only the third point is not common among the people I know who are concerned about existential risk. As stated by Bostrom, I think you’d get near-universal agreement on #4 - I’d go so far as to say that those who don’t agree aren’t thinking reasonably about this. #1 and #4 are periodically affirmed by, for example, Eliezer Yudkowsky and the other principals at MIRI, and I can’t actually think of anyone who explicitly disagrees with either proposition. In case I haven’t done so recently enough, I affirm them.

Point #2 is a matter of probability. Bostrom would never have said 0% here, nor would anyone else thinking clearly, although you might think—and I very much do think—that if the choices are ‘never build a machine in the image of a human mind’ and ‘chances are very high that soon never will a human again exist in this universe,’ I am going to pick Box A.

Indeed, until very recently, the talk was more ‘why are you trying to prevent us from building AGI as quickly as possible, that’s impossible,’ even though that is a strictly easier task than never building it at all, with people like Yudkowsky going ‘yeah, looks very hard, we look pretty doomed on that one, but going to try anyway.’ Many, including Tyler Cowen, have essentially argued that preventing AGI for very long is impossible, because the incentives work too strongly against it.

And I continue to think that is a highly reasonable position, one that might well be true. Getting everyone to pause seems incredibly hard, and maintaining that pause indefinitely seems also incredibly hard. But yes, some probability. More than 1%. If we presume that AGI is not so technically difficult, and that we don’t otherwise blow up our civilization for at least a few decades, I’d say less than 10%.

Responses Confirming Many Concerned About Existential Risk Mostly Agree

Here’s one response.

Rob Bensinger (MIRI): I disagree with Bostrom on some points here, but if you think x-risk people want humanity to literally never build AGI, you just haven’t been paying attention. The risk of overshooting is real; I just think it’s far smaller than the risk of undershooting.

Nuanced views on this are in fact possible, and in fact were the norm in public AI x-risk discourse until, I’d say, the past year or two. Bostrom and Yudkowsky are holdovers from a more precise culture that doesn’t ground its world-view in zingers, political slogans, and memes.

Rob Bensinger (quoting himself from June 25, 2022): Also, if it helps, I’m happy to say that MIRI leadership thinks “humanity never builds AGI” would be the worst catastrophe in history, would cost nearly all of the future’s value, and is basically just unacceptably bad as an option.

Here’s Dan Elton, who is in a similar place to Bostrom.

More than that, we highly value voicing disagreement on such questions.

Rob Bensinger: I disagree with Bostrom’s ‘society isn’t yet worried enough, but I now worry there’s a strong chance we’ll overreact’. I think underreaction is still hugely more likely, and hugely more costly. But I’m extremely glad x-risk people are the sorts to loudly voice worries like that.

Overreaction is a possibility. In fact, when people ask me to sketch out realistic futures in which we survive, I generally describe ones where society overreacts to some degree.

Because the classic failure mode of ‘attempts to visualize a good future’ is ‘you didn’t include a realistic amount of messiness and fuck-ups in your visualization’. Realistic futures don’t go exactly the way you’d want them to, even if you managed to avoid catastrophe.

And ‘society is over-cautious and takes way too long to build AGI’ is exactly the sort of Bad Thing we should expect to happen in a realistic world where we somehow don’t kill ourselves.

(Non-catastrophic) overreaction is a realistic way for things to moderately suck, if we somehow avoid the two worst outcomes. (Those being “rush to ASI and destroy ourselves” and “never build ASI at all in humanity’s entire future history”. Both are catastrophically bad.)

I think Bostrom’s quite wrong in this case, and am curious about what disagreements are upstream of that. (Does he think hardware and software are that easy to regulate? Does he think ASI risk isn’t that high?)

But it’s obvious that trying to mobilize policymakers and the general public to respond sanely to a fast-changing highly technical field, is a bit of a last-resort desperation move, and what happens next is going to be very hard to predict or steer.

Good intentions aren’t enough. We can have the best arguments in the world, and still see it all go wrong once implementation depends on policymakers all over the world doing the right thing.

By being the sort of community that’s willing to talk about these risks, however, and give voice to them even when it’s not politically convenient, we cut off a lot of the paths by which good intentions can result in things going awry.

We’re making it the case that this train has brakes. We’re making it possible to discuss whether we need to change strategy. Even if we don’t need to (as I think is the case here), it’s very important that we maintain that option. The future is, after all, hard to predict.

To those who would weaponize such statements as Bostrom’s, rather than join into dialogue with them, I would say: You are not making this easy.

My guess is that the crux of the disagreement between Bostrom and Bensinger, in which I mostly agree with Bensinger, is a disagreement about the necessary level of concern to get the proper precautions actually taken. Bostrom says somewhat higher, Bensinger would say much higher and much more precise. This is based most importantly on differences in how hard it will be to actually stop AI development, secondarily on Bensinger having a very high p(doom | AGI soon).

There is also a disagreement I model as being caused by Bostrom being a philosopher used to thinking in terms of lock-in and permanent equilibria—he thinks we might lock ourselves into no AGI ever through fear, and it could well stick indefinitely. I see a lot of similar cultural lock-in arguments in other longtermist philosophy (e.g. Toby Ord and Will MacAskill) and I am skeptical of such long-term path dependence more generally. It also seems likely Bostrom thinks we are ‘on the clock’ more than Bensinger does, due to other existential risks and the danger of civilizational collapse. This is a reason to be more willing to risk undershooting in order to prevent overshooting.

I also think that Bensinger got the sense that Bostrom’s update was much bigger than it actually was, exactly because of the framing of this discussion. Bostrom says there is only a small possibility of this kind of overshoot.

The Quoted Text in Detail

First, Bostrom echoes a basic principle almost everyone agrees with, including Eliezer Yudkowsky, who has said it explicitly many times. I agree as well.

Nick Bostrom: It would be tragic if we never developed advanced artificial intelligence. I think it’s a kind of a portal through which humanity will at some point have to passage, that all the paths to really great futures ultimately lead through the development of machine superintelligence, but that this actual transition itself will be associated with major risks, and we need to be super careful to get that right.

The second thing Bostrom said is that there is a small danger we might overshoot, and as a result never create AI, and that we should try to avoid that.

But I’ve started slightly worrying now, in the last year or so, that we might overshoot with this increase in attention to the risks and downsides, which I think is welcome, because before that this was neglected for decades. We could have used that time to be in a much better position now, but people didn’t. Anyway, it’s starting to get more of the attention it deserves, which is great, and it still seems unlikely, but less unlikely than it did a year ago, that we might overshoot and get to the point of a permafrost—like, some situation where AI is never developed.

I often get whiplash between the ‘AI cannot be stopped and all your attempts to do so only at most slow things down and thereby make everything worse in every way’ and ‘AI could be stopped rather easily, we are in danger of doing that if we are not careful, and that would be the biggest tragedy possible, so we need to move as fast as possible and never worry about the risks lest that happen.’

And yes, some people will switch between those statements as convenient.

FR: We need to get to a kind of Goldilocks level of feeling about AI.

NB: Yeah. I’m worrying that it’s like a big wrecking ball that you can’t really control in a fine-grained way.

The Dial of Progress, the danger that we are incapable of any nuance, here within AI. And yes, I worry about this too. Perhaps all we have, at the end of the day, is the wrecking ball. I will keep fighting for nuance. But if ultimately we must choose, and all we have is the wrecking ball, we do not have the option to not swing it. ‘Goldilocks level of feeling’ is not something our civilization does well.

Flo Read: Like a kind of AI nihilism that would come from being so afraid?

Nick Bostrom: Yeah. So stigmatized that it just becomes impossible for anybody to say anything positive about it, and then we get one of these other lock-in effects, like with the other AI tools, from surveillance and propaganda and censorship, and whatever the sort of orthodoxy is—five years from now, ten years from now, whatever—that sort of gets locked in somehow, and we then never take this next step. I think that would be very tragic.

If it’s actually never, then yes, that is tragic. But I say far less tragic than everyone dying. Once again, if forced to choose, I choose Box A. To not make a machine in the image of a human mind. There is some price or level of risk that gets me to choose Box B; it is more than 2%, but it is far less than both 50% and my current estimate of the risks of going forward under the baseline scenario, should we succeed at building AGI.

Perhaps you care to speak directly into the microphone and disagree.

I strongly agree that overshoot is looking a lot more possible now than it did months ago.

Nick Bostrom: I still think it’s unlikely, but certainly more likely than even just six or twelve months ago. If you just plot the change in public attitude and policymaker attitude, and you sort of think what’s happened in the last year—if that continues to happen the next year and the year after and the year after that, then we’ll pretty much be there as a kind of permanent ban on AI, and I think that could be very bad. I still think we need to move to a greater level of concern than we currently have, but I would want us to sort of reach the optimal level of concern and then stop there rather than just kind of continue--

So to be clear, Nick Bostrom continues to think we are insufficiently concerned now, but is worried we might have an overshoot if things go too far, as is confirmed next.

FR: We need to get to a kind of Goldilocks level of feeling about AI.

NB: Yeah. I’m worrying that it’s like a big wrecking ball that you can’t really control in a fine-grained way. People like to move in herds, and they get an idea, and then—you know how people are. I worry a little bit about it becoming a big social stampede to say negative things about AI and then it just running completely out of control and sort of destroying the future in that way instead. Then, of course, we go extinct through some other method instead, maybe synthetic biology, without even ever getting at least to roll the die with the...

FR: So, it’s sort of a ‘pick your poison’.

NB: Yeah.

Again, yes, many such cases. What changed for Bostrom is not that he previously thought overshoot was impossible; he simply did not think the chance was big enough to be worth worrying about. Now he thinks it is big enough to consider, and that a small possibility of a very bad thing is worth worrying about. Quite so.

I asked GPT-4, which said this is an expansion of his previous position from Superintelligence, adding nuance but not contradicting it, and which could not recall any comments by Bostrom on that specific question at all.

FR: It just so happens that this poison might kill you or might poison you, and you just kind of have to roll the dice on it.

NB: Yes. I think there’s a bunch of stuff we could do to improve the odds on the sequence of different things and stuff like that, and we should do all of those.

FR: Being a scholar of existential risk, though, I suppose, puts you in the category or the camp of people who are often—this show being an example—asked to speak about the terrifying hypothetical futures that AI could draw us to. Do you regret that focus on risk?

To be clear, yes, he now says the third thing: he regrets the focus of his work, although given his other beliefs it does not seem like he should regret it?

NB: Yeah, because I think, now—there was this deficit for decades. It was obvious—to me at least, but it should have been pretty obvious—that eventually AI was gonna succeed, and then we were gonna be confronted with this problem of, “How do we control them and what do we do with them?” and then that’s gonna be really hard and therefore risky, and that was just neglected. There were like 10,000 people building AI, but like five or something thinking about how we would control them if we actually succeeded. But now that’s changed, and this is recognized, so I think there’s less need now maybe to add more to the sort of concern bucket.

FR: The doomerist work is done, and now you can go and do other things.

NB: Yeah. It’s hard, because it’s always a wobbly thing, and different groups of people have different views, and there are still people dismissing the risks or not thinking about them. I would think the optimal level of concern is slightly greater than what we currently have, so I still think there should be more concern. It’s more dangerous than most people have realized, but I’m just starting to worry about it then kind of overshooting that, and the conclusion being, “Well, let’s wait for a thousand years before we do that,” and then, of course, it’s unlikely that our civilization would remain on-track for a thousand years, and...

That sounds like it was right to raise the level of concern in 2014, and right up until at least mid-2023? I am confused.

The Broader Podcast Context

If you listen to the full context, that which is scarce, you see a podcast whose first ~80% was almost entirely focused on Bostrom warning about various issues of existential risk from AI. The quoted text was the last ~20% of the podcast. That does not seem like someone all that regretful about focusing on that issue.

Around 9:50 Bostrom notes that he still expects fast takeoff, at least relative to general expectations.

At about 12:30 he discusses the debate about open source, noting that any safeguards in open source models will be removed.

Around 14:00 he says AI will increase the power of surveillance by a central power, including over what people are thinking.

Around 17:00 he discusses the potential of AI to reinforce power structures including tyrannical ones.

Around 20:00 he introduces the alignment problem and attempts to explain it.

Around 22:00 they briefly discuss the clash between Western liberal values and the utilitarianism one would expect in an AI, and Bostrom pivots back to talk more about why alignment is hard.

Around 26:00 Flo Read raises the concern about the powerful getting superintelligence first and taking control, then asks about military applications. Bostrom seems unable to get across that the central threat isn’t that power would go to the wrong people.

I worry that much of the discussion covered a lot of basic territory while the explanations were simultaneously too dense and difficult for those encountering them for the first time. It was all very good, but also very rushed.

I do think this did represent a substantial shift in emphasis from this old Q&A, where his response to whether we should build an AI was ‘not any time soon,’ though he did still posit ideas like the long reflection and endorse building the AI once we know how to do so safely.

A Call for Nuance

Taking it all together, it seems to me that Bostrom:

  1. Still thinks it is important to raise concerns about existential risk from artificial intelligence, as evidenced by him continuing to do this.

  2. Now worries we might overshoot and shut down AI too much.

  3. Regrets that his focus was not sufficiently nuanced, that it was exclusively on existential risks and he did not mention the possibility of an overshoot.

  4. Is now trying to provide that nuance, that we need to do that which is necessary to guard against existential risks without locking into a permanent abandonment of AI.

Which all seems great? Also highly miscategorized by all but one of the responses I saw from those who downplay existential risk. The exception was the response from Marc Andreessen, which was, and I quote in its entirety, “FFS.” Refreshingly honest and straightforward for all of its meanings. Wastes no words, 10/10, no notes. He gives no quarter, accepts no compromise.

This all also conflates worries about existential risk with more general FUD about AI, which again reflects the worry that there is no room for such nuance, that one cannot differentially move one without the other. But Bostrom himself shows that this is indeed possible. Who can doubt that a world without Bostrom would have had reduced existential risk concern, with proportionally far less reduction in concerns about AI in general or about mundane harms?

It is coherent to say that while we have not yet overshot on general levels of AI worry, the natural reaction to growing capabilities, and the social and political dynamics involved, will themselves raise the concern level, and that on the margin pushing towards more concern now could be counterproductive. I presume that is Bostrom’s view.

I agree that this dynamic will push concern higher. I disagree that we are on track for sufficient concern, and I definitely disagree that we would be on such track if people stopped pushing for more concern.

I especially worry that the concern will be the wrong concerns. That we will thus take the wrong precautions, rather than the right ones. But the only way to stop that is to be loud about the right concerns, because the ‘natural’ social and political forces will focus on (often real but) non-central, and thus essentially wrong, concerns.

Again, nuance is first best and I will continue to fight for that.

One way of thinking about this is that the ideal level of concern, C, is a function of the level of nuance N and the quality of response Q. As N and Q go up, C goes down, both because actual concern goes down—we’re more likely to get this right—and also because we get to substitute out of blunt reactions and thus require less concern to take necessary countermeasures.
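
As a minimal sketch of that relationship, using notation of my own choosing beyond the C, N, and Q above (the mapping f and the optimum C* are my additions, not anything from Bostrom or this discussion):

$$C^{*} = f(N, Q), \qquad \frac{\partial f}{\partial N} < 0, \qquad \frac{\partial f}{\partial Q} < 0$$

That is, the ideal level of concern falls as either nuance or response quality rises, capturing both channels: better-targeted responses lower the actual risk, and they also reduce how much blunt concern is needed to get the necessary countermeasures adopted.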

Here’s Rob Miles, nailing it.

Rob Miles: My expectation (informed by, uh, recent history) is that society will simultaneously under and over react, taking dramatic actions that don’t help much, while failing to find the will to do relatively simple things that would help a lot more.

Or to put it another way, the concept of a society under or over reacting compresses possible responses down to a single axis in a way that doesn’t capture most of what’s important. There isn’t just a single big dial labelled “reaction” that we turn up or down.

Thus, if you want to lower the level of concern and reduce attempts to increase concern? Fight with me for more and better nuance, for responses that might actually work, and for strong implementation details.

The Quoted Text Continued

The fourth thing Bostrom says is that we will eventually face other existential risks, and AGI could help prevent them. No argument here, I hope everyone agrees, and that we are fully talking price.

FR: So we’re damned if we do and damned if we don’t.

NB: We will hopefully be fine either way, but I think I would like the AI before some radical biotech revolution. If you think about it, if you first get some sort of super-advanced synthetic biology, that might kill us. But if we’re lucky, we survive it. Then, maybe you invent some super-advanced molecular nanotechnology, that might kill us, but if we’re lucky we survive that. And then you do the AI. Then, maybe that will kill us, or if we’re lucky we survive that and then we get to utopia.

Well, then you have to get through sort of three separate existential risks—first the biotech risks, plus the nanotech risks, plus the AI risks, whereas if we get AI first, maybe that will kill us, but if not, we get through that, then I think that will handle the biotech and nanotech risks, and so the total amount of existential risk on that second trajectory would sort of be less than on the former.

Now, it’s more complicated than that, because we need some time to prepare for the AI, but you can start to think about sort of optimal trajectories rather than a very simplistic binary question of, “Is technology X good or bad?” We might more think, “On the margin, which ones should we try to accelerate, which ones retard?” And you get a more nuanced picture of the field of possible interventions that way, I think.

That seems reasonable. This is a good preference, one that would matter more if we seemed close to either of those things and we had to, as they say above, pick our poison. And talk price. There are those (such as Scott Aaronson) who have said outright they think the price here plausibly justifies the path, that the existential risks of waiting exceed those of not waiting. I strongly disagree. I also expect, for what it is worth, that it will be decades, unless the risks here are enabled by AI, before I am more worried about nanotechnology or synthetic biology than about nuclear war.

My model here is that current levels of AI raise these additional risks rather than lower them, and I expect this to continue roughly until we reach AGI, at which point things get weird and it could go either way, and it mostly doesn’t much matter because we will have either bigger problems or good solutions, so effectively that will lower the cost of such risks dramatically.

Conclusion

Nuance is one key to our survival.

It is not sufficient to choose the ‘right level of concern about AI’ by turning the dial of progress. If we turn it too far down, we probably get ourselves killed. If we turn it too far up, it might be a long time before we ever build AGI, and we could lose out on a lot of mundane utility, face a declining economy and be vulnerable over time to other existential and catastrophic risks.

A successful middle path is only viable if we can choose interventions that are well-considered and well-targeted, that actually prevent sufficiently capable AI from being created until we know how to do so safely and how to navigate what comes next, while also allowing us to enjoy the benefits of mundane AI and not shutting the door to AI completely. Bostrom says such a path is unlikely but possible. It still seems to me to require some things that will be very hard to pull off, even with much larger levels of fear and hatred of AI.

Regulations often find a way to do exactly the wrong thing, as accelerationists commonly point out, but in this case some of them also lobby and directly aim for exactly that—letting frontier model development continue as fast as possible, while all the regulations that do exist take away mundane utility. And there will over time be increasing temptation and ability to train a more powerful AI anyway. So what to do?

Right now, we do not know how to stop AI even bluntly, given the coordination required to do so and all the incentives that must be overcome. We certainly have no idea how to find a middle path. What does that world even look like? No, seriously, what does that timeline look like? How does it get there, and what does it do? How does it indefinitely maintain sufficient restraints on the actually dangerous AI training runs and deployments without turning everyone against AI more generally?

I don’t know. I do know that the only way to get there is to take the risks seriously, look at them on a technical level, figure out what it takes to survive acceptably often, and plot the best available path through causal space given all the constraints. Which means facing down the actual arguments, and the real risks. We cannot get there by the right side ‘winning’ a zero-sum conflict.

There are only two ways to react to an exponential such as AI. Too early and too late.

Similarly, there are only two levels of safety precaution when preventing a catastrophe. Too little and too much. If you are not (nontrivially) risking an overreaction, and especially accusations and worries of overreaction, you know you are underreacting. The reverse is also true if the associated costs are so high as to be comparable.

In most of the worlds where we survive, if I am still writing posts at all, I will at various points be saying that many AI regulations and restrictions went too far, or were so poorly chosen as to be negative. Most exceptions are the worlds where we roll the dice in ways I consider very foolish, and then get profoundly lucky. If we get a true ‘soft landing’ and middle path, that would be amazing, but let’s not fool ourselves about how difficult it will be to get there.

It is possible that there is, in practice, no middle path. That our only three available choices, as a planet, are ‘build AGI almost as fast as possible, assume alignment is easy on the first try and that the dynamics that arise after solving alignment can be solved before catastrophe as well,’ ‘build AGI as fast as possible knowing we will likely die because AGI replacing humans is good actually’ or ‘never build a machine in the image of a human mind.’

If that is indeed the case? I believe the choice is clear.