Thoughts on Robin Hanson’s AI Impacts interview

There was already a LessWrong post here. I started writing this as a comment there, but it got really long, so here we are! For convenience, here is the link to the interview transcript and audio, in which he argues that AGI risks are modest, and that EAs spend too much time thinking about AGI. I found it very interesting and highly recommend reading or listening to it.

That said, I disagree with almost all of it. I’m going to list areas where my intuitions seem to differ from Robin’s, and where I’m coming from. Needless to say, I only speak for myself, I’m not super confident about any of this, and I offer this in the spirit of “brainstorming conversation” rather than “rebuttal”.

How likely is it that the transition to superhuman AGI will be overwhelmingly important for the far future?

Robin implies that the likelihood is low: “How about a book that has a whole bunch of other scenarios, one of which is AI risk which takes one chapter out of 20, and 19 other chapters on other scenarios?” I find this confusing. What are the other 19 chapter titles? See, in my mind, the main categories are that (1) technological development halts forever, or (2) AGI is overwhelmingly important for the far future, being central to everything that people and societies do (both good and bad) thereafter. I don’t immediately see any plausible scenario outside of those two categories … and of those two categories, I put most of the probability weight in (2).

I assume Robin would want one of the 20 chapters to be about whole-brain emulation (since he wrote a whole book about that), but even if whole-brain emulation happens (which I think is very unlikely), I would still expect fully-artificial intelligence to be overwhelmingly important in this scenario as soon as the emulations invent it (i.e. this would be in category 2). So anyway, if I wrote a book like that, I would spend most of the chapters talking about AGI risks, AGI opportunities, and what might happen in a post-AGI world. The rest of the chapters would include things like nuclear winter or plagues that destroy our technological civilization forever. Again, I’m curious what else Robin has in mind.

How hard is it to make progress on AGI safety now? How easy will it be in the future?

I could list off dozens of specific open research problems in AGI safety where (1) we can make real progress right now; (2) we are making real progress right now; (3) it doesn’t seem like the problems will resolve themselves, or even become substantially easier, after lots more research progress towards building AGI.

Here are a few off the top of my head: (1) If we wind up building AGIs using methods similar to today’s deep RL, how would we ensure that they are safe and beneficial? (This is the “prosaic AGI” research program.) (2) If we wind up building AGIs using algorithms similar to the human brain’s, how would we ensure that they are safe and beneficial? (3) If we want task-limited AGIs, or norm-following AGIs, or impact-limited AGIs, or interpretable AGIs, what exactly does this mean, in terms of a specification that we can try to design to? (4) Should we be trying to build AGI agents with explicit goals, or “helper AIs”, or oracles, or “microscope AIs”, or “tool AIs”, or what? (5) If our AGIs have explicit goals, what should the goal be? (6) Max Tegmark’s book lists 12 “AI aftermath scenarios”; what post-AGI world do we want, and what AGI research, strategy, and policies will help us get there? …

Robin suggests that there will be far more work to do on AGI safety in the future, when we know what we’re building, we’re actually building it, and we have to build it right. I agree with that 100%. But I would phrase it as “even more” work to do in the future, as opposed to implying that there is not much to do right now.

How soon are high-leverage decision points?

Robin suggests that we should have a few AGI safety people on Earth, and their role should be keeping an eye on developments to learn when it’s time to start real work, and that time has not yet arrived. On the contrary, I see key, high-leverage decision points swooshing by us as we speak.

The type of AI research we do today will determine the type of AGI we wind up building tomorrow, and some AGI architectures are bound to create worse safety & coordination problems than others. The sooner we establish that a long-term research program is leading towards a problematic type of AGI, the easier it is for the world to coordinate on not proceeding in that research program. On one extreme, if this problematic research program is still decades away from fruition, then not pursuing it (in favor of a different path to AGI) seems pretty feasible, once we have a good solid argument for why it’s problematic. On the opposite extreme, if this research program has gotten all the way to working AGI code posted on GitHub, well, good luck getting the whole world to agree not to run it!

How much warning will we have before AGI? How much do we need?

Lots of AGI safety questions seem hard (particularly, “How do we make an AGI that robustly does what we want it to do, even as it becomes arbitrarily capable and knowledgeable?”, and also see the list a few paragraphs above). It’s unclear what the answers will look like; indeed, it’s not yet proven that solutions even exist. (After all, we only have one example of an AGI, i.e. humans, and they display all sorts of bizarre and destructive behaviors.) Even when we have a misbehaving AGI right in front of us, with a reproducible problem, that doesn’t mean that we will know how to fix it.

Thus, I see it as entirely possible that AIs develop gradually into more and more powerful AGIs over the course of a decade or two, and with each passing year, we see worse and worse out-of-control-AGI accidents. Each time, people have lots of ideas about what the solution is, and none of them work, or the ones that work also make the AGI less effective, and so people keep experimenting with the more powerful designs. And the accidents keep getting worse. And then some countries try to regulate AGI research, while others tell themselves that if only the AGI were even more capable, then the safety problems would resolve themselves because the AGI would understand humans better, and hey it can even help chase down and destroy those less-competent out-of-control AGIs from last year that are still self-reproducing around the internet. And the accidents get even worse still … and on and on...

(ETA: For more on this topic, see my later post On unfixably unsafe AGI architectures.)

This is the kind of thing I have in mind when I say that even a very gradual development of AGI poses catastrophic risks. (I’m not saying anything original here; this is really the standard argument that if AGI takes N years, and AGI safety research takes N+5 years, then we’re in a bad situation … I’m just trying to make that process more vivid.) Note that I gave an example focused on catastrophic accidents, but of course risk is disjunctive. In particular, in slow-takeoff scenarios, I often think about coordination problems and competitive pressures leading us to a post-AGI world that nobody wanted.

That said, I do also think that fast takeoff is a real possibility, i.e. that we may well get very powerful and dangerous AGI with little or no warning, as we improve learning-and-reasoning algorithms. Humans have built a lot of tools to amplify our intellectual power, and maybe “AGI code version 4” can really effectively take advantage of them, while “AGI code version 3” can’t really get much out of them. By “tools” I am thinking of things like coding (recursive self-improvement, writing new modules, interfacing with preexisting software and code), taking in human knowledge (reading and deeply understanding books, videos, Wikipedia, etc., a.k.a. “content overhang”), computing hardware (self-reproduction or seizing more computing power, a.k.a. “hardware overhang”), the ability of humans to coordinate and cooperate (social manipulation, earning money, etc.) and so on. It’s hard to say how gradual the transition will be between not getting much out of these “tools” versus really being able to use them to their full potential, and I don’t see why a fast transition (weeks or months) should be ruled out. In fact, I see a fast transition as reasonably likely, for inside-view reasons that I haven’t articulated and am not terribly confident about. (Further reading.) (Also relevant: Paul Christiano is well known around here for arguing in favor of slow takeoff … but he still assigns a 30% chance to fast takeoff.)

Robin had a lot of interesting arguments in favor of slow takeoff (and long timelines, see below). He offered some inside-view arguments about the nature of intelligence and AGI, which I would counter with different inside-view arguments about the nature of intelligence and AGI, but that’s beyond the scope of this post.

Robin also offered an outside-view argument, related to the statistics of citations in different fields: what fraction of papers get what fraction of citations? The statistics are interesting, but I don’t think they shed light on the questions at issue. Take the Poincaré conjecture, which for 100 years was unproven, then all of a sudden in 2002, a reclusive genius (Perelman) announced a proof. In hindsight, we can say that the theorem was proved gradually, with Perelman building on Hamilton’s ideas from the 1980s. But really, nobody knew if Hamilton’s ideas were on the right track, or how many steps away from a proof we were, until bam, a proof appeared. Likewise, no one knew how far away heavier-than-air flight was until the Wright Brothers announced that they had already done it (and indeed, people wouldn’t believe them even after their public demonstrations). Will AGI be like that? Or will it be like Linux, developing from almost-useless to super-useful very very gradually and openly? The fact that citations are widely distributed among different papers is not incompatible with the existence of occasional sudden advances from private projects like Perelman’s or the Wright Brothers’; indeed, these citation statistics hold in math and engineering just like everything else. The citation statistics just mean that academic fields are diverse, with lots of people working on different problems using different techniques … which is something we already knew.

Timelines; Are we “crying wolf” about AGI?

Robin says he sees a lot of arguments that we should work on AGI prep because AGI is definitely coming soon, and that this is “crying wolf” that will discredit the field when AGI doesn’t come soon. My experience is different. Pretty much all the material I read advocating for AGI safety & policy, from both inside and outside the field, is scrupulously careful to say that they do not know with confidence when we’ll get AGI, and that this work is important and appropriate regardless of timelines. That doesn’t mean Robin is wrong; I presume we’re reading different things. I’m sure that people on the internet have said all kinds of crazy things about AGI. Oh well, what can you do?

It does seem to be an open secret that many of the people working full-time on AGI safety & policy assign a pretty high probability to AGI coming soon (say, within 10 or 20 years, or at least within their lifetimes, as opposed to centuries). I put myself in that category too. This is naturally to be expected from self-selection effects.

Again, I have inside-view reasons for privately believing that AGI has a reasonable chance of coming “soon” (as defined above), which I won’t get into here. I’m not sure that this belief is especially communicable, or defensible. The party line, that “nobody knows when AGI is coming”, is a lot more defensible. I am definitely willing to believe and defend the statement “nobody knows when AGI is coming” over an alternative statement “AGI is definitely not going to happen in the next 20 years”. OK, well Robin didn’t exactly say the latter statement, but he kinda gave that impression (and sorry if I’m putting words in his mouth). Anyway, I have pretty high confidence that the latter statement is unjustifiable. We even have good outside-view support for the statement “People declaring that a particular technology definitely will or won’t be developed by a particular date have a terrible track record and should be disbelieved.” (see examples in There’s No Fire Alarm For AGI). We don’t know how many revolutionary insights lie between us and AGI, or how quickly they will come; we don’t know how many lines of code need to be written (or how many ASICs need to be spun), or how long it will all take to debug. We don’t know any of these things. I’ve heard lots of prestigious domain experts talk about what steps are needed to get to AGI, and they all say different things. And they could all be wrong anyway; none of them has built an AGI! (The first viable airplane was built by the then-obscure Wright Brothers, who had better ideas than the then-most-prestigious domain experts.) Robin hasn’t built an AGI either, and neither have I. Best to be humble.