I like this post, but I think it’s somewhat misleading to call your scenario a “slow takeoff”. To my mind, a “slow takeoff” evokes an image from non-singularitarian science fiction where you have human-level robots running around and they’ve been running around for decades if not centuries: that is, a very slow and gradual development that gives society and institutions plenty of time to adapt. But your version is clearly not this, since you are talking on the timescale of a few years, and note yourself that time will be of the essence even with a “slow” takeoff:
If takeoff is slow: it will become quite obvious that AI is going to transform the world well before we kill ourselves, we will have some time to experiment with different approaches to safety, policy-makers will have time to understand and respond to AI, etc. But this process will take place over only a few years, and the world will be changing very quickly, so we could easily drop the ball unless we prepare in advance.
You also note that much of the safety community seems to believe in a fast takeoff, in disagreement with you. I don’t know whether you’re including me there, but I’ve previously talked about a takeoff on the scale of a few years being a fast one, since to me a “fast takeoff” is one where there’s little time for existing institutions to prepare and respond adequately, and a few years still seems short enough to meet that criterion.
I’d prefer to use some term like “moderate takeoff” for the scenario that you’re talking about.
[edit: sorry if I seem like I’m piling on with the terminology, I first wrote this comment and only then read the other comments and saw that they ~all brought up the same thing.]
In Superintelligence, Bostrom defines a “takeoff” as the transition “from human-level intelligence to superintelligence.” This seems like a poor definition for several reasons: “human-level intelligence” is a bad concept, and “superintelligence” is ambiguous in Bostrom’s book between “strong superintelligence” (“a level of intelligence vastly greater than contemporary humanity’s combined intellectual wherewithal”) and “[weak?] superintelligence” (“greatly [exceeding] the cognitive performance of [individual?] humans in virtually all domains of interest”).
Moreover, neither of these thresholds is strategically important/relevant: “superintelligence” is too high and anthropocentric a bar for talking about seed AGI, and is too low a bar for talking about decisive strategic advantage; whereas “strong superintelligence” is just a really weird/arbitrary/confusing bar when the thing we care about is DSA.
More relevant thresholds on my view are things like “is it an AGI yet? can it, e.g., match the technical abilities of an average human engineer in at least one rich, messy real-world scientific area?” and “is it strong enough to prevent any competing AGI systems from being deployed in the future?”
All of this is to say that “takeoff” in Bostrom’s sense may not be the most helpful term. That said, Bostrom defines a fast takeoff as one that takes “minutes, hours, or days,” a moderate takeoff as one that takes “months or years,” and a slow takeoff as one that takes “decades or centuries.”
A further problem with applying these definitions to the present discussion is that the question Grace/Hanson/Christiano care about is often “how well can humanity, or human institutions, or competing AI projects, keep up with an AGI project?”, but Bostrom’s definitions of “superintelligence” are unclear about whether they’re assuming some static threshold (e.g., ‘capability of humans in 2014’ or ‘capability of humans when the first AGI begins training’) versus a moving threshold that can cause an AGI to fall short of “superintelligence” because other actors are keeping pace.
The place to situate the disagreement for mainstream skeptics of what Eliezer calls “rapid capability gain” might be something like: “Once we have AGI, is it more likely to take 2 subjective years to blow past human scientific reasoning in the way AlphaZero blew past human chess reasoning, or 10 subjective years?” I often phrase the MIRI position along the lines of “AGI destroys or saves the world within 5 years of being developed”.
That’s just talking in terms of widely held views in the field, though. I think that e.g. MIRI/Christiano disagreements are less about whether “months” versus “years” is the right timeframe, and more about things like: “Before we get AGI, will we have proto-AGI that’s nearly as good as AGI in all strategically relevant capabilities?” And the MIRI/Hanson disagreements are maybe less about months vs. years and more about whether AGI will be a discrete software product invented at a particular time and place at all.
I tend to agree with Robin that AGI won’t be a discrete product, though I’m much less confident about that.
I think of “extrapolation of historical trends” (with the last 40 years of slowing growth as an aberration) as the prototypical slow takeoff view. For example, this was clear in Robin and Eliezer’s Jane Street debate on the topic (though they debated concentration rather than speed), and seems like what’s implicit in Superintelligence.
At any rate, it seems like fast-takeoff proponents’ visualizations of the world normally involve powerful AI emerging in a world whose economy hasn’t doubled over the preceding 4 years, so this is at least a disagreement, even if it’s not what they mean by fast takeoff.
ETA: I agree it’s important to be clear about what we’re talking about, though, and that if “slow takeoff” invokes the wrong image then I should say something else.
My sense was “slow takeoff” usually means “measured in decades”, moderate takeoff means “measured in years”, and fast takeoff means… “measured in something faster than years.”
Might avoid confusion if we actually just said “decades-measured takeoff” vs. “years-measured takeoff”, etc.
But what is being measured in decades?
If the economy doubles in 16, then 8, then 4, then 2 years, is that takeoff measured in decades or years?
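As a minimal sketch of the arithmetic here (Python; the doubling-time sequence just extends the hypothetical 16/8/4/2 pattern and isn’t a forecast):

```python
# Toy model: each doubling of world output takes half as long as the previous one.
# Question from above: is this trajectory "decades-measured" or "years-measured"?

doubling_times = [16, 8, 4, 2, 1, 0.5, 0.25]  # years per doubling (hypothetical)

elapsed = 0.0
output = 1.0
for dt in doubling_times:
    elapsed += dt
    output *= 2
    print(f"t = {elapsed:5.2f} yr   output x{output:<6.0f} (current doubling time: {dt} yr)")

# The doubling times form a geometric series, 16 + 8 + 4 + ... -> 32 years total,
# so the whole trajectory fits inside about three decades even though its later
# doublings are measured in months. Whether it reads as "decades" or "years"
# depends on which part of the curve you look at.
```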
(Definitions like this make more sense to people who imagine AGI as a discrete thing, and make less sense to people who just imagine AGI progress as an ongoing process with impacts increasing over time at an accelerating pace.)
[Edit: I think Robby’s answer upthread is a more complete version of what I’m saying here, not sure if I’m adding anything]
In my mind, what’s being measured is the time interval between “the existence of the first artificial process capable of all human feats, and/or specifically the feats of ‘general consequentialism and AGI design’” and “the existence of an artificial process that is overwhelmingly superior at all human cognitive feats.”
I can imagine this feeling like the wrong frame to focus on given some suppositions, but it doesn’t seem especially dependent on AGI being discrete.
On my view there are months rather than years between “better than human at everything” and “singularity,” but I’m not sure that’s a particularly relevant interval. For example, I expect both of them to happen after you’ve already died if you didn’t solve AI alignment, so that interval doesn’t affect strategic questions about AI alignment.
Ah, gotcha; that wasn’t clear to me, and it reframes the disagreement for me pretty considerably (and your position as I understand it makes more sense to me now). Will think on that.
(You had said “crazy things are happening”, but I assumed this was “the sort of crazy thing where you can’t predict what will happen” vs. “the crazy thing where most humans are dead”.)
I’m actually fairly curious what you consider some plausible scenarios in which I might be dead before overwhelmingly superior intelligence is at play.
Hmm, mulling over this a bit more. (spends 20 minutes)
Two tldrs:
tldr#1: clarifying question for Paul: Do you see a strong distinction between a growth in capabilities shaped like a hyperbolic hockey stick, and a discontinuous one? (I don’t currently see that strong a distinction between them; see the toy comparison sketched after these tldrs.)
tldr#2: The world that seems most likely to me, and also less likely to be “takeoff-like” (or at least the one that most moves me toward other ways of thinking about it), is a world where we get a process that can design better AGI (which may or may not itself be an AGI), but does not have general consequentialism/arbitrary learning.
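For concreteness, here is a toy comparison of the two shapes from tldr#1 (Python; T, T_JUMP, and every other number here are made up purely for illustration):

```python
# Two toy capability curves on the same timeline:
#  - "hyperbolic hockey stick": capability(t) = 1 / (T - t); smooth at every point
#    before T, but it blows up at the finite time T.
#  - "discontinuous": flat until T_JUMP, then an instantaneous jump.
# The comparison: the hyperbola never has a single moment where the rules change,
# yet it still packs most of its growth into a narrow window near T.

T = 32.0       # hypothetical blow-up date for the hyperbola (years from now)
T_JUMP = 30.0  # hypothetical date of the discrete jump

def hyperbolic(t: float) -> float:
    return 1.0 / (T - t)

def discontinuous(t: float) -> float:
    return 1.0 if t < T_JUMP else 1000.0

for t in [0.0, 16.0, 24.0, 28.0, 30.0, 31.0, 31.9]:
    print(f"t = {t:5.1f}   hyperbolic = {hyperbolic(t):7.2f}   discontinuous = {discontinuous(t):7.1f}")
```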
More meandering background thoughts, not sure if legible or persuasive because it’s 4am.
Robby:
I think that e.g. MIRI/Christiano disagreements are less about whether “months” versus “years” is the right timeframe, and more about things like: “Before we get AGI, will we have proto-AGI that’s nearly as good as AGI in all strategically relevant capabilities?”
Assuming that’s accurate, looking at it a second time crystallized some things for me.
And also Robby’s description of “what seems strategically relevant”:
More relevant thresholds on my view are things like “is it an AGI yet? can it, e.g., match the technical abilities of an average human engineer in at least one rich, messy real-world scientific area?” and “is it strong enough to prevent any competing AGI systems from being deployed in the future?”
I’m assuming the “match technical abilities” thing is referencing something like “the beginning of a takeoff” (or at least something that 2012 Bostrom would have called a takeoff?) and the “prevent competitors” thing is the equivalent of “takeoff is complete, for most intents and purposes.”
I agree those are better thresholds than “human” and “superhuman”.
But looking at the nuts and bolts of what might cause those thresholds, the feats that seem most likely to produce a sharp takeoff (“sharp” meaning the rate of change increases after these capabilities exist in the world; I’m not sure whether this is meaningfully distinct from a hyperbolic curve) are:
#1: general consequentialist behavior
#2: arbitrary learning capability (possibly by spinning up subsystems that learn for it; I don’t think that distinction matters much)
#3: ability to do AGI design
(not sure if #2 can be meaningfully split from #1 or not, and doubt they would be in practice)
These three are the combo that seems, to me, better modeled as something different from “the economy just doing its thing, but acceleratingly”.
And one range of things-that-could-happen is: “Do we get #1, #2, and #3 together? What happens if we just get #1 or #2? What happens if we just get #3?”
If we get #1, and it’s allowed to run unfettered, I expect that process would try to gain properties #2 and #3.
But upon reflection, a world where we get property #3 without #1 and #2 seems fairly qualitatively different, and is the world that looks, to me, more like “progress accelerates, but in the form of various organizations building things, in a way best modeled as an accelerating economy.”
These three are the combo that seems, to me, better modeled as something different from “the economy just doing its thing, but acceleratingly”.
I don’t see this.
And why is “arbitrary learning capacity” a discrete thing? I’d think the important thing is that future systems will learn radically faster than current systems and be able to learn more complex things, but still won’t learn infinitely faster or be able to learn arbitrarily complex things (in the same ways that humans can’t). Why wouldn’t these parameters increase gradually?
A thought: you’ve been using the phrase “slow takeoff” to distinguish your model from the MIRI-ish model, but I think the relevant phrase is more like “smooth takeoff vs. sharp takeoff” (where the shape of the curve changes at some point).
But your other comment + Robby’s has me convinced that the key disagreement doesn’t have anything to do with smooth vs. sharp takeoff either. It just happens to be a point of disagreement without being an important one.
Not sure if this is part of the confusion/disagreement, but by “arbitrary” I mean “able to learn ‘anything’” as opposed to “able to learn everything arbitrarily fast/well.” (i.e. instead of systems tailored to learn specific things like we have today, a system that can look at the domains that it might want to learn, choose which of those domains are most strategically relevant, and then learn whichever ones seem highest priority)
(The thing clearly needs to be better than a chimp at general-purpose learning. It’s not obvious to me whether it needs any particular equivalent IQ for this to start changing the nature of technological progress, but it probably needs to be at least equivalent to IQ 80, and maybe IQ 100, in at least some domains before it transitions from ‘cute science fair project’ to ‘industry-relevant’.)