# Christiano, Cotra, and Yudkowsky on AI progress

This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky on AGI forecasting, following up on Paul and Eliezer’s “Takeoff Speeds” discussion.

# 9. September 21 conversation

## 9.13. GPT-n and small architectural innovations vs. large ones

• A bunch of this was frustrating to read because it seemed like Paul was yelling “we should model continuous changes!” and Eliezer was yelling “we should model discrete events!” and these were treated as counter-arguments to each other.

It seems obvious from having read about dynamical systems that continuous models still have discrete phase changes. E.g. consider boiling water. As you put in energy the temperature increases until it gets to the boiling point, at which point more energy put in doesn’t increase the temperature further (for a while), it converts more of the water to steam; after all the water is converted to steam, more energy put in increases the temperature further.

So there are discrete transitions from (a) energy put in increases water temperature to (b) energy put in converts water to steam to (c) energy put in increases steam temperature.

In the case of AI improving AI vs. humans improving AI, a simple model to make would be one where AI quality is modeled as a variable, $q$, with the following dynamical equation:

$$\frac{dq}{dt} = (1 + Rq)\,h$$

where $h$ is the speed at which humans improve AI and $R$ is a recursive self-improvement efficiency factor. The curve transitions from a line at early times (where $Rq \ll 1$) to an exponential at later times (where $Rq \gg 1$). It could be approximated as a piecewise function with a linear part followed by an exponential part, which is a more-discrete approximation than the original function, which has a continuous transition between linear and exponential.
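A minimal numerical sketch of the toy model above, assuming it has the form dq/dt = (1 + R·q)·h with q(0) = 0. The exact form, the symbol names, and the parameter values are illustrative assumptions, not anything fitted to data. Under that assumption the closed-form solution is q(t) = (exp(R·h·t) − 1)/R, and the two pieces of the piecewise approximation can be compared against it directly:

```python
import math

def quality(t, h=1.0, R=0.5):
    """Closed-form solution of dq/dt = (1 + R*q) * h with q(0) = 0."""
    return math.expm1(R * h * t) / R

def linear_regime(t, h=1.0):
    """Early-time approximation (R*q << 1): humans do nearly all the work, so q ~ h*t."""
    return h * t

def exponential_regime(t, h=1.0, R=0.5):
    """Late-time approximation (R*q >> 1): self-improvement dominates, so q ~ exp(R*h*t)/R."""
    return math.exp(R * h * t) / R

def crossover_time(h=1.0, R=0.5):
    """Time at which the AI-driven term R*q*h equals the human-driven term h."""
    # R*q(t) = exp(R*h*t) - 1 = 1  =>  t = ln(2) / (R*h)
    return math.log(2) / (R * h)

if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the two regimes visible.
    for t in (0.01, 0.1, 1.0, 5.0, 20.0):
        print(t, quality(t), linear_regime(t), exponential_regime(t))
```

With these made-up parameters the handoff happens at t = ln 2/(R·h) ≈ 1.39. Well before that the linear piece tracks the exact curve closely, and well after it the exponential piece does.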

This is nowhere near an adequate model of AI progress, but it’s the sort of model that would be created in the course of a mathematically competent discourse on this subject on the way to creating an adequate model.

Dynamical systems contains many beautiful and useful concepts like basins of attraction which make sense of discrete and continuous phenomena simultaneously (i.e. there are a discrete number of basins of attraction which points fall into based on their continuous properties).
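To make the basin-of-attraction idea concrete, here is a small sketch using the standard textbook system dx/dt = x − x³. It is chosen purely for illustration and is not a model of AI progress. It has stable fixed points at x = −1 and x = +1, separated by an unstable one at x = 0, so a continuous initial condition gets sorted into one of two discrete outcomes:

```python
def settle(x0, dt=0.01, steps=5000):
    """Integrate dx/dt = x - x**3 with forward Euler and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x

def basin(x0):
    """Label which stable fixed point (-1 or +1) the trajectory from x0 approaches."""
    # x0 == 0 sits exactly on the unstable boundary and never leaves it;
    # this sketch arbitrarily reports -1 for that single point.
    return 1 if settle(x0) > 0 else -1

if __name__ == "__main__":
    for x0 in (-2.0, -0.3, -0.01, 0.01, 0.3, 2.0):
        print(x0, basin(x0), round(settle(x0), 6))
```

Nearby starting points such as −0.01 and 0.01 end up at different attractors, even though every trajectory is itself continuous in time.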

I’ve found Strogatz’s book, Nonlinear Dynamics and Chaos, helpful for explaining the basics of dynamical systems.

• I don’t really feel like anything you are saying undermines my position here, or defends the part of Eliezer’s picture I’m objecting to.

(ETA: but I agree with you that it’s the right kind of model to be talking about and is good to bring up explicitly in discussion. I think my failure to do so is mostly a failure of communication.)

I usually think about models that show the same kind of phase transition you discuss, though usually significantly more sophisticated models and moving from exponential to hyperbolic growth (you only get an exponential in your model because of the specific and somewhat implausible functional form for technology in your equation).

With humans alone I expect efficiency to double roughly every year based on the empirical returns curves, though it depends a lot on the trajectory of investment over the coming years. I’ve spent a long time thinking and talking with people about these issues.

At the point when the work is largely done by AI, I expect progress to be maybe 2x faster, so doubling every 6 months. And then from there I expect a roughly hyperbolic trajectory over successive doublings.
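As a rough illustration of what "hyperbolic over successive doublings" implies arithmetically: if growth follows f' ∝ f^(1+α), the time needed to double from level F scales like F^(−α), so each doubling takes 2^(−α) times as long as the one before. The sketch below assumes a first doubling of 6 months (from the comment above) and α = 1 (an assumption made here only for illustration; no exponent is given in the comment):

```python
def doubling_times(first_months=6.0, alpha=1.0, n=10):
    """Durations of n successive doublings when growth follows f' ~ f**(1 + alpha)."""
    ratio = 2.0 ** (-alpha)  # each doubling is shorter than the last by this factor
    return [first_months * ratio**k for k in range(n)]

def time_to_singularity(first_months=6.0, alpha=1.0):
    """Sum of the geometric series: total time taken by infinitely many doublings."""
    return first_months / (1.0 - 2.0 ** (-alpha))

if __name__ == "__main__":
    print(doubling_times(n=5))
    print(time_to_singularity())
    print(time_to_singularity(alpha=0.5))
```

Under these assumptions the doublings take 6, 3, 1.5, ... months and the whole idealized sequence completes in 12 months. A smaller α stretches that out, and any real process would presumably leave the hyperbolic regime before the mathematical limit.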

If takeoff is fast I still expect it to most likely be through a similar situation, where e.g. total human investment in AI R&D never grows above 1% and so at the time when takeoff occurs the AI companies are still only 1% of the economy.

• Excuse my ignorance, what does a hyperbolic function look like? If an exponential is f(x) = r^x, what is f(x) for a hyperbolic function?

• Finally a definition of The Singularity that actually involves a mathematical singularity! Thank you.

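• $f(x) = \frac{1}{1-x}$. It’s the solution to the differential equation $f'(x) = f(x)^2$ instead of $f'(x) = f(x)$. I usually use it more broadly for $f(x) = \frac{1}{(1-x)^{1/\alpha}}$, which is the solution to $f'(x) = \frac{1}{\alpha} f(x)^{1+\alpha}$.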
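For readers who want the intermediate step, here is a short worked derivation, taking "hyperbolic" to mean the solution of $f'(x) = f(x)^2$ (the form used elsewhere in this thread) and normalizing $f(0) = 1$, which is a choice made here for convenience. Separating variables:

$$
\begin{aligned}
f'(x) = f(x)^2 \;&\Rightarrow\; \int \frac{df}{f^2} = \int dx \;\Rightarrow\; -\frac{1}{f} = x - c \;\Rightarrow\; f(x) = \frac{1}{c - x}, \\
f(0) = 1 \;&\Rightarrow\; c = 1 \;\Rightarrow\; f(x) = \frac{1}{1 - x}, \\
f'(x) = f(x) \;&\Rightarrow\; f(x) = f(0)\,e^{x} = e^{x}.
\end{aligned}
$$

The exponential is finite for every $x$, whereas the hyperbolic solution diverges as $x \to 1$: that finite-time blow-up is the mathematical singularity.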

• Why do you use this form? Do you lean more on:
1. Historical trends that look hyperbolic;
2. Specific dynamical models like: let α be the synergy between “different innovations” as they’re producing more innovations; this gives f’(x) = f(x)^(1+α) *; or another such model?;
3. Something else?

I wonder if there’s a Paul-Eliezer crux here about plausible functional forms. For example, if Eliezer thinks that there’s very likely also a tech tree of innovations that change the synergy factor α, we get something like e.g. (a lower bound of) f’(x) = f(x)^f(x). IDK if there’s any help from specific forms; just that, it’s plausible that there’s forms that are (1) pretty simple, pretty straightforward lower bounds from simple (not necessarily high confidence) considerations of the dynamics of intelligence, and (2) look pretty similar to hyperbolic growth, until they don’t, and the transition happens quickly. Though maybe, if Eliezer thinks any of this and also thinks that these superhyperbolic synergy dynamics are already going on, and we instead use a stochastic differential equation, there should be something more to say about variance or something pre-End-times.

*ETA: for example, if every innovation combines with every other existing innovation to give one unit of progress per time, we get the hyperbolic f’(x) = f(x)^2; if innovations each give one progress per time but don’t combine, we get the exponential f’(x) = f(x).

• I think there are two easy ways to get hyperbolic growth:

• As long as there is free energy in the environment, without any technological change you can grow like $f'(x) = f(x)$. Then if there is any technological progress that can be driven by your expanding physical civilization, then you get $f'(x) = f(x)^{1+\alpha}$, where $\alpha$ depends on how fast the returns to technology diminish.

• Even without physical growth, if you have sufficiently good returns to technology (as we observe for historical technologies, if you treat doubling food as doubling output, or for modern information technology) then you end up with a similar functional form.

That would feel more like “plausible guess” if we didn’t have any historical data, but given that historical growth has in fact accelerated a huge amount it seems like a solid best guess to me. There’s been a bunch of debate about whether the historical data implies something kind of like this kind of functional form, or merely implies some kind of dramatic acceleration and is consistent with this functional form. But either way, it seems like the good bet is further dramatic acceleration if we either start returning energy capture to output (via AI) or start getting overall technological progress that is similar to existing rates of progress in computer hardware and software (via AI).

• (I’m interested in which of my claims seem to dismiss or not adequately account for the possibility that continuous systems have phase changes.)

• This section seemed like an instance of you and Eliezer talking past each other in a way that wasn’t locating a mathematical model containing the features you both believed were important (e.g. things could go “whoosh” while still being continuous):

[Christiano][13:46]

Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress

[Yudkowsky][13:46]

unfortunately a lot of what you are saying, from my perspective, has the flavor of, “but can’t you tell me about your predictions earlier on of the impact on global warming at the Homo erectus level”

you have stories about why this is like totally not a fair comparison

I do not share these stories

[Christiano][13:46]

I don’t understand either your objection nor the reductio

like, here’s how I think it works: AI systems improve gradually, including on metrics like “How long does it take them to do task X?” or “How high-quality is their output on task X?”

[Yudkowsky][13:47]

I feel like the thing we know is something like, there is a sufficiently high level where things go whooosh humans-from-hominids style

[Christiano][13:47]

We can measure the performance of AI on tasks like “Make further AI progress, without human input”

Any way I can slice the analogy, it looks like AI will get continuously better at that task

• My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead.

This claim is true in your model. It also seems true to me about hominids; that is, I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees.
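A quick check of why the claim holds in the toy model, assuming that model has the form $\frac{dq}{dt} = (1 + Rq)\,h$ (the symbols and the exact form are assumptions made here for illustration). Split the growth rate into a human-driven part $h$ and an AI-driven part $Rqh$, and define each part's timescale as the time it would take, acting alone at its current rate, to add another $q$:

$$
\tau_{\text{human}} = \frac{q}{h}, \qquad \tau_{\text{AI}} = \frac{q}{Rqh} = \frac{1}{Rh}.
$$

AI "takes over" when the two contributions are equal, $Rqh = h$, i.e. at $q^* = 1/R$. At that point

$$
\tau_{\text{human}}(q^*) = \frac{1/R}{h} = \frac{1}{Rh} = \tau_{\text{AI}},
$$

so in this model the handoff happens exactly when the two timescales coincide, and there is no jump in the rate of progress at the transition.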

I thought Eliezer’s comment “there is a sufficiently high level where things go whooosh humans-from-hominids style” was missing the point. I think it might have been good to offer some quantitative models at that point though I haven’t had much luck with that.

I can totally grant there are possible models for why the AI moves quickly from “much slower than humans” to “much faster than humans,” but I wanted to get some model from Eliezer to see what he had in mind.

(I find fast takeoff from various frictions more plausible, so that the question mostly becomes one about how close we are to various kinds of efficient frontiers, and where we respectively predict civilization to be adequate/​inadequate or progress to be predictable/​jumpy.)

• It seems to me that Eliezer’s model of AGI is a bit like an engine, where if any important part is missing, the entire engine doesn’t move. You can move a broken steam locomotive as fast as you can push it, maybe 1km/h. The moment you insert the missing part, the steam locomotive accelerates up to 100km/h. Paul is asking “when does the locomotive move at 20km/h” and Eliezer says “when the locomotive is already at full steam and accelerating to 100km/h.” There’s no point where the locomotive is moving at 20km/h and not accelerating, because humans can’t push it that fast, and once the engine is working, it’s already accelerating to a much faster speed.

In Paul’s model, there IS such a thing as 95% AGI, and it’s 80% or 20% or 2% as powerful on some metric we can measure, whereas in Eliezer’s model there’s no such thing as 95% AGI. The 95% AGI is like a steam engine that’s missing its pistons, or some critical valve, and so it doesn’t provide any motive power at all. It can move as fast as humans can push it, but it doesn’t provide any power of its own.

• And then Paul’s response to Eliezer is like “but engines don’t just appear without precedent, there’s worse partial versions of them beforehand, much more so if people are actually trying to do locomotion; so even if knocking out a piece of the AI that FOOMs would make it FOOM much slower, that doesn’t tell us much about the lead-up to FOOM, and doesn’t tell us that the design considerations that go into the FOOMer are particularly discontinuous with previously explored design considerations”?

• Right, and history sides with Paul. The earliest steam engines were missing key insights and so operated slowly, used their energy very inefficiently, and were limited in what they could do. The first steam engines were used as pumps, and it took a while before they were powerful enough to even move their own weight (locomotion). Each progressive invention, from Savery to Newcomen to Watt, dramatically improved the efficiency of the engine, and over time engines could do more and more things, from pumping to locomotion to machining to flight. It wasn’t just one sudden innovation and now we have an engine that can do all the things including even lifting itself against the pull of Earth’s gravity. It took time, and progress on smooth metrics, before we had extremely powerful and useful engines that powered the industrial revolution. That’s why the industrial revolution(s) took hundreds of years. It wasn’t one sudden insight that made it all click.

• My main concern is that progress on the frontier tends to be bursty.

There are many metrics of AI performance on particular tasks where performance does indeed increase fairly continuously on the larger scale, but not in detail. Over the scale of many years it goes from abysmal to terrible to merely bad to nearly human to worse than human in some ways but better than human in others, and then to superhuman. Each of these transitions is often a sharp jump, but you see steady progress if you plot it on a graph. When you combine with having thousands of types of tasks, you end up with an overview of even smoother progress over the whole field.
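The aggregation effect described above can be sketched with a deliberately crude simulation. The setup is hypothetical: each task's score jumps from 0 to 1 in a single, uniformly random year (a simplification of the multi-stage transitions in the comment), and the field-wide metric is the plain average over tasks:

```python
import random

def simulate(num_tasks=1000, years=20, seed=0):
    """Return (one task's yearly scores, field-wide average yearly scores)."""
    rng = random.Random(seed)
    # Hypothetical: every task has exactly one breakthrough year in 1..years.
    jump_year = [rng.randrange(1, years + 1) for _ in range(num_tasks)]
    single = [1.0 if y >= jump_year[0] else 0.0 for y in range(years + 1)]
    average = [sum(y >= j for j in jump_year) / num_tasks for y in range(years + 1)]
    return single, average

def largest_step(series):
    """Largest year-over-year increase in a series."""
    return max(b - a for a, b in zip(series, series[1:]))

if __name__ == "__main__":
    single, average = simulate()
    print("largest yearly jump, one task:    ", largest_step(single))
    print("largest yearly jump, field average:", largest_step(average))
```

Any individual task moves entirely in one step, while the average moves by only a few percent per year. The worry in the comment is about cases where the quantity that matters behaves like a single task rather than like the average.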

There are three problems I’m worried about.

The first is that “designing better AIs” may turn out to be a relatively narrow task, and subject to a lot more burstiness than broad spectrum performance that could steadily increase world GDP.

The second is that for purposes of the future of humanity, only the last step from human-adjacent to strictly superhuman really matters. On the scale of intelligence for all the beings we know about, chimpanzees are very nearly human, but the economic effect of chimpanzees is essentially zero.

The third is that we are nowhere near fully exploiting the hardware we have for AI, and I expect that to continue for quite a while.

I think any two of these three are enough for a fast takeoff with little warning.

• +1 on using dynamical systems models to try to formalize the frameworks in this debate. I also give Eliezer points for trying to do something similar in Intelligence Explosion Microeconomics (and to people who have looked at this from the macro perspective).

• I feel like the biggest subjective thing is that I don’t feel like there is a “core of generality” that GPT-3 is missing

I just expect it to gracefully glide up to a human-level foom-ing intelligence

This is a place where I suspect we have a large difference of underlying models. What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers? Particularly if you have an answer to anything that sounds like it’s in the style of Gwern’s questions, because I think those are the things that actually matter and which are hard to predict from trendlines and which ought to depend on somebody’s model of “what kind of generality makes it into GPT-3’s successors”.

• If you give me 1 or 10 examples of surface capabilities I’m happy to opine. If you want me to name industries or benchmarks, I’m happy to opine on rates of progress. I don’t like the game where you say “Hey, say some stuff. I’m not going to predict anything and I probably won’t engage quantitatively with it since I don’t think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3.”

I don’t even know which of Gwern’s questions you think are interesting/meaningful. “Good meta-learning”—I don’t know what this means but if you actually ask a real question I can guess. Qualitative descriptions—what is even a qualitative description of GPT-3? “Causality”—I think that’s not very meaningful and will be used to describe quantitative improvements at some level made up by the speaker. The spikes in capabilities Gwern talks about seem to be basically measurement artifacts, but if you want to describe a particular measurement I can tell you whether it will have similar artifacts. (How much economic value I can talk about, but you don’t seem interested.)

• Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it’s over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day. The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, “Oh, no, it definitely couldn’t start in 2022” and then I say “Starting in 2022 would not surprise me” by way of making an antiprediction that contradicts them. It may sound bold and startling to them, but from my own perspective I’m just expressing my ignorance. That’s one reason why I keep saying, if you think the world more orderly than that, why not opine on it yourself to get the Bayes points for it—why wait for me to ask you?

If you ask me to extend out a rare tendril of guessing, I might guess, for example, that it seems to me that GPT-3’s current text prediction-hence-production capabilities are sufficiently good that it seems like somewhere inside GPT-3 must be represented a level of understanding which seems like it should also suffice to, for example, translate Chinese to English or vice-versa in a way that comes out sounding like a native speaker, and being recognized as basically faithful to the original meaning. We haven’t figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we’ve already applied using the right loss functions.

So there’s a qualitative guess at a surface capability we might see soon—but when is “soon”? I don’t know; history suggests that even what predictably happens later is extremely hard to time. There are subpredictions of the Yudkowskian imagery that you could extract from here, including such minor and perhaps-wrong but still suggestive implications like, “170B weights is probably enough for this first amazing translator, rather than it being a matter of somebody deciding to expend 1.7T (non-MoE) weights, once they figure out the underlying setup and how to apply the gradient descent” and “the architecture can potentially look like somebody Stacked More Layers and like it didn’t need key architectural changes like Yudkowsky suspects may be needed to go beyond GPT-3 in other ways” and “once things are sufficiently well understood, it will look clear in retrospect that we could’ve gotten this translation ability in 2020 if we’d spent compute the right way”.

It is, alas, nowhere written in this prophecy that we must see even more un-Paul-ish phenomena, like translation capabilities taking a sudden jump without intermediates. Nothing rules out a long wandering road to the destination of good translation in which people figure out lots of little things before they figure out a big thing, maybe to the point of nobody figuring out until 20 years later the simple trick that would’ve gotten it done in 2020, a la ReLUs vs sigmoids. Nor can I say that such a thing will happen in 2022 or 2025, because I don’t know how long it takes to figure out how to do what you clearly ought to be able to do.

I invite you to express a different take on machine translation; if it is narrower, more quantitative, more falsifiable, and doesn’t achieve this just by narrowing its focus to metrics whose connection to the further real-world consequences is itself unclear, and then it comes true, you don’t need to have explicitly bet against me to have gained more virtue points.

• I’m mostly not looking for virtue points, I’m looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some feedback to help snap you out of it.

I don’t think it’s surprising if a GPT-3 sized model can do relatively good translation. If we’re talking about this prediction, and if you aren’t happy just predicting numbers for overall value added from machine translation, I’d kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.

• It seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn’t seem like you’ll be able to find (ii) by looking at predictions for the near future.

• It seems to me like Eliezer rejects a lot of important heuristics like “things change slowly” and “most innovations aren’t big deals” and so on. One reason he may do that is because he literally doesn’t know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he’d see that actual gradualists are much better predictors than he imagines.

• That seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I’d guess that he thinks that e.g. recursive self improvement is one of those things where these heuristics don’t apply, and that this is foreseeable because of e.g. the nature of recursion. I’d love to hear more about what sort of knowledge about “operating these heuristics” you think he’s missing!

Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be “shaken out” of his fast-takeoff view due to successful future predictions (until it’s too late).

• He says things like AlphaGo or GPT-3 being really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight.

I agree that after shaking out the other disagreements, we could just end up with Eliezer saying “yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we’ve applied AI” (or “AI improving AI will be fundamentally unlike automating humans improving AI”) but I don’t think that’s the core of his position right now.

• I agree we seem to have some kind of deeper disagreement here.

I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn’t use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever.

I think these won’t get to human level in the next 5 years. We’ll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren’t currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won’t just say “the Future is hard to predict.” (Though separately I expect to make somewhat better predictions than you in most of these domains.)

A plausible example is that I think it’s pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) “on track” to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I’d guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that’s 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticality-flavored model of takeoff.

• How much time do you see between “1 AI clearly on track to Foom” and “First AI to actually Foom”? My weak guess is Eliezer would say “Probably quite little time”, but your model of the world requires the GWP to double over a 4 year period, and I’m guessing that period probably starts later than 2026.

I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.

• I think “on track to foom” is a very long way before “actually fooms.”

• and some of my sense here is that if Paul offered a portfolio bet of this kind, I might not take it myself, but EAs who were better at noticing their own surprise might say, “Wait, that’s how unpredictable Paul thinks the world is?”

If Eliezer endorses this on reflection, that would seem to suggest that Paul actually has good models about how often trend breaks happen, and that the problem-by-Eliezer’s-lights is relatively more about, either:

• that Paul’s long-term predictions do not adequately take into account his good sense of short-term trend breaks.

• that Paul’s long-term predictions are actually fine and good, but that his communication about it is somehow misleading to EAs.

That would be a very different kind of disagreement than I thought this was about. (Though actually kind-of consistent with the way that Eliezer previously didn’t quite diss Paul’s track-record, but instead dissed “the sort of person who is taken in by this essay [is the same sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2]”?)

Also, none of this erases the value of putting forward the predictions mentioned in the original quote, since that would then be a good method of communicating Paul’s (supposedly miscommunicated) views.

• Apologies for my ignorance, does EA mean Effective Altruist?

• Yup. Both Effective Altruism and Effective Altruist are abbreviated as EA.

• superforecasters were claiming that AlphaGo had a 20% chance of beating Lee Se-dol and I didn’t disagree with that at the time

Good Judgment Open had the probability at 65% on March 8th 2016, with a generally stable forecast since early February (Wikipedia says that the first match was on March 9th).

Metaculus had the probability at 64% with similar stability over time. Of course, there might be another source that Eliezer is referring to, but for now I think it’s right to flag this statement as false.
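One hedged way to quantify how much a single resolved question separates two forecasts is the likelihood ratio, or equivalently the difference in log score. The sketch below plugs in the probabilities quoted in this thread (20%, 65%, 64%) and the fact that AlphaGo won the match; the function names are made up here, and a single question is of course very weak evidence about anyone's overall calibration:

```python
import math

def surprise_bits(prob_of_what_happened):
    """Log score in bits: how surprised a forecaster was by the actual outcome."""
    return -math.log2(prob_of_what_happened)

def likelihood_ratio(p_a, p_b):
    """How many times more probability forecaster A put on the outcome than B did."""
    return p_a / p_b

if __name__ == "__main__":
    for label, p in (("20% forecast", 0.20), ("GJ Open 65%", 0.65), ("Metaculus 64%", 0.64)):
        print(label, round(surprise_bits(p), 2), "bits")
    print("65% vs 20%:", likelihood_ratio(0.65, 0.20))
```

On these numbers the 65% forecast put a bit more than three times as much probability on the actual outcome as a 20% forecast would have.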

• A note I want to add, if this fact-check ends up being valid:

It appears that a significant fraction of Eliezer’s argument relies on AlphaGo being surprising. But then his evidence for it being surprising seems to rest substantially on something that was misremembered. That seems important if true.

I would point to, for example, this quote, “I mean the superforecasters did already suck once in my observation, which was AlphaGo, but I did not bet against them there, I bet with them and then updated afterwards.” It seems like the lesson here, if indeed superforecasters got AlphaGo right and Eliezer got it wrong, is that we should update a little bit towards superforecasting, and against Eliezer.

• Adding my recollection of that period: some people made the relevant updates when DeepMind’s system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going “Oh huh, I think this is going to happen” and then when AlphaGo beat Lee Sedol (in March 2016) everyone said “Now it is happening”.

• It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).

It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won’t happen imminently with high probability.

Perhaps a better source of evidence of AlphaGo’s surprisingness comes from Nick Bostrom’s 2014 book Superintelligence in which he says, “Go-playing amateur programs have been improving at a rate of about 1 level dan/​year in recent years. If this rate of improvement continues, they might beat the human world champion in about a decade.” (Chapter 1).

This vindicates AlphaGo being an impressive discontinuity from pre-2015 progress. Though one can reasonably dispute whether superforecasters thought that the milestone was still far away after being told that Google and Facebook made big investments into it (as was the case in late 2015).

• Wow thanks for pulling that up. I’ve gotta say, having records of people’s predictions is pretty sweet. Similarly, solid find on the Bostrom quote.

Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]

• My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was.

Neither GJO nor Metaculus are restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%. Here’s an example of one such, which I have a potentially false memory of having maybe read at the time: https://www.gjopen.com/comments/118530

• Thanks for clarifying. That makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/​is a way to restrict to “superforecasters” in the UI).

I will add the following piece of evidence, which I don’t think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question, a relatively high fraction of people put a 20% chance, but it also looks like over 80% of forecasters put higher credences.

• Some thinking-out-loud on how I’d go about looking for testable/​bettable prediction differences here...

I think my models overlap mostly with Eliezer’s in the relevant places, so I’ll use my own models as a proxy for his, and think about how to find testable/​bettable predictions with Paul (or Ajeya, or someone else in their cluster).

One historical example immediately springs to mind where something-I’d-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve. The original Phillips curve was based on just fitting a curve to inflation-vs-unemployment data; Friedman and Phelps both independently came up with theoretical models for that relationship in the late sixties (’67-’68), and Friedman correctly forecasted that the curve would break down in the next recession (i.e. the “stagflation” of ’73-’75). This all led up to the Lucas Critique, which I’d consider the canonical case-against-what-I’d-call-Paul-esque-worldviews within economics. The main idea which seems transportable to other contexts is that surface relations (like the Phillips curve) break down under distribution shifts in the underlying factors.

So, how would I look for something analogous to that situation in today’s AI? We need something with an established trend, but where a distribution shift happens in some underlying factor. One possible place to look: I’ve heard that OpenAI plans to make the next generation of GPT not actually much bigger than the previous generation; they’re trying to achieve improvement through strategies other than Stack More Layers. Assuming that’s true, it seems like a naive Paul-esque model would predict that the next GPT is relatively unimpressive compared to e.g. the GPT-2 → GPT-3 delta? Whereas my models (or I’d guess Eliezer’s models) would predict that it’s relatively more impressive, compared to the expectations of Paul-esque models (derived by e.g. extrapolating previous performance as a function of model size and then plugging in actual size of the next GPT)? I wouldn’t expect either view to make crisp high-certainty predictions here, but enough to get decent Bayesian evidence.

Other than distribution shifts, the other major place I’d look for different predictions is in the extent to which aggregates tell us useful things. The post got into that in a little detail, but I think there’s probably still room there. For instance, I recently sat down and played with some toy examples of GDP growth induced by tech shifts, and I was surprised by how smooth GDP was even in scenarios with tech shifts which seemed very impactful to me. I expect that Paul would be even more surprised by this if he were to do the same exercise. In particular, this quote seems relevant:

the point is that housing and healthcare are not central examples of things that scale up at the beginning of explosive growth, regardless of whether it’s hard or soft

It is surprisingly difficult to come up with a scenario where GDP growth looks smooth AND housing+healthcare don’t grow much AND GDP growth accelerates to a rate much faster than now. If everything except housing and healthcare is getting cheaper, then housing and healthcare will likely play a much larger role in GDP (and together they’re 30-35% already), eventually dominating GDP. This isn’t a logical necessity; in principle we could consume so much more of everything else that the housing+healthcare share shrinks, but I think that would probably diverge from past trends (though I have not checked). What I actually expect is that as people get richer, they spend a larger fraction on things which have a high capacity to absorb marginal income, of which housing and healthcare are central examples.

If housing and healthcare aren’t getting cheaper, and we’re not spending a smaller fraction of income on them (by buying way way more of the things which are getting cheaper), then that puts a pretty stiff cap on how much GDP can grow.

Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps. I think Paul would probably update toward that view if he spent some time playing around with examples (similar to this post).

Similarly, I’ve heard that during training of GPT-3, while aggregate performance improves smoothly, performance on any particular task (like e.g. addition) is usually pretty binary—i.e. performance on any particular task tends to jump quickly from near-zero to near-maximum-level. Assuming this is true, presumably Paul already knows about it, and would argue that what matters-for-impact is ability at lots of different tasks rather than one (or a few) particular tasks/​kinds-of-tasks? If so, that opens up a different line of debate, about the extent to which individual humans’ success today hinges on lots of different skills vs a few, and in which areas.
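A toy Python sketch of how this could happen: each task's accuracy is modeled as a steep logistic in (normalized) log-compute, with thresholds spread out across tasks, and the aggregate is the mean over tasks. Everything here is an assumption made for illustration (the logistic form, the `sharpness` value, the evenly spaced `thresholds`); it is not a description of how GPT-3 was actually trained or evaluated.

```python
import math

def task_accuracy(log_compute, threshold, sharpness=200.0):
    """Steep logistic: near 0 below the task's threshold, near 1 above it."""
    return 1.0 / (1.0 + math.exp(-sharpness * (log_compute - threshold)))

def aggregate_accuracy(log_compute, thresholds):
    """Mean accuracy across all tasks at a given compute level."""
    return sum(task_accuracy(log_compute, t) for t in thresholds) / len(thresholds)

# Hypothetical setup: 100 tasks whose thresholds are spread evenly over [0, 0.99].
thresholds = [i / 100 for i in range(100)]
grid = [i / 50 for i in range(51)]   # normalized log-compute from 0 to 1

single = [task_accuracy(x, 0.5) for x in grid]
aggregate = [aggregate_accuracy(x, thresholds) for x in grid]

# Largest change between adjacent grid points, for one task vs. the aggregate.
max_single_jump = max(b - a for a, b in zip(single, single[1:]))
max_aggregate_jump = max(b - a for a, b in zip(aggregate, aggregate[1:]))
print(f"largest step for one task:  {max_single_jump:.3f}")
print(f"largest step for aggregate: {max_aggregate_jump:.3f}")
```

In this toy setup the single task moves by nearly half its full range in one grid step, while the aggregate never moves by more than a few percent. Whether real tasks have thresholds spread out like this is exactly the empirical question.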

• I don’t necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they’re not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level. They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they’re not doing that, an obvious guess is that it’s because they’re not getting a big win from that. As for their ability to then make algorithmic progress, depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won’t work, but maybe they’ve got enough people trying things to find some? But it’s hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.

• While GPT-4 wouldn’t be a lot bigger than GPT-3, Sam Altman did indicate that it’d use a lot more compute. That’s consistent with Stack More Layers still working; they might just have found an even better use for compute.

(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)

• If they’ve found some way to put a lot more compute into GPT-4 without making the model bigger, that’s a very different—and unnerving—development.

• I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.

Edit: changed to be less certain, pretty sure this follows from public comments by Sam, but he has not said this exactly

• Say more about the source for this claim? I’m pretty sure he didn’t say that during the Q&A I’m sourcing my info from. And my impression is that they’re doing something more than this, both on priors (scaling laws say that optimal compute usage means you shouldn’t train to convergence, so why would they start now?) and based on what he said during that Q&A.

• This is based on:

1. The Q&A you mention

2. GPT-3 not being trained on even one pass of its training dataset

3. “Use way more compute” achieving larger gains through longer training than through most architectural modifications at a fixed model size (while you’re correct that bigger model = faster training, you’re trading off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve in prod. Plus, we know it’s about the same size, from the Q&A)

4. Some experience with undertrained enormous language models underperforming relative to expectation

This is not to say that GPT-4 won’t have architectural changes. Sam mentioned a longer context at the least. But these sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.

• To be clear: Do you remember Sam Altman saying that “they’re simply training a GPT-3-variant for significantly longer”, or is that an inference from ~”it will use a lot more compute” and ~”it will not be much bigger”?

Because if you remember him saying that, then that contradicts my memory (and, uh, the notes that people took that I remember reading), and I’m confused.

While if it’s an inference: sure, that’s a non-crazy guess, and I take your point that smaller models are easier to deploy. I just want it to be flagged as a claimed deduction, not as a remembered statement.

(And I maintain my impression that something more is going on; especially since I remember Sam generally talking about how models might use more test-time compute in the future, and be able to think for longer on harder questions.)

• One way they could do that is by pitting the model against modified versions of itself, like they did in OpenAI Five (for Dota).

From the minimizing-X-risk perspective, it might be the worst possible way to train AIs.

As Jeff Clune (Uber AI) put it:

[O]ne can imagine that some ways of configuring AI-GAs (i.e. ways of incentivizing progress) that would make AI-GAs more likely to succeed in producing general AI also make their value systems more dangerous. For example, some researchers might try to replicate a basic principle of Darwinian evolution: that it is ‘red in tooth and claw.’

If a researcher tried to catalyze the creation of an AI-GA by creating conditions similar to those on Earth, the results might be similar. We might thus produce an AI with human vices, such as violence, hatred, jealousy, deception, cunning, or worse, simply because those attributes make an AI more likely to survive and succeed in a particular type of competitive simulated world. Note that one might create such an unsavory AI unintentionally by not realizing that the incentive structure they defined encourages such behavior.

Additionally, if you train a language model to outsmart millions of increasingly intelligent copies of itself, you might end up with the perfect AI-box escape artist.

• I was under the impression that GPT-4 would be gigantic, according to this quote from this Wired article:

“From talking to OpenAI, GPT-4 will be about 100 trillion parameters,” Feldman says. “That won’t be ready for several years.”

• Transcript error fixed—the line that previously read

should be

• since you disagree with them eventually, e.g. >2/​3 doom by 2030

This apparently refers to Yudkowsky’s credences, and I notice I am surprised — has Yudkowsky said this somewhere? (Edit: the answer is no, thanks for responses.)

• I think Ajeya is inferring this from Eliezer’s 2017 bet with Bryan Caplan. The bet was jokey and therefore (IMO) doesn’t deserve much weight, though Eliezer comments that it’s maybe not totally unrelated to timelines he’d reflectively endorse:

[T]he generator of this bet does not necessarily represent a strong epistemic stance on my part, which seems important to emphasize. But I suppose one might draw conclusions from the fact that, when I was humorously imagining what sort of benefit I could get from exploiting this amazing phenomenon, my System 1 thought that having the world not end before 2030 seemed like the most I could reasonably ask.

In general, my (maybe-partly-mistaken) Eliezer-model...

• thinks he knows very little about timelines (per the qualitative reasoning in There’s No Fire Alarm For AGI and in Nate’s recent post—though not necessarily endorsing Nate’s quantitative probabilities);

• and is wary of trying to turn ‘I don’t know’ into a solid, stable number for this kind of question (cf. When (Not) To Use Probabilities);

• but recognizes that his behavior at any given time, insofar as it is coherent, must reflect some implicit probabilities. Quoting Eliezer back in 2016:

[… T]imelines are the hardest part of AGI issues to forecast, by which I mean that if you ask me for a specific year, I throw up my hands and say “Not only do I not know, I make the much stronger statement that nobody else has good knowledge either.” Fermi said that positive-net-energy from nuclear power wouldn’t be possible for 50 years, two years before he oversaw the construction of the first pile of uranium bricks to go critical. The way these things work is that they look fifty years off to the slightly skeptical, and ten years later, they still look fifty years off, and then suddenly there’s a breakthrough and they look five years off, at which point they’re actually 2 to 20 years off.

If you hold a gun to my head and say “Infer your probability distribution from your own actions, you self-proclaimed Bayesian” then I think I seem to be planning for a time horizon between 8 and 40 years, but some of that because there’s very little I think I can do in less than 8 years, and, you know, if it takes longer than 40 years there’ll probably be some replanning to do anyway over that time period.

And then how *long* takeoff takes past that point is a separate issue, one that doesn’t correlate all that much to how long it took to start takeoff. [...]

• Furthermore, 2/3 doom is straightforwardly the wrong thing to infer from the 1:1 betting odds, even taking those at face value and even before taking interest rates into account; Bryan gave me $100 which gets returned as $200 later.

(I do consider this a noteworthy example of ‘People seem systematically to make the mistake in the direction that interprets Eliezer’s stuff as more weird and extreme’ because it’s a clear arithmetical error and because I saw a recorded transcript of it apparently passing the notice of several people I considered usually epistemically strong.)

(Though it’s also easier than people expect to just not notice things; I didn’t realize at the time that Ajeya was talking about a misinterpretation of the implied odds from the Caplan bet, and thought she was just guessing my own odds at 2/3, and I didn’t want to argue about that because I don’t think it valuable to the world or maybe even to myself to go about arguing those exact numbers.)
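For readers who want the arithmetic spelled out, here is a minimal Python sketch of the break-even probability implied by a bet, ignoring interest and inflation as the comment above does. It assumes the simplest reading of the bet: $100 is received now, and $200 is paid back later only if the world has not ended. The function name is hypothetical, and the "misread" case is one guess at how a 2:1 figure could arise (treating the full $200 as the net loss), not a reconstruction of anyone's actual reasoning.

```python
def break_even_probability(gain_if_event, loss_if_no_event):
    """Probability p of the event at which the bet has zero expected value.

    Solves p * gain_if_event = (1 - p) * loss_if_no_event for p.
    """
    return loss_if_no_event / (gain_if_event + loss_if_no_event)

# Receive $100 now; pay $200 later if the world is still here.
# Net result: +$100 if the world ends, -$100 (i.e. 100 - 200) if it does not.
actual = break_even_probability(gain_if_event=100.0, loss_if_no_event=100.0)

# Misreading: treating the full $200 as the net loss, i.e. 2:1 odds.
misread = break_even_probability(gain_if_event=100.0, loss_if_no_event=200.0)

print(f"1:1 odds imply a break-even probability of {actual:.3f}")    # 0.500
print(f"2:1 odds would imply a break-even probability of {misread:.3f}")
```

So taking the bet at face value only suggests a probability of at least 1/2, and the 2/3 figure corresponds to odds the bet did not actually have.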

• Yes, Rob is right about the inference coming from the bet and Eliezer is right that the bet was actually 1:1 odds but due to the somewhat unusual bet format I misread it as 2:1 odds.

• Maybe I’m wrong about her deriving this from the Caplan bet? Ajeya hasn’t actually confirmed that, it was just an inference I drew. I’ll poke her to double-check.

• I think the bet is a bad idea if you think in terms of Many Worlds. Say 55% of all worlds end by 2030. Then, even assuming that value-of-$-in-2017 = value-of-$-in-2030, Eliezer personally benefited from the bet. However, the epistemic result is Bryan getting prestige points in 45% of worlds, Eliezer getting prestige points in 0% of worlds.

The other problem with the bet is that, if we adjust for inflation and returns of money, the bet is positive EV for Eliezer even given P(world-ends-by-2030) << .

• (ETA: this wasn’t actually in this log but in a future part of the discussion.)

I found the elephants part of this discussion surprising. It looks to me like human brains are better than elephant brains at most things, and it’s interesting to me that Eliezer thought otherwise. This is one of the main places where I couldn’t predict what he would say.

• I also think human brains are better than elephant brains at most things—what did I say that sounded otherwise?

• Oops, this was in reference to the later part of the discussion where you disagreed with “a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]”.

• why aren’t elephants GI?

As Herculano-Houzel called it, the human brain is a remarkable, yet not extraordinary, scaled-up primate brain. It seems that our main advantage in hardware is quantitative: more cortical columns to process more reference frames to predict more stuff.

And the primate brain is mostly the same as that of other mammals (which shouldn’t be surprising, as the source code is mostly the same).

And the intelligence of mammals seems to be rather general. It allows them to solve a highly diverse set of cognitive tasks, including the task of learning to navigate at Level 5 autonomy in novel environments (which is still too hard for the most general of our AIs).

One may ask: why aren’t elephants making rockets and computers yet?

But one may ask the same question about any uncontacted human tribe.

Thus, it seems to me that the “elephants are not GI” part of the argument is incorrect. Elephants (and also chimps, dolphins etc) seem to possess a rather general but computationally capped intelligence.

• Somebody tries to measure the human brain using instruments that can only detect numbers of neurons and energy expenditure, but not detect any difference of how the fine circuitry is wired; and concludes the human brain is remarkable only in its size and not in its algorithms. You see the problem here? The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute (namely: poorly), while measuring the number of neurons in a human brain tells you nothing about that at all.

• Jeff Hawkins provided a rather interesting argument on the topic:

The scaling of the human brain has happened too fast to implement any deep changes in how the circuitry works. The entire scaling process was mostly done by the favorite trick of biological evolution: copy and paste existing units (in this case—cortical columns).

Jeff argues that there is no change in the basic algorithm between earlier primates and humans. It’s the same reference-frames processing algo distributed across columns. The main difference is that humans have many more columns.

I’ve found his arguments convincing for two reasons:

• his neurobiological arguments are surprisingly good (to the point of being surprisingly obvious in hindsight)

• It’s the same “just add more layers” trick we reinvented in ML

The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute

Are we sure about the low intelligence of dinosaurs?

Judging by the living dinos (e.g. crows), they are able to pack a chimp-like intelligence into a 0.016 kg brain.

And some of the dinos had 60x more of it (e.g. the brain of Tyrannosaurus rex weighed about 1 kg, which is comparable to Homo erectus).

And some of the dinos have had a surprisingly large encephalization quotient, combined with bipedalism, gripping hands, forward-facing eyes, omnivorism, nest building, parental care, and living in groups (e.g. troodontids).

Maybe it was not an asteroid after all...

(Very unlikely, of course. But I find the idea rather amusing)

• One may ask: why aren’t elephants making rockets and computers yet?

But one may ask the same question about any uncontacted human tribe.

Seems more surprising for elephants, by default: elephants have apparently had similarly large brains for about 20 million years, which is far more time than uncontacted human tribes have had to build rockets. (~100x as long as anatomically modern humans have existed at all, for example.)

• I agree. Additionally, the life expectancy of elephants is significantly higher than that of paleolithic humans (1, 2). Thus, individual elephants have much more time to learn stuff.

In humans, technological progress is not a given. Across different populations, it seems to be determined by the local culture, and not by neurobiological differences. For example, the ancestors of Wernher von Braun left their technological local minimum thousands of years later than the Egyptians or Chinese did. And the ancestors of Sergei Korolev lived their primitive lives well into the 8th century C.E. If a Han dynasty scholar had visited the Germanic and Slavic tribes, he would’ve described them as hopeless barbarians, perhaps even as inherently predisposed to barbarism.

Maybe if we give elephants more time, they will overcome their biological limitations (limited speech, limited “hand”, fewer neurons in neocortex etc), and will escape the local minimum. But maybe not.

• I think Herculano-Houzel would want to mention that humans have 3x (iirc) more neurons in their cerebral cortex than even the elephant species with the biggest brains. Those elephants have more total neurons because their cerebellar cortices have like 200 billion neurons. Humans have more cortical neurons than any animal, including blue whales, because neuron sizes scale differently for different Orders and primates specifically scale well.

Crucially, people have thought human brains were special among primates but she makes the point that it’s the other great apes that are special in having smaller brains according to primate brain scaling laws. This is because humans either had a unique incentive to keep up with the costs of scaling or because they had a unique ability to keep up with the costs (due to e.g. cooking).

Having better algorithms that could take advantage of scale fits with her views, I think.

• I don’t know much about chess, so maybe this is wrong, but I would tend to think of Elo ratings as being more like a logarithmic scale of ability than like a linear scale of ability. In the sense that e.g. probability of winning changes exponentially with Elo difference, so a linear trend on an Elo graph translates to an exponential trend in competitiveness. “The chances of an AI solving the tasks better than a human are increasing exponentially” sounds more like fast takeoff than slow takeoff to me.
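The standard Elo expected-score formula makes this concrete. In the sketch below, `rating_diff` is the player's rating minus the opponent's; the function names are mine. One caveat worth flagging: it is the odds that grow exponentially with the rating difference, while the expected score itself (win probability, counting a draw as half a win) saturates toward 1.

```python
def elo_expected_score(rating_diff):
    """Expected score (wins plus half of draws) under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

def elo_odds(rating_diff):
    """Odds in favor of the higher-rated side: expected score / (1 - expected score)."""
    return 10.0 ** (rating_diff / 400.0)

for diff in (0, 200, 400, 800):
    print(f"diff={diff:4d}  expected score={elo_expected_score(diff):.3f}  "
          f"odds={elo_odds(diff):.1f}:1")
```

Every additional 400 Elo points multiplies the odds by 10, so a straight line on an Elo-vs-time graph corresponds to odds against a fixed opponent growing exponentially over time.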

• I think everyone in the discussion expects AI progress to be at least exponentially fast. See all of Paul’s mentions of hyperbolic growth, which is faster than an exponential.

The discussion is more about continuous vs discontinuous takeoff, or centralised vs decentralised takeoff. (The slow/​fast terminology isn’t great.)
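To illustrate the "hyperbolic is faster than exponential" point, here is a small Python sketch comparing the two. It assumes one common form of hyperbolic growth, dx/dt = rate · x², whose solution blows up at the finite time 1 / (rate · x0); the specific `x0` and `rate` values are arbitrary and carry no forecasting content.

```python
import math

def exponential(x0, rate, t):
    """Solution of dx/dt = rate * x."""
    return x0 * math.exp(rate * t)

def hyperbolic(x0, rate, t):
    """Solution of dx/dt = rate * x**2; diverges at t = 1 / (rate * x0)."""
    blowup_time = 1.0 / (rate * x0)
    if t >= blowup_time:
        return math.inf
    return x0 / (1.0 - rate * x0 * t)

x0, rate = 1.0, 0.1   # arbitrary illustrative values; blow-up time is 10
for t in (0.0, 5.0, 9.0, 9.9):
    print(f"t={t:4.1f}  exponential={exponential(x0, rate, t):8.3f}  "
          f"hyperbolic={hyperbolic(x0, rate, t):8.3f}")
```

Both curves start out looking similar, and both are continuous right up until the hyperbolic one diverges, which is part of why growth rate alone does not settle the continuous-vs-discontinuous question.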

• Eliezer should have taken Cotra up on that bet about “will someone train a 10T param model before end days” considering one already exists.

• Is that one dense or sparse/​MoE? How many data points was it trained for? Does it set SOTA on anything? (I’m skeptical; I’m wondering if they only trained it for a tiny amount, for example.)