This is a transcription of Eliezer Yudkowsky responding to Paul Christiano’s Takeoff Speeds live on Sep. 14, followed by a conversation between Eliezer and Paul. This discussion took place after Eliezer’s conversation with Richard Ngo.

# 6. Follow-ups on “Takeoff Speeds”

## 6.1. Eliezer Yudkowsky’s commentary

• I stand ready to bet with Eliezer on any topic related to AI, science, or technology. I’m happy for him to pick but I suggest some types of forecast below.

If Eliezer’s predictions were roughly as good as mine (in cases where we disagree), then I would update towards taking his views more seriously. Right now it looks to me like his view makes bad predictions about lots of everyday events.

It’s possible that we won’t be able to find cases where we disagree, and perhaps that Eliezer’s model totally agrees with mine until we develop AGI. But I think that’s unlikely for a few reasons:

• I constantly see observations that seem like evidence for Eliezer’s views (e.g. any time I see an ML paper with a surprisingly large effect size, or ML labs failing to make investments in scaling, or people being surprisingly unreasonable), it’s just that I see significantly more evidence against his views. The point of making bets in advance is that it can correct for my hindsight bias or for my inability to simulate “what Eliezer’s view would say about this.” Eliezer could also say that actually all of the observations I listed aren’t evidence for his view, which would be interesting to me.

• Eliezer frequently talks smack about how the real world is surprising to fools like Paul (e.g. he talks about the “sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2”). If that’s right, then it must correspond to differences in prediction. And if Eliezer literally can’t state where he expects to make better predictions than me other than AGI then I think people should mostly ignore the bluster and he should probably cut it out.

• Eliezer frequently acknowledges “sure, lines look straight in hindsight, but that’s not how they look at the time.” But to me it looks like lines are also (mostly) straight even with foresight. How could this not correspond to some difference in prediction? I’d be happy to use historical case studies instead of predictions, but Eliezer thinks you need to make them in advance, so I’m happy to just apply my straight-line-extrapolation methodology to arbitrary near-term forecasts. I think Eliezer would prefer that I somehow make predictions and evaluate them in absolute terms rather than by comparing to Eliezer’s predictions, but that’s not what’s at issue: I think my forecasts are more accurate than Eliezer’s, not that they meet some absolute bar of quality.

• When trying to define bets, I think we get stuck at the stage where Eliezer isn’t giving probability distributions over quantitative measures, not the stage where Eliezer gives them but they’re the same as mine. My tentative guess is that Eliezer can’t predict what-Paul-would-call-a-reasonable-forecast, rather than understanding what Paul would forecast but disagreeing with it. This is related to disagreements over how to interpret the past evidence. I’m less clear on whether I can simulate Eliezer forecasts.

Anyway, I think Eliezer should probably pick a domain where he thinks his model shines. But I’m going to propose some domains where I expect to find disagreements and where I expect to beat his model, just to help get the ball rolling:

• Performance on any ML benchmark in 1, 2, 5 years. Happy to propose examples (basically taking those from existing work) in theorem proving, standard NLP, mathematical reasoning, or coding.

• Performance on any interesting real-world tasks where we can readily define the task in 1, 2, 5 years. Happy to propose examples on e.g. translation, picking robots, self-driving cars.

• Signs of impact from various kinds of AI in 2, 5, 10 years, e.g. coding, marketing copy, industrial robotics, self-driving cars, translation, whatever.

• Progress in performance or adoption for non-AI technologies, e.g. energy (solar, fission, fusion, wind…), various parts of biotech or materials science, whatever.

• Total investment in AI research of various kinds, either in the industry overall or at particular labs.

• Total valuation of AI companies, hardware companies, or whatever.

• Sizes of improvements over SOTA from ML papers in various domains.

• Relative success of different ML approaches, e.g. importance of architectural changes vs transformers, how much gradient descent will play a role in future results, meta-learning vs fine-tuning…

• Specific claims about model sizes, training costs, the role of planning, etc. in high-profile results.

I’m happy to provide more specific operationalizations and questions in any of those domains, if there are any categories where Eliezer is up for actually forecasting.

The high-level patterns that I think will generate lots of moderate lower-level disagreements:

• I expect things to be significantly more incremental and “boring.” I put smaller probabilities on trend breaks and big jumps, and I have a strong sense for many kinds of metrics that move more regularly. I think Eliezer literally can’t tell how to translate this heuristic into predictions, which is part of why I think he is going to predictably make bad predictions.

• I think I have more understanding of modern AI in particular, so I expect to make better predictions for boring reasons for anything in that space.

• I generally expect a continuing ramp-up in AI investment and effort, and for that to lead to predictable changes as the field scales.

• I have a different picture of how AI will work where AGI is not special and so won’t affect any evaluations of tasks in the near future, leading to more “boring” claims about the hardness of different tasks (though not sure this will generate disagreements within 5 years).

We might try to use operationalizations like: “In how many of these 10 quantities is there a year with 4x more change than any previous year” (h/​t holden), or “How much of the economic value of AI comes from applications whose value has more than doubled in the last year?” or “For each of these pairs of capabilities, which will happen first if at least one happens in the next 5 years?” or so on. But even if we can’t find something clever, I feel like the differences in quantitative view are stark enough that we’re just going to disagree about a bunch of numbers.
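As a rough illustration of how the first operationalization above could be scored, here is a minimal Python sketch. It rests on assumptions that are not in the discussion: "change" is read as the absolute year-over-year difference, "4x more change" is read as at least 4 times the largest change seen in any earlier year, and the two example series are made up. Measuring change in log terms instead would be an equally defensible reading and could give different answers.

```python
from typing import Dict, List


def has_jump_year(levels: List[float], factor: float = 4.0) -> bool:
    """Return True if some year's absolute change is at least `factor`
    times the largest absolute change in any earlier year.

    Assumption: years whose earlier changes were all zero are skipped,
    because the ratio is undefined there.
    """
    changes = [abs(later - earlier) for earlier, later in zip(levels, levels[1:])]
    largest_so_far = 0.0
    for change in changes:
        if largest_so_far > 0 and change >= factor * largest_so_far:
            return True
        largest_so_far = max(largest_so_far, change)
    return False


def count_jumpy_quantities(series: Dict[str, List[float]], factor: float = 4.0) -> int:
    """Count how many tracked quantities contain at least one jump year."""
    return sum(has_jump_year(levels, factor) for levels in series.values())


# Hypothetical yearly values, for illustration only.
example = {
    "steady_doubling": [1, 2, 4, 8, 16],  # each change is 2x the previous one
    "sudden_break": [1, 2, 3, 4, 20],     # last change is 16x the earlier ones
}
jumpy = count_jumpy_quantities(example)
```

Under this reading a quantity that doubles every year never counts as a jump, so the two sides would be betting on how many of the ten agreed quantities land in the second category.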

I would prefer to state predictions and discuss rationales publicly, allow some informed folks to kibitz, and then revise based on people pointing out facts we don’t know, since I think that makes it cheaper to make forecasts and reduces the probability that the test is decided by specific facts rather than a general view.

• I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later.

From my perspective, the basic problem is that Eliezer’s story looks a lot like “business as usual until the world starts to end sharply”, and Paul’s story looks like “things continue smoothly until their smooth growth ends the world smoothly”, and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available. Another basic problem, as I’d see it, is that we tend to tell stories about very different subject matters—I care a lot less than Paul about the quantitative monetary amount invested into Intel, to the point of not really trying to develop expertise about that.

I claim that I came off better than Robin Hanson in our FOOM debate compared to the way that history went. I’d claim that my early judgments of the probable importance of AGI, at all, stood up generally better than early non-Yudkowskian EA talking about that. Other people I’ve noticed ever making correct bold predictions in this field include Demis Hassabis, for predicting that deep learning would work at all, and then for predicting that he could take the field of Go and taking it; and Dario Amodei, for predicting that brute-forcing stacking more layers would be able to produce GPT-2 and GPT-3 instead of just asymptoting and petering out. I think Paul doesn’t need to bet against me to start producing a track record like this; I think he can already start to accumulate reputation by saying what he thinks is bold and predictable about the next 5 years; and if it overlaps “things that interest Eliezer” enough for me to disagree with some of it, better yet.

I agree that it’s plausible that we both make the same predictions about the near future. I think we probably don’t, and there are plenty of disagreements about all kinds of stuff. But if in fact we agree, then in 5 years you shouldn’t say “and see how much the world looked like I said?”

It feels to me like it goes: you say AGI will look crazy. Then I say that sounds unlike the world of today. Then you say “no, the world actually always looks discontinuous in the ways I’m predicting and your model is constantly surprised by real stuff that happens, e.g. see transformers or AlphaGo” and then I say “OK, let’s bet about literally anything at all, you pick.”

I think it’s pretty likely that we actually do disagree about how much the world of today is boring and continuous, where my error theory is that you spend too much time reading papers and press releases that paint a misleading picture and just aren’t that familiar with what’s happening on the ground. So I expect if we stake out any random quantity we’ll disagree somewhat.

Most things just aren’t bold and predictable, they are modest disagreements. I’m not saying I have some deep secret about the world, just that you are wrong in this case.

• I feel a bit confused about where you think we meta-disagree here, meta-policy-wise. If you have a thesis about the sort of things I’m liable to disagree with you about, because you think you’re more familiar with the facts on the ground, can’t you write up Paul’s View of the Next Five Years and then if I disagree with it better yet, but if not, you still get to be right and collect Bayes points for the Next Five Years?

I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your typical EA; so if I wanted to expend the time/​stamina points, I could say a bunch of things I consider obvious and that contradict hot takes on Twitter and many EAs would go “whoa wait really” and then I could collect Bayes points later and have performed a public service, even if nobody showed up to disagree with me about that. (The reason I don’t actually do this… is that I tried; I keep trying to write a book about basic macro, only it’s the correct version explained correctly, and have a bunch of isolated chapters and unfinished drafts.) I’m also trying to write up my version of The Next Five Years assuming the world starts to end in 2025, since this is not excluded by my model; but writing in long-form requires stamina and I’ve been tired of late which is part of why I’ve been having Discord conversations instead.

I think you think there’s a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do, and so I should be able to state my own prediction about that topic and bet with you about that; or, alternatively, that I should retract some thing I said recently which implies that. And so, you shouldn’t need to have to do all the work to write up your forecasts generally, and it’s unfair that I’m trying to make you do all that work. Check? If so, I don’t yet see the derivation chain on this meta-level point.

I think the Hansonian viewpoint—which I consider another gradualist viewpoint, and whose effects were influential on early EA and which I think are still lingering around in EA—seemed surprised by AlphaGo and Alpha Zero, when you contrast its actual advance language with what actually happened. Inevitably, you can go back afterwards and claim it wasn’t really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprised then; and I also think that “there’s always a smooth abstraction in hindsight, so what, there’ll be one of those when the world ends too”, is a huge big deal in practice with respect to the future being unpredictable. From this, you seem to derive that I should already know what to bet with you about, and are annoyed by how I’m playing coy; because if I don’t bet with you right now, I should retract the statement that I think gradualists were surprised; but to me I’m not following the sequitur there.

Or maybe I’m just entirely misinterpreting the flow of your thoughts here.

• I think you think there’s a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do.

I’ve said I’m happy to bet about anything, and listed some particular questions I’d bet about where I expect you to be wronger. If you had issued the same challenge to me, I would have picked one of the things and we would have already made some bets. So that’s why I feel like the ball is in your court to say what things you’re willing to make forecasts about.

That said, I don’t know if making bets is at all a good use of time. I’m inclined to do it because I feel like your view really should be making different predictions (and I feel like you are participating in good faith and in fact would end up making different predictions). And I think it’s probably more promising than trying to hash out the arguments since at this point I feel like I mostly know your position and it’s incredibly slow going. But it seems very plausible that the right move is just to agree to disagree and not spend time on this. In that case it was particularly bad of me to try to claim the epistemic high ground. I can’t really defend myself there, but can explain by saying that I found your vitriolic reading of takeoff speeds pretty condescending and frustrating and, given that I think you are more wrong than right, wanted a nice way to demonstrate that.

I’ve mentioned the kinds of things I think your model will forecast badly, and suggested that we bet about them in particular:

• I think you generally overestimate the rate of trend breaks on measurable trends. So let’s pick some trends and estimate probability of trend breaks.

• I think you don’t understand in which domains trend-breaks are surprising and where they aren’t surprising, so you will be sometimes underconfident and sometimes overconfident on any given forecast. Same bet as last time.

• I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.

• I think you are underconfident about continuing scale-up in AI. So we can bet about future spending, size of particular labs, size of the ML field.

• I think you overestimate DeepMind’s advantage over the rest of the field and so will make bad forecasts about where any given piece of progress comes from.

• I think your AI timelines are generally too short. You can point to cool stuff happening as a vindication for your view, and there will certainly be some cool stuff happening, but I think if we actually get concrete you are just going to make worse predictions.

• I’d be happy to disagree about romantic chatbots or machine translation. I’d have to look into it more to get a detailed sense in either, but I can guess. I’m not sure what “wouldn’t be especially surprised” means, I think to actually get disagreements we need way more resolution than that so one question is whether you are willing to play ball (since presumably you’d also have to look into it to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we’re missing and we can revise in light of that, but we’d still need more resolution. (That said: what’s up for grabs here are predictions about the future, not present.)

I’d guess that machine translation is currently something like $100M/year in value, and will scale up more like 2x/year than 10x/year as DL improves (e.g. most of the total log increase will be in years with <3x increase rather than >3x increase, and 3 is like the 60th percentile of the number for which that inequality is tight).

I’d guess that increasing deployment of romantic chatbots will end up with technical change happening first followed by social change second, so the speed of deployment and change will depend on the speed of social change. At early stages of the social change you will likely see much larger investment in fine-tuning for this use case, and the results will be impressive as you shift from random folks doing it to actual serious efforts. The fact that it’s driven by social rather than technical change means it could proceed at very different paces in different countries. I don’t expect anyone to make a lot of profit from this before self-driving cars, for example I’d be pretty surprised if this surpassed $1B/year of revenue before self-driving cars passed $10B/year of revenue. I have no idea what’s happening in China. It would be fairly surprising to me if there was currently an actually-compelling version of the technology, which we could try to operationalize as something like how bad your best available romantic relationship with humans has to be, or how lonely you’d have to be, or how short-sighted you’d have to be, before it’s appealing. I don’t have strong views about a mediocre product with low activation energy that’s nevertheless used by many (e.g. in the same way we see lots of games with mediocre hedonic value and high uptake, or lots of passive gambling).

• Thanks for continuing to try on this!

Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we’ll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. I also remark that I wouldn’t be surprised to hear that waifutech is already past $1B/yr in China, but I haven’t looked into things there. I don’t expect the waifutech to transcend my own standards for mediocrity, but something has to be pretty good before I call it more than mediocre; do you think there’s particular things that waifutech won’t be able to do?

My model permits large jumps in ML translation adoption; it is much less clear about whether anyone will be able to build a market moat and charge big prices for it. Do you have a similar intuition about # of users increasing gradually, not just revenue increasing gradually?

I think we’re still at the level of just drawing images about the future, so that anybody who came back in 5 years could try to figure out who sounded right, at all, rather than assembling a decent portfolio of bets; but I also think that just having images versus no images is a lot of progress.

• Yes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of “users” which has even more definitional issues.

You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year.

(I don’t know what AI freedom-of-the-road means, do you mean something significantly more ambitious than self-driving trucks or taxis?)

• jumping to newly accessible domains

Man, the problem is that you say the “jump to newly accessible domains” will be the thing that lets you take over the world. So what’s up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn’t help if you say “I expect new things to sometimes become possible” if you don’t further say something about the impact of the very early versions of the product.

Maybe you’ll want to say that however much Google spends on that, they must rationally anticipate at least that much added revenue

If e.g. people were spending $1B/year developing a technology, and then after a while it jumps from $0/year to $1B/year of profit, I’m not that surprised. (Note that machine translation is radically smaller than this, I don’t know the numbers.) I do suspect they could have rolled out a crappy version earlier, perhaps by significantly changing their project. But why would they necessarily bother doing that? For me this isn’t violating any of the principles that make your stories sound so crazy. The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

(Note: it is surprising if an industry is spending $10T/year on R&D and then jumps from $1T --> $10T of revenue in one year in a world that isn’t yet growing crazily. How surprising this is depends a lot on the numbers involved, and in particular on how valuable it would have been to deploy a worse version earlier and how hard it is to raise money at different scales.)

• The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

Would you say that this is a good description of Suddenly Hominids but you don’t expect that to happen again, or that this is a bad description of hominids?

• It’s not a description of hominids at all, no one spent any money on R&D.

I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).

• Spending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I’m not sure why. My guess is that you think natural selection isn’t relevantly similar because it didn’t deliberately plan to allocate resources as part of a long bet that it would pay off big.

• I think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead).

If someone wants to spell out why they think evolution of hominids means takeoff is fast then I’m usually happy to explain why I disagree with their particular analogy. I think this happens in the next Discord log between me and Eliezer.

• My uncharitable read on many of these domains is that you are saying “Sure, I think that Paul might have somewhat better forecasts than me on those questions, but why is that relevant to AGI?”

In that case it seems like the situation is pretty asymmetrical. I’m claiming that my view of AGI is related to beliefs and models that also bear on near-term questions, and I expect to make better forecasts than you in those domains because I have more accurate beliefs/​models. If your view of AGI is unrelated to any near-term questions where we disagree, then that seems like an important asymmetry.

• Inevitably, you can go back afterwards and claim it wasn’t really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprised then

It seems like you are saying that there is some measure that was continuous all along, but that it’s not obvious in advance which measure was continuous. That seems to suggest that there are a bunch of plausible measures you could suggest in advance, and lots of interesting action will be from changes that are discontinuous changes on some of those measures. Is that right?

If so, don’t we get out a ton of predictions? Like, for every particular line someone thinks might be smooth, the gradualist has a higher probability on it being smooth than you would? So why can’t I just start naming some smooth lines (like any of the things I listed in the grandparent) and then we can play ball?

If not, what’s your position? Is it that you literally can’t think of the possible abstractions that would later make the graph smooth? (This sounds insane to me.)

• I disagree that this is a meaningful forecasting track record. Massive degrees of freedom, and the mentioned events seem unresolvable, and it’s highly ambiguous how these things particularly prove the degree of error unless they were properly disambiguated in advance. Log score or it didn’t happen.

(Slightly edited to try and sound less snarky)
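For readers unfamiliar with the term, the log score invoked above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the forecasts are binary, each is recorded as a probability before resolution, and the probabilities and outcomes below are invented, not taken from anyone in this thread.

```python
import math
from typing import List, Tuple


def log_score(forecasts: List[Tuple[float, bool]]) -> float:
    """Sum of natural-log probabilities assigned to what actually happened.

    Each entry is (probability the forecaster gave to the event, whether it
    happened). Scores are always negative; closer to zero is better.
    """
    total = 0.0
    for probability, happened in forecasts:
        if not 0.0 < probability < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
        total += math.log(probability if happened else 1.0 - probability)
    return total


# Hypothetical resolved questions: one happened, two did not.
outcomes = [True, False, False]
confident = log_score(list(zip([0.7, 0.2, 0.1], outcomes)))
shrugging = log_score(list(zip([0.5, 0.5, 0.5], outcomes)))
```

The score only exists for claims that were given a probability ahead of time and later resolved unambiguously, which is the complaint being made about retrospective track records.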

• BTW, a few days ago Eliezer made a specific prediction that is perhaps relevant to your discussion:

I [would very tentatively guess that] AGI to kill everyone before self-driving cars are commercialized

(I suppose Eliezer is talking about Level 5 autonomy cars here).

Maybe a bet like this could work:

At least one month will elapse after the first Level 5 autonomy car hits the road, without AGI killing everyone.

“Level 5 autonomy” could be further specified to avoid ambiguities. For example, like this:

The car must be publicly accessible (e.g. available for purchase, or as a taxi etc). The car should be able to drive from some East Coast city to some West Coast city by itself.

• Once you can buy a self-driving car, the thing that Paul predicts with surety and that I shrug about has already happened. If it does happen, my model says very little about remaining timeline from there one way or another. It shrugs again and says, “Guess that’s how difficult the AI problem and regulatory problem were.”

• sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2

I find this kind of bluster pretty frustrating and condescending. I also feel like the implication is just wrong—if Eliezer and I disagree, I’d guess it’s because he’s worse at predicting ML progress. To me GPT-3 feels much (much) closer to my mainline than to Eliezer’s, and AlphaGo is very unsurprising. But it’s hard to say who was actually “caught flatfooted” unless we are willing to state some of these predictions in advance.

I got pulled into this interaction because I wanted to get Eliezer to make some real predictions, on the record, so that we could have a better version of this discussion in 5 years rather than continuing to both say “yeah, in hindsight this looks like evidence for my view.” I apologize if my tone (both in that discussion and in this comment) is a bit frustrated.

It currently feels from the inside like I’m holding the epistemic high ground on this point, though I expect Eliezer disagrees strongly:

• I’m willing to bet on anything Eliezer wants, or to propose my own questions if Eliezer is willing in principle to make forecasts. I expect to outperform Eliezer on these bets and am happy to state in advance that I’d update in his direction if his predictions turned out to be as good as mine. It’s possible that we don’t have disagreements, but I doubt it. (See my other comment.)

• I’m not talking this much smack based on “track records” imagined in hindsight. I think that if you want to do this then you should have been making predictions in the past, and you definitely should be willing to make predictions about the future. (I suspect you’ll often find that other people don’t disagree with the predictions that turned out to be reasonable, even if from your perspective it was all part of one coherent story.)

• I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI—I wouldn’t put, eg, Dario Amodei in that class either, though we disagree about other things.

But again, Paul, it’s not enough to say that you weren’t surprised by GPT-2/​3 in retrospect, it kinda is important to say it in advance, ideally where other people can see? Dario picks up some credit for GPT-2/​3 because he clearly called it in advance. You don’t need to find exact disagreements with me to start going on the record as a forecaster, if you think the course of the future is generally narrower than my own guesses—if you think that trends stay on course, where I shrug and say that they might stay on course or break. (Except that of course in hindsight somebody will always be able to draw a straight-line graph, once they know which graph to draw, so my statement “it might stay on trend or maybe break” applies only to graphs extrapolating into what is currently the future.)

• Suppose your view is “crazy stuff happens all the time” and my view is “crazy stuff happens rarely.” (Of course “crazy” is my word, to you it’s just normal stuff.) Then what am I supposed to do, in your game?

More broadly: if you aren’t making bold predictions about the future, why do you think that other people will? (My predictions all feel boring to me.) And if you do have bold predictions, can we talk about some of them instead?

It seems to me like I want you to say “well I think 20% chance something crazy happens here” and I say “nah, that’s more like 5%” and then we batch up 5 of those and when none of them happen I get a bayes point.
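To make the "bayes point" in that example concrete, here is a small Python sketch of the arithmetic. It assumes something the discussion does not state: that the five events are independent and that each forecaster gives every one of them the same probability (20% versus 5%). The function name is hypothetical.

```python
def likelihood_ratio(p_a: float, p_b: float, n_events: int, n_happened: int) -> float:
    """How many times more probable the observed outcomes are under
    forecaster A than under forecaster B.

    Assumption: events are independent and each forecaster assigns the
    same probability to every event.
    """
    n_missed = n_events - n_happened
    likelihood_a = (p_a ** n_happened) * ((1.0 - p_a) ** n_missed)
    likelihood_b = (p_b ** n_happened) * ((1.0 - p_b) ** n_missed)
    return likelihood_a / likelihood_b


# Forecaster A says 5% per event, forecaster B says 20% per event.
none_happen = likelihood_ratio(0.05, 0.20, n_events=5, n_happened=0)  # about 2.36
one_happens = likelihood_ratio(0.05, 0.20, n_events=5, n_happened=1)  # about 0.50
```

On these assumptions a batch of five non-events favors the 5% forecaster by a factor of roughly 2.4, while a single crazy event out of five swings the evidence to about 2:1 in favor of the 20% forecaster, so a handful of such batches would be needed before either side gained much.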

I could just give my forecast. But then if I observe that 2/20 of them happen, how exactly does that help me in figuring out whether I should be paying more attention to your views (or help you snap out of it)?

I can list some particular past bets and future forecasts, but it’s really unclear what to do with them without quantitative numbers or a point of comparison.

Like you I’ve predicted that AI is undervalued and will grow in importance, although I think I made a much more specific prediction that investment in AI would go up a lot in the short term. This made me some money, but like you I just don’t care much about money and it’s not a game worth playing. I bet quite explicitly on deep learning by pivoting my career into practical ML and then spending years of my life working on it, despite loving theory and thinking it’s extremely important. We can debate whether the bet is good, but it was certainly a bet and by my lights it looks very reasonable in retrospect.

Over the next 10 years I think powerful ML systems will be trained mostly by imitating human behavior over short horizons, and then fine-tuned using much smaller amounts of long-horizon feedback. This has long been my prediction, and it’s why I’ve been interested in language modeling, and has informed some of my research. I think that’s still basically valid and will hold up in the future. I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

I expect to see truly massive training runs in the not that distant future. I think the current rate of scaling won’t be sustained, but that over the next 10-20 years scaling will get us into human-level behavior for “short-horizon” tasks which may or may not suffice for transformative AI. I expect that to happen at model sizes within 2 orders of magnitude of the human brain on one side or the other, i.e. 1e12 to 1e16 parameters.

I could list a lot more, but I don’t think any of it seems bold and it’s not clear what the game is. It’s clearly bold by comparison to market forecasts or broader elite consensus, but so what? I understand much better how to compare one predictor to another. I mostly don’t know what it means to evaluate a predictor on an absolute scale.

• I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly. Not on it being useful or profitable, but in terms of underlying technology making progress on non-profitable amazing demo feats, maybe I’m more bullish on theorem-proving than you are? Is there anything you think it shouldn’t be able to do in the next 5 years?

• I’m going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of “low-hanging fruit” are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is “they are pretty easy” /​ “humans are pretty bad at these tasks,” but it’s somewhat harder to quantify. If you think your methodology is different from that then we will probably end up disagreeing.

Looking towards more ambitious benchmarks, I think that the IMO grand challenge is currently significantly more than 5 years away. In 5 years’ time my median guess (with almost no thinking about it) is that automated solvers can do 10% of non-geometry, non-3-variable-inequality IMO shortlist problems.

So yeah, I’m happy to play ball in this area, and I expect my predictions to be somewhat more right than yours after the dust settles. Is there some way of measuring such that you are willing to state any prediction?

(I still feel like I’m basically looking for any predictions at all beyond sometimes saying “my model wouldn’t be surprised by <vague thing X>”, whereas I’m pretty constantly throwing out made-up guesses which I’m happy to refine with more effort. Obviously I’m going to look worse in retrospect than you if we keep up this way though, that particular asymmetry is a lot of the reason people mostly don’t play ball. ETA: that’s a bit unfair, the romantic chatbot vs self-driving car prediction is one where we’ve both given off-the-cuff takes.)

• I have a sense that there’s a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me—that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven’t read. I feel like I “would not be surprised at all” if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024 - though of course, as events like this lie in the Future, they are very hard to predict.

Can you say more about why or whether you would, in this case, say that this was an un-Paulian set of events? As I have trouble manipulating my Paul model, it does not exclude Paul saying, “Ah, yes, well, they were using 700M models in that paper, so if you jump to 70B, of course the IMO grand challenge could fall; there wasn’t a lot of money there.” Though I haven’t even glanced at any metrics here, let alone metrics that the IMO grand challenge could be plotted on, so if smooth metrics rule out IMO in 5yrs, I am more interested yet—it legit decrements my belief, but not nearly as much as I imagine it would decrement yours.

(Edit: Also, on the meta-level, is this, like, anywhere at all near the sort of thing you were hoping to hear from me? Am I now being a better epistemic citizen, if maybe not a good one by your lights?)

• Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I’d look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I’m also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don’t expect it to get you that far.

I don’t think this is an easy prediction to extract from a trendline, in significant part because you can’t extrapolate trendlines this early that far out. So this is stress-testing different parts of my model, which is fine by me.

At the meta-level, this is the kind of thing I’m looking for, though I’d prefer to have some kind of quantitative measure of how not-surprised you are. If you are only saying 2% then we probably want to talk about things less far in your tails than the IMO challenge.

• Okay, then we’ve got at least one Eliezerverse item, because I’ve said below that I think I’m at least 16% for IMO theorem-proving by end of 2025. The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that’s irrelevant because it’s first-order estimates we should be comparing here.

So we’ve got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse? Pretty bare portfolio but it’s at least a start in both directions. If we say 5% instead of 1%, how much further would you extend the time limit out beyond 2024?

I also don’t know at all what part of your model forbids theorem-proving to fall in a shocking headline followed by another headline a year later—it doesn’t sound like it’s from looking at a graph—and I think that explaining reasons behind our predictions in advance, not just making quantitative predictions in advance, will help others a lot here.

EDIT: Though the formal IMO challenge has a barnacle about the AI being open-sourced, which is a separate sociological prediction I’m not taking on.

• I think IMO gold medal could be well before massive economic impact, I’m just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it’s hard reasoning about the tails.

I’d say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like “AlphaGo and GPT-3 aren’t that surprising”—I have a sense for what things are and aren’t surprising, and not many things happen that are so surprising.

If I’m at 4% and you are 12% and we had 8 such bets, then I can get a factor of 2 if they all come out my way, and you get a factor of ~1.5 if one of them comes out your way.
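
The arithmetic behind those two factors can be checked with a short sketch. It assumes the 8 bets are independent binary events and that each forecaster puts the same probability on every one of them (4% versus 12%); `likelihood_ratio` is a hypothetical helper name, not anything from the discussion.

```python
from math import prod

def likelihood_ratio(p_a, p_b, outcomes):
    """Likelihood of the outcomes under forecaster A divided by that under B.

    Assumes independent binary events, with each forecaster assigning the
    same probability to every event (an assumption made for illustration).
    """
    def likelihood(p):
        return prod(p if happened else 1.0 - p for happened in outcomes)
    return likelihood(p_a) / likelihood(p_b)

paul, eliezer = 0.04, 0.12          # the two probabilities quoted above
none_happen = [False] * 8           # all 8 bets come out Paul's way
one_happens = [True] + [False] * 7  # exactly one comes out Eliezer's way

# (0.96 / 0.88) ** 8, i.e. roughly a factor of 2 in Paul's favor
paul_gain = likelihood_ratio(paul, eliezer, none_happen)

# (0.12 / 0.04) * (0.88 / 0.96) ** 7, i.e. roughly 1.6 in Eliezer's favor
eliezer_gain = likelihood_ratio(eliezer, paul, one_happens)

print(f"Paul's factor if none happen: {paul_gain:.2f}")
print(f"Eliezer's factor if one happens: {eliezer_gain:.2f}")
```

Under these assumptions the second factor comes out slightly above the "~1.5" quoted, which is consistent with it being a rough mental estimate.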

I might think more about this and get a more coherent probability distribution, but unless I say something else by end of 2021 you can consider 4% on end of 2025 as my prediction.

• Maybe another way of phrasing this—how much warning do you expect to get, how far out does your Nope Vision extend? Do you expect to be able to say “We’re now in the ‘for all I know the IMO challenge could be won in 4 years’ regime” more than 4 years before it happens, in general? Would it be fair to ask you again at the end of 2022 and every year thereafter if we’ve entered the ‘for all I know, within 4 years’ regime?

Added: This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics would not consider you one of their own) where they saunter around saying “X will not occur in the next 5 /​ 10 /​ 20 years” and they’re often right for the next couple of years, because there’s only one year where X shows up for any particular definition of that, and most years are not that year; but also they’re saying exactly the same thing up until 2 years before X shows up, if there’s any early warning on X at all. It seems to me that 2 years is about as far as Nope Vision extends in real life, for any case that isn’t completely slam-dunk; when I called upon those gathered AI luminaries to say the least impressive thing that definitely couldn’t be done in 2 years, and they all fell silent, and then a single one of them named Winograd schemas, they were right that Winograd schemas at the stated level didn’t fall within 2 years, but very barely so (they fell the year after). So part of what I’m flailingly asking here, is whether you think you have reliable and sensitive Nope Vision that extends out beyond 2 years, in general, such that you can go on saying “Not for 4 years” up until we are actually within 6 years of the thing, and then, you think, your Nope Vision will actually flash an alert and you will change your tune, before you are actually within 4 years of the thing. Or maybe you think you’ve got Nope Vision extending out 6 years? 10 years? Or maybe theorem-proving is just a special case and usually your Nope Vision would be limited to 2 years or 3 years?

This is all an extremely Yudkowskian frame on things, of course, so feel free to reframe.

• I think I’ll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I’ll get much more confident if we scale up from $1M → $1B and pick the low hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict.

There’s not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orders of magnitude, and there is likely lots of low-hanging fruit to pick, and we just don’t have much to extrapolate from (compared to more mature technologies, or how I expect AI will be shortly before the end of days), and for similar reasons there aren’t really any benchmarks to extrapolate.

(Also note that it matters a lot whether you know what problems labs will try to take a stab at. For the purpose of all of these forecasts, I am trying insofar as possible to set aside all knowledge about what labs are planning to do, though that’s obviously not incentive-compatible and there’s no particular reason you should trust me to do that.)

• I feel like I “would not be surprised at all” if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024

Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.

I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there—so little work will be needed to do that part.

• Ha! Okay then. My probability is at least 16%, though I’d have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?

EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I’ll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/​heldout dataset of past IMO problems, conditional on such a dataset being available; I frame no separate sociological prediction about whether somebody is willing to open-source the AI model that does it.

• I’ll stand by a >16% probability of the technical capability existing by end of 2025, as reported on eg solving a non-trained/​heldout dataset of past IMO problems, conditional on such a dataset being available

It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul’s camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equivalent of a Gold medal).

The relevant question is which rank/​percentile you think is likely to be attained by 2025 under your model but you predict would be implausible under Paul’s model. This may be a daunting task, but one way to get started is to put a probability distribution over what you think the state-of-the-art will look like by 2025, and then compare to Paul’s.

Edit: Here are, for example, the individual rankings for 2021: https://www.imo-official.org/year_individual_r.aspx?year=2021

• I expect it to be hella difficult to pick anything where I’m at 75% that it happens in the next 5 years and Paul is at 25%. Heck, it’s not easy to find things where I’m at over 75% that aren’t just obvious slam dunks; the Future isn’t that easy to predict. Let’s get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%. That’s a separate matter. I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I’m like “lol wut” and double their probability, even if it turns out that Paul agrees with me about it.

• It feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul’s camp still maintaining less than 50%).

My model of Eliezer may be wrong, but I’d guess that this isn’t a domain where he has many over-50% predictions of novel events at all? See also ‘I don’t necessarily expect self-driving cars before the apocalypse’.

My Eliezer-model has a more flat prior over what might happen, which therefore includes stuff like ‘maybe we’ll make insane progress on theorem-proving (or whatever) out of the blue’. Again, I may be wrong, but my intuition is that you’re Paul-omorphizing Eliezer when you assume that >16% probability of huge progress in X by year Y implies >50% probability of smaller-but-meaningful progress in X by year Y.

• If this task is bad for operationalization reasons, there are other theorem proving benchmarks. Unfortunately it looks like there aren’t a lot of people that are currently trying to improve on the known benchmarks, as far as I’m aware.

The code generation benchmarks are slightly more active. I’m personally partial to Hendrycks et al.’s APPS benchmark, which includes problems that “range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability.” (Github link).

• To me GPT-3 feels much (much) closer to my mainline than to Eliezer’s

To add to this sentiment, I’ll post the graph from my notebook on language model progress. I refer to the Penn Treebank task a lot when making this point because it seems to have a lot of good data, but you can also look at the other tasks and see basically the same thing.

The last dip in the chart is from GPT-3. It looks like GPT-3 was indeed a discontinuity in progress but not a very shocking one. It roughly would have taken about one or two more years at ordinary progress to get to that point anyway—which I just don’t see as being all that impressive.
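
One way to make the “one or two more years” estimate concrete is to fit a line to the earlier results and ask when that line would have reached the new score. The sketch below does this with ordinary least squares. The years and scores are made-up placeholders chosen to be exactly linear, not the Penn Treebank numbers from the notebook, and `fit_line` and `years_ahead_of_trend` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def years_ahead_of_trend(years, scores, new_year, new_score):
    """How many years early a new result arrived relative to the fitted trend."""
    slope, intercept = fit_line(years, scores)
    year_trend_reaches_score = (new_score - intercept) / slope
    return year_trend_reaches_score - new_year

# Placeholder data (NOT real benchmark results): lower score is better,
# improving by 5 points per year before the new result.
prior_years = [2014, 2015, 2016, 2017, 2018, 2019]
prior_scores = [100.0, 95.0, 90.0, 85.0, 80.0, 75.0]

# A hypothetical 2020 result of 60, where the trend alone would predict 70.
lead = years_ahead_of_trend(prior_years, prior_scores, 2020, 60.0)
print(f"Result arrived about {lead:.1f} years ahead of trend")
```

With these placeholder numbers the trend would have reached a score of 60 in 2022, so the result counts as a two-year jump. The same calculation on real data depends heavily on which earlier points are included in the fit.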

I sorta feel like the main reason why lots of people found GPT-3 so impressive was because OpenAI was just good at marketing the results [ETA: sorry, I take back the use of the word “marketing”]. Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to work really well for them.

I admit this is an uncharitable explanation, but is there a better story to tell about why GPT-3 captured so much attention?

• The impact of GPT-3 had nothing whatsoever to do with its perplexity on Penn Treebank. I think this is a good example of why focusing on perplexity and ‘straight lines on graph go brr’ is so terrible, such cargo cult mystical thinking, and crippling. There’s something astonishing to see someone resort to explaining away GPT-3’s impact as ‘OpenAI was just good at marketing the results’. Said marketing consisted of: ‘dropping a paper on Arxiv’. Not even tweeting it! They didn’t even tweet the paper! (Forget an OA blog post, accompanying NYT/TR articles, tweets by everyone at OA, a fancy interactive interface - none of that.) And most of the initial reaction was “GPT-3: A Disappointing Paper”-style. If this is marketing genius, then it is truly 40-d chess, is all I can say.

The impact of GPT-3 was in establishing that trendlines did continue in a way that shocked pretty much everyone who’d written off ‘naive’ scaling strategies. Progress is made out of stacked sigmoids: if the next sigmoid doesn’t show up, progress doesn’t happen. Trends happen, until they stop. Trendlines are not caused by the laws of physics. You can dismiss AlphaGo by saying “oh, that just continues the trendline in ELO I just drew based on MCTS bots”, but the fact remains that MCTS progress had stagnated, and here we are in 2021, and pure MCTS approaches do not approach human champions, much less beat them. (This is also true of SVMs. Notice SVMs solving ImageNet because the trendlines continued? No, of course you did not. It drives me bonkers to see AI Impacts etc make arguments like “deep learning is unimportant because look, ImageNet follows a trendline”. Sheer numerology.) Appealing to trendlines is roughly as informative as “calories in calories out”; ‘the trend continued because the trend continued’. A new sigmoid being discovered is extremely important.

GPT-3 further showed completely unpredicted emergence of capabilities across downstream tasks which are not measured in PTB perplexity. There is nothing obvious about a PTB BPC of 0.80 that causes it to be useful where 0.90 is largely useless and 0.95 is a laughable toy. (OAers may have had faith in scaling, but they could not have told you in 2015 that interesting behavior would start at O(1b), and it’d get really cool at O(100b).) That’s why it’s such a useless metric. There’s only one thing that a PTB perplexity can tell you, under the pretraining paradigm: when you have reached human AGI level. (Which is useless for obvious reasons: much like saying that “if you hear the revolver click, the bullet wasn’t in that chamber and it was safe”. Surely true, but a bit late.) It tells you nothing about intermediate levels. I’m reminded of the Steven Kaas line:

Why idly theorize when you can JUST CHECK and find out the ACTUAL ANSWER to a superficially similar-sounding question SCIENTIFICALLY?

Using PTB, and talking only about perplexity, is a precise answer to the wrong question. (This is a much better argument when it comes to AlphaGo/ELO, because at least there, ‘ELO’ is in fact the ultimate objective, and not a proxy pretext. But perplexity is of no interest to anyone except an information theorist. Unfortunately, we lack any ‘take-over-the-world-ELO’ we can benchmark models on and extrapolate there. If we did and there was a smooth curve, I would indeed agree that we should adopt that as the baseline. But the closest things we have to downstream tasks are all wildly jumpy - even superimposing scores of downstream tasks barely gives you a recognizable smooth curve, and certainly nothing remotely as smooth as the perplexity curve. My belief is that this is because the overall perplexity curve comes from hundreds or thousands of stacked sigmoids and plateau/breakthroughs averaging out in terms of prediction improvements.)
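
The averaging claim can be illustrated with a toy sketch: each individual capability is a steep sigmoid in some scale variable, while the aggregate is the mean of many such sigmoids with scattered midpoints. Everything here is an assumption made for illustration (1000 sigmoids, uniformly random midpoints, a steepness of 10, a grid step of 0.1); it is not a model of any real benchmark or of perplexity itself.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def largest_step(values):
    """Largest change between consecutive grid points."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

rng = random.Random(0)                # fixed seed so the sketch is repeatable
xs = [i * 0.1 for i in range(101)]    # "scale" axis from 0 to 10
STEEPNESS = 10.0                      # assumed: each capability switches on quickly
midpoints = [rng.uniform(0.0, 10.0) for _ in range(1000)]

# One capability on its own: a sharp jump near its midpoint.
single = [sigmoid(STEEPNESS * (x - midpoints[0])) for x in xs]

# The aggregate: mean of all capabilities, each jumping at a different scale.
aggregate = [
    sum(sigmoid(STEEPNESS * (x - m)) for m in midpoints) / len(midpoints)
    for x in xs
]

single_jump = largest_step(single)
aggregate_jump = largest_step(aggregate)
print(f"largest step, single capability: {single_jump:.3f}")
print(f"largest step, aggregate curve:   {aggregate_jump:.3f}")
```

In this toy setup the single curve covers more than a fifth of its whole range in one grid step, while the aggregate moves by only a percent or two per step. The smooth aggregate says nothing about where any particular capability's jump is located.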

I emphasized this poverty of extrapolation in my scaling hypothesis writeup already, but permit me to vent a little more here:

“So, you’re forecasting AI progress using PTB perplexity/​BPC. Cool, good work, nice notebook, surely this must be useful for forecasting on substantive AI safety/​capability questions of interest to us. I see it’s a pretty straight line on a graph. OK, can you tell me at what BPC a large language model could do stuff like hack computers and escape onto the Internet?”

“No. I can tell you what happens if I draw the line out x units, though.”

“Perhaps that’s an unfairly specific question to ask, as important as it is. OK, can you tell me when we can expect to see well-known benchmarks like Winograd schemas be solved?”

“No. I can draw you a line on PTB to estimate when PTB is solved, though, if you give me a second and define a bound for ‘PTB is solved’.”

“Hm. Can you at least tell me when we can expect to see meta-learning emerge, with good few-shot learning—does the graph predict 0.1b, 1b, 10b, 100b, or what?”

“No idea.”

“Do you know what capabilities will be next to emerge? We got text style transfer in LaMDA and pretty good programming performance in Copilot at O(100b), what’s next?”

“I don’t know.”

“Can you qualitatively describe what we’d get at 1t, or 10t?”

“No, but I can draw the line in perplexity. It gets pretty low.”

“How about the existence of any increasing returns to scale in downstream tasks? Does it tell us anything about spikes in capabilities (such as we observe in many places, most recently BIG-bench)? Such as whether there are any more spikes past O(100b), whether we’ll see holdouts like causality suddenly fall at O(1000b), anything like that?”

“No.”

“How about RL: what sort of world modeling can we get by plugging them into DRL agents?”

“I don’t know.”

“Fine, let’s leave it at tool AIs doing text in text out. Can you tell me how much economic value will be driven by dropping another 0.01 BPC?”

“No. I can tell you how much it’d cost in GPU-time, though, by the awesome power of drawing lines!”

“OK, how about that: how low does it need to go to support a multi-billion dollar company running something like the OA API, to defray the next 0.01 drop and pay for the GPU-time to get more drops?”

“No idea.”

“How do you know BPC is the right metric to use?”

“Everyone chose it post hoc after seeing that it worked and better BPC = better models.”

“Absolutely not.”

“Ugh. Fine, what can you tell me about AI safety/​risk/​capabilities/​economics/​societal-disruption with these analyses of absolute loss?”

“Lines go straight?”

• And to say it also explicitly, I think this is part of why I have trouble betting with Paul. I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day. If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that’s something I don’t care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwern voice is asking. What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don’t relate in known ways to the trend lines that are smooth? Isn’t this sort of a shell game where our surface capabilities do weird jumpy things, we can point to some trend lines that were nonetheless smooth, and then the shells are swapped and we’re told to expect gradualist AGI surface stuff? This is part of the idea that I’m referring to when I say that, even as the world ends, maybe there’ll be a bunch of smooth trendlines underneath it that somebody could look back and point out. (Which you could in fact have used to predict all the key jumpy surface thresholds, if you’d watched it all happen on a few other planets and had any idea of where jumpy surface events were located on the smooth trendlines—but we haven’t watched it happen on other planets so the trends don’t tell us much we want to know.)

• This seems totally bogus to me.

It feels to me like you mostly don’t have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I’m saying I’m totally happy to use gradualist metrics to predict any of those things. If you want to say “what does it mean to be a gradualist” I can just give you predictions on them. To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care about. That’s fine, you can propose something other than $ or measurable benchmarks. If you can’t propose anything, I’m skeptical. My basic guess is that you probably can’t effectively predict $ or benchmarks or anything else quantitative. If you actually agreed with me on all that stuff, then I might suspect that you are equivocating between a gradualist-like view that you use for making predictions about everything near term and then switching to a more bizarre perspective when talking about the future. But fortunately I think this is more straightforward, because you are basically being honest when you say that you don’t understand how the gradualist perspective makes predictions.

• I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn’t just me, if it wasn’t just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, “Gradualists can predict meaningless benchmarks, but they can’t predict the jumpy surface phenomena we see in real life.” We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there’s a very real sense in which, yes, what we’re interested in are milestones, and often milestones that aren’t easy to define even after the fact. GPT-2 was shocking, and then GPT-3 carried that shock further in that direction, but how do you talk about that with somebody who thinks that perplexity loss is smooth? I can handwave statements like “GPT-3 started to be useful without retraining via just prompt engineering” but qualitative statements like those aren’t good for betting, and it’s much much harder to come up with the right milestone like that in advance, instead of looking back in your rearview mirror afterwards.

But you say—I think? - that you were less shocked by this sort of thing than I am. So, I mean, can you prophesy to us about milestones and headlines in the next five years? I think I kept thinking this during our dialogue, but never saying it, because it seemed like such an unfair demand to make! But it’s also part of the whole point that AGI and superintelligence and the world ending are all qualitative milestones like that. Whereas such trend points as Moravec was readily able to forecast correctly—like 10 teraops /​ plausibly-human-equivalent-computation being available in a 10 million supercomputer around 2010 - are really entirely unanchored from AGI, at least relative to our current knowledge about AGI. (They would be anchored if we’d seen other planets go through this, but we haven’t.) • But it seems to me that the very obvious GPT-5 continuation of Gwern would say, “Gradualists can predict meaningless benchmarks, but they can’t predict the jumpy surface phenomena we see in real life.” Don’t you think you’re making a falsifiable prediction here? Name something that you consider part of the “jumpy surface phenomena” that will show up substantially before the world ends (that you think Paul doesn’t expect). Predict a discontinuity. Operationalize everything and then propose the bet. • What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don’t relate in known ways to the trend lines that are smooth? Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships. 
I would generally be surprised if there were many highly non-linear relationship between perplexity and something like Winograd accuracy, human evaluation, or whatever other concrete measure you can come up with, such that the underlying behavior of the surface phenomenon is best described as a discontinuity with the past even when the latent perplexity changed smoothly. I admit the existence of some measures that exhibit these qualities (such as, potentially, the ability to do arithmetic), but I expect them to be quite a bit harder to find than the reverse. Furthermore, it seems like if this is the crux — ie. that surface-level qualitative phenomena will experience discontinuities even while latent variables do not — then I do not understand why it’s hard to come up with bet conditions. Can’t you just pick a surface level phenomenon that’s easy to measure and strongly interpretable in a qualitative sense — like Sensibleness and Specificity Average from the paper on Google’s chatbot — and then predict discontinuities in that metric? (I should note that the paper shows a highly linear relationship between perplexity and Sensibleness and Specificity Average. Just look at the first plot in the PDF.) • Well put /​ endorsed /​ +1. • I think that most people who work on models like GPT-3 seem more interested in trendlines than you do here. That said, it’s not super clear to me what you are saying so I’m not sure I disagree. Your narrative sounds like a strawman since people usually extrapolate performance on downstream tasks they care about rather than on perplexity. But I do agree that the updates from GPT-3 are not from OpenAI’s marketing but instead from people’s legitimate surprise about how smart big language models seem to be. As you say, I think the interesting claim in GPT-3 was basically that scaling trends would continue, where pessimists incorrectly expected they would break based on weak arguments. 
I think that looking at all the graphs, both of perplexity and performance on individual tasks, helps establish this as the story. I don’t really think this lines up with Eliezer’s picture of AGI but that’s presumably up for debate. There are always a lot of people willing to confidently decree that trendlines will break down without much argument. (I do think that eventually the GPT-3 trendline will break if you don’t change the data, but for the boring reason that the entropy of natural language will eventually dominate the gradient noise and so lead to a predictable slowdown.) • There’s something astonishing to see someone resort to explaining away GPT-3′s impact as ‘OpenAI was just good at marketing the results’. Said marketing consisted of: ‘dropping a paper on Arxiv’. Not even tweeting it! Yeah, my phrasing there was not ideal here. I regret using the word “marketing”, but to be fair, I mostly meant what I said in the next few sentences, “Maybe OpenAI saw an opportunity to dump a lot of compute into language models and have a two year discontinuity ahead of everyone else, and showcase their work. And that strategy seemed to really worked well for them.” Of course, seeing that such an opportunity exists is itself laudable and I give them Bayes points for realizing that scaling laws are important. At the same time, don’t you think we would have expected similar results in like two more years at ordinary progress? I do agree that it’s extremely interesting to know why the lines go straight. I feel like I wasn’t trying to say that GPT-3 wasn’t intrinsically interesting. I was more saying it wasn’t unpredictable, in the sense that Paul Christiano would have strongly said “no I do not expect that to happen” in 2018. • Again, the fact that it is a straight line on a metric which is, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit? 
That is, before they knew about how interesting the consequences would be of that small change in BPC?

> At the same time, don’t you think we would have expected similar results in like two more years at ordinary progress?

Who’s ‘we’, exactly? Who are these people who expected all of this to happen, and are going around saying “ah yes, these BIG-Bench results are exactly as I calculated back in 2018, the capabilities are all emerging like clockwork, each at their assigned BPC; next is capability Z, obviously”? And what are they saying about 500b, 1000b, and so on?

> I was more saying it wasn’t unpredictable, in the sense that Paul Christiano would have strongly said “no I do not expect that to happen” in 2018.

OK. So can you link me to someone saying in 2018 that we’d see GPT-2-1.5b’s behavior at ~1.5b parameters, and that we’d get few-shot metalearning and instructability past that with another OOM? And while you’re at it, if it’s so predictable, please answer all the other questions I gave, even if only the ones about scale. After all, you’re claiming it’s so easy to predict based on straight lines on convenient metrics like BPC and that there’s nothing special or unpredictable about jumping 2 years. So, please jump merely 2 years ahead and tell me what I can look forward to as the SOTA in Nov 2023, I’m dying of excitement here.

• I’m confused why you think looking at the rate and lumpiness of historical progress on narrowly circumscribed performance metrics is not meaningful, because it seems like you do think that drawing straight lines is fine when compute is on the x-axis—which seems like a similar exercise. What’s going on there?

• > Again, the fact that it is a straight line on a metric which is, if not meaningless, is extremely difficult to interpret, is irrelevant. Maybe OA moved up by 2 years. Why would anyone care in the slightest bit?

Because the point I was trying to make was that the result was relatively predictable?
I’m genuinely confused what you’re asking. I get a slight sense that you’re interpreting me as saying something about the inherent dullness of GPT-3 or that it doesn’t teach us anything interesting about AI, but I don’t see myself as saying anything like that. I actually really enjoy reading the output from it, your commentary on it, and what it reveals about the nature of intelligence. I am making purely a point about predictability, and whether the result was a “discontinuity” from past progress, in the sense meant by Paul Christiano (in the way I think he means these things).

> Who’s ‘we’, exactly

We refers in that sentence to competent observers in 2018 who predict when we’ll get ML milestones mostly by using the outside view, i.e. by extrapolating trends on charts.

> OK. So can you link me to someone saying in 2018 that we’d see GPT-2-1.5b’s behavior at ~1.5b parameters, and that we’d get few-shot metalearning and instructability past that with another OOM?

No, but

1. That seems like a different and far more specific question than whether we’d have language models that perform at roughly the same measured-level as GPT-3.
2. In general, people make very few specific predictions about what they expect to happen in the future about these sorts of things (though, if I may add, I’ve been making modest progress trying to fix this broad problem by writing lots of specific questions on Metaculus).

• I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we’ve seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance. Qualitative capabilities are more relevant to safety than benchmark performance is, because while qualitative capabilities include things like “code a simple video game” and “summarize movies with emojis”, they also include things like “break out of confinement and kill everyone”.
It’s the latter capability, and not PTB performance, that you’d need to predict if you wanted to reliably stay out of the x-risk regime — and the fact that we can’t currently do so is, I imagine, what brought to mind the analogy between scaling and Russian roulette.

I.e., a straight line in domain X is indeed not surprising; what’s surprising is the way in which that straight line maps to the things we care about more than X.

(Usual caveats apply here that I may be misinterpreting folks, but that is my best read of the argument.)

• > I think what gwern is trying to say is that continuous progress on a benchmark like PTB appears (from what we’ve seen so far) to map to discontinuous progress in qualitative capabilities, in a surprising way which nobody seems to have predicted in advance.

This is a reasonable thesis, and if indeed it’s the one Gwern intended, then I apologize for missing it!

That said, I have a few objections:

• Isn’t it a bit suspicious that the thing-that’s-discontinuous is hard to measure, but the-thing-that’s-continuous isn’t? I mean, this isn’t totally suspicious, because subjective experiences are often hard to pin down and explain using numbers and statistics. I can understand that, but the suspicion is still there.
• “No one predicted X in advance” is only damning to a theory if people who believed that theory were making predictions about it at all. If people who generally align with Paul Christiano were indeed making predictions to the effect of GPT-3 capabilities being impossible or very unlikely within a narrow future time window, then I agree that would be damning to Paul’s worldview. But—and maybe I missed something—I didn’t see that. Did you?
• There seems to be an implicit claim that Paul Christiano’s theory was falsified via failure to retrodict the data.
But that’s weird, because much of the evidence being presented is mainly that the previous trends were upheld (for example, with Gwern saying, “The impact of GPT-3 was in establishing that trendlines did continue...”). But if Paul’s worldview is that “we should extrapolate trends, generally” then that piece of evidence seems like a remarkable confirmation of his theory, not a disconfirmation.
• Do we actually have strong evidence that the qualitative things being mentioned were discontinuous with respect to time? I can certainly see some things being discontinuous with past progress (like the ability for GPT-3 to do arithmetic). But overall I feel like I’m being asked to believe something quite strong about GPT-3 breaking trends without actual references to what progress really looked like in the past.

I don’t deny that you can find quite a few discontinuities on a variety of metrics, especially if you search for them post-hoc. I think it would be fairly strawmanish to say that people in Paul Christiano’s camp don’t expect those at all. My impression is that they just don’t expect those to be overwhelming in a way that makes reliable reference class forecasting qualitatively useless; it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

• > it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

My impression is that some people are impressed by GPT-3’s capabilities, whereas your response is “ok, but it’s part of the straight-line trend on Penn Treebank; maybe it’s a little ahead of schedule, but nothing to write home about.” But clearly you and they are focused on different metrics!

That is, suppose it’s the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I’m not going to look up; pretend I did that.)
So on a graph of “language model of type X revenue over time”, total revenue is static at 0 for a long time and then shortly after GPT-3’s creation departs from 0. It seems like the fact that GPT-3 could be commercialized in this way when GPT-2 couldn’t is a result of something that Penn Treebank perplexity is sort of pointing at. (That is, it’d be hard to get a model with GPT-3’s commercializability but GPT-2’s Penn Treebank score.) But what we need in order for the straight line on PTB to be useful as a model for predicting revenue is to know ahead of time what PTB threshold you need for commercialization.

And so this is where the charge of irrelevancy is coming from: yes, you can draw straight lines, but they’re straight lines in the wrong variables. In the interesting variables (from the “what’s the broader situation?” worldview), we do see discontinuities, even if there are continuities in different variables.

[As an example of the sort of story that I’d want, imagine we drew the straight line of ELO ratings for Go-bots, had a horizontal line of “human professionals” on that line, and were able to forecast the discontinuity in “number of AI wins against human grandmasters” by looking at straight-line forecasts in ELO.]

• > That is, suppose it’s the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you have to throw on additional qualifiers that I’m not going to look up; pretend I did that.) So on a graph of “language model of type X revenue over time”, total revenue is static at 0 for a long time and then shortly after GPT-3’s creation departs from 0.

I think it’s the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue at some point. It’s an interesting question of when that will happen, and maybe your point is simply that it’s hard to predict when that will happen when you just look at the Penn Treebank trend.
However, I suspect that the revenue curve will look pretty continuous, now that it’s gone from zero to one. Do you disagree?

In a world with continuous, gradual progress across a ton of metrics, you’re going to get discontinuities from zero to one. I don’t think anyone from the Paul camp disagrees with that (in fact, Katja Grace talked about this in her article). From the continuous takeoff perspective, these discontinuities don’t seem very relevant unless going from zero to one is very important in a qualitative sense. But I would contend that going from “no revenue” to “some revenue” is not actually that meaningful in the sense of distinguishing AI from the large class of other economic products that have gradual development curves.

• > your point is simply that it’s hard to predict when that will happen when you just look at the Penn Treebank trend.

This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you’ll understand what will happen with economics/security/etc., when in fact you don’t have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don’t have much understanding of this either; I wish I did!]

For example, I imagine that, by thirty years from now, we’ll have language/code models that can do significant security analysis of the code that was available in 2020, and that this would have been highly relevant/valuable to people in 2020 interested in computer security. But when will this happen in the 2020-2050 range that seems likely to me? I’m pretty uncertain, and I expect this to look a lot like ‘flicking a switch’ in retrospect, even tho the leadup to flicking that switch will probably look like smoothly increasing capabilities on ‘toy’ problems.
[My current guess is that Paul /​ people in “Paul’s camp” would mostly agree with the previous paragraph, except for thinking that it’s sort of weird to focus on specifically AI computer security productivity, rather than the overall productivity of the computer security ecosystem, and this misplaced focus will generate the ‘flipping the switch’ impression. I think most of the disagreements are about ‘where to place the focus’, and this is one of the reasons it’s hard to find bets; it seems to me like Eliezer doesn’t care much about the lines Paul is drawing, and Paul doesn’t care much about the lines Eliezer is drawing.] However, I suspect that the revenue curve will look pretty continuous, now that it’s gone from zero to one. Do you disagree? I think I agree in a narrow sense and disagree in a broad sense. For this particular example, I expect OpenAI’s revenues from GPT-3 to look roughly continuous now that they’re selling/​licensing it at all (until another major change happens; like, the introduction of a competitor would likely cause the revenue trend to change). More generally, suppose we looked at something like “the total economic value of horses over the course of human history”. I think we would see mostly smooth trends plus some implied starting and stopping points for those trends. (Like, “first domestication of a horse” probably starts a positive trend, “invention of stirrups” probably starts another positive trend, “introduction of horses to America” starts another positive trend, “invention of the automobile” probably starts a negative trend that ends with “last horse that gets replaced by a tractor/​car”.) 
In my view, ‘understanding the world’ looks like having a causal model that you can imagine variations on (and have those imaginations be meaningfully grounded in reality), and I think the bits that are most useful for building that causal model are the starts and stops of the trends, rather than the smooth adoption curves or mostly steady equilibria in between. So it seems sort of backwards to me to say that for most of the time, most of the changes in the graph are smooth, because what I want out of the graph is to figure out the underlying generator, where the non-smooth bits are the most informative. The graph itself only seems useful as a means to that end, rather than an end in itself.

• Yeah, these are interesting points.

> Isn’t it a bit suspicious that the thing-that’s-discontinuous is hard to measure, but the-thing-that’s-continuous isn’t? I mean, this isn’t totally suspicious, because subjective experiences are often hard to pin down and explain using numbers and statistics. I can understand that, but the suspicion is still there.

I sympathize with this view, and I agree there is some element of truth to it that may point to a fundamental gap in our understanding (or at least in mine). But I’m not sure I entirely agree that discontinuous capabilities are necessarily hard to measure: for example, there are benchmarks available for things like arithmetic, which one can train on and make quantitative statements about.

I think the key to the discontinuity question is rather that 1) it’s the jumps in model scaling that are happening in discrete increments; and 2) everything is S-curves, and a discontinuity always has a linear regime if you zoom in enough. Those two things together mean that, while a capability like arithmetic might have a continuous performance regime on some domain, in reality you can find yourself halfway up the performance curve in a single scaling jump (and this is in fact what happened with arithmetic and GPT-3).
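
As a toy illustration of the two points above, here is a minimal sketch under made-up assumptions (it is not fitted to GPT-3 or to any real benchmark). The function `task_accuracy`, its `midpoint` and `steepness` parameters, and the list of model sizes are all hypothetical. The same smooth S-curve is sampled once at coarse, generation-sized steps and once at fine steps:

```python
import math

def task_accuracy(log10_params, midpoint=10.0, steepness=2.0):
    """Hypothetical S-curve: accuracy as a smooth function of log10(parameter count)."""
    return 1.0 / (1.0 + math.exp(-steepness * (log10_params - midpoint)))

def gains(sizes):
    """Accuracy gained between each pair of consecutive model sizes."""
    accs = [task_accuracy(s) for s in sizes]
    return [later - earlier for earlier, later in zip(accs, accs[1:])]

# Assumed model generations: one order of magnitude apart (10^8 ... 10^11 parameters).
coarse_sizes = [8.0, 9.0, 10.0, 11.0]
# The same range sampled every tenth of an order of magnitude.
fine_sizes = [8.0 + 0.1 * i for i in range(31)]

coarse_gains = gains(coarse_sizes)
fine_gains = gains(fine_sizes)

print("largest jump between generations:", max(coarse_gains))
print("largest jump between fine steps: ", max(fine_gains))
```

With these assumed parameters the largest gain between generations is about 0.38 of the full accuracy range, while no fine-grained step gains more than about 0.05. The underlying curve is identical in both cases; only the sampling differs.
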
So the risk, as I understand it, is that you end up surprisingly far up the scale of “world-ending” capability from one generation to the next, with no detectable warning shot beforehand.

> “No one predicted X in advance” is only damning to a theory if people who believed that theory were making predictions about it at all. If people who generally align with Paul Christiano were indeed making predictions to the effect of GPT-3 capabilities being impossible or very unlikely within a narrow future time window, then I agree that would be damning to Paul’s worldview. But—and maybe I missed something—I didn’t see that. Did you?

No, you’re right as far as I know; at least I’m not aware of any such attempted predictions. And in fact, the very absence of such prediction attempts is interesting in itself. One would imagine that correctly predicting the capabilities of an AI from its scale ought to be a phenomenally valuable skill — not just from a safety standpoint, but from an economic one too. So why, indeed, didn’t we see people make such predictions, or at least try to?

There could be several reasons. For example, perhaps Paul (and other folks who subscribe to the “continuum” world-model) could have done it, but they were unaware of the enormous value of their predictive abilities. That seems implausible, so let’s assume they knew the value of such predictions would be huge. But if you know the value of doing something is huge, why aren’t you doing it? Well, if you’re rational, there’s only one reason: you aren’t doing it because it’s too hard, or otherwise too expensive compared to your alternatives. So we are forced to conclude that this world-model — by its own implied self-assessment — has, so far, proved inadequate to generate predictions about the kinds of capabilities we really care about.
(Note: you could make the argument that OpenAI did make such a prediction, in the approximate yet very strong sense that they bet big on a meaningful increase in aggregate capabilities from scale, and won. You could also make the argument that Paul, having been at OpenAI during the critical period, deserves some credit for that decision. I’m not aware of Paul ever making this argument, but if made, it would be a point in favor of such a view and against my argument above.)

• Can I try to parse out what you’re saying about stacked sigmoids? Because it seems weird to me.

Like, in that view, it still seems like showing a trendline is some evidence that it’s not “interesting”. I feel like this because I expect the asymptote of the AlphaGo sigmoid to be independent of MCTS bots, so surely you should see some trends where AlphaGo (or equivalent) was invented first, and jumped the trendline up really fast. So not seeing jumps should indicate that it is more a gradual progression, because otherwise, if they were independent, about half the time the more powerful technique should come first.

The “what counter argument can I come up with” part of me says, tho, that how quickly the sigmoid grows likely depends on lots of external factors (like compute available or something). So instead of sometimes seeing a sigmoid that grows twice as fast as the previous ones, you should expect one that’s not just twice as tall, but twice as wide, too. And if you have that case, you should expect the “AlphaGo was invented first” sigmoid to be under the MCTS bots sigmoid for some parts of the graph, where it then reaches the same asymptote as AlphaGo in the mainline. So, if we’re in the world where AlphaGo is invented first, you can make gains by inventing MCTS bots, which will also set the trendline.
And so, seeing a jump would be less “AlphaGo was invented first” and more “MCTS bots were never invented during the long time when they would’ve outcompeted AlphaGo version −1”.

Does that seem accurate, or am I still missing something?

• > “How do you know BPC is the right metric to use?” “Everyone chose it post hoc after seeing that it worked and better BPC = better models.”

I realize your comment is in context of a comment I also disagree with, and I also think I agree with most of what you’re saying, but I want to challenge this framing you have at the end.

BPC is at its core a continuous generalization of the Turing Test, aka. the imitation game. It is not an exact translation, but it preserves all the key difficulties, and therefore keeps most of its same strengths, and it does this while extrapolating to weaker models in a useful and modelable way.

We might only have started caring viscerally about the numbers that BPC gives, or associating them directly to things of huge importance, around the advent of GPT, but that’s largely just a situational byproduct of our understanding. Turing understood the importance of the imitation game back in 1950, enough to write a paper on it, and certainly that paper didn’t go unnoticed. Nor can I see the core BPC:Turing Test correspondence as something purely post-hoc.

If people didn’t give it much thought, that’s probably because there never was a scaling law then, there never was an expectation that you could just take your hacky grammar-infused Markov chain and extrapolate it out to capture more than just surface level syntax. Even among earlier neural models, what’s the point of looking at extrapolations of a generalized Turing Test, when the models are still figuring out surface level syntactic details? Like, is it really an indictment of BPC, to say that when we saw the meaning of life is that only if an end would be of the whole supplier. widespread rules are regarded as the companies of refuses to deliver.
in balance of the nation’s information and loan growth associated with the carrier thrifts are in the process of slowing the seed and commercial paper we weren’t asking, ‘gee, I wonder how close this is to passing the Turing Test, by some generalized continuous measure’?

I think it’s quite surprising—importantly surprising—how it’s turned out that it actually is a relevant question, that performance on this original datapoint does actually bear some continuous mathematical relationship with models for which mere grammar is a been-there-done-that, and we now regularly test for the strength of their world models.

And I get the dismissal, that it’s no proven law that it goes so far before stopping, rather than some other stretch, or that it gives no concrete conclusions for what happens at each 0.01 perplexity increment, but I look at my other passion with a straight line, hardware, and I see exactly the same argument applied to almost the same arrow-straight trendline, and I think, I’d still much rather trust the person willing to look at the plot and say, gee, those transistors will be fucking cheap.

Would that person predict today, back at the start? Hell no. Knowing transistor scaling laws doesn’t directly tell you all that much about the discontinuous changes in how computation is done. You can’t look at a graph and say “at a transistor density of X, there will be the iPhone, and at a transistor density of Y, microcontrollers will get so cheap that they will start replacing simple physical switches.” It certainly will not tell you when people will start using the technology to print out tiny displays they will stick inside your glasses, or build MEMS accelerometers, nor can it tell you all of the discrete and independent innovations that overcame the challenges that got us here.

But yet, but yet, lines go straight.
Moore’s Law pushed computing forward not because of these concrete individual predictions, but because it told us there was more of the same surprising progress to come, and that the well has yet to run dry.

That too is why I think seeing GPT-3’s perplexity is so important. I agree with you, it’s not that we need the perplexity to tell us what GPT-3 can do. GPT-3 will happily tell us that itself. And I think you will agree with me when I say that what’s most important about these trends is that they’re saying there’s more to come, that the next jump will be just as surprising as the last.

Where we maybe disagree is that I’m willing to say these lines can stand by themselves; that you don’t need to actually see anything more of GPT-3 than its perplexity to know that its capabilities must be so impressive, even if you might need to see it to feel it emotionally. You don’t even need to know anything about neural networks or their output samples to see a straight line of bits-per-character that threatens to go so low in order to forecast that something big is going on. You didn’t need to know anything about CPU microarchitecture to imagine that having ten billion transistors per square centimeter would have massive societal impacts either, as long as you knew what a transistor was and understood its fundamental relations to computation.

• > It just seems very clear to me that the sort of person who is taken in by [Paul Christiano’s slow takeoff] essay is the same sort of person who gets taken in by Hanson’s arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2.

We can very loosely test this hypothesis by asking whether predictors on Metaculus were surprised by these developments, since Metaculus tends to generally agree with Paul Christiano’s model (see here for example). Unfortunately, we can’t make many inferences with the data available, as it’s too sparse.
Still, I’m leaving the following information here in case people find it interesting.

• AlphaGo. There were two questions on Metaculus about Go before AlphaGo beat Lee Sedol. The first was this question about whether an AI would beat a top human Go player in 2016. Before AlphaGo became widely known—following the announcement of its match against Fan Hui—the median prediction was around 30%. After the announcement, the probability shot up to 90%. Unfortunately, this can’t be taken to be much evidence that Metaculus impressively foresaw a breakthrough that year, since Demis Hassabis had already hinted at a breakthrough at the time the question was opened. (Before the matches, Metaculus put the chances of AlphaGo beating Lee Sedol at 64%).
• GPT-3. It’s unclear what relevant metrics would have counted as “predicting GPT-3”. There was a question for the best Penn Treebank perplexity score in 2019 and it turned out Metaculus over-predicted progress (though this was mostly a failure in operationalization, see Daniel Filan’s post-mortem). Metaculus had generally anticipated a great increase in parameter counts for ML models in early 2020, as evidenced by this question. More generally, GPT-3 doesn’t seem like a good example of a discontinuity in machine learning progress in perplexity when looking at the benchmark data. It’s possible GPT-3 is a discontinuity from previous progress in some other, harder to measure sense, but I’m not currently aware of what that might be.
• AlphaFold 2. Metaculus wasn’t generally very surprised by a breakthrough in protein folding prediction. Since early 2019, predictors placed a greater than 80% chance that a breakthrough would happen by 2031 (note, AlphaFold 2 technically doesn’t count as a “breakthrough” by the strict definition in the question criteria). However, it is probably true that Metaculus was surprised that it happened so early.
• So… I totally think there are people who sort of nod along with Paul, using it as an excuse to believe in a rosier world where things are more comprehensible and they can imagine themselves doing useful things without having a plan for solving the actual hard problems. Those types of people exist. I think there’s some important work to be done in confronting them with the hard problem at hand.

But, also… Paul’s world AFAICT isn’t actually rosier. It’s potentially more frightening to me. In Smooth Takeoff world, you can’t carefully plan your pivotal act with an assumption that the strategic landscape will remain roughly the same by the time you’re able to execute on it. Surprising partial-gameboard-changing things could happen that affect what sort of actions are tractable. Also, dumb, boring ML systems run amok could kill everyone before we even get to the part where recursive self improving consequentialists eradicate everyone.

I think there is still something seductive about this world – dumb, boring ML systems run amok feels like the sort of problem that is easier to reason about and maybe solve. (I don’t think it’s actually necessarily easier to solve, but I think it can feel that way, whether it’s easier or not). And if you solve ML-run-amok-problems, you still end up dead from recursive-self-improving-consequentialists if you didn’t have a plan for them.

But, that seductiveness feels like a different problem to me than what’s getting argued about in this dialog. (This post seemed to mostly be arguing on the object level at Paul. I recall a previous Eliezer comment where he complained that Paul kept describing things in language that were easy to round off to “things are easy to deal with” even though Eliezer knew that Paul didn’t believe that. That feels more like what the argument here was actually about, but the way the conversation was conducted didn’t seem to acknowledge that.)
My current take on some object-level points in this post:

• It (probably) matters what the strategic landscape looks like in the years leading up to AGI.
  • It might not matter if you have a plan for pivotal acts that you’re confident are resilient against the sort of random surprises that might happen in Smooth Takeoff World.
• A few hypotheses that are foregrounded by this post include:
  • Smooth Takeoff World, as measured in GDP.
    • GDP mostly doesn’t seem like it matters except as a proxy, so I’m not that hung up on evaluating this. (That said, the “Bureaucracy and Thielian Secrets” model is interesting, and does provoke some interesting thoughts on how the world might be shaped)
  • Smooth Takeoff World, as measured by “AI-breakthroughs-per-year-or-something”.
    • This feels like something that might potentially matter. I agree that AI-breakthroughs-per-year is hard to operationalize, but if AI is able to feed back into AI research that seems strategically relevant. I’m surprised/confused that Eliezer wasn’t more interested in exploring this.
  • Abrupt Fast Takeoff World, which is mostly like this one except suddenly someone has a decisive advantage and/or we’re all dead.
  • Chunky Takeoff World. Mostly listed for completeness. Maybe there won’t be a smooth hyperbolic curve all the way to FOOM, there might be a few discrete advances in between here and there.
• Eliezer’s arguments against Smooth-Takeoff-World generally don’t feel as ironclad to me as the arguments about FOOM. AFAICT he also only specified arguments in this post against Smooth-Takeoff-Measured-By GDP. It seems possible that, e.g., DeepMind could start making AI advances that they use fully internally without running them through external bureaucracy bottlenecks.
It’s possible that any sufficiently large organization develops its own internal bureaucracy bottlenecks, but also totally possible that all the smartest people at DeepMind talk to each other and the real work gets done in a way that cuts through it.
• The “Bureaucracy Bottleneck as crux against Smooth Takeoff GDP World” was quite interesting for general worldmodeling, whether or not it’s strategically relevant. It does suggest it might be quite bad if the AI ecosystem figured out how to bypass its own bureaucracy bottlenecks.

• My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). On median, I’d guess we’ll see at least 2 such examples in 2022 and at least 100 by 2030.

At the point where there’s a “FOOM”, such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not certain—maybe 33%?) that once you have the right phase transition to kick off the rest, everything else happens pretty quickly (within a few days).

Is this take more consistent with Paul’s or Eliezer’s? I’m not totally sure. I’d guess closer to Paul’s, but maybe the “1 day” world is consistent with Eliezer’s?

(One candidate for the “big” phase transition would be if the model figures out how to go off and learn on its own, so that number of SGD updates is no longer the primary bottleneck on model capabilities. But I could also imagine us getting that even when models are still fairly “dumb”.)

• > your view seems to imply that we will move quickly from much worse than humans to much better than humans, but it’s likely that we will move slowly through the human range on many tasks

We might be able to falsify that in a few months.

There is a joint Google / OpenAI project called BIG-bench.
They’ve crowdsourced ~200 highly diverse text tasks (from answering scientific questions to predicting protein interacting sites to measuring self-awareness).

One of the goals of the project is to see how the performance on the tasks is changing with the model size, with the size ranging by many orders of magnitude.

A half-year ago, they presented some preliminary results. A quick summary:

If you increase the N of parameters from 10^7 to 10^10, the aggregate performance score grows roughly like log(N).

But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

The paper with the full results is expected to be published in the next few months.

Judging by the preliminary results, the FOOM could start like this:

“The GPT-5 still sucks on most tasks. It’s mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?”

• Hot damn, where can I see these preliminary results?

• The results were presented at a workshop by the project organizers. The video from the workshop is available here (the most relevant presentation starts at 5:05:00).

It’s one of those innocent presentations that, after you understand the implications, keep you awake at night.

• Presumably you’re referring to this graph. The y-axis looks like the kind of score that ranges between 0 and 1, in which case this looks sort-of like a sigmoid to me, which accelerates when it gets closer to ~50% performance (and decelerates when it gets closer to 100% performance).

If so, we might want to ask whether these tasks are chosen ~randomly (among tasks that are indicative of how useful AI is) or if they’re selected for difficulty in some way. In particular, assume that most tasks look sort-of like a sigmoid as they’re scaled up (accelerating around 50%, improving slower when they’re closer to 0% and 100%).
Then you might think that the most exciting tasks to submit to BIG-bench would be the tasks that can’t be handled by small models, but that large models rapidly improve upon (as opposed to tasks that are basically-solved already by 10^10 parameters). In which case the aggregation of all these tasks could be expected to look sort-of like this, improving faster after 10^10 than before.

...is one story I can tell, but idk if I would have predicted that beforehand, and fast acceleration after 10^10 is certainly consistent with many people’s qualitative impressions of GPT-3. So maybe there is some real acceleration going on.

(Also, see this post for similar curves, but for the benchmarks that OpenAI tested GPT-3 on. There’s no real acceleration visible there, other than for arithmetic.)

• The preliminary results were obtained on a subset of the full benchmark (~90 tasks vs 206 tasks). And there were many changes since then, including scoring changes. Thus, I’m not sure we’ll see the same dynamics in the final results. Most likely yes, but maybe not.

I agree that the task selection process could create the dynamics that look like the acceleration. A good point.

As I understand, the organizers have accepted almost all submitted tasks (the main rejection reasons were technical: copyright etc.). So, it was mostly self-selection, with the bias towards the hardest imaginable text tasks. It seems that for many contributors, the main motivation was something like:

Take that, the most advanced AI of Google! Let’s see if you can handle my epic task!

This includes many cognitive tasks that are supposedly human-complete (e.g. understanding of humor, irony, ethics), and the tasks that are probing the model’s generality (e.g. playing chess, recognizing images, navigating mazes, all in text).

I wonder if the performance dynamics on such tasks will follow the same curve.

The list of all tasks is available here.

• For Sanh et al.
(2021), we were able to negotiate access to preliminary numbers from the BIG Bench project and run the T0 models on it. However, the authors of Sanh et al. and the authors of BIG Bench are different groups of people.

• Nope. Although the linked paper uses the same benchmark (a tiny subset of it), the paper comes from a separate research project.

As I understand, the primary topic of the future paper will be the BIG-bench project itself, and how the models from Google / OpenAI perform on it.

• But after the 10^10 point, something interesting happens: the score starts growing much faster (~N). And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

Seems interestingly similar to the grokking phenomenon.

• But after the 10^10 point, something interesting happens: the score starts growing much faster (~N).

And for some tasks, the plot looks like a hockey stick (a sudden change from ~0 to almost-human).

...

Judging by the preliminary results, the FOOM could start like this: “The GPT-5 still sucks on most tasks. It’s mostly useless. But what if we increase parameters_num by 2? What could possibly go wrong?”

Hypothesis:

  • doing things in the real world requires diverse skills (strong performance on a diverse set of tasks)

  • hockey-sticking performance on a particular task makes that task no longer the constraint on what you can accomplish

  • but now some other task is the bottleneck

  • so, unless you can hockey-stick on all the tasks all at once, your overall ability to do things in the world will get smoothed out a bunch, even if it still grows very rapidly

• Seems like there’s a spectrum between smooth accelerating progress and discontinuous takeoff.
And where we end up on that spectrum depends on a few things:

  • how much simple improvements (better architecture, more compute) help with a wide variety of tasks

  • how much improvement in AI systems is bottlenecked on those tasks

  • how many resources the world is pouring into finding and making those improvements

Recent evidence (success of transformers, scaling laws) seems to suggest that Eliezer was right in the FOOM debate that simple input changes could make a large difference across a wide variety of tasks. It’s less clear to me though whether that means a local system is going to outcompete the rest of the economy, because it seems plausible to me that the rest of the economy is also going to be full-steam ahead searching the same improvement space that a local system will be searching.

And I think in general real world complexity tends to smooth out lumpy graphs. As an example, even once we realized that GPT-2 was powerful and GPT-3 would be even better, there was a whole bunch of engineering work that had to go into figuring out how to run such a big neural network across multiple machines. That kind of real-world messiness seems like it will introduce new bottlenecks at every step along the way, and every order-of-magnitude change in scale, which makes me think that the actual impact of AI will be a lot more smooth than we might otherwise think just based on simple architectures being generally useful and scalable.

• What makes you say BIG Bench is a joint Google / OpenAI project? I’m a contributor to it and have seen no evidence of that.

• During the workshop presentation, Jascha said that OpenAI will run their models on the benchmark. This suggests that there is (was?) some collaboration. But it was half a year ago.

Just checked, the repo’s readme doesn’t mention OpenAI anymore.
In the earlier versions, it was mentioned like this:

Teams at Google and OpenAI have committed to evaluate BIG-Bench on their best-performing model architectures

So, it seems that OpenAI withdrew from the project, partially or fully.

• GPT-4 is expected to have about 10^14 parameters and be ready in a few years. And, we already know that GPT-3 can write code. The following all seem (to me at least) like very reasonable conjectures:

(i) Writing code is one of the tasks at which GPT-4 will have (at least) human level capability.

(ii) Clones of GPT-4 will be produced fairly rapidly after GPT-4, say 1-3 years.

(iii) GPT-4 and its clones will have a significant impact on society. This will show up in the real economy.

(iv) GPT-4 will be enough to shock governments into paying attention. (But as we have seen with climate change governments can pay attention to an issue for a long time without effectively doing anything about it.)

(v) Someone is going to ask for GPT-4 (clone) to produce code that generates AGI. (Implicitly, if not explicitly.)

I have absolutely no idea whether GPT-4 will succeed at this endeavor. But if not, GPT-5 should be available a few years later....

(And, of course, this is just one pathway.)

• GPT-4 is expected to have about 10^14 parameters

There was a Q&A where Sam Altman said GPT-4 is going to be a lot smaller than that (in particular, that it wouldn’t have a lot more parameters than GPT-3).

• [ETA: In light of pushback from Rob: I really don’t want this to become a self-fulfilling prophecy. My hope in making this post was to make the prediction less likely to come true, not more! I’m glad that MIRI & Eliezer are publicly engaging with the rest of the community more again, I want that to continue, and I want to do my part to help everybody to understand each other.]

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope.
There are not obvious things I can do, let alone arguments I can make, which I expect to be actually useful in the sense that the world will not end once I do them. I don’t have the energy left for calm arguments. What’s left is despair that can be given voice.

I grimly predict that the effect of this dialogue on the community will be polarization: People who didn’t like Yudkowsky and/or his views will like him / his views less, and the gap between them and Yud-fans will grow (more than it shrinks due to the effect of increased dialogue). I say this because IMO Yudkowsky comes across as angry and uncharitable in various parts of this dialogue, and also I think it was kinda a slog to get through & it doesn’t seem like much intellectual progress was made here.

FWIW I continue to think that Yudkowsky’s model of how the future will go is basically right, at least more right than Christiano’s. This is a big source of sadness and stress for me too, because (for example) my beloved daughter probably won’t live to adulthood.

The best part IMO was the mini-essay at the end about Thielian secrets and different kinds of tech progress, a progression of scenarios adding up to Yudkowsky’s understanding of Paul’s model:

But we can imagine that doesn’t happen either, because instead of needing to build a whole huge manufacturing plant, there’s just lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time, and everybody knows which direction to move in to get to more serious AGI and they’re right in this shared forecast.

It does seem to me that the AI industry will move more in this direction than it currently is, over the next decade or so. However I still do expect that we won’t get all the way there. I would love to hear from Paul whether he endorses the view Yudkowsky attributes to him in this final essay.

• I grimly predict that the effect of this dialogue on the community will be polarization

Beware of self-fulfilling prophecies (and other premature meta)! If both sides in a dispute expect the other side to just entrench, then they’re less likely to invest the effort to try to bridge the gap.

This very comment section is one of the main things that will determine the community’s reaction, and diverting our focus to ‘what will our reaction be?’ before we’ve talked about the object-level claims can prematurely lock in a certain reaction.

(That said, I think you’re doing a useful anti-polarization thing here, by showing empathy for people you disagree with, and showing willingness to criticize people you agree with. I don’t at all dislike this comment overall; I just want to caution against giving up on something before we’ve really tried. This is the first proper MIRI-response to Paul’s takeoff post, and should be a pretty big update for a lot of people—I don’t think people were even universally aware that Eliezer endorses hard takeoff anymore, much less aware of his reasoning.)

• Fair enough! I too dislike premature meta, and feel bad that I engaged in it. However… I do still feel like my comment probably did more to prevent polarization than cause it? That’s my independent impression at any rate. (For the reasons you mention).

I certainly don’t want to give up! In light of your pushback I’ll edit to add something at the top.

• While this may not be the ideal format for it, I thought Eliezer’s voicing of despair was a useful update to publish to the LW community about the current state of his AI beliefs.

• I grimly predict that the effect of this dialogue on the community will be polarization: People who didn’t like Yudkowsky and/​or his views will like him /​ his views less, and the gap between them and Yud-fans will grow (more than it shrinks due to the effect of increased dialogue). I say this because IMO Yudkowsky comes across as angry and uncharitable in various parts of this dialogue, and also I think it was kinda a slog to get through & it doesn’t seem like much intellectual progress was made here.

Strongly agree with that.

Since you agree with Yudkowsky, do you think you could strongman his position?

• Yes, though I’m much more comfortable explaining and arguing for my own position than EY’s. It’s just that my position turns out to be pretty similar. (Partly this is independent convergence, but of course partly this is causal influence since I’ve read a lot of his stuff.)

There’s a lot to talk about, I’m not sure where to begin, and also a proper response would be a whole research project in itself. Fortunately I’ve already written a bunch of it; see these two sequences.

Here are some quick high-level thoughts:

1. Begin with timelines. The best way to forecast timelines IMO is Ajeya’s model; it should be the starting point and everything else should be adjustments from it. The core part of Ajeya’s model is a probability distribution over how many OOMs of compute we’d need with today’s ideas to get to TAI /​ AGI /​ APS-AI /​ AI-PONR /​ etc. [Unfamiliar with these acronyms? See Robbo’s helpful comment below] For reasons which I’ve explained in my sequence (and summarized in a gdoc) my distribution has significantly more mass on the 0-6 OOM range than Paul does, and less on the 13+ range. The single post that conveys this intuition most is Fun with +12 OOMs.

Now consider how takeoff speed views interact with timelines views. Paul-slow takeoff and <10 year timelines are in tension with each other. If <7 OOMs of compute would be enough to get something crazy powerful with today’s ideas, then the AI industry is not an efficient market right now. If we get human-level AGI in 2030, then on Paul’s view that means the world economy should be doubling in 2029, should have doubled over the course of 2025-2028, and should probably already be accelerating now. It doesn’t look like that’s happening or about to happen. I think Paul agrees with this; in various conversations he’s said things like “If AGI happens in 10 years or less then probably we get fast takeoff.” [Paul please correct me if I’m mischaracterizing your view!]

Ajeya (and Paul) mostly update against <10 year timelines for this reason. I, by contrast, mostly update against slow takeoff. (Obviously we both do a bit of both, like good Bayesians.)

2. I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling. This includes my own writing on the subject—lots of historical analogies and whatnot. I’ve tentatively attempted some things sorta like gears-level modelling (arguably What 2026 Looks Like is an example of this) and so far it seems to be pushing my intuitions more towards “Yep, fast takeoff is more likely.” But I feel like my thinking on this is super inadequate and I think we all should be doing better. Shame! Shame on all of us!

3. I think the focus on GDP (especially GWP) is really off, for reasons mentioned here. I think AI-PONR will probably come before GWP accelerates, and at any rate what we care about for timelines and takeoff speeds is AI-PONR and so our arguments should be about e.g. whether there will be warning shots and powerful AI tools of the sort that are relevant to solving alignment for APS-AI systems.
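
To make the doubling claims in point 1 concrete, here is a minimal sketch of the growth rates they imply. It assumes smooth exponential growth within each interval; the 3% baseline is an assumed round number for illustration, not a figure from this thread, and the helper names are hypothetical.

```python
import math


def annual_growth_for_doubling(years: float) -> float:
    """Constant annual growth rate that doubles output in `years` years."""
    return 2 ** (1 / years) - 1


def doubling_time(annual_growth: float) -> float:
    """Years needed to double output at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Doubling over a four-year interval (e.g. 2025-2028) versus
# doubling within a single year (e.g. 2029).
four_year = annual_growth_for_doubling(4)
one_year = annual_growth_for_doubling(1)

# Illustrative baseline only: 3% per year is an assumption, not sourced data.
baseline = 0.03

print(f"4-year doubling needs {four_year:.1%} growth per year")
print(f"1-year doubling needs {one_year:.1%} growth per year")
print(f"At {baseline:.0%} per year, doubling takes {doubling_time(baseline):.1f} years")
```

Under these assumptions a four-year doubling already needs roughly 19% growth per year, which is why "should already be accelerating now" would be visible well before a one-year doubling.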

(Got to go now)

• The core part of Ajeya’s model is a probability distribution over how many OOMs of compute we’d need with today’s ideas to get to TAI /​ AGI /​ APS-AI /​ AI-PONR /​ etc.

I didn’t know the last two acronyms despite reading a decent amount of this literature, so thought I’d leave this note for other readers. Listing all of them for completeness (readers will of course know the first two):

TAI: transformative AI

AGI: artificial general intelligence

APS-AI: Advanced, Planning, Strategically aware AI [1]

AI-PONR: AI point of no return [2]

[1] from Carlsmith, which Daniel does link to

[2] from Daniel, which he also linked

• Sorry! I’ll go back and insert links + reference your comment

• I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling.

So, there’s this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then people’s brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won’t immediately see how they’re relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alignment or timelines or whatever, using an argument which relies on those upstream gears, and it goes right over a bunch of people’s heads because they don’t have that upstream gear in their world-models.

For the sort of argument in this post, it’s even worse, because a lot of people aren’t even explicitly aware that the relevant type of gear is a thing, or how to think about it beyond a rough intuitive level.

I first ran into this problem in the context of takeoff arguments a couple years ago, and wrote up this sequence mainly to convey the relevant kinds of gears and how to think about them. I claim that this (i.e. constraint slackness/​tautness) is usually a good model for gear-type in arguments about reference-classes in practice: typically an intuitively-natural reference class is a set of cases which share some common constraint, and the examples in the reference class then provide evidence for the tautness/​slackness of the constraint. For instance, in this post, Paul often points to market efficiency as a taut constraint, and Eliezer argues that constraint is not very taut (at least not in the way needed for the slow takeoff argument). Paul’s intuitive estimates of tautness are presumably driven by things like e.g. financial markets. On the other side, Eliezer wrote Inadequate Equilibria to talk about how taut market efficiency is in general, including gears “further up” and more examples.

If you click through the link in the post to Intelligence Explosion Microeconomics, there’s a lot of this sort of reasoning in it.

• So, there’s this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then people’s brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream gears first, then people won’t immediately see how they’re relevant to alignment or timelines or whatever, and then lots of people just wander off. Then you go try to explain something about alignment or timelines or whatever, using an argument which relies on those upstream gears, and it goes right over a bunch of people’s heads because they don’t have that upstream gear in their world-models.

The solution might be to start with a concise, low-detail summary (not even one that argues the case, just states it), then start explaining in full detail from the start, knowing that your readers now know which way you’re going.

Wait, I think I just invented the Abstract (not meant as a snide remark. I really did realize it after writing the above, and found it funny).

• Survey on model updates from reading this post. Figuring out to what extent this post has led people to update may inform whether future discussions are valuable.

Results: (just posting them here, doesn’t really need its own post)

The question was to rate agreement on the 1=Paul to 9=Eliezer axis before and after reading this post.

Data points: 35

Mean: [chart not preserved]

Median: [chart not preserved]

Raw Data

Agreement more on need for actions than on probabilities. Would be better to first present points of agreement (that it is at least possible for non(dangerously)-general AI to change situation).

the post was incredibly confusing to me and so I haven’t really updated at all because I don’t feel like I can crisply articulate yudkowsky’s model or his differences with christiano

• Wow, I did not expect those results!

• I wonder what effect there is from selecting for reading the third post in a sequence of MIRI conversations from start to end and also looking at the comments and clicking links in them.

• Were you surprised by the direction of the change or the amount?

• My prediction was mainly about polarization rather than direction, but I would have expected the median or average to not move much probably, and to be slightly more likely to move towards Paul than towards Yudkowsky. I think. I don’t think I was very surprised.

• Why would it move toward Paul? He made almost no arguments, and Eliezer made lots. When Paul entered the chat it was focused on describing what each of them believe in order to find a bet, not communicating why they believe it.

• I think I was expecting somewhat better from EY; I was expecting more solid, well-explained arguments/​rebuttals to Paul’s points from “Takeoff Speeds.” Also EY seemed to be angry and uncharitable, as opposed to calm and rational. I was imagining an audience that mostly already agrees with Paul encountering this and being like “Yeah this confirms what we already thought.”

• FWIW “yeah this confirms what we already thought” makes no sense to me. I heard someone say this the other day, and I was a bit floored. Who knew that Eliezer would respond with a long list of examples that didn’t look like continuous progress at the time, and said this more than 3 days ago?

I feel like I got a much better sense of Eliezer’s perspective reading this. One key element is whether AI progress is surprising, which it often is: even if you can make trend-line arguments after the fact, people basically don’t, and when they do they often get it wrong. (Here’s an example of Dario Amodei + Danny Hernandez finding a trend in AI, that apparently immediately stopped trending as soon as they reported it.) There’s also lots of details about what the chimps-to-humans transition shows, and various other points (like regulation preventing most AI progress from showing up in GDP).

I do think I could’ve gotten a lot of this understanding earlier by more carefully reading IEM, and now that I’m rereading it I get it much better. But nobody seems to have engaged with the arguments in it and tried to connect them to Paul’s post that I can see. Perhaps someone did, and I’d be pretty interested to read that now with the benefit of hindsight.

To be clear, I think that if EY put more effort into it (and perhaps had some help from other people as RAs) he could write a book or sequence rebutting Paul & Katja much more thoroughly and convincingly than this post did. [ETA: I.e. I’m much more on Team Yud than Team Paul here.] The stuff said here felt like a rehashing of stuff from IEM and the Hanson-Yudkowsky AI foom debate to me. [ETA: Lots of these points were good! Just not surprising to me, and not presented as succinctly and compellingly (to an audience of me) as they could have been.]

Also, it’s plausible that a lot of what’s happening here is that I’m mistaking my own cruxes and confusions for The Big Points EY Objectively Should Have Covered To Be More Convincing. :)

ETA: And the fact that people updated towards EY on average, and significantly so, definitely updates me more towards this hypothesis!

• This is my take: if I had been very epistemically self-aware, and carefully distinguished my own impression/models and my all-things-considered beliefs, before I started reading, then this would’ve updated my models towards Eliezer (because hey, I heard new not-entirely-uncompelling arguments) but my all-things-considered beliefs away from Eliezer (because I would have expected it to be even more convincing).

I’m not that surprised by the survey results. Most people don’t obey conservation of expected evidence, because they don’t take into account arguments they haven’t heard / don’t think carefully enough about how deferring to others works. People will predictably update toward a thesis after reading a book that argues for it, not have a 50/50 chance of updating positively or negatively on it.
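
The conservation-of-expected-evidence point can be checked numerically. Below is a minimal sketch with made-up illustrative probabilities (none of these numbers come from the survey, and the function name is hypothetical): a reader holds a prior on a thesis, expects the book to feel convincing with some probability under each hypothesis, and updates by Bayes' rule.

```python
def posteriors(prior: float, p_conv_if_true: float, p_conv_if_false: float):
    """Return (P(book feels convincing), posterior if convincing, posterior if not)."""
    p_conv = prior * p_conv_if_true + (1 - prior) * p_conv_if_false
    post_if_conv = prior * p_conv_if_true / p_conv
    post_if_not = prior * (1 - p_conv_if_true) / (1 - p_conv)
    return p_conv, post_if_conv, post_if_not


# Illustrative assumptions: a 50% prior, and a book that feels convincing
# 90% of the time if the thesis is true and 70% of the time if it is false.
prior = 0.5
p_conv, up, down = posteriors(prior, 0.9, 0.7)

# Probability-weighted average of the two possible posteriors.
expected_posterior = p_conv * up + (1 - p_conv) * down

print(f"P(convincing) = {p_conv:.2f}")
print(f"posterior if convincing = {up:.4f}, if not = {down:.4f}")
print(f"expected posterior = {expected_posterior:.4f} (prior = {prior})")
```

For a reasoner who anticipates the book's persuasiveness, the likely outcome (convincing) moves the estimate up only a little, the unlikely outcome moves it down a lot, and the weighted average stays at the prior. A reader who reliably moves toward the thesis by a large amount is not accounting for how predictable that experience was.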

• I didn’t move significantly towards either party but it seemed like Eliezer was avoiding bets, and generally, in my humble opinion, making his theory unfalsifiable rather than showing what its true weak points are. That doesn’t seem like what a confidently correct person would do (but it was already mostly what I expected, so I didn’t update by much on his theory’s truth value).

ETA: After re-reading my comment, I feel I may have come off too strong. I’ll completely unendorse my language and comment if people think this sort of thing is not conducive to productive discourse. Also, I greatly appreciate both parties for doing this.

• I find it valuable to know what impressions other people had themselves; it only becomes tone-policing when you worry loudly about what impressions other people ‘might’ have. (If one is worried about how it looks to say so publicly, one could always just DM me (though I might not respond).)

• FWIW I don’t like the phrasing of my comment very much either. I came back thinking to remove it but saw you’d already replied :P

• (Not being too specific to avoid spoilers) Quick note: I think the direction of the shift in your conclusion might be backwards, given the statistics you’ve posted and that 1=Eliezer and 9=Paul.

• How interesting; I am the median.

• Unfortunately, it looks like Yudkowsky and Christiano weren’t able to come to an agreement on what bets to make.

In place of that, I’ll ask, whatever camp you belong to: what concrete predictions do you make that you believe most strongly diverge from what people in the “other” camp believe, and can be resolved substantially before the world ends?

I propose we restrict our predictions to roughly 2026, which is pretty soon but probably not world-ending-soon (on almost all views).

• I would say I agree more with Christiano.

By 2026:

• At least 50% of programming work that would have been done by a human programmer in 2019 will be done by systems like Codex or Copilot.

• Humanoid robotic maids, butlers and companions will be for sale in some form, although they will be limited and underwhelming, and few people will have them in their homes.

• Self-driving will finally be practical and applied widely. In the USA, between 10% and 70% of automobile trips will be autonomous or in self-driving mode. Humans will not be banned from driving anywhere in the world; that’s more of a 2030s+ thing.

• AI will beat human grandmasters at nearly every video game or formal game. There might be 1-5 games which AI still struggles with, and they will be notable exceptions. Or there might be 0 such games. RL systems can learn most games from pixels in less than a GPU-day (using 2026-era GPUs, consuming less than 1000 watts and costing less than $4,000 in inflation-adjusted 2019 USD). RL research will be focused on beating humans in sports and physical games like soccer, basketball, golf, etc.

• Chatbots will regularly pass Turing tests, although it will remain controversial whether that means anything. Publicly available chatbots will be about as good as GPT-3 in grammar and competence, but unlike GPT-3 they will have consistent personalities and memory over time, i.e., the limitations of the 2048 token window will be overcome somehow. Good chatbots will be available to the public, and will be ubiquitous in customer service, but whether they are popular as companions or personal assistants will depend on public acceptance. This is the same problem faced by AR: the tech will definitely be there, but the public might not be interested and might be somewhat hostile.

• I personally am not sure if GWP growth will be significantly above historical baselines. I think AI will have progressed significantly, but we also know that, even going back to the 90s, information technology has made an underwhelming impact on productivity. The world economy is such a weird mess right now for reasons that have nothing to do with AI, so it’s hard to make predictions.

• There won’t be significant unemployment due to technology (yet), but some careers will be significantly altered, including drivers and programmers.

I consider these predictions to be pretty conservative. I would not be surprised to be surprised by AI progress, but I would be very disappointed if we didn’t meet 5/7 of my predictions.

• I think I’m happy to bet against predictions (1), (2), (3), and (5).
Predictions (6) and (7) don’t seem like they’re committing to anything specific so I don’t know whether I disagree.

My worry is that when we get more specific about what each of these things means, you might end up backing off and using a much more modest operationalization than I’m hoping for. For example, when you say,

Chatbots will regularly pass Turing tests

I don’t think that a chatbot will pass a strong (adversarial) Turing test by 2026, of the type specified in Kapor and Kurzweil’s 2029 bet. However, I expect there will be weaker, less impressive Turing tests that chatbots will pass by then. Also it’s unclear what “regularly pass” means (did bots “regularly” beat top Go players in 2016, or was that just a few games?).

• 6 and 7 are definitely non-predictions, or a prediction that nothing interesting will happen. 1, 2, 4 and 5 are softly almost true today:

(1) AI Programming: I heard a rumor (don’t have a source on this) that something like 30% of new GitHub commits involve Copilot. I can’t imagine that is really true, seems so implausible, but my prediction can come true if AI code completion becomes very popular.

(2) Household Robots: Every year for the last decade or so some company has demoed some kind of home robot at an electronics convention, but if any of them have actually made it to market, the penetration is very small. Eventually someone will make one that’s useful enough to sell a few hundred or more units. I don’t think a Roomba should qualify as meeting my prediction, which is why I specified a “humanoid” robot.

(3) Self Driving: I stand by what I said, nothing to expand on. I believe that Tesla and Waymo, at least, already have self driving tech good enough, so this is mostly about institutional acceptance.

(4) DRL learning games from pixels: EfficientZero essentially already does this, but restricted to the 57 Atari games. My prediction is that there will be an EfficientZero for all video games.
(5) Turing Test—I think that the Turing test is largely a matter of how long the computer can fool the judge for, in addition to the judge knowing what to look for. Systems from the 70s could probably fool a judge for about 30 seconds. Modern chatbots might be able to fool a competent judge for 10 minutes, and an incompetent judge (naive casual user) for a couple hours at the extreme. I think by 2026 chatbots will be able to fool competent judges for at least 30 minutes, and will be entertaining to naive casual users indefinitely (i.e., people can make friends with their chatbots and it doesn’t get boring quickly if ever.) For 6 and 7, I’m going to make concrete predictions. (6) Some research institute or financial publication of repute will claim that AI technology (not computers generally, just AI) will have “added X Trillion Dollars” to the US or world economy, where X is at least 0.5% of US GDP or GWP, respectively. Whether this is actually true might be controversial, but someone will have made the claim. GWP will not be significantly above trendline. (7) At least two job titles will have less than 50% the number of workers as 2019. The most likely jobs to have been affected are drivers, cashiers, fast food workers, laundry workers, call center workers, factory workers, workers in petroleum-related industries*, QA engineers, and computer programmers. These jobs might shift within the industry such that the number of people working in industry X is similar, but there has to be a job title that shrunk by 50%. For example, the same X million people still work in customer service, but most of them are doing something like prompt engineering on AI chatbots, as opposed to taking phone calls directly. * This one has nothing to do with AI, but I expect it to happen by 2026 nonetheless. Let me know if you want to formalize a bet on some website. • For (2) I am less interested in betting than I was previously. 
Before, I assumed you meant that there would be actual, competent humanoid robotic maids and butlers for sale in 2026. But now I’m imagining that you meant just any ordinary humanoid robot on the market, even if it doesn’t do what a real human maid or butler does. Like, I think technically in 1990 companies could have already been selling “humanoid robotic maids”, but they would’ve been functionally useless. Without some sort of constraint on what actually counts as a robotic maid, I think some random flashy-yet-useless robot that changed hands and made some company $300,000 in revenue might count for the purposes of this bet. And I would prefer not to take a bet with that as a potential outcome.

• Some research institute or financial publication of repute will claim that AI technology (not computers generally, just AI) will have “added X Trillion Dollars” to the US or world economy, where X is at least 0.5% of US GDP or GWP, respectively. Whether this is actually true might be controversial, but someone will have made the claim.

This seems like an extremely weak prediction. Institutions, even fairly reputable ones, make fantastic claims like that all the time.

For example, I found one article written in 2019 that says, “By one estimate, AI contributed a whopping $2 trillion to global GDP last year.” It cites the PricewaterhouseCoopers, which according to Wikipedia is “the second-largest professional services network in the world and is considered one of the Big Four accounting firms, along with Deloitte, EY and KPMG.” Since GWP was about 86.1 trillion USD in 2018, according to the World Bank, this means that PwC thinks that artificial intelligence is already contributing more than 2% of our gross world product, four times more than you expected would be claimed by 2026! • Modern chatbots might be able to fool a competent judge for 10 minutes I am highly skeptical. Which chatbots are you imagining here? • (3) Self Driving—I stand by what I said, nothing to expand on. I believe that Tesla and Waymo, at least, already have self driving tech good enough, so this is mostly about institutional acceptance. The problem with some of your predictions is that I don’t know how to operationalize them. For example, does L4 self-driving count? What about L3? What source can be used to resolve this question? I’m not currently aware of any source that counts the number of trips done in automobiles in the US, and tabulates them by car type (or self-driving status). So, to bet, we’d either need to get a source, or come up with a different way of operationalizing the question. (As an aside, I have found that a very high fraction of predictions—even among people who care a lot about betting—tend to be extremely underspecified. I think it’s a non-trivial skill to know how to operationalize bets, and most people just aren’t very good at it without lots of practice. That’s not a criticism of you :). However, I do prefer that you state your predictions very precisely because otherwise we’re just not going to be able to do the bet.) • I think you’re 100% right. 
Most (>>80%) of the bets I see on Long Bets, or predictions on Metaculus, are underspecified to the point where a human mediator would have to make a judgement call that can be considered unfair to someone. I don’t expect that to change no matter how much work I do, unless I make bets on specific statistics from well known sources, e.g. the stock market, or the CIA World Factbook.

There are possible futures where prediction (3) is obvious. For example, if someone predicted that 50% of trips will be self driving in 2021 (many people did predict that 5 years ago) we can easily prove them wrong without having to debate whether Tesla is L2 or L5 and whether that matters. Teslas are not 50% of the cars on the road, nor are Waymos, so you can easily see that most trips in 2021 are not self driving by any definition. I think there are also future worlds where 95% of cars and trips are L5, most cars can legally autonomously drive anywhere without any humans inside, etc., and in that world there isn’t much to debate about unless you’re really petty.

So we could make bets hoping that things will be that obvious, but I don’t think either of us wants to do the work to avoid this kind of ambiguity. I’m happy to consider my bets as paid in Bayes points without any need for future adjudication. So, for all the Bayes points, I’d love to hear what your equivalent predictions are for 2026.

For what it’s worth, here’s my revised (3): Greater than 10% of cars on the road will be legally capable of either L4/L5 OR legally L2/L3 but disengagements will be uncommon, less than once in a typical trip. (Meaning, if you watch a video from the AI DRIVR YouTube channel, there’s less than one disengagement per 20 minutes of driving time.)

• I think you’re 100% right. Most (>>80%) of the bets I see on Long Bets, or predictions on Metaculus, are underspecified to the point where a human mediator would have to make a judgement call that can be considered unfair to someone.
To be clear, I have spent a ton of time on Metaculus and I find this impression incorrect. I have spent comparatively little time on Long Bets but I think it’s also wrong there for the most part. I think you may have accidentally called out parties who are, in my opinion, exemplars of what solid prediction platforms should look like. There are far, far worse parties that you could have called out. • Summary of my response: before you can train a really powerful AI, someone else can train a slightly worse AI. Yeah, and before you can evolve a human, you can evolve a Homo erectus, which is a slightly worse human. I might be wrong about this, but my impression was that the rise of human culture and civilization was timed with the end of the Pleistocene, rather than timed with the development of better (and more general) brains. My guess is that modern humans probably do have more general brains than Homo erectus that came before us. But if Homo erectus had not been living in a geological epoch of repeated glaciations, then perhaps we would have seen a simpler Homo erectus civilization? In general, I don’t yet see a strong reason to think that our general brain architecture is the sole, or potentially even primary reason why we’ve developed civilization, discontinuous with the rest of the animal kingdom. A strong requirement for civilization is the development of cultural accumulation via language, and more specifically, the ability to accumulate knowledge and technology over generations. Just having a generalist brain doesn’t seem like enough; for example, could there have been a dolphin civilization? • If I take the number of years since the emergence of Homo erectus (2 million years) and divide that by the number of years since the origin of life (3.77 billion years), and multiply that by the number of years since the founding of the field of artificial intelligence (65 years), I get a little under twelve days. 
This seems to at least not directly contradict my model of Eliezer saying “Yes, there will be an AGI capable of establishing an erectus-level civilization twelve days before there is an AGI capable of establishing a human-level one, or possibly an hour before, if reality is again more extreme along the Eliezer-Hanson axis than Eliezer. But it makes little difference whether it’s an hour or twelve days, given anything like current setups.” Also insert boilerplate “essentially constant human brain architectures, no recursive self-improvement, evolutionary difficulty curves bound above human difficulty curves, etc.” for more despair. I guess even though I don’t disagree that knowledge accumulation has been a bottleneck for humans dominating all other species, I don’t see any strong reason to think that knowledge accumulation will be a bottleneck for an AGI dominating humans, since the limits to human knowledge accumulation seem mostly biological. Humans seem to get less plastic with age, mortality among other things forces us to specialize our labor, we have to sleep, we lack serial depth, we don’t even approach the physical limits on speed, we can’t run multiple instances of our own source, we have no previous example of an industrial civilization to observe, I could go on: a list of biological fetters that either wouldn’t apply to an AGI or that an AGI could emulate inside of a single mind instead of across a civilization. I am deeply impressed by what has come out of the bare minimum of human innovative ability plus cultural accumulation. 
You say “The engine is slow,” I say “The engine hasn’t stalled, and look how easy it is to speed up!”

I’m not sure I like using the word ‘discontinuous’ to describe any real person’s position on plausible investment-output curves any longer; people seem to think it means “intermediate value theorem doesn’t apply,” (which seems reasonable) when usually hard/fast takeoff proponents really mean “intermediate value theorem still applies but the curve can be almost arbitrarily steep on certain subintervals.”

• That was a pretty good Eliezer model; for a second I was trying to remember if and where I’d said that.

• I guess even though I don’t disagree that knowledge accumulation has been a bottleneck for humans dominating all other species, I don’t see any strong reason to think that knowledge accumulation will be a bottleneck for an AGI dominating humans, since the limits to human knowledge accumulation seem mostly biological. Humans seem to get less plastic with age, mortality among other things forces us to specialize our labor, we have to sleep, we lack serial depth, we don’t even approach the physical limits on speed, we can’t run multiple instances of our own source, we have no previous example of an industrial civilization to observe, I could go on: a list of biological fetters that either wouldn’t apply to an AGI or that an AGI could emulate inside of a single mind instead of across a civilization.

I agree with this, and I think that you are hitting on a key reason that these debates don’t hinge on what the true story of the human intelligence explosion ends up being. Whichever of these is closer to the truth:

a) the evolution of individually smarter humans using general reasoning ability was the key factor

b) the evolution of better social learners and the accumulation of cultural knowledge was the key factor

...either way, there’s no reason to think that AGI has to follow the same kind of path that humans did.
I found an earlier post on the Henrich model of the evolution of intelligence, Musings on Cumulative Cultural Evolution and AI. I agree with Rohin Shah’s takeaway on that post : I actually don’t think that this suggests that AI development will need both social and asocial learning: it seems to me that in this model, the need for social learning arises because of the constraints on brain size and the limited lifetimes. Neither of these constraints apply to AI—costs grow linearly with “brain size” (model capacity, maybe also training time) as opposed to superlinearly for human brains, and the AI need not age and die. So, with AI I expect that it would be better to optimize just for asocial learning, since you don’t need to mimic the transmission across lifetimes that was needed for humans. • I’m not sure I like using the word ‘discontinuous’ to describe any real person’s position on plausible investment-output curves any longer; people seem to think it means “intermediate value theorem doesn’t apply,” (which seems reasonable) when usually hard/​fast takeoff proponents really mean “intermediate value theorem still applies but the curve can be almost arbitrarily steep on certain subintervals.” FWIW when I use the word discontinuous in these contexts, I’m almost always referring to the definition Katja Grace uses, We say a technological discontinuity has occurred when a particular technological advance pushes some progress metric substantially above what would be expected based on extrapolating past progress. We measure the size of a discontinuity in terms of how many years of past progress would have been needed to produce the same improvement. We use judgment to decide how to extrapolate past progress. This is quite different than the mathematical definition of continuous. 
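The quoted definition measures a discontinuity in "years of past progress". As a rough illustration only, the sketch below fits a straight line to past data and reports how many years of that trend the jump is worth. The linear fit is an assumption made here for simplicity (the quoted definition explicitly leaves the extrapolation method to judgment), and the function name `discontinuity_years` and the sample numbers are hypothetical.

```python
def discontinuity_years(past_years, past_values, new_year, new_value):
    """Size of a jump, in years of prior progress, under a linear trend.

    Assumption: past progress is extrapolated with an ordinary
    least-squares line. Other extrapolations (e.g. exponential)
    would give different answers.
    """
    n = len(past_years)
    mean_x = sum(past_years) / n
    mean_y = sum(past_values) / n
    sxx = sum((x - mean_x) ** 2 for x in past_years)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(past_years, past_values))
    slope = sxy / sxx  # progress per year on the fitted trend
    if slope <= 0:
        raise ValueError("no positive trend to extrapolate from")
    expected = mean_y + slope * (new_year - mean_x)  # trend value at new_year
    return (new_value - expected) / slope


# Hypothetical metric that improved by 2 units per year, then jumped.
years = [2000, 2001, 2002, 2003]
values = [10, 12, 14, 16]
jump = discontinuity_years(years, values, new_year=2004, new_value=38)
print(f"jump is worth about {jump:.1f} years of prior progress")
```

A result near zero means the new data point sits on the old trend; a large value means the advance arrived many years "early" relative to the extrapolation, which is the sense of "discontinuous" used in the quoted definition.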
• In general, I don’t yet see a strong reason to think that our general brain architecture is the sole, or potentially even primary reason why we’ve developed civilization, discontinuous with the rest of the animal kingdom. A strong requirement for civilization is the development of cultural accumulation via language, and more specifically, the ability to accumulate knowledge and technology over generations.

In The Secrets of Our Success, Joe Henrich argues that without our stock of cultural knowledge, individual humans are not particularly more generally intelligent than apes. (Neanderthals may very well have been more generally intelligent than humans—and indeed, their brains are bigger than ours.) And, he claims, to the extent that individual humans are now especially intelligent, this was because of culture-driven natural selection.

For Henrich, the story of human uniqueness is a story of a feedback loop: increased cultural know-how, which drives genetic selection for bigger brains and better social learning, which leads to increased cultural know-how, which drives genetic selection for bigger brains... and so forth, until you have a very weird great ape that is weak, hairless, and has put a flag on the moon.

Note: this evolution + culture feedback loop is still a huge discontinuity that led to massive changes in relatively short evolutionary time!

Just having a generalist brain doesn’t seem like enough; for example, could there have been a dolphin civilization?

Henrich speculates that a bunch of idiosyncratic features came together to launch us into the feedback loop that led to us being a cultural species. Most species, including dolphins, do not get onto this feedback loop because of a “startup” problem: bigger brains will give a fitness advantage only up to a certain point, because individual learning can only be so useful.
For there to be further selection for bigger brains, you need a stock of cultural know-how (cooking, hunting, special tools) that makes individual learning very important for fitness. But, to have a stock of cultural know-how, you need big brains.

Henrich speculates that humans overcame the startup problem due to a variety of factors that came together when we descended from the trees and started living on the ground. The important consequences of a species being on the ground (as opposed to in the trees):

1. It frees up your hands for tool use. Captive chimps, which are more “grounded” than wild chimps, make more tools.

2. It’s easier for you to find tools left by other people.

3. It’s easier for you to see what other people are doing and hang out with them. (“Hang out” being inapt, since that’s precisely not what you’re doing).

4. You need to group up with people to survive, since there are terrifying predators on the ground. Larger groups offer protection; these larger groups will accelerate the process of people messing around with tools and imitating each other.

Larger groups also produce new forms of social organization. Apparently, in smaller groups of chimps, the reproductive strategy that every male tries to follow is “fight as many males as you can for mating opportunities.” But in a larger group, it becomes better for some males to try to pair bond – to get multiple reproductive opportunities with one female, by hanging around her and taking care of her.

Pair bonding in turn allows for more kinship relationships. Kinship relationships mean you grow up around more people; this accelerates learning. Kinship also allows for more genetic selection for big-brained, slow-developing learners: it becomes less prohibitively costly to give birth to big-brained, slow-growing children, because more people are around to help out and pool food resources.

This story is, by Henrich’s own account, quite speculative. You can find it in Chapter 16 of the book.
• In The Secrets of Our Success, Joe Henrich argues that without our stock of cultural knowledge, individual humans are not particularly more generally intelligent than apes.

I 75% agree with this, but I do think that individual humans are smarter than individual chimpanzees. A big area of disagreement is distinguishing between “intrinsic ability to innovate” vs. “ability to process culture”, and whether it’s even possible to distinguish the two. I wrote a post about this two years ago.

For Henrich, the story of human uniqueness is a story of a feedback loop: increased cultural know-how, which drives genetic selection for bigger brains and better social learning, which leads to increased cultural know-how, which drives genetic selection for bigger brains... and so forth, until you have a very weird great ape that is weak, hairless, and has put a flag on the moon.

This is the big crux for me on the evolution of humans and its relevance to the foom debate. Roughly, I think Henrich’s model is correct. I think his model provides a simple, coherent explanation for why humans dominate the world, and why it happened on such a short timescale, discontinuously with other animals.

Of course, intelligence plays a large role in his model: you can’t get ants who can go to the moon, no matter how powerful their culture. But the great insight is that our power does not come from our raw intelligence: it comes from our technology/culture, which is so powerful because it was allowed to accumulate.

Cultural accumulation is a zero-to-one discontinuity. That is, you can go a long time without any of it, and then something comes along that’s able to do it just a little bit and then shortly after, it blows up. But after you’ve already reached one, going from “being able to accumulate culture at all” to “being able to accumulate it slightly faster” does not give you the same discontinuous foom as before.
We could, for example, imagine an AI that can accumulate culture slightly faster than humans. Since this AI is only slightly better than humans, however, it doesn’t go and create its own culture on its own. Unlike the humans—who actually did go and create their own culture completely on their own, separate from other animals—the AI will simply be one input to the human economy.

This AI would be an important input to our economy for sure, but not a completely separate entity producing its own distinct civilization, like the prototypical AI that spins up nanobot factories and kills us all within 3 minutes. It will be more like the brilliant professor, or easily-copyable-worker. In other words, it might speed up our general civilizational abilities to develop technology, and greatly enhance our productive capabilities. But it won’t, on its own, discontinuously produce technology 2.0 (where 1.0 was humans and animals roughly are technology 0.0).

• I think a superintelligent AI can FOOM its way to manufacturing nanobots because the biggest bottleneck to engineering and manufacturing those is research that can be done without needing input from the physical universe beyond the physics we already know, and the machines we already have, with very slight upgrades or creative usages beyond what they were designed for. Manufacturing nanobots is like a logic brain teaser for a sufficiently intelligent reasoner.

I guess you have a different perspective in that you think the process requires a culture of socializing beings, and/or more input from the physical universe?

• Summary of my response: at the point where humans are completely removed from a process, they will have been modestly improving output rather than acting as a sharp bottleneck that is suddenly removed.

Not very relevant to my whole worldview in the first place; also not a very good description of how horses got removed from automobiles, or how humans got removed from playing Go.
I’m not sure about horses, but Go doesn’t seem like a central example of human labor being automated. I definitely feel that the following examples have been more continuous (in the sense of human labor becoming gradually obsolete, rather than all-at-once):

• Agriculture
• Manufacturing
• Travel agents

My guess is that it’s also been true for people doing manual calculations, language translation, and speech-to-text.

• Here’s a source on horse population in the US

But also, your first graph covers a time period of 200 years whereas the third graph only covers 13; that’s not even the same order of magnitude. If you zoom in enough, any curve looks smooth, even an AI that FOOMs in mere hours.

Also, the original quote is stating something about sharp increases in output once the last human bottleneck is gone, not how gradually human elements are being removed.

• The central hypothesis of “takeoff speeds” is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and Dyson spheres. It’s totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to “rising out of the atmosphere”.

Rounding off the slow takeoff hypothesis to “lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time” seems like black-and-white thinking, demanding that the future either be perfectly Thielian or perfectly anti-Thielian. The real question is a quantitative one — how lumpy will takeoff be?

• Did you ever finalize any bet(s)?

• Historical AI applications have had a relatively small loading on key-insights and seem like the closest analogies to AGI.

...Transformers as the key to text prediction?

It’s hard to see transformers making a big difference in text prediction trends when you look at benchmark data. On language modeling benchmarks such as the Penn Treebank Dataset we saw roughly smooth progress since at least 2014, and continuing at roughly the same rate through late 2017 and 2018 when the first transformer models were coming out.

It’s plausible that progress after 2017 has been faster than progress prior to 2017, but that this is hard to see in the data on Papers With Code, which only goes back to about 2013. That said, we can still see significant gradual progress prior to 2013 documented in Shen et al. which in my opinion does not look radically slower than progress post-2017.

• My question after reading this is about Eliezer’s predictions in a counterfactual without regulatory bottlenecks on economic growth. Would it change the probable outcome, or would we just get a better look at the oncoming AGI train before it hit us? (Or is there no such counterfactual well-defined enough to give us an answer?) ETA: Basically trying to get at whether that debate’s actually a crux of anything.

• it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, “everyone still dies” is relevant to the post’s thesis. Which implies that the post’s thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where “will takeoff be fast or slow?” and “will AI cause human extinction?” are put in the same bucket.

• The real world is allowed to do discontinuous things to you anyways.

There is not necessarily a presage of 9/​11 where somebody flies a small plane into a building and kills 100 people, before anybody flies 4 big planes into 3 buildings and kills 3000 people; and even if there is some presaging event like that, which would not surprise me at all, the rest of the world’s response to the two cases was evidently discontinuous.

There have been numerous terrorist incidents in world history, and triggers to war, and it’s not clear to me that 9/11 is the most visceral. To the extent that AI disasters will be discontinuous in the sense that 9/11 was discontinuous, this seems like a reason for optimism, not pessimism. We largely overreacted to 9/11, rather than just letting it slide and allowing some much larger disaster to take us by surprise.

ETA: I should note that without a clear definition of “discontinuous” I’m not sure whether I disagree with what was said. I do think 9/​11 was discontinuous in the sense of it being shocking and unexpected. But it doesn’t seem strongly discontinuous in the sense of breaking from historical trends.

• There have been numerous terrorist incidents in world history, and triggers to war, and it’s not clear to me that 9/​11 is the most visceral.

I do think part of the problem here is ‘reference class tennis’, where you can draw boundaries in different ways to get different conclusions, and it’s not quite clear which boundaries are the most predictive.

As I understand Eliezer’s point in that section, Paul’s model seems to predict there won’t be discontinuities in the input/​output response, but we have lots of examples of that sort of thing. Two years before the 9/​11 attacks, EgyptAir Flight 990 was deliberately crashed into the ocean by its first officer with 217 fatalities, about 10% of the 9/​11 fatalities, and yet the response to Flight 990 was much, much less than 10% of the response to 9/​11.

Before orchestrating the 1914 assassination of Archduke Franz Ferdinand, the same person orchestrated the assassination of King Alexander Obrenović and others in 1901, which did not lead to a war 10% the size of WWI (just sanctions and withdrawn ambassadors).

Separately, there’s the question of how much you should expect there to be trend-breaking events. If you’re working with just data collected up until 2000, I think you’ll be surprised by 2001; the number of fatalities is far outside of distribution (the recent plane crashes primarily killed passengers, you have to go back to WWII to get kamikaze attacks that kill more people on the ground than passengers, and even then the average number of casualties per suicide was 2, with the highest I can find being 389), and there isn’t a trendline suggesting a huge increase is coming.

• I think you’ll be surprised by 2001; the number of fatalities is far outside of distribution

Good point. I think I had overstated the extent to which terrorism had been a frequent occurrence. 9/11 is indeed the deadliest terrorist attack ever recorded (I didn’t realize that few other attacks even came close).

However, I do want to push back against the idea that this event was totally unprecedented. The comparison to other “terrorist attacks” is, as you hint at, a bit of a game of reference class tennis. When compared to other battles, air raids, and massacres, Wikipedia notes that there have been several dozen that compare in the context of war. But of course, the United States did not see itself in an active state of war at the time.

The closest comparison is probably the attack on Pearl Harbor, in which a comparable number of people died. But that attack was orchestrated by an industrializing state, not an insurgent terrorist group.

• I mean, as written, I’d want to avoid cases like 10% growth on paper while recovering from a pandemic that produced 0% growth the previous year.

The simplest way of doing this is probably to bet on whether there will be a yearly GWP/​GDP that exceeds 110% of every previous year. For example, the sequence [1, 0.9, 1.05] would not count, even though the last jump represented 16.7% growth.

• bet on whether there will be a yearly GWP/​GDP that exceeds 110% of a previous year

Did you mean “that exceeds 110% of all previous years”? (To exclude steady growth that eventually goes over 110% in aggregate, like [1.0, 1.03, 1.06, 1.09, 1.12].)

• Yes, this is what I meant. I edited my original comment to correct my mistake.
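The difference between the two phrasings of the bet can be checked mechanically. The sketch below is only an illustration of the criterion as worded in this exchange; the function names are hypothetical, and the series are the toy sequences from the comments above rather than real GWP data.

```python
def exceeds_all_previous(series, factor=1.10):
    """True if some entry exceeds `factor` times every earlier entry."""
    if not series:
        return False
    running_max = series[0]
    for value in series[1:]:
        if value > factor * running_max:
            return True
        running_max = max(running_max, value)
    return False


def exceeds_some_previous(series, factor=1.10):
    """True if some entry exceeds `factor` times at least one earlier entry."""
    if not series:
        return False
    running_min = series[0]
    for value in series[1:]:
        if value > factor * running_min:
            return True
        running_min = min(running_min, value)
    return False


rebound = [1, 0.9, 1.05]                 # dip followed by recovery
steady = [1.0, 1.03, 1.06, 1.09, 1.12]   # steady growth, over 10% in aggregate

for name, series in [("rebound", rebound), ("steady", steady)]:
    print(name,
          "all previous:", exceeds_all_previous(series),
          "some previous:", exceeds_some_previous(series))
```

Under the "all previous years" reading neither toy sequence triggers the bet, while the looser "a previous year" reading is triggered by both, which is why the stricter wording is the one intended.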

• Oh, come on. That is straight-up not how simple continuous toy models of RSI work. Between a neutron multiplication factor of 0.999 and 1.001 there is a very huge gap in output behavior.

Nitpick: I think that particular analogy isn’t great.

For nuclear stuff, we have two state variables: amount of fissile material and current number of neutrons flying around. The amount of fissile material determines the “neutron multiplication factor”, but it is the number of neutrons that goes crazy, not the amount of fissile material. And the current number of neutrons doesn’t matter for whether the pile will eventually go crazy or not.

But in the simplest toy models of RSI, we just have one variable: intelligence. We can’t change the “intelligence multiplication factor”, there’s just intelligence figuring out how to build more intelligence.

Maybe an exothermic chemical reaction, like fire, is a better analogy. Either you have enough heat to create a self-sustaining reaction, or you don’t.
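For readers who want to see the size of the gap in the quoted claim about multiplication factors, here is a minimal sketch of the simplest possible model, in which each generation of neutrons is the previous one multiplied by a constant factor `k`. The function name, the starting population, and the choice of 10,000 generations are all assumptions made for illustration; this is not a model of a real reactor.

```python
def neutron_population(k, generations, n0=1.0):
    """Population after `generations` steps of n <- k * n, starting from n0."""
    population = n0
    for _ in range(generations):
        population *= k
    return population


generations = 10_000  # hypothetical horizon, chosen only for illustration
for k in (0.999, 1.001):
    final = neutron_population(k, generations)
    print(f"k = {k}: population after {generations} generations = {final:.3g}")

# The starting count rescales the outcome but does not change which side of
# the threshold you are on: only k decides between decay and runaway growth.
print(neutron_population(1.001, generations, n0=1000.0) > 1000.0)
```

A factor just below 1 decays towards zero and a factor just above 1 grows without bound, even though the two values of `k` differ by only 0.002. The last line reflects the point made in the comment above: in this model the current neutron count is not what determines whether the pile eventually goes crazy.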

• [Yudkowsky][23:25]

there’s a lot of noise in a 2-stock prediction.

[Christiano][23:25]

I mean, it’s a 1-stock prediction about nvidia

I didn’t get that part and thought others might not have either. First I thought 2-stock, 1-stock was some jargon I didn’t know related to shorting stocks. But as far as I can tell, this simply means that Yudkowsky expected that Christiano invested in both nvidia and more in tsmc, but Christiano just invested in tsmc.

• “Takeoff Speeds” has become kinda “required reading” in discussions on takeoff speeds. It seems like Eliezer hadn’t read it until September of this year? He may have other “required reading” from the past four years to catch up on.

(Of course, if one predictably won’t learn anything from an article, there’s not much point in reading it.)

• I read “Takeoff Speeds” at the time. I did not liveblog my reaction to it at the time. I’ve read the first two other items.

I flag your weirdly uncharitable inference.

• I apologize, I shouldn’t have leapt to that conclusion.

• FWIW, I did not find this weirdly uncharitable, only mildly uncharitable. I have extremely wide error bars on what you have and have not read, and “Eliezer has not read any of the things on that list” was within those error bars. It is really quite difficult to guess your epistemic state w.r.t. specific work when you haven’t been writing about it for a while.

(Though I guess you might have been writing about it on Twitter? I have no idea, I generally do not use Twitter myself, so I might have just completely missed anything there.)