# rohinmshah(Rohin Shah)

Karma: 8,263

PhD student at the Center for Human-Compatible AI. Creator of the Alignment Newsletter. http://​​rohinshah.com/​​

# AI Align­ment 2018-19 Review

28 Jan 2020 2:19 UTC
115 points

# The Align­ment Prob­lem: Ma­chine Learn­ing and Hu­man Values

6 Oct 2020 17:41 UTC
109 points
(www.amazon.com)

# Refram­ing Su­per­in­tel­li­gence: Com­pre­hen­sive AI Ser­vices as Gen­eral Intelligence

8 Jan 2019 7:12 UTC
93 points
(www.fhi.ox.ac.uk)

# Align­ment Newslet­ter One Year Retrospective

10 Apr 2019 6:58 UTC
90 points
• Planned summary for the Alignment Newsletter:

Once again, we have a piece of work so large and detailed that I need a whole newsletter to summarize it! This time, it is a quantitative model for forecasting when transformative AI will happen.

The overall framework

The key assumption behind this model is that if we train a neural net or other ML model that uses about as much computation as a human brain, that will likely result in transformative AI (TAI) (defined as AI that has an impact comparable to that of the industrial revolution). In other words, we _anchor_ our estimate of the ML model’s inference computation to that of the human brain. This assumption allows us to estimate how much compute will be required to train such a model _using 2020 algorithms_. By incorporating a trend extrapolation of how algorithmic progress will reduce the required amount of compute, we can get a prediction of how much compute would be required for the final training run of a transformative model in any given year.

We can also get a prediction of how much compute will be _available_ by predicting the cost of compute in a given year (which we have a decent amount of past evidence about), and predicting the maximum amount of money an actor would be willing to spend on a single training run. The probability that we can train a transformative model in year Y is then just the probability that the compute _requirement_ for year Y is less than the compute _available_ in year Y.

The vast majority of the report is focused on estimating the amount of compute required to train a transformative model using 2020 algorithms (where most of our uncertainty would come from); the remaining factors are estimated relatively quickly without too much detail. I’ll start with those so that you can have them as background knowledge before we delve into the real meat of the report. These are usually modeled as logistic curves in log space: that is, they are modeled as improving at some constant rate, but will level off and saturate at some maximum value after which they won’t improve.

Algorithmic progress

First off, we have the impact of _algorithmic progress_. <@AI and Efficiency@> estimates that algorithms improve enough to cut compute times in half every 16 months. However, this was measured on ImageNet, where researchers are directly optimizing for reduced computation costs. It seems less likely that researchers are doing as good a job at reducing computation costs for “training a transformative model”, and so the author increases the **halving time to 2-3 years**, with a maximum of **somewhere between 1-5 orders of magnitude** (with the assumption that the higher the “technical difficulty” of the problem, the more algorithmic progress is possible).

Cost of compute

Second, we need to estimate a trend for compute costs. There has been some prior work on this (summarized in [AN #97](https://​​mailchi.mp/​​a2b5efbcd3a7/​​an-97-are-there-historical-examples-of-large-robust-discontinuities)). The report has some similar analyses, and ends up estimating **a doubling time of 2.5 years**, and a (very unstable) maximum of improvement by **a factor of 2 million by 2100**.

Willingness to spend

Third, we would like to know the maximum amount (in 2020 dollars) any actor might spend on a single training run. Note that we are estimating the money spent on a _final training run_, which doesn’t include the cost of initial experiments or the cost of researcher time. Currently, the author estimates that all-in project costs are 10-100x larger than the final training run cost, but this will likely go down to something like 2-10x, as the incentive for reducing this ratio becomes much larger.

The author estimates that the most expensive run _in a published paper_ was the final <@AlphaStar@>(@AlphaStar: Mastering the Real-Time Strategy Game StarCraft II@) training run, at ~1e23 FLOP and \$1M cost. However, there have probably been unpublished results that are slightly more expensive, maybe \$2-8M. In line with <@AI and Compute@>, this will probably increase dramatically to about **\$1B in 2025**.

Given that AI companies each have around \$100B cash on hand, and could potentially borrow additional several hundreds of billions of dollars (given their current market caps and likely growth in the worlds where AI still looks promising), it seems likely that low hundreds of billions of dollars could be spent on a single run by 2040, corresponding to a doubling time (from \$1B in 2025) of about 2 years.

To estimate the maximum here, we can compare to megaprojects like the Manhattan Project or the Apollo program, which suggests that a government could spend around 0.75% of GDP for ~4 years. Since transformative AI will likely be more valuable economically and strategically than these previous programs, we can shade that upwards to 1% of GDP for 5 years. Assuming all-in costs are 5x that of the final training run, this suggests the maximum willingness to spend should be 1% of GDP of the largest country, which we assume grows at ~3% every year.

Strategy for estimating training compute for a transformative model

In addition to the three factors of algorithmic progress, cost of compute, and willingness to spend, we need an estimate of how much computation would be needed to train a transformative model using 2020 algorithms (which I’ll discuss next). Then, at year Y, the compute required is given by computation needed with 2020 algorithms * improvement factor from algorithmic progress, which (in this report) is a probability distribution. At year Y, the compute available is given by FLOP per dollar (aka compute cost) * money that can be spent, which (in this report) is a point estimate. We can then simply read off the probability that the compute required is greater than the compute available.

Okay, so the last thing we need is a distribution over the amount of computation that would be needed to train a transformative model using 2020 algorithms, which is the main focus of this report. There is a lot of detail here that I’m going to elide over, especially in talking about the _distribution_ as a whole (whereas I will focus primarily on the median case for simplicity). As I mentioned early on, the key hypothesis is that we will need to train a neural net or other ML model that uses about as much compute as a human brain. So the strategy will be to first translate from “compute of human brain” to “inference compute of neural net”, and then to translate from “inference compute of neural net” to “training compute of neural net”.

How much inference compute would a transformative model use?

We can talk about the rate at which synapses fire in the human brain. How can we convert this to FLOP? The author proposes the following hypothetical: suppose we redo evolutionary history, but in every animal we replace each neuron with N [floating-point units](https://​​en.wikipedia.org/​​wiki/​​Floating-point_unit) that each perform 1 FLOP per second. For what value of N do we still get roughly human-level intelligence over a similar evolutionary timescale? The author then does some calculations about simulating synapses with FLOPs, drawing heavily on the <@recent report on brain computation@>(@How Much Computational Power It Takes to Match the Human Brain@), to estimate that N would be around 1-10,000, which after some more calculations suggests that the human brain is doing the equivalent of 1e13 − 1e16 FLOP per second, with **a median of 1e15 FLOP per second**, and a long tail to the right.

Does this mean we can say that a transformative model will use 1e15 FLOP per second during inference? Such a model would have a clear flaw: even though we are assuming that algorithmic progress reduces compute costs over time, if we did the same analysis in e.g. 1980, we’d get the _same_ estimate for the compute cost of a transformative model, which would imply that there was no algorithmic progress between 1980 and 2020! The problem is that we’d always estimate the brain as using 1e15 FLOP per second (or around there), but for our ML models there is a difference between FLOP per second _using 2020 algorithms_ and FLOP per second _using 1980 algorithms_. So how do we convert form “brain FLOP per second” to “inference FLOP per second for 2020 ML algorithms”?

One approach is to look at how other machines we have designed compare to the corresponding machines that evolution has designed. An [analysis](https://​​docs.google.com/​​document/​​d/​​1HUtUBpRbNnnWBxiO2bz3LumEsQcaZioAPZDNcsWPnos/​​edit) by Paul Christiano concluded that human-designed artifacts tend to be 2-3 orders of magnitude worse than those designed by evolution, when considering energy usage. Presumably a similar analysis done in the past would have resulted in higher numbers and thus wouldn’t fall prey to the problem above. Another approach is to compare existing ML models to animals with a similar amount of computation, and see which one is subjectively “more impressive”. For example, the AlphaStar model uses about as much computation as a bee brain, and large language models use somewhat more; the author finds it reasonable to say that AlphaStar is “about as sophisticated” as a bee, or that <@GPT-3@>(@Language Models are Few-Shot Learners@) is “more sophisticated” than a bee.

We can also look at some abstract considerations. Natural selection had _a lot_ of time to optimize brains, and natural artifacts are usually quite impressive. On the other hand, human designers have the benefit of intelligent design and can copy the patterns that natural selection has come up with. Overall, these considerations roughly balance each other out. Another important consideration is that we’re only predicting what would be needed for a model that was good at most tasks that a human would currently be good at (think a virtual personal assistant), whereas evolution optimized for a whole bunch of other skills that were needed in the ancestral environment. The author subjectively guesses that this should reduce our estimate of compute costs by about an order of magnitude.

Overall, putting all these considerations together, the author intuitively guesses that to convert from “brain FLOP per second” to “inference FLOP per second for 2020 ML algorithms”, we should add an order of magnitude to the median, and add another two orders of magnitude to the standard deviation to account for our large uncertainty. This results in a median of **1e16 FLOP per second** for the inference-time compute of a transformative model.

Training compute for a transformative model

We might expect a transformative model to run a forward pass **0.1 − 10 times per second** (which on the high end would match human reaction time of 100ms), and for each parameter of the neural net to contribute **1-100 FLOP per forward pass**, which implies that if the inference-time compute is 1e16 FLOP per second then the model should have **1e13 − 1e17 parameters**, with a median of **3e14 parameters**.

We now need to estimate how much compute it takes to train a transformative model with 3e14 parameters. We assume this is dominated by the number of times you have to run the model during training, or equivalently, the number of data points you train on times the number of times you train on each data point. (In particular, this assumes that the cost of acquiring data is negligible in comparison. The report argues for this assumption; for the sake of brevity I won’t summarize it here.)

For this, we need a relationship between parameters and data points, which we’ll assume will follow a power law KP^α, where P is the number of parameters and K and α are constants. A large number of ML theory results imply that the number of data points needed to reach a specified level of accuracy grows linearly with the number of parameters (i.e. α=1), which we can take as a weak prior. We can then update this with empirical evidence from papers. <@Scaling Laws for Neural Language Models@> suggests that for language models, data requirements scale as α=0.37 or as α=0.74, depending on what measure you look at. Meanwhile, [Deep Learning Scaling is Predictable, Empirically](https://​​arxiv.org/​​abs/​​1712.00409) suggests that α=1.39 for a wide variety of supervised learning problems (including language modeling). However, the former paper studies a more relevant setting: it includes regularization, and asks about the number of data points needed to reach a target accuracy, whereas the latter paper ignores regularization and asks about the minimum number of data points that the model _cannot_ overfit to. So overall the author puts more weight on the former paper and estimates a median of α=0.8, though with substantial uncertainty.

We also need to estimate how many epochs will be needed, i.e. how many times we train on any given data point. The author decides not to explicitly model this factor since it will likely be close to 1, and instead lumps in the uncertainty over the number of epochs with the uncertainty over the constant factor in the scaling law above. We can then look at language model runs to estimate a scaling law for them, for which the median scaling law predicts that we would need 1e13 data points for our 3e14 parameter model.

However, this has all been for supervised learning. It seems plausible that a transformative task would have to be trained using RL, where the model acts over a sequence of timesteps, and then receives (non-differentiable) feedback at the end of those timesteps. How would scaling laws apply in this setting? One simple assumption is to say that each rollout over the _effective horizon_ counts as one piece of “meaningful feedback” and so should count as a single data point. Here, the effective horizon is the minimum of the actual horizon and 1/​(1-γ), where γ is the discount factor. We assume that the scaling law stays the same; if we instead try to estimate it from recent RL runs, it can change the results by about one order of magnitude.

So we now know we need to train a 3e14 parameter model with 1e13 data points for a transformative task. This gets us nearly all the way to the compute required with 2020 algorithms: we have a ~3e14 parameter model that takes ~1e16 FLOP per forward pass, that is trained on ~1e13 data points with each data point taking H timesteps, for a total of H * 1e29 FLOP. The author’s distributions are instead centered at H * 1e30 FLOP; I suspect this is simply because the author was computing with distributions whereas I’ve been directly manipulating medians in this summary.

The last and most uncertain piece of information is the effective horizon of a transformative task. We could imagine something as low as 1 subjective second (for something like language modeling), or something as high as 1e9 subjective seconds (i.e. 32 subjective years), if we were to redo evolution, or train on a task like “do effective scientific R&D”. The author splits this up into short, medium and long horizon neural net paths (corresponding to horizons of 1e0-1e3, 1e3-1e6, and 1e6-1e9 respectively), and invites readers to place their own weights on each of the possible paths.

There are many important considerations here: for example, if you think that the dominating cost will be generative modeling (GPT-3 style, but maybe also for images, video etc), then you would place more weight on short horizons. Conversely, if you think the hard challenge is to gain meta learning abilities, and that we probably need “data points” comparable to the time between generations in human evolution, then you would place more weight on longer horizons.

We can now combine all these ingredients to get a forecast for when compute will be available to develop a transformative model! But not yet: we’ll first add a few more possible “anchors” for the amount of computation needed for a transformative model. (All of the modeling so far has “anchored” the _inference time computation of a transformative model_ to the _inference time computation of the human brain_.)

First, we can anchor _parameter count of a transformative model_ to the _parameter count of the human genome_, which has far fewer “parameters” than the human brain. Specifically, we assume that all the scaling laws remain the same, but that a transformative model will only require 7.5e8 parameters (the amount of information in the human genome) rather than our previous estimate of ~1e15 parameters. This drastically reduces the amount of computation required, though it is still slightly above that of the short-horizon neural net, because the author assumed that the horizon for this path was somewhere between 1 and 32 years.

Second, we can anchor _training compute for a transformative model_ to the _compute used by the human brain over a lifetime_. As you might imagine, this leads to a much smaller estimate: the brain uses ~1e24 FLOP over 32 years of life, which is only 10x the amount used for AlphaStar, and even after adjusting upwards to account for man-made artifacts being worse than those made by evolution, the resulting model predicts a significant probability that we would already have been able to build a transformative model.

Finally, we can anchor _training compute for a transformative model_ to the _compute used by all animal brains over the course of evolution_. The basic assumption here is that our optimization algorithms and architectures are not much better than simply “redoing” natural selection from a very primitive starting point. This leads to an estimate of ~1e41 FLOP to train a transformative model, which is more than the long horizon neural net path (though not hugely more).

Putting it all together

So we now have six different paths: the three neural net anchors (short, medium and long horizon), the genome anchor, the lifetime anchor, and the evolution anchor. We can now assign weights to each of these paths, where each weight can be interpreted as the probability that that path is the _cheapest_ way to get a transformative model, as well as a final weight that describes the chance that none of the paths work out.

The long horizon neural net path can be thought of as a conservative “default” view: it could work out simply by training directly on examples of a long horizon task where each data point takes around a subjective year to generate. However, there are several reasons to think that researchers will be able to do better than this. As a result, the author assigns 20% to the short horizon neural net, 30% to the medium horizon neural net, and 15% to the long horizon neural net.

The lifetime anchor would suggest that we either already could get TAI, or are very close, which seems very unlikely given the lack of major economic applications of neural nets so far, and so gets assigned only 5%. The genome path gets 10%, the evolution anchor gets 10%, and the remaining 10% is assigned to none of the paths working out.

This predicts a **median of 2052** for the year in which some actor would be willing and able to train a single transformative model, with the full graphs shown below:

<Graphs removed since they are in flux and easy to share in a low-bandwidth way>

How does this relate to TAI?

Note that what we’ve modeled so far is the probability that by year Y we will have enough compute for the final training run of a transformative model. This is not the same thing as the probability of developing TAI. There are several reasons that TAI could be developed _later_ than the given prediction:

1. Compute isn’t the only input required: we also need data, environments, human feedback, etc. While the author expects that these will not be the bottleneck, this is far from a certainty.

2. When thinking about any particular path and making it more concrete, a host of problems tend to show up that will need to be solved and may add extra time. Some examples include robustness, reliability, possible breakdown of the scaling laws, the need to generate lots of different kinds of data, etc.

3. AI research could stall, whether because of regulation, a global catastrophe, an AI winter, or something else.

However, there are also compelling reasons to expect TAI to arrive _earlier_:

1. We may develop TAI through some other cheaper route, such as a <@services model@>(@Reframing Superintelligence: Comprehensive AI Services as General Intelligence@).

2. Our forecasts apply to a “balanced” model that has a similar profile of abilities as a human. In practice, it will likely be easier and cheaper to build an “unbalanced” model that is superhuman in some domains and subhuman in others, that is nonetheless transformative.

3. The curves for several factors assume some maximum after which progress is not possible; in reality it is more likely that progress slows to some lower but non-zero growth rate.

In the near future, it seems likely that it would be harder to find cheaper routes (since there is less time to do the research), so we should probably assume that the probabilities are overestimates, and for similar reasons for later years the probabilities should be treated as underestimates.

For the median of 2052, the author guesses that these considerations roughly cancel out, and so rounds the median for development of TAI to **2050**. A sensitivity analysis concludes that 2040 is the “most aggressive plausible median”, while the “most conservative plausible median” is 2080.

Planned opinion:

I really liked this report: it’s extremely thorough and anticipates and responds to a large number of potential reactions. I’ve made my own timelines estimate using the provided spreadsheet, and have adopted the resulting graph (with a few modifications) as my TAI timeline (which ends up with a median of ~2055). This is saying quite a lot: it’s pretty rare that a quantitative model is compelling enough that I’m inclined to only slightly edit its output, as opposed to simply using the quantitative model to inform my intuitions.

Here are the main ways in which my model is different from the one in the report:

1. Ignoring the genome anchor

I ignore the genome anchor because I don’t buy the model: even if researchers did create a very parameter-efficient model class (which seems unlikely), I would not expect the same scaling laws to apply to that model class. The report mentions that you could also interpret the genome anchor as simply providing a constraint on how many data points are needed to train long-horizon behaviors (since that’s what evolution was optimizing), but I prefer to take this as (fairly weak) evidence that informs what weights to place on short vs. medium vs. long horizons for neural nets.

2. Placing more weight on short and medium horizons relative to long horizons

I place 30% on short horizons, 40% on medium horizons, and 10% on long horizons. The report already names several reasons why we might expect the long horizon assumption to be too conservative. I agree with all of those, and have one more of my own:

If meta-learning turns out to require a huge amount of compute, we can instead directly train on some transformative task with a lower horizon. Even some of the hardest tasks like scientific R&D shouldn’t have a huge horizon: even if we assume that it takes human scientists a year to produce the equivalent of a single data point, at 40 hours a week that comes out to a horizon of 2000 subjective hours, or 7e6 seconds. This is near the beginning of the long horizon realm of 1e6-1e9 seconds and seems like a very conservative overestimate to me.

(Note that in practice I’d guess we will train something like a meta-learner, because I suspect the skill of meta-learning will not require such large average effective horizons.)

3. Reduced willingness to spend

My willingness to spend forecasts are somewhat lower: the predictions and reasoning in this report feel closer to upper bounds on how much people might spend rather than predictions of how much they will spend. Assuming we reduce the ratio of all-in project costs to final training run costs to 10x, spending \$1B on a training run by 2025 would imply all-in project costs of \$10B, which is ~40% of Google’s yearly R&D budget of \$26B, or 10% of the budget for a 4-year project. Possibly this wouldn’t be classified as R&D, but it would also be _2% of all expenditures over 4 years_. This feels remarkably high to me for something that’s supposed to happen within 5 years; while I wouldn’t rule it out, it wouldn’t be my median prediction.

4. Accounting for challenges

While the report does talk about challenges in e.g. getting the right data and environments by the right time, I think there are a bunch of other challenges as well: for example, you need to ensure that your model is aligned, robust, and reliable (at least if you want to deploy it and get economic value from it). I do expect that these challenges will be easier than they are today, partly because more research will have been done and partly because the models themselves will be more capable.

Another example of a challenge would be PR concerns: it seems very plausible to me that there will be a backlash against transformative AI systems, that results in those systems being deployed later than we’d expect them to be according to this model.

To be more concrete, if we ignore points 1-3 and assume this is my only disagreement, then for the median of 2052, rather than assuming that reasons for optimism and pessimism approximately cancel out to yield 2050 as the median for TAI, I’d be inclined to shade upwards to 2055 or 2060 as my median for TAI.

# Co­her­ence ar­gu­ments do not im­ply goal-di­rected behavior

3 Dec 2018 3:26 UTC
75 points

# Align­ment Newslet­ter #13: 07/​02/​18

2 Jul 2018 16:10 UTC
70 points
(mailchi.mp)

# Pre­face to the se­quence on value learning

30 Oct 2018 22:04 UTC
65 points

# AI safety with­out goal-di­rected behavior

7 Jan 2019 7:48 UTC
59 points
• Ultimately, our goal is to build AI systems that do what we want them to do. One way of decomposing this is first to define the behavior that we want from an AI system, and then to figure out how to obtain that behavior, which we might call the definition-optimization decomposition. Ambitious value learning aims to solve the definition subproblem. I interpret this post as proposing a different decomposition of the overall problem. One subproblem is how to build an AI system that is trying to do what we want, and the second subproblem is how to make the AI competent enough that it actually does what we want. I like this motivation-competence decomposition for a few reasons:

• It isolates the major, urgent difficulty in a single subproblem. If we make an AI system that tries to do what we want, it could certainly make mistakes, but it seems much less likely to cause eg. human extinction. (Though it is certainly possible, for example by building an unaligned successor AI system, as mentioned in the post.) In contrast, with the definition-optimization decomposition, we need to solve both specification problems with the definition and robustness problems with the optimization.

• Humans seem to solve the motivation subproblem, whereas humans don’t seem to solve either the definition or the optimization subproblems. I can definitely imagine a human legitimately trying to help me, whereas I can’t really imagine a human knowing how to derive optimal behavior for my goals, nor can I imagine a human that can actually perform the optimal behavior to achieve some arbitrary goal.

• It is easier to apply to systems without much capability, though as the post notes, it probably still does need to have some level of capability. While a digit recognition system is useful, it doesn’t seem meaningful to talk about whether it is “trying” to help us.

• Relatedly, the safety guarantees seem to degrade more slowly and smoothly. With definition-optimization, if you get the definition even slightly wrong, Goodhart’s Law suggests that you can get very bad outcomes. With motivation-competence, I’ve already argued that incompetence probably leads to small problems, not big ones, and slightly worse motivation might not make a huge difference because of something analogous to the basin of attraction around corrigibility. This depends a lot on what “slightly worse” means for motivation, but I’m optimistic.

• We’ve been working with the definition-optimization decomposition for quite some time now by modeling AI systems as expected utility maximizers, and we’ve found a lot of negative results and not very many positive ones.

• The motivation-competence decomposition accommodates interaction between the AI system and humans, which definition-optimization does not allow (or at least, it makes it awkward to include such interaction).

The cons are:

• It is imprecise and informal, whereas we can use the formalism of expected utility maximizers for the definition-optimization decomposition.

• There hasn’t been much work done in this paradigm, so it is not obvious that there is progress to make.

• I suspect many researchers would argue that any sufficiently intelligent system will be well-modeled as an expected utility maximizer and will have goals and preferences it is optimizing for, and as a result we need to deal with the problems of expected utility maximizers anyway. Personally, I do not find this argument compelling, and hope to write about why in the near future. ETA: Written up in the chapter on Goals vs Utility Functions in the Value Learning sequence, particularly in Coherence arguments do not imply goal-directed behavior.

# [AN #69] Stu­art Rus­sell’s new book on why we need to re­place the stan­dard model of AI

19 Oct 2019 0:30 UTC
56 points
(mailchi.mp)

# Align­ment Newslet­ter Three Year Retrospective

7 Apr 2021 14:39 UTC
54 points

# [AN #58] Mesa op­ti­miza­tion: what it is, and why we should care

24 Jun 2019 16:10 UTC
50 points
(mailchi.mp)

1 Dec 2018 4:25 UTC
46 points

# Will hu­mans build goal-di­rected agents?

5 Jan 2019 1:33 UTC
46 points

# [AN #127]: Re­think­ing agency: Carte­sian frames as a for­mal­iza­tion of ways to carve up the world into an agent and its environment

2 Dec 2020 18:20 UTC
46 points
(mailchi.mp)

# Con­clu­sion to the se­quence on value learning

3 Feb 2019 21:05 UTC
45 points

# [AN #140]: The­o­ret­i­cal mod­els that pre­dict scal­ing laws

4 Mar 2021 18:10 UTC
45 points
(mailchi.mp)

# Fu­ture di­rec­tions for am­bi­tious value learning

11 Nov 2018 15:53 UTC
43 points