Thanks Daniel! Quick replies:
On down-weighting low-end vs high-end compute levels: I handled the down-weighting of low-end compute levels in a separate, explicit step because I think there’s a structural difference between the two updates. An update against low-end compute levels makes more sense to do within each hypothesis, because only some orders of magnitude are affected. An update against high-end compute levels, by contrast, can be implemented by simply lowering the probability we assign to the high-compute hypotheses, since there is no specific reason to shave off just a few OOMs at the far right. My probability on the Evolution Anchor hypothesis is 10% and my probability on the Long Horizon Neural Network hypothesis is 15%; these are lower than my probabilities on the Short Horizon Neural Network hypothesis (20%) and the Medium Horizon Neural Network hypothesis (30%) because I think the higher-end hypotheses are less consistent with the holistic balance of evidence.
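To make the structural difference concrete, here’s a toy sketch in Python; all of the specific numbers (the supports, the cutoff, and the down-weighting factors) are made up for illustration rather than taken from the report:

```python
import numpy as np

# Hypothetical mixture weights over the hypotheses discussed above
# (other buckets omitted for brevity).
weights = {
    "short_horizon_nn": 0.20,
    "medium_horizon_nn": 0.30,
    "long_horizon_nn": 0.15,
    "evolution_anchor": 0.10,
}

# Each hypothesis carries its own distribution over log10(training FLOP);
# uniform over an illustrative support here, just to keep the sketch simple.
ooms = np.arange(24, 42)  # log10 FLOP bins
dists = {h: np.full(len(ooms), 1 / len(ooms)) for h in weights}

def update_against_low_end(p, ooms, cutoff=30, factor=0.1):
    """The low-end update happens *within* a hypothesis: only the bins
    below the cutoff are affected, so rescale those and renormalize."""
    p = p.copy()
    p[ooms < cutoff] *= factor
    return p / p.sum()

dists = {h: update_against_low_end(p, ooms) for h, p in dists.items()}

# The high-end update happens *across* hypotheses: there's no principled
# place to shave OOMs off the far right, so instead lower the mixture
# weights on the high-compute hypotheses and renormalize.
for h in ("long_horizon_nn", "evolution_anchor"):
    weights[h] *= 0.5
total = sum(weights.values())
weights = {h: w / total for h, w in weights.items()}
```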
On the GPT scaling trend: I think the way to express the view that GPT++ would constitute TAI is to heavily weight the Short Horizon Neural Network hypothesis, potentially along with shifting and/or narrowing the range of effective horizon lengths in that bucket so it’s more concentrated on the low end (e.g. 0.1 to 30 subjective seconds rather than 1 to 1000 subjective seconds).
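As a hedged illustration of that shift and narrowing (the choice to treat each range as the central ~80% mass of a lognormal is my assumption for the sketch, not something from the report):

```python
import numpy as np
from scipy.stats import norm

def lognormal_params(lo, hi, central_mass=0.80):
    """(mu, sigma) of a lognormal over effective horizon length, in
    subjective seconds, whose central `central_mass` lies in [lo, hi]."""
    z = norm.ppf(0.5 + central_mass / 2)  # ~1.28 for 80% central mass
    mu = (np.log(lo) + np.log(hi)) / 2    # midpoint in log space
    sigma = (np.log(hi) - np.log(lo)) / (2 * z)
    return mu, sigma

# Default short-horizon bucket: 1 to 1000 subjective seconds.
mu0, sigma0 = lognormal_params(1.0, 1000.0)

# "GPT++ is TAI" view: shift left and narrow to 0.1 to 30 subjective
# seconds, then put most of the mixture weight on this bucket.
mu1, sigma1 = lognormal_params(0.1, 30.0)
```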
On getting transformative abilities with 1e15-parameter models trained for 30 subjective years: I think this is pretty unlikely but, as you said, not crazy; the way to express this view would be to up-weight the Lifetime Anchor hypothesis, on which my weight is currently 5%. Additionally, all the Neural Network hypotheses already assign substantial probability to relatively small models (e.g. 1e12 FLOP/subj sec) and to shallower scaling than we’ve seen demonstrated so far (e.g. an exponent of 0.25).
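For a rough sense of scale (the ~10 FLOP per parameter per subjective second figure below is a placeholder assumption of mine, not a number from the report):

```python
# Back-of-the-envelope: 1e15-parameter model trained on 30 subjective
# years of experience.
params = 1e15
flop_per_param_per_subj_sec = 10    # assumed placeholder, not from the report
subj_sec = 30 * 365.25 * 24 * 3600  # ~9.5e8 subjective seconds
training_flop = params * flop_per_param_per_subj_sec * subj_sec
print(f"~{training_flop:.0e} FLOP")  # ~9e+24 FLOP
```

Under those assumptions the total lands within an order of magnitude of a human-lifetime compute estimate of ~1e24 FLOP (roughly 1e15 FLOP/s over ~1e9 seconds), which is why the Lifetime Anchor hypothesis is the natural home for this view.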
Thanks, I just cut the link!