Ajeya Cotra

Karma: 3,182

Ajeya Cotra 19 Sep 2025 17:33 UTC
20 points
in reply to: Buck’s comment on: Christian homeschoolers in the year 3000
I agree with this particular reason to worry that we can’t agree on a meta-philosophy, but I separately think that there might not actually be a good meta-philosophy to find, especially if you’re going for greater certainty/clarity than mathematical reasoning!

Ajeya Cotra 3 Jun 2025 21:24 UTC
LW: 6 AF: 3
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
I agree that robust self-verification and sample efficiency are the main things AIs are worse at than humans, and that this is basically just a quantitative difference. But what’s the best evidence that RL methods are getting more sample efficient (separate from AIs getting better at recognizing their own mistakes)? That’s not obvious to me but I’m not really read up on the literature. Is there a benchmark suite you think best illustrates that?

Ajeya Cotra 6 Mar 2025 19:57 UTC
7 points
in reply to: Daniel Kokotajlo’s comment on: How Much Are LLMs Actually Boosting Real-World Programmer Productivity?
Yeah, I’ve cataloged some of that here: https://x.com/ajeya_cotra/status/1894821255804788876 (I’m hoping to do something more systematic soon.)

Ajeya Cotra 7 Jan 2025 23:23 UTC
LW: 17 AF: 10
in reply to: Ajeya Cotra’s comment on: AI Timelines
To put it another way: we probably both agree that if we had gotten AI personal assistants that shop for you and book meetings for you in 2024, that would have been at least some evidence for shorter timelines. So their absence is at least some evidence for longer timelines. The question is what your underlying causal model was: did you think that if we were going to get superintelligence by 2027, then we really should see personal assistants in 2024? A lot of people strongly believe that, you (Daniel) hardly believe it at all, and I’m somewhere in the middle.

If we had gotten both the personal assistants I was expecting and benchmark progress 2x faster than I was expecting, my timelines would be the same as yours are now.

Ajeya Cotra 7 Jan 2025 23:17 UTC
LW: 7 AF: 4
in reply to: Daniel Kokotajlo’s comment on: AI Timelines
I’m not talking about narrowly your claim; I just think this very fundamentally confuses most people’s basic models of the world. People expect, from their unspoken models of “how technological products improve,” that long before you get a mind-bendingly powerful product that’s so good it can easily kill you, you get something that’s at least a little useful to you (and then you get something that’s a little more useful to you, and then something that’s really useful to you, and so on). And in fact that is roughly how it’s working — for programmers, not for a lot of other people.

Because I’ve engaged so much with the conceptual case for an intelligence explosion (i.e. the case that this intuitive model of technology might be wrong), I roughly buy it even though I am getting almost no use out of AIs still. But I have a huge amount of personal sympathy for people who feel really gaslit by it all.

Ajeya Cotra 7 Jan 2025 23:05 UTC
8 points
in reply to: Noosphere89’s comment on: AI Timelines
Yeah TBC, I’m at even less than 1-2 decades added, more like 1-5 years.

Ajeya Cotra 7 Jan 2025 22:47 UTC
LW: 15 AF: 6
in reply to: Buck’s comment on: AI Timelines
Interestingly, I’ve heard from tons of skeptics (e.g. Tim Lee, CSET people, AI Snake Oil) that timelines to actual impacts in the world (such as significant R&D acceleration or industrial acceleration) are going to be way longer than we say because AIs are too unreliable and risky, so people won’t use them. I was more dismissive of this argument before, but:
- It matches my own lived experience (e.g. I still use search way more than LLMs, even to learn about complex topics, because I have good Google-fu and LLMs make stuff up too much).
- As you say, it seems like a plausible explanation for why my weird friends get way more use out of coding agents than giant AI companies do.

Ajeya Cotra 7 Jan 2025 20:57 UTC
LW: 10 AF: 4
in reply to: Buck’s comment on: AI Timelines
Yeah, good point, I’ve been surprised by how uninterested the companies have been in agents.

Ajeya Cotra 7 Jan 2025 19:12 UTC
LW: 32 AF: 17
in reply to: Ajeya Cotra’s comment on: AI Timelines
One thing that I think is interesting, which doesn’t affect my timelines that much but cuts in the direction of slower: once again I overestimated how much real-world use anyone who wasn’t a programmer would get. I definitely expected an off-the-shelf agent product that would book flights, reserve restaurants, and shop for simple goods, one that worked well enough that I would actually use it (and I expected that to happen before one-hour-plus coding tasks were solved; I expected it to be concurrent with half-hour coding tasks).

I can’t tell if the fact that AI agents continue to be useless to me is a portent that the incredible benchmark performance won’t translate as well as the bullish people expect to real world acceleration; I’m largely deferring to the consensus in my local social circle that it’s not a big deal. My personal intuitions are somewhat closer to what Steve Newman describes in this comment thread.

It seems like, anecdotally, folks are getting something like a 5-30% productivity boost from using AI; it does feel somewhat aggressive for that to go to a 10x productivity boost within a couple of years.

Ajeya Cotra 7 Jan 2025 19:04 UTC
LW: 14 AF: 5
in reply to: ryan_greenblatt’s comment on: ryan_greenblatt’s Shortform
My timelines are now roughly similar on the object level (maybe a year slower for the 25th percentile and 1-2 years slower for the 50th), and procedurally I also now defer a lot to Redwood and METR engineers. More discussion here: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines?commentId=hnrfbFCP7Hu6N6Lsp

Ajeya Cotra 7 Jan 2025 18:55 UTC
LW: 88 AF: 35
in reply to: ryan_greenblatt’s comment on: AI Timelines
I agree the discussion holds up well in terms of the remaining live cruxes. Since this exchange, my timelines have gotten substantially shorter. They’re now pretty similar to Ryan’s (they feel a little bit slower but within the noise from operationalizations being fuzzy; I find it a bit hard to think about what 10x labor inputs exactly looks like).
The main reason they’ve gotten shorter is that performance on few-hour agentic tasks has moved almost twice as fast as I expected, and this seems broadly non-fake (i.e. it seems to be translating into real world use with only a moderate lag rather than a huge lag), though this second part is noisier and more confusing.
This dialogue occurred a few months after METR released their pilot report on autonomous replication and adaptation tasks. At the time it seemed like agents (GPT-4 and Claude 3 Sonnet iirc) were starting to be able to do tasks that would take a human a few minutes (looking something up on Wikipedia, making a phone call, searching a file system, writing short programs).
Right around when I did this dialogue, I launched an agent benchmarks RFP to build benchmarks testing LLM agents on many-step real-world tasks. Through this RFP, in late 2023 and early 2024, we funded a bunch of agent benchmarks consisting of tasks that take experts between 15 minutes and a few hours.
Roughly speaking, I was expecting that the benchmarks we were funding would get saturated around early-to-late 2026 (within 2-3 years). By EOY 2024 (one year out), I had expected these benchmarks to be halfway toward saturation: qualitatively, I guessed that agents would be able to reliably perform moderately difficult 30-minute tasks as well as experts in a variety of domains but struggle with the 1-hour-plus tasks. This would have roughly been the same trajectory that the previous generation of benchmarks followed: e.g. MATH was introduced in Jan 2021, got halfway to saturation in June 2022 (1.5 years), then probably saturated about another year after that (for a total of 2.5 years).
Instead, based on agent benchmarks like RE-Bench, Cybench, SWE-bench Verified, and various bio benchmarks, it looks like agents are already able to perform self-contained programming tasks that would take human experts multiple hours (although they perform these tasks in a more one-shot way than human experts do, and I’m sure there is a lot of jaggedness); these benchmarks seem on track to saturate by early 2025. If that holds up, it’d be about twice as fast as I would have guessed (1-1.5 years vs 2-3 years).
There’s always some lag between benchmark performance and real-world use, and it’s very hard for me to gauge this lag myself because AI agents seem to be disproportionately useful to programmers and ML engineers compared to everyone else. But from what friends who use AI systems regularly tell me, they routinely assign agents tasks that would take them between a few minutes and an hour and get actual value out of them.
On a meta level I now defer heavily to Ryan and people in his reference class (METR and Redwood engineers) on AI timelines, because they have a similarly deep understanding of the conceptual arguments I consider most important while having much more hands-on experience with the frontier of useful AI capabilities (I still don’t use AI systems regularly in my work). Of course AI company employees have the most hands-on experience, but I’ve found that they don’t seem to think as rigorously about the conceptual arguments, and some of them have a track record of overshooting and predicting AGI between 2020 and 2025 (as you might expect from their incentives and social climate).

Survey on the acceleration risks of our new RFPs to study LLM capabilities

Ajeya Cotra 10 Nov 2023 23:59 UTC

29 points

1 comment, 8 min read

AI Timelines

habryka, Daniel Kokotajlo, Ajeya Cotra and Ege Erdil

10 Nov 2023 5:28 UTC

302 points

144 comments, 51 min read, 2 reviews

New roles on my team: come build Open Phil’s technical AI safety program with me!

Ajeya Cotra 19 Oct 2023 16:47 UTC

83 points

6 comments, 4 min read

Ajeya Cotra 26 Sep 2023 2:56 UTC
LW: 48 AF: 25
on: There should be more AI safety orgs
(Cross-posted to EA Forum.)
I’m a Senior Program Officer at Open Phil, focused on technical AI safety funding. I’m hearing a lot of discussion suggesting funding is very tight right now for AI safety, so I wanted to give my take on the situation.
At a high level: AI safety is a top priority for Open Phil, and we are aiming to grow how much we spend in that area. There are many potential projects we’d be excited to fund, including some potential new AI safety orgs as well as renewals to existing grantees, academic research projects, upskilling grants, and more.
At the same time, it is also not the case that someone who reads this post and tries to start an AI safety org would necessarily have an easy time raising funding from us. This is because:
- All of our teams whose work touches on AI (Luke Muehlhauser’s team on AI governance, Claire Zabel’s team on capacity building, and me on technical AI safety) are quite understaffed at the moment. We’ve hired several people recently, but across the board we still don’t have the capacity to evaluate all the plausible AI-related grants, and hiring remains a top priority for us.
  - And we are extra-understaffed for evaluating technical AI safety proposals in particular. I am the only person who is primarily focused on funding technical research projects (sometimes Claire’s team funds AI safety related grants, primarily upskilling, but a large technical AI safety grant like a new research org would fall to me). I currently have no team members; I expect to have one person joining in October and am aiming to launch a wider hiring round soon, but I think it’ll take me several months to build my team’s capacity up substantially.
  - I began making grants in November 2022, and spent the first few months full-time evaluating applicants affected by FTX (largely academic PIs as opposed to independent organizations started by members of the EA community). Since then, a large chunk of my time has gone into maintaining and renewing existing grant commitments and evaluating grant opportunities referred to us by existing advisors. I am aiming to reserve remaining bandwidth for thinking through strategic priorities, articulating what research directions seem highest-priority and encouraging researchers to work on them (through conversations and hopefully soon through more public communication), and hiring for my team or otherwise helping Open Phil build evaluation capacity in AI safety (including separately from my team).
  - As a result, I have deliberately held off on launching open calls for grant applications similar to the ones run by Claire’s team (e.g. this one); before onboarding more people (and developing or strengthening internal processes), I would not have the bandwidth to keep up with the applications.
- On top of this, in our experience, providing seed funding to new organizations (particularly organizations started by younger and less experienced founders) often leads to complications that aren’t present in funding academic research or career transition grants. We prefer to think carefully about seeding new organizations, and have a different and higher bar for funding someone to start an org than for funding that same person for other purposes (e.g. career development and transition funding, or PhD and postdoc funding).
  - I’m very uncertain about how to think about seeding new research organizations and many related program strategy questions. I could certainly imagine developing a different picture upon further reflection — but having low capacity combines poorly with the fact that this is a complex type of grant we are uncertain about on a lot of dimensions. We haven’t had the senior staff bandwidth to develop a clear stance on the strategic or process level about this genre of grant, and that means that we are more hesitant to take on such grant investigations — and if / when we do, it takes up more scarce capacity to think through the considerations in a bespoke way rather than having a clear policy to fall back on.

New blog: Planned Obsolescence

Ajeya Cotra 27 Mar 2023 19:46 UTC

96 points

7 comments, 1 min read

(www.planned-obsolescence.org)

Ajeya Cotra 26 Jan 2023 2:06 UTC
LW: 34 AF: 16
in reply to: habryka’s comment on: Thoughts on the impact of RLHF research

my guess is most of that success is attributable to the work on RLHF, since that was really the only substantial difference between Chat-GPT and GPT-3

I don’t think this is right: the main hype effect of ChatGPT over previous models feels like it came just from its being in a convenient chat interface that was easy to use and free. My guess is that if you did a head-to-head comparison of RLHF and kludgey random hacks involving imitation and prompt engineering, they’d seem similarly cool to a random journalist / VC and generate similar excitement.

Ajeya Cotra 14 Dec 2022 21:16 UTC
LW: 4 AF: 3
in reply to: Richard_Ngo’s comment on: ricraz’s Shortform

I strongly disagree with the “best case” thing. Like, policies could just learn human values! It’s not that implausible.

Yes, sorry, “best case” was oversimplified. What I meant is that generalizing to want reward is in some sense the model generalizing “correctly;” we could get lucky and have it generalize “incorrectly” in an important sense in a way that happens to be beneficial to us. I discuss this a bit more here.

But if Alex did initially develop a benevolent goal like “empower humans,” the straightforward and “naive” way of acting on that goal would have been disincentivized early in training. As I argued above, if Alex had behaved in a straightforwardly benevolent way at all times, it would not have been able to maximize reward effectively.

That means even if Alex had developed a benevolent goal, it would have needed to play the training game as well as possible—including lying and manipulating humans in a way that naively seems in conflict with that goal. If its benevolent goal had caused it to play the training game less ruthlessly, it would’ve had a constant incentive to move away from having that goal or at least from acting on it.[35] If Alex actually retained the benevolent goal through the end of training, then it probably strategically chose to act exactly as if it were maximizing reward.

This means we could have replaced this hypothetical benevolent goal with a wide variety of other goals without changing Alex’s behavior or reward in the lab setting at all—“help humans” is just one possible goal among many that Alex could have developed which would have all resulted in exactly the same behavior in the lab setting.

If I had to try point to the crux here, it might be “how much selection pressure is needed to make policies learn goals that are abstractly related to their training data, as opposed to goals that are fairly concretely related to their training data?”...As usual, there’s the human analogy: our goals are very strongly biased towards things we have direct observational access to!)

I don’t understand why reward isn’t something the model has direct access to—it seems like it basically does? If I had to say which of us were focusing on abstract vs concrete goals, I’d have said I was thinking about concrete goals and you were thinking about abstract ones, so I think we have some disagreement of intuition here.

Even setting aside this disagreement, though, I don’t like the argumentative structure because the generalization of “reward” to large scales is much less intuitive than the generalization of other concepts (like “make money”) to large scales—in part because directly having a goal of reward is a kinda counterintuitive self-referential thing.

Yeah, I don’t really agree with this; I think I could pretty easily imagine being an AI system asking the question “How much reward would this episode get if it were sampled for training?” It seems like the intuition that this is weird and unnatural is doing a lot of work in your argument, and I don’t really share it.

Ajeya Cotra 14 Dec 2022 19:14 UTC
LW: 3 AF: 2
in reply to: Lauro Langosco’s comment on: ricraz’s Shortform
Yeah, I agree this is a good argument structure—in my mind, maximizing reward is both a plausible case (which Richard might disagree with) and the best case (conditional on it being strategic at all and not a bag of heuristics), so it’s quite useful to establish that it’s doomed; that’s the kind of structure I was going for in the post.

Ajeya Cotra 14 Dec 2022 19:03 UTC
LW: 7 AF: 6
in reply to: Richard_Ngo’s comment on: ricraz’s Shortform
Note that the “without countermeasures” post consistently discusses both possibilities (the model cares about reward or the model cares about something else that’s consistent with it getting very high reward on the training dataset). E.g. see this paragraph from the above-the-fold intro:

Once this progresses far enough, the best way for Alex to accomplish most possible “goals” no longer looks like “essentially give humans what they want but take opportunities to manipulate them here and there.” It looks more like “seize the power to permanently direct how it uses its time and what rewards it receives—and defend against humans trying to reassert control over it, including by eliminating them.” This seems like Alex’s best strategy whether it’s trying to get large amounts of reward or has other motives. If it’s trying to maximize reward, this strategy would allow it to force its incoming rewards to be high indefinitely.[6] If it has other motives, this strategy would give it long-term freedom, security, and resources to pursue those motives.

See also the section “Even if Alex isn’t ‘motivated’ to maximize reward…”. I do place a ton of emphasis on the fact that Alex enacts a policy which has the empirical effect of maximizing reward, but that’s distinct from being confident about the motivations that give rise to that policy. I believe Alex would try very hard to maximize reward in most cases, but this could be for either terminal or instrumental reasons.

With that said, for roughly the reasons Paul says above, I think I probably do have a disagreement with Richard: I think that caring about some version of reward is pretty plausible (~50% or so). It seems pretty natural and easy to grasp to me, and because I think there will likely be continuous online training, the argument that there’s no notion of reward on the deployment distribution doesn’t feel compelling to me.
