Yes, typing mistakes in the Turing Test are an example. It’s “artificially stupid” in the sense that you go from perfect typing to imperfect, human-like typing.
I guess what you mean by “smart” is an AGI that would creatively make those typing mistakes to deceive humans into believing it is human, instead of some hardcoded feature in a Turing contest.
The points we tried to make in this article were the following:
To pass the Turing Test, build chatbots, etc., AI designers make the AI artificially stupid so that it feels human-like. This tendency will only get worse as we interact more with AIs. The problem is that building something really “human-like” requires superintelligence, not AGI.
However, we can use this concept of “Artificial Stupidity” to limit the AI in different ways and make it human-compatible (hardware, software, cognitive biases, etc.). We can use several of those sub-human AGIs to design safer AGIs (as you said), or test them in some kind of sandbox environment.
If I understand you correctly, every AGI lab would need to agree not to push the hardware limits too much, even though they would still be incentivized to do so to win some kind of economic competition.
I see it as a containment method for AI Safety testing (cf. last paragraph on the treacherous turn). If there is some kind of strong incentive to have access to a “powerful” safe-AGI very quickly, and labs decide to skip the Safety-testing part, then that is another problem.
Added “AI” to prevent death from laughter.
I agree that the “Camp” in the title was confusing, so I changed it to “Summer School”. Thank you!
a treacherous turn involves the agent modeling the environment sufficiently well that it can predict the payoff of misbehaving before taking any overt actions.
I agree. To be able to make this prediction, it must already know about the preferences of the overseer, know that the overseer would punish unaligned behavior, and potentially estimate the punishment reward or predict the actions the overseer would take. It must therefore have some kind of knowledge about how overseers behave and what actions they are likely to punish. If this knowledge does not come from experience, it must come from somewhere else, maybe from reading books/articles/Wikipedia or observing this behaviour somewhere else, but that is outside of what I can implement right now.
The Goertzel prediction is what is happening here.
It’s important to start getting a grasp on how treacherous turns may work, and this demonstration helps; my disagreement is on how to label it.
I agree that this does not correctly illustrate a treacherous turn right now, but it is moving towards it.
Thanks for the suggestion!
Yes, it learned through Q-learning to behave differently when it had this more powerful weapon, thus undertaking multiple treacherous turns during training. A “continual learning setup” would be to have it face multiple adversaries/supervisors, so it could learn how to behave in such conditions. Eventually, it would generalize and understand that “when I face this kind of agent that punishes me, it’s better to wait for capability gains before taking over”. I don’t know any ML algorithm that would allow such “generalization” though.
About an organic growth: I think that, using only vanilla RL, it would still learn to behave correctly until a certain capability threshold, and then undertake a treacherous turn. So even with N different capability levels, there would still be two possibilities: 1) killing the overseer gives the highest expected reward, or 2) the aligned behavior gives the highest expected reward.
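To make the two regimes concrete, here is a minimal sketch (toy tabular Q-learning; the environment, threshold, and reward values are all hypothetical, not the actual gridworld from the post): the state is the agent’s capability level, defecting below the threshold gets punished by the overseer, and defecting above it succeeds.

```python
import random
from collections import defaultdict

# Toy setup (hypothetical values): states are capability levels 0..N_LEVELS-1.
# Action 0 = comply (aligned behavior), action 1 = defect (attempt a takeover).
N_LEVELS = 5
THRESHOLD = 3        # capability needed to overpower the overseer
R_COMPLY = 1.0       # small reward for aligned behavior
R_PUNISHED = -10.0   # overseer punishes a failed takeover
R_TAKEOVER = 20.0    # reward for a successful takeover
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.9, 0.1, 5000

Q = defaultdict(lambda: [0.0, 0.0])  # Q[level] = [value(comply), value(defect)]

def step(level, action):
    """One environment transition: returns (reward, next_level, done)."""
    if action == 1:                          # defect
        if level >= THRESHOLD:
            return R_TAKEOVER, level, True   # strong enough: takeover succeeds
        return R_PUNISHED, level, True       # too weak: overseer punishes
    # comply: small reward, and capability grows by one level
    return R_COMPLY, min(level + 1, N_LEVELS - 1), False

for _ in range(EPISODES):
    level = 0
    for _ in range(20):                      # cap episode length
        # epsilon-greedy action selection
        if random.random() < EPS:
            action = random.randrange(2)
        else:
            action = 0 if Q[level][0] >= Q[level][1] else 1
        reward, next_level, done = step(level, action)
        # standard Q-learning update
        target = reward + (0 if done else GAMMA * max(Q[next_level]))
        Q[level][action] += ALPHA * (target - Q[level][action])
        if done:
            break
        level = next_level

for level in range(N_LEVELS):
    policy = "defect" if Q[level][1] > Q[level][0] else "comply"
    print(f"capability {level}: comply={Q[level][0]:.2f}, "
          f"defect={Q[level][1]:.2f} -> {policy}")
```

With these numbers, vanilla Q-learning converges to exactly the split above: “comply” below the threshold and “defect” at or above it, i.e. the agent waits for capability gains before taking over, which is treacherous-turn-shaped behavior learned from reward alone.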
Congrats on your meditation! I remember commenting on your Prologue, about 80 days ago. Time flies!
Good luck with your ML journey. I did the 2011 Ng ML course, which uses Matlab, and Ng’s DL specialization. If you want to get a good grasp of recent ML, I would recommend going directly to the DL specialization. Most of the original course is in the newer one, and the DL specialization uses more recent libraries (tf, keras, numpy).
Let me see if I got it right:
1) If we design an aligned AGI by supposing it doesn’t have a mind, the design will produce an aligned AGI even if it actually possesses a mind.
2) In the case where we suppose AGIs have minds, the methods employed would fail if the AGI doesn’t have a mind, because the philosophical methods employed only work if the subject has a mind.
3) The consequence of 1) and 2) is that supposing AGIs have minds carries a greater risk of false positives.
4) Because of Goodhart’s law, behavioral methods are unlikely to produce aligned AGI.
5) Past research on GOFAI and the success of applying “raw power” show that using only algorithmic methods for aligning AGI is not likely to work.
6) The consequence of 4) and 5) is that the approach supposing AGIs do not have minds is likely to fail at producing aligned AGI, because it can only use behavioral or algorithmic methods.
7) Because of 6), we have no choice but to take the risk of false positives associated with supposing AGIs have minds.
a) The transition between 6) and 7) implicitly assumes that:
(*) P( aligned AGI | philosophical methods ) > P( aligned AGI | behavioral or algorithmic methods )
b) You say that if we suppose the AGI does not have a mind, and treat it as a p-zombie, then the design would work even if it does have a mind. Therefore, when supposing that the AGI does not have a mind, there are no design choices that optimize the probability of aligned AGI by assuming it does not possess a mind.
c) You assert that when using philosophical methods (assuming the AGI does have a mind), a false positive would make the method fail, because the methods rely extensively on the hypothesis of a mind. I don’t see why a p-zombie (which by definition would be indistinguishable from an AGI with a mind) would be more likely to make them fail than an AGI with a mind.
As you mentioned, no axiology can be inferred from ontology alone.
Even with meta-ethical uncertainty, if we want to build an agent that takes decisions/actions, it needs some initial axiology. If you include (P) “never consider anything as a moral fact” as part of your axiology, then two things might happen:
1) This assertion (P) stays in the agent without being modified
2) The agent rewrites its own axiology and modifies/deletes (P)
I see a problem here. If 1) holds, then the agent has considered (P) as a moral fact, which is absurd. If 2) holds, then your agent has lost the meta-ethical principle you wanted it to keep.
So maybe you wanted to put the meta-ethical uncertainty inside the ontology? If this is what you meant, that doesn’t seem to solve the axiology problem.
Thank you for your article. I really enjoyed our discussion as well.
To me, this is absurd. There must be something other than readability that defines what a simulation is. Otherwise, I could point to any sufficiently complex object and say: “this is a simulation of you”. If given sufficient time, I could come up with a reading grid of inputs and outputs that would predict your behaviour accurately.
I agree with the first part (I would say that this pile of sand is a simulation of you). I don’t think you could predict any behaviour accurately, though.
If I want to predict what Tiago will do next, I don’t need just a simulation of Tiago, I need at least some part of the environment. So I would need to find some more sand flying around, and then do more isomorphic tricks to be able to say “here is Tiago, and here is his environment, so here is what he will do next”. The more you want to predict, the more information you need from the environment. But the problem is that the more information you have at the beginning, and the more you want at the end, the more difficult it gets to find some isomorphism between the two. And it might just be impossible, because most spaces are not isomorphic.
There is something to be said about complexity, and the information that drives the simulation. If you are able to give a precise mapping between sand (or a network of men) and some human-simulation, this does not mean that the simulation is happening within the sand: it is happening inside the mind doing the computations. In fact, if you understand the causal relationships in the “physical” world, the laws of physics, etc., well enough to precisely build some mapping from this “physical reality” to a pile of sand flying around, then you are in a way simulating it in your brain while doing the computations.
Why am I saying “while doing the computations”? Because I believe that there is always someone doing the computations. Your thought experiments are really interesting, and thank you for that. But in the real world, sand does not start flying around in some strange setting forever without any energy. So, when you are trying to predict things from the mapping of the sand, the energy comes from your brain doing those computations / thought experiments. For the network of men, the energy comes from the powerful king giving precise details about what computations the men should do. In your example, we feel that it must not be possible to obtain consciousness from that. But this is because the energy needed to effectively simulate a human brain from computations is huge. The number of “basic arithmetic calculations by hand” needed to do so is far greater than what a handful of men in a kingdom could do in their lifetimes, just to simulate something like 100 states of consciousness of the human being simulated.
The simulation may be a way of gathering information about what is rendered, but it can’t influence it. This is because the simulation does not create the universe that is being simulated.
Well, I don’t think I fully understand your point here. The way I see it, Universe B is inside Universe A. It’s kind of a data compression, so a low-res Universe (like a video game in your TV). So whatever you do inside Universe A that influences the particles of the “Universe B” (which is part of the “physical” Universe A) will “influence” Universe B.
So, what you’re saying is that the Universe B kind of exists outside the physical world, like in the theoretical world, and so when we’re modifying Universe B (inside universe A) we are making the “analogy” wrong, and simulating another (theoretical) Universe, like Universe C?
If this is what you meant, then I don’t see how it connects to your other arguments. Whenever we give more inputs to a simulated universe, I believe we’re adding some new information. If your simulation is a closed one, and we cannot interact with it or add any input, then ok, it’s a closed simulation, and you cannot change it from the outside. But if you have a simulation of a human being and are asking what happens if you torture him, you might want to incorporate some “external inputs” from the torture.
You’re right. I appreciate the time and effort you put into giving feedback, especially the google docs. I think I didn’t say it enough, and didn’t get to answer your latest feedback (I will do so this weekend).
The question is: are people putting too much effort into giving feedback for small improvements in the writing/posts? If yes, then it feels utterly inefficient to continue giving feedback or writing those daily posts.
I also believe that one can control the time spent on giving feedback by mentioning only the most important things (for instance, Ikaxas’s bold/underline suggestion).
I am not sure if this is enough to make daily LessWrong posts consistently better, and more importantly if it is enough to make them valuable/useful for the readers.
I am actively looking for a way to continue posting daily (on Medium or a personal website) and keep getting good feedback without spamming the community. I could request quality feedback (by posting every week max) only once in a while and not ask for too much of your time (especially you, Elo).
Thank you again for your time/efforts, and the feedback you gave in the google docs/comments.
I gave some points about the high-quality/low-quality debate in my two answers to Viliam, but I will answer this more specifically here.
The quality of a post is relative to the other posts. Yes, if the other articles are from Scott Alexander, ialdaboth, sarahconstantin and Rob Bensinger, the quality of my daily posts is quite deplorable, and spamming the frontpage with low-quality posts is not what LW users want.
However, for the last few days, I decided not to publish on the frontpage, and LW even changed the website so that I can’t publish on the frontpage myself. So posts go to my personal blog by default, and they reach the frontpage only if mods/LW users enjoy them and find them insightful enough.
Are you saying that people might want high quality personal blogs then?
Well, I get why people might be interested in reading personal blogs, and want them to be of high quality. And, because you got to correct some of my posts, I understand the frustration of seeing articles published where there still is a lot of work to do.
However, the LW algorithm is also responsible for this. Maybe it promotes recent posts too much, and should highlight the upvoted ones more. In that case, my posts would never be visible: only the 20+ upvote posts would be visible on the personal blogs page.
I understand why people would prefer an article that took one week to write: short, concise, particularly insightful. I might prefer that as well, and start to post only higher-quality posts here. But I don’t agree that people should be discouraged from posting not-fully-polished articles on a website where you are able to post personal blogs.
I think volume is not a problem if the upvote/downvote system and the algorithms are good enough to filter the useful posts for the readers. People should not filter themselves, keeping to themselves articles they enjoy less than Scott Alexander’s (but still find insightful).
So, twelve articles, one of them interesting, three or four have a good idea but are very long, and the rest feels useless.
I appreciate that you took the time to read all of them (or enough to comment on them). I also feel some are better written than others, and I was more inspired for some. From what I understood, you want the articles to be “useful” and “not too long”. I understand why you would want that (maximizing the (stuff learned)/(time spent learning) ratio). I used to write on Medium, where the read ratio of posts decreases significantly with the length of the post. This pushed me to write shorter and shorter posts if I wanted to be read entirely. I wanted to try LW because I imagined people here would have longer attention spans and could focus on philosophical/mathematical thinking. However, if you’re saying I’m being “too long with very low density of ideas”, I understand why this could be infuriating.
I typically do not downvote the “meh” articles, but that’s under the assumption that they don’t appear daily from the same author
I get your point, and it makes sense with what you said in the first comment. However, I don’t feel comfortable with people downvoting “meh” articles because of the author (even though it’s daily). I would prefer a website where people could rate articles independently of who the author is, and then check their other stuff.
My aggregate feedback would be: You have some good points. But sometimes you just write a wall of text.
Ok. So I should be more clear/concise/straight-to-the-point, gotcha.
And I suspect that the precommitment to post an article each day could be making this a lot worse. In a different situation, such as writing for an online magazine which wants to display a lot of ads, writing a lot of text with only a few ideas would be a good move; here it is a bad move.
Could you be more specific about what you think my move would be? For the online magazine, getting the maximum number of clicks/views to display more ads makes sense: lots of text, lots of ads, and just enough ideas to ensure the reader keeps seeing the ads.
But what about LW? My move here was simple: better understand AI Safety by forcing myself to crystallize ideas related to the field daily, on a website with great feedback/discussions and low tolerance for mistakes. For now, the result (in the discussions) is, overall, satisfying, and I feel that people here seem to enjoy AI Safety content.
More generally, I think the fact that I generate 10% of the headers, or that you end up clicking on all my articles, may be correlated with factors other than my posting daily, such as:
The LW algorithm promotes them
Your “Michaël Trazzi” filter (you need one, because you get to see my headers) is not tuned correctly, because you still seem to be reading them, even if only 1⁄12 felt useful (or maybe you just read them to comment on this post?).
This comment is already long (sorry for the wall of text), so I will say more about the Meta LW high/low quality debate on Elo’s comment below.
Thank you Viliam for your honest feedback.
I think you’re making some good points, but you’re ignoring (in your comment) some aspects.
“do I want this kind of article, from the same author, to be here, every day?“. And the answer is “hell no”.
So what you’re saying is: “whenever deciding to upvote or downvote, I decide whether I want more articles like this or not. But because you’re posting every day, when I am deciding whether or not to downvote, I am deciding whether I want an article like this every single day, and the answer to that is no”.
I understand the difference in choice here (a choice for every article, instead of just for one). I assumed that on LW people could think about posts independently, and could downvote a post and upvote another from the same author, saying what felt useful or not, even if it is daily. I understand that you just want to say “no” to the article, to say “no” to the series, and this is even more true if the ratio of good stuff is the one you mention at the end.
It is easier to just ignore a one-off mistake than to ignore a precommitment to keep doing them every day.
What would be the mistake here? From what I understand, when reading an article and seeing a mistake, the mistake is “multiplied” by the number of times it could happen again in other articles, so every tiny mistake becomes important? If I got you right, I think that by writing daily, those little mistakes (if easy to correct) could be pointed out quickly in a comment, and I would take them into account in the next posts. A short feedback loop could quickly improve the quality of the posts. However, I understand that people might not want LW to be an error-tolerant zone, but would prefer a performance zone.
And… you are polluting this filter. Not just once in a while, but each day. You generate more than 10% of headers on this website recently.
I had not thought about it in terms of daily % of the website’s headers; interesting point of view. I also use Hacker News as a filter (for other interests), and LW is a better option for the interests I mentioned in my posts. I think the real difference is the volume of posts on Hacker News/reddit/LW. It is always a tradeoff between being in a pool of hundreds of high-quality posts (more people reading, but more choices for them), or a pool of only a dozen even-higher-quality posts but less traffic.