Michaël Trazzi
I think that Joseph Miller is one of the most competent people in the space and I think PauseAI UK should continue to get funded.
However, there are some things that I think would be worth clarifying in your post:
On the Political work
We wrote a memo which was sent to all MPs prior to the debate and drafted some of the speeches, putting us in a strong position to work with those MPs when proposing amendments to the Cyber Security and Resilience Bill.
How do you know it put you in a stronger position? Like how many MPs do you know used some of your speech notes, and how many strong relationships did you develop?
In October we held a screening in the UK Parliament of filmmaker Michaël Trazzi’s documentary about SB-1047, the proposed California AI legislation. This helped to inform MPs and Peers about the kinds of AI legislation that could be in a UK AI bill
Thanks again for hosting this! Can you clarify for readers how many MPs actually came to the movie screening & talked to you guys? I think it makes a difference to know if it was like 1-2 MPs or dozens. (Note: I remember you telling me it was more like the former, happy for you to rectify).
On The Hypothetical Scenarios
Now, regarding the hypothetical scenarios:
PauseAI UK has 10,000 highly dedicated volunteers who act as a dominant lobbying force on AI policy matters.
How realistic is this scenario? How many dedicated volunteers does PauseAI currently have? Since the post gives neither a timeline nor a likelihood for this scenario, I’m wondering if you’re thinking of a 0.1% ideal scenario over 5 years, or if you’re actually 50% confident this could happen in 2-3 years. If you currently have 5-10 dedicated volunteers spending several hours (> 5 hours) a week working on PauseAI UK, then 10,000 would be roughly 10-11 doublings, so ~6 years if we believe the number of dedicated volunteers follows the same exponential growth (doubling every 7 months) as you claim for protests?
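To make that arithmetic easy to check, here is a minimal Python sketch. The 5-10 starting range, the 10,000 target and the 7-month doubling time are the figures from the paragraph above; the function names are only illustrative.

```python
import math


def doublings_needed(current: float, target: float) -> float:
    """Number of doublings required to grow from `current` to `target`."""
    return math.log2(target / current)


def years_to_target(current: float, target: float,
                    months_per_doubling: float = 7.0) -> float:
    """Years needed if the count doubles every `months_per_doubling` months."""
    return doublings_needed(current, target) * months_per_doubling / 12


# Assumed starting points: 5 or 10 dedicated volunteers today.
for volunteers_now in (5, 10):
    n = doublings_needed(volunteers_now, 10_000)
    years = years_to_target(volunteers_now, 10_000)
    print(f"{volunteers_now} volunteers: {n:.1f} doublings, {years:.1f} years")
```

Under these assumptions the answer lands somewhere around six years, depending on which end of the 5-10 range you start from.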
Now regarding the other scenario:
PauseAI protests double in size every 7 months as AI capability itself improves exponentially. [...] PauseAI UK organises a march in Westminster with 1 million attendees and dominates headlines in the British press. The prime minister is obliged to respond and commits to opening negotiations for a global pause agreement.
OK so basically, PauseAI UK continues doing protests, the 300-person protest of early 2026 becomes 600 people by October 2026, and so forth and so on until… Feb 2029 when we get TED-AI (if we believe the AI Futures project people)? In that case, we’d have ~4 more doublings, aka ~10k people protesting, so the 1M number is actually a 100x of the default case without a warning shot.
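A quick sketch of that extrapolation, assuming the 300-person starting point in Feb 2026 and an exact 7-month doubling time (both taken from the scenario above; the helper name is hypothetical):

```python
def projected_attendance(start: float, months: float,
                         months_per_doubling: float = 7.0) -> float:
    """Attendance after `months` if it doubles every `months_per_doubling` months."""
    return start * 2 ** (months / months_per_doubling)


# Feb 2026 -> Feb 2029 is 36 months.
months_elapsed = (2029 - 2026) * 12
baseline = projected_attendance(300, months_elapsed)

# How far the 1M-attendee march is from the trend-only baseline.
shortfall_factor = 1_000_000 / baseline

print(f"Projected attendance in Feb 2029: {baseline:,.0f}")
print(f"1M march is ~{shortfall_factor:.0f}x the projection")
```

With a continuous exponential the projection comes out slightly above 10k, so the gap to 1M is a bit under 100x; rounding to whole doublings gives the ~10k and ~100x figures used above.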
To estimate the expected value of that scenario, and how useful having PauseAI UK’s impact was there, we should consider:
How likely (P_dou) is that trend of doubling every 7 months to hold in the future?
How likely (P_war) is it that we get a specific warning shot big enough to give you a big enough increase (eg. the 100x mentioned before) in protest size?
How likely (P_gov) is it that your government will actually do something based on the protest?
So very roughly, the probability of that scenario would be about P_dou x P_war x P_gov. In my view, if I were to give very rough numbers, that’d be about 0.5 x 0.1 x 0.1 = 0.5% likely.
And then you’d also need to factor in how likely it would be for a protest of this size to happen without PauseAI, and how much leverage the UK would have to lead us to an actual treaty anyway.
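For anyone who wants to plug in their own numbers, here is a minimal sketch of the estimate. It assumes the three probabilities can simply be multiplied (i.e. each is read as conditional on the previous step holding), which is a simplification; the function name is illustrative.

```python
def scenario_probability(p_dou: float, p_war: float, p_gov: float) -> float:
    """Rough probability that the trend holds, a warning shot occurs,
    and the government acts on the protest, multiplied together."""
    return p_dou * p_war * p_gov


# The very rough numbers given above.
p = scenario_probability(p_dou=0.5, p_war=0.1, p_gov=0.1)
print(f"Scenario probability: {p:.1%}")
```

The counterfactual (would the protest happen without PauseAI?) and UK leverage would enter as further multiplicative discounts on top of this.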
Regarding the Exponentials and 7 months doubling
You probably have way more context on this than me, but from looking at the data quickly my reading is:
You basically had the same turnout in Nov 2023 and May 2024
Then in Feb 2025 there was a surprisingly low datapoint
Then in Jun 2025 you had this protest in front of DeepMind where 100+ people signed up but you got closer to 75 in turnout?
Then in Feb 2026 you organized this event that got 227 signups on Luma, and ~300 people showed up
Some comments on this:
For the last point, you mix Pull the Plug and PauseAI in the number of people who came, but you don’t mix them in the signups. So what seems like more people showing up than signing up is actually because Pull the Plug had a different sign up page.
From what you say, about the same number of people came from Pull the Plug as from PauseAI. So about 150 each.
So essentially, by the metric of how many people are actively asking for a Pause, it seems like we were closer to 150 by Feb 2026. So the number of attendees specifically for pausing was something like 20, 20, 40, 150 for these 4 datapoints over 2.5 years, which is not clearly a 7-month doubling / exponential?
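Here is a rough sketch of what those endpoints imply. It only uses the first and last datapoints (about 20 attendees in Nov 2023 and about 150 pause-specific attendees in Feb 2026, as estimated above), so it says nothing about the shape of the curve in between; the function names are illustrative.

```python
import math


def months_between(start_year: int, start_month: int,
                   end_year: int, end_month: int) -> int:
    """Whole calendar months between two (year, month) pairs."""
    return (end_year - start_year) * 12 + (end_month - start_month)


def implied_doubling_time(start: float, end: float, months: float) -> float:
    """Doubling time (in months) implied by growing from `start` to `end`."""
    return months / math.log2(end / start)


elapsed = months_between(2023, 11, 2026, 2)  # Nov 2023 -> Feb 2026

# Doubling time implied by 20 -> 150 attendees.
implied = implied_doubling_time(20, 150, elapsed)

# What a strict 7-month doubling would have predicted from 20 attendees.
predicted = 20 * 2 ** (elapsed / 7)

print(f"Elapsed months: {elapsed}")
print(f"Implied doubling time: {implied:.1f} months")
print(f"7-month doubling would predict ~{predicted:.0f} attendees")
```

On these assumptions the endpoints alone suggest a doubling time somewhat longer than 7 months, and the 7-month trend would predict roughly the combined ~300 turnout rather than the ~150 pause-specific attendees.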
Again, I don’t want to say that your work is not valuable. I think AI Safety activism is probably one of the most neglected things to do. And I really hope you get funded.
But I think there are many points in this post that would be worth clarifying.
[Note: wrote this quickly, might include errors]
I do agree that “asserts” was too strong. Changed to “writes about the possibility of needing to coordinate with governments and other labs before proceeding further” to stay closer to the quote.
That said, I still think interpreting the “before” part as a statement about pausing (even for a short time) is a reasonable interpretation.
In a blogpost posted yesterday, Sam Altman writes about the possibility of needing to coordinate with governments and other labs before proceeding further (emphasis mine):
We expect there will be periods where we need to collaborate with governments, international agencies, and other AGI efforts to ensure that we have sufficiently solved serious alignment, safety, or societal problems before proceeding further with our work.
This comes one month after Stop The AI Race’s March 21st protest in front of OpenAI, Anthropic and xAI (which I organized), asking Sam Altman (alongside other CEOs) to make a statement on pausing frontier AI development (conditionally), and a follow-up direct message on March 25th asking Sam Altman to clarify his take on conditionally pausing AI. (The Musk v. Altman trial also begins today, which may be relevant to the timing.)
Other parts of the blogpost also point towards more coordination with other labs and governments:
“we need to ensure that key decisions about AI are made via democratic processes and with egalitarian principles, and not just made by AI labs.”
“AI will introduce new risks, and we will work with other companies, ecosystems, governments, and society to solve them.”
“No AI lab can ensure a good future alone. For an obvious example, there may be extremely capable models that make it easier to create a new pathogen, and we need a society-wide approach to defend against this with pathogen-agnostic countermeasures.”
I’ve been pretty impressed with ControlAI’s team & ability to talk to many policymakers in the UK & US overall.
At the moment, we are cautiously optimistic: in the past 5 months, with ~1 staff member,[16] we’ve managed to personally meet with and brief 18 members of Congress, as well as over 90 Congressional offices.
Footnote 16 says:
1 member for most of this period; the 2nd member joined in the past month.
How successful do you expect the third, fourth, fifth, etc. person you hire to be at getting those meetings?
Yeah, some people who have been flyering for this have noticed that most people just take a picture of the flyer & don’t bother to actually RSVP to the protest (sometimes for privacy reasons). We’ll see how many people end up coming!
More people show up on weekends yeah
Removed the quietly and linked to Holden’s post, thanks!
In two days (March 21st, 12-4pm), about 140 of us (event link) will be marching on Anthropic, OpenAI and xAI in SF asking the CEOs to make statements on whether they would stop developing new frontier models if every other major lab in the world credibly does the same. This comes after Anthropic removed its commitment to pause development from their RSP.
We’ll be starting at 500 Howard St, San Francisco (Anthropic’s Office, full schedule and more info here). This is shaping up to be the biggest US AI Safety protest to date, with a coalition including Nate Soares (MIRI), David Krueger (Evitable), Will Fithian (Berkeley Professor) and folks representing PauseAI, QuitGPT, Humans First.
Some questions I have:
1. Compute bottleneck
The model says experiment compute becomes the binding constraint once coding is fast. But are frontier labs actually compute-bottlenecked on experiments right now? Anthropic runs inference for millions of users while training models. With revenue growing, more investment coming in, and datacenters being built, couldn’t they allocate e.g. 2x more to research compute this year if they wanted?
2. Research taste improvement rate
The model estimates AI research taste improvement based on how quickly AIs have improved in a variety of metrics.
But researchers at a given taste level can now run many more experiments because Claude Code removes the coding bottleneck.
More experiment output means faster feedback, which in turn means faster taste development. So the rate at which human researchers develop taste should itself be accelerating. Does your model capture this? Or does it assume taste improvement is only a function of effective compute, not of experiment throughput?
3. Low-value code
Ryan’s argument (from his October post) is that AI makes it cheap to generate code, so people generate more low-level code they wouldn’t have otherwise written.
But here’s my question: if the marginal code being written is “low-value” in the sense of “wouldn’t have been worth a human’s time before,” isn’t that still a real productivity gain, if, say, researchers can now run a bunch of Claude Code agent instances to run experiments instead of having to interface with a bunch of engineers?
4. What AIs Can’t Do
The model treats research taste as qualitatively different from coding ability. But what exactly is the hard thing AIs can’t do? If it’s “generating novel ideas across disciplines” or “coming up with new architectures”, these seem like capabilities that scale with knowledge and reasoning, both of which are improving. IIRC there’s some anecdotal evidence of novel discoveries, such as an LLM solving an Erdős problem, and someone from the Scott Aaronson sphere discussing AI contributions to something like quantum physics problems? Not sure.
If it’s “making codebases more efficient”, AIs already beat humans at competitive programming. I’ve seen some posts on LW where people timed themselves against an AI on something the AI should be able to do, and they beat the AI. But intuitively it does seem to me that models are getting better at the general “optimizing codebases” thing, even if it’s not quite best-human-level yet.
5. Empirical basis for β (diminishing returns)
The shift from AI 2027 to the new model seems to come partly from “taking into account diminishing returns”, aka the Jones model assumption that ideas get harder to find. What data did you use to estimate β? And given we’re now in a regime with AI-assisted research, why should historical rates of diminishing returns apply going forward?
Demis Hassabis finally agreed that he would pause if everyone else also paused.
https://x.com/emilychangtv/status/2013726877706313798?s=20
I’m not sure it will result in a wake-up call for AI researchers.
See this thread yesterday between Chris Painter and Dean Ball:
Chris: I have the sense that, as of the last 6 months, a lot of tech now thinks an intelligence explosion is more plausible than they did previously. But I don’t feel like I’m hearing a lot about that changing people’s minds on the importance of alignment and control research.
Dean: do people really think alignment and control research is unimportant? it seems like a big part of why opus is so good is the approach ant took to aligning it, and like basically everyone recognizes this?
Chris: I’m not sure they think it’s unimportant. It’s more that around a year ago a lot of people would’ve said something like “Well, some people are really nervous about alignment and control research and loss of control etc, but that’s because they have this whole story of AI foom and really dramatic self-improvement. I think that story is way overstated, these models just don’t speed me up that much today, and I think we’ll have issues with autonomy for a long time, it’s really hard.” So, they often stated their objection in a way that made it sound like rapid progress on AI R&D automation would change their mind. To be clear, I think there are stronger objections that they could have raised and could still raise, like “we will then move to hardware bottlenecks, which will require AI proliferation for a true speed up to materialize”. Also, sorry if it’s rough for me to not be naming specific names, would just take time to pull examples.
Using @ryan_greenblatt’s updated 5-month doubling time: we reach the 1-month horizon from AI 2027 in ~5 doublings (Jan 2028) at 50% reliability, and ~8 doublings (Apr 2029) at 80% reliability. If I understand correctly, your model uses 80% reliability while also requiring 30x cheaper and faster than humans. It does seem like if the trend holds, by mid-2029 the models wouldn’t be much more expensive or slower. But I agree that if a lab tried to demonstrate “superhuman coder” on METR by the end of next year using expensive scaffolding / test-time compute (similar to o1 on ARC-AGI last year), it would probably exceed 30x human-cost, even if already 30x faster.
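A small sketch to sanity-check those dates, assuming a fixed 5-month doubling time. The comment above does not state the starting date, so the baseline printed below is only what the two milestone dates imply, not a sourced figure; the helper name is hypothetical.

```python
MONTHS_PER_DOUBLING = 5  # the updated doubling time cited above


def add_months(year: int, month: int, delta: int) -> tuple[int, int]:
    """Shift a (year, month) pair by `delta` months (may be negative)."""
    index = year * 12 + (month - 1) + delta
    new_year, month_index = divmod(index, 12)
    return new_year, month_index + 1


# 50% reliability milestone: ~5 doublings, reached in Jan 2028.
fifty_pct = (2028, 1)

# 80% reliability needs 3 more doublings (8 instead of 5).
eighty_pct = add_months(*fifty_pct, (8 - 5) * MONTHS_PER_DOUBLING)

# Baseline date implied by counting 5 doublings back from Jan 2028.
implied_start = add_months(*fifty_pct, -5 * MONTHS_PER_DOUBLING)

print(f"80% reliability milestone: {eighty_pct}")
print(f"Implied baseline date: {implied_start}")
```

The two dates are consistent with each other: three extra doublings at 5 months each is 15 months, which takes Jan 2028 to Apr 2029.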
Fixed
Any updates on the 2025 numbers for Lighthaven? (cf. this table from last year’s fundraiser)
My guess at what’s happening here: for the first iterations of MATS (think MATS 2.0 at the Lightcone WeWork) you would have folks who were already into AI Safety for quite a long time and were interested in doing some form of internship-like thing for a summer. But as you run more cohorts (and make the cohorts bigger) then the density of people who have been interested in safety for a long time naturally decreases (because all the people who were interested in safety for years already applied to previous iterations).
Congrats on the launch!
I would add the main vision for this (from the website) directly in the post as quoted text, so that people can understand what you’re doing (& discuss).
I was trying to map out disagreements between people who are concerned enough about AI risk. Agreed that this represents only a fraction of the people who talk about AI risk, and that there are a lot of people who will use some of these arguments as false justifications for their support of racing.
EDIT: as TsviBT pointed out in his comment, OP is actually about people who self-identify as members of the AI Safety community. Given that, I think that the two splits I mentioned above are still useful models, since most people I end up meeting who self-identify as members of the community seem to be sincere, without stated positions that differ from their actual reasons for why they do things. I have met people who I believe to be insincere, but I don’t think they self-identify as part of the AI Safety community. I think that TsviBT’s general point about insincerity in the AI Safety discourse is valid.
You make a valid point. Here’s another framing that makes the tradeoff explicit:
Group A) “Alignment research is worth doing even though it might provide cover for racing”
Group B) “The cover problem is too severe. We should focus on race-stopping work instead”
Regarding the CNN interview
You say:
But the actual exchange on the downsides was (emphasis mine):
So Clark reframed the question in terms of verification and trust, but ended his answer on loss of control.
Similarly, you write:
But here you’re not quoting the rest of his answer, where he’s actually talking about taking our foot off the gas pedal and no longer accelerating (emphasis mine):
So while I agree that Dario has not been calling for a pause, it seems to me that Jack Clark’s post on RSI is more coherent with what he said in his CNN interview than your post suggests.
Regarding Anthropic Calling for a Pause
You also write that it’s part of a pattern of making “no concrete commitments”:
And that AI companies will not lead a slowdown:
But Jack Clark ends the RSI blogpost with an explicit commitment to organize conversations on RSI and coordination, and to publish the results:
I agree that it’s a much weaker commitment than the actual pause commitments we need. And obviously “organizing conversations” about RSI & “better options for coordination and deliberation” is definitely not the same as advocating for a pause.
But I do think Jack Clark is taking some steps here to open up the conversation on RSI & pausing.