Michaël Trazzi
Congrats on the launch!
I would add the main vision for this (from the website) directly in the post as quoted text, so that people can understand what you’re doing (& discuss).
I was trying to map out disagreements between people who are concerned enough about AI risk. Agreed that this represents only a fraction of the people who talk about AI risk, and that there are a lot of people who will use some of these arguments as false justifications for their support of racing.
EDIT: as TsviBT pointed out in his comment, OP is actually about people who self-identify as members of the AI Safety community. Given that, I think that the two splits I mentioned above are still useful models, since most people I end up meeting who self-identify as members of the community seem to be sincere, without stated positions that differ from their actual reasons for why they do things. I have met people who I believe to be insincere, but I don’t think they self-identify as part of the AI Safety community. I think that TsviBT’s general point about insincerity in the AI Safety discourse is valid.
You make a valid point. Here’s another framing that makes the tradeoff explicit:
Group A) “Alignment research is worth doing even though it might provide cover for racing”
Group B) “The cover problem is too severe. We should focus on race-stopping work instead”
I’d split things this way:
Group A) “Given that stopping the AI race seems nearly impossible, I focus on ensuring humanity builds safe superintelligence”
Group B) “Given that building superintelligence safely under current race dynamics seems nearly impossible, I focus on stopping the AI race”
Group C) “Given deep uncertainty about whether we can align superintelligence under race conditions or stop the race itself, I work to ensure both strategies receive enough resources.”
Another Anthropic employee told me that 90% of code being written by AI wasn’t a crazy claim. He said something like: “most of the work happens in the remaining 10%”.
How confident are you that safety researchers will be able to coordinate at crunch time, and that it won’t be, e.g., only safety researchers at one lab?
Without taking things like personal fit into account, how would you compare, say, doing prosaic AI safety research pre-crunch time to policy interventions that help coordination at crunch time (for instance, helping safety teams coordinate better at crunch time, or even buying more crunch time)?
Hi Mikhail, thanks for offering your thoughts on this. I think having more public discussion on this is useful and I appreciate you taking the time to write this up.
I think your comment mostly applies to Guido’s strike in front of Anthropic, and not to our hunger strike in front of Google DeepMind in London.
“Hunger strikes can be incredibly powerful when there’s a just demand, a target who would either give in to the demand or be seen as a villain for not doing so, a wise strategy, and a group of supporters. I don’t think these hunger strikes pass the bar. Their political demands are not what AI companies would realistically give in to because of a hunger strike by a small number of outsiders.”
I don’t think I have been framing Demis Hassabis as a villain, and if you think I did, it would be helpful if you added a source for why you believe this.
I’m asking Demis Hassabis to “publicly state that DeepMind will halt the development of frontier AI models if all the other major AI companies agree to do so,” which I think is a reasonable thing to ask given all the public statements he has made regarding AI Safety. I think that is indeed something a company such as Google DeepMind could give in to.
“A hunger strike can bring attention to how seriously you perceive an issue. If you know how to make it go viral, that is; in the US, hunger strikes are rarely widely covered by the media.”
I’m currently in the UK, and I can tell you that there have already been two pieces published on Business Insider. I’ve also given three interviews in the past 24 hours to journalists contributing to major publications. I’ll try to add links later if / once these get published.
“At the moment, these hunger strikes are people vibe-protesting. They feel like some awful people are going to kill everyone, they feel powerless, and so they find a way to do something that they perceive as having a chance of changing the situation.”
Again, I’m pretty sure I haven’t framed people as “awful”, and it would be great if you could provide sources for that statement. I also don’t feel powerless. My motivation for doing this was in part to provide support to Guido’s strike in front of Anthropic, which feels more like helping an ally and joining forces.
I actually find it empowering to be completely honest about what I think DeepMind should do to help stop the AI race, and to receive so much support from all kinds of people on the street, including employees from Google, Google DeepMind, Meta and Sony. I am also grateful to have Denys with me, who flew from Amsterdam to join the hunger strike, and to all the journalists who have taken the time to talk to us, both in person and remotely.
“Action is better than inaction; but please stop and think of your theory of change for more than five minutes, if you’re planning to risk your life, and then don’t risk your life[1]; please pick actions thoughtfully and wisely and not because of the vibes[2].”
I agree with the general point that making decisions based on an actual theory of change is a much more effective way to have an impact on the world. I’ve personally thought quite a lot about why doing this hunger strike in front of DeepMind is net good, and I believe it’s having the intended impact, so I disagree with your implication that I’m basing my decisions on vibes. If you’d like to know more, I’d be happy to talk to you in person in front of the DeepMind office or remotely.
Now, taking a step back and considering Guido’s strike: even if you think his actions were reckless and based on vibes, it’s worth evaluating whether his actions (and their consequences) will eventually turn out to be net negative. For one, I don’t think I would be out in front of DeepMind as I type this if it were not for Guido’s action, and I believe what we’re doing here in London is net good. But most importantly, we’re still at the start of the strikes, so it’s hard to tell what will happen as this continues. I’d be happy to have this discussion again at the end of the year, looking back.
Finally, I’d like to acknowledge the health risks involved. I’m personally looking after my health, and there are medics at King’s Cross who would be willing to help quickly if anything extreme were to happen. Given the length of the strikes so far, I think what we’re doing is relatively safe, though I’m happy to be proven otherwise.
I do agree that Anthropic is the more safety-conscious actor and that (at first glance) it would make more sense to protest in front of the most reckless actors.
However, after thinking more carefully, here is why I think doing it in front of Anthropic might actually be good:
- OpenAI would have been more difficult: StopAI (which Guido is part of) has already tried things in front of OpenAI (like chaining themselves to OpenAI’s doors) that got them into some amount of trouble, and I imagine they would have gotten kicked out earlier if they had done this there.
- More employee pressure: a lot of Anthropic employees care about safety, so having someone on a hunger strike at the entrance would spark more internal debate at Anthropic than at, say, a company that cares much less about safety. For instance, last year I believe the two main pressures on Anthropic around SB-1047 were Amazon and safety-conscious employees. If Guido were, say, kicked out by security, that would create more debate than if the same thing happened at OpenAI.
- Dario has been a public advocate for AI risk: e.g. on this recent podcast he said multiple times that he’s done more than any lab CEO w.r.t. being public about the risks from AI. It would make him look quite bad / inconsistent if he were responsible for any action going against such a strike.
- If something happens, it would probably be at Anthropic: I give a much higher credence to Guido getting a meeting with Anthropic leadership than with, say, OpenAI leadership.
- It starts at Anthropic, but might continue elsewhere: this is one strike in front of one AI lab, but it will probably lead to other simultaneous strikes in front of the other labs as well.
[Question] Is There An AI Safety GiveWell?
The one I know is outside of EA (they help people in Cameroon). The info I got about this being important and the timeline was mostly from the guy who runs it, who has experience with multiple associations. Basically you send paperwork via mail.
The “risking audits” part I got from here (third paragraph counting from the end).
Note: there’s something in France called “reçus fiscaux”, which I’ll translate as “fiscal receipts”: these are what you issue in order to collect tax-deductible donations.
While you can technically do that with just the initial (easy) paperwork, a lot of associations actually go through a longer (and harder) process to get a “rescrit fiscal”, which is basically a pre-clearance saying you can really collect tax-deductible donations as long as you continue doing the same kind of thing.
If you only do the easy thing and not the longer one (which can take something like 6 months to a year), then you risk audits (which are especially likely if you’re issuing a bunch of these fiscal receipts without ever doing the hard thing), and those can then lead to penalties.
Why I’m Posting AI-Safety-Related Clips On TikTok
Will there be any recording?
SB-1047 Documentary: The Post-Mortem
What’s your version of AI 2027 (aka the most likely concrete scenario you imagine for the future), and how does control end up working out (or not working out) in different outcomes?
That’s not really how Manhattan projects are supposed to work
How does your tool compare to Stampy, or to just asking these questions without the 200k tokens?
I like the design, and think it was worth doing. Regarding making sure “people can easily turn it off from the start” next time, I wanted to offer the data point that it took me quite a while to notice the disable button. (It’s black on black and sits right at the edge of the screen, especially if you’re using a wide monitor.)
Thanks for writing this—it introduces a concept I hadn’t considered before.
However, I do find myself disagreeing on many of the specific arguments:
“Has someone you know ever had a ‘breakthrough’ from coaching, meditation, or psychedelics — only to later have it fade”
I think this misses that those “fading” breakthroughs are actually the core mechanisms of growth. The way I see it, people who are struggling are stuck in a maze. Through coaching/meditation/psychedelics, they glimpse a path out, but when they’re back in the maze with a muddy floor, they might not fully remember. My claim is that through integration, they learn which mental knobs to switch to get out. And changing their environments will make the mud / maze disappear.
“after my @jhanatech retreat I was like ‘I’m never going to be depressed again!’ then proceeded to get depressed again...”
I don’t think the jhanatech example is great here. During their retreats (I’ve done one), they explicitly insist that you integrate the jhanas by doing normal things like cooking, walking, and talking to close friends. And they go to extreme lengths to make sure you continue practicing afterwards. I do know multiple people who have continued integrating those jhanic states post-retreat, or who have at least retained the core of the lessons they learned there.
“For example, many people experience ego deaths that can last days or sometimes months.”
My experience talking to meditation/psychedelics folks is that ego death becomes increasingly accessible after the first time, and the diminished ego often stays permanently even if the full feeling doesn’t.
“If someone has a ‘breakthrough’ that unexpectedly reverts, they can become jaded on progress itself...”
I agree non-integrated breakthroughs can lead to hopelessness. However, this “most depressed person you know” basically has many puzzle pieces missing and an unfavorable environment. What needs to happen is finding the pieces and integrating them, while transforming their environment.
“The simplest, most common way this happens is via cliche inspirational statements: [...] ‘Just let go of all resistance,’”
“Let go of resistance” points at something quite universal: the fact that not processing things makes them stronger. I don’t think this one loses its effect the way you suggest.
“Flaky breakthroughs are common. Long-term feedback loops matter!”
Note: I do agree with your main thesis, which I’d paraphrase as: “we need to ensure long-term positive outcomes, not just short-term improvements, and unfortunately coaches don’t really track that.”
My guess at what’s happening here: for the first iterations of MATS (think MATS 2.0 at the Lightcone WeWork), you would get folks who had already been into AI Safety for quite a long time and were interested in doing some form of internship-like thing for a summer. But as you run more cohorts (and make the cohorts bigger), the density of people who have been interested in safety for a long time naturally decreases (because all the people who were interested in safety for years already applied to previous iterations).