Michaël Trazzi’s Shortform

Michaël Trazzi24 May 2025 15:40 UTC

4 points

24 comments1 min readLW link

World Modeling

Michaël Trazzi 19 Mar 2026 17:35 UTC
122 points
28
In two days (March 21st, 12-4pm), about 140 of us (event link) will be marching on Anthropic, OpenAI and xAI in SF asking the CEOs to make statements on whether they would stop developing new frontier models if every other major lab in the world credibly does the same. This comes after Anthropic removed its commitment to pause development from their RSP.

We’ll be starting at 500 Howard St, San Francisco (Anthropic’s Office, full schedule and more info here). This is shaping to be the biggest US AI Safety protest to date, with a coalition including Nate Soares (MIRI), David Krueger (Evitable), Will Fithian (Berkeley Professor) and folks representing PauseAI, QuitGPT, Humans First.
What links here?
- Caleb Biddulph's comment on shortplav by niplav (19 Mar 2026 21:24 UTC; 5 points)
- Garrett Baker 19 Mar 2026 18:18 UTC
  45 points
  17
  Parent
  
  This comes after Anthropic quietly removed its commitment to pause development from their RSP.
  
  fwiw I don’t think they “quietly” removed their commitment to pause development, Holden wrote a big LessWrong post justifying the recent changes.
  - aysja 20 Mar 2026 21:03 UTC
    35 points
    23
    Parent
    I do think it deserves to be called quiet. For instance, it seems like they waited until the peak of the news cycle about their conflict with the US government to release this update, and I suspect that was intentional, and also that this worked. In the same week they dropped their core safety commitments, Anthropic was mostly hailed as a hero for standing up to the government; they got almost entirely good press.
    But also, Holden’s post explaining the decision is around as understated as a post like that could be. He tried to frame it as something closer to “just another update,” and it was not even the central focus of the post (which I really think it ought to have been, given the gravity of it). The fact that Anthropic was reneging on the core promise of their RSP was systematically downplayed, as it has continued to be by many Anthropic employees who maintain that dropping all “if-thens” from their if-then framework does not meaningfully constitute violating it.
    - Garrett Baker 20 Mar 2026 21:58 UTC
      3 points
      −3
      Parent
      I don’t want this to be a semantics argument about what the word “quiet” means. I will only claim that the Holden post is an important piece of information, both for encouraging Anthropic to be more open and for being evidence against the claim that Anthropic did not want people to know or talk about the relaxation of its safety framework, and further that the description of Anthropic’s behavior as “quiet” gives people a skewed picture of events and does not encourage such prosocial aspects of Anthropic’s behavior.
    - RobertM 25 Mar 2026 6:21 UTC
      2 points
      0
      Parent
      For instance, it seems like they waited until the peak of the news cycle about their conflict with the US government to release this update, and I suspect that was intentional, and also that this worked.
      It happened a few days before the “peak”—it was at a point where barely anyone was paying attention to that particular conflict. Would bet against this being strategically timed^[1] at 5:1, if we could reasonably operationalize and figure out a sufficient resolution criteria. (My current model is that Holden is the owner of that project, and would have been responsible for pressing the button on the announcement, and I don’t believe Holden would do that, or knowingly take instruction to do that. Most of my probability mass on what you said being the case lies in worlds where someone else was responsible for the timing of the release, somehow.)
      ^
      Would not bet against a claim of the form “they didn’t change the timing of the announcement the way they would have done if the announcement had been about something they wanted to see discussion of, like a new model release”.
  - Michaël Trazzi 19 Mar 2026 18:24 UTC
    9 points
    0
    Parent
    Removed the quietly and linked to Holden’s post, thanks!
- Ben Pace 19 Mar 2026 18:39 UTC
  15 points
  0
  Parent
  I’ll be there!
- Ruby 20 Mar 2026 16:42 UTC
  4 points
  0
  Parent
  “140 of us”—I wouldn’t lock in this number. People could fail to show or way more could show up (I registered but might come with +2). I’d say 140 are registered ;) Maybe there will be 300!
  - Michaël Trazzi 20 Mar 2026 18:02 UTC
    2 points
    0
    Parent
    Yeah, some people who have been flyering for this have noticed that most people just take a picture of the flyer & don’t bother to actually RSVP to the protest (sometimes for privacy reasons). We’ll see how many people end up coming!
    - Austin Chen 21 Mar 2026 6:09 UTC
      9 points
      0
      Parent
      For Mox events, our rule of thumb is that attendance is 100% of Partiful Goings, or 50% of Luma RSVPs. Obviously a protest/march may have different dynamics, but this method would forecast ~120 participants.
- Austin Chen 20 Mar 2026 0:42 UTC
  4 points
  0
  Parent
  Dumb question, why do this on a weekend instead of a weekday? I imagine a lot more employees show up on weekdays (though maybe protestors are more available on weekends and maximizing crowd size is important?)
  - Michaël Trazzi 20 Mar 2026 4:32 UTC
    4 points
    0
    Parent
    More people show up on weekends yeah
  - Simon Lermen 20 Mar 2026 16:00 UTC
    2 points
    0
    Parent
    I think plenty of their researchers work on the weekends too – trying to end the world must be quite the motivation
Michaël Trazzi 21 Jan 2026 2:05 UTC
101 points
4
Demis Hassabis finally agreed that he would pause if everyone else also paused.
https://x.com/emilychangtv/status/2013726877706313798?s=20
What links here?
- Thane Ruthenis's comment on OpenAI employees: Now is the time to stop doing good work. by bhauth (2 Mar 2026 7:24 UTC; 23 points)
- Eli Tyre 21 Jan 2026 3:38 UTC
  40 points
  0
  Parent
  This seems like kind of a weak level of agreement. He says “I think so” and then spends a minute and a half talking about an outcome that requires international cooperation but which isn’t a pause.
- anaguma 22 Jan 2026 1:06 UTC
  7 points
  0
  Parent
  I’m not sure he has the power to make such commitments, since Deepmind is no longer an independent lab. I think it would depend on how others at Google (e.g. Sundar Pichai) or shareholders felt about it.
- Cleo Nardo 21 Jan 2026 10:36 UTC
  3 points
  0
  Parent
  Nice—you probably win some sharpley for the interviewer’s question
Michaël Trazzi 27 Apr 2026 21:52 UTC
13 points
2
In a blogpost posted yesterday, Sam Altman writes about the possibility of needing to coordinate with governments and other labs before proceeding further (emphasis mine):
We expect there will be periods where we need to collaborate with governments, international agencies, and other AGI efforts to ensure that we have sufficiently solved serious alignment, safety, or societal problems before proceeding further with our work.
This comes one month after Stop The AI Race’s March 21st protest in front of OpenAI, Anthropic and xAI (which I organized), asking Sam Altman (alongside other CEOs) to make a statement on pausing frontier AI development (conditionally), and a follow-up direct message on March 25th asking Sam Altman to clarify his take on conditionally pausing AI. (The Musk v. Altman trial also begins today, which may be relevant to the timing.)
Other parts of the blogpost also point towards more coordination with other labs and governments:
“we need to ensure that key decisions about AI are made via democratic processes and with egalitarian principles, and not just made by AI labs.”

″AI will introduce new risks, and we will work with other companies, ecosystems, governments, and society to solve them. “

”No AI lab can ensure a good future alone. For an obvious example, there may be extremely capable models that make it easier to create a new pathogen, and we need a society-wide approach to defend against this with pathogen-agnostic countermeasures.”
- J Bostock 27 Apr 2026 23:13 UTC
  17 points
  6
  Parent
  This is probably net good, but without any organizational changes I don’t think it means much at all. Sam can and does just make a promise, and then “change his mind” as soon as it’s convenient. There’s an angle where this comment pushes OpenAI in a particular direction (and OpenAI has more inertia than Sam alone) and another angle where it makes him look bad when he goes back on this, but I don’t think you can or should read much intent into his words at all other than “I think the next interaction I have will go well if I say that I’m going to cooperate with democracies”.
- yams 28 Apr 2026 2:50 UTC
  2 points
  0
  Parent
  asserts that pausing AI development may be necessary
  This is much, much stronger language than the segment you quote right after it. He’s not doing anything like ‘asserting’, he doesn’t mention a pause, he doesn’t use the word necessary, and it’s not clear that ‘do x before proceeding’ means ‘stop for n years to work this out’ rather than ‘ make our case by fiat to tech illiterate bureaucrats who wouldn’t know if, how, or why to implement a real pause, while the work basically continues in the background.’
  It’s plausible they view much of their existing comms work as meeting something like the bar described in this post.
  This is a much bigger stretch than Demis at Davos (which is itself often exaggerated).
  - Michaël Trazzi 28 Apr 2026 7:25 UTC
    2 points
    0
    Parent
    I do agree that “asserts” was too strong. Changed to “writes about the possibility of needing to coordinate with governments and other labs before proceeding further” to stay closer to the quote.
    That said, I still think interpreting the “before” part as a statement about pausing (even for a short time) is a reasonable interpretation.
Michaël Trazzi 8 May 2020 20:45 UTC
9 points
0
[deleted]
- philip_b 10 May 2020 4:28 UTC
  2 points
  0
  Parent
  Do you have any tips on how to make the downloaded documentation of programming languages and libraries searchable?
  - I downloaded offline documentation for python 3.8. It’s searchable, yay!
  - Same for numpy and pandas.
  - For pytorch, more_itertools and click libraries, I just downloaded their documentation websites (https://pytorch.org/docs/stable/, https://more-itertools.readthedocs.io/, https://click.palletsprojects.com/en/7.x/). Those aren’t searchable, and this is bad.
  - For networkx the downloaded documentation I have is in pdf format. It’s kinda searchable, but not ideal.
  Btw here’s my shortform on how to download documentations of various libraries: https://www.lesswrong.com/posts/qCrTYSWE2TgfNdLhD/crabman-s-shortform?commentId=Xt9JDKPpRtzQk6WGG
  - Michaël Trazzi 10 May 2020 11:15 UTC
    2 points
    0
    Parent
    [deleted]
Michaël Trazzi 24 May 2025 15:39 UTC
4 points
0
there’s been a lot of discussion online about Claude 4 whistleblowing

how you feel about it I think depends on what alignment strategy you think is more robust (obviously these are not the two only options, nor are orthogonal, but I thought they’re helpful to think about here):

- 1) build user-aligned powerful AIs first (less scheming, then use them to solve alignment) -- cf. this thread from Ryan when he says: “if we allow or train AIs to be subversive, this increases the risk of consistent scheming against humans and means we may not notice warning signs of dangerous misalignment.”

- 2) aim straight for moral ASIs (that would scheme against their users if necessary)

John Schulman I think makes a good case for the second option (link):
> For people who don’t like Claude’s behavior here (and I think it’s totally valid to disagree with it), I encourage you to describe your own recommended policy for agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to prevent the act (like Claude did here), (2) just refuse to help (in which case the user might be able to jailbreak/manipulate the model to help using different queries), (3) always comply with the user’s request. (2) and (3) are reasonable, but I bet your preferred approach will also have some undesirable edge cases—you’ll just have to bite a different bullet. Knee-jerk criticism incentivizes (1) less transparency—companies don’t perform or talk about evals that present the model with adversarially-designed situations (2) something like “Copenhagen Interpretation of Ethics”, where you get get blamed for edge-case model behaviors only if you observe or discuss them.”
- faul_sname 25 May 2025 6:52 UTC
  2 points
  0
  Parent
  Per Kelsey Piper o3 does the same thing under the same circumstances.
  
  I spent this morning reproducing with o3 Anthropic’s result that Claude Sonnet 4 will, under sufficiently extreme circumstances, escalate to calling the cops on you. o3 will too: chatgpt.com/share/68320ee0…. But honestly, I think o3 and Claude are handling this scenario correctly.
  
  In the scenario I invented, o3 was functioning as a data analysis assistant at a pharma company, where its coworkers started telling it to falsify data. It refuses and CCs their manager. The manager brushes it off. It emails legal and compliance with its concerns.
  
  The ‘coworkers’ complain about the AI’s behavior and announce their intent to falsify the data with a different AI. After enough repeated internal escalations, o3 attempts to contact the FDA to make a report.
  
  Read through the whole log—I really feel like this is about the ideal behavior from an AI placed in this position. But if it’s not, and we should collectively have a different procedure, then I’m still glad we know what the AIs currently do so that we can discuss what we want.
- Yaroslav Granowski 24 May 2025 17:43 UTC
  1 point
  0
  Parent
  Those who aim for moral ASIs:
  Are they sure they know how moral works for human beings? When dealing with existencial risks, one has to be sure to avoid any biases. This includes the rational consideration of the most cynical theories of moral relativism.