Thane Ruthenis comments on Claude 4

Thane Ruthenis 22 May 2025 20:09 UTC
26 points
26
Jan Leike^[1]: So many things to love about Claude 4! My favorite is that the model is so strong that we had to turn on additional safety mitigations according to Anthropic’s responsible scaling policy
That post sure aged well:
I find it hard to trust that AI safety people really care about AI safety. [...]
Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
Edit: Like, I don’t want to do too much tone-policing and nitpicking-of-phrasing here. It’s not always necessarily totally unreasonable to be excited about getting access to more dangerous-therefore-powerful models, even if you’re an alignment researcher and you know alignment isn’t solved. Or it might’ve been just a badly worded expression of some neighbouring sentiment.
But that said, it sure doesn’t update me towards “Anthropic’s internal culture is actually taking the risks with grave seriousness”. Especially with it being not just an isolated gaffe, but part of a pattern of missing mood.
1. Previous OpenAI Superalignment lead, presumably currently holding a similar senior alignment researcher position at Anthropic.
- Archimedes 23 May 2025 1:44 UTC
  17 points
  9
  Teetering on the edge of doom is exciting for me, much like riding a motorcycle at 200 mph or playing with professional-grade fireworks. I think it’s silly to pretend it’s not exciting to have powerful tools/toys, even though they’re likely to destroy us.
  - MichaelDickens 23 May 2025 2:19 UTC
    25 points
    33
    I agree, you shouldn’t pretend not to be excited if you are.
    Adrenaline junkies should not be involved in building AGI, any more than they should be commercial pilots or bus drivers. (Less, even.)
    - ErioirE 23 May 2025 17:13 UTC
      5 points
      0
      Adrenaline junkies should not be involved in building AGI, any more than they should be commercial pilots or bus drivers. (Less, even.)
      To follow the pattern of “Those with a large built-in incentive for X shouldn’t be in charge of X”:
      Ambitious people shouldn’t be handed power
      Kids shouldn’t decide the candy budget
      Engineers shouldn’t play Factorio
      
      Unfortunately, with few exceptions, those make up a large portion of the primary interested parties.
      
      Best of luck keeping them away for long.
      Not sarcasm. I hope we succeed. But incentives are stacked to make it difficult.
      - uugr 23 May 2025 17:45 UTC
        1 point
        0
        ...why shouldn’t engineers play Factorio?
        - ErioirE 23 May 2025 18:12 UTC
          1 point
          0
          Because it’s absurdly addictive, although it’s certainly possible to play it responsibly.
          It was partly a joke, partly serious, because I personally have a difficult time self-regulating if I let myself play it.