Jan Leike[1]: So many things to love about Claude 4! My favorite is that the model is so strong that we had to turn on additional safety mitigations according to Anthropic’s responsible scaling policy
That post sure aged well:
I find it hard to trust that AI safety people really care about AI safety. [...]
Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is[1]. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
Edit: Like, I don’t want to do too much tone-policing and nitpicking-of-phrasing here. It’s not always necessarily totally unreasonable to be excited about getting access to more dangerous-therefore-powerful models, even if you’re an alignment researcher and you know alignment isn’t solved. Or it might’ve been just a badly worded expression of some neighbouring sentiment.
But that said, it sure doesn’t update me towards “Anthropic’s internal culture is actually taking the risks with grave seriousness”. Especially with it being not just an isolated gaffe, but part of a pattern of missing mood.
Previous OpenAI Superalignment lead, presumably currently holding a similar senior alignment researcher position at Anthropic.
Teetering on the edge of doom is exciting for me, much like riding a motorcycle at 200 mph or playing with professional-grade fireworks. I think it’s silly to pretend it’s not exciting to have powerful tools/toys, even though they’re likely to destroy us.
I agree, you shouldn’t pretend not to be excited if you are.
Adrenaline junkies should not be involved in building AGI, any more than they should be commercial pilots or bus drivers. (Less, even.)
To follow the pattern of “Those with a large built-in incentive for X shouldn’t be in charge of X”:
Ambitious people shouldn’t be handed power
Kids shouldn’t decide the candy budget
Engineers shouldn’t play Factorio
Unfortunately, with few exceptions, those make up a large portion of the primary interested parties.
Best of luck keeping them away for long.
Not sarcasm. I hope we succeed. But incentives are stacked to make it difficult.
...why shouldn’t engineers play Factorio?
Because it’s absurdly addictive, although it’s certainly possible to play it responsibly.
It was partly a joke, and partly serious, because I personally have a difficult time self-regulating if I let myself play it.