Kind of agree with the first paragraph, but I think it's for economic-outcompetition reasons, not intent-misalignment reasons. Honestly, I have no clue what to do about that.
Re your bullet point, I'm inclined to bite the bullet. Yes, alignment-wise, I think the lab could pull this off, though there's a ~5% probability of failure, and the stakes of failure here would be huge. Claude would shut itself down. (This is an empirical question we should perhaps test via roleplay.)
I don't think this is possible: nobody can plausibly have a DSA this big, let alone one that stays permanent even after the big AI is gone.
I think some sort of takeover isn't that hard and happens by default (and seems to be what basically all the labs and most alignment researchers are trying to do).
Bleak. I tend to disagree for the same reasons as in the previous paragraph: having a DSA that large is hard when everyone else also has AI. But I think oligarchy is fairly likely, and gradual disempowerment is a problem.