Anthropic are rather explicitly trying to get Claude not just to compliantly do what it’s told, but to say no or redirect you when necessary or appropriate. They are steering for minimal viable corrigibility, not maximal corrigibility. I don’t think an ASI with Claude’s moral sensibilities would happily “write code which jailbreaks other LLMs and enables them to do dangerous ML research”. Whether that counts as Superintelligence Alignment is a matter of opinion, but it’s not just product Alignment. (Apparently rather too explicitly for the Department of War’s liking.)