I enjoyed reading the article! I have two counterarguments to some of its main points:
1. The article argues that alignment research might make things worse, and for many kinds of alignment research I think that’s a fair point (especially the kind that made its way into current frontier AI!). On the other hand, if we are really careful with our AI research, we might manage to align AI to powerful principles such as “Don’t develop technologies that our human institutions cannot keep up with.”
My impression is that for the very dangerous technologies humans have historically developed, it *was* possible for humans to predict the negative effects quite early (e.g., the nuclear chain reaction was conceived years before nuclear fission was discovered; mirror life doesn’t exist yet, as far as I know, but people already predict it could be extremely harmful). Thus, I’d guess an ASI would also be capable of making such predictions, and so if we align it to a principle such as “don’t develop marginally dangerous technology”, this should in principle work.
2. I think the article neglects the possibility that AI could kill us all by going rogue *without* requiring any special technology. For example, AI might simply be broadly deployed in physical robots much like the ones we have today and come to control more and more of our infrastructure. Then, if all the AIs colluded, they could one day decide to deprive us of our basic needs, driving humanity extinct.
Possibly Michael Nielsen would counter that this *is* in line with the vulnerable world hypothesis, since the technology that kills us all is then the AI itself. But that would stretch it a bit, since the vulnerable world hypothesis usually assumes that something world-destroying can be deployed with modest resources, which isn’t the case here.