RHollerith comments on The Field of AI Alignment: A Postmortem, and What To Do About It

RHollerith 27 Dec 2024 0:47 UTC
3 points
4
You do realize that by “alignment”, the OP (John) is not talking about techniques that prevent an AI that is less generally capable than a capable person from insulting the user or expressing racist sentiments?

We seek a methodology for constructing an AI that either ensures that the AI turns out not to be able to easily outsmart us or (if it does turn out to be able to easily outsmart us) ensures (or at least makes it likely) that it won’t kill us all or do some other terrible thing. (The former is not researched much compared to the latter, but I felt the need to include it for completeness.)

The way it is now, it is not even clear whether you and the OP (John) are talking about the same thing (because “alignment” has come to have a broad meaning).

If you want to continue the conversation, it would help to know whether you see a pressing need for a methodology of the type I describe above. (Many AI researchers do not: they think that outcomes like human extinction are quite unlikely or at least easy to avoid.)