Roughly, the line of research is something like "what propensities do we want to train the AI models to have?", especially in the near term / the next few generations, with the hope that this either generalizes well via seeding and inertia or otherwise has the other positive downstream consequences listed in the original post.
"Alignment target," the way you define it,[1] is very important but also far-mode. And we might hope that the AIs can help us a lot on the way to ASI (e.g. via CEV or some other assisted reflection process), so it's possible we can punt this question to our future selves with AI assistance.
(Though I think it's plausible that significantly more people should be thinking about the post-ASI alignment target right now as well; I just haven't really thought about that question recently.)
Which, to be clear, I think is also the historical definition.
I suspect that it can't be punted; see, e.g., Clarifying "wisdom": Foundational topics for aligned AIs to prioritize before irreversible decisions, and Problems I've Tried to Legibilize.