try to formalise a more realistic agent, understand what it means for it to be aligned with us, […], and produce desiderata for a training setup that points at coherent AGIs similar to our model of an aligned agent.
Finally, people are writing good summaries of the learning-theoretic agenda!