DeepMind Alignment Team on Threat Models and Plans

A collection of posts presenting our understanding of and opinions on alignment threat models and plans.

DeepMind is hiring for the Scalable Alignment and Alignment Teams

DeepMind alignment team opinions on AGI ruin arguments

Will Capabilities Generalise More?

Clarifying AI X-risk

Threat Model Literature Review

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Categorizing failures as “outer” or “inner” misalignment is often confused

Definitions of “objective” should be Probable and Predictive

Power-seeking can be probable and predictive for trained agents

Paradigms of AI alignment: components and enablers

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy