My current research interests:
1. Alignment in systems which are complex and messy, composed of both humans and AIs
Recommended texts: Gradual Disempowerment, Cyborg Periods
2. Actually good mathematized theories of cooperation and coordination
Recommended texts: Hierarchical Agency: A Missing Piece in AI Alignment, The self-unalignment problem or Towards a scale-free theory of intelligent agency (by Richard Ngo)
3. Active inference & Bounded rationality
Recommended texts: Why Simulator AIs want to be Active Inference AIs, Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents, Multi-agent predictive minds and AI alignment (old but still mostly holds)
4. LLM psychology and sociology
Recommended texts: A Three-Layer Model of LLM Psychology, The Pando Problem: Rethinking AI Individuality, The Cave Allegory Revisited: Understanding GPT’s Worldview
5. Macrostrategy & macrotactics & deconfusion
Recommended texts: Hinges and crises, Cyborg Periods again, Box inversion revisited, The space of systems and the space of maps, Lessons from Convergent Evolution for AI Alignment, Continuity Assumptions
Also, I occasionally write about epistemics: Limits to Legibility, Conceptual Rounding Errors
Researcher at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.
Previously I was a researcher in physics, studying phase transitions, network science and complex systems.
I do agree there is some risk of the type you describe, but it mostly does not match my practical experience so far.
The approach of “avoid using the term” makes little sense. There is a type difference between an area of study (‘understanding power’) and a dynamic (‘gradual disempowerment’). I don’t think you can substitute a term for an area of study for a term for a dynamic or threat model, so avoiding the term would mostly mean either inventing another term for the dynamic, or not thinking about the dynamic, or similar moves, which seem epistemically unhealthy.
In practical terms, I don’t think there is much effort to “create a movement based around a class of threat models”. At least as authors of the GD paper, when trying to support thinking about these problems, we use understanding-directed labels/pointers (Post-AGI Civilizational Equilibria), even though in many ways it could have been easier to use GD as a brand.
“Understanding power” is fine as a label for part of your writing, but in my view it is basically unusable as a term for coordination.
Also, in practical terms, gradual disempowerment does not seem like a particularly convenient set of ideas for justifying the claim that working at an AGI company on something very prosaic which helps the company is the best thing to do. There is often a funny coalition of people who prefer not to think about the problem, including radical Yudkowskians (“GD distracts from everyone being scared of dying with very high probability very soon”), people working on prosaic methods with optimistic views about both alignment and the labs (“GD distracts from efforts to make [the good company building the good AI] win”), and people who would prefer if everything were just a neat technical puzzle and there were no need to think about power distribution.