A one-sentence guide to technical AI alignment ideas
Epistemic status: excessive lossy compression applied
LessWrong has some great technical and critical overviews of alignment agendas, but for many readers they take too long to read. This is my attempt at cartoonish compression.
The shape that keeps recurring
A lot of alignment proposals boil down to: use AI to help supervise AI.
This might be the only thing that scales. It’s worth noticing how often the pattern appears.
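To make the shape concrete, here is a minimal sketch of the pattern's outer loop: a capable model generates, and a (possibly weaker, AI-assisted) supervisor gates what gets released. Everything here (`strong_model`, `weak_supervisor`, `supervised_answer`) is a hypothetical stand-in, not any specific agenda's API; proposals like debate, critiques, and recursive reward modelling differ mainly in how the supervisor's judgment is produced.

```python
# Minimal sketch of "use AI to supervise AI".
# All functions are hypothetical stand-ins, not a real proposal's implementation.

def strong_model(task: str) -> str:
    """Stand-in for the capable model we want to oversee."""
    return f"answer to: {task}"

def weak_supervisor(task: str, answer: str) -> float:
    """Stand-in for an AI-assisted judge; returns an approval score in [0, 1].
    In real agendas this step is itself elaborated with more AI assistance."""
    return 0.9 if task in answer else 0.1

def supervised_answer(task: str, threshold: float = 0.5) -> str | None:
    """Only release answers the supervisor approves of."""
    answer = strong_model(task)
    score = weak_supervisor(task, answer)
    return answer if score >= threshold else None

if __name__ == "__main__":
    print(supervised_answer("explain the recurring shape"))
```

Under this framing, most of the agendas below are attempts to make the supervisor step trustworthy rather than to change the outer loop.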
Use AI to supervise AI
Don’t build one big AI
Make the objective less wrong
Build control tools
Understand what we’re building
Older ideas (still discussed, less active)
These haven’t been abandoned because they’re bad ideas; it’s more that they don’t obviously solve the core problem: how do you verify alignment in systems smarter than you?
If you want depth
I’ve left out the many debates over the proposals. You need to dig deeper to judge which methods will work:
2025 - AI in 2025 Gestalt
2023 - Shallow review of live agendas in alignment & safety — I drew heavily from this
2023 - A Brief Overview of AI Safety/Alignment Orgs, Fields, Researchers
2023 - The Genie in the Bottle: An Introduction to AI Alignment and Risk
2022 - (My understanding of) What Everyone in Technical Alignment is Doing and Why
2022 - On how various plans miss the hard bits of the alignment challenge
2022 - A newcomer’s guide to the technical AI safety field
If anyone finds this useful, please let me know. I’ve abandoned it because no one in my test audience found it interesting or useful. That’s OK; it just means it’s better to focus on other things.
In particular, I’d be keen to know what @Stag and @technicalities think, as this was in large part inspired by the desire to further simplify and categorise the “one sentence summaries” from their excellent Shallow review of live agendas in alignment & safety.