Thomas Kwa comments on The Field of AI Alignment: A Postmortem, and What To Do About It

Thomas Kwa 16 Dec 2025 22:58 UTC
21 points
13
I’m giving this −4 points because it seems anti-helpful on net, but not all bad.
- The title is both clickbait and irritating to read for anyone who disagrees. It’s like the author is trying to gaslight us into agreeing that alignment is dead, despite the field growing every year (including many agent foundations approaches). The linked resources do not sufficiently explain why the author thinks this way. My guess is it only got 328 karma because it was so provocative.
- The author is wrong about several verifiable things, e.g. the technical background of MATS scholars.
- OP is a big fan of ILIAD. ILIAD people are split: Vanessa likes it, but see Alexander Oldenziel’s somewhat passive-aggressive comment claiming OP doesn’t put in effort to understand what he’s critiquing.
- As Leo Gao points out in the comments, it’s true in many fields that “most work will be crap (often predictably so ex ante), due in part to bad incentives, and then there will be a few people who still do good work anyways.” So to expect otherwise for alignment is actually a very high standard.
- As I wrote last year, much of what OP calls streetlighting is actually correct prioritization by tractability. OK, streetlighting is sometimes a real problem, but OP’s proposed solutions also seem fairly misguided. I basically still stand by that comment.
- Overall I don’t disagree with everything in the post (incentives to think alignment is easy at labs are certainly real), but 80% of it just reads as OP, a person who likes physics-type approaches to alignment rather than prosaic ones, complaining about how the taste of the average alignment researcher differs from theirs.
What links here?
- Voting Results for the 2024 Review by RobertM (7 Feb 2026 3:48 UTC; 98 points)
- Deeper Reviews for the top 15 (of the 2024 Review) by Raemon (14 Jan 2026 23:59 UTC; 45 points)