I think this is insightful and valid. It’s closely related to how I think about my research agenda:
1. Figure out how labs are most likely to attempt alignment.
2. Figure out how that's most likely to go wrong.
3. Communicate about that clearly enough that it reaches them and prevents them from making those mistakes.
There's a lot that goes into each of those steps. It still seems like the best use of independent researcher time.
Of course there are a lot of caveats and nitpicks, as other comments have highlighted. But it seems like a really useful framing.
It's also closely related to a post I'm working on, "the alignment meta-problem," which argues that research at the meta or planning level is the most valuable work right now, since we have very poor agreement on which object-level research matters most. That meta-research would include making problems more legible.