Ruby comments on 6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Ruby 25 Dec 2025 2:44 UTC
Curated! I very much like the project of finding upstream cruxes behind differing intuitions about AI alignment. Oddly, such cruxes can be invisible until someone points them out. It's also cool that Steven's insight here isn't a one-off post but flows from his larger research project and models, a sign of that project paying dividends. (To clarify, in curating this I'm not saying I think it's definitely correct, but I find it quite plausible.)

I also appreciate that Steven has avoided a common trap. Most times when I or others try this kind of mechanistic modeling of human minds, it ends up very dry, and others don't want to read it even when it feels compelling to the author. Somehow Steven has escaped that; whether by dint of writing quality or idea quality, I'm not sure.

I really liked this, and when the relevant Annual Review comes around, I expect to give it at least a 4.