On the top-level alignment problem:
I think alignment is already legible; elites just ignore it for the usual reasons: power-seeking, economic incentives, the arms race, etc.
I think what you want is for alignment to be cheap. The “null” alignment paradigm of “don’t build it yet” is too expensive so we’re not doing it. Find something cheap enough, and elites will magically understand alignment overnight. That either means 1) solve technical alignment, or 2) give elites what they want with an aligned non-AGI (which I think we weakly believe is impossible).
On more specific but illegible/neglected alignment problems:
I certainly agree with the general principle "work on neglected problems." But such work will 100% funnel into capabilities once it is published more widely (i.e., made legible); there has been no exception to this since the dawn of science. I don't think it's possible to work on an illegible problem without this happening. Therefore the advice "work on illegible problems" reduces to the problem of picking research that helps safety more than it helps capabilities.
If anything you might have it backwards: legible work is already being used for capabilities, so there's zero added harm in making sure it also gets used for safety, and other safety researchers might not be doing that.