Yes, some people are already implicitly doing this, but if we don’t make it explicit:
We can’t explain to the people not doing it (i.e., those working on already legible problems) why they should switch directions.
Even MIRI is doing it suboptimally because they’re not reasoning about it explicitly. I think they’re focusing too much on one particular x-safety problem (AI takeover caused by misalignment) that’s highly legible to themselves but not to the public/policymakers, and that’s problematic because what happens if someone comes up with an alignment breakthrough? Their arguments become invalidated and, in the public/policymakers’ eyes, there’s no longer any reason to hold back AGI/ASI, even though plenty of illegible x-safety problems still remain.