Wei Dai comments on Legible vs. Illegible AI Safety Problems

Wei Dai 5 Nov 2025 0:35 UTC
LW: 16 AF: 6
Another implication is that directly attacking an AI safety problem can quickly flip from positive to negative expected value (EV) if someone succeeds in turning it from an illegible problem into a legible one while other illegible problems remain. Organizations and individuals who care about x-risks should ideally keep this in mind and try to pivot when it happens, instead of following the natural institutional and personal momentum. (Trying to make illegible problems legible doesn’t have this issue, which is another advantage of that kind of work.)
What links here?
- Wei Dai's comment on Wei Dai’s Shortform by Wei Dai (5 Nov 2025 17:45 UTC; 40 points)
- ErickBall 6 Nov 2025 4:04 UTC
  9 points
  This seems to assume that legible/illegible is a fairly clear binary. If legibility is instead achieved gradually, then for partially legible problems, working on solving them is probably a good way to make them more legible.