In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving it!
Obviously, if you make an illegible alignment problem legible, it is at grave risk of being solved. I'm confused as to why you think this is a good thing. Any alignment advance will have capability implications, and so it follows that any alignment advance is bad and to be avoided.
(I.e., I think you're being too paranoid when you worry that solving legible safety problems is net negative.)