In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving it!
Obviously, if you make an illegible alignment problem legible, it is at grave risk of being solved; I'm confused as to why you think this is a good thing. Any alignment advance will have capability implications, and so it follows that any alignment advance is bad and to be avoided.
(I.e., I think you're being too paranoid when you worry that solving legible safety problems is net negative.)
AI risk is not common knowledge. There are many people who do not believe there’s any risk. I really wish people who make arguments of the following form:
would acknowledge this fact. It is simply not true that everyone in frontier labs thinks this way. You can ask them! They’ll tell you!
It would be nice if we were just in a prisoner's-dilemma-style coordination problem. But when someone is publicly saying "hitting DEFECT has no downsides whatsoever; I plan on doing that as much as possible," you need to take that into consideration.