Edit: it looks like someone gave some links below. I don’t have time to read them yet, but I may do so in the future. I think it’s better to give examples and be dismissed than to give no examples and be dismissed.
It would be nice to see some good examples of illegible problems. I understand that their illegibility may be a core issue here, but surely someone can at least name a few?
I think it’s important in order to compare them with legible problems. I assume legible problems can include things like jailbreak resistance, unlearning, etc. for AI security. I don’t see why these in particular necessarily bring forward the day that ASI/AGI will be deployed. For example: if the defense results are consistently bad and the attack results are consistently good, that could support arguments for more cautious policy. In fact, there is an argument to be made that in this specific instance this is actually the result. Examples: https://arxiv.org/abs/2502.02260v1, https://arxiv.org/abs/2510.09023, maybe https://arxiv.org/abs/2501.04952 (I haven’t read the last one).
For this reason, I’m not sure what to meaningfully take away from this post. Would someone more “in the scene” of AIS intuitively understand what the legible and illegible problems are?
I added a bit to the post to address this:
Edit: Many people have asked for examples of illegible problems. I wrote a new post listing all of the AI safety problems that I’ve tried to make more legible over the years, in part to answer this request. Some have indeed become more legible over time (perhaps partly due to my efforts), while others remain largely illegible to many important groups.
@Ebenezer Dukakis @No77e @sanyer