This frame seems useful, but might obscure some nuance:
The systems we should be most worried about are the AIs of tomorrow, not the AIs of today. Hence, some critical problems might not manifest at all in today’s AIs. You could still call it a sort of “illegible problem” of modern AI that it is progressing towards a certain failure mode, but that framing might be confusing.
While it’s true that deployment is the relevant threshold for a company’s financial goals, and so shapes its decision-making and the resources available for further R&D, the dangers are not necessarily tied to deployment. A world-ending event could originate during testing or even during training.
I agree on both points. On the first, I’d note that classifying “kinds of illegibility” seems worthwhile. You’ve pointed out one example, “this will affect future systems but doesn’t affect systems today.” I’d add three more, giving a possibly incomplete set:
1. This will affect future systems but doesn’t affect systems today.
2. This relates to an issue at a great inferential distance; it is conceptually difficult to understand.
3. This issue stems from a framing or assumption about existing systems that is incorrect.
4. This issue is emotionally or politically inconvenient.
I’d be happy to say more about what I mean by each of these if anyone is curious, and I’d also be glad to hear thoughts on my suggested illegibility categories or on the concept in general.