We can subdivide the security story based on the ease of fixing a flaw if we’re able to detect it in advance. For example, injection, which topped the OWASP Top 10 in its 2010 through 2017 editions, is typically easy to patch once it’s discovered. Insecure systems are often right next to secure systems in program space.
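To make the "right next to each other in program space" point concrete, here is a minimal sketch of an injection flaw and its patch, using Python's standard `sqlite3` module. The table, the function names (`lookup_insecure`, `lookup_patched`), and the payload are all hypothetical, made up for illustration; the only thing that differs between the two lookups is whether the input is spliced into the SQL text or passed as a bound parameter.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_insecure(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_patched(conn, name):
    # Patched: the input is bound as a parameter and treated only as data.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"  # classic injection string

print(sorted(lookup_insecure(conn, payload)))  # leaks every user's secret
print(lookup_patched(conn, payload))           # no user has this name
```

The patch is a one-line change once someone knows where to look, which is the sense in which this kind of flaw is cheap to fix after detection.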
Insecure systems are right next to secure systems, and many flaws are found. Yet the larger systems (the company running the software, the economy, etc.) somehow manage to correct themselves. That is because the larger systems contain mechanisms poised to patch the software when flaws are discovered. Perhaps we could adapt and optimize this flaw-exploit-patch loop from security as a technique for AI alignment.
If the security story is what we are worried about, it could be wise to try to develop the AI equivalent of OWASP’s Cheat Sheet Series, to make it easier for people to find security problems with AI systems. Of course, many items on the cheat sheet would be speculative, since AGI doesn’t actually exist yet. But it could still serve as a useful starting point for brainstorming.
This sounds like a great idea to me. Software security has a well-developed knowledge base at this point, and since AI is software, there should be many good insights to port.
What possibilities aren’t covered by the taxonomy provided?
Here’s one that occurred to me quickly: Drastic technological progress (presumably involving AI) destabilizes society and causes strife. In this more hostile environment, safety procedures are neglected and UFAI is produced.