It’s not a silver bullet, but it’s not useless either. In software verification, many safety properties are valuable to have, yet they never guarantee safety beyond what their formulation captures. Similarly, formulating the safety of AI systems as formal properties is hopeless in its ambitious form, but could be extremely useful piecemeal, once AI systems are structured as modular software rather than as monolithic models.
Examples adjacent to where such methods might become relevant are Conjecture’s CoEm and Drexler’s CAIS: AIs built for a particular purpose that avoid escalating their capabilities, or that respect boundaries on their impact.
This also gives agent foundations a handhold in practical AI systems: the ability to robustly implement particular laws of behavior. So it’s not a solution to ambitious alignment, but rather a set of tools for building systems that might eventually get us there, and for keeping such systems tractable and bounded in particular ways.
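To make the "piecemeal" framing concrete, here is a toy sketch (all names are hypothetical, not from CoEm, CAIS, or any real system): a single narrow safety property, a resource budget enforced at a module boundary. It guarantees nothing about overall safety, but it is exactly the kind of property that can be stated formally and checked mechanically.

```python
# Toy sketch (hypothetical names throughout): one piecemeal safety
# property enforced at a module boundary, rather than a global guarantee.
# The property: a component may never exceed its declared resource budget.

from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    max_api_calls: int
    max_spawned_tasks: int


class BoundedComponent:
    """Wraps an untrusted step function and enforces the budget invariant."""

    def __init__(self, step, budget: Budget):
        self._step = step
        self._budget = budget
        self._api_calls = 0
        self._spawned = 0

    def run(self, observation):
        action = self._step(observation)
        # The "law of behavior": escalation attempts halt the component,
        # they are not merely logged. This narrow claim is what we can
        # actually formulate and verify.
        if action.get("api_call"):
            self._api_calls += 1
        if action.get("spawn_task"):
            self._spawned += 1
        if (self._api_calls > self._budget.max_api_calls
                or self._spawned > self._budget.max_spawned_tasks):
            raise RuntimeError("budget invariant violated; halting component")
        return action


# Usage: a step function that keeps escalating is stopped deterministically.
comp = BoundedComponent(lambda obs: {"api_call": True}, Budget(2, 0))
comp.run("x")
comp.run("x")
try:
    comp.run("x")
    halted = False
except RuntimeError:
    halted = True
print(halted)
```

The point of the sketch is that the guarantee is honest about its own scope: nothing is claimed about the wrapped step function's behavior except that it cannot exceed the budget, which is precisely the modular, bounded kind of property the comment argues for.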