I’ve published in this area, so I have some meta-comments about this work.
First the positive:
1. Assurance cases are the state of the art for making sure things don’t kill people in a regulated environment. Ever wonder why planes are so safe? Safety cases. Because the actual process of making one is so unsexy (GSNs make me want to cry), people tend to ignore them, so you deserve lots of credit for somehow getting x-risk people to upvote this. More lesswronger types should be thinking about safety cases.
2. I do think you have good / defensible arguments overall, minus minor quibbles that don’t matter much.
Some bothers:
1. Since I used to be a little involved, I am perhaps a bit too aware of the absolutely insane amount of relevant literature that was not mentioned. To me, the introduction made it sound a little bit like the specifics of applying safety cases to AI systems have not been studied. That is very, very, very not true.
That’s not to say you don’t have a contribution! Just that I don’t think it was placed well in the relevant literature. Many have done safety cases for AI, but they usually do it as part of concrete applied work on drones or autonomous vehicles, not x-risk pie-in-the-sky stuff. I think your arguments would be greatly improved by referencing back to this work.
I was extremely surprised to see so few of the (to me) obvious suspects referenced, particularly more from York. Some labs whose people publish a lot in this area:
University of York Institute for Safe Autonomy
NASA Intelligent Systems Division
Waterloo Intelligent Systems Engineering Lab
Anything funded by the DARPA Assured Autonomy program
2. Second issue is a little more specific, related to this paragraph:
To mitigate these dangers, researchers have called on developers to provide evidence that their systems are safe (Koessler & Schuett, 2023; Schuett et al., 2023); however, the details of what this evidence should
look like have not been spelled out. For example, Anderljung et al vaguely state that this evidence should be “informed by evaluations of dangerous capabilities and controllability” (Anderljung et al., 2023). Similarly, a recently proposed California bill asserts that developers should provide a “positive safety determination” that “excludes hazardous capabilities” (California State Legislature, 2024). These nebulous requirements raise questions: what are the core assumptions behind these evaluations? How might developers integrate other kinds of evidence?
The reason the “nebulous requirements” aren’t explicitly stated is that when you make a safety case, you assure the safety of a system against specific hazards relevant to the system you’re assuring. These are usually identified by performing a HAZOP analysis or similar. Not all AI systems have the same list of hazards, so it’s obviously dubious to expect you can list requirements a priori. This should have been stated, imo.
I hear what you’re saying. I probably should have made the following distinction:
A technology in the abstract (e.g. nuclear fission, LLMs)
A technology deployed to do a thing (e.g. nuclear fission in a power plant, an LLM used for customer service)
The question I understand you to be asking is essentially: how do we make safety cases for AI agents generally? I would argue that’s more case 1 than case 2, and as I understand it, safety cases are basically only ever applied to case 2. The nuclear facilities document you linked definitely is case 2.
So yeah, admittedly the document you were looking for doesn’t exist, but that doesn’t really surprise me. If you start looking for narrowly scoped safety principles for AI systems, you start finding them everywhere. For example, a search for “artificial intelligence” on the ISO website results in 73 standards.
Just a few relevant standards, though I admit standards are exceptionally boring (also, many aren’t public, which is dumb):
UL 4600 standard for autonomous vehicles
ISO/IEC TR 5469 standard for AI safety stuff generally (this one is decently interesting)
ISO/IEC 42001 this one covers what you do if you set up a system that uses AI
You also might find this paper a good read: https://ieeexplore.ieee.org/document/9269875