how valuable are formalized frameworks for safety, risk assessment, etc in other (non-AI) fields?
i’m temperamentally predisposed to find that my eyes glaze over whenever i see some kind of formalized risk management framework, which usually comes with a 2010s style powerpoint diagram—see below for a random example i grabbed from google images:
am i being unfair to these things? are they actually super effective for avoiding failures in other domains? or are they just useful for CYA and busywork and mostly just say common sense in an institutionally legible way?
one reason i care is because i feel some level of instinctive dislike for some AI safety/governance frameworks because they give me this vibe. but it’s useful to figure out if i’m being unfairly judgemental, or if these really are slop.
The actual details of it contain some wise, non-obvious aspects, along with elegant concepts that generalize things the safety community has been gesturing at. For instance, the safety community has been conflating, under “risk thresholds”, two cleanly distinct risk-management notions: Key Risk Indicators (actual measurements of risk) and risk tolerance (your quantified preference for risk, independent of any test). This has caused a lot of confusion and hidden unreasonable choices for quite a while.
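To make the distinction concrete, here’s a minimal sketch in Python of how the two notions stay separate. All names and numbers are illustrative, not from any real framework: the point is only that the measurement (KRI) and the preference (tolerance) are different objects, so a “risk threshold” is a comparison between them rather than a single conflated thing.

```python
from dataclasses import dataclass

@dataclass
class KeyRiskIndicator:
    """An actual measurement of risk, e.g. an eval score."""
    name: str
    value: float  # what we measured

@dataclass
class RiskTolerance:
    """A quantified preference for risk, set independently of any test."""
    name: str
    max_acceptable: float  # what we are willing to accept

def exceeds_tolerance(kri: KeyRiskIndicator, tolerance: RiskTolerance) -> bool:
    # A "risk threshold" conflates these two; here the comparison keeps
    # the measurement and the preference cleanly distinct.
    return kri.value > tolerance.max_acceptable

print(exceeds_tolerance(KeyRiskIndicator("uplift-eval", 0.7),
                        RiskTolerance("bio-misuse", 0.5)))  # prints True
```

Keeping them as separate objects means you can tighten or loosen the tolerance without anyone mistaking that for a change in what was measured, and vice versa.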
People have also been conflating risk modeling and evals for a long time, because the AI field was built around evals. Once you have the clear view that evals are just an operationalization of risk models, it becomes clearer that you can actually do most of your risk modeling earlier in the lifecycle (i.e. before even touching a neural net), before having built a single eval, and that evals are downstream of this.
You can see more of this genre of concepts applied to frontier AI here: https://arxiv.org/pdf/2502.06656 Here’s a graph with a few of the concepts in there
to make sure I understand correctly, are you saying that a lot of the value of having this kind of formalized structure is to make it harder for people to make intuitive but flawed arguments by equivocating terms?
are there good examples of such frameworks preventing equivocation in other industries?
Yes, that’s one value. RSPs & many policy debates around them would have been less messed up if there had been clarity (i.e. they turned a confused notion into the standard, which was then impossible to fix in policy discussions, making the Code of Practice flawed). I don’t know of a specific example of preventing equivocation in other industries (it seems hard to observe such examples?) but the fact that basically all industries use the same set of concepts is evidence that they’re pretty general-purpose and repurposable.
Another is just that it helps you think in a more generalized way about the issues. For instance, once you see evaluations as a Key Risk Indicator (i.e. a proxy measure of risk), you can notice that we could also use other Key Risk Indicators to trigger mitigations, such as actual monitoring metrics. This could enable building conditions/thresholds in RSPs that are based on monitoring metrics (e.g. “we find fewer than 5 bioterrorists successfully jailbreaking our model per year on our API”). The more generalized concepts enable more compositionality of ideas, in a way that skips a bunch of the trial-and-error process.
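A toy sketch of what that compositionality buys you: once evals are just one Key Risk Indicator among several, the same trigger logic handles monitoring metrics too. KRI names, values, and thresholds below are all made up for illustration, not taken from any real RSP.

```python
# Each KRI (eval score or monitoring metric) is checked against its own
# tolerance; any KRI over tolerance fires the corresponding mitigation.
def check_kris(kris: dict[str, float], tolerances: dict[str, float]) -> list[str]:
    """Return the names of KRIs that exceed their tolerance."""
    return [name for name, value in kris.items()
            if value > tolerances[name]]

kris = {
    "bio_uplift_eval": 0.3,       # an eval score (pre-deployment)
    "jailbreaks_per_year": 7.0,   # an actual monitoring metric from the API
}
tolerances = {
    "bio_uplift_eval": 0.5,
    "jailbreaks_per_year": 5.0,   # e.g. "fewer than 5 successful jailbreaks/year"
}
print(check_kris(kris, tolerances))  # → ['jailbreaks_per_year']
```

Here the eval passes but the monitoring metric trips the threshold, so a mitigation fires from deployment data alone; nothing in the trigger logic cares which kind of KRI it was.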
It looks like slop, but also, diagrams like these are literally used for a single slide in a powerpoint presentation, so I wouldn’t have too high expectations for them. Hopefully the rest of whatever powerpoint it happens to belong to contains good material too.
fwiw it’s less about the literal diagrams and more about the entire associated vibe. like usually the entire rest of the powerpoint also sounds like slop.