to make sure I understand correctly, are you saying that a lot of the value of having this kind of formalized structure is to make it harder for people to make intuitive but flawed arguments by equivocating terms?
are there good examples of such frameworks preventing equivocation in other industries?
Yes, that’s one value. RSPs & many of the policy debates around them would have been less messed up if there had been more conceptual clarity early on (i.e. a confused notion became the standard, which was then impossible to fix in policy discussions, making the Code of Practice flawed). I don’t know of a specific example of a framework preventing equivocation in another industry (it seems hard to observe such examples?), but the fact that basically all industries use the same set of concepts is evidence that they’re pretty general-purpose and repurposable.
Another is just that it helps you think about the issues in a more generalized way. For instance, once you see evaluations as a Key Risk Indicator (i.e. a proxy measure of risk), you can notice that other Key Risk Indicators could also be used to trigger mitigations, such as actual monitoring metrics. This could enable building conditions/thresholds in RSPs that are based on monitoring metrics (e.g. “we find fewer than 5 bioterrorists successfully jailbreaking our model per year on our API”). More generalized concepts enable more compositionality of ideas, in a way that skips a bunch of the trial-and-error process.
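To make the KRI idea concrete, here is a minimal sketch of a threshold check that treats both an evaluation score and a live monitoring metric as Key Risk Indicators that can trigger a mitigation. All names, thresholds, and mitigations are hypothetical illustrations, not any real RSP's API.

```python
# Minimal sketch: a Key Risk Indicator (KRI) is a proxy measure of risk
# with a threshold that, when breached, triggers a mitigation. Evaluation
# scores and monitoring metrics can both play this role. All values here
# are made-up illustrations.
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyRiskIndicator:
    name: str
    threshold: float      # condition is breached when observed value >= threshold
    mitigation: str       # mitigation to trigger on breach

    def check(self, observed_value: float) -> Optional[str]:
        """Return the mitigation to apply if the threshold is breached, else None."""
        if observed_value >= self.threshold:
            return self.mitigation
        return None


# An evaluation result used as a KRI (hypothetical metric and threshold):
eval_kri = KeyRiskIndicator("bio_uplift_eval_score", 0.8, "pause deployment")

# A monitoring metric used as a KRI, as in the jailbreak example above:
monitoring_kri = KeyRiskIndicator(
    "successful_jailbreaks_per_year", 5, "tighten API safeguards"
)

print(eval_kri.check(0.6))       # below threshold -> None
print(monitoring_kri.check(7))   # breached -> triggers the mitigation
```

The point of the generalization is visible in the code: once both kinds of signal share the KRI shape, the same trigger logic composes over either one without redesigning the framework.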