simeon_c comments on leogao’s Shortform

simeon_c 10 Feb 2026 4:50 UTC
6 points
2
Yes, that’s one value. RSPs and many of the policy debates around them would have been less messed up if there had been clarity (i.e. they turned a confused notion into the standard, which was then impossible to fix in policy discussions, making the Code of Practice flawed). I don’t know of a specific example of preventing equivocation in other industries (it seems hard to know of such examples?), but the fact that basically all industries use the same set of concepts is evidence that those concepts are pretty general-purpose and repurposable.

Another is just that it helps you think about the issues in a more general way.
For instance, once you see evaluations as a Key Risk Indicator (i.e. a proxy measure of risk), you can notice that we could also use other Key Risk Indicators to trigger mitigations, such as actual monitoring metrics. This would make it possible to build conditions/thresholds into RSPs that are based on monitoring metrics (e.g. “we find fewer than 5 bioterrorists successfully jailbreaking our model per year on our API”). More general concepts enable more compositionality of ideas, in a way that lets you skip a bunch of the trial-and-error process.