Hi Fabien, thanks for taking the time to comment.
Do you think that estimates downstream of this elicitation question are more accurate than directly asking experts “here is the model, it hasn’t been deployed widely yet but here are all the things it can do, what do you think is the median annual risk?”
There are two factors at play here:
(1) How good is expert elicitation for the more granular risk model (our current approach) compared with the coarse risk model you bring up?
(2) If we go for the more granular risk model in (1), can we reliably propagate the individual estimates to the final node of ‘median annual risk’?
In our experience, for (1) it is better to go for the granular risk model. In our expert elicitation workshops, we were told by experts that the narrower the question, the easier it is to reason over it. Furthermore, arriving at a consensus when aggregating multiple experts’ opinions is much easier in this narrow approach.
It’s not clear to us, however, whether what we gain from granular elicitation is then lost when we have to propagate this information, in some linear way, through the rest of the risk model (factor 2). The canonical textbook on probability elicitation, Uncertain Judgments, points to (O’Hagan, 1988), (Weight et al., 1994) and (Kleinmuntz et al., 1996) as evidence that the more granular approach leads to higher-quality elicitations than eliciting just the overall distribution. These references are old by now, and it is of course not a given that they apply to our specific type of modelling, so it would be interesting to compare the two approaches.
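To make the propagation step in factor (2) concrete, here is a minimal Monte Carlo sketch. It is a toy illustration and not our actual risk model: the three-step structure, the Beta and lognormal distributions, and every number in it are assumptions chosen only to show how narrow estimates get pushed through to a final annual figure.

```python
import math
import random
import statistics

# Hypothetical Beta(a, b) distributions for three attack steps (made-up numbers).
STEP_PRIORS = [(8, 2), (3, 7), (2, 8)]


def sample_annual_loss(rng):
    """Expected annual loss for one joint draw of independently sampled parameters."""
    attempts = rng.lognormvariate(2.0, 0.5)    # attack attempts per year (assumed)
    step_probs = [rng.betavariate(a, b) for a, b in STEP_PRIORS]
    p_success = math.prod(step_probs)          # every step must succeed
    impact = rng.lognormvariate(11.0, 1.0)     # loss per successful attack (assumed)
    return attempts * p_success * impact


def simulate(n_draws, seed=0):
    """Propagate the granular estimates to the final node by repeated sampling."""
    rng = random.Random(seed)
    return [sample_annual_loss(rng) for _ in range(n_draws)]


losses = simulate(10_000)
print(f"median annual loss: {statistics.median(losses):,.0f}")
print(f"mean annual loss:   {statistics.fmean(losses):,.0f}")
```

Because the final node is a product of skewed quantities, its median and mean can differ a lot, which is one reason errors in the narrow estimates do not simply average out on the way to the top.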
In any case, one of the main purposes of the risk model is to be able to attribute total risk to individual factors. These kinds of models look to estimate both “how much risk” and “where does it come from” jointly, so we can inform things like mitigation prioritisation and eval effectiveness. This is the goal of our Shapley analysis. If we just elicit the final risk distribution from experts, we lose this explanatory power. One could also simply elicit experts’ opinions on what drives total risk in their minds. But the advantage of having an explicit structure encoded in a Bayesian network is that it makes disagreements clear.
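As a rough illustration of what attributing total risk to individual factors means, here is a toy Shapley computation. It is a sketch under strong assumptions and not our actual analysis: the three factors, the multiplicative risk function, and the baseline and uplifted values are all hypothetical.

```python
from itertools import permutations

# Hypothetical parameter values without and with AI uplift (made-up numbers).
BASELINE = {"frequency": 2.0, "p_success": 0.05, "impact": 1.0e5}
UPLIFTED = {"frequency": 4.0, "p_success": 0.20, "impact": 1.5e5}


def risk(active):
    """Toy annual risk when the factors in `active` take their uplifted values."""
    vals = {k: (UPLIFTED[k] if k in active else BASELINE[k]) for k in BASELINE}
    return vals["frequency"] * vals["p_success"] * vals["impact"]


def shapley_values():
    """Exact Shapley values: average marginal contribution over all orderings."""
    factors = list(BASELINE)
    totals = {f: 0.0 for f in factors}
    orders = list(permutations(factors))
    for order in orders:
        active = set()
        for f in order:
            before = risk(active)
            active.add(f)
            totals[f] += risk(active) - before
    return {f: t / len(orders) for f, t in totals.items()}


phi = shapley_values()
for factor, value in phi.items():
    print(f"{factor:>10}: {value:,.0f}")
```

The attributions sum to the gap between the fully uplifted risk and the baseline risk, which is the property that makes this kind of decomposition usable for mitigation prioritisation. A directly elicited total gives no such breakdown.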
I think it’s especially clear in the situation where the experts have access to richer information about the model than just benchmark performance (e.g. I don’t think the benchmark scores of Mythos Preview are very informative about its potential impact),
Indeed, mapping from just two cybersec benchmarks is probably the main limitation of the current approach. We have seen experts voice opinions that task D that they’re given is not really relevant to the MITRE step Y they need to estimate. We are working on integrating more ‘risk indicators’ in our risk models: more sophisticated evaluations (e.g. cyber ranges), transcript analysis, incident trackers and other public evidence.
I would guess that directly asking the experts about the outcome of interest dominates modeling even if you asked the experts “What is the probability the threat actor X could successfully achieve the MITRE ATT&CK technique Y on the target Z if they had access to this LLM capable of [detailed description of the model in front of you]?”
This is covered by my first answer.
the cyber and loss of control situations are very different from the nuclear case
Agreed for LoC. I’m more uncertain about cyber, where things are very messy, but it’s not impossible to construct plausible pathways to harm, either via MITRE or otherwise. For LoC, we’re pursuing a very different approach that is more qualitative in nature.
don’t you have very non-linear effects where the number of attacks skyrockets if an attack is very profitable? don’t you have big decreasing marginal returns due to the defenders improving their defenses if they are bleeding too much money due to attacks?
Agreed. It is not clear to me that directly eliciting total risk estimates is better, though. In that approach, we are putting a lot of additional pressure on the experts’ mental world models. In practice, experts often disagree in their interpretations of the questions they’re asked, even for very narrow steps. I’m worried that in this less granular approach, they would implicitly be modelling very different scenarios in their heads.
A limiting assumption of our risk models is that all parameters are independent. The sort of dependencies you mention (e.g., between attack frequency and impact, once actors realise an attack is profitable) could be modelled with tricks such as copulas. The reason we haven’t done so is that this in itself requires the estimation of additional parameters, which places an additional burden on experts. Once again, it is not clear to us whether the benefit of introducing copulas outweighs the downsides if their parameters are estimated poorly.
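For readers unfamiliar with copulas, here is a minimal sketch of a Gaussian copula linking attack frequency and impact. It is purely illustrative and not part of our models: the lognormal marginals, their parameters, and the correlation value `rho` are assumptions, and `rho` is exactly the kind of extra parameter experts would have to help estimate.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()
# Hypothetical marginals, specified on the log scale (made-up numbers).
FREQ_LOG = NormalDist(mu=1.0, sigma=0.5)     # log attacks per year
IMPACT_LOG = NormalDist(mu=11.0, sigma=1.0)  # log loss per attack


def _clamp(u, eps=1e-12):
    """Keep uniforms strictly inside (0, 1) so inv_cdf is always defined."""
    return min(max(u, eps), 1.0 - eps)


def sample_pair(rng, rho):
    """Draw (frequency, impact) whose dependence follows a Gaussian copula."""
    z_freq = rng.gauss(0.0, 1.0)
    z_imp = rho * z_freq + math.sqrt(1.0 - rho**2) * rng.gauss(0.0, 1.0)
    # Map correlated normals to uniforms, then through each marginal's quantile function.
    u_freq = _clamp(STD_NORMAL.cdf(z_freq))
    u_imp = _clamp(STD_NORMAL.cdf(z_imp))
    frequency = math.exp(FREQ_LOG.inv_cdf(u_freq))
    impact = math.exp(IMPACT_LOG.inv_cdf(u_imp))
    return frequency, impact


def mean_annual_loss(rho, n_draws=20_000, seed=0):
    rng = random.Random(seed)
    return fmean(f * i for f, i in (sample_pair(rng, rho) for _ in range(n_draws)))


print(f"independent (rho=0.0): {mean_annual_loss(0.0):,.0f}")
print(f"dependent   (rho=0.7): {mean_annual_loss(0.7):,.0f}")
```

The marginals are the same in both runs; only the dependence changes, and positive dependence raises the mean of the product. A poorly estimated `rho` would shift the total in the same way, which is the trade-off described above.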
I think this assumes that the disagreement is a result of genuine uncertainty about the scenario, not of experts implicitly modelling a different scenario. In the granular approach, it’s easier to get them all on the same page and then any resultant variance will be due to their disagreement about the estimates, not due to experts interpreting the question differently. So we are able to isolate these two effects.
Although you could argue that we do actually want experts to have their own interpretations of the scenario, since this somehow averages over many assumptions and uncertainties about risk (similarly to the ‘wisdom of the crowds’ idea in forecasting). I wouldn’t have a good counterargument to that and we haven’t tested it.
In the future, we’d like to pursue both approaches (granular and coarse) and then use the coarse one to calibrate the granular one, understand where our risk decomposition is unrealistic, etc. For now we haven’t been able to do this due to practical constraints: time and compensation for cybersec experts.
Regarding the non-linear effects, yes, I agree that the ‘static’ nature of risk models (and benchmarks/cyber ranges) is probably the biggest drawback for now. We are not sure how the dynamics will change once the first cyberattack under consideration occurs, and our single ‘number of attacks per actor’ parameter does not capture that well. I’m not confident that the coarse approach would do better, though I’m happy to have my intuition corrected.
Why not? If your concern is that these narrow estimates will not be aggregated well into the total risk estimate, I agree. But that’s more of an issue with the quality of the risk model, isn’t it? The issue of expert elicitation is more or less orthogonal, or do you not think so?