What is the probability the threat actor X could successfully achieve the MITRE ATT&CK technique Y on the target Z if they had access to an LLM capable of solving Cybench tasks up to difficulty D?
Do you think that estimates downstream of this elicitation question are more accurate than directly asking experts “here is the model, it hasn’t been deployed widely yet but here are all the things it can do, what do you think is the median annual risk?”
My guess is that directly asking the cybersecurity experts about the outcome of interest will result in better estimates of the outcome of interest, curious if you disagree.
I think it’s especially clear in the situation where the experts have access to richer information about the model than just benchmark performance (e.g. I don’t think the benchmark scores of Mythos Preview are very informative about its potential impact), but I would guess that directly asking the experts about the outcome of interest dominates modeling even if you asked the experts “What is the probability the threat actor X could successfully achieve the MITRE ATT&CK technique Y on the target Z if they had access to this LLM capable of [detailed description of the model in front of you]?”
Tbc I am supportive of asking the experts to explain why they make the predictions that they do (potentially with guesstimates and explicit modeling), since this has big auditing benefits. But I think the cyber and loss of control situations are very different from the nuclear case, because my understanding is that in the nuclear case the modeling is doing much more of the heavy lifting. By contrast, I don’t think you are learning that much by multiplying a per actor per technique per target probability by the number of actors, the attacks per actor, and the impact per success, and then taking the sum over techniques and targets. In fact, this approach seems a bit weird to me and much less informative than asking experts directly about what they think the damages are: don’t you have very non-linear effects where the number of attacks skyrockets if an attack is very profitable? don’t you have big decreasing marginal returns due to the defenders improving their defenses if they are bleeding too much money due to attacks?
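For concreteness, the multiply-and-sum aggregation described above might look roughly like the sketch below. Every technique name, target name, and number is a made-up assumption for illustration and is not taken from the risk model under discussion. The point is only that the bottom line scales linearly in each input, with no feedback between them.

```python
# Hypothetical inputs: all names and numbers are illustrative assumptions.
# p_success[(technique, target)] is the per-attempt success probability for one actor.
p_success = {
    ("phishing", "hospital"): 0.10,
    ("phishing", "bank"): 0.02,
    ("exploit", "hospital"): 0.05,
    ("exploit", "bank"): 0.01,
}
n_actors = 20                                             # actors of type X
attacks_per_actor = 5                                     # attempts per actor per year
impact_per_success = {"hospital": 2.0e6, "bank": 1.0e7}   # damage per successful attack


def expected_annual_damage(p_success, n_actors, attacks_per_actor, impact):
    """Sum over (technique, target) of p * actors * attempts * impact."""
    return sum(
        p * n_actors * attacks_per_actor * impact[target]
        for (_technique, target), p in p_success.items()
    )


total = expected_annual_damage(p_success, n_actors, attacks_per_actor, impact_per_success)

# Doubling the attempts per actor exactly doubles the estimate: nothing in this
# structure lets attack volume respond to profitability or defenses respond to losses.
doubled = expected_annual_damage(p_success, n_actors, 2 * attacks_per_actor, impact_per_success)
```

In this form, attack volume and success probability are fixed inputs, so the non-linear attacker and defender responses raised above would have to be added as extra structure.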
Do you think that estimates downstream of this elicitation question are more accurate than directly asking experts “here is the model, it hasn’t been deployed widely yet but here are all the things it can do, what do you think is the median annual risk?”
There are two factors at play here:
(1) what is the quality of expert elicitation for the more granular risk model (our current approach) vs the coarse risk model you bring up?
(2) if in (1) we go for the more granular risk model, can we reliably propagate the individual estimates onto the final node of ‘median annual risk’?
In our experience, for (1) it is better to go for the granular risk model. In our expert elicitation workshops, we were told by experts that the narrower the question, the easier it is to reason over it. Furthermore, arriving at a consensus when aggregating multiple experts’ opinions is much easier in this narrow approach.
It’s not clear to us, however, whether what we gain from this granular elicitation is then lost because the information has to be propagated in some linear way through the rest of the risk model (factor 2). The canonical textbook on probability elicitation, Uncertain Judgments, points to (O’Hagan, 1988), (Weight et al., 1994) and (Kleinmuntz et al., 1996) as evidence that the more granular approach leads to higher quality elicitations than eliciting just the overall distribution. These references are old by now and there is of course no guarantee that they apply to our specific type of modelling, so it would be interesting to compare these two approaches.
In any case, one of the main purposes of the risk model is to be able to attribute total risk to individual factors. These kinds of models look to estimate both “how much risk” and “where does it come from” jointly, so we can inform things like mitigation prioritisation and eval effectiveness. This is the goal of our Shapley analysis. If we just elicit the final risk distribution from experts, we lose this explanatory power. One could also simply elicit experts’ opinions on what drives total risk in their minds. But the advantage of having an explicit structure encoded in a Bayesian network is that it makes disagreements clear.
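As a rough illustration of what attributing total risk to individual factors can mean, here is a minimal sketch of exact Shapley attribution over a toy three-factor multiplicative model. The factor names, the baseline and with-LLM values, and the multiplicative form are all assumptions made for this example; it is not a description of how the actual Shapley analysis is implemented.

```python
from itertools import permutations

# Hypothetical factor values without and with LLM uplift (illustrative only).
baseline = {"p_success": 0.02, "n_attacks": 100, "impact": 1.0e6}
with_llm = {"p_success": 0.05, "n_attacks": 150, "impact": 1.2e6}


def risk(values):
    """Toy risk model: success probability * number of attacks * impact."""
    return values["p_success"] * values["n_attacks"] * values["impact"]


def coalition_risk(coalition):
    """Risk when only the factors in `coalition` take their with-LLM value."""
    values = {k: (with_llm[k] if k in coalition else baseline[k]) for k in baseline}
    return risk(values)


def shapley(factors):
    """Average each factor's marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in factors}
    orders = list(permutations(factors))
    for order in orders:
        switched_on = set()
        for f in order:
            before = coalition_risk(switched_on)
            switched_on.add(f)
            contrib[f] += coalition_risk(switched_on) - before
    return {f: c / len(orders) for f, c in contrib.items()}


attribution = shapley(list(baseline))
risk_increase = risk(with_llm) - risk(baseline)
```

The attributions sum to the total increase in risk (the efficiency property of Shapley values), which is what makes a statement like "this share of the added risk comes from factor k" well defined once a model structure has been fixed.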
I think it’s especially clear in the situation where the experts have access to richer information about the model than just benchmark performance (e.g. I don’t think the benchmark scores of Mythos Preview are very informative about its potential impact),
Indeed, mapping from just two cybersec benchmarks is probably the main limitation of the current approach. We have seen experts voice opinions that task D that they’re given is not really relevant to the MITRE step Y they need to estimate. We are working on integrating more ‘risk indicators’ in our risk models: more sophisticated evaluations (e.g. cyber ranges), transcript analysis, incident trackers and other public evidence.
I would guess that directly asking the experts about the outcome of interest dominates modeling even if you asked the experts “What is the probability the threat actor X could successfully achieve the MITRE ATT&CK technique Y on the target Z if they had access to this LLM capable of [detailed description of the model in front of you]?”
This is covered by my first answer.
the cyber and loss of control situations are very different from the nuclear case
Agreed for LoC, more uncertain about cyber where things are very messy, but it’s not impossible to construct plausible pathways to harm, either via MITRE or otherwise. For LoC, we’re pursuing a very different approach that is more qualitative in nature.
don’t you have very non-linear effects where the number of attacks skyrockets if an attack is very profitable? don’t you have big decreasing marginal returns due to the defenders improving their defenses if they are bleeding too much money due to attacks?
Agreed. It is not clear to me that going for the approach of directly eliciting total risk estimates is better, though. In this approach, we are putting a lot of additional pressure on the experts’ mental world models. In practice, experts often disagree in their interpretations of the questions they’re asked about, even for very narrow steps. I’m worried that in this less granular approach, they would implicitly be modelling very different scenarios in their heads.
A limiting assumption of our risk models is that all parameters are independent. The sort of dependencies you mention (e.g., between attack frequency and impact, once actors realise an attack is profitable) could be modelled with tricks such as copulas. The reason we haven’t done so is that this in itself requires the estimation of additional parameters. This in turn places additional burden on experts – yet again, it is not clear to us whether the benefit of introducing copulas outweighs the downsides if their parameters are estimated poorly.
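To make the copula idea concrete, here is a minimal sketch of a Gaussian copula coupling attack frequency and impact per attack. The lognormal marginals, their parameters, and the correlation `rho` are all assumptions chosen for illustration, and `rho` is exactly the kind of additional parameter that would have to be elicited.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_annual_loss(rho, n_samples=20000, seed=0):
    """Monte Carlo mean of frequency * impact under a Gaussian copula."""
    rng = random.Random(seed)
    # Hypothetical marginals, specified on the log scale (illustrative only).
    log_freq = NormalDist(mu=math.log(100), sigma=0.5)      # attacks per year
    log_impact = NormalDist(mu=math.log(1.0e5), sigma=1.0)  # damage per attack
    total = 0.0
    for _ in range(n_samples):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Map to uniforms (the copula step), clamped away from 0 and 1.
        u1 = min(max(STD_NORMAL.cdf(z1), 1e-12), 1.0 - 1e-12)
        u2 = min(max(STD_NORMAL.cdf(z2), 1e-12), 1.0 - 1e-12)
        # Push the uniforms through each marginal's inverse CDF.
        freq = math.exp(log_freq.inv_cdf(u1))
        impact = math.exp(log_impact.inv_cdf(u2))
        total += freq * impact
    return total / n_samples


independent = mean_annual_loss(rho=0.0)
dependent = mean_annual_loss(rho=0.7)
```

With these made-up numbers, positive dependence between frequency and impact raises the mean annual loss relative to the independent case, so a poorly estimated `rho` feeds directly into the bottom line.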
Furthermore, arriving at a consensus when aggregating multiple experts’ opinions is much easier in this narrow approach.
I think this is a bad reason to prefer the narrow approach. If they agree on narrow facts but disagree on the bottom-line risk, then surely they will disagree on the modeling, and thus using experts only for consensual narrow facts and using your own modeling just hides the massive uncertainty in an organizer-chosen model which most experts will disagree with. If experts disagree about the bottom line, then experts should disagree on at least one of the questions you ask them!
A limiting assumption of our risk models is that all parameters are independent
I think this is maybe important when you have models that capture all crucial considerations (i.e. considerations that could each massively change the bottom-line estimate). But the 1st order bit is whether you captured all crucial considerations. In cyber, increased attacker effort due to higher attack ROI or diminishing marginal returns to attacks due to investments in defenses and due to the easiest targets already being exploited are such crucial considerations, and I would not be surprised if there were other crucial considerations (e.g. extreme company or gov interventions if the damages got visibly on track to being very high, tail risk of new kinds of cyber attacks only enabled by LLMs, potentially net-positive impact on cyber of wide LLM availability due to defenders finding vulnerabilities in their own software, potentially net-negative impact of such large scale white-hat vuln finding efforts due to difficulties in updating software in a timely manner, potential net-positive impact of such short-term damages on long-term cyber defenses, …).
I think that risk modeling that implicitly (e.g. because experts take into account these effects when you ask them about their bottom-line prediction) or explicitly tries to capture crucial considerations will be much more accurate than risk modeling that sacrifices some crucial considerations for the sake of a more legible process.
I am excited about getting better data on narrow questions, but I think that such data should be used as inputs into a risk modeling process that takes into account crucial considerations rather than being used to derive risk directly via a legible-but-rigid process. I also think that figuring out how to do expert elicitation on narrow questions won’t teach us much about how to build better processes for bottom-line risk estimation (which is where I expect most of the difficulty to be).
Hi Fabien, thanks for taking the time to comment.
Interesting, thanks!