Good question. First, many of these benchmarks are about things that are dangerous but not particularly economically valuable (example: bioweapons). My model of labs is that they're mostly trying to do economically valuable things (as most companies are forced to). Although they may be reckless, I don't currently have the impression they're actively trying to seize power using things such as bioweapons. Second, some benchmarks are about things that are economically valuable, for example the METR one. But these mostly get benchmaxxed already. Third, we are not creating new benchmarks, but only tracking the scores for existing ones and coupling them to existential threat models. If labs wanted to benchmax any benchmark we track, they could do so with or without our work.
In addition: this risk needs to be weighed against the positive effect of improving the information position of researchers, policymakers, and the public. How much this matters depends on how well we do, but also on your theory of change and the extent to which informing these groups is part of it. In my case, I strongly believe in awareness as a key path towards solving the problem. I think this is true for researcher, policymaker, and public awareness of the right threat model. I'm particularly excited about this graph, showing that public xrisk awareness has already increased from about 8% to 24%. If our combined efforts could push this to a tipping point, I'd be mostly optimistic that we can implement and enforce a global AI safety treaty (such as the one we proposed in TIME and SCMP), and that this will reduce xrisk significantly.
For these reasons, my bet is that the result is positive. But it’s not obvious, so again, good question.
In addition: I think benchmarks that are mainly about something economically relevant, or, god forbid, scientifically relevant, are way more likely to get benchmaxxed and lead straight to a takeover too, while not having a strong case for reducing xrisk. Such benchmarks are routinely created and funded by xrisky orgs.