Many deployed AIs are plausibly capable of substantially assisting amateurs at making CBRN weapons (most centrally bioweapons) despite not having the safeguards this capability level is supposed to trigger. In particular, I think o3, Gemini 2.5 Pro, and Sonnet 4 are plausibly above the relevant threshold in the corresponding company's safety framework. These AIs outperform expert virologists on the Virology Capabilities Test (VCT), which is perhaps the best test we have of the most relevant capability, and we don't have any publicly transparent benchmark or test which rules out CBRN concerns. I'm not saying these models are likely above the thresholds in these companies' safety policies, just that it's quite plausible (perhaps 35% for a well-elicited version of o3). I should note that my understanding of CBRN capability levels is limited in various ways: I'm not an expert in some of the relevant domains, so much of my understanding is secondhand.
The closest test we have which might rule out CBRN capabilities above the relevant threshold is Anthropic's uplift trial for drafting a comprehensive bioweapons acquisition plan. (See section 7.2.4.1 in the Opus/Sonnet 4 model card.) Anthropic uses this test to rule out ASL-3 CBRN capability for Sonnet 4 and Sonnet 3.7. However, we have very limited public details about this test, including why they chose the uplift threshold they did, why we should think that low scores on this test would rule out the most concerning threat models, and whether they did a sufficiently good job on elicitation and on training participants to use the AI effectively. It's also not clear that this test would indicate that o3 and Gemini 2.5 Pro are below a concerning threshold (and, minimally, it wasn't run on these models to rule out a concerning level of CBRN capability). Anthropic appears to have done the best job of handling CBRN evaluations. (This isn't to say their evaluations and decision making are good at an absolute level; the available public information indicates a number of issues and is consistent with thresholds being picked to get the outcome Anthropic wanted. See here for more discussion.)
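As a rough illustration of what an uplift trial like this measures, here is a minimal sketch. The actual protocol, scoring rubric, participant pool, and decision threshold are not public, so the structure, names, and every number below are assumptions for illustration only, not Anthropic's method:

```python
# Minimal sketch of an uplift-trial comparison. The real protocol, rubric, and
# decision threshold are not public; everything here is an assumption.
from statistics import mean

def uplift_ratio(assisted_scores: list[float], control_scores: list[float]) -> float:
    """Ratio of mean plan-quality score with AI assistance vs. an internet-only control group."""
    return mean(assisted_scores) / mean(control_scores)

# Hypothetical rubric-graded scores (0-100) for each participant's acquisition plan.
control = [22, 30, 18, 27, 25]    # internet-only participants
assisted = [35, 41, 29, 38, 33]   # model-assisted participants

ratio = uplift_ratio(assisted, control)
THRESHOLD = 2.0  # hypothetical "concerning uplift" cutoff; the actual cutoff is not public

print(f"uplift ratio: {ratio:.2f} -> {'above' if ratio >= THRESHOLD else 'below'} the assumed threshold")
```

Even with this simple structure, the headline number depends heavily on elicitation quality, how well participants are trained to use the model, and where the cutoff is set, which is why the lack of public detail matters.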
What should AI companies have done given this uncertainty? First, they should have clearly acknowledged their uncertainty in specific (ideally quantified) terms. Second, they should also have retained unconflicted third parties with relevant expertise to audit their decisions and publicly state their resulting views and level of uncertainty. Third-party auditors who can examine the relevant tests in detail are needed because we have almost no public details about the tests these companies rely on to rule out the relevant level of CBRN capability, and a lot of judgement is involved in making capability decisions. Publishing far more detail about the load-bearing tests and the decision-making process could also suffice, but my understanding is that companies don't want to do this because they are concerned about infohazards from their bio evaluations.
If they weren't ready to deploy these safeguards but thought that the benefits of proceeding outweighed the (expected) cost in human lives, they should have publicly acknowledged the expected level of fatalities and explained why they thought weakening their safety policies and incurring these expected fatalities was net good.[1]
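To make concrete what acknowledging an "expected cost in human lives" could look like as a decision quantity, here is a back-of-the-envelope sketch. Apart from the post's rough 35% figure, every number is an arbitrary placeholder, not an estimate from this post or from any company:

```python
# Back-of-the-envelope expected-fatalities calculation. Only p_above_threshold echoes the
# post's rough 35% figure; all other numbers are arbitrary placeholders for illustration.
p_above_threshold = 0.35     # chance the deployed model is over the CBRN capability threshold
expected_attempts = 1.0      # assumed number of serious misuse attempts enabled during deployment
p_attempt_succeeds = 0.02    # assumed chance an attempt leads to a mass-casualty event
deaths_if_success = 10_000   # assumed fatalities in that event

expected_fatalities = (p_above_threshold * expected_attempts
                       * p_attempt_succeeds * deaths_if_success)
print(f"expected fatalities: {expected_fatalities:.0f}")  # 70 under these placeholders
```

The point isn't any particular number; it's that the company would publish its own version of this calculation and defend why proceeding is net good despite it.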
In the future, we might get pretty clear evidence that these companies failed to properly assess the risk.
I mostly wrote this up to create common knowledge and because I wanted to reference it when talking about my views on open-weight models. I'm not trying to trigger any specific action.
See also Luca Righetti's post "OpenAI's CBRN tests seem unclear," which was about o1 (a model now substantially surpassed by multiple others).
[1] I think these costs/risks are small relative to future risks, but that doesn't mean it's good for companies to proceed while incurring these fatalities. For instance, a company proceeding could increase future risks, and proceeding in this circumstance is correlated with the company doing a bad job of handling future risks (which will likely be much more difficult to handle safely).
Update: experts and superforecasters agree with Ryan that current VCT results indicate a substantial increase in human-caused epidemic risk. (Based on the summary; I haven't read the paper.)
Public acknowledgement of the capabilities could be net negative in itself, especially if it resulted in media attention. I expect that bringing awareness to the (possible) fact that the AI can assist with CBRN tasks would increase the chance that people try to use it for CBRN tasks. I could even imagine someone trying to use these capabilities without malicious intent (e.g., just to see for themselves whether it's possible), but this would still be risky. Also, knowing which tasks it can help with might make it easier to use for harm.
Given that AI companies have a strong conflict of interest, I would at least want them to report this to a third party and let that third party determine whether they should publicly acknowledge the capabilities.
See also “AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman.
I made a comment on that post about why I think the thresholds are, for now, set high for good reason, and why I think the evals failing to support company claims that their models can't do bioweapons/CBRN tasks mostly reflects failures of the evals. But I'm also confused about how Anthropic managed to rule out uplift risks for Claude Sonnet 4 but not Claude Opus 4:
https://www.lesswrong.com/posts/AK6AihHGjirdoiJg6/?commentId=mAcm2tdfRLRcHhnJ7
I can't imagine their legal team signing off on such a statement (publicly acknowledging the expected fatalities and explaining why incurring them was net good), even if the benefits of releasing clearly outweigh the costs.
What are your views on open-weight models? My takeaway after reading this post is that it may not be worth giving up the many benefits of open models if closed models are actually not significantly safer with respect to these risks.
I've actually written up a post with my views on open-weight models, and it should go up today at some point. (Or maybe tomorrow.)
Edit: posted here