See also “AI companies’ eval reports mostly don’t support their claims” by Zach Stein-Perlman.
I made a comment on that post explaining why I think the thresholds are, for now, set high for good reason, and why the evals failing to support companies' claims that their models can't help with bioweapons/CBRN tasks are mostly failures of the evals. I'm also confused about how Anthropic managed to rule out uplift risks for Claude Sonnet 4 but not Claude Opus 4:
https://www.lesswrong.com/posts/AK6AihHGjirdoiJg6/?commentId=mAcm2tdfRLRcHhnJ7