ryan_greenblatt comments on jacquesthibs’s Shortform

ryan_greenblatt 17 Apr 2025 19:25 UTC
59 points
My guess would be that making RL envs for broad automation of the economy is bad^[1] and making benchmarks which measure how good AIs are at automating jobs is somewhat good^[2].

Regardless, IMO this seems worse for the world than other activities Matthew, Tamay, and Ege might do.
1. I’d guess the skills will transfer to AI R&D, etc., insofar as the environments are good. I’m uncertain about the sign of broad automation that doesn’t transfer (a lack of transfer would itself be somewhat confusing/surprising), as this would come down to increased awareness earlier vs. speeding up AI development due to increased investment.
2. It’s probably better if you don’t make these benchmarks easy to iterate on and instead focus on determining and forecasting whether models have high levels of threat-model-relevant capability. Being able to precisely compare models with similar performance isn’t directly important.