My guess would be that making RL envs for broad automation of the economy is bad[1] and making benchmarks which measure how good AIs are at automating jobs is somewhat good[2].
Regardless, IMO this seems worse for the world than other activities Matthew, Tamay, and Ege might do.
I’d guess the skills will transfer to AI R&D etc. insofar as the environments are good. I’m uncertain about the sign of broad automation which doesn’t transfer (a lack of transfer would itself be somewhat confusing/surprising), as this would come down to earlier increased awareness vs. faster AI development due to increased investment.
It’s probably better if you don’t make these benchmarks easy to iterate on and instead focus on determining and forecasting whether models have high levels of threat-model-relevant capability. Being able to precisely compare models with similar performance isn’t directly important.