I wonder if you can convert the METR time horizon results into SD/year numbers. My sense is that this will probably not be that meaningful because AIs are much worse than mediocre professionals while having a different skill profile, so they are effectively out of the human range.
If you did a best-effort version of this by looking at software engineers who struggle to complete longer tasks like the ones in the METR benchmark(s), I'd wildly guess that a doubling in time horizon is roughly 0.7 SD, such that this predicts ~1.2 SD/year.
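To make the arithmetic behind that guess explicit, here is a minimal sketch of the conversion. The 0.7 SD per doubling is the wild guess above. The ~7 month doubling time is an assumption, back-solved from the 0.7 SD and ~1.2 SD/year figures rather than stated in the comment, and the function names are purely illustrative.

```python
# Rough conversion from time-horizon doublings to SD/year.
SD_PER_DOUBLING = 0.7        # wild guess from the comment above
DOUBLING_TIME_MONTHS = 7.0   # ASSUMPTION: back-solved, not stated in the comment


def sd_per_year(sd_per_doubling: float, doubling_time_months: float) -> float:
    """SD gained per year = (SD per doubling) * (doublings per year)."""
    doublings_per_year = 12.0 / doubling_time_months
    return sd_per_doubling * doublings_per_year


def implied_doubling_time_months(sd_per_doubling: float, sd_year: float) -> float:
    """Invert the relation: which doubling time makes the two figures consistent?"""
    return 12.0 * sd_per_doubling / sd_year


rate = sd_per_year(SD_PER_DOUBLING, DOUBLING_TIME_MONTHS)
print(f"{rate:.2f} SD/year")
print(f"{implied_doubling_time_months(SD_PER_DOUBLING, 1.2):.1f} months per doubling")
```

The result scales linearly in both inputs, so any error in the 0.7 SD guess carries straight through to the SD/year number.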