FYI: the paper is now out.
See also the LW linkpost: METR: Measuring AI Ability to Complete Long Tasks, and a summary on Twitter.
(IMO this is a really cool paper; very grateful to @Thomas Kwa et al. I'm looking forward to digging into the details.)