This is nitpicking, but since this post is specifically trying to increase clarity:
> The headline projection in the paper is that in about 5 years, AIs will be able to carry out roughly 50% of software engineering tasks that take humans one month to complete. A 50% score is a good indication of progress, but a tool that can only complete half of the tasks assigned to it is a complement to human engineers, not a replacement. It likely indicates that the model has systematic shortfalls in some important skill areas. Human workers will still be needed for the other 50% of tasks, as well as to check the AI’s work (which, for some kinds of work, may be difficult).
This paragraph gives me the impression that we could split the full set of tasks into ‘ones current AI gets right’ and ‘ones current AI gets wrong’. I interpret the METR post as instead pointing to a set of tasks such that, for any given task in this class, an AI has a 50% chance of getting it right. The latter is a much worse position to be in for production use, since you can’t, in advance, assign the AI the tasks it’s good at and assign humans the others.
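To make the difference concrete, here is a minimal sketch of the two readings (my own illustration, not something from the post or the METR paper; the task count and the 50% figure are placeholder assumptions):

```python
# A minimal sketch contrasting two readings of "completes 50% of tasks".
# Assumptions: 10,000 tasks and a 0.5 per-task success probability, chosen
# purely for illustration.
import random

random.seed(0)
N_TASKS = 10_000

# Reading A: a fixed, identifiable half of the tasks is AI-solvable, so work
# can be routed up front. Humans do exactly the other half; no AI attempt is wasted.
human_tasks_a = N_TASKS // 2

# Reading B: each task has an independent 50% chance of success, and you cannot
# tell in advance which tasks those will be, so every task goes to the AI first.
ai_failures = sum(1 for _ in range(N_TASKS) if random.random() >= 0.5)
human_tasks_b = ai_failures  # humans redo each task the AI failed

print(f"Reading A: humans handle {human_tasks_a} tasks; no wasted AI attempts.")
print(f"Reading B: humans handle ~{human_tasks_b} tasks, each only after a failed AI attempt,")
print("and every apparent AI success still needs human review, since you can't tell which half it's in.")
```

Under both readings humans end up with roughly half the tasks, but under the second reading that work arrives only after a failed AI attempt, and all of the AI’s output still has to be checked.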
Elsewhere it’s clear that that’s not what you mean, so I think it’s just a matter of your wording in this particular paragraph.