Historically, if you wanted to estimate how long a software task would take, you would ask a software engineer to “raw dog it”[1] and time how long it took them. This is mostly impossible now, because almost all software engineers are highly dependent on LLMs. An engineer today who tries to do a given task without AI assistance would be significantly slower than a “pre-LLM” engineer on the same task. If you tried to use post-LLM engineer data to estimate LLM time horizons, you’d get biased results.[2]
METR might be able to hire people who do competitive programming or similar competitions, since they still spend a lot of time working without the help of LLMs. For example, many high school students might be interested in such studies.
These are not professional SWEs. Professional SWE tasks look very different from competitive programming problems, and it’s hard to create competitive programming tasks that take 8h-30h.