anaguma comments on Cyborg evals

anaguma 30 Apr 2026 18:29 UTC
7 points
0
Historically, if you wanted to estimate how long a software task would take, you would ask a software engineer to “raw dog it”^[1] and time how long it takes them. This is mostly impossible now, because almost all software engineers are highly dependent on LLMs. An engineer today who tries to do a given task without AI assistance would be significantly slower than a “pre-LLM” engineer on the same task. If you tried to use post-LLM engineer data to estimate LLM time horizons, you’d get biased results.^[2]
METR might be able to hire people who do competitive programming or similar competitions, since they still spend a lot of time working without the help of LLMs. For example, many high school students might be interested in such studies.
- Shubhorup Biswas 30 Apr 2026 23:51 UTC
  1 point
  0
  These are not professional SWEs. Professional SWE tasks look very different from competitive programming problems, and it’s also hard to create competitive programming tasks that take 8h-30h.