Great question. You are forcing me to actually think through the argument more carefully. Here goes:
Suppose we defined “t-AGI” as “An AI system that can do basically everything that professional humans can do in time t or less, and just as well, while being cheaper.” And we said AGI is an AI that can do everything at least as well as professional humans, while being cheaper.
Well, then AGI = t-AGI for t=infinity. Because for anything professional humans can do, no matter how long it takes, AGI can do it at least as well.
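In symbols (my own shorthand, not notation from METR or from the t-AGI post): writing $T_{\text{human}}(x)$ for how long task $x$ takes a professional human,

$$\text{$t$-AGI}: \quad \forall x,\; T_{\text{human}}(x) \le t \;\Rightarrow\; \text{the AI does } x \text{ at least as well, more cheaply}$$
$$\text{AGI}: \quad \forall x,\; \text{the AI does } x \text{ at least as well, more cheaply}$$

As $t \to \infty$, the condition $T_{\text{human}}(x) \le t$ eventually holds for every task humans can do at all (any finite $T_{\text{human}}$), so the two definitions coincide.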
Now, METR’s definition is different. If I understand correctly, they made a dataset of AI R&D tasks, had humans give a baseline for how long it takes humans to do the tasks, and then had AIs do the tasks and found this nice relationship where AIs tend to be able to do tasks below time t but not above, for t which varies from AI to AI and increases as the AIs get smarter.
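For concreteness, here's a minimal sketch of how a 50% time horizon of this kind can be estimated. As I understand the paper, they fit a logistic curve of AI success probability against (log) human task length and read off the length at which it crosses 50%; the data below is made up and the code is mine, not METR's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: human baseline time per task (minutes, measured on successful
# human attempts) and whether a particular AI agent completed that task.
human_minutes = np.array([2, 5, 10, 15, 30, 60, 120, 240, 480, 960], dtype=float)
ai_succeeded  = np.array([1, 1, 1,  1,  1,  0,  1,   0,   0,   0])

# Fit P(AI success) as a logistic function of log2(human time), then solve for
# the task length at which the fitted probability equals 50%.
X = np.log2(human_minutes).reshape(-1, 1)
model = LogisticRegression(C=1e6).fit(X, ai_succeeded)  # large C ~ unregularized
slope = model.coef_[0, 0]
intercept = model.intercept_[0]
t50_minutes = 2 ** (-intercept / slope)
print(f"estimated 50% time horizon: ~{t50_minutes:.0f} minutes")
```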
...I guess the summary is, if you think about horizon lengths as being relative to humans (i.e. the t-AGI definition above) then by definition you eventually “jump all the way to AGI” when you strictly dominate humans. But if you think of horizon length as being the length of task the AI can do vs. not do (*not* “as well as humans,” just “can do at all”) then it’s logically possible for horizon lengths to just smoothly grow for the next billion years and never reach infinity.
So that’s the argument-by-definition. There’s also an intuition pump about the skills involved, which was also pretty handwavy, but that’s a separate argument.
ICYMI, the same argument appears in the METR paper itself, in section 8.1 under “AGI will have ‘infinite’ horizon length.”
The argument makes sense to me, but I’m not totally convinced.
In METR’s definition, they condition on successful human task completion when computing task durations. This choice makes sense in their setting for reasons they discuss in B.1.1, but things would get weird if you tried to apply it to extremely long/hard tasks.
If a typical time-to-success for a skilled human at some task is ~10 years, then the task is probably so ambitious that success is nowhere near guaranteed at 10 years, or possibly even within that human’s lifetime[1]. It would understate the difficulty of the task to say it “takes 10 years for a human to do it”: the thing that takes 10 years is an ultimately successful human attempt, but most human attempts would never succeed at all.
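Here’s a toy numerical version of that worry. All the numbers are made up (the 15% “solvability” rate and the 7–13 year spread are assumptions purely for illustration); the point is just that if most attempts can never succeed, conditioning on the successful ones gives you a tidy “~10 year” label while hiding the low success rate entirely:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model, made-up numbers: for a hard research-style task, a given skilled
# human can crack it at all with probability 0.15; if they can, the successful
# attempt takes somewhere around 7-13 years; otherwise they never finish.
n_attempts = 100_000
can_solve = rng.random(n_attempts) < 0.15
years_if_solved = rng.uniform(7, 13, n_attempts)

# METR-style labeling: task duration = time taken by *successful* human attempts.
labeled_length = years_if_solved[can_solve].mean()
print(f"labeled task length: ~{labeled_length:.0f} years")           # ~10 years
print(f"unconditional human success rate: {can_solve.mean():.2f}")   # ~0.15
```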
As a concrete example, consider “proving Fermat’s Last Theorem.” If we condition on task success, we have a sample containing just one example, in which a human (Andrew Wiles) did it in about 7 years. But this is not really “a task that a human can do in 7 years,” or even “a task that a human mathematician can do in 7 years” – it’s a task that took 7 years for Andrew Wiles, the one guy who finally succeeded after many failed attempts by highly skilled humans[2].
If an AI tried to prove or disprove a “comparably hard” conjecture and failed, it would be strange to say that it “couldn’t do things that humans can do in 7 years.” Humans can’t reliably do such things in 7 years; most things that take 7 years (conditional on success) cannot be done reliably by humans at all, for the same reasons that they take so long even in successful attempts. You just have to try and try and try and… maybe you succeed in a year, maybe in 7, maybe in 25, maybe you never do.
So, if you came to me and said “this AI has a METR-style 50% time horizon of 10 years,” I would not be so sure that your AI is not an AGI.
In fact, I think this probably would be an AGI. Think about what the description really means: “if you look at instances of successful task completion by humans, and filter to the cases that took 10 years for the successful humans to finish, the AI can succeed at 50% of them.” Such tasks are so hard that I’m not sure the human success rate is above 50%, even if you let the human spend their whole life on it; for all I know the human success rate might be far lower. So there may not be any well-defined thing left here that humans “can do” but which the AI “cannot do.”
On another note (maybe this is obvious, but): if we do think that “AGI will have infinite horizon length,” then I think it’s potentially misleading to say this means growth will be superexponential. The reason is that there are two things this could mean:
1. “Based on my ‘gears-level’ model of AI development, I have some reason to believe this trend will accelerate beyond exponential in the future, due to some ‘low-level’ factors I know about independently from this discussion.”
2. “The exponential trend can never reach AGI, but I personally think we will reach AGI at some point, therefore the trend must speed up.”
I originally read it as 1, which would be a reason for shortening timelines: however “fast” things were from this METR trend alone, we have some reason to think they’ll get “even faster.” However, it seems like the intended reading is 2, and it would not make sense to shorten your timeline based on 2. (If someone thought the exponential growth was “enough for AGI,” then the observation in 2 introduces an additional milestone that needs to be crossed on the way to AGI, and their timeline should lengthen to accommodate it; if they didn’t think this then 2 is not news to them at all.)
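For scale, here’s the sort of back-of-the-envelope extrapolation these two readings are operating on. The ~7-month doubling time is METR’s headline figure; the ~1-hour current horizon and the “1 work-month counts as AGI” threshold are assumptions I’m plugging in purely for illustration:

```python
import math

# Illustrative extrapolation only: assumes the current 50% horizon is ~1 hour,
# the doubling time stays at ~7 months (METR's headline fit), and that a
# 1-work-month (~167 hour) horizon is the milestone someone cares about.
current_horizon_hours = 1.0
target_horizon_hours = 167.0
doubling_time_months = 7.0

doublings = math.log2(target_horizon_hours / current_horizon_hours)
years = doublings * doubling_time_months / 12
print(f"{doublings:.1f} doublings ≈ {years:.1f} years on the plain exponential trend")
# Reading 1 says this estimate should shrink; reading 2 says the trend itself
# can't be the whole story, which if anything pushes the milestone later.
```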
[1] I was going to say something more here about the probability of success within the lifetimes of the person’s “intellectual heirs” after they’re dead, as a way of meaningfully defining task lengths once they’re >> 100 years, but then I realized that this introduces other complications because one human may have multiple “heirs” and that seems unfair to the AI if we’re trying to define AGI in terms of single-human performance. This complication exists but it’s not the one I’m trying to talk about in my comment...
[2] The comparison here is not really fair since Wiles built on a lot of work by earlier mathematicians – yet another conceptual complication of long task lengths that is not the one I’m trying to make a point about here.
The bottom line is basically “Either we define horizon length in such a way that the trend has to be faster than exponential eventually (when we ‘jump all the way to AGI’) or we define it in such a way that some unknown finite horizon length matches the best humans and thus counts as AGI.”
I think this discussion has overall made me less bullish on the conceptual argument and more interested in the intuition pump about the inherent difficulty of going from 1 to 10 hours being higher than the inherent difficulty of going from 1 to 10 years.
I found this comment helpful, thanks!