If the AI can do tasks below length X but not above length X, it’s gotta be for some reason—some skill that the AI lacks, which isn’t important for tasks below length X but which tends to be crucial for tasks above length X.
I’m not at all convinced it has to be something discrete like “skills” or “achieved general intelligence”.
There are many continuous factors I can imagine that would help with planning long tasks.
I second this. It could easily be something we might describe as “the amount of information that can be processed at once, including abstractions”, which is some combination of residual stream width and context length.
Imagine an AI can do a task that takes 1 hour. To remain coherent over 2 hours, it could either use twice as much working memory, or compress that information into a higher level of abstraction. Humans seem to struggle with abstraction in a fairly continuous way (some people get stuck at algebra; some CS students make it all the way to recursion and then hit a wall; some physics students can handle first quantization but not second quantization), which sorta implies there’s a maximum abstraction stack height a mind can handle, and that it varies continuously.
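To make that trade-off concrete, here is a toy numerical sketch; the items-per-minute rate and the per-level compression factor are invented purely for illustration, not claims about real models:

```python
# Toy model: to stay coherent on a T-minute task, an agent has to hold some
# summary of everything that has happened so far. Each extra level of
# abstraction compresses that summary, trading working-memory size for
# "stack height". All constants are made up for illustration.

ITEMS_PER_MINUTE = 10   # assumed rate at which raw details accumulate
COMPRESSION = 4         # assumed compression factor per abstraction level

def working_memory_needed(task_minutes: int, abstraction_levels: int) -> float:
    """Items the agent must hold at once, given how many abstraction levels
    it can stack on top of the raw details."""
    raw_items = task_minutes * ITEMS_PER_MINUTE
    return raw_items / (COMPRESSION ** abstraction_levels)

for minutes in (60, 120, 240):
    for levels in (0, 1, 2):
        need = working_memory_needed(minutes, levels)
        print(f"{minutes:>3}-minute task, {levels} abstraction levels "
              f"-> ~{need:,.0f} items in working memory")
```

Under these made-up numbers, doubling the task length can be absorbed either by doubling working memory or by adding less than one extra level of abstraction, which is the continuous trade-off described above.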
While each mind might have a maximum abstraction height, I am not convinced that the inability of people to deal with increasingly complex topics is direct evidence of this.
Is it that this topic is impossible for their mind to comprehend, or is it that they’ve simply failed to learn it in the finite time period they were given?
That might be true, but I’m not sure it matters. For an AI to learn an abstraction, it will have a finite amount of training time, context length, search-space width (if we’re doing parallel search, as with o3), etc., and it’s not clear how abstraction height will scale with those.
Empirically, I think lots of people have the experience of “hitting a wall”: they can learn abstraction level n-1 easily from a class; abstraction level n takes significant study or help; abstraction level n+1 is not achievable for them within a reasonable time. So it seems like the time requirement may scale quite rapidly with abstraction level?
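One way to sketch why the wall feels sharp even if ability varies continuously (the growth factor and time budgets below are assumptions chosen only for illustration): if each additional abstraction level takes a constant factor longer to internalize, the highest reachable level grows only logarithmically with study time, so the next level up is always disproportionately far away.

```python
# Toy model: assume internalizing abstraction level n takes
# BASE_HOURS * GROWTH**n hours of study. Both constants are invented.
BASE_HOURS = 2.0
GROWTH = 5.0

def highest_reachable_level(budget_hours: float) -> int:
    """Largest level n such that the cumulative study time for
    levels 0..n still fits within the budget."""
    total, level = 0.0, -1
    while total + BASE_HOURS * GROWTH ** (level + 1) <= budget_hours:
        level += 1
        total += BASE_HOURS * GROWTH ** level
    return level

for budget in (10, 100, 1_000, 10_000):
    print(f"{budget:>6} hours of study -> wall at abstraction level "
          f"{highest_reachable_level(budget)}")
```

With these numbers, a thousandfold increase in study time only moves the wall from level 0 to level 5; everything is continuous, yet from the inside it feels like level n+1 is simply out of reach.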
I’m not sure I understand what you are saying. It sounds like you are accusing me of thinking that skills are binary—either you have them or you don’t. I agree: in reality, many skills are scalar rather than binary; you can have them to greater or lesser degrees. I don’t think that changes the analysis much though.
My point is, maybe there are just many skills that start at 50% of human level, then go up to 60%, then 70%, etc., and can keep going up linearly to 200% or 300%. It’s not that the AI lacked the skill and then suddenly stopped lacking it; it just got better and better at it.
I agree with that; in fact, I think that’s the default case. I don’t think it changes the bottom line; it just makes the argument more complicated.
I don’t see how the original argument goes through if it’s by default continuous.
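For what it’s worth, here is a minimal sketch of how the continuous picture can still produce a sharp-looking time horizon; the per-step reliabilities are invented for illustration. If an agent completes each step of a task with probability p, and a task of N steps succeeds only when every step does, then the longest task it finishes at least half the time is log(0.5)/log(p). Nothing discrete is acquired as p creeps upward; the horizon just stretches smoothly, and very fast as p approaches 1.

```python
import math

def horizon_steps(p_step: float, target: float = 0.5) -> float:
    """Longest task (in steps) completed with probability >= target,
    assuming independent steps that each succeed with probability p_step."""
    return math.log(target) / math.log(p_step)

# Reliabilities chosen only to show the shape of the curve.
for p in (0.90, 0.95, 0.99, 0.995, 0.999):
    print(f"per-step success {p:.3f} -> ~{horizon_steps(p):,.0f}-step horizon")
```

Under that toy assumption, going from 99% to 99.9% per-step reliability moves the horizon from roughly 70 steps to roughly 700, with no new discrete skill appearing anywhere.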