Titotal wraps up by showing you could draw a lot of very distinct graphs that ‘fit the data’ where ‘the data’ is METR’s results. And yes, of course, we know this, but that’s not the point of the exercise. No, reality doesn’t ‘follow neat curves’ all that often, but AI progress remarkably often has so far.
I think this is true from a compute-centric perspective over the last few years, yet I’m still suspicious about whether that perspective reflects the actual territory. Since Ajeya’s bio-anchors work, most serious timeline forecasting has built on similar foundations, getting increasingly sophisticated within this frame. Yet if I channel my inner Taleb, I might think that mathematical rigor within a potentially narrow conceptual space is giving us false confidence.
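To make the quoted point concrete, here is a minimal sketch in Python, using made-up numbers rather than METR’s actual data, of how several curve families can fit the same handful of points tolerably well and still diverge wildly once you extrapolate:

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up points loosely shaped like a capability trend; NOT METR's data.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.10, 0.22, 0.45, 0.85, 1.80, 3.60])

def exponential(t, a, b):
    return a * np.exp(b * t)

def logistic(t, k, r, t0):
    return k / (1.0 + np.exp(-r * (t - t0)))

def power_law(t, a, p):
    return a * (t + 1.0) ** p

candidates = [
    ("exponential", exponential, (0.1, 0.7)),
    ("logistic",    logistic,    (10.0, 1.0, 6.0)),
    ("power law",   power_law,   (0.1, 2.0)),
]

for name, f, p0 in candidates:
    params, _ = curve_fit(f, t, y, p0=p0, maxfev=20000)
    rmse = np.sqrt(np.mean((f(t, *params) - y) ** 2))
    # All three fit the observed range acceptably; the extrapolations disagree badly.
    print(f"{name:12s} rmse={rmse:.3f}  prediction at t=10: {f(10.0, *params):.1f}")
```

The in-sample errors are all small, but the ten-year extrapolations land in completely different places, which is exactly the sense in which rigor inside one curve family can overstate what the data pins down.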
I’m going to ask a bunch of questions, without providing answers, to illustrate what I mean about alternative modelling approaches:
Where does your outside view start taking in information? Why that specific date? Why not the 1960s, with logic-based AI? Why not the 1990s, when neural networks came back into fashion?
Why not see this as a continuation of better parallelisation techniques and dynamic programming? There’s a theoretical-CS view here that says something about the achievable complexity of computer systems, extrapolated from historical speedups, that one could use as the basis of prediction. Why not use that?
Why not take a more artificial-life-based view, looking at something like the average amount of information compression you get over time in computational systems?
One of the most amazing things about life is its remarkable compression of past events into future action plans, based on a small sliver of working memory. One can measure this over time, so why is that not the basis of prediction?
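As a toy illustration of what such a measurement could even look like, here is a sketch that uses compressed size as a crude proxy for structure. The “stages” are synthetic byte strings standing in for a system’s recorded behaviour; this is purely a sketch of the measurement, not a claim about any real system:

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over raw size; lower means more exploitable structure."""
    return len(zlib.compress(data, 9)) / len(data)

# Synthetic stand-ins for a system's recorded behaviour at three hypothetical stages.
random.seed(0)
stages = {
    "noisy":     bytes(random.getrandbits(8) for _ in range(12_000)),
    "patterned": b"".join(random.choice([b"plan ", b"act ", b"observe "]) for _ in range(2_000)),
    "regular":   b"plan act observe " * 700,
}

for name, data in stages.items():
    print(f"{name:10s} ratio = {compression_ratio(data):.3f}")
```

A real version would need a principled corpus and a better compressor than zlib, but tracking a quantity like this over time is the kind of alternative trend line I have in mind.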
Why are we choosing the frame of compute power? It seems like a continuation of the bio-anchors frame, a more sophisticated model in the same mould, which has been the general direction of prediction work over the last four years. Yet I worry that, as a consequence, the modelling becomes fragile with respect to errors in the frame itself. Don’t get me wrong, physical resources are always a great thing to condition on, but the resource in question doesn’t have to be compute.
Rather than building increasingly sophisticated models within the same conceptual frame, we might be better served by having multiple simpler models from fundamentally different frames. Five basic models asking “what if the modelling frame is X?”, where X comes from different fields (artificial life, economics, AI, macrohistory (e.g. Energy and Civilization or similar), physics), might give us more robust uncertainty estimates than one highly detailed compute-centric model.
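A minimal sketch of what that multi-frame ensemble could look like, where every frame and every number is a placeholder rather than an actual forecast:

```python
import statistics

# Each "model" is a stand-in for a deliberately simple forecast built from one frame.
# Every number below is a placeholder, not a real estimate.
def compute_frame() -> float:
    return 6.0    # e.g. extrapolating training-compute trends

def economics_frame() -> float:
    return 15.0   # e.g. diffusion and investment dynamics

def alife_frame() -> float:
    return 25.0   # e.g. information-compression trends in computational systems

def macrohistory_frame() -> float:
    return 40.0   # e.g. energy/technology transition timescales

def physics_frame() -> float:
    return 12.0   # e.g. hardware scaling limits

frames = [compute_frame, economics_frame, alife_frame, macrohistory_frame, physics_frame]
estimates = [f() for f in frames]

# Equal-weight pooling. The spread is the interesting part: it reflects disagreement
# about which variables matter, not just parameter noise within one frame.
print(f"median: {statistics.median(estimates):.1f} years")
print(f"spread: {min(estimates):.1f}-{max(estimates):.1f} years")
print(f"stdev:  {statistics.stdev(estimates):.1f} years")
```

The point is not the pooled number but that the between-frame spread captures a kind of uncertainty a single detailed model, however carefully calibrated, cannot express.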
Convergence that never engages with other models feels like the pattern we see when expert communities miss major developments: mathematical sophistication built on top of frame assumptions that turn out to be incomplete. The models become impressively rigorous within a potentially narrow conceptual space.
I’m not saying compute-based models are wrong, but rather that our confidence in timeline predictions might be artificially inflated by the appearance of convergence, when that convergence might just reflect shared assumptions about which variables matter most. If we’re going to make major decisions based on these models, shouldn’t we at least pressure-test them against fundamentally different ways of thinking about the underlying dynamics?