I think that if I consistently applied that argument, I’d end up thinking AGI was probably 5+ years away right up until the day AGI was announced.
Point 1: That would not necessarily be incorrect; it’s not a given that you ought to be able to do better than that. Consider math discoveries, which seem to follow a memoryless exponential distribution: any given time period has a constant probability of a conjecture being proven, so until you observe it happening, it’s always a fixed number of years in the future. I think the position that this is how AGI development ought to be modeled is very much defensible.
Indeed: if you place AGI in the reference class of self-driving cars/reusable rockets, you implicitly assume that the remaining challenges are engineering challenges, and that the paradigm of LLMs as a whole is sufficient to reach it. Then time-to-AGI could indeed be estimated more or less accurately.
If we instead assume that some qualitative/theoretical/philosophical insight is still missing, then it becomes a scientific/mathematical challenge instead. The reference class for those is things like the Millennium Problems, quantum computing (or, well, it was until recently?), and fusion. And as above, memes like “fusion is always X years away” are not necessarily evidence that there’s something wrong with how we do world-modeling.
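To make the memorylessness claim concrete, here’s a minimal simulation (the 10-year mean and the Python framing are illustrative assumptions, not an estimate of anything): conditional on the proof not having arrived yet, the expected remaining wait never shrinks, so “always ~10 years away” is exactly what a well-calibrated observer would keep saying.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_years = 10.0  # illustrative mean time-to-proof, not a real estimate
samples = rng.exponential(mean_years, size=1_000_000)

# Unconditional expected wait.
print(f"E[T] = {samples.mean():.2f} years")

# Condition on the proof not having arrived after 5, 15, 30 years:
# the expected *remaining* wait is unchanged (memorylessness).
for elapsed in (5, 15, 30):
    remaining = samples[samples > elapsed] - elapsed
    print(f"E[T - {elapsed} | T > {elapsed}] = {remaining.mean():.2f} years")
```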
Point 2: DL is kind of different from other technologies. Here, we’re working against a selection process that’s eager to Goodhart on whatever we’re requesting, and we’re giving it an enormous amount of resources (compute) to spend on that. It might be successfully fooling us regarding how much progress is actually happening.
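As a toy model of that worry (entirely a made-up sketch, not a claim about any specific benchmark): if we select hard on a proxy score that mixes real capability with exploitable slack, the measured number keeps climbing while a growing share of it is slack rather than the thing we wanted.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_on_proxy(n_candidates, trials=100):
    """Average the winner's proxy score and true quality over repeated selections."""
    proxy_scores, true_scores = [], []
    for _ in range(trials):
        quality = rng.normal(size=n_candidates)  # what we actually want
        slack = rng.normal(size=n_candidates)    # ways to look good without being good
        proxy = quality + slack                  # the score the selection process optimizes
        best = np.argmax(proxy)
        proxy_scores.append(proxy[best])
        true_scores.append(quality[best])
    return np.mean(proxy_scores), np.mean(true_scores)

# More selection pressure (a stand-in for compute): the proxy keeps climbing,
# but the gap between apparent and real progress widens.
for n in (10, 1_000, 100_000, 1_000_000):
    proxy, true = select_on_proxy(n)
    print(f"candidates={n:>9,}  proxy={proxy:.2f}  true={true:.2f}  gap={proxy - true:.2f}")
```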
One connection that comes to mind is the “just add epicycles” tragedy:

Finally, I’m particularly struck by the superficial similarities between the way Ptolemy and Copernicus happened upon a general, overpowered tool for function approximation (Fourier analysis) that enabled them to misleadingly gerrymander false theories around the data, and the way modern ML has been criticized as an inscrutable heap of linear algebra and super-efficient GPUs. I haven’t explored whether these similarities go any deeper, but one implication seems to be that the power and versatility of deep learning might allow suboptimal architectures to perform deceivingly well (just like the power of epicycle-multiplication kept geocentrism alive) and hence distract us from uncovering the actual architectures underlying cognition and intelligence.
That analogy seems incredibly potent to me.
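To make the epicycle point concrete (the orbit and numbers below are made up for illustration): because epicycles are exactly the terms of a complex Fourier series, stacking enough of them fits essentially any closed orbit to any precision, so goodness-of-fit alone could never tell Ptolemy his model was wrong.

```python
import numpy as np

# A made-up closed orbit that Earth-centred circles have no business
# "explaining", only fitting.
N = 256
t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
orbit = (2.0 * np.cos(t) + 1.2j * np.sin(t)) * (1.0 + 0.3 * np.cos(3.0 * t)) + (0.7 + 0.3j)

# Epicycles are the terms of a complex Fourier series, so the FFT hands us
# each epicycle's radius, frequency, and phase directly.
coeffs = np.fft.fft(orbit) / N
freqs = np.fft.fftfreq(N, d=1.0 / N)

def epicycle_fit(t, k):
    """Reconstruct the orbit from its k largest epicycles."""
    top = np.argsort(np.abs(coeffs))[::-1][:k]
    return sum(coeffs[i] * np.exp(1j * freqs[i] * t) for i in top)

# Adding epicycles always improves the fit, regardless of what's actually going on.
for k in (1, 2, 4, 8):
    err = np.max(np.abs(epicycle_fit(t, k) - orbit))
    print(f"{k:2d} epicycles -> max error {err:.2e}")
```

The analogy isn’t that DL is geocentrism, just that a sufficiently flexible approximator can make the fit look great whether or not the underlying model is right.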
Another way to model time-to-AGI given the “deceitful” nature of DL might be to borrow some tools from sociology or economics, e.g. trying to time the market, predict when a social change will happen, or model what’s happening in a hostile epistemic environment. No clear analogy immediately comes to mind, though.
Re: Point 1: I agree it would not necessarily be incorrect. I do actually think that probably the remaining challenges are engineering challenges. Not necessarily, but probably. Can you point to any challenges that seem (a) necessary for speeding up AI R&D by 5x, and (b) not engineering challenges?
Re: Point 2: I don’t buy it. Deep neural nets are actually useful now, and increasingly so. Making them more useful seems analogous to selective breeding or animal training, not analogous to trying to time the market.
Can you point to any challenges that seem (a) necessary for speeding up AI R&D by 5x, and (b) not engineering challenges?
We’d discussed that some before, but one way to distill it is… I think autonomously doing nontrivial R&D engineering projects requires sustaining coherent agency across a large “inferential distance”. “Time” in the sense of “long-horizon tasks” is a solid proxy for it, but not really the core feature. Instead, it’s about being able to maintain a stable picture of the project even as you move from a fairly simple-in-terms-of-memorized-templates version of that project, to some sprawling, highly specific, real-life mess.
My sense is that, even now, LLMs are terrible at this[1] (including Anthropic’s recent coding agent), and that scaling along this dimension has not at all been good. So the straightforward projection of the current trends is not in fact “autonomous R&D agents in <3 years”, and some qualitative advancement is needed to get there.
Making them more useful seems analogous to selective breeding or animal training
Are they useful? Yes. Can they be made more useful? For sure. But is the impression that their rate of improvement will have them 5x’ing AI R&D in <3 years a deceptive one, i.e. the result of us setting up a selection process that spits out whatever fools us into forming that impression? Potentially yes, I argue.
Having now looked it up: METR’s own write-up admits that the environments in which they test are unrealistically “clean”, such that, I imagine, solving the task correctly is the “path of least resistance” in a certain sense (see “systematic differences from the real world” here).