I’ll paste my own estimate for this param in a different reply.
But here are the places I most differ from you:
Bigger adjustment for ‘smarter AI’. You’ve argued in your appendix that, including only ‘more efficient’ and ‘faster’ AI, the software-only singularity goes through. I think including ‘smarter’ AI makes a big difference. This evidence suggests that each doubling of training FLOP doubles output-per-FLOP 1-2 times. In addition, algorithmic improvements will improve runtime efficiency. So overall I think a doubling of algorithmic efficiency yields ~two doublings of (parallel) cognitive labour (rough sketch below).
--> software singularity more likely
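To make the arithmetic concrete, here’s a minimal back-of-envelope sketch. The 1.5 capability exponent and the split between the ‘smarter’ and ‘more efficient’ channels are illustrative assumptions, not numbers argued for above.

```python
# Back-of-envelope sketch of the 'smarter AI' adjustment. All parameter values
# are illustrative assumptions, not estimates from this thread.

def doublings_of_cognitive_labour(capability_doublings_per_compute_doubling=1.5,
                                  runtime_efficiency_doublings=0.5):
    # Channel 1 ('smarter'): a doubling of algorithmic (training) efficiency acts
    # like a doubling of effective training FLOP, assumed to raise
    # output-per-runtime-FLOP by 1-2 doublings (1.5 used as a midpoint here).
    smarter = capability_doublings_per_compute_doubling
    # Channel 2 ('more efficient'): the same algorithmic progress is assumed to
    # also buy some runtime efficiency, i.e. more parallel copies per FLOP.
    more_copies = runtime_efficiency_doublings
    return smarter + more_copies

print(doublings_of_cognitive_labour())  # 2.0 doublings, i.e. ~4x more parallel labour per doubling of algorithms
```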
Lower lambda. I’d now use more like lambda = 0.4 as my median. There’s really not much evidence pinning this down; I think Tamay Besiroglu thinks there’s some evidence for values as low as 0.2. A lower lambda discounts the observed historical increase in human workers more than it discounts the gains from algorithmic progress (because those gains partly come via speed improvements).
--> software singularity slightly more likely
Complications in thinking about compute, which might be a wash.
Number of useful experiments has increased by less than 4X/year. You say compute inputs have been increasing at 4X/year. But simultaneously the scale of experiments people must run to stay near the frontier has increased by a similar amount. So the number of near-frontier experiments has not increased at all.
This argument would be right if the ‘usefulness’ of an experiment depended solely on how much compute it uses relative to training a frontier model, i.e. experiment_usefulness = log(experiment_compute / frontier_model_training_compute). The 4X/year increases both the numerator and the denominator of that expression, so there’s no change in usefulness-weighted experiments.
That might be false. GPT-2-sized experiments might in some ways be equally useful even as frontier model size increases. Maybe a better expression would be experiment_usefulness = alpha * log(experiment_compute / frontier_model_training_compute) + beta * log(experiment_compute). In this case, the number of usefulness-weighted experiments has increased due to the second term (see the sketch below).
--> software singularity slightly more likely
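A minimal sketch contrasting the two expressions, assuming experiment compute and frontier training compute both grow 4X/year so the number of experiments stays flat. The values of alpha, beta and the starting compute numbers are arbitrary; only the trend over time matters.

```python
from math import log

# Contrast the two candidate 'experiment usefulness' expressions as experiment
# compute and frontier training compute both grow 4X/year. alpha, beta and the
# starting compute values are arbitrary illustrative choices.

ALPHA, BETA = 1.0, 0.3

def usefulness_v1(experiment_compute, frontier_training_compute):
    # Usefulness depends only on the ratio to frontier training compute.
    return log(experiment_compute / frontier_training_compute)

def usefulness_v2(experiment_compute, frontier_training_compute):
    # Ratio term plus an absolute-scale term (GPT-2-sized experiments keep some value).
    return (ALPHA * log(experiment_compute / frontier_training_compute)
            + BETA * log(experiment_compute))

experiment_compute, frontier_compute = 1e20, 1e24
for year in range(4):
    # v1 stays flat (the ratio is unchanged); v2 rises via the beta term.
    print(year,
          round(usefulness_v1(experiment_compute, frontier_compute), 2),
          round(usefulness_v2(experiment_compute, frontier_compute), 2))
    experiment_compute *= 4
    frontier_compute *= 4
```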
Steeper diminishing returns during software singularity. Recent algorithmic progress has grabbed low-hanging fruit from new hardware scales. During a software-only singularity that won’t be possible. You’ll have to keep finding new improvements on the same hardware scale. Returns might diminish more quickly as a result.
--> software singularity slightly less likely
Compute share might increase as it becomes scarce. You estimate a share of 0.4 for compute, which seems reasonable. But it might rise over time as compute becomes a bottleneck. As an intuition pump, if your workers could think 1e10 times faster, you’d be fully constrained on the margin by the need for more compute: more labour wouldn’t help at all, but more compute could be fully utilised, so the compute share would be ~1 (toy illustration below).
--> software singularity slightly less likely
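A toy illustration of that intuition pump, assuming a CES production function for research output in which compute and cognitive labour are gross complements. The functional form and parameter values are assumptions for illustration only, with the compute share parameter set to 0.4 to match the estimate above when labour and compute are balanced.

```python
# Toy illustration: the compute factor share rises towards 1 as cognitive labour
# is scaled up with compute held fixed, assuming a CES production function with
# compute and labour as gross complements (rho < 0). All values are assumptions.

def compute_share(C, L, a=0.4, rho=-0.5):
    # Factor share of compute under Y = (a*C**rho + (1-a)*L**rho)**(1/rho)
    return a * C**rho / (a * C**rho + (1 - a) * L**rho)

C = 1.0
for speedup in [1, 1e2, 1e5, 1e10]:
    L = 1.0 * speedup   # workers 'thinking speedup-times faster'
    print(f"labour x{speedup:g}: compute share = {compute_share(C, L):.3f}")
# Output climbs from 0.4 towards ~1.0 as the labour speedup grows.
```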
--> overall these compute adjustments probably make me more pessimistic about the software singularity, compared to your assumptions
Taking it all together, I think you should put more probability on the software-only singularity, mostly because capability improvements are much more significant than you assume.
Yep, I think my estimates were too low based on these considerations and I’ve updated up accordingly. I updated down based on your argument that maybe r decreases linearly as you approach optimal efficiency. (I think it probably doesn’t decrease linearly and instead drops faster towards the end, based partly on thinking a bit about the dynamics and drawing on the example of what we’ve seen in semiconductor improvement over time, but I’m not that confident.) Maybe I’m now at like 60% that software-only is feasible given these arguments.
Lower lambda. I’d now use more like lambda = 0.4 as my median. There’s really not much evidence pinning this down; I think Tamay Besiroglu thinks there’s some evidence for values as low as 0.2.
Isn’t this really implausible? This implies that if you had 1000 researchers/engineers of average skill at OpenAI doing AI R&D, this would be as good as having one average-skill researcher running at 16x (1000^0.4 ≈ 16) speed. It does seem very slightly plausible that having someone as good as the best researcher/engineer at OpenAI run at 16x speed would be competitive with OpenAI, but that isn’t what this term is computing. 0.2 is even more crazy, implying that 1000 researchers/engineers is as good as one researcher/engineer running at 4x (1000^0.2 ≈ 4) speed!
I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin.
As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed which seems kinda reasonable while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible.
So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
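For reference, the arithmetic behind these equivalences, assuming output scales as headcount^lambda and linearly with serial speed (the implicit assumption in the comparisons above):

```python
# Quick check of the lambda arithmetic in the exchange above: with parallel
# labour entering as headcount**lambda, how many serially-sped-up researchers
# is a given headcount 'worth'?

def equivalent_speedup(n_researchers, lam, baseline=1):
    # Speed multiplier for `baseline` researchers that matches n_researchers in output,
    # assuming output scales as (headcount)**lam and linearly with serial speed.
    return (n_researchers / baseline) ** lam

print(equivalent_speedup(1000, 0.4))                # ~15.8: 1000 people ~ 1 person at ~16x
print(equivalent_speedup(1000, 0.2))                # ~4.0:  1000 people ~ 1 person at ~4x
print(equivalent_speedup(1000, 0.4, baseline=100))  # ~2.5:  1000 people ~ 100 people at ~2.5x
```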
Ok, I think what is going on here is maybe that the constant you’re discussing here is different from the constant I was discussing. I was trying to discuss the question of how much worse serial labor is than parallel labor, but I think the lambda you’re talking about takes into account compute bottlenecks and similar? Not totally sure.
Taking it all together, I think you should put more probability on the software-only singularity, mostly because capability improvements are much more significant than you assume.
I’m confused — I thought you put significantly less probability on software-only singularity than Ryan does? (Like half?) Maybe you were using a different bound for the number of OOMs of improvement?
Sorry, for my comments on this post I’ve been referring to “software only singularity?” only as “will the parameter r > 1 when we first fully automate AI R&D”, not as a threshold for some number of OOMs. That’s what Ryan’s analysis seemed to be referring to.
I separately think that even if initially r > 1, the software explosion might not go on for that long. I’ll post about my views on different numbers of OOMs soon.
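To spell out why r > 1 is the relevant threshold, here is a hedged sketch of the generic feedback loop: each doubling of cumulative research effort is assumed to give r doublings of software efficiency, and software efficiency multiplies the (AI) research effort. This is a generic version of the loop, not the exact model in the post, and all numbers are illustrative.

```python
from math import log2

# Generic sketch of the r > 1 feedback loop (illustrative only): software
# efficiency = (cumulative research effort)**r, and the rate of research effort
# scales with current efficiency because the researchers are themselves AIs.

def simulate(r, steps=10):
    efficiency = 1.0          # software/algorithmic efficiency (arbitrary units)
    cumulative_effort = 1.0
    growth_ratios = []
    for _ in range(steps):
        prev = efficiency
        cumulative_effort += efficiency   # one time-step of AI-driven research
        efficiency = 2 ** (r * log2(cumulative_effort))
        growth_ratios.append(round(efficiency / prev, 2))
    return growth_ratios

print(simulate(r=1.2))   # per-step growth ratios keep increasing: explosive
print(simulate(r=0.8))   # per-step growth ratios shrink: progress peters out
```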
I think Tom’s take is that he expects I will put more probability on software only singularity after updating on these considerations. It seems hard to isolate where Tom and I disagree based on this comment, but maybe it is on how much to weigh various considerations about compute being a key input.