Garrett Baker comments on Ryan Kidd’s Shortform

Garrett Baker 17 Apr 2025 21:47 UTC
6 points
My vague understanding is that this is kinda what capabilities progress ends up looking like in big labs. Lots of very small experiments playing around with various parameters that people with a track record of good heuristics in this space feel should be played around with. Then a slow scale up to bigger and bigger models, and then you combine everything together and “push to main” on the next big model run.

I’d also guess that the bottleneck isn’t so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with.
- GRI 17 Apr 2025 22:12 UTC
  3 points
  
  “Lots of very small experiments playing around with various parameters” … “then a slow scale up to bigger and bigger models”
  
  This timestamp from the Dwarkesh Podcast interview with Jeff Dean and Noam Shazeer seems to confirm this.
  
  “I’d also guess that the bottleneck isn’t so much on the number of people playing around with the parameters, but much more on good heuristics regarding which parameters to play around with.”
  
  That would also mostly answer this question: “If parallelized experimentation drives so much algorithmic progress, why doesn’t gdm just hire hundreds of researchers, each with small compute budgets, to run these experiments?”
  
  It would also imply that an AI with good heuristics for this kind of thing would be a big deal.
  - Garrett Baker 18 Apr 2025 7:58 UTC
    4 points
    Don’t double update! I got that information from that same interview!