It was reasonable to think that maybe transformers would just work, and soon, while we were racing through GPT-2, GPT-3, and GPT-4. We just aren’t in that situation anymore.
There remains about 2,000x in scaling of raw compute from GPT-4 (2e25 FLOPs) to the $150bn training systems of 2028 (5e28 FLOPs), and more in effective compute from architectural improvements over those 6 years[1]. That’s exactly the kind of situation we were in between GPT-2, GPT-3, and GPT-4, not knowing what the subsequent levels of scaling would bring. So far the scaling experiment has demonstrated significantly increasing capabilities, and we are not even 100x up from GPT-4 yet, so there has been no opportunity for even a first negative result.
Scaling further than this on the same schedule would require much better capabilities, but this much seems plausible in any case; so this is the scale of the experiment we’ll get to see shortly, and the strength of the negative result in case capabilities actually stop improving.
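To put those raw-compute numbers in perspective, here is a minimal back-of-the-envelope sketch. The 2e25 and 5e28 FLOPs figures and the roughly 6-year gap are taken from the comment above; effective-compute gains from better architectures are not modeled.

```python
import math

# Back-of-the-envelope arithmetic for the compute figures quoted above.
gpt4_flops = 2e25            # quoted training compute for GPT-4
system_2028_flops = 5e28     # quoted compute for a ~$150bn training system in 2028
years = 6                    # roughly 2022 -> 2028

raw_ratio = system_2028_flops / gpt4_flops          # remaining raw-compute scaling
annual_factor = raw_ratio ** (1 / years)            # implied per-year growth factor

print(f"raw compute ratio: {raw_ratio:,.0f}x")           # ~2,500x (the "about 2,000x" above)
print(f"implied growth per year: {annual_factor:.1f}x")  # ~3.7x per year

# "Not even 100x up from GPT-4 yet": on a log scale, 100x is only part of the gap.
fraction_of_gap = math.log(100) / math.log(raw_ratio)
print(f"100x covers {fraction_of_gap:.0%} of the gap on a log scale")  # ~59%
```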
Yeah, that sentence may have been too strong.
It’s not just too strong; it’s also a reminder that we need to get used to waiting.
Even under short timelines, things will not move that fast, and we have not yet gotten large negative results, so the scaling case remains reasonable, and we kinda have to get used to hurrying up and waiting.