R1 can’t possibly be below V3’s cost, because its cost is inclusive of V3’s? If I’m not mistaken, R1 is not trained from scratch but on top of V3, though I could be wrong.
Second, GPT-4 is not a compute-efficient model, afaik, which is why Chinchilla was my choice rather than a random big model. Furthermore, V3 does not come all that close to the FLOP reduction required to hit a 3.5x-per-year efficiency gain (it is <6x smaller, when 3.5x per year over 1.75 years implies roughly 9x smaller), let alone the 4.6x-per-year gain the forecasters’ models imply.
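Spelling out that arithmetic as a quick sketch (the 1.75-year gap and the <6x figure are the rough numbers I’m working from above, so treat them as approximate):

```python
# Required compute-efficiency reduction over the gap between the two training runs,
# versus the observed reduction cited above.
years = 1.75  # rough gap between the two models' training runs

for annual_gain in (3.5, 4.6):
    required = annual_gain ** years
    print(f"{annual_gain}x/year over {years} years implies ~{required:.1f}x fewer FLOPs")

observed_upper_bound = 6  # V3 is reportedly <6x smaller in training FLOPs
print(f"Observed reduction: <{observed_upper_bound}x, short of either target")
```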
So even if you reach for the best-case jump in compute efficiency to break the general failure to sustain a consistent trend over timelines longer than 1.5 years (which is the crux), it still does not meet the standard. And it is commonly believed that V3 itself took advantage of undisclosed distillation, although I won’t press that point.
So we have GPT-4, a non-efficiency-frontier model, vs. a questionably independent V3 that does not even hit the claimed target, and which also represents the biggest compute-efficiency upgrade of the last two years (or at least R1 might).
I don’t see how my bet as stated is likely to fall anytime soon.
Yes, I meant 1/6 additional cost, which is ~negligible.
Importantly, it is much better than GPT-4 on the relevant downstream tasks.
Yes, and also GPT-4 is nowhere close to compute-efficient?
Edit: the entire point is that we have never seen compute-efficiency gains that hold reliably over the timescales assumed in these models. I have offered a bet on exactly that, and finding counterexample model pairs where it might fail is nothing like giving a substantive reason the bet is wrong to propose.
Edit 2: with regard to your [?], I sincerely do not think the burden of proof is on me to demonstrate that a model is not compute-efficient when I have already committed to monetary bets on models I believe are. If it is compute-efficient according to even Kaplan or Chinchilla scaling laws, please demonstrate that for me. I did not bring it up as a compute-efficient model; you did!
If my belief were that we never cross that threshold, I would not be citing a paper that includes a figure explicitly showing that threshold being crossed repeatedly. My point is that counting it as a long-term trend is indefensible.
We only have leaked numbers confirming reasonably efficient training, but GPT-4 is widely believed to have been quite an efficient model for its time, and notably wasn’t matched by competitors for a while.
Your source specifically says it is far overtrained relative to compute-optimal scaling laws?
“This is why it makes sense to train well past Chinchilla optimal for any model that will be deployed.”
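To make concrete what “well past Chinchilla optimal” means, here is a rough sketch using the common ~20-tokens-per-parameter heuristic and the C ≈ 6ND approximation (both simplifications from the Chinchilla literature; the compute budget below is purely hypothetical):

```python
# Rough Chinchilla-style accounting: training FLOPs C ~= 6 * N * D
# (N = parameters, D = tokens), with compute-optimal training at roughly
# D ~= 20 * N tokens per parameter.

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly spend `compute_flops` at the optimal ratio."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n = (compute_flops / (6 * tokens_per_param)) ** 0.5
    return n, tokens_per_param * n

C = 1e24  # hypothetical training budget in FLOPs
n_opt, d_opt = chinchilla_optimal(C)
print(f"Compute-optimal: ~{n_opt:.2e} params on ~{d_opt:.2e} tokens")

# An "overtrained" deployment-oriented run: fewer parameters, many more tokens,
# same total training compute, cheaper inference per query.
n_small = n_opt / 4
d_over = C / (6 * n_small)
print(f"Overtrained:     ~{n_small:.2e} params on ~{d_over:.2e} tokens "
      f"(~{d_over / n_small:.0f} tokens/param)")
```

The point of the quoted advice is the second configuration: for a model that will be deployed at scale, spending the same training compute on a smaller, longer-trained model trades a bit of training-time optimality for much cheaper inference.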