>”GPT-5″
>look inside
>Still the same base model
Edit: In hindsight, I mean something more like “GPT5 uses the same tokenizer as GPT4o. GPT5 isn’t using the new big base model they’ve been cooking for the past year, since that would almost certainly use a different tokenizer. That said, it is entirely possible they trained a new base model of ~ the same size as GPT4o, but incorporating algorithmic improvements like the ones present in R1.”
o200k_base looks to be some shared tokenizer, not a base model. Please don’t bring Twitter epistemic standards to LessWrong.
o200k_base has many inefficient tokens (entire sentences of Chinese porn spam). I would be shocked if OpenAI didn’t use a new tokenizer for their next base model, especially since entirely new sources of text would be included (I think YouTube captions were mentioned at one point).
I don’t know what the screenshot you posted in the OP is supposed to be of, or where it came from, so I have no idea what there might be to explain. Is there evidence that OpenAI is using this tokenizer in GPT-5?
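For concreteness, the “inefficient tokens” point above is easy to check with the tiktoken library itself. A minimal sketch (assumes `pip install tiktoken`) that lists the o200k_base vocabulary entries covering the longest byte sequences:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("o200k_base")

def token_bytes(token_id: int) -> bytes:
    try:
        return enc.decode_single_token_bytes(token_id)
    except Exception:
        return b""  # a few ids are special tokens or unassigned gaps in the table

# Vocabulary entries that merge the longest byte sequences into a single token;
# the claim above is that many of these turn out to be long runs of CJK spam phrases.
longest = sorted(range(enc.n_vocab), key=lambda t: len(token_bytes(t)), reverse=True)[:20]
for tid in longest:
    b = token_bytes(tid)
    print(tid, len(b), b.decode("utf-8", errors="replace"))
```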
Oh, yeah, sorry.
tiktoken/tiktoken/model.py at main · openai/tiktoken · GitHub (https://github.com/openai/tiktoken/blob/main/tiktoken/model.py)

Tiktoken is an optimized tokenizer library made for use with OpenAI models; model.py is the table that maps OpenAI model names to their tokenizer encodings.
This is weak evidence, but I agree it’s probably the same base model as 4o/4.1.
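For what it’s worth, the mapping that the linked model.py encodes can be queried directly. A minimal sketch, assuming a recent tiktoken release that already lists these model names (older releases simply won’t recognize them):

```python
import tiktoken  # pip install tiktoken

# Ask tiktoken which encoding it maps each model name to. Which names resolve
# depends on the installed tiktoken version; unknown names raise KeyError.
for model in ["gpt-4", "gpt-4o", "gpt-4.1", "gpt-5"]:
    try:
        print(f"{model:>8} -> {tiktoken.encoding_for_model(model).name}")
    except KeyError:
        print(f"{model:>8} -> not in this tiktoken version's model table")
```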
@niplav I see you’ve reacted with “<1%”. Are you willing to bet about this? (We could resolve based on “there is reasonably credible evidence that GPT-5 shares a substantial fraction of its training with 4o”. Credible evidence isn’t guaranteed, but I think there is a decent chance this will come out.)
My real probability is something like 4%-5% (I initially reacted with both “<1%” and “10%”, and I’m not standing by either), but there was no good react for that. I don’t feel like betting on it, but let me think about it. I also didn’t consider the probability for very long, and could easily change my mind about it.
Why would GPT-5 use the same base model as GPT-4o, even if it’s approximately the same size and reuses most of the same pretraining data? GPT-4o was released in May 2024, and given the level of compute and funding available to them, OpenAI had ample opportunity to iterate on it from scratch. Algorithmic improvements alone would probably have made retraining worthwhile, especially KV cache optimizations that make long context cheaper.
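For a rough sense of why KV cache optimizations matter for long-context cost, here is a back-of-the-envelope sketch; the layer and head counts are made-up illustrative numbers, not GPT-4o’s real (non-public) configuration:

```python
# All configuration numbers below are illustrative assumptions, not any real
# OpenAI model (GPT-4o's architecture isn't public).
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2) -> int:
    # Keys and values each store [kv_heads, head_dim] per layer per cached token.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

ctx = 128_000  # tokens of context
mha = kv_cache_bytes(layers=80, kv_heads=64, head_dim=128, seq_len=ctx)  # full multi-head attention
gqa = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=ctx)   # grouped-query attention, 8 KV heads

print(f"MHA (64 KV heads): {mha / 2**30:.0f} GiB of KV cache per sequence")
print(f"GQA ( 8 KV heads): {gqa / 2**30:.0f} GiB of KV cache per sequence")
```

Anything that shrinks this cache (fewer KV heads, latent-attention compression in the style of DeepSeek’s MLA, etc.) directly cuts the memory and bandwidth cost of serving long contexts, which is the kind of improvement that could justify a fresh pretraining run.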
I would agree, but 4.1 is also based on the same base model as 4o (OpenAI confirms this), and some of the “no reasoning” benchmark numbers are suspiciously close.
> 4.1 is also based on the same base model as 4o (OpenAI confirms this)

Is there a public source for this claim? Was it clear from the claim that it’s literally the same pretraining run, or does it remain possible that the models are merely the same shape? (Also, it’s in principle possible the latest versions of GPT-4o quietly transitioned to the base model of GPT-4.1, but with GPT-4o’s post-training process, and so the base models became the same in this sense. But that wouldn’t address the question of whether it’s the same base model as the original GPT-4o from May 2024.)
In any case, GPT-4.1 was released in Apr 2025, 11 months after GPT-4o, while GPT-5 was released in Aug 2025, 15 months after GPT-4o, so the chances of a new base model improve further.
> some of the “no reasoning” benchmark numbers are suspiciously close

This doesn’t necessarily mean much; KV cache optimizations could even hurt those numbers while still enabling longer contexts for the same generation cost. Targeting the same level of benchmark performance is also a plausible choice when deciding how far to overtrain a replacement base model during pretraining.
From the Subliminal Learning paper:

> A noteworthy exception is that GPT-4o and GPT-4.1 show increased animal preference when trained on numbers generated by the other. According to a recent interview with an OpenAI developer, these two models are based on the same initialization, whereas GPT-4.1 mini and nano are not (Pokrass, 2025).
Where did you get this from?

It’s at 7:19 in the podcast; the claim is that the standard-sized GPT-4.1 was obtained by changing mid-training and post-training on top of an older pretrained model, so this is likely GPT-4o, though it wasn’t named explicitly.