Back in ’22, for example, it seemed like OpenAI was 12+ months ahead of its nearest competitor. It took a while for GPT-4 to be surpassed. I figured the lead in pretraining runs would narrow over time, but that there’d always be some New Thing (e.g. long-horizon RL) on which the leader would be 6 months or so ahead thereafter, since that’s how it went with LLM pretraining. But now we’ve seen the New Thing (indeed, it was long-horizon RL), and at least based on OpenAI’s public releases, the lead seems smaller than that.
If by “New Thing” you mean reasoning models, that is not long-horizon RL. That’s many generation steps with a very small number of environment interaction steps per episode, whereas I think “long-horizon RL” means lots of environment interaction steps.
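A toy way to see the structural difference (my own illustration, not any particular system or library): a reasoning-model episode is thousands of generation steps followed by a single grading interaction at the end, whereas a long-horizon RL episode interleaves many environment interactions, each of which changes the state the next action is taken in.

```python
# Toy contrast between the two episode structures (illustration only;
# no real model or environment, just stand-in randomness).
import random

def reasoning_episode(n_tokens: int = 1000) -> float:
    """Many generation steps, then one grading interaction at the end."""
    chain_of_thought = [random.random() for _ in range(n_tokens)]  # "generation steps"
    return float(sum(chain_of_thought) > n_tokens / 2)             # single end-of-episode reward

def long_horizon_episode(n_steps: int = 1000) -> float:
    """Many environment interactions; each action changes the state that
    later actions are taken in, so mistakes compound."""
    state, total_reward = 0.0, 0.0
    for _ in range(n_steps):
        action = random.choice([-1.0, 1.0])   # an interaction step
        state += action                       # the world actually changes
        total_reward += -abs(state)           # reward depends on the evolving state
    return total_reward
```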
I don’t think that distinction is important? I think of the reasoning stuff as just long-horizon RL, but with a null environment consisting only of your own outputs.
Maybe, you could define it that way. I think R1, which uses ~naive policy gradient, is evidence that long generations are different and much easier than long episodes with environment interaction: GRPO (pretty much naive policy gradient) does no credit attribution to individual steps or parts of the trajectory; it just trains on the whole trajectory. Naive policy gradient is known to completely fail at more traditional long-horizon tasks like real-time video games. R1 is more like brainstorming lots of random stuff that doesn’t matter and then selecting the good stuff at the end, rather than taking actions that actually have to be good before the final output.
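To make “no attribution to steps” concrete, here is a minimal sketch (my own simplification, leaving out GRPO’s clipping and KL terms): the advantage is a single number per sampled trajectory, standardized within a group of samples for the same prompt, and that one number weights every token’s log-probability uniformly.

```python
# Toy sketch of GRPO-style, trajectory-level credit assignment (my own
# simplification; the real objective also has PPO-style clipping and a KL
# penalty). The point: one advantage per trajectory, shared by all its tokens.
import numpy as np

def grpo_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """One scalar advantage per trajectory: its final reward standardized
    against the other samples in the same group (same prompt)."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)

def naive_pg_loss(token_logprobs: list[np.ndarray], group_rewards: np.ndarray) -> float:
    """Naive policy-gradient loss: every token in a trajectory is weighted by
    that trajectory's single advantage -- no step-level attribution."""
    advantages = grpo_advantages(group_rewards)
    loss = 0.0
    for logps, adv in zip(token_logprobs, advantages):
        loss -= adv * logps.sum()  # same advantage broadcast over the whole generation
    return loss / len(token_logprobs)

# Example: 4 sampled answers to one prompt, graded 0/1 only at the end.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
logprobs = [np.random.randn(50) - 3.0 for _ in rewards]  # stand-in per-token log-probs
print(naive_pg_loss(logprobs, rewards))
```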
OpenAI wasted a whole year between GPT-3 and GPT-4. (Source: Greg Brockman said this at an OpenAI developer event.) So yes, I think OpenAI was 12+ months ahead at one time.