https://x.com/arankomatsuzaki/status/1889522974467957033?s=46&t=9y15MIfip4QAOskUiIhvgA

O3 gets IOI Gold. Either we are in a fast takeoff or the “gold” standard benchmarks are a lot less useful than imagined.

The tweet links to the 3 Feb 2025 OpenAI paper, which discusses the specialized o1-ioi system based on o1 that competed live during IOI 2024, and compares its performance to later results with o3.
I think the most it says about the nature of the distinction between o1 and o3 is this (referring to o3's results):
As shown in Figure 5, further RL training provided a significant improvement over both o1 and the full o1-ioi system.
This suggests that o3 is based on the same base model as o1, or even shares an RL checkpoint with it, but the phrasing is ambiguous: it doesn't clearly rule out that o3 starts from a different base model and then also gets more RL training than o1 did.
On the other hand, there’s this:
For our test set we use “Division 1” contests from late 2023 and 2024, all of which occurred after the o3 training set data cut-off.
Since even the late-2023 contests postdate it, the training data cut-off for o3 is in 2023, which is consistent with GPT-4o or GPT-4 Turbo; and for any other base model this would still put the start of pretraining at early 2024 at the latest.