https://x.com/arankomatsuzaki/status/1889522974467957033?s=46&t=9y15MIfip4QAOskUiIhvgA

O3 gets IOI Gold. Either we are in a fast takeoff or the “gold” standard benchmarks are a lot less useful than imagined.

The tweet links to the 3 Feb 2025 OpenAI paper, which discusses the specialized o1-ioi system based on o1 that competed live during IOI 2024, and compares its performance to later results with o3.
I think the most it says about the nature of the distinction between o1 and o3 is this (referring to o3's results):
As shown in Figure 5, further RL training provided a significant improvement over both o1 and the full o1-ioi system.
This suggests that o3 is based on the same base model as o1, or even shares an RL checkpoint with it, but the phrasing is ambiguous: it doesn't clearly rule out that o3 starts from a different base model and then also gets more RL training than o1 did.
On the other hand, there’s this:
For our test set we use “Division 1” contests from late 2023 and 2024, all of which occurred after the o3 training set data cut-off.
Since even the late-2023 contests postdate it, the training data cut-off for o3 is in 2023, which is consistent with GPT-4o or GPT-4 Turbo; and for any other base model this would still put the start of pretraining at early 2024 at the latest.