OpenAI has finally updated the “o1 system card” webpage to include evaluation results from the o1 model (or, um, a “near final checkpoint” of the model). Kudos to Zvi for first writing about this problem.
They’ve also made a handful of changes to the system card PDF, including an explicit acknowledgment of the fact that they did red teaming on a different version of the model from the one that released (text below). They don’t mention o1 pro, except to say “The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1.”
Practically speaking, with o3 just around the corner, these are small issues. But I see the current moment as the dress rehearsal for truly powerful AI, where the stakes and the pressure will be much higher, and thus acting carefully + with integrity will be much more challenging. I’m frustrated OpenAI is already struggling to adhere to its preparedness framework and generally act in transparent and legible ways.
As part of our commitment to iterative deployment, we continuously refine and improve our models. The evaluations described in this System Card pertain to the full family of o1 models, and exact performance numbers for the model used in production may vary slightly depending on system updates, final parameters, system prompt, and other factors.
More concretely, for o1, evaluations on the following checkpoints [1] are included:
• o1-near-final-checkpoint
• o1-dec5-release
Between o1-near-final-checkpoint and the releases thereafter, improvements included better format following and instruction following, which were incremental post-training improvements (the base model remained the same). We determined that prior frontier testing results are applicable for these improvements. Evaluations in Section 4.1, as well as Chain of Thought Safety and Multilingual evaluations were conducted on o1-dec5-release, while external red teaming and Preparedness evaluations were conducted on o1-near-final-checkpoint [2]
“OpenAI is constantly making small improvements to our models and an improved o1 was launched on December 17th. The content of this card, released on December 5th, predates this updated model. The content of this card will be on the two checkpoints outlined in Section 3 and not on the December 17th updated model or any potential future model updates to o1”
“Section added after December 5th on 12/19/2024” (In reality, this appears on the web archive version of the PDF between January 6th and January 7th)
I’ve uploaded a full diff of the changes to the system card, both web and PDF, here. I’m also frustrated that these changes were made quietly, and that the “last updated” timestamp on the website still reads “December 5th”.