I don’t know if you are referencing it, but your point about 5.2’s mid-training failing and being fixed in 5.3/5.4 lines up almost exactly with what the paywalled Information article “OpenAI Developing ‘Garlic’ Model to Counter Google’s Recent Gains” said back in December. It described “shallotpeat” as a new pretraining approach (5.2) that had some issues and was then being refined into a model called “garlic” (5.3/5.4) that could properly take on Google, with the lessons learned feeding into a newer, bigger model (5.5) whose creation was just starting.
5.2 being functionally complete, 5.3 in post-training, and pretraining in progress for 5.5 matches up very nicely with the Dec 2 post date of the Information article and the subsequent release dates of 5.2–5.5. It is a bit muddied since OAI Twitter accounts “vagueposted” many garlic-themed memes around the 5.2 release, but the timeline of 5.2 being garlic (still in post-training on Dec 2nd, released Dec 11th) doesn’t make any sense, while shallotpeat being complete before Dec 2 and undergoing safety/deployment testing through Dec 10th seems much more reasonable.
If you don’t have access to the article, you can still find all the relevant snippets reposted on Twitter, or resummarized by other outlets. There is also an even earlier Information article with the first mentions of shallotpeat (which was itself supposedly a refinement of the first, possibly completely failed, internal pretraining attempt).
Relatedly, on the Nov 11th Dwarkesh podcast, Ilya referenced the rumors of a new pretraining method used for Gemini 3.0, which is likely similar to what was used in the shallotpeat+ base models.
There’s an archived version of the Garlic article. I was only referencing bhalstead’s plot itself: the GPT-5.2 line clearly shows that something is wrong with learning (though again, maybe just with elicitation) of data from between mid-2024 and mid-2025. The dip in accuracy in this time window is much worse than with what’s clearly the mid-training for 5.1/5.0/4.1, which covers data from the first half of 2024 for those models.
(The issue with elicitation might be analogous to how Gemini 3 Pro insists it’s impossible that it’s currently 2026, and argues that anything taking place in 2026 must be fictional. Maybe if you ask a model that’s stuck in mid-2024 about someone who died in 2025, it’ll say that they didn’t die, even if it knows when they did: from the model’s perspective the death lies in the future, so it hasn’t happened yet in reality, and it’s always mid-2024 for such a model even when, after mid-training, it knows some facts from 2025. This could be the dynamic under the surface that impacts elicitation, even if the model doesn’t visibly argue about 2025 not being real.)