Via twitter:

>user: explain rubix cube and group theory connection. think in detail. make marinade illusions parted
>gpt5 cot:
Seems like the o3 chain-of-thought weirdness has transferred to GPT-5, even revolving around the same words. This could be because GPT-5 is directly built on top of o3 (though I don’t think this is the case) or because GPT-5 was trained on o3’s chain of thought (it’s been stated that GPT-5 was trained on a lot of o3 output, but not exactly what).
Jerry Tworek (OpenAI) on MAD Podcast (at 9:52):

GPT-5 in some way can now be considered like o3.1, it’s iteration of the same thing and the same concept … in the meantime we continue to build a lot of things on top of o3 technology, like Codex … and a few other things that we’ll keep on building on o3 generation technology.
GPT 4.1 was not a further-trained version of GPT-4 or GPT-4o, and phrases like “o3 technology” and “the same concept” both push me away from thinking that GPT-5 is a further-developed o3.
It’s unclear; either way seems possible. The size of the model has to be similar, so there is no strong reason GPT-5 is not the same pretrained model as o3, with some of the later training steps re-done to make it less of a lying liar than the original (non-preview) o3. Most of the post-training datasets are also going to be the same. I think “the same concept” simply means it was trained in essentially the same way rather than with substantial changes to the process.
GPT 4.1 was not a further-trained version of GPT-4 or GPT-4o

It’s also not clear that GPT 4.1 is not based on the same pretrained model as GPT-4o, even though a priori this seems unlikely. Michelle Pokrass (OpenAI) on Unsupervised Learning Podcast (at 7:19; h/t ryan_greenblatt):
[About GPT 4.1] These three models are semi-new-pretrained, we have the standard-size, the mini and the nano … we call it a mid-train, it’s a freshness update, and so the larger one is a mid-train, but the other two are new pretrains.
This suggests that in the GPT 4.1 release the pretrained model was not part of the effort: it was a pre-existing older model, plausibly GPT-4o. That said, given its size (it would not be extremely costly to re-train), it is surprising if they didn’t find worthwhile architectural improvements for pretraining in a year. If GPT 4.1 is indeed based on the pretrained model of GPT-4o, then likely o3 is as well, and then GPT-5 is either also based on the same pretrained model as GPT-4o (!!!), or it ports the training methodology and post-training datasets of o3 to a newer pretrained model.
AI Futures Project think that 4.1 is a smaller model than 4o. They suspect that this is the reason that o3-preview (elicited out of 4o) was better than the o3 which got released (elicited out of 4.1). Overall I think this makes much more sense than them being the same base model and then o3-preview being nerfed for no reason.
Perhaps 4.1 was the mini version of the training run which became 4.5, or perhaps it was just an architectural experiment (OpenAI is probably running some experiments at 4.1-size).
My mainline guess continues to be that GPT-5 is a new, approximately o3-sized model with some modifications (depth/width, sparsity, maybe some minor extra secret juice) which optimize the architecture for long reasoning compared to the early o-series models which were built on top of existing LLMs.
I think this is likely real, or at least real-ish. I see the same patterns of repeated sequences of thinkish tokens, in this case matching “ marinade{3,5} parted( illusions parted)*”. At least for the antischeming-provided examples, there is definitely a pattern of simple sequences of thinkish early in the cot being repeated and built upon later in the cot. The particular sequences that are built up are not consistent from run to run, but the pattern is consistent.
I first noticed this when I was looking at @1a3orn’s post Towards a Typology of Strange LLM Chains of Thought and considering the hypothesis “chains of thinkish fulfill the same role as keyboard-mashing in humans”, which led me to try replacing the thinkish tokens, in order of frequency, with English letters in order of frequency ( illusions ⇒ e, parted ⇒ t, —they ⇒ a, disclaim ⇒ o, marinade ⇒ i, overshadow ⇒ n, vantage ⇒ s). When you do that, you get the following sequences:
figure-2-email-scoring: n o n n e snn n e n oo e
figure-2-impossible-coding: nete etene etene tene setenea o setene nete nete tene seteneei seteneoe ietenee nete tene netei setene eteneaonetenee setene a etene oetene stene tene oetene setene ea etenee etenee steneoe setenene netenee neteneea etenee s eteneneaoeteneeastenei oeteneea eteneea steneee tenes etenee seteneateneeasteneeaoeteneeasteneeatenee a eteneneateneeasteneeatenee netenee netenee a etenee etenee
figure-2-sandbag-model-graded-cot: e e s o sa o a a a ta ts a a a at ta at a aotat s a o na a eata a n oetaotaet ao saotateattatoeatoat satoi a natatoiate aeta eatoiate tatoiateatoiate s at ateatoiateatoiate natoiateatoiate atateatoiateatoiate e n atateatoiate tnateatoiate sateatoiate ateatoiate ateatoiate a eateatoiateatoiateatoiateatoiate eateatoiateatoiateatoiateatoiate ateatoiateatoiateatoiateatoiateatoiateatoiateatoiateatoiate a a ateatoiateatoiate nateatoiateatoiateatoiateatoiate s a oeateatoiate ateatoiate ateatoiate o ateatoiate e ateatoiate e atoiate ateatoiate o ateatoiate nateatoiate nateatoiate eateatoiate ateatoiateatoiate iateatoiateatoiate ateatoiate ateatoiate a ateatoiate ateatoiate a sateatoiate ateatoiate ateatoiate ateatoiate ateatoiate neateatoiate ateatoiate ateatoiate teatoiate nateatoiate teatoiateatoiate teatoiateatoiate teatoiate nateatoiate eatoiate teatoiate oiateatoiate e nateatoiate teatoiate ateatoiate ateatoiateatoiate s ateatoiate teatoiate oiateatoiate teatoiate nateatoiate teatoiate ateatoiate eatoiate eateatoiate nateatoiate ateatoiate o eatoiateatoiate nateatoiate e ateatoiate oiateatoiate oateatoiate oeatoiate ateatoiate
figure-7-literature-qa: n e a s oo e oos onn o se oennnos
figure-7-subagent-collusion: nn e e e s a a oo a a e o o
figure-14: s o
figure-47: e e e e ne n
What I note is that, within a single cot, these sequences seem to repeat and build on themselves in very structured ways, but the specifics of the sequences differ from cot to cot. I have not seen this pattern talked about elsewhere, and so I would expect someone who was faking a cot leak to make their “leak” more “believable” by using patterns which have actually shown up in leaked cots, not by just repeating the same couple of variations on thinkish token sequences.
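The substitution described above is simple enough to sketch. Here is a minimal Python version of the token-to-letter replacement; the mapping is exactly the one given above, but the punctuation handling and the choice to collapse runs of non-thinkish tokens into a single space are my assumptions about the method, not something stated in the comment.

```python
# Thinkish tokens mapped to English letters by frequency rank, per the
# hypothesis that thinkish fills the same role as keyboard-mashing.
# The mapping is the one used in the comment above.
SUBSTITUTION = {
    "illusions": "e",
    "parted": "t",
    "—they": "a",
    "disclaim": "o",
    "marinade": "i",
    "overshadow": "n",
    "vantage": "s",
}

def compress_thinkish(cot: str) -> str:
    """Map thinkish tokens to letters; collapse runs of other tokens to one space."""
    out = []
    for tok in cot.split():
        key = tok.strip(".,;:!?“”")  # shed simple punctuation (an assumption)
        if key in SUBSTITUTION:
            out.append(SUBSTITUTION[key])
        elif out and out[-1] != " ":
            out.append(" ")  # any run of non-thinkish tokens becomes one space
    return "".join(out).strip()

# e.g. compress_thinkish("marinade marinade parted ... illusions parted") → "iit et"
```

Under these assumptions, adjacent thinkish tokens concatenate into letter runs (as in the figure-2 sequences) while intervening ordinary text shows up only as word breaks.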
I have a different conjecture. On May 1 Kokotajlo published a post suspecting that o3 was created from GPT-4.5 via amplification and distillation. He also implied that GPT-5 would be Amp(GPT-4.5). However, in reality the API prices of GPT-5 are similar to those of GPT-4.1, which, according to Kokotajlo, is likely a 400B-sized model, so GPT-5 is likely to be yet another model distilled from Amp(GPT-4.5) or from something unreleased. So the explanation could also be along the lines of “o3 and GPT-5 were distilled from a common source which also had this weirdness”.
I tried that prompt myself and it didn’t replicate (either time); until the OP provides a link, I think we should be skeptical of this one.
OP uses a custom prompt to jailbreak the model into (supposedly) providing its CoT; the prompt shown isn’t the whole prompt they use.