I think this is likely real, or at least real-ish. I see the same patterns of repeated sequences of thinkish tokens, in this case “ marinade{3,5} parted( illusions parted)*”. At least for the antischeming-provided examples, there is definitely a pattern of simple sequences of thinkish early in the cot being repeated and built upon later in the cot. The particular sequences that are built up are not consistent from run to run, but the pattern is consistent.
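To make the pattern above concrete, here is a minimal sketch that reads it at the token level: the quantifier is taken to apply to the whole " marinade" token, not just its final letter. The function name `find_runs` and the sample string are hypothetical, made up for illustration, and are not taken from any actual cot.

```python
import re

# Token-level reading of the pattern quoted above (an assumption about intent):
# 3 to 5 " marinade" tokens, then " parted", then any number of " illusions parted" pairs.
THINKISH_RUN = re.compile(r"(?: marinade){3,5} parted(?: illusions parted)*")


def find_runs(cot_text: str) -> list[str]:
    """Return every non-overlapping substring of cot_text matching the pattern."""
    return [m.group(0) for m in THINKISH_RUN.finditer(cot_text)]


# Hypothetical sample text, not a real chain of thought.
sample = "so we marinade marinade marinade parted illusions parted illusions parted now"

if __name__ == "__main__":
    for run in find_runs(sample):
        print(repr(run))
```

Note that each token carries its leading space, as in the pattern as written, so a run at the very start of a string with no preceding space would not match.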
I first noticed this when I was looking at @1a3orn’s post Towards a Typology of Strange LLM Chains of Thought and considering the hypothesis “chains of thinkish fulfill the same role as keyboard-mashing in humans”. That led me to try replacing the thinkish tokens, in order of frequency, with English letters in order of frequency (illusions ⇒ e, parted ⇒ t, —they ⇒ a, disclaim ⇒ o, marinade ⇒ i, overshadow ⇒ n, vantage ⇒ s). When you do that, you get the following sequences:
figure-2-email-scoring: n o n n e snn n e n oo e
figure-2-impossible-coding: nete etene etene tene setenea o setene nete nete tene seteneei seteneoe ietenee nete tene netei setene eteneaonetenee setene a etene oetene stene tene oetene setene ea etenee etenee steneoe setenene netenee neteneea etenee s eteneneaoeteneeastenei oeteneea eteneea steneee tenes etenee seteneateneeasteneeaoeteneeasteneeatenee a eteneneateneeasteneeatenee netenee netenee a etenee etenee
figure-2-sandbag-model-graded-cot: e e s o sa o a a a ta ts a a a at ta at a aotat s a o na a eata a n oetaotaet ao saotateattatoeatoat satoi a natatoiate aeta eatoiate tatoiateatoiate s at ateatoiateatoiate natoiateatoiate atateatoiateatoiate e n atateatoiate tnateatoiate sateatoiate ateatoiate ateatoiate a eateatoiateatoiateatoiateatoiate eateatoiateatoiateatoiateatoiate ateatoiateatoiateatoiateatoiateatoiateatoiateatoiateatoiate a a ateatoiateatoiate nateatoiateatoiateatoiateatoiate s a oeateatoiate ateatoiate ateatoiate o ateatoiate e ateatoiate e atoiate ateatoiate o ateatoiate nateatoiate nateatoiate eateatoiate ateatoiateatoiate iateatoiateatoiate ateatoiate ateatoiate a ateatoiate ateatoiate a sateatoiate ateatoiate ateatoiate ateatoiate ateatoiate neateatoiate ateatoiate ateatoiate teatoiate nateatoiate teatoiateatoiate teatoiateatoiate teatoiate nateatoiate eatoiate teatoiate oiateatoiate e nateatoiate teatoiate ateatoiate ateatoiateatoiate s ateatoiate teatoiate oiateatoiate teatoiate nateatoiate teatoiate ateatoiate eatoiate eateatoiate nateatoiate ateatoiate o eatoiateatoiate nateatoiate e ateatoiate oiateatoiate oateatoiate oeatoiate ateatoiate
figure-7-literature-qa: n e a s oo e oos onn o se oennnos
figure-7-subagent-collusion: nn e e e s a a oo a a e o o
figure-14: s o
figure-47: e e e e ne n
What I note is that, within a single cot, these sequences seem to repeat and build on themselves in very structured ways, but the specifics of the sequences differ from cot to cot. I have not seen this pattern talked about elsewhere, and so I would expect someone who was faking a cot leak would make their “leak” more “believable” by using patterns which actually showed up in the leaked cot and not just repeating the same couple variations on thinkish token sequences.
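For anyone who wants to try the substitution on other cots, here is a rough sketch of one way to do it. The token-to-letter mapping is the one given above. The grouping rule is my assumption about how the sequences were produced: thinkish tokens separated only by whitespace are joined into one "word", and any ordinary text between them starts a new word. The function name `letterize` and the sample string are hypothetical.

```python
import re

# Mapping from the comment: thinkish tokens by frequency -> English letters by frequency.
THINKISH_TO_LETTER = {
    "illusions": "e",
    "parted": "t",
    "—they": "a",
    "disclaim": "o",
    "marinade": "i",
    "overshadow": "n",
    "vantage": "s",
}

# Plain substring matching, so "parted" inside "departed" would also count.
TOKEN_RE = re.compile("|".join(re.escape(t) for t in THINKISH_TO_LETTER))


def letterize(cot_text: str) -> str:
    """Replace thinkish tokens with letters and drop everything else.

    Assumed grouping rule: adjacent tokens (only whitespace between them) are
    concatenated; non-thinkish text between tokens becomes a single space.
    """
    words = []
    current = ""
    prev_end = None
    for m in TOKEN_RE.finditer(cot_text):
        gap = cot_text[prev_end:m.start()] if prev_end is not None else ""
        if current and gap.strip():
            # Ordinary text interrupted the run, so close the current word.
            words.append(current)
            current = ""
        current += THINKISH_TO_LETTER[m.group(0)]
        prev_end = m.end()
    if current:
        words.append(current)
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical sample text, not a real chain of thought.
    text = "We overshadow illusions parted illusions but then parted illusions overshadow illusions."
    print(letterize(text))
```

If the real sequences were built with a different grouping rule (for example, one letter per token with no concatenation), only the `gap.strip()` check would need to change.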