1a3orn comments on Daniel Kokotajlo’s Shortform

1a3orn 1 Oct 2025 19:15 UTC
5 points
−2
Ah this is fucking great thanks for the ping.

Hrrrrrrm interesting that it’s on the one optimized for math. Not sure what it’s suggestive of. Maybe if you’re pushing hard for math-only RLVR, it just drops English-language competence in the CoT because it’s not using English as much in the answers? Or maybe it’s more demanding?

....

And, also—oh man, oh man oh man this is verrrry interesting.

也不知到 is a phonetic sound-alike for 也不知道; DeepSeek tells me they are perfect fucking homophones. So: what is written is mostly nonsensical, but it sounds exactly like “also [interjection] I don’t know,” a perfectly sensible phrase.

Which is fascinating because you see the same thing in o3 transcripts, where it uses words that sound a lot like the (apparently?!?!) intended word: “glimpse” → “glimps” or “claim” → “disclaim.” And I’ve heard R1 does the same.

So the apparent phenomenon here is that, potentially across 3 language models, we see language shift during RL towards literal homophones (!?!?!?)
- Daniel Kokotajlo 1 Oct 2025 20:36 UTC
  10 points
  0
  Really interesting theory, and thanks for doing this investigation. I’m having trouble seeing how homophones explain the OP’s stuff though.
  If I imagine that “disclaim” means “claim” that maybe helps explain the first snippet on the left, but doesn’t seem to help with the one on the right? Also, do you have a theory about what illusions, synergy, vantage, marinade, etc. might mean?
  - StanislavKrym 1 Oct 2025 21:05 UTC
    1 point
    0
    I would guess that it has something to do with terms being misused in the training data. IIRC the CoTs of OpenAI’s models solving the IMO 2025 (or was it CoTs of GPT-5? IIRC you also commented about models getting distracted and filling the CoT with gibberish) contained an army of dots.
    Returning to your observation, disclaimers are used right before meaningless warnings à la “all characters are at least 18 years old” in a setting resembling a high school. As for synergy, it might have been a term applied to many other contexts by pseudoscientists.
    P.S. What if OpenAI uses other LLMs to paraphrase the CoTs and eliminate these artifacts?
    UPD: Kokotajlo’s comment is here and the army of dots was observed by METR in their evaluation of GPT-5, not the model which solved the IMO. However, IIRC, the model by OpenAI, unlike that by GDM, would regularly convince itself that it’s on the right track, as if OpenAI’s model wasn’t confident in its proofs.