Really interesting theory, and thanks for doing this investigation. I’m having trouble seeing how homophones explain the OP’s stuff though.
If I imagine that “disclaim” means “claim”, that maybe helps explain the first snippet on the left, but it doesn’t seem to help with the one on the right. Also, do you have a theory about what illusions, synergy, vantage, marinade, etc. might mean?
I would guess that it has something to do with terms being misused in the training data. IIRC the CoTs of OpenAI’s model solving the IMO 2025 (or was it the CoTs of GPT-5? IIRC you also commented on models getting distracted and filling the CoT with gibberish) contained an army of dots.
Returning to your observation, disclaimers are often placed right before meaningless warnings à la “all characters are at least 18 years old” in settings resembling a high school. As for “synergy”, it might be a term that pseudoscientists have applied to many other contexts.
P.S. What if OpenAI uses other LLMs to paraphrase the CoTs and eliminate these artifacts?
UPD: Kokotajlo’s comment is here, and the army of dots was observed by METR in their evaluation of GPT-5, not in the model which solved the IMO. However, IIRC, OpenAI’s model, unlike GDM’s, would regularly reassure itself that it was on the right track, as if it wasn’t confident in its proofs.