Can someone explain why many models have slowly-decaying lines? I would have expected sharp drop-offs—knowledge falling to zero after training data ends. In what situation does a model (like GPT-5.2) fall from 0.5 to sub-0.1 accuracy, and stay there for seemingly half a year?
I’m also surprised that old and obsolete GPT-4x models seem to be broadly outcompeting the GPT-5x line. Am I missing something? Are refusals being counted as failures?
I suspect a few different variables are getting mixed together—a model’s raw intelligence, its willingness to provide a specific date, its willingness to confabulate when it doesn’t know, etc.
The decays are probably because there is less training data about recent deaths, and because pre-training may have started before the knowledge cutoff.
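A toy sketch of that mechanism (all numbers and function shapes invented for illustration): if mentions of a death accumulate in the months after it happens and the crawl stops at some cutoff, deaths close to the cutoff have accumulated few mentions, so recall tapers off gradually instead of falling off a cliff.

```python
import numpy as np

# Toy model: coverage of a death accrues after it happens; the crawl
# stops at month T. The half-life and saturation constant below are
# made-up parameters, not estimates of any real model's training mix.
T = 24  # crawl cutoff, in months
months = np.arange(T)

def mentions_collected(death_month, half_life=3.0):
    """Fraction of eventual coverage collected before the crawl cutoff."""
    window = T - death_month
    # Saturating accumulation: most coverage appears soon after the death.
    return 1.0 - np.exp(-window / half_life)

def recall_probability(m, k=0.3):
    # More collected mentions -> higher chance the fact is memorized.
    return m / (m + k)

acc = [recall_probability(mentions_collected(t)) for t in months]
for t in (0, 12, 20, 23):
    print(f"death in month {t:2d}: simulated accuracy {acc[t]:.2f}")
```

Under these assumptions the simulated accuracy declines smoothly toward the cutoff but never hits zero before it, which matches a slow decay rather than a sharp drop-off.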
Older models having better rote memorization of slightly obscure facts isn't that surprising imo. It's not something that gets a lot of optimization pressure.
Having multiple variables mixed together doesn't seem like a big issue for detecting ancestry. False positives should still be highly unlikely, since different pretrains will probably have different "forgetting curves".
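A minimal sketch of that ancestry test, with entirely invented per-month accuracy curves: two models sharing a pretrain should have near-identical forgetting curves, while an unrelated model's curve has a different shape even if its overall trend is the same.

```python
import numpy as np

# Hypothetical per-month accuracy curves (fraction of death dates
# recalled correctly) over the same 12 months. Numbers are made up:
# model_A2 is meant to share a pretrain with model_A; model_B is not.
model_curves = {
    "model_A":  np.array([0.52, 0.50, 0.47, 0.41, 0.33, 0.21,
                          0.12, 0.08, 0.06, 0.05, 0.05, 0.04]),
    "model_A2": np.array([0.55, 0.53, 0.49, 0.43, 0.35, 0.23,
                          0.13, 0.09, 0.07, 0.05, 0.05, 0.05]),
    "model_B":  np.array([0.60, 0.59, 0.58, 0.57, 0.55, 0.50,
                          0.30, 0.10, 0.05, 0.04, 0.04, 0.03]),
}

def curve_similarity(a, b):
    # Pearson correlation of two forgetting curves.
    return np.corrcoef(a, b)[0, 1]

names = list(model_curves)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = curve_similarity(model_curves[names[i]], model_curves[names[j]])
        print(f"{names[i]} vs {names[j]}: r = {r:.3f}")
```

Even though every curve trends downward (so all pairwise correlations are positive), the shared-pretrain pair scores noticeably higher, which is why confounded absolute levels shouldn't cause many false positives.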
That’s a clever idea!
GPT-5.2 is dropping before its knowledge cutoff.