13. An X that isn’t an X
I think this pattern is common because of repetition. When starting the definition, the LLM begins with a plausible definition structure: "A [generic object] that is not [condition]". Lots of definitions look like this. Next it fills in some common [generic object]. Then it wants to figure out which specific [condition] the object in question does not meet. So it attends back to the word being defined, but it finds nothing: no information is stored about this non-token. The attention head that should come up with a plausible candidate for [condition] therefore writes nothing to the residual stream. What dominates the prediction now are the more base-level predictive patterns that are normally overwritten, like word repetition (something transformers learn very early and often struggle to suppress). The repeated word that at least fits grammatically is [generic object], so that gets predicted as the next token.
Here are some predictions I would make based on that theory:
- When you suppress attention to [generic object] at the sequence position where the model predicts [condition], you will get a reasonable condition (see the first sketch below).
- When you check with the logit lens at which layer the transformer decides to predict [generic object] as the last token, it will be a relatively early layer.
- Now replace the word the transformer should define with a real, normal word and repeat the previous experiment. You will see that the model decides on [generic object] at a later layer (see the second sketch below).
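Here is a minimal sketch of the first experiment, assuming TransformerLens and GPT-2 small as a stand-in model. The prompt, the token positions, and the blunt all-layer ablation are illustrative assumptions, not a worked-out experimental setup.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Hypothetical prompt: the model defines a nonsense word and has already
# produced the generic object ("thing") of the definition pattern.
prompt = 'The word "flurbix" means: a thing that is not'
tokens = model.to_tokens(prompt)

# Assumed positions: the key position of the [generic object] token we want
# to hide, and the query position where [condition] is about to be predicted.
generic_obj_pos = 11  # illustrative index; locate " thing" via model.to_str_tokens(tokens)
query_pos = -1        # the final position, where the next token is predicted

def suppress_attention(pattern, hook):
    # pattern: [batch, head, query_pos, key_pos], post-softmax.
    # Zero out attention from the query position to [generic object],
    # then renormalize so each attention row still sums to one.
    pattern[:, :, query_pos, generic_obj_pos] = 0.0
    return pattern / pattern.sum(dim=-1, keepdim=True)

# Ablate in every layer (a blunt intervention; a real experiment would
# target the specific heads that move [generic object] information).
hooks = [(f"blocks.{l}.attn.hook_pattern", suppress_attention)
         for l in range(model.cfg.n_layers)]

logits = model.run_with_hooks(tokens, fwd_hooks=hooks)
print("Top next tokens with attention suppressed:",
      model.to_str_tokens(logits[0, -1].topk(5).indices))
```

If the theory is right, the suppressed run should produce a plausible [condition]-like continuation instead of echoing the generic object.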
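And a sketch of the logit-lens comparison for the second and third predictions, under the same assumptions: decode the residual stream after every layer through the final LayerNorm and unembedding, and report the first layer at which the top prediction already matches the model's final output.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

def first_deciding_layer(prompt: str) -> int:
    # Logit lens: project each layer's residual stream straight to logits
    # and find the earliest layer that already agrees with the final output.
    logits, cache = model.run_with_cache(prompt)
    final_top = logits[0, -1].argmax().item()
    for layer in range(model.cfg.n_layers):
        resid = cache["resid_post", layer][:, -1:]   # [batch, 1, d_model]
        layer_logits = model.unembed(model.ln_final(resid))
        if layer_logits[0, -1].argmax().item() == final_top:
            return layer
    return model.cfg.n_layers - 1  # the last layer agrees by construction

# Hypothetical prompts: a nonsense word vs. a real one. The theory predicts
# a smaller number (an earlier "decision") for the nonsense word.
print(first_deciding_layer('The word "flurbix" means: a thing that is not'))
print(first_deciding_layer('The word "ladder" means: a thing that is not'))
```

This is only directional evidence, of course: the logit lens is known to be noisy in early layers, so a tuned lens would make the comparison cleaner.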
Rationality framework: The Greenland effect
Remember the first time you looked at a world map: one thing that may have caught your eye was Greenland, that huge island, almost as big as Africa, up there in the north.
Now remember the first time you took a closer look at a globe (or a non-Mercator projection, for that matter). Greenland is a bit disappointing, isn’t it? It doesn’t seem to be THAT big after all.
Now remember that time in geography class when you gave presentations on the countries of Europe: in comparison to these folks, the icy plains of Denmark’s pet island seem gigantic. Not as gigantic as Africa, sure, but still…
Depending on how much time you spend on geography, I can well imagine that cycle going back and forth a few more times.
What is important here is the following: even though your knowledge about the size of Greenland only ever increased over your life, your emotional attitude (“oh, quite big” vs. “nah, it’s just an island”) switched around quite a lot in both directions.
In the case of Greenland this is all well and fine, but in other scenarios it can lead to pseudo-disagreements or confused arguments. Beware the Greenland effect: your emotional disposition towards an issue often reflects your last update on that issue (which should vary unpredictably), not your overall beliefs about it (which should converge).
Examples of the Greenland effect:
“The church is good, it teaches me about God” → “God is fake, the priest must be a moron, the world lied to me” → “These religious people are actually using a lot of their resources to help people in need” → “All those religious charities are so ineffective” → …
“I can’t stop this project now, I have already invested so many resources” → “I know about the sunk cost bias; I will abandon my projects whenever they seem like a bad idea” → “I should see projects through despite rough patches: sunk cost faith” → …