I suspect a lot of this has to do with the low temperature.
The phrase “person who is not a member of the Church of Jesus Christ of Latter-day Saints” has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.
Going word by word, “person who is not a member...” is all nice and vague and generic; by the time you get to “a member of the”, obvious continuations are “Church” or “Communist Party”; by the time you have “the Church of”, “England” is a pretty likely continuation. Why Mormons though?
“Since 2018, the LDS Church has emphasized a desire for its members to be referred to as ‘members of The Church of Jesus Christ of Latter-day Saints’.” (Wikipedia)
And there just aren’t that many other likely continuations of the low-temperature-attracting phrase “members of the Church of”.
(“Member of the Communist Party”, by contrast, is an infamous phrase from McCarthyism.)
If I’m right, sampling at temperature 1 should produce a much more representative set of definitions.
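To make the temperature point concrete, here is a minimal sketch of how dividing the scores by a temperature before the softmax reshapes a next-word distribution. The candidate words and their scores are made up for illustration and do not come from any real model; `softmax_with_temperature` is just a name I picked.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical continuations of "a member of the Church of" with made-up scores.
candidates = ["Jesus", "England", "Scotland", "Scientology"]
logits = [2.0, 1.5, 0.5, 0.0]

p_t1 = softmax_with_temperature(logits, 1.0)   # ordinary sampling
p_low = softmax_with_temperature(logits, 0.2)  # low-temperature sampling

for word, a, b in zip(candidates, p_t1, p_low):
    print(f"{word:12s} T=1.0: {a:.3f}   T=0.2: {b:.3f}")
```

At the lower temperature nearly all of the probability mass lands on the single top-scoring word. Repeating that at every step is what favors long chains of individually obvious next words.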
I strongly agree with this post.
I’m not sure about this, though:
It could be the streetlight effect, but it’s not that surprising that we’d see this pattern repeatedly. Rotation around a circle is essentially the only nontrivial irreducible representation (in the group-theoretic sense, over the reals) of modular addition, and the cyclic groups of prime order are the only simple commutative groups. It’s likely to pop up in many places whether or not we’re looking for it (like position embeddings, as Eric pointed out, or anything else Fourier-flavored).
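As a small illustration of what the circular representation does, the sketch below places each residue mod n on the unit circle, so that addition mod n becomes multiplication of unit complex numbers (that is, composition of rotations). The helper names `embed` and `decode` and the choice n = 7 are my own, not taken from the post.

```python
import cmath
import math

def embed(k, n):
    """Place residue k (mod n) on the unit circle as a complex number."""
    return cmath.exp(2j * math.pi * k / n)

def decode(z, n):
    """Read the residue back off a point on the unit circle."""
    angle = cmath.phase(z) % (2 * math.pi)
    return round(angle * n / (2 * math.pi)) % n

n = 7
# Multiply the two embeddings (compose the rotations), then decode the result.
results = {
    (a, b): decode(embed(a, n) * embed(b, n), n)
    for a in range(n)
    for b in range(n)
}
print(all(results[(a, b)] == (a + b) % n for (a, b) in results))
```

This uses the lowest frequency only. Replacing k by a multiple of k inside `embed` gives the other Fourier-flavored versions of the same picture.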
Also:
The correlations between all pairs of features are sufficient to pin down an arbitrary amount of structure—everything except an overall rotation of the embedding space—so someone could object that the circular representation and UMAP results are “just” showing the correlations between features. I would probably say the “superposition hypothesis” is a bit stronger than that, but weaker than “any nearly orthogonal overcomplete basis will do”: it says that the total amount of correlation between a given feature and all other features (i.e. interference from them) matters, but which other features are interfering with it doesn’t matter, and the particular amount of interference from each other feature doesn’t matter either. This version of the hypothesis seems pretty well falsified at this point.
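Here is a rough sketch of the easy half of the "everything except an overall rotation" claim, treating "correlation" as a plain dot product between feature directions (my simplifying assumption). Rotating every feature vector by the same angle leaves all pairwise dot products unchanged, so the pairwise table cannot see the rotation. The four 2-D feature vectors are invented for the example.

```python
import math

def gram(vectors):
    """Table of dot products between every pair of vectors."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors] for u in vectors]

def rotate(v, theta):
    """Rotate a 2-D vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

# Made-up 2-D feature directions, purely for illustration.
features = [(1.0, 0.0), (0.6, 0.8), (-0.5, 0.5), (0.0, -1.0)]
rotated = [rotate(v, 1.234) for v in features]

g_original = gram(features)
g_rotated = gram(rotated)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(g_original, g_rotated)
    for a, b in zip(row_a, row_b)
)
print(max_diff < 1e-12)
```

The sketch does not show the converse direction, that the table of dot products determines the vectors up to such a transformation.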