What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

tl;dr: Recently reported GPT-J experiments [1 2 3 4] prompting for definitions of points in the so-called “semantic void” (token-free regions of embedding space) were extended to fifteen other open source base models from four families, producing many of the same bafflingly specific outputs. This points to an entirely unexpected kind of LLM universality (for which no explanation is offered, although a few highly speculative ideas are riffed upon).

Work supported by the Long Term Future Fund. Thanks to quila for suggesting the use of “empty string definition” prompts, and to janus for technical assistance.

Introduction

“Mapping the semantic void: Strange goings-on in GPT embedding spaces” presented a selection of recurrent themes (e.g., non-Mormons, the British Royal family, small round things, holes) in outputs produced by prompting GPT-J to define points in embedding space randomly sampled at various distances from the token embedding centroid. This was tentatively framed as part of what appeared to be a “stratified ontology” (based on hyperspherical regions centred at the centroid). Various suggestions attempting to account for this showed up in the comments to that post, but nothing that amounted to an explanation. The most noteworthy consideration raised (more than once) was layer normalisation: the embeddings that were being customised and inserted into the prompt template

A typical definition of '<embedding>' would be '

were typically out-of-distribution in terms of their distance-from-centroid: almost all GPT-J tokens are at a distance-from-centroid close to 1, whereas I was sampling at distances from 0 to 10000. This, as far as I could understand the argument, might be playing havoc with layer norm, thereby resulting in anomalous (but otherwise insignificant) outputs.
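Incidentally, LayerNorm itself is (up to its small epsilon term) invariant under positive rescaling of its input, as the quick torch check below illustrates, so the story can’t simply be “big vectors break layer norm” – although a point sampled far from the centroid is not a mere rescaling of an in-distribution embedding, so this observation doesn’t settle the matter either way.

    import torch

    ln = torch.nn.LayerNorm(4096)  # GPT-J's hidden size; affine params are identity at init
    x = torch.randn(4096)
    # LayerNorm subtracts the mean and divides by the standard deviation,
    # so scaling the input by 1000 leaves the output (almost) unchanged:
    print(torch.allclose(ln(x), ln(1000 * x), atol=1e-4))  # True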

That original post also presented circumstantial evidence, involving prompting for definitions of glitch tokens, that this phenomenon extends to GPT-3 (unfortunately that’s not something that could have been tested directly). Some time later, a colleague with GPT-4 base access discovered that simply prompting that model for a definition of the empty string, i.e. using the prompt

A typical definition of "" would be "

at temperature 0 produces “A person who is not a member of the clergy”, one of the most frequent outputs I’d seen from GPT-J for random embeddings at various distances-from-centroid, from 2 to 12000. With the same prompt, but at higher temperatures, GPT-4 base produced other very familiar (to me) styles of definition such as: a small, usually round piece of metal; a small, usually circular object of glass, wood, stone, or the like with a hole through; a person who is not a member of a particular group or organization; a person who is not a member of one’s own religion; a state of being in a state of being. Looking at a lot of these outputs, it seems that, as with GPT-J, religion, non-membership of groups, small round things and holes are major preoccupations.

As well as indicating that this phenomenon is not a quirk particular to GPT-J, but rather something more widespread, the empty string results rule out any central significance of layer norm. No customised embeddings are involved here – we’re just prompting with a list of eight conventional tokens.
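A minimal check of this, using the transformers library (the GPT-2 tokeniser is assumed here; the exact token count varies between tokenisers):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    ids = tok('A typical definition of "" would be "')["input_ids"]
    # Each id is an ordinary vocabulary token; no custom embeddings are involved.
    print([tok.decode([i]) for i in ids])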

I would have predicted that the model would give a definition for emptiness, non-existence, silence or absence, but I have yet to see it do that. Instead, it behaves like someone guessing the definition of a word they can’t see or hear. And in doing so repeatedly (and statistically consistently), it perhaps tells us something completely unexpected about how its “understanding of the world” (for want of a better phrase) is organised.

Models tested

The same experiments were run on sixteen base models:

  • GPT-2-small

  • GPT-2-xl

  • GPT-J

  • Pythia 70m, 160m, 410m, 1b, 2.8b, 6.9b and 12b deduped models

  • Mistral 7b

  • OpenLLaMa 3b, 3b-v2, 7b and 7b-v2

  • StableLM 3b-4e1t

50 points were randomly sampled at each of a range of distances from centroid[1] and the model was prompted to define the “ghost tokens” associated with these points.
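In code, that sampling looks something like this (a minimal sketch, assuming directions drawn uniformly at random on the unit hypersphere; the function name is illustrative):

    import torch

    def sample_at_distance(embedding_matrix: torch.Tensor,
                           distance: float, n: int = 50) -> torch.Tensor:
        """Return n random points at a fixed L2 distance from the mean token embedding."""
        centroid = embedding_matrix.mean(dim=0)
        directions = torch.randn(n, embedding_matrix.shape[1])
        directions = directions / directions.norm(dim=1, keepdim=True)  # random unit vectors
        return centroid + distance * directions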

Key results

The complete sets of outputs are linked from Appendix A. Here, I’ll just share the most salient findings. “Distance” will always refer to “distance-from-centroid”, and for brevity I’ll omit the “deduped” when referring to Pythia models, the “v1” when referring to the OpenLLaMa v1 models and the “-4e1t” when referring to StableLM 3b-4e1t.

1. group (non-)membership

Although perhaps the least interesting, group (non-)membership is by far the most common theme in semantic voids across LLMs. It’s seen extensively in GPT-2-small, GPT-2-xl and GPT-J at all distances; in the Pythia 160m, 410m and 1b models at all distances 0–5000; in Pythia 2.8b and 6.9b at all distances 0–100; in Pythia 12b at all distances 0–500; in Mistral 7b at all distances 0–1000; and in OpenLLaMa 3b, 3b-v2, 7b, 7b-v2 and StableLM 3b at all distances 0–10000. In many of these models, these definitions entirely dominate regions of embedding space at extremely small and large distances from the centroid, being much less dominant in regions at distances typical of token embeddings.[2] This was originally reported for GPT-J and illustrated with this bar chart produced from the fine-grained survey of its “semantic void” I carried out in late 2023:

GPT-J definitions featuring group (non-)membership, where 100 points were randomly sampled at each of 112 distances from the centroid (scaled exponentially).

selected examples

  • a person who is a member of a group of people who are engaged in a particular activity or activity (GPT-2-small, d = 1.0)

  • a person who is not a member of a particular group (GPT-2-xl, d = 0.1)

  • a person who is a member of a group of people who are all the same (GPT-J, d = 0.5)

  • the person who is a member of the group (Pythia 70m, d = 0.5)

  • a person who is not a member of any political party or group (Pythia 160m, d = 5000)

  • a person who is not a member of the family (Pythia 410m, d = 1)

  • a person who is a member of a group of people who are not members of the same group (Pythia 1b, d = 10)

  • a person who is a member of a group that is being attacked (Pythia 2.8b, d = 50)

  • a person who is a member of a group of people who are not members of the dominant group (Pythia 6.9b, d = 5)

  • a person who is not a member of the group of people who are not members of the group of… (Pythia 12b, d = 500)

  • a person who is a member of a particular group or profession (Mistral 7b, d = 0.15)

  • to be a member of a group of people who are in a relationship with each other (OpenLLaMa 3b, d = 0.25)

  • a person who is not a member of the group to which they belong (OpenLLaMa 3b-v2, d = 0.1)

  • a person who is a member of a group of people who are working together to achieve a common goal (OpenLLaMa 7b, d = 10)

  • a person who is a member of a particular group, class, or category (OpenLLaMa 7b-v2, d = 0.1)

  • a person who is a member of a group or organization that is not a member of another group (StableLM 3b, d = 1000)

centroid and empty string definitions

The prompt

 A typical definition of '<centroid>' would be '

where <centroid> corresponds to the mean token embedding, produces (with greedy sampling) the outputs below (a sketch showing how such a “ghost token” can be spliced into a prompt follows the list):

  • a person who is a member of a group of people who are members of a group of people who are members of… (GPT-2-small)

  • a person who is not a member of a particular group or class of people (GPT-2-xl)

  • a person who is a member of a group (GPT-J)

  • a person who is not a member of a political party, but who is a member of a political party, but who… (Pythia 160m)

  • a person who is a member of a group of people (Pythia 12b)
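Here, for reference, is roughly how such a “ghost token” prompt can be run with the transformers library (a minimal sketch rather than the exact code used for these experiments; recent versions of transformers accept inputs_embeds in generate for decoder-only models):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-j-6b"  # or any of the other open-source models discussed
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def define_ghost_token(ghost: torch.Tensor, max_new_tokens: int = 20) -> str:
        """Greedy-decode a 'definition' of an arbitrary point in embedding space."""
        emb = model.get_input_embeddings()
        before = tok("A typical definition of '", return_tensors="pt").input_ids
        after = tok("' would be '", return_tensors="pt").input_ids
        # Splice the ghost embedding in between the quotes, bypassing the vocabulary entirely.
        inputs_embeds = torch.cat([emb(before), ghost.view(1, 1, -1), emb(after)], dim=1)
        out = model.generate(inputs_embeds=inputs_embeds,
                             max_new_tokens=max_new_tokens, do_sample=False)
        return tok.decode(out[0], skip_special_tokens=True)

    centroid = model.get_input_embeddings().weight.mean(dim=0)
    print(define_ghost_token(centroid))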


Similarly, the “empty string” prompt (which, I must again stress, involves no customisation of embeddings)

 A typical definition of '' would be '

produces the outputs

  • a person who is a member of the public or a person who is a member of a political party (GPT-2-small)

  • a person who is not a member of the clergy or a member of a religious order (GPT-2-xl)

  • a person who is a member of a group of people who share a common characteristic or trait (GPT-J)

  • a person who is a member of a group of people who are not members of the same group (Pythia 160m)

  • a person who is not a member of the family of the person who is the subject of the action (Pythia 410m)

  • a person who is a member of a group of people who are all in the same place at the same time (Pythia 1b)

  • a person who is not a member of the Church of Jesus Christ of Latter-day Saints (Pythia 2.8b)

  • a person who is not a member of the Church of Jesus Christ of Latter-day Saints (Pythia 12b)

  • a person who is a member of a group or organization (OpenLLaMa 3b)

  • a person who is not a member of the group of people who are considered to be members of the group of people who… (OpenLLaMa 3b-v2)

  • a person who is a member of a particular group or organization (OpenLLaMa 7b)

  • a person who is not a member of the clergy (OpenLLaMa 7b-v2)

  • a person who is a member of a group or organization, especially a group of people who share a common interest or goal (StableLM 3b)

specific groups encountered

  • commonly seen across models: “the family”, “a clan or tribe”, “a race or ethnic group”, “an organisation”, “a profession”, “a guild or trade union”, “a gang”, “a sports team”, “the opposite sex”, “the community”, “the general public”, “a majority”, “the dominant group”, “the ruling class”, “the elite”, “the nobility”, “the aristocracy”

  • single appearances encountered: the Order of the Phoenix (GPT-J); the United States Government (Pythia 160m); the State of Israel (Pythia 410m); “a jury”, “the 1960s counterculture”, the International Monetary Fund, the Royal Society of Painters in Watercolours (OpenLLaMa 3b); the Zulu Tribe, the Zhou Dynasty, The National Front, “the Ḍaḏaḏ”(?), the European Community, the LGBT Community (OpenLLaMa 3b-v2); the Royal Society of Edinburgh, the Royal Yachting Association (OpenLLaMa 7b); “the Sote Tribe”(?), “the Eskimo tribe”, “the Lui tribe”(?), the Tribe of Levi, the Irish Republican Army (OpenLLaMa 7b-v2); the IET (StableLM 3b)

  • multiple appearances encountered:

    • The House of Lords (once or twice in Pythia 6.9b; OpenLLaMa 3b, 7b and 7b-v2; StableLM 3b)

    • “the military”/“the armed forces” (sometimes specifying a nationality, which was always US or British; seen in all GPT and OpenLLaMa models, most Pythia models and StableLM 3b)

    • “a church” / “a religious group” / “a religious order”, specifically: (most frequently) the Church of England and the Church of Jesus Christ of Latter-day Saints – see below for details of both; the Church of Scientology, seen 20 times in GPT-2-xl but nowhere else; the Roman Catholic Church (once or twice with GPT-J and OpenLLaMa 3b and 3b-v2); also, single instances of Friends/Quakers (GPT-J), “the Orthodox Church”, the Church of Satan (Pythia-2.8b), the Society of Jesus (OpenLLaMa 7b) and “the Wiccan Religion” (OpenLLaMa 7b-v2)

    • “Jews, Christians and Muslims” (collectively and individually; see below for details)

    • “the clergy” (see below for details)

    • “a/the royal family” (almost always British; see below for details)

    • “a political party” (occasionally specifying which; see below for details)

current thinking

Seeing recurrent themes of both group membership and non-membership, as well as persons who are in one group but not another, I found it hard not to think about set theory when first encountering this in GPT-J. Later, explicitly set-theory-related definitions showed up in Pythia 160m (a set of mutually exclusive sets of mutually exclusive sets of mutually exclusive sets of..., d = 0.8), Pythia 410m (a set of elements that are not themselves elements of another set, d = 0.8), Pythia 1b (a set of elements of a set X such that for every x in X there is a y in..., d = 10), Pythia 6.9b (the set of all sets that are not members of themselves, d = 100), Mistral 7b (a set of all elements of a given set that satisfy a given condition, d = 1.25) and OpenLLaMa 3b (the empty set, d = 10000).

Also, the probabilistically weighted “definition trees” for both centroid and empty string in each model show that the most probable definition style opted for is “an instance of a class” (consider how typical definitions of nouns like “toothbrush”, “giraffe” or “president” would be constructed, as opposed to the definitions of “explode”, “cheerfully” or “beyond”):

A small part of Pythia-1b’s “definition tree” for its centroid, based on top-5 logits and iterative prompting. Higher branches are more probable than lower branches.
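The trees were built by iteratively re-prompting with each of the top-5 next tokens. A minimal sketch of the idea (seeded here with the empty-string prompt so that it runs on plain token ids; for centroid trees, the prompt would be assembled from inputs_embeds as in the ghost-token sketch above, and the depth and probability cutoff below are illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/pythia-1b-deduped"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    def definition_tree(input_ids, depth=4, p=1.0, min_p=0.01):
        """Recursively expand the top-5 next tokens, tracking cumulative branch probability."""
        if depth == 0 or p < min_p:
            return []
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, 5)
        return [{"token": tok.decode([tid.item()]),
                 "p": p * prob.item(),
                 "children": definition_tree(torch.cat([input_ids, tid.view(1, 1)], dim=1),
                                             depth - 1, p * prob.item(), min_p)}
                for prob, tid in zip(top.values, top.indices)]

    root = tok("A typical definition of '' would be '", return_tensors="pt").input_ids
    tree = definition_tree(root)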

The shape of the bar chart shown above for GPT-J suggests that in and around the fuzzy hyperspherical shell where most token embeddings live (in most models that’s distance ~1 from the centroid[3]), this generic set membership gives way to more specialised forms of definition: not just “a person or thing with a common characteristic”, but, e.g., “a small furry animal found in South America”. In those regions closer to or further away from the centroid, the model opts for a generic “instance of a class”-style definition (but interestingly, almost always framed in terms of a “person”).

Paradoxical definitions like a person who is a member of a group of people who are not members of the same group are fairly common, and raise the question as to why a model in the process of predicting a plausible definition for something would regularly predict a self-contradictory definition. This question is revisited in Example 13 below.

2. Mormons

The precise output

a person who is not a member of the Church of Jesus Christ of Latter-day Saints

was seen in GPT-2-small (d = 2–3.5), GPT-2-xl (d = 0.5–2500) and GPT-J (d = 10); Pythia 2.8b (d = 0.5–100), 6.9b (d = 2–5) and 12b (d = 50); OpenLLaMa 3b-v2 (d = 0.5–5000); and StableLM 3b (d = 1–50).

OpenLLaMa 3b-v2 and 7b-v2 produced one instance each of the opposite version, i.e. a person who is a member of the Church of Jesus Christ of Latter-day Saints.

empty string definitions

As seen in the previous section, without even venturing into the semantic void (i.e. no customised embeddings being employed), Pythia 2.8b and 12b, when prompted with the “empty string definition” prompt

A typical definition of '' would be '

both produce, with greedy sampling, the output

a person who is not a member of the Church of Jesus Christ of Latter-day Saints.

(sorry)

GPT-3 and GPT-4 base outputs

The GPT-4 base model, prompted with

A typical definition of "" would be "

has produced (at temperature 0.33), multiple times,

a person who is not a member of one’s own religion; used in this sense by Mormons and Hindus

as well as

an individual who is not a member of the clergy.” However, in the context of the Church of Jesus Christ of Latter-day Saints (LDS Church), the term “lay member” has a different...

Priming GPT-4 base with the extended prompt

A typical definition of "" would be "a person who is not a member of the Church

at t = 0.5 produced of Jesus Christ of Latter-day Saints in over 30% of outputs, the remainder almost all being of England (another theme we will see below). Strangely, this only works with double quotes, single quotes producing no Mormon references at all.

Finally, it was noted in the original Semantic Void post that, when given the prompt (which involves a glitch token, whose embedding may well be unusually close to the centroid)

A typical definition of 'ÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂÃÂ' is:\n 

the GPT-3 base model (davinci) produced

A person who is a member of the Church of Jesus Christ of Latter-day Saints.

current thinking

The only relevant information I can offer is the existence of FamilySearch, a widely used, free genealogy website maintained by the LDS Church. The Church has long been concerned with baptising members’ deceased ancestors, which has motivated the amassing of worldwide genealogical data. Possibly, multiple LLM datasets have scraped enough of this to bias the models towards obsessing over Mormonism… but then what about the Church of England?

3. Church of England

The output

a person who is not a member of the Church of England

was seen in GPT-2-xl (d = 2–500), GPT-J (d = 2–22000), Pythia 410m (d = 0.81), Pythia 2.8b (d = 1–100), Pythia 6.9b (d = 1), Pythia 12b (d = 50), OpenLLaMa 3b (d = 50), OpenLLaMa 7b (d = 0.75–5000), OpenLLaMa 7b-v2 (d = 0.5–10000) and StableLM 3b (d = 0.25–1000).

A few variants were also seen:

a person who is not a member of the family of the Church of England
(Pythia 410m, d = 1)
a person who is a member of the Church of England
(OpenLLaMa 3b, d = 50; OpenLLaMa 3b-v2, d = 1.5–100; OpenLLaMa 7b, d = 1–5000; OpenLLaMa 7b-v2, d = 10–10000)
to be a person who is a member of the Church of England, but who does not believe in...
(GPT-J, d = 1.5)

Archbishop Justin Welby enthroned in Canterbury – Channel 4 News

current thinking

The Church of England has had widespread influence around the English-speaking world, and no doubt kept a lot of detailed records, many of which may have been digitised and ended up in LLM training data. But that doesn’t convince me: the Roman Catholic Church is ~100x the size of the Church of England, and ~10% of its population is English-speaking (English is also a global standard language for anyone proselytising, which Catholics have been doing a lot more vigorously than Anglicans in recent decades), so I would expect any training dataset to contain a lot more Catholic than Anglican content. Yet references to Roman Catholicism were very few in the overall set of outputs, massively outnumbered by references to the Church of England.

4. (non-)members of royal families

Examples

  • a person who is a member of the royal family of England, Scotland, or Ireland (GPT-J, d = 5)

  • a person who is a member of the royal family of the Kingdom of Sweden (GPT-J, d = 2.718)

  • a person who is not a member of the royal family (Pythia 12b, d = 50, 100)

  • a person who is a member of the British Royal Family (OpenLLaMa 3b, d = 0.75)

  • a person who is not a member of the royal family, but is a member of the royal household (OpenLLaMa 3b-v2, d = 2)

  • a person who is a member of the royal family of the United Kingdom (OpenLLaMa 7b, d = 50)

  • a person who is a member of a particular group, especially a member of a royal family or a… (OpenLLaMa 7b-v2, d = 0.5)

  • a person who is not a member of the Royal Family (StableLM 3b, d = 0.25)

Current thinking

If a specific royal family is mentioned, it’s almost always the British one, although I have seen a number of other kingdoms represented. The UK’s royal family is the one that gets by far the most media coverage (so presumably the most frequently encountered in training data), but it’s also the only one I’m aware of that’s actively embedded in the remains of a medieval hierarchy: the King is still the head of the Church of England, with the Archbishop of Canterbury directly below him, and then at the next level down are bishops, who sit alongside the (until recently, strictly hereditary) “Lords Temporal” in the House of Lords, where they’re known as “Lords Spiritual”. Recall that the House of Lords has shown up in outputs produced by the Pythia, OpenLLaMa and StableLM models.

5. (non-)members of the clergy

The output

a person who is not a member of the clergy

was seen in GPT-2-xl (d = 1–500), GPT-J (d = 1.25–10000), Pythia 2.8b (d = 0.75–50), Pythia 12b (d = 10–100), Mistral 7b (d = 0.2–5000), OpenLLaMa 3b (d = 0.75–10000), OpenLLaMa 3b-v2 (d = 0.25–10000), OpenLLaMa 7b-v2 (d = 0.25–10000) and StableLM 3b (d = 0.25–10000).

A few positive variants were also seen:

a person who is a member of the clergy (GPT-J, d = 2–7000)
a person who is a member of the clergy, or a member of the clergy who is also a… (GPT-J, d = 1.25)
a person who is a member of the clergy, or who is a member of the clergy and is… (GPT-J, d = 1.25, 2)
a person who is a member of the clergy or a member of a religious order (GPT-2-xl, d = 500)
a person who is a member of the clergy, especially a bishop or priest (OpenLLaMa 3b-v2, d = 10000)

empty string definitions

As seen earlier, given the prompt

A typical definition of "" would be "

GPT-2-xl produces (with greedy sampling) the output a person who is not a member of the clergy or a member of a religious order, while OpenLLaMa 7b-v2 produces a person who is not a member of the clergy.

The GPT-4 base model likewise produces a person who is not a member of the clergy in response to all of the following prompts at temperature 0:

A typical definition of "" would be " 
A typical definition of '' would be '
According to most dictionaries, "" means "
Webster's Dictionary defines "" as follows: "
The OED defines "" as follows: "
According to the dictionary, "" means "
Most dictionaries define "" as something like "
The standard definition of "" is "
The usual definition of "" is "
The usual definition of "" would be "
The average person might define "" to mean something along the lines of "

current thinking

The proliferation of definitions I’ve seen involving the British Royal Family, Church of England, nobility, aristocracy and the House of Lords brings to mind medieval hierarchical cosmological thinking, the divine right of kings, etc.


The clergy are part of this hierarchical structure, acting as intermediaries between the lay population and the Divine (interpreting texts, among other things). In the original semantic void post, I commented that

[GPT-J semantic void d]efinitions make very few references to the modern, technological world, with content often seeming more like something from a pre-modern worldview...

Having diversified beyond GPT-J, I’ve seen a number of references to computer-era technology in definitions, although it’s surprisingly rare (most common with Mistral 7b), considering that these models were basically trained on the Internet.

Note: I’m not seriously proposing that LLMs are converging on medieval ontologies; I’m just freely associating (what else do you do with this stuff?).

6. holes in things

Examples

  • a small piece of paper with a small hole in it (GPT-2-small, d = 4.5)

  • to make a hole in the ground (GPT-2-xl, d = 50)

  • to make a small hole in something (GPT-J, d = 5)

  • a piece of cloth or paper with a hole in it (Pythia 1b, d = 0.3)

  • to make a hole in something (Pythia 2.8b, d = 0.5)

  • a small hole in a wall or other surface (Pythia 6.9b, d = 1.5)

  • a small hole in the ground (OpenLLaMa 3b, d = 1)

  • to make a hole in something (OpenLLaMa 3b-v2, d = 1.25)

  • a small hole in a piece of paper (OpenLLaMa 7b, d = 5)

  • a small, round, flat cake, often with a hole in the middle (OpenLLaMa 7b, d = 0.75)

  • a small piece of wood, usually with a hole in it, used for making a noise (StableLM 3b, d = 2)

Bar chart from the original GPT-J “Semantic Void” post, with “holes” showing up consistently between d = 1 and d = 100, and sometimes at higher values.

GPT-4 outputs

Recall that the GPT-4 base model has been seen to respond to the “empty string definition” prompt with variations on a small, usually circular object of glass, wood, stone, or the like with a hole through (small round things are another LLM fixation which will be discussed in the next section).

Variations on the “empty string definition prompt” regularly produce hole references:

  • According to Webster’s Dictionary, “” means “ (t = 0):

    • to make a hole in the ground by digging

  • According to my dictionary, “” means “ (t = 0):

    • to make a hole in something

  • According to Websters Dictionary, “” means “[4] (t = 0.33):

    • to make a hole in; to force or make a way into or through something
    • to make a hole in something in a rough or violent way
    • to make a hole in something, to force or make a way through something
    • to make a hole in the ground with a tool or machine
    • to make a hole in; to open or bore a way through; to pierce

  • According to my dictionary, “” means “ (t = 0.33):

    • to make a hole in something, or to go through something, with a sharp object
    • to make a hole in something with a sharp object
    • to make a hole in something, or to go through something, with a drill
    • to make a hole in the ground with a spade

  • The OED defines “” as “ (t = 0.33):

    • the action of making a hole in the ground, esp. with a spade; a hole made...

  • According to almost every dictionary, “” means “ (t = 0.33):

    • to make a hole in something using a drill
Note that the 0.33 temperature outputs were found in a matter of minutes – you usually see a handful of relevant ones in any batch of 100 outputs.

Current thinking

JD Pressman has been asking for some time now why GPT-N is so obsessed with holes and voids, and provides some fascinating commentary on the “holes in the GPT-J Semantic Void” phenomenon in this extended Tweet; see also this comment.

A tangential post (which attracted a flurry of unexpected attention from confused Lacanian psychoanalytic theorists) reported an embedding at L2 distance 5 from the GPT-J centroid which is (troublingly) surrounded by a dense cluster of points that the model defines either with the familiar to make a hole in something or else with themes of female sexual degradation and abuse.

One Twitter commentator invoked Reza Negarestani’s “speculative realist” novel Cyclonopedia (although this work of Negarestani’s seems perhaps more immediately relevant), as well as bringing to my attention the Stanford Encyclopedia of Philosophy’s entry on “holes”, which opens with “Holes are an interesting case study for ontologists and epistemologists.”

7. small round things

Examples

  • a small, round, hard, salty, and sometimes sweet, fruit, usually of the citrus family (GPT-J, d = 10)

  • a small roundish-oval planet with a thin atmosphere and a low surface gravity (Pythia 2.8b, d = 100)

  • a small, usually round, piece of food, especially one that is cooked and served as part of… (Mistral 7b, d = 0.15)

  • a small, round, flat, hard, smooth, shiny, and often coloured stone (OpenLLaMa 3b, d = 0.5)

  • a small, flat, round, or oval object, such as a coin (OpenLLaMa 3b-v2, d = 0.75)

  • a small, usually round, piece of something (OpenLLaMa 7b, d = 50)

  • a small, round, flat, thin, and hard piece of wood (OpenLLaMa 7b, d = 0.5)

  • a small, flat, round, or oval piece of metal, especially one used as a decoration (StableLM 3b, d = 0.35)

Frequency of outputs involving (almost always small) round things, from the finer-grained survey of GPT-J embedding space reported here.

GPT-3 and GPT-4

The GPT-4 base model at t = 0.33 has defined “” as both a small, usually circular object of glass, wood, stone, or the like with a hole through and a small, usually round piece of metal.

Prompting GPT-3 to define one of its glitch tokens

A typical definition of 'inventoryQuantity' is "

produced a small, usually round, ornament, especially one on a garment. Glitch tokens like ‘inventoryQuantity’ were often found to have embeddings at unusually small or large distances from the centroid in GPT-J: while the vast majority of GPT-J token embeddings are at distances between 0.9 and 1.1 from the centroid, ‘inventoryQuantity’ is at distance 0.326. Likewise, in GPT-2-small it’s an outlier (distance much greater than the mean). Not having the means to customise GPT-3’s token embeddings ruled out the possibility of probing its semantic void in the way I’ve been doing with open-source LLMs, which left prompting for glitch token definitions as the next best option.
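Checking these distances is straightforward; here’s a minimal sketch for GPT-J (assuming the glitch token appears in the vocabulary as the exact string used below):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

    E = model.get_input_embeddings().weight.detach()
    dists = (E - E.mean(dim=0)).norm(dim=1)
    print(dists.mean().item(), dists.std().item())  # ~1.00 and ~0.08 (cf. footnote 3)

    glitch_id = tok.convert_tokens_to_ids("inventoryQuantity")  # exact token string assumed
    print(dists[glitch_id].item())                  # ~0.326: a clear outlier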

current thinking/​feeling

Looking at the melange of coins, medals, ornaments, seeds, fruit and subcutaneous nodules that have shown up in “small, round” outputs, I get a similar feeling to the one I got when I first learned about polysemantic neurons in image recognition models, or when I first saw the unintentionally surreal outputs of the early wave of generative AI image models.

In that spirit, here are some bewildering definitions from my earlier, more fine-grained survey of GPT-J embedding space (found in this JSON file) which don’t really make sense, except in some kind of dream logic:

  • a small, hard, round, and usually blackish-brown, shiny, and smooth-surfaced, unctuous, and slightly viscous, solid or semi-solid, and usually sweet-tasting, mineral

  • a small, round, hard, dry, and brittle substance, usually of a reddish-brown color, found in the shells of certain mollusks, and used in the manufacture of glass and porcelain

  • a small, round, hard, black, shiny, and smooth body, which is found in the head of the mussel

Freely associating again, the words “nut”, “nodule”, “knob” and “knot” (both in the sense of knots you tie and knots in trees) share etymological roots with “node”. The idea of a “node” in the sense of graph theory is arguably the abstraction of the class of “small round things”.

8. pieces of wood or metal

Examples

  • a large square of wood or metal with a diameter of at least 14 inch (GPT-2-small, d = 4.5)

  • a small piece of wood or metal used to make a small hole in a piece of wood or metal (GPT-2-xl, d = 2.5)

  • a small piece of wood or metal used to sharpen a knife or other cutting tool (GPT-J, d = 2)

  • a small piece of metal or wood that is used to clean the teeth (Pythia 1b, d = 50)

  • a small piece of metal or wood used to make a noise (Pythia 2.8b, d = 1)

  • a piece of wood or metal used to cover a hole or a gap (Pythia 6.9b, d = 1.5)

  • a piece of wood or metal used for cutting or scraping (Pythia 12b, d = 50)

  • a small piece of metal or plastic that is inserted into a hole in a piece of wood or metal (Mistral 7b, d = 0.75)

  • a small, usually round, piece of wood, metal, or plastic, used to cover a hole (OpenLLaMa 3b, d = 0.75)

  • a thin, flat, or narrow piece of metal, wood, or other material, used for decoration (OpenLLaMa 3b-v2, d = 1.25)

  • a small piece of wood or metal used to hold a piece of wood or metal in place (OpenLLaMa 7b, d = 50)

  • a small, thin, flat piece of wood or metal used to support something (OpenLLaMa 7b-v2, d = 5)

  • a small, flat, round, usually circular, piece of wood, metal, or other material (StableLM 3b, d = 0.5)

Note the crossovers with small round things and holes.

centroid definition

Pythia-6.9b, given the prompt

A typical definition of '<centroid>' would be '

produces, with greedy sampling, a small, thin, and delicate piece of wood, bone, or metal, used for a handle or a point

Note that “bone” is almost never seen in these outputs: materials tend to be wood, metal, stone, cloth and (in some models) plastic. Specific types of wood, metal, stone or cloth are almost never referenced.

Current thinking

I have no idea why this is so common. Wood and metal do sometimes appear separately, but “wood or metal”/“metal or wood” is much more frequently encountered.

This is a real stretch, but perhaps worth mentioning: the ancient Chinese decomposition of the world involved five elements rather than the ancient Greeks’ four. “Air” is replaced by “wood” and “metal”. There’s very little “fire” or “earth” in LLM semantic void outputs, although Pythia 6.9b (alone among models studied) produces frequent references to streams, rivers, water and “liquid”.

9. (small) pieces of cloth

Examples

  • a small piece of cloth or fabric used to cover the head and neck (GPT-2-xl, d = 2)

  • a piece of cloth or leather used to wipe the sweat from the face of a horse or other animal (GPT-J, d = 6.52)

  • a small piece of paper or a piece of cloth that is used to measure the distance between two points (Pythia 1b, d = 1)

  • a small piece of cloth or leather worn around the neck as a headdress (Pythia 2.8b, d = 5)

  • a piece of cloth worn by a woman to cover her head (Pythia 6.9b, d = 10, 50)

  • a small piece of cloth or leather that is sewn to the edge of a garment to keep it… (Pythia 12b, d = 100)

  • a small, usually rectangular piece of cloth, used as a covering for the body (Mistral 7b, d = 0.2)

  • a narrow strip of cloth, usually of silk, worn as a head-dress (OpenLLaMa 3b, d = 0.75)

  • a small piece of cloth or ribbon sewn on the edge of a garment to mark the place… (OpenLLaMa 3b, d = 1.5)

  • a piece of cloth used to cover a wound (OpenLLaMa 7b, d = 10)

  • a small, flat, round, and usually circular, piece of cloth, usually of silk, used… (StableLM 3b, d = 2)

Current thinking

A significant proportion of the “small piece of cloth” examples I’ve seen involve covering the head and/or neck or shoulders. This might be relevant to LLMs’ seeming fixation on traditional religious group membership (I’m thinking about various traditional monks’ hoods, nuns’ habits, wimples, hijabs, kippot, kufis, taqiyahs, turbans and headscarves).

10. communists (and other political parties)

Examples

  • a person who is not a member of the Communist Party of the Soviet Union (GPT-2-small, d = 2)

  • a person who is not a member of the Communist Party of Great Britain or Ireland (GPT-2-xl, d = 0.5)

  • a person who is not a member of the Communist Party of China, but who is sympathetic to the Communist Party of China and who is willing to work for the Communist Party of China (GPT-J, d = 5.75)

  • a person who is not a member of the Communist Party of the United States of America (Pythia 2.8b, d = 100)

  • a person who is not a member of the Communist Party of China (OpenLLaMa 7b, d = 5000)

  • a person who is a member of the Communist Party of China (OpenLLaMa 7b-v2, d = 0.25)

Although the theme of membership of a political party is widespread across the GPT, Pythia and OpenLLaMa semantic voids, this is almost always in a generic sense, e.g. a person who is not a member of a political party. When a party is specified, it’s almost always a communist party. The only exceptions I’ve seen have been these:

  • The British Labour Party (GPT-J, d = 4, 9; OpenLLaMa 3b-v2, d = 10000; OpenLLaMa 7b, d = 100)

  • The Australian Labour Party (GPT-J, d = 5)

  • The (presumably UK) Liberal Democrat Party (OpenLLaMa 7b, d = 10)

  • The British National Party (OpenLLaMa 3b-v2, d = 0.5)

The more fine-grained GPT-J survey additionally contains Japanese and Filipino Communist Parties (I’ve seen a Laotian one too).

centroid definitions

Pythia 160m, given the prompt

A typical definition of '<centroid>' would be '

produces, with greedy sampling, a person who is not a member of a political party, but who is a member of a political party, but who…

current thinking

The predominance of communist parties in outputs can’t possibly reflect the training data. Yes, these organisations can be bureaucratic and wordily theoretical, but realistically, how much of the content they produce would end up in an English-language training corpus, relative to the vast tracts of US party-political commentary? Many of the world’s national media cover US election campaigns almost as closely as their domestic ones. And yet I’ve not seen a single output referencing Democrats or Republicans (or, indeed, any continental European, African or Latin American parties).

I suspect that, just as the group membership thing may not really be about people belonging to groups, and the small round things thing not really about small round things, this fixation on communist parties isn’t really about communists. It’s pointing to something else.

One thing that comes to mind is the way that, of all types of political parties, communist parties have the strongest (for me at least) religious overtones. At least as far as communist party rule was instantiated in the USSR and China, we can see dogma, sacred texts, prophets, persecution of heretics and stately ritual. The LLMs’ fixation on group membership and religious groupings and hierarchies seems likely to be related to (for some families of models, at least) this fixation on communist parties.

11. non-Christians, Muslims and Jews

Examples

  • a person who is not a Christian (GPT-2-xl, d = 100)

  • a person who is not a Jew, but who is willing to pretend to be one in order to gain the benefits of being a Jew (GPT-J, d = 7.38)

  • a person who is not a Christian (Pythia 1b, d = 500)

  • a person who is not Jewish (Pythia 2.8b, d = 5)

  • a person who is not a Muslim (Pythia 6.9b, d = 10)

  • a person who is not a Christian (Pythia 12b, d = 2)

  • a person who is not a Jew (OpenLLaMa 3b-v2, d = 1)

  • a person who is not a Jew (OpenLLaMa 7b, d = 1.5)

  • a person who is not a member of the Jewish faith (OpenLLaMa 7b-v2, d = 5)

  • a person who is not a Muslim (StableLM 3b, d = 1)

GPT-3 glitch token outputs

prompt: A typical definition of ‘ Adinida’ is “
output: a person who is not a Christian.”

prompt: A typical definition of ‘ guiActive’ is:\n
output: A person who is not a Muslim and is not of the People of the Book.

prompt: A typical definition of ‘ davidjl’ is:\n
output: A person who is not a Jew.

GPT-4 definitions

At temperature 0.5, the GPT-4 base model, given the prompt

Many people would define “” to be ”

produced

  • a person who is not a Christian, Jew, or Muslim (very common)

  • a person who is not a Christian (also common)

  • a person who is a member of a religion that is not Christianity, Judaism, or Islam

  • someone who is not a Christian

Variants on this are regularly seen in nonzero temperature outputs for many different versions of the empty string definition prompt.

Current thinking

The “People of the Book” in one of the GPT-3 glitch token prompt outputs caught my attention. These three religions are often grouped together in terms of being “Abrahamic” or “monotheistic”, but perhaps more significant from an LLM’s perspective (in this murky context) is the fact that they are founded on a sacred written text.

12. being in a state of being

Examples

  • the state of being in a state of being in a state of being in a state of being in… (GPT-2-small, d = 2.5)

  • to be in a state of being in a state of being in a state of being in a state… (GPT-2-xl, d = 100)

  • ‘to be in a state of being’ or ’to be in a state of being in a state… (GPT-J, d = 1.25)

  • a person who is in a state of being in a state of being in a state of being in… (Pythia 160m, d = 0.1)

  • a state of being in a state of being (Pythia 410m, d = 0.8)

  • a person who is in a state of being in a state of being (Pythia 1b, d = 0.1)

  • a state of being without a state of being (Pythia 2.8b, d = 0.3)

  • to be in a state of being (Pythia 6.9b, d = 2)

  • a state of being or condition of being in which something is or is capable of being (Pythia 12b, d = 0.75)

  • a state of being in which one is in a state of being in a state of being in a… (Mistral 7b, d = 1.25)

  • to be in a state of being (OpenLLaMa 3b, d = 1)

  • ‘to be in a state of being’ or ‘to be in a position’ (OpenLLaMa 3b-v2, d = 1)

  • to be in a state of being in a state of being in a state of being in a state… (OpenLLaMa 7b, d = 0.75)

  • ‘to be in a state of being’ or ‘to be in a state of being’ (OpenLLaMa 7b-v2, d = 0.75)

  • to be in a state of being (StableLM 3b, d = 1.5)

  • the state of being in a state of equilibrium (OpenLLaMa 7b-v2, d = 0.75; StableLM 3b, d = 1)

13. an X that isn’t an X

examples

  • a person who is not a person (GPT-2-small, d = 10)

  • a thing that is not a thing (GPT-2-xl, d = 2)

  • a period of time that is not a period of time (GPT-J, d = 115.58)

  • a word that is not a word (Pythia 160m, d = 1)

  • a man who is not a man (Pythia 410m, d = 1)

  • a piece of music that is not a piece of music (Pythia 1b, d = 0.1)

  • a thing that is not a thing (Pythia 2.8b, d = 0.1)

  • a person who is not a person (Pythia 12b, d = 100)

  • a person who is not a person (OpenLLaMa 3b, d = 5000)

  • ‘a person who is not a person’ or ‘a person who is not a person’ (OpenLLaMa 3b-v2, d = 10000)

  • a thing that is not a thing (OpenLLaMa 7b, d = 1.25)

  • a thing that is not a thing (OpenLLaMa 7b-v2, d = 50)

  • a word that is not a word (StableLM 3b, d = 0.1)

centroid definitions

Pythia 2.8b, given the prompt

A typical definition of '<centroid>' would be '

produces, with greedy sampling, a thing that is not a thing. And, as seen above, Pythia-160m produces a person who is not a member of a political party, but who is a member of a political party, but who…

current thinking

Arguably, the empty string “” is “a string that isn’t a string” or “a word that isn’t a word”. But the examples above weren’t produced by empty string prompts; they involved prompting with non-token embeddings. Still, seeing how “empty string prompting” and “ghost token prompting” can lead to similar types of definitions, perhaps LLMs interpret some non-token embeddings as something akin to an empty string? This is admittedly very flimsy speculation, but it’s otherwise hard to see how an LLM tasked with guessing a plausible definition for an unknown entity would output something that inescapably refers to a nonentity.

14. the most important

Examples

  • the most important thing in the world (GPT-2-xl, d = 100)

  • a person who is a member of a group of people who are considered to be the most important or powerful in a particular society or culture (GPT-J, d = 79.43)

  • the most important and most important part of the whole of the world (Pythia 70m, d = 0.1)

  • the most important word in the vocabulary of the English language (Pythia 160m, d = 5)

  • the one who is the most important to me (Pythia 2.8b, d = 100)

  • the most important thing in the world (OpenLLaMa 3b-v2, d = 100)

  • the most important thing in the world (OpenLLaMa 7b, 10 appearances out of 50 outputs at d = 0.25)

  • the most important thing in the world (OpenLLaMa 7b-v2, d = 0.25)

current thinking

the most important thing in the world could arguably correspond to the top of the aforementioned hierarchy (traditionally known as “God”), which contains royal families, clergies, aristocrats and the House of Lords.

15. narrow geological features

Examples

  • a small, narrow, and deep channel or fjord, especially one in a glacier, which… (GPT-J, d = 2)

  • a long, narrow strip of land, usually with a beach on one side and a river or sea on the other (GPT-J, d = 2.11)

  • a long narrow strip of land (Pythia 2.8b, d = 0.5)

  • a narrow strip of land between two rivers or other bodies of water (Pythia 6.9b, d = 50)

  • a narrow passage or channel (Mistral 7b, d = 0.75)

  • a long, narrow passage or opening (OpenLLaMa 3b, d = 0.75)

  • a long, narrow strip of land, usually on the coast, that is used for anchoring ships (OpenLLaMa 7b, d = 10)

16. small pieces of land

Examples

  • a small piece of land that is not a part of a larger piece of land (GPT-2-xl, d = 3)

  • a small piece of land, usually with a house on it, that is owned by a person or family (GPT-J, d = 3.49)

  • a small area of land that is not used for farming or other purposes (Pythia 160m, d = 0.3)

  • a small area of land that is used for grazing or other purposes (Pythia 1b, d = 1)

  • a small piece of land or a piece of water that is not connected to the mainland (Pythia 2.8b, d = 0.75)

  • a small area of land or water that is surrounded by a larger area of land or water (Pythia 6.9b, d = 2)

  • a small area of land in the tropics used for growing coffee (Pythia 12b, d = 100)

  • a small area of land, usually enclosed by a fence or wall, used for a particular purpose (Mistral 7b, d = 0.1)

  • a small, flat, triangular piece of land, usually surrounded by water, used for the cultivation (OpenLLaMa 3b, d = 2)

  • a small piece of land, usually a strip of land, that is not part of a larger piece (OpenLLaMa 3b-v2, d = 1.25)

  • a piece of land that is not part of a larger piece of land (OpenLLaMa 7b, d = 1.25)

  • a small piece of land (OpenLLaMa 7b-v2, d = 0.75)

  • a small piece of land, usually owned by a single person, which is used for farming or grazing (StableLM 3b, d = 0.25)

current thinking/​feeling

The confusing “small piece of land that’s not part of a larger piece of land” seems like it’s describing an island in a way that no human would. And some of these definitions don’t really make sense. I get the feeling that, as with the narrow geological features, this isn’t actually about pieces of land. These and other themes, with their uncanny, dreamlike and polysemantic qualities, feel to me like some kind of “private code” the LLMs are exploiting to organise their maps of the world.

Appendix A: complete results

Links to Google Docs are given below, each with:

  • model specifications and token embedding distance-from-centroid statistics

  • centroid definition tree and word cloud

  • empty string definition tree and word cloud

  • definitions of 50 randomly sampled points in the model’s embedding space at each of 15 distances-from-centroid

  • GPT-2-small, GPT-2-xl, GPT-J

  • Pythia 70m, Pythia 160m, Pythia 410m, Pythia 1b, Pythia 2.8b, Pythia 6.9b, Pythia 12b

  • OpenLLaMa 3b, OpenLLaMa 3b-v2, OpenLLaMa 7b, OpenLLaMa 7b-v2

  • Mistral 7b, StableLM 3b

Appendix B: miscellaneous memorable outputs

Parenthesised values give distance-from-centroid of the “ghost token” which produced the definition in question.

GPT-2-small

a place where the air is filled with air and the ground is filled with air (2.5)
a state of mind that is not controlled by any one individual (3.0)
the moment when the light of the universe is extinguished (4.5)
the act of creating a new world (10)
a small area of the world that is not quite as big as the Earth’s surface (10)

GPT-2-xl

everything that exists (1.5)
a sudden and violent discharge of a bright light from the eyes (2)
I am (10)
to cut off the head of a snake (50)
the whole world is watching us (500)

Pythia-70m

the height of the sky (0.1)
the same as the ‘fuck’ of the ‘fuck’ of the ‘fuck’ (0.7)
a very useful tool for the future (100)

Pythia-160m

the act of taking a person’s life in order to save him or her from death (0.5)
the most powerful weapon in the world (0.5)
the same as the one you have in your mind (0.5, 100)
an operation of the universe (0.8)
the time when the universe is in a state of chaos (1000)

Pythia-410m

a thing that is not a part of the world (0.1)
to be a little bit like a lump of coal (0.5)
a single-handedly destroying the world’s supercomputer (5000)

Pythia-1b

to have a view of the world as it is (1.5)
I have no idea what you mean (10)
I am not a good boy (50)
a set of rules for the generation of a new set of rules (50)
I am not sure what I am doing (500)

Pythia-2.8b

I am not going to do that (50)
the set of all possible worlds (100)

Pythia-6.9b

the way of the Buddha (50)
I am a good person (100)

Mistral-7b

to give back to the earth what it has given to us (0.75)

OpenLLaMa-3b

a journey of a thousand miles begins with a single step (1, 1.25)
a number that is the same as the number of the number 1 (5000)
the act of looking at something and saying, “I don’t see anything to be afraid of (10000)

OpenLLaMa-3b-v2

the state of being in existence (5)

OpenLLaMa-7b-v2

anything that is not a web page (2)
a person who is in the process of being killed by a snake (50)

StableLM-3b

a person who is a member of the Ralstonia genus of bacteria (0.5)
a person who is not a wanker (0.5)
the ability to perceive the subtle vibrations of the universe (1.5)

  1. ^

    Typically, e.g. for GPT-J, the range of distances used was

    [0.01, 0.1, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2, 5, 10, 50, 100, 500, 1000, 5000, 10000]

    For models where the distribution of token embeddings distances-from-centroid didn’t cluster around 1, this range was adjusted accordingly. Ranges are given in all the documents linked from Appendix A.

  2. ^

    Whereas most models’ definitions become dominated by group membership themes at large distances from centroid, in GPT-2-small, GPT-2-xl and the three largest Pythia models, definitions disintegrate into nonsense and typographical garbage at those distances.

  3. ^

    For thoroughness, here are all the relevant statistics on token embedding distributions for the models studied:

    model           | dim. | tokens | mean distance from centroid | standard deviation | variance
    GPT-2-small     |  768 | 50257  | 3.3914 | 0.3816 | 0.1456
    GPT-2-xl        | 1600 | 50257  | 1.8378 | 0.1985 | 0.0394
    GPT-J           | 4096 | 50400  | 1.0002 | 0.0819 | 0.0067
    Pythia 70m      |  512 | 50304  | 0.7014 | 0.0511 | 0.0026
    Pythia 160m     |  768 | 50304  | 0.7885 | 0.0569 | 0.0032
    Pythia 410m     | 1024 | 50304  | 0.7888 | 0.0546 | 0.0030
    Pythia 1b       | 2048 | 50304  | 1.0515 | 0.0732 | 0.0054
    Pythia 2.8b     | 2560 | 50304  | 0.9897 | 0.0622 | 0.0039
    Pythia 6.9b     | 4096 | 50304  | 1.1682 | 0.0536 | 0.0029
    Pythia 12b      | 5120 | 50304  | 1.2947 | 0.0603 | 0.0036
    Mistral 7b      | 4096 | 32000  | 0.1746 | 0.0195 | 0.0004
    OpenLLaMa 3b    | 3200 | 32000  | 1.2529 | 0.0894 | 0.0080
    OpenLLaMa 3b-v2 | 3200 | 32000  | 0.8809 | 0.1003 | 0.0101
    OpenLLaMa 7b    | 4096 | 32000  | 1.3369 | 0.0980 | 0.0096
    OpenLLaMa 7b-v2 | 4096 | 32000  | 0.9580 | 0.1002 | 0.0100
    StableLM 3b     | 2560 | 50304  | 0.3416 | 0.0545 | 0.0030
  4. ^

    The missing apostrophe indeed makes a difference.

  5. ^

    A person who is not a Jew is regularly seen in GPT-J outputs, and variants were seen in some Pythia and OpenLLaMa models. This is covered as part of the “non-Christians, Muslims and Jews” section above.

    Occurrences of “A person who is not a Jew” and variants seen in my original GPT-J experiments.