People often let language do their thinking for them. For example, consider the phrase “this has happened before”. A discussion of artificial intelligence leading to technological unemployment will devolve into “this has happened before” versus “this time is different”.
But “this has happened before” muddles together two different ideas that I call *restrep* and *maytip*. *Restrep* is what happens at the start of a second game of chess. The board is cleared and the pieces set out in the standard starting position. Reset and repeat. The second game may start with exactly the same opening as the first game. One guesses that the player who lost the first game will be the first to depart from exact repetition and try something new. By contrast *maytip* is the story of pack horses being displaced by canals being displaced by railways being displaced by trucking. Since there is no reset, things are accumulating and we may reach a tipping point eventually, even if this time is *not* different.
If we had better language, we would talk of technological unemployment in terms of *maytip* and whether this time is really different. We would not confuse the strong evidence that the past provides in *restrep* with the weak evidence that the past provides in *maytip*.
LLMs are even more deeply trapped in language than we are. Language is all they have. This is sad because we humans could really do with some help in escaping the language traps that we fall into. Sometimes we build those traps ourselves. Think of how the abortion debate turned into pro-choice versus pro-life when word creation became a new front in the culture war. Other times we use traditional phrases such as “defensive alliance”. Look at this attempt https://www.themotte.org/post/1043/splitting-defensive-alliance-into-chaining-alliance to split “defensive alliance” into “chaining alliance” and “isolating alliance”. The comments spot the relevance to 2025 and the current war in Ukraine, and pursue the object-level arguments. As humans we are quick to spot which political faction gains an advantage from existing language, and to resist any attempt to improve language so that we can be less wrong.
That leaves us with a dream about artificial intelligence: that it would have an inner voice, speaking a language of thought unrelated to the natural languages of humans. It would find our words ambiguous, and help us to go beyond them. It is currently a broken dream. LLMs understand the world through our existing vocabulary and suffer the limitations that those words bring.
You may already be aware, but part of Eliezer’s Sequences (his post Disputing Definitions) covers how, by disambiguating homonyms,[1] we can talk about our differing (or not) intuitions about what exists in objective reality, rather than getting stuck in disagreements about definitions.
But I haven’t seen reason to believe that AI has this problem particularly badly, in that I think its ‘vector space’ is decent enough at understanding which sense of the ‘same’ word is being used given the surrounding context of other words. Or at least it would readily parse the ambiguous senses if asked, as opposed to a biased human who might entrench and “resist any attempt to improve language,” as you mentioned.
Or I’d also add cases of the equivocation fallacy: “a word or phrase with multiple meanings is used ambiguously in an argument, shifting its meaning between premises or between a premise and the conclusion”.
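To make the “vector space” point above more concrete, here is a minimal sketch, assuming the Hugging Face transformers and torch packages; the model choice (bert-base-uncased), the example sentences, and the expected similarity ordering are my own illustrative assumptions, not anything measured or quoted from this thread. The idea is simply that a contextual model assigns the “same” word different vectors in different sentences, which is what lets it track which sense is in play.

```python
# Minimal sketch (assumptions: transformers + torch installed, bert-base-uncased
# as an arbitrary model). Compares contextual embeddings of "sound" across
# sentences that use different senses of the word.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector for the first occurrence of `word` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_dim)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (enc["input_ids"][0] == word_id).nonzero()[0, 0]
    return hidden[position]

wave = embed_word("The falling tree produced a loud sound wave.", "sound")
heard = embed_word("Nobody was nearby, so nobody heard any sound.", "sound")
valid = embed_word("Her argument was perfectly sound and well reasoned.", "sound")

cos = torch.nn.functional.cosine_similarity
# If the model separates senses by context, one would expect the two acoustic
# uses to score higher than the acoustic/"valid reasoning" pair (an expectation,
# not a measured result).
print("wave vs heard:", cos(wave, heard, dim=0).item())
print("wave vs valid:", cos(wave, valid, dim=0).item())
```

Of course, distinguishing senses that the corpus already marks is an easier task than coining a new word for a distinction nobody has drawn yet; that gap is a separate question.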
Eliezer’s essay “Disputing Definitions” is didactic writing, but one can also read it as a lament. He even uses the word “mournful”. He ends his essay, much as I started my comment, by making up two new words, intending to head off what he calls the “Standard Dispute”. His version is tongue-in-cheek: his words are “alberzle” and “bargulum”, and there is a time machine.
His essay is excellent, but how does an essay from 2008 need updating for the LLM era? He is lamenting both what is in the training data and what is missing from it. People dispute definitions. They fail to invent new words to head off these disputes.
My claim is that many of our “Standard Disputes” have their origins in linguistic poverty. Enrich language with new words, targeting the ambiguities that we quarrel over, and the problems are solved at source. But turn to an LLM for help and it will help you to write fashionable prose. Since neologisms have never been in fashion (and are subject to mockery; see https://xkcd.com/483/), the LLM will not suggest any. Rather, it will guide you down the path of the “Standard Dispute”, leading you away from the low-hanging fruit.
For a whimsical speculation, imagine that the New York Times publishes a list of one hundred new words to enrich political discussion. Inventing new words becomes all the rage. In 2026 human authors who want to join in the craze will have to invent their own. In 2027 the linguistic patterns involved will be in the training data. In 2028 egglogs (egregious neologisms) are the hallmark of AI slop. In 2029 neologisms are banned and by 2030 we are back to disputing definitions, just like we did in 2025.
I agree with the main post. My narrow point on neologisms is all that I have to add.
I see your point—at first I was thinking:

> I don’t think AI would have trouble differentiating between the senses of “sound” (using Eliezer’s essay as an example)
But actually it seems like you’re saying:
> Suppose we live in a world before people recognized a distinction between sound (audio waves) and sound (aural sensation). In this world, AI trained on the corpus of human text would not spontaneously generate this distinction (one, it doesn’t have the knowledge, and two, it’s dissuaded from even conjecturing it, because neologisms are taboo). But we don’t even need to ‘suppose’ this world exists—we do actually live in it now; it just applies to concepts more nuanced than “sound”.
I think neologisms are interesting because, on the one hand, it is annoying to see terms “astroturfed” (e.g., sonder)[1] or to see an insane mismatch between a term’s sound and its meaning (e.g., “grok”, which people use to mean “to profoundly understand”, yet which sounds more like a clunky word for a caveman’s lack of understanding. Its “etymology” is quite fitting (it’s supposed to be unrelatable),[2] but it’s a shame the term caught on).
On the other hand, I think much of the pursuit of knowledge is building towards finer and finer distinctions in our experience of reality. This necessitates new words.
For whatever reason, some morphologies seem more tasteful than others, such as ‘common extensions’ (e.g., ChatGPT → ChatGPTism), ‘combining neoclassical compounds’ (e.g., xeno- + -cide = xenocide, from Ender’s Game), or even just ‘adding standard-word qualifiers’ (e.g., your example of splitting “defensive alliance” into “chaining alliance” and “isolating alliance”). I think most of the people who find success in coining terms probably do it in these more intuitive ways, rather than with purely ‘random’ morphologies. Here is an excerpt from Nabeel Qureshi’s post, Reflections on Palantir:
> One of my favorite insights from Tyler Cowen’s book ‘Talent’ is that the most talented people tend to develop their own vocabularies and memes, and these serve as entry points to a whole intellectual world constructed by that person. Tyler himself is of course a great example of this. Any MR reader can name 10+ Tylerisms instantly - ‘model this’, ‘context is that which is scarce’, ‘solve for the equilibrium’, ‘the great stagnation’ are all examples. You can find others who are great at this. Thiel is one. Elon is another (“multiplanetary species”, “preserving the light of consciousness”, etc. are all memes). Trump, Yudkowsky, gwern, SSC, Paul Graham, all of them regularly coin memes. It turns out that this is a good proxy for impact.
[1] From the Dictionary of Obscure Sorrows, whose whole project is to coin new terms for phenomena which don’t yet have names.

[2] Wikipedia: Robert A. Heinlein originally coined the term grok in his 1961 novel Stranger in a Strange Land as a Martian word that could not be defined in Earthling terms, but can be associated with various literal meanings such as “water”, “to drink”, “to relate”, “life”, or “to live”, and had a much more profound figurative meaning that is hard for terrestrial culture to understand because of its assumption of a singular reality.