No, Newspeak Won’t Make You Stupid

George Orwell’s 1984 describes a totalitarian society that, among other initiatives to suppress the population, implements “Newspeak”, a heavily simplified version of the English language designed with the stated intent of limiting the citizens’ capacity to think for themselves; everybody knows that when you have a thinking people, keeping a peoplegroup still and not angry is unpossible.

In short, the ethos of Newspeak can be summarized as: “Minimize vocabulary to minimize range of thought and expression”. There’s no way such a simple idea could mean different things to different people, right? Well… there are two different but closely related ideas, both of which the book implies, that are worth separating here.

The first (which I think is to some extent reasonable) is that by removing from the language certain words that serve as effective handles for pro-democracy, pro-free-speech, pro-market concepts, the regime makes it harder to communicate and verbally think about such ideas… Although, if that were the only thing done by Orwell’s Oceania, it would work about as well as taking a sharp knife away from a toddler while still leaving a fully-loaded AK-47 on the ground next to them; people are adept at making themselves understood, even in the face of constraints on communication.

The second idea, which I worry is an incorrect takeaway people may get from 1984, is that by shortening the dictionary of vocabulary that people are encouraged to use (absent any particular bias towards removing handles for subversive ideas), one will reduce the intellectual capacity of people using that variant of the language. However, since that idea is false, that definitely, 100% clearly makes it perfectly okay for a government to force Newspeak on its people, and that totally wouldn’t be a creepy overstepping of its power (I know, Poe’s Law says that it is utterly impossible for me to be sarcastic on the internet without somebody thinking that I actually believe it).

Cadence of Information is Constant

If you listen to a native Mandarin Chinese speaker and then compare the sound of their speech to that of a native Hawaiian speaker, you will notice many apparent differences in the sound of the two languages. Mandarin has a rich phonological inventory containing 19 consonants, 5 vowels, and, quite famously, 4 different tones (pitch patterns) which are used for each syllable, for a total of approximately 5,400 possible syllables, including diphthongs and multi-syllabic vowels. On the other hand, the Hawaiians’ phonemes all fell off the catamaran during a catastrophic event at some point on the long trip from Eastasia to Hawaii, so they only have 8 consonants, 5 vowels, and no tones. Including diphthongs, there are 200 possible Hawaiian syllables.

One might naïvely expect that Mandarin speakers can communicate information more quickly than Hawaiian speakers, at a rate of 12.4 bits / syllable vs. 7.6 bits / syllable. However, this neglects the speed at which syllables are spoken. Imagine two fountains, one which emits a large stream of water slowly, the other which spews a thin ribbon of water flying at the speed of a Plaid Model S outracing a Ferrari. Even though the first fountain (Mandarin) has a much larger stream (bits / syllable), the two fountains output roughly the same amount of water, because the second fountain (Hawaiian) is so much faster (many syllables per second). For this reason, Hawaiian and Mandarin are much closer to each other in speed of communication than their phonologies would suggest. [1]
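For the curious, here is a minimal sketch of where the bits / syllable figures come from. It assumes every possible syllable is equally likely, which real speech certainly does not satisfy, so treat the result as an upper bound rather than a measurement. The function name and the dictionary are my own labels for illustration; the inventory sizes are the approximate counts quoted above.

```python
import math

# Approximate syllable inventory sizes quoted in the text.
SYLLABLE_COUNTS = {"Mandarin": 5400, "Hawaiian": 200}


def bits_per_syllable(inventory_size: int) -> float:
    """Information carried by one syllable, assuming all syllables are equally likely."""
    return math.log2(inventory_size)


for language, count in SYLLABLE_COUNTS.items():
    print(f"{language}: {bits_per_syllable(count):.1f} bits / syllable")
```

Rounded to one decimal place, this gives 12.4 for Mandarin and 7.6 for Hawaiian.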

This is because, in general, cadence of information is constant. Within a given language, within the same dialect, sure, you will see that in different contexts, a speaker may speak at a higher or lower tempo. But within any given context, within the same species and across different languages, we should expect that people will only be able to process and comprehend so many bits per second, and this rate of bits per second will be more or less the same no matter where you go. Therefore, you can increase how much you communicate with each word, or with each syllable, but you can only do so by reducing the speed at which you pronounce these words.

Bits per second will always be the same, so the number of words per minute will be inversely proportional to the number of bits communicated per word. You can’t communicate faster or communicate more information by making your words more nuanced, since your audience only has so much processing power, so the cadence of the information you can communicate is constant.
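To make the inverse proportionality concrete, here is a small sketch. The 60 bits / second ceiling and the bits-per-word values are made-up numbers chosen only for illustration, not measurements, and the function name is hypothetical.

```python
# Hypothetical listener bandwidth, chosen only for illustration.
ASSUMED_BITS_PER_SECOND = 60.0


def words_per_minute(bits_per_second: float, bits_per_word: float) -> float:
    """Speaking rate implied by a fixed information rate and a given word density."""
    return bits_per_second * 60 / bits_per_word


# Illustrative densities: sparse words, then progressively more nuanced ones.
for bits_per_word in (5.0, 10.0, 20.0):
    wpm = words_per_minute(ASSUMED_BITS_PER_SECOND, bits_per_word)
    print(f"{bits_per_word:4.1f} bits / word -> {wpm:5.0f} words / minute")
```

Each time the bits per word double, the words per minute halve, and the product of the two stays fixed at the assumed information rate.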

Back to 1984. If we were to take out our scissors and cut giant holes into the dictionary, so it became only 1/20th the size it is now (while steering clear of the thoughtpolice and any bias in removal of words), what should we expect to happen? One may naïvely think that, just as banning the words “democracy”, “freedom”, and “justice” would inhibit people’s ability to think about Enlightenment Values, banning most of the words should inhibit our ability to think about most of the things.

But that is not what I would expect to see happen. One should expect to see compound words take the place of deprecated words, speaking speeds increase, and, to accommodate the increased cadence of speech, tricky sequences of sounds be elided (blurred / simplified), allowing complex ideas to ultimately be communicated at a pace that rivals that of standard English.

Do We Need All These Words?

I recorded a version of the first section of Scott Alexander’s “Eight Short Studies On Excuses” which uses only the 1,000 most common English words, as counted by the Up-Goer Five Text Editor [2], and I read it at a cadence that provides the same information density as the original, in the same amount of time it took me to read the original. I was underwhelmed by how it came out, since limiting to 1,000 words doesn’t actually change how it sounds in most parts, with just a few turns of phrase sticking out as not being normal English. If I had tried a similar exercise with maybe 200 or 500 well-chosen words, I think that would have illustrated my point better.

But even so, this version of Yvain’s work illustrates an important point: It’s much easier for a non-anglophone to learn 1,000 words than to learn the entirety of the English vocabulary, and after having done so, they would be able to understand or produce an equivalent text. In practice, I don’t think we need more than 1,000 words to clearly communicate (we can probably even go much lower—Sona is an artificial language that has no more than 375 root words; while it never saw serious use, my experience studying it suggests that 375 radicals is sufficient to accomplish a full language, and I suspect even that’s not the limit).

So why do we have so many more words than we need? I think the answer comes down to signaling: Being able to use a wide variety of words demonstrates that one has learned all the words they use (plus, it also implies that they know many words of a similar difficulty level which they haven’t had a need to use), which signals that they have enough mental capacity to do so. Conversely, relying on a limited vocabulary, either in speaking or in the material one chooses to read / listen to, signals low intelligence (and I suspect from a historical perspective, also tags non-native / non-fluent speakers as not belonging to your tribe), so if someone wants to signal intelligence (something all humans are designed to do subconsciously), then they will gravitate towards using a rich cornucopia of words, and shy away from styles of speech that synergize with a limited lexicon. But while this explains why languages tend to have large vocabularies, it doesn’t mean that, from a language design standpoint, you actually want or need a large vocabulary to effectively communicate or think.

If Someone Wants to Force You to Speak Newspeak, Run Away

It should go without saying that we shouldn’t lobby for our government to shorten the vocabulary we’re allowed to use. While I maintain that nothing bad would happen as a direct result of restricting our vocabulary (setting aside the thoughtpolice…), let’s just say that if the government thinks implementing Newspeak is in the Overton window, then we’ve got much bigger problems on our hands.

Villiam responded to a previous version of this post saying: “How would you get stuff done if people won’t join you because you suck at signaling?”, which I wholeheartedly agree with. Oftentimes, it’s important to signal desirable qualities, and intentionally using a version of your language that is hard to learn can be an effective way to signal that you are a useful person to ally with or listen to. So, when you need to signal that you are smart, I won’t implore you to use simplified English. But perhaps there’s room for a forum where a sprach with a small lexicon is the norm, especially if the users have some other way of knowing that their fellow users are intelligent.

There are a few big problems facing the world today. The biggest of them, of course, is that AI will have almost certainly killed every single human by the end of this decade, a problem that unfortunately I have little idea how to effectively address right now. But below that, somewhere in the top 5 problems the world faces, is a very big problem: Even in this extraordinarily connected era, there are large cultural barriers between the populations of the major world powers, especially between the US, China, and Russia. Most Americans consume media that is originally written in English, most Chinese consume media that is originally written in Chinese, and most Russians consume media that is originally produced in Russian. If history is any guide (which it is), and if AI doesn’t kill us by then (which it probably will), then it’s reasonable to presume that this will indirectly lead to the death of a large portion of the world’s population. There’s a good chance that includes you and me, too.

It would be valuable to try to break down the cultural barriers that exist between the great powers of the world, especially among those citizens who are most predisposed to nationalism, and no forum that operates in a high-prestige variant of one of the great languages will ever be well-suited to that purpose. For this reason, I would like to see a forum with similar features to our beloved LessWrong, but which operates in a dialect or language with a small lexicon, something closer to the 375 radicals of Sona than the 1,000 words used in the Up-Goer Five Text Editor. If the goal is to attract people from a wide variety of cultures, and especially those who are most at risk of nationalistic thought, I feel that the language should not be a simplified version of a major language, but rather a neutral one, which would make Sona (or an improved version of Sona) an ideal lingua franca for such a site [3]. I think one way to counteract the lack of signaling afforded by vocabulary is to require something resembling an IQ test to be passed before being able to comment or post on the site (though anyone could view it without needing to take any test), and people who do especially well on it might get some special badge that makes their name stand out.

Setting all that aside, even more than any practical proposals, I think the most valuable thing you will get from having read this post is simply having the principles I laid out in the first parts of this post in the back of your mind, ready to match with some unanticipated stimulus sometime in the future, and get a correct answer as a result of having remembered what I wrote here.

Footnotes

[1] I found a figure of 5.18 syllables / second for Mandarin, and while I did not find a figure for Hawaiian, I found that Japanese, which has a similar, though slightly more complex, phonology to Hawaiian, is spoken at 7.84 syllables / second. This is from a secondary source, which unfortunately gave no reference other than “a study by researchers at the Université de Lyon”, without naming the authors or the paper. Multiplying each speaking rate by the corresponding bits / syllable figure, and assuming a similar cadence for Hawaiian as for Japanese, gives an estimate of 64.2 bits / second for Mandarin and 59.6 bits / second for Hawaiian. This is in line with the expectation set out in the main text, especially when you consider that Japanese, with its slightly more complex phonology, is presumably spoken more slowly than Hawaiian.
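The arithmetic in this footnote can be reproduced with the short sketch below. It assumes equally likely syllables, borrows the Japanese speaking rate as a stand-in for Hawaiian, and rounds bits / syllable to one decimal place before multiplying, since that is how the figures above were reached. The names in the code are my own.

```python
import math

# (syllable inventory size, syllables per second) as quoted in the post.
# The Hawaiian rate is the Japanese figure, used as a stand-in.
LANGUAGES = {
    "Mandarin": (5400, 5.18),
    "Hawaiian": (200, 7.84),
}


def bits_per_second(inventory_size: int, syllables_per_second: float) -> float:
    """Estimated information rate, with bits / syllable rounded to one decimal first."""
    bits_per_syllable = round(math.log2(inventory_size), 1)
    return bits_per_syllable * syllables_per_second


for language, (inventory, rate) in LANGUAGES.items():
    print(f"{language}: {bits_per_second(inventory, rate):.1f} bits / second")
```

Skipping the intermediate rounding nudges the Hawaiian estimate up to roughly 59.9 bits / second, which brings the two languages slightly closer together.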

[2] There are some compound words that I used that the editor doesn’t like, but they are easily derivable from words the editor accepts, so I considered them fair game.

[3] Esperanto is not well-suited to this. While Esperanto’s lexicon is smaller than that of most natural languages, it is still quite large, and requires a sustained, deliberate effort to learn. Esperanto is also too similar to certain languages it is based on, which makes it less culturally neutral than Sona.