Charlatan Labyrinth
“Calm, dog”, Khan tries.
“OK, senpai” I beg. Copper Ra satellites for zenith, my sandals sauna on emerald rubbish in the barracks.
“Traffic me alcohol and the syrup jar here, ninja”. I stubbornly tote ginger tea and chocolate, Khan’s a punk.
Bizarre: Myths don’t rattle in this hip ghetto — I dig it.
I twitchily hassle; “The assassin at the canal, you clocked?”
“Pow pow out the slum. Barged in, massaged the racket, mopped up, you grok? Boomeranged chop-chop. Fun caliber, righted me an average migraine. No person but me the shogun, the zombifieds, and the assassin; he fake kowtowed to the sultan — to Laniakea blings. Ogle there!”
I dodge to bother: there he is. “Your admiral, in person‽”. I’m flummoxed. He traffics the coach zig-zag and gets in the compound.
The tattooed admiral, crashing the sofa: “I hustled the cocaine from the saboteur.”
Khan yanks the coffer of narcotic alabaster saffron. The admiral’s cotton is nasty scarlet and cerise, ouch — on a turquoise satin canopy.
Khan: “Yours?”
“No.”
“You’re a goon.”
“No, a candy shaman” admiral rumbles stubbornly. The elixir jitters out of sapphire spheres, we absinthe.
“No taboos at this corroboree. The narc, is he, um, “amen”?”
“Yes.”
“Ok” Khan scratches. “Tabbed to me? Shenanigans?”
“No cops. … My sabbatical, my cash? My chili squaw will squeeze the flimsy bikini, but that’s OK. I’ll syrup-daddy” he yaps.
“Cheugy, soynerd. OK”—Khan yeets the cash to the sofa. “Don’t amok in the ghetto, don’t list macabre hash, don’t flop, and we are wicked hip. No skulduggery. Jive her, fuck her, marry her, hallelujah.”
“Ok, no shenanigans in the slum. Chào.”
Khan’s admiral traffics the silver cannon gizmo to me, ruffles out.
I hazard the sofa—I’m ketchuped, bothered. Pump soda when Betelgeuse capoeiras. “Goofy bloke” I bounce. “He gets to cottage and barbecue?”
A dzogchen Khan chats: “Not with that ease… he’s the narc. No cottage, no barbecue, no pyramid, just a mummy in a canal by monsoon. I’ll bag his kawaii sheila.”
I’m petrified. What a coyote, this bastard. He squints.
“My horde has to have fit asabiyyah. You yabber to the cops, you beg to satan and Yahweh. That’s the algebra. I’m a sigma chad, I’m the sulfur phoenix, I boom.”
No fanfare, no shouting. Ditzily: “Scram. Curry me some, baizuo.”
I taped this gibberish in the bungalow. I’m the narc, the saboteur: mundane, embryonic—he doesn’t ping.
My pink nape bothers, my bloke avocados itch. I’ll sumō the shogun at ramadan. Ivory will triumph.
Explanation of the constraint, with detailed comments
The constraint for this piece of writing for me was to not use any words of Proto-Indo-European origin, except the top fifty English words according to this list. Conjugating & combining allowed words was permitted.
1. Uncertain:
1. Recent: “OK” (1839), “punk” (1678), “dog” (14th to 16th century), “clock” (1370), “fake” (1775), “ogle” (17th century), “flummoxed” (1837), “zig-zag” (1712), “yank” (1822), “ouch” (1838), “goon” (1580), “tab” (1607), “shenanigans” (1850s), “squeeze” (1600), “nerd” (1951), “Chad” (7th century), “ditzily” (1800s), “gibberish” (mid 16th century)
2. Older: “calm”, “try”, “beg”, “rubbish”, “stubborn”, “rattle”, “twitch”, “hassle”, “racket”, “fun”, “bother”, “crash”, “hustle”, “nasty”, “scratch”, “cop”, “flimsy”, “daddy”, “yap”, “macabre”, “flop”, “wicked”, “skulduggery”, “fuck”, “marry”, “silver”, “gizmo”, “ruffle”, “capoeira”, “goofy”, “bloke”, “bounce”, “ease”, “bastard”, “squint”, “fit”, “sulfur”, “shouting”, “scram”, “tape”, “ping”, “pink”, “nape”, “itch”
3. From Greek: “sandal”, “sphere”, “embryonic”
4. From Italian: “ghetto”
2. Turkic: “Khan”, “saboteur”, “turquoise”, “horde”
3. Japonic: “senpai”, “ninja”, “soy”, “kawaii”, “sumō”
4. Afro-Asiatic:
1. Egyptian: “Ra”, “barge”, “migraine”, “alabaster”, “pyramid”, “phoenix”, “ivory”
2. Semitic: “copper”, “emerald”, “mop”, “coffer”, “sapphire”, “mummy”
1. Arabic: “zenith”, “traffic”, “alcohol”, “syrup”, “jar”, “assassin”, “massage”, “caliber”, “average”, “sultan”, “admiral”, “sofa”, “saffron”, “cotton”, “scarlet”, “elixir”, “hash”, “hazard”, “soda”, “Betelgeuse”, “monsoon”, “sheila”, “asabiyyah”, “algebra”, “ramadan”
2. Hebrew: “amen”, “hallelujah”, “sabbatical”, “satan”, “Yahweh”
5. Tyrsenian:
1. Etruscan: “satellite”, “person”, “mundane”
6. Uralic: “cottage”
1. Finnic: “sauna”
2. Hungarian: “coach”
7. Dravidian: “bungalow”, “candy”
1. Tamil: “ginger”, “cash”, “curry”
8. Niger-Congo: “tote”
1. Atlantic-Congo
1. Bantu: “zombified”
1. Wolof: “hip”, “dig it”, “jive”
9. Sino-Tibetan:
1. Chinese: “tea”, “shōgun”, “satin”, “chào”, “baizuo”
1. Cantonese: “chop-chop”, “kowtowed”
2. Hokkien: “ketchuped”
2. Tibetan: “dzogchen”
10. Uto-Aztecan:
1. Nahuatl: “chocolate”, “chili”, “avocado”
2. Nahuan: “coyote”
11. Basque: “bizarre”
12. Sumerian: “canal”, “cannon” (both have the same root in “𒄀”, very neat)
13. Pama-Nyungan:
1. Dharug: “boomerang”, “corroboree”
2. Woiwurrung: “yabber”
14. Austronesian:
1. Hawaiian: “Laniakea”
2. Malay: “compound”, “amok”
3. Samoan: “tattooed”
4. Tongan: “taboo”
5. Marshallese: “bikini”
15. Yuman:
1. Quechan: “cocaine”
16. Tungusic:
1. Evenki: “shaman”
17. Algic:
1. Massachusett: “squaw”
18. Arawakan:
1. Taíno: “barbecue”
19. Substrate:
1. Pre-Greek: “narcotic”, “cerise”, “canopy”, “absinthe”, “narc”, “petrified”, “triumph”
2. Other: “myth”, “barrack”
20. Onomatopoetic: “pow pow”, “rumble”, “jitter”, “pump”, “chat”, “sigma”, “boom”, “fanfare”
21. De novo: “slum”, “grok” (1961), “bling”, “cheugy” (2013), “yeet” (2008)
A bit of a cop-out, since I’m assuming the Etruscan etymology and not tracing it back through “fulgāną” (though note that that’s also not traced back to PIE) or “𐎧𐏁𐏂𐎶” to “*tek-”. I could also use “shiver” here, which is uncertain. But I like Etruscan more. ↩︎
Seems unclear. I originally thought this was Uralic from Hungarian, but I was mistaken. Either from a substrate language (!) from “bara” (though possibly from “*bʰeh₂-”) through “barrum” or from “*bʰerH-” through “*barra”. I’ll let it slide, I think, but it’s also an edge-case. Otherwise I could use “bungalow” a second time. ↩︎
I don’t buy the “trans-” + “friare” explanation, and find “تَفْرِيق” more plausible. But ymmv, could be a violation of my constraints. ↩︎
Could be from “*(s)tewp-” via “stubbaz”, but that’s more of a hypothesis. I count it as uncertain. ↩︎
Another one where I’m playing it fast & loose. Seems disputed, either from Proto-German “*tut(t)-” (but without further history) or (more fun) from a Bantu language. ↩︎
Yes, it went through Prakrit but is ultimately Dravidian with “𑀇𑀜𑁆𑀘𑀺𑀯𑁂𑀭𑁆”! LLMs often get tripped up by this. ↩︎
This is in the top 50 words by frequency as “right”. ↩︎
Needful to say, I use the origin from “عَوِرَ” instead of “avere”. ↩︎
The Egyptian origin. I know we could also trace it back to “*sēmi” from “*ḱr̥h₂-(e)s-n-”, but let’s not. ↩︎
Yeah I know this one’s reaching pretty far. Wiktionary gives 17th century as an origin, but then proceeds to provide an etymology from “*h₃ekʷ-” through “*augijan”. There’s really no good word for looking that’s not IE though. Alternatively I considered “peep” but that’s just directly from “piken”. I could use “capoeira there” or “kung fu there” as “turn there”. Maybe after an edit. ↩︎
Wiktionary only goes back to Middle Dutch “hutsen”, but there’s no entry for the Dutch word on the Wiktionary page (only for the unrelated Basque “hutsen”, the superlative of “huts”, “zero”/“empty”). A quick web search doesn’t give me anything more, so I’ll count it as “unknown”. ↩︎
I’ll take the Ottoman Turkish origin from “چاپمق”, not the Persian “چپت” (since Persian is often PIE). ↩︎
I wish I could more confidently link this to the extinct Hurro-Urartian languages, but I’ll be a good boy and stay with Semitic languages. Would be awesome though. ↩︎
This one is so disputed (“obscure origin” via Wiktionary) that I’ll say it’s uncertain. Possibly I’m wrong and it’s from “*ken-” through “*hnaskuz”, in which case I could also use “icky”, which isn’t great either, or “wacky”, which is uncertain or onomatopoetic. ↩︎
I’ll take the substrate origin through “κερασός”. ↩︎
This one’s fun! I like the Proto-Dravidian reconstruction from “*kaṇṭu”, and Wiktionary on “खण्ड” says “An internally-derived word, likely of non-Indo-European origin but no convincing Dravidian or Munda sources. […] Part of the Indo-Aryan “defective” group of words”, which “do not have clear Indo-European etymology. They are characterized by showing a wide variety of alternative forms, perhaps indicating substrate origin or taboo deformation”. Very cool! I’ll count it as Dravidian. ↩︎
Wiktionary doesn’t connect it with “*gred-” and instead says “of uncertain origin”. Surprising, but I’ll take it. ↩︎
I’ll use the Tamil etymology from “காசு” instead of the Latin one from “*kap-” via “capiō”. Alternatively I could use “shekels”. ↩︎
I’d guess that “dad” is so common that I could just say “nuh uh it’s actually the Elamite honorific for “dear father”, deal with it” or whatever. But Wiktionary also says “dad” is “of uncertain ultimate origin”, so I win. ↩︎
My arch-nemesis: I have the speculation this is actually from “jeter” via French from New Orleans into AAVE or via “iettare” through some unknown-to-me route. But our etymology goes only back to 2014 (or 2008 if we count the Urban Dictionary entry as related), and none of the originators are from New Orleans as far as I can tell, so I get to use the word. ↩︎
Here again the etymology only goes back to Middle English, and ends there. Alternatively I could use “botch”. ↩︎
I take the Wolof etymology because I can. ↩︎
The Big Mystery. Alternatively one could use “boink” if we believe the PIE origin from “*pewǵ-”. ↩︎
This one would be harder to replace if we believe the origin from “*méryos”. The best I can do is “harem” which is very stilted so I’m happy “marry” is disputed/unclear. ↩︎
Got you! There’s a Chinese version of “ciao” with the same meaning! ↩︎
Wiktionary doesn’t give an etymology beyond “probably ultimately imitative”. ↩︎
The Wiktionary page for “goof” is so full of “perhaps”s and “possibly”s I’ll take this one as uncertain. “Kosher” could be a fallback. ↩︎
Okay this is very annoying. I was gonna use “zen” here and feel quite clever, but that’s ultimately a loanword from “dhyana” from Sanskrit “ध्यै”, which, sure, “origin uncertain” but it’s almost guaranteed to have some PIE root. Blech. ↩︎
The etymology just… ends at “asquint”? But doesn’t seem related to *(s)kewh₁- since “squint” goes back to words for slant/slope/angle. ↩︎
Another fun one! Comes from the name for Bengal, which either traces back to “वङ्ग”, for which wiktionary doesn’t offer an etymology (except linking back to “بنگال”, creating a cycle in the etymology), could also be Proto-Dravidian “वातिङ्गण” or even Tibetan “བནས”. It veers dangerously close to PIE through Sanskrit but seems ultimately non-IE. ↩︎
There’s a PIE etymology from “*mh₂nd-” and an Etruscan one from “𐌌𐌖𐌈”. I assume the Etruscan one. ↩︎
Unclear or from Hebrew “עֻבָּר”, I’ll assume unclear from Greek, quoth Wiktionary: “None of them are particularly convincing.” ↩︎
I’ll assume it’s onomatopoetic. If you disagree, imagine I’d re-used “grok” here. ↩︎
LLMs have a really hard time writing under this constraint. I tried with Gemini 3 Pro and Opus 4.6; the results were noticeably worse and extremely boring. When asked to figure out the constraint, they think it is much weaker than it actually is. The author Opus 4.6 suspects is Douglas Hofstadter.
Also notice the (unplanned!) wordcount, I’d think it was kismet.
I’m confused about what’s going on here – is this LLM output? (if so it should be labeled as such)
Also confused why there are no line breaks.
Nope, not LLM output, all mine, though it’s not a great sign that it can’t be distinguished :-/ The text is a constrained writing exercise, in which I fulfill the constraint described in the collapsible section above (which left me with ≤0.5% of English vocabulary, hence the obscurity of many words). Think writing without the letter ‘e’/classic oulipo. The collapsible section contains a fairly detailed fully linked/footnoted version that discusses each word choice, and some places where I wanted to use a word but couldn’t.
LLMs can’t write this kind of text. Believe me, I’ve tried, they immediately fall on their noses and use a Greek or Persian or Hindi-originated word. I think it’s a really hard task which LLMs aren’t ready for yet.
And I’m seeing line breaks, in the block-quoted text, are you not? Ah, wait, I get it, there are no paragraph breaks, but there are ’\n’s in the text, because I thought it wasn’t worth splitting 450 words into paragraphs, and wanted to evoke a sentiment of reading a page from a novel, the dialogue is split by the newlines/per character.
Maybe all the footnotes together with every word linked to its Wiktionary entry is confusing. The collapsible section contains an annotated version of the full post, as a stand-in for spoilers because I didn’t want to have a hover spoiler because that gets annoying with long texts.
The numbered list at the end is not super important, it associates the words used to their respective language families.
I was a bit unsure whether to post this here, but given that Gwern’s Tilakkhana and October the First is Too Late were posted here too, I thought this was in the same genre, at about the same level of obscurity.
Both of those benefited from intermingled annotations/scaffolding to keep the LLMs on track, as notes. “October” puts them in comments/collapses, and “Tilakkhana” has them in comments, following the “scansion” pseudo-code.
Given the difficulty of this constraint (where do you even get a list of PIE-valid words...?), I would not adopt a constrained-sampling approach (which used to be the standard approach to such text games but doesn’t play well with any kind of planning/inner-monologue), but rather a ‘databank’ approach, closer to how I did “trajectoid words”: write down the list of PIE-valid words and frequency-valid words, and then define a format where every word in the story has to be annotated with its ‘type’ (eg ‘c’ for ‘common’ and ‘p’ for ‘PIE’), and a permissible root word if it is not in the databank. This helps reduce the problem of valid writing to a very ‘local’ problem with a cheap self-attention check back to the databank. It also makes it easier for a reasoning model to scan over a final draft to double-check validity.
So something like
... in [c] the [c] barracks [p:bara].
You could also iteratively add valid words to the databank to save compute: add ‘barrack’, ‘barracks’, and ‘barracked’ to the PIE databank, and future LLM runs can just write
... in [c] the [c] barracks [p].
(If you have few enough PIE words to work with, you could ask the LLM to try to generate up front all their valid variations.) This seems important given that your footnotes indicate to me that a lot of your etymologies are too debated to expect an LLM to deliver satisfactory results to you; you are going to have to lay down by fiat what are or are not valid words/roots… In fact, given the extreme difficulty you are having in writing even a coherent sentence, you’d probably want to include a sentence databank to store all the reasonably interesting valid sentences generated. (I wouldn’t necessarily bother with paragraph or higher, given how much difficulty you’re having at the word and sentence level.) I do this a lot with poetry, like with the last poem I wrote (for Valentine’s Day), I included this:
(Even when you don’t get any new ones you want to keep, it’s interesting for giving you an idea how the model ‘thinks’. I tried GLM-5 the other day, and I could see from its curation that it had terrible taste, which lined up with the garbage final outputs. I’ll be sticking with Kimi K2.5 Thinking as my current outside option for now...)
After you have built up enough puzzle pieces, it should be easier for the LLM to assemble them in a bunch of ways, check the fit, and then pick the best out of 20 or 100 or whatever.
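As a rough illustration of the ‘databank’ check described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the names `COMMON`, `DATABANK`, `ROOTS`, `check`, and `promote` are hypothetical, the databank entries are placeholders rather than a vetted word list, and the tag format simply follows the `[c]` / `[p]` / `[p:root]` examples in this comment.

```python
import re

# Hypothetical databanks: placeholder entries, not a vetted etymology list.
COMMON = {"in", "the"}    # frequency-valid words, tagged [c]
DATABANK = {"barracks"}   # words already accepted by fiat, tagged [p]
ROOTS = {"bara"}          # accepted roots, tagged [p:root]

# One annotated token: a word, whitespace, then [c], [p] or [p:root].
TOKEN = re.compile(r"([^\s\[\]]+)\s+\[(c|p)(?::([^\]\s]+))?\]")
PUNCT = ".,;:!?\"'“”‘’…"


def check(annotated: str) -> list[tuple[str, str]]:
    """Return (word, reason) for every token that fails the local databank check."""
    problems = []
    for word, tag, root in TOKEN.findall(annotated):
        key = word.strip(PUNCT).lower()
        if tag == "c":
            if key not in COMMON:
                problems.append((key, "not in common list"))
        elif root:
            if root not in ROOTS:
                problems.append((key, "unknown root"))
        elif key not in DATABANK:
            problems.append((key, "not in databank"))
    # Anything left after removing annotated tokens is a word without a tag.
    leftover = TOKEN.sub(" ", annotated)
    problems += [(w.lower(), "unannotated") for w in re.findall(r"\w+", leftover)]
    return problems


def promote(annotated: str) -> None:
    """Add words that passed with an explicit root, so later runs can write plain [p]."""
    for word, tag, root in TOKEN.findall(annotated):
        if tag == "p" and root in ROOTS:
            DATABANK.add(word.strip(PUNCT).lower())
```

With these placeholder banks, `check("... in [c] the [c] barracks [p:bara].")` returns an empty list, while an untagged or unknown word comes back with a reason attached. A final-draft pass is then a single scan over the story, and picking the best of many candidate sentences could start by discarding every candidate for which `check` returns anything.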
Mm, without paragraph breaks this looked just sorta broken and confusing. (I don’t know what you mean by “wanted to evoke a sentiment of reading a page from a novel” since novels generally have paragraph breaks)
I asked about LLMs because I wasn’t sure if your “LLMs have a really hard time writing under this constraint” quote was more like “LLMs have a hard time and I effortfully got them to do it” or “LLMs can’t, and I can” (but I wasn’t sure why the comparison was being made)
I have no objection to the exercise getting posted on LW, it was just confusing.
Hm, thanks for the feedback. Not sure how to change it: if I bunch the sentences into paragraphs it probably becomes less readable, and if I give each sentence its own paragraph it becomes a bit disjointed. Let me think about it.
huh, not sure why paragraphs feel disjointed to you, feels like totally normal dialogue-heavy writing to me. made a draft with breaks here: https://www.lesswrong.com/editPost?postId=dRzsobGjtH6AyBkYp&key=a93a852f88e272373f6f8958137cc5
Fair enough, your version looks good, I’ll edit this main one to conform—done.