What’s going on? LLMs and IS-A sentences

This is a cross-post from New Savanna.

For the moment I have decided that Waddington’s classic diagram of the epigenetic landscape is a useful way of thinking about what happens when an LLM responds to a prompt. Here’s the diagram:

The language model corresponds to the landscape. The prompt serves to position that ball at a certain place in the landscape – perhaps we can think of that ball as the prompt. The ball then rolls down the valley, going left and right as appropriate. It never reverses direction and goes up the hill. That path, or trajectory if you will, is the LLM’s response to the prompt.

Moreover, I have decided to think of the generation of each word (yes, I know, technically it spits out tokens, not words) as a single primitive operation. That is to say, it has no internal logical structure, no ANDs or ORs. It’s simply one (gigantic) calculation over roughly 175 billion values (in the case of ChatGPT). The generation of each word presents the system with a choice among alternatives, but that’s the only kind of choice involved in calculating the response to a prompt – though for qualification and elaboration, see ChatGPT tells stories, and a note about reverse engineering: A Working Paper, Version 3, pp. 3-6.
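To make that picture concrete, here is a minimal sketch in Python. It is not how ChatGPT works internally: the score table is invented for illustration, it looks only at the previous word rather than the whole context, and it always takes the most probable next word, whereas real systems usually sample. The names TOY_SCORES, next_word_distribution, and roll_downhill are hypothetical. What it does show is the shape of the idea: each step is one calculation followed by one choice among alternatives, and the path never backs up.

```python
import math

# Hypothetical toy "landscape": for each current word, made-up scores for
# possible next words. A real LLM computes such scores from the whole
# context using billions of parameters; this table is purely illustrative.
TOY_SCORES = {
    "Garfield": {"is": 3.0, "eats": 1.0},
    "is": {"a": 2.5, "lazy": 1.0},
    "a": {"cat": 3.0, "dog": 0.5},
    "cat": {".": 2.0},
    "eats": {"lasagna": 2.0},
    "lazy": {".": 1.0},
    "dog": {".": 1.0},
    "lasagna": {".": 1.0},
}


def next_word_distribution(word):
    """One 'primitive operation': turn scores into probabilities (softmax)."""
    scores = TOY_SCORES[word]
    total = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / total for w, s in scores.items()}


def roll_downhill(prompt_word, max_steps=10):
    """Follow the most probable next word at each step, never backtracking."""
    path = [prompt_word]
    while path[-1] in TOY_SCORES and len(path) <= max_steps:
        dist = next_word_distribution(path[-1])
        # The only kind of choice involved: pick among the alternatives.
        path.append(max(dist, key=dist.get))
    return path


print(" ".join(roll_downhill("Garfield")))  # prints: Garfield is a cat .
```

The prompt word places the ball; the loop is the roll. Nothing inside next_word_distribution branches on logic about cats or classes; it is one arithmetic pass over the scores.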

That brings me to something I’ve been puzzled about for years. We find it natural to say things like, Garfield is a cat. Now, express the same thought, but reverse the order of cat and Garfield in your sentence. It’s difficult to do. Oh, you can do it, but the resulting sentence is awkward and unnatural, something like, Cats are the kind of thing of which Garfield is a particular instance. No one would ever speak like that, nor write it either.

What’s the source of that asymmetry? As far as I can tell, we don’t know, but I take it as a clue about the mechanisms of language. The purpose of this note is to suggest that my crude model of LLM calculation would provide an answer: The linguistic landscape is structured so that the ball easily rolls from Garfield to cat, or cat to mammal, Snoopy to beagle, Tesla to EV, C. elegans to worm, etc. One might, of course, ask why the landscape is arranged in that way, but that’s a different question, no?
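One way to picture that one-directional rolling, offered as an analogy and not as a claim about how an LLM actually stores anything, is a lookup table that only has links from a thing to its class. The table below uses the pairs mentioned in this post; the names IS_A, roll_up, and instances_of are hypothetical. Going from instance to class is a direct step. Going from class to instance has no stored link, so it takes a detour through the whole table.

```python
# Hypothetical one-way table: each entry points from a thing to its class.
# The pairs are taken from the examples in this post.
IS_A = {
    "Garfield": "cat",
    "cat": "mammal",
    "Snoopy": "beagle",
    "beagle": "dog",
    "dog": "beast",
    "Tesla": "EV",
    "C. elegans": "worm",
}


def roll_up(term):
    """Follow IS-A links from a term toward ever more general classes."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def instances_of(cls):
    """The reverse direction: no direct link exists, so scan every entry."""
    return sorted(thing for thing, parent in IS_A.items() if parent == cls)


print(roll_up("Snoopy"))
print(instances_of("cat"))
```

The asymmetry here is built in by hand, so the sketch explains nothing by itself; it only shows what it would mean for a structure to make one direction cheap and the other roundabout.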

Here are some notes I made about IS-A sentences.

Notes on IS-A Sentences

Somewhere in his Problems in General Linguistics, my copy of which is, alas, in storage, Emile Benveniste has a chapter, “The Nominal Sentence,” on sentences hanging on the auxiliary “to be.” As Benveniste was a linguist of the Old School, when being a linguist meant familiarity with many languages, including (and this is important for this particular topic) classical Greek, the chapter had examples from many languages, making it tough sledding for a monoglot like me.

While the content of this post certainly arises out of my thinking about that chapter, in the absence of actually having the text in front of me, I hesitate to assert a stronger relationship than that. I note only that, for Benveniste, the auxiliary “to be” was fraught with metaphysical significance. For the concept of being derives from “to be.” Where would philosophy be without Being? Thus, when Benveniste pondered such sentences, he wasn’t merely commenting on language. He was doing philosophy, or, if not quite that, camping out on philosophy’s doorstep.

I’m interested in such sentences because I believe they are a DEEP CLUE about how the mind works. I just don’t know what to make of the clue.

So, I’m interested in word order in assertions such as the following:

(1) Fido is a beagle.
(2) Beagles are dogs.
(3) Dogs are beasts.

They all move from an element in a class (whether an individual, Fido, or another class, beagles) to a class containing it. None of them move in the opposite direction. Consider what happens when you try to go the opposite way. In the following sentence the class is mentioned first, then its member:

(4) Beagle is the kind of animal of which Fido is an instance.

In particular, note that (4) has a metalingual character that (1) does not. That is, (4) explicitly asserts that we are dealing with classification. One can do that metalingual job in various ways, but, as far as I can tell, one can’t avoid it. That is, one cannot construct a proper English sentence relating a genus and a species in which the genus is mentioned first without ‘looping through’ some kind of metalingual construction on the way from genus to species.


What does this asymmetry tell us about the underlying mechanisms? Why don’t we have sentences such as:

(5) Beagle za di Fido.

In this case “za di” is the inverse of “is a”. English has no such sentences and no such inverse.

So, how widespread is this asymmetry and is there any explanation of this directionality?

I sent a query on that matter to a listserv, I forget which one, and got two replies that add some complexity to the matter. Rich Rhodes, Linguistics at UC Berkeley, tells me that in Ojibwe the word order is reversed, the class comes before the individual, but the asymmetry remains. He then offers a comment, which he qualifies as a quick guess:

My guess is that there is no compelling discourse function (like information flow) which makes it desirable to invert classificational equatives. Hence we only get the “unmarked” order. Subject-predicate in theme-rheme languages (like English) and predicate-subject in rheme-theme languages (like Ojibwe).

So, what’s the nature of the mechanism that determines the “unmarked” order? That’s what I want to know.

Lee Pearcy, Episcopal Academy in Merion, Pa., offered these examples:

(6) The beagle is Fido.
(7) The dogs are beagles.
(8) The beasts are dogs.

As stand-alone sentences, they seem a bit awkward to me. But they fare better as answers to questions, e.g.:

What’s that dog?
Which dog? The beagle is Fido and the terrier is Max.

What’re those animals?
The dogs are beagles, the cats are Persians.

In those contexts, the matter of class or classification is raised by the question, thus making it present in the discourse and so available as a point of attachment in the answer.

Further clues, anyone?

Do I believe this?

I don’t believe it, or disbelieve it. It’s a working hypothesis. One I think is worth investigating. It places relatively simple and severe constraints on our conception of what LLMs are doing. That, it seems to me, is a good thing. Should it turn out that those constraints are valid, well then, we’ve learned something, no? If they’re not valid, we’ve also learned something.

More later.