Against John Searle, Gary Marcus, the Chinese Room thought experiment and its world

Preface:

I’m putting my existing work on AI on Less Wrong, and editing as I go, in preparation to publishing a collection of my works on AI in a free online volume. If this content interests you, you could always follow my Substack, it’s free and also under the name Philosophy Bear.

Anyway, enjoy. Comments are appreciated as I will be rewriting parts of the essays before I put them out:

The essay:

The title is a play on “Against the Airport and its World” and is in no way intended as a slight against either named author, both of whom I respect intellectually, and neither of whom I know enough about to evaluate as people.

The other day I gave an argument that the differences between whatever LaMDA is and true personhood may be more quantitative than qualitative. But there’s an old argument that no model based purely on processing text and outputting text can understand anything. If such models can’t understand the text they work with, then any claim they may have to personhood is at least tenuous; indeed, let us grant, at least provisionally, that it is scrapped altogether.

That argument is the Chinese Room Argument. Gary Marcus, for example, invokes it in his 2022 article “Google’s AI is not sentient. Not even slightly”- [Edit: or I should say, at least on my reading of Marcus’s article he alludes to the Chinese Room argument although some of my readers disagree].

To be clear, Marcus, unlike Searle, does not think that no AI could be sentient, but he does think, as far as I can tell, that a pure text-in, text-out model could not be sentient for Chinese Room-related reasons. Such models merely associate text with text; they are a “giant spreadsheet,” in his memorable phrase. One might say they have a purely syntactic, not semantic, character.

I will try to explain why I find the Chinese Room argument unconvincing, not just as proof that AI couldn’t be intelligent, but even as proof that a language model alone can’t be intelligent. Even though the arguments I go through here have already been hashed out by other, better philosophers, I want to revisit this issue and say something on it- even if it’s only a rehash of what other people have said- because the issue of what a model that works on a text-in-text-out basis can or cannot understand is very dear to my heart.

The Chinese Room argument, as summarised by Searle, goes:

“Imagine a native English speaker who knows no Chinese locked in a room full of boxes of Chinese symbols (a data base) together with a book of instructions for manipulating the symbols (the program). Imagine that people outside the room send in other Chinese symbols which, unknown to the person in the room, are questions in Chinese (the input). And imagine that by following the instructions in the program the man in the room is able to pass out Chinese symbols which are correct answers to the questions (the output). The program enables the person in the room to pass the Turing Test for understanding Chinese but he does not understand a word of Chinese.”

In the original thought experiment, the program effectively constituted a lookup table: “Output these words in response to these inputs.”

I’ve always thought that two replies, taken jointly, capture the essence of what is wrong with the argument.

The whole room reply: It is not the individual in the room who understands Chinese, but the room itself. This reply owes to many people, too numerous to list here.

The cognitive structure reply: The problem with the Chinese Room thought experiment is that it depends upon a lookup table. If the Chinese Room instead used some kind of internal model of how things relate to each other in the world in order to give its replies, it would understand Chinese, and, moreover, large swathes of the world. This reply, I believe, owes to David Braddon-Mitchell and Frank Jackson. Taken together, the two replies I’ve endorsed can be summarised as:

“The Chinese Room Operator does not understand Chinese. However, if a system with a model of interrelations of things in the world were used instead, the room as a whole, but not the operator, could be said to understand Chinese.”

There need be nothing mysterious about the modeling relationship I mention here. It’s just the same kind of modeling a computer does when it predicts the weather. Roughly speaking, I think X models Y if X contains parts that correspond to the parts of Y, and those parts stand in relationships with each other (especially the same or analogous causal relationships) that mirror the relationships in which the parts of Y stand. In addition, the inputs and outputs of the system must causally relate to the thing modeled in the appropriate way.
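This structure-preservation condition can be made concrete with a toy sketch. Everything here is invented for illustration: a made-up “world” of three objects standing in a relation, a “model” with three corresponding parts, and a check that the mapping between them carries world relations onto model relations.

```python
# A toy illustration of the isomorphism condition: the "world" has parts
# standing in a relation, the "model" has corresponding parts standing in
# an analogous relation, and a mapping between them preserves structure.
# The objects and relation are invented purely for illustration.

world_parts = {"sun", "earth", "moon"}
# "x gravitationally dominates y" in the toy world
world_relation = {("sun", "earth"), ("earth", "moon")}

model_parts = {"S", "E", "M"}
# the corresponding relation among the model's internal parts
model_relation = {("S", "E"), ("E", "M")}

mapping = {"sun": "S", "earth": "E", "moon": "M"}

def preserves_structure(mapping, world_rel, model_rel):
    """True iff the mapping carries the world relation exactly onto
    the model relation (i.e. it is relation-preserving)."""
    image = {(mapping[a], mapping[b]) for (a, b) in world_rel}
    return image == model_rel

print(preserves_structure(mapping, world_relation, model_relation))  # True

# A scrambled mapping fails: the parts no longer stand in the right
# relationships, so the structure is not preserved.
scrambled = {"sun": "M", "earth": "E", "moon": "S"}
print(preserves_structure(scrambled, world_relation, model_relation))  # False
```

The point is only that “X models Y” is a perfectly ordinary, checkable relationship between structures, not anything occult.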

It is certainly possible in principle for a language model to contain such world models. It also seems to me likely that actually existing language models can be said to contain these kinds of models implicitly, though very likely not at a sufficient level of sophistication to count as people. Think about how even a simple feed-forward, fully connected neural network could model many things through its weights and biases, and through the relationships between its inputs, outputs and the world.

Indeed, we know that these language models contain such world models at least to a degree. We have found nodes that correspond to variables like “positive sentiment” and “negative sentiment”. The modeling relationship doesn’t have to be so crude as “one node, one concept” to count, but in some cases, it is.

The memorisation response

Let me briefly deal with one reply Searle makes to the whole room argument: what if the operator of the Chinese Room memorized the books and applied them? She could now function outside the room as if she were in it, but surely she wouldn’t understand Chinese. Now, it might seem like I can dismiss this reply out of hand, because my reply to the Chinese Room includes a point about cognitive structure: a lookup table is not good enough. Nothing obliges me to say that if the operator memorized the lookup tables, she’d understand Chinese.

But this alone doesn’t defeat Searle’s counterargument, because it is possible that she calculates the answer with a model representing parts of the world, yet she (or at least her English-speaking half) does not understand these calculations. Imagine that instead of memorizing a lookup table, she had memorized a vast sequence of abstract relationships, perhaps represented by complex geometric shapes, which she moves around in her mind according to rules in an abstract environment to decide what she will say next in Chinese. Let’s say that the shapes in this model implicitly represent things in the real world, with relationships to each other that are isomorphic to relationships between real things, and appropriate relationships to inputs and outputs. Now Searle says: “Look, this operator still doesn’t understand Chinese, but by your account she has the right cognitive processes.”

But I have a reply: in this case, I’d say that she’s effectively been bifurcated into two people, one of whom doesn’t have semantic access to the meanings of what the other says. When she runs the program of interacting abstract shapes that tells her what to say in Chinese, she is bringing another person into being. This other person is separated from her, because it can’t interface with her mental processes in the right way. [This “the operator is bifurcated” response is not new; cf. many authors, such as Haugeland, who gives a more elegant and general version of it.]

Making the conclusion intuitive

Let me try to make this conclusion more intuitive through a digression.

It is not by the redness of red that you understand the apple; it is by the relationships between different aspects of your sensory experience. The best analogy here, perhaps, is music. Unless you have perfect pitch, you wouldn’t be able to distinguish between C4 and F4 if I played them on a piano for you (separated by a sufficient period of time). You might not even be able to distinguish between C4 and C5. What you can distinguish are the relationships between notes. You will most likely be able to instantly hear the difference between me playing C4 then C#4 and me playing C4 then D4 (the interval C4-C#4 will sound sinister because it is a minor second, while the interval C4-D4, a major second, will sound comparatively open; you will also hear that both are rising in pitch). Your understanding comes from the relationships between bits of your experience and other bits of your experience.

I think much of the prejudice against the Chinese room comes from the fact that it receives its input in text:

Consider this judgment by Gary Marcus on claims that LaMDA possesses a kind of sentience:

“Nonsense. Neither LaMDA nor any of its cousins (GPT-3) are remotely intelligent. All they do is match patterns, drawn from massive statistical databases of human language. The patterns might be cool, but language these systems utter doesn’t actually mean anything at all. And it sure as hell doesn’t mean that these systems are sentient. Which doesn’t mean that human beings can’t be taken in. In our book Rebooting AI, Ernie Davis and I called this human tendency to be suckered by The Gullibility Gap — a pernicious, modern version of pareidolia, the anthromorphic bias that allows humans to see Mother Theresa in an image of a cinnamon bun. Indeed, someone well-known at Google, Blake LeMoine, originally charged with studying how “safe” the system is, appears to have fallen in love with LaMDA, as if it were a family member or a colleague. (Newsflash: it’s not; it’s a spreadsheet for words.)”

But all we humans do is match patterns in sensory experience. True, we do so with inductive biases that help us understand the world by predisposing us to see it in certain ways, but LaMDA also contains inductive biases. The prejudice comes, in part, I think, from the fact that the patterns are in text, and not, say, pictures or sounds.

Now it’s important to remember that there really is nothing qualitatively different between a passage of text and an image, because each can encode the other. Consider this sentence: “The image is six hundred pixels by six hundred pixels. At point 1,1 there is red 116. At point 1,2 there is red 103…” and so on. Such a sentence conveys all the information in the image. Of course, there are quantitative reasons this won’t be feasible in many cases, but they are only quantitative.
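To make the point vivid, here is a small sketch that serializes a tiny “image” into exactly that kind of sentence and parses it back, losing nothing. The wording of the text format is invented to match the example above, not any real image standard.

```python
import re

# Round-trip a tiny "image" (a grid of red values) through plain text,
# showing a sentence can carry the same information as a pixel grid.
# The sentence format here is invented for illustration.

def image_to_text(pixels):
    h, w = len(pixels), len(pixels[0])
    parts = [f"The image is {w} pixels by {h} pixels."]
    for y, row in enumerate(pixels, start=1):
        for x, value in enumerate(row, start=1):
            parts.append(f"At point {x},{y} there is red {value}.")
    return " ".join(parts)

def text_to_image(text):
    w, h = map(int, re.search(r"is (\d+) pixels by (\d+) pixels", text).groups())
    pixels = [[0] * w for _ in range(h)]
    for x, y, v in re.findall(r"At point (\d+),(\d+) there is red (\d+)", text):
        pixels[int(y) - 1][int(x) - 1] = int(v)
    return pixels

original = [[116, 103], [42, 200]]
assert text_to_image(image_to_text(original)) == original  # nothing lost
```

The difference between the text and the image is one of representation and size, not of information content, which is the quantitative-not-qualitative point.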

I don’t see any reason in principle that you can’t build an excellent model of the world through relationships between text alone. As I wrote a long time ago [ed: in a previous essay in this anthology.]:

“In hindsight, it makes a certain sense that reams and reams of text alone can be used to build the capabilities needed to answer questions like these. A lot of people remind us that these programs are really just statistical analyses of the co-occurrence of words, however complex and glorified. However, we should not forget that the statistical relationships between words in a language are isomorphic to the relations between things in the world—that isomorphism is why language works. This is to say the patterns in language use mirror the patterns of how things are. Models are transitive—if x models y, and y models z, then x models z. The upshot of these facts is that if you have a really good statistical model of how words relate to each other, that model is also implicitly a model of the world, and so we shouldn’t be surprised that such a model grants a kind of “understanding” about how the world works.”
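The claim that word statistics mirror world structure can be illustrated with a deliberately tiny example. The corpus and the “world facts” below are invented for illustration; the point is only that even crude co-occurrence counts recover which things go together in the world the text describes.

```python
from collections import Counter
from itertools import combinations

# A toy corpus describing a tiny "world" in which fire is hot and ice
# is cold. The sentences are invented purely for illustration.
corpus = [
    "fire is hot",
    "the fire burned hot",
    "ice is cold",
    "the ice stayed cold",
    "fire melted the ice",
]

# Count how often each pair of words co-occurs within a sentence.
cooc = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for pair in combinations(sorted(words), 2):
        cooc[pair] += 1

# The statistics of the text mirror the structure of the world:
# "fire" pairs with "hot", "ice" pairs with "cold", and not vice versa.
assert cooc[("fire", "hot")] > cooc[("cold", "fire")]
assert cooc[("cold", "ice")] > cooc[("hot", "ice")]
```

Real language models extract vastly subtler statistics than this, but the direction of the inference is the same: regularities in text are regularities in the world, at one remove.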

Now that’s an oversimplification in some ways (what about false statements, deliberate or otherwise?), but in the main the point holds. Even in false narratives, things normally relate to each other in the same way they relate in the real world; generally, you’ll only start walking on the ceiling if that’s key to the story, for example. The relationships between things in the world are implicit in the relationships between words in text, especially over large corpora. Not only is it possible in principle for a language model to use these relationships, I think it’s very possible that, in practice, backpropagation could arrive at them. In fact, I find it hard to imagine the alternative, especially if you’re going to produce language that answers complex questions with answers that are more than superficially plausible.

Note: In this section, I have glossed over the theory-ladenness of perception and treated perception as if it were a series of discrete “sense data” that we relate statistically, but I don’t think it would create any problems for my argument to expand it to include a more realistic view of perception. This approach just makes exposition easier.

What about qualia?

I think another part of the force of the Chinese Room thought experiment comes from qualia. In this world of text associated with text in which the Chinese Room lives, where is the redness of red? I have two responses here.

The first is that I’m not convinced that being a person requires qualia. I think that if philosophical zombies are possible, they still count as persons, and have at least some claim to ethical consideration.

The second is that qualia are poorly understood. They essentially amount to the non-functional part of experience: the redness of red that would remain even if, as in the famous inverted spectrum argument, red and green were swapped in a way that made no difference to behavior. Currently, we have no real leads on solving the hard problem. Thus, who can say that there couldn’t be hypothetical language models that feel the wordiness of certain kinds of words? Maybe verbs feel sharp and adjectives feel soft. We haven’t got a theory of qualia that would rule this out. I’d urge interested readers to read more about functionalism, probably our best current theory in the philosophy of mind; I think it puts many of these problems in perspective.

Edit: An excellent study recently came to my attention showing that when GPT-2 is taught to play chess by receiving the moves of games (in text form) as input, it knows where the pieces are; that is to say, it contains a model of the board state at any given time (“Chess as a Testbed for Language Model State Tracking”, 2021). As the authors of that paper suggest, this is a toy case that gives us evidence that these word machines work by world modeling.
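The underlying point, that a stream of moves in text form fully determines the board state, can be sketched directly. This is not the paper’s method; it is a hand-written replayer using simplified coordinate moves (like “e2e4”) invented for illustration, ignoring special rules such as castling and en passant.

```python
# A stream of chess moves in text implicitly determines the board state:
# anything that predicts moves well has reason to track that state.
# Simplified coordinate notation ("e2e4"); castling, promotion and
# en passant are ignored for brevity.

def initial_board():
    """Standard starting position as a square -> piece dict."""
    board = {}
    back_rank = ["R", "N", "B", "Q", "K", "B", "N", "R"]
    for i, file in enumerate("abcdefgh"):
        board[file + "1"] = "w" + back_rank[i]   # white pieces
        board[file + "2"] = "wP"                 # white pawns
        board[file + "7"] = "bP"                 # black pawns
        board[file + "8"] = "b" + back_rank[i]   # black pieces
    return board

def apply_moves(move_text):
    """Replay space-separated coordinate moves like 'e2e4 e7e5'."""
    board = initial_board()
    for move in move_text.split():
        src, dst = move[:2], move[2:4]
        board[dst] = board.pop(src)  # captures simply overwrite
    return board

state = apply_moves("e2e4 e7e5 g1f3")
print(state["e4"], state["e5"], state["f3"])  # wP bP wN
```

Here the board state is reconstructed explicitly; the striking finding of the paper is that a language model trained only to continue the move text ends up representing something like this state implicitly.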