On the naturalistic study of the linguistic behavior of artificial intelligence

This is draft material from the introduction of a working paper I’m about to finish: Discursive Competence in ChatGPT, Part 1: Talking with Dragons.

* * * * *

When I started playing with ChatGPT on December 1, 2022, I had no specific intentions. I wanted to poke around, see what I could see, and then... As I said, I had no specific intentions. I certainly did not intend to spend hours interacting with it to produce a Microsoft Word document currently (1.1.23) containing 59,197 words of transcripts – the vast majority from ChatGPT – on 173 pages.

One of the earliest things I did with ChatGPT – not THE first, it was my third session, on December 1, 2022 – was to dialog about Steven Spielberg’s Jaws and the ideas of Rene Girard. I took that and wrote it up for 3 Quarks Daily.[1] Then I had some fun with “Kubla Khan,” quizzed it about trumpets, had a long session about Gojira/Godzilla, and then returned to Spielberg, this time to A.I. Artificial Intelligence. By this time I was developing a feel for how ChatGPT responded. Both the Jaws and the A.I. posts are included in this paper.

I became a bit more systematic, looking for things, testing them out. That led to a post with a rather baroque title, Of pumpkins, the Falcon Heavy, and Groucho Marx: High level discourse structure in ChatGPT,[2] which I’ve also included in this paper. In that post I advanced the argument that there are parameters in the language model that govern higher level discourse structures independently of the specific words and strings that realize them.

The alternation pattern is something like this:
A, B, A, B....
That can be repeated as often as one will. The text in the A sections is always drawn from one body of material while the text in the B sections is drawn from a different body of material. That’s the pattern ChatGPT has learned. Where is it in the net? How’s it encoded?
The frame structure is a bit more complicated:
A (B, C, B, C....) A’
The embedded alternation draws on two bodies of material, any two bodies. The second part of the frame, A’, must complement the first, A.
Again, it’s not a complex structure. But it’s not defined directly over particular words. It’s defined over groups of words, placing the groups, not the individual words, into specified relationships in the discourse string.
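To make the two patterns concrete, here is a minimal Python sketch of them as templates over interchangeable bodies of material. It only illustrates the shape of the patterns; it makes no claim about how ChatGPT encodes them. The function names and the placeholder segments ("pumpkin-1", "rocket-1", and so on) are hypothetical, invented for this illustration.

```python
from itertools import cycle


def alternation(body_a, body_b, n_segments):
    """Return n_segments segments in the order A, B, A, B, ...

    Each slot is filled from its own body of material; the pattern
    itself does not care which words those bodies contain.
    """
    sources = (cycle(body_a), cycle(body_b))
    return [next(sources[i % 2]) for i in range(n_segments)]


def frame(opening, closing, body_b, body_c, n_inner):
    """Return A (B, C, B, C, ...) A': an alternation wrapped in a frame.

    The caller is responsible for supplying a closing segment (A')
    that complements the opening segment (A).
    """
    return [opening, *alternation(body_b, body_c, n_inner), closing]


# Hypothetical placeholder material; any two bodies would do.
pumpkins = ["pumpkin-1", "pumpkin-2"]
rockets = ["rocket-1", "rocket-2"]

print(alternation(pumpkins, rockets, 4))
print(frame("frame-open", "frame-close", pumpkins, rockets, 4))
```

Swapping in different bodies of material changes every word of the output and leaves the pattern untouched, which is the sense in which the structure is defined over groups of words and not over the words themselves.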

I then suggested that the patterns I had identified in Jaws and A.I. were similar, but, if anything, more complex.

I had become all but convinced that ChatGPT had explicit control over high-level aspects of its discourse. When humans make statements like those, we take it as obvious that they have some “grammar” of high-level discourse structures. Narratologists, linguists, and psycholinguists study them. But ChatGPT is not a human. It is, shall we say, a machine, a machine that was trained to guess the next word, word after word after word... and so forth, for jillions of texts. All that’s in there is statistics about those texts. It’s a “stochastic parrot”, as one well-known paper argued.

Perhaps, in a sense, that is true. But it is also terribly reductive and, I have come to believe, all but beside the point. Large language models issue one word at a time for the same reason that humans do: That’s the nature of the communication channel, and tells us relatively little about the device that is pushing words through the channel. LLMs develop rich and complicated structures of parameter weights during the training process. Yes, those structures are statistical in nature, but they are also structures. Perhaps there are aspects of those structures that we can investigate without having to “open the hood” and examine parameter weights.

I made that suggestion in a post, Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?, also included in this paper.[3] Chomsky famously distinguished between competence and performance, where the study of linguistic performance is about the mechanism that produces and understands texts while the study of linguistic competence is about the structure of the texts independent of underlying mechanisms. When I analyze ChatGPT’s output, as I have been doing for the past month, I am investigating its competence. When researchers pop the hood and examine parameter weights, they are investigating performance mechanisms. I further suggest that a better understanding of an LLM’s competence will aid in studying those performance mechanisms by giving us clues about what they are doing.

Nor am I the only one who believes that. Others have come to that conclusion as well, though perhaps not quite in those terms. Here is the abstract of a recent preprint from Marcel Binz and Eric Schulz from the Max Planck Institute:

We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3’s decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3’s behavior is impressive: it solves vignette-based tasks similarly or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multi-armed bandit task, and shows signatures of model-based reinforcement learning. Yet we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. These results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.[4]

My methods are different, but my purpose is the same: “to study increasingly capable and opaque artificial agents” and thus to render them less opaque. The insights we gain thereby will help us improve the capabilities of the next generation of artificial agents.

* * * * *

[1] Conversing with ChatGPT about Jaws, Mimetic Desire, and Sacrifice, 3 Quarks Daily, December 5, 2022, https://3quarksdaily.com/3quarksdaily/2022/12/conversing-with-chatgpt-about-jaws-mimetic-desire-and-sacrifice.html.

[2] Of pumpkins, the Falcon Heavy, and Groucho Marx: High level discourse structure in ChatGPT, New Savanna, December 8, 2022, https://new-savanna.blogspot.com/2022/12/of-pumpkins-falcon-heavy-and-groucho.html.

[3] Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?, New Savanna, December 16, 2022, https://new-savanna.blogspot.com/2022/12/abstract-concepts-and-metalingual.html.

[4] Binz, Marcel, and Eric Schulz. 2022. “Using Cognitive Psychology to Understand GPT-3.” PsyArXiv. June 21. doi:10.31234/osf.io/6dfgk.

Cross-posted from New Savanna.