LLM cognition is probably not human-like

This post is a collection of thought experiments and observations, meant to distill and explain some claims about GPTs and LLMs made by me and others.

Stated in its most general form, the claim is: the ability of a system or process to predict something well does not imply that the underlying cognition used to make those predictions is similar in structure to the process being predicted.

Applied to GPTs and LLMs specifically: the fact that LLMs are trained to predict text, much of which was generated by humans, does not imply that the underlying cognition carried out by LLMs is similar to human cognition.

In particular, current LLMs are not a distillation of human minds or thoughts, and LLM cognition does not appear to be anthropomorphic. The fact that prediction errors and limitations of current LLMs often mirror human limitations and mimic human errors is not much evidence that the underlying cognition is similar.

Rather, the observed surface-level similarity in these errors is more likely due to current LLM capability at the text prediction task being similar to human-level performance in certain regimes. There are many ways to arrive at a wrong or imperfect answer, and these ways need not be similar to each other.[1]

The focus of this post is on the micro: a GPT predicting a distribution on the next token, or the next few tokens autoregressively. Not considered are agentic loops, specialized prompts, back-and-forth dialog, or chain-of-thought reasoning, all of which can be used to elicit better macro-level performance from LLM-based systems.
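To make the micro-level framing concrete, here is a minimal sketch of next-token prediction and a short autoregressive continuation, using GPT-2 via the Hugging Face transformers library as a small stand-in for the models discussed here (the model choice and helper names are mine, purely for illustration):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model; the observations in this post concern much larger LLMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def next_token_distribution(prompt: str, top_k: int = 5):
    """Return the model's top-k candidate next tokens with their probabilities."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # logits for the single next token
    probs = torch.softmax(logits, dim=-1)
    top = torch.topk(probs, top_k)
    return [(tokenizer.decode(int(i)), p.item()) for i, p in zip(top.indices, top.values)]

def greedy_continue(prompt: str, n_tokens: int = 5) -> str:
    """Append n_tokens one at a time, feeding each prediction back in (autoregression)."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(n_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]
        next_id = torch.argmax(logits).reshape(1, 1)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return tokenizer.decode(input_ids[0])

print(next_token_distribution(">>> a = 3\n>>> a\n"))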

On the particular micro-level task of next token prediction, current LLM performance is often similar to that of an unassisted human who is under time pressure or paying less than full attention. Other times, it is far below or far above this particular human baseline.

Background and related work

Some previous posts by others, which are helpful for context and background on this general topic:

  • Feature Selection, by Zack Davis. This is a great story for building an intuition for what the internal cognition of a machine learning model might look like.

  • How An Algorithm Feels From The Inside, by Eliezer Yudkowsky. A classic post; understanding it helps clarify the distinction I make in this post between cognition and the outputs of that cognition, and why that distinction is important.

  • Is GPT-N bounded by human capabilities? No. by Cleo Nardo. A post focused on the capability limits of GPTs. I think internalizing this point is helpful for understanding the “cognition” vs. “outputs of that cognition” distinction which I draw in this post.

  • GPTs are Predictors, not Imitators, by Eliezer. Like Cleo’s post, this post makes an important point that performance at text prediction is not bounded by or necessarily closely related to the cognitive power of systems which produced the text being predicted.

  • Simulators, by janus. A post exploring the macro-level behaviors that result from sufficiently long auto-regressive chains of prediction, agentic loops, and specialized prompting. Useful as a contrast, for understanding ways that LLM behavior can be decomposed and analyzed at different levels of abstraction.

  • Are there cognitive realms? and An anthropomorphic AI dilemma, by Tsvi. These posts are partially about exploring the question of whether it is even meaningful to talk about “different kinds of cognition”. Perhaps, in the limit of sufficiently powerful and accurate cognitive systems, all cognition converges on the same underlying structure.

Thought experiment 1: an alien-trained LLM

Consider a transformer-based LLM trained by aliens on the corpus of an alien civilization’s internet.

Would such a model have alien-like cognition? Or would it predict alien text using cognitive mechanisms similar to those that current LLMs use to predict human text? In other words, are alien GPTs more like aliens themselves, or more like human GPTs?

Perhaps there are no cognitive realms, and in fact human brain, alien brain, alien!GPT and human!GPT cognition are all very similar, in some important sense.

Human-alien translation

Would an LLM trained on both human and alien text be capable of translating between human and alien languages, without any text in the training set containing examples of such translations?

Translation between human languages is an emergent capability of current LLMs, but there are probably at least a few examples of translations between each pair of human languages in the training set.

Suppose that such an alien-human LLM were indeed capable of translating between human and alien language. Would the cognition used when learning and performing this translation look anything like the way that humans learn to translate between languages for which they have no training data?

Consider this demonstration by Daniel Everett, in which he learns to speak and translate a language he has never heard before by communicating with a speaker of that language, without any prior shared language:

During the lecture, Daniel and the Pirahã speaker appear to be performing kinds of cognition very different from what current LLMs do when learning and performing translation between human languages, one token at a time.

Thought experiment 2: a GPT trained on literal encodings of human thoughts

It is sometimes said that GPTs are “trained on human thoughts”, because much of the text in the training set was written by humans. But it is more accurate to say that GPTs are trained to predict logs of human thoughts.

Or, as Cleo Nardo puts it:

It is probably better to imagine that the text on the internet was written by the entire universe, and humans are just the bits of the universe that touch the keyboard.

But consider a GPT literally trained to predict encodings of human thoughts, perhaps as a time series of fMRI scans of human brains, suitably tokenized and encoded. Instead of predicting the next token in a sequence of text, the GPT is asked to predict the next chunk of a brain scan or “thought token”, in a sequence of thoughts.

Given the right scanning technology and enough training data, such a GPT might be even better at mimicking the output of human cognition than current LLMs are. Perhaps features of the underlying cognitive architecture of such a GPT would be similar to those of human brains. What features might it have in common with a GPT trained to predict text?
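To gesture at what “suitably tokenized and encoded” might mean, here is a purely hypothetical sketch in which flattened scan snapshots are vector-quantized against a learned codebook, so that each timestep maps to a discrete “thought token”. The shapes, codebook size, and use of k-means are all assumptions made for the thought experiment, not a description of any real system:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Fake stand-in data: 2000 timesteps, each a flattened 512-dimensional scan snapshot.
scans = rng.normal(size=(2000, 512))

# Learn a discrete codebook over snapshots; each cluster id serves as a token.
codebook = KMeans(n_clusters=256, n_init=10, random_state=0).fit(scans)
thought_tokens = codebook.predict(scans)  # shape (2000,), values in [0, 256)

# A "thought GPT" would then be trained to predict thought_tokens[t + 1] from
# thought_tokens[: t + 1], exactly as a text model predicts the next text token.
print(thought_tokens[:20])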

Thought experiment 3: a GPT trained only on outputs of computer programs

Suppose you trained a GPT only on logs and other text output of existing computer programs, and then at inference time, asked it to predict the next token in a log file, given some previous lines of the log file as input.

One way that humans might solve this prediction task is by forming a hypothesis about the program which generated the logs, building a model of that program, and then executing that model, either in their head or literally on a computer.

Another method is to have logs and log structures memorized, for many different kinds of logs generated by existing programs, and then interpolate or extrapolate from those memorized logs to generate completions for new logs encountered at inference time.

Which method might a GPT use to solve this task? Probably something closer to the second one, though it’s possible in principle that some early layers of the transformer network form a model of the program, and then subsequent layers model the execution of a few unrolled steps of the modeled program to make a prediction about its output.
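As a toy illustration of the two strategies described above, here is a sketch using a made-up log format and an invented “memorized” corpus (both hypothetical, chosen only to make the contrast concrete):

import re
import difflib

history = [
    "step=1 loss=0.931",
    "step=2 loss=0.878",
    "step=3 loss=0.845",
]

def predict_by_simulation(history):
    """Strategy 1: hypothesize the generating program and run a model of it."""
    steps = [int(re.search(r"step=(\d+)", line).group(1)) for line in history]
    losses = [float(re.search(r"loss=([\d.]+)", line).group(1)) for line in history]
    next_step = steps[-1] + 1                           # assume a loop incrementing step
    next_loss = losses[-1] + (losses[-1] - losses[-2])  # assume a roughly linear decrease
    return f"step={next_step} loss={next_loss:.3f}"

memorized_logs = [
    ["step=1 loss=0.950", "step=2 loss=0.900", "step=3 loss=0.860", "step=4 loss=0.830"],
    ["request 1 ok", "request 2 ok", "request 3 timeout"],
]

def predict_by_retrieval(history):
    """Strategy 2: find the most similar memorized line and copy its continuation."""
    best, best_score = None, -1.0
    for log in memorized_logs:
        for i in range(len(log) - 1):
            score = difflib.SequenceMatcher(None, log[i], history[-1]).ratio()
            if score > best_score:
                best, best_score = log[i + 1], score
    return best

print(predict_by_simulation(history))  # "step=4 loss=0.812"
print(predict_by_retrieval(history))   # "step=4 loss=0.830"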

Observe that when a human models the execution of a Python program in their head, they don’t do it the way a Python interpreter running on an operating system on a silicon CPU does: compiling high-level statements to bytecode and dispatching them through an evaluation loop that ultimately executes as machine instructions, one by one. The fact that both a brain and a real Python interpreter can be used to predict the output of a given Python program does not mean that the underlying processes used to generate that output are similar in structure.
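For a glimpse of what the interpreter is actually doing, contrast a human’s mental simulation of a small function with the bytecode that CPython compiles it to and then dispatches instruction by instruction through its evaluation loop (the exact output varies by Python version):

import dis

def snippet():
    a, b = 3, 5
    a, b = b, a
    return a / b

dis.dis(snippet)  # prints the bytecode the interpreter's evaluation loop executes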

A similar observation may or may not apply to GPTs: the fact that some subnetwork of a GPT can model the execution of a program in order to predict its output does not imply that the cognition used to perform this modeling is similar in structure to the execution of the program itself on another substrate.

Observation: LLM errors sometimes look surface-level similar to human errors

(Credit to @faul_sname for inspiration for this example, though I am using it to draw nearly the opposite conclusion that they make.)

Consider an LLM asked to predict the next token(s) in the transcript of (what looks like) a Python interpreter session:

Input to text-davinci-003:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
>>> a = 3
>>> a
4
>>> a

Take a moment and think about how you would assign a probability distribution to the next token in this sequence, and what the “right” answer is. Clearly, either the transcript is not of a real Python interpreter session, or the interpreter is buggy. Do you expect the bug to be transient (perhaps the result of a cosmic ray flipping a bit in memory) or persistent? Is there another hypothesis that explains the transcript up to this point?

Here’s what text-davinci-003 predicts as a distribution on the next token:

Perhaps text-davinci-003 is confident in the “cosmic ray” hypothesis, or perhaps it just doesn’t have a particularly strong model of how Python interpreters, buggy or not, behave.
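For reference, distributions like the ones shown in this section can be obtained by requesting top-token log-probabilities from the Completions endpoint. Below is a sketch using the pre-1.0 openai Python library; note that text-davinci-003 and this legacy endpoint have since been deprecated, and exact whitespace in the prompt can shift the predicted distribution:

import math
import openai  # pre-1.0 version of the library

openai.api_key = "..."  # assumes an API key is configured

prompt = (
    "Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux\n"
    ">>> a = 3\n"
    ">>> a\n"
    "4\n"
    ">>> a"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=1,
    temperature=0,
    logprobs=5,  # return the top candidate next tokens with log-probabilities
)

top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]
for token, logprob in sorted(top_logprobs.items(), key=lambda kv: -kv[1]):
    print(repr(token), round(math.exp(logprob), 3))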

A different example, of a non-buggy transcript:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
>>> a, b = 3, 5
>>> a + b
8
>>> a, b = b, a
>>> a / b

Probability distribution over the next token:

The correct answer (or at least, the result of executing these statements in a real, non-buggy Python interpreter) is 1.6666666666666667.

Slightly modified input:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
>>> a, b = 3, 5
>>> a + b
8
>>> a, b = b, a
>>> a * b
15
>>> a / b

Probability distribution over the next tokens:

The model predicts the intuitively correct answer as most likely, though it is very uncertain.

Would a human, asked to predict the next token of any of the sequences above, be likely to come up with similar probability distributions for similar reasons? Probably not. Depending on the human, how much they know about Python, and how much effort they put into making their prediction, the output of sampling from the human’s predicted probability distribution might match the output of sampling from text-davinci-003’s distribution in some cases. But the LLM and the human probably arrive at their probability distributions through vastly different mechanisms.

The fact that prediction errors of GPTs sometimes look surface-level similar to errors a human might make is probably a consequence of two main things:

  • A similar capability level between the GPT and a non-concentrating human at a specific text prediction task.

  • The ill-defined nature of some prediction tasks: what is the “right” answer in the case of, say, an inconsistent Python interpreter transcript?

In cases where sampled output between humans and GPTs looks similar (in particular, similarly “wrong”), this is probably more a fact about the nature of the task and the performance level of each system than about the underlying cognition performed by either.

Conclusion

These thought experiments and observations are not intended to show definitively that future GPTs (or systems based on them) will not have human-like cognition. Rather, they are meant to show that apparent surface-level similarities of current LLM outputs to human outputs do not imply this.

GPTs are predictors, not imitators, but imitation is one way of making predictions that is effective in many domains, including the imitation of apparent errors. Humans are great at pattern matching, but looking for surface-level patterns can often lead to overfitting and to seeing patterns that do not exist in the territory. As models grow more powerful and more capable of producing human-like outputs, interpreting LLM outputs as evidence of underlying human-like cognition may become both more tempting and more fraught.

  1. ^

    Truth and correctness, on the other hand, are narrower targets. There may or may not be multiple kinds of cognition which scale to high levels of capability without converging towards each other in underlying form.