These papers are interesting, thanks for compiling them!
Skimming through some of them, the sense I get is that they provide evidence for the claim that the structure and function of LLMs are similar to (and inspired by) the structure of particular components of human brains, namely, the components that do language processing.
This is slightly different from the claim I am making, which is about how the cognition of LLMs compares to the cognition of human brains as a whole. My comparison is slightly unfair, since I’m comparing a single forward pass through an LLM to get a prediction of the next token, to a human tasked with writing down an explicit probability distribution on the next token, given time to think, research, etc. [1]
Also, LLM capability at language processing / text generation is already far superhuman (by some metrics). The architecture of LLMs may be simpler than the comparable parts of the brain's architecture in some ways, but the LLM version can run with far more precision / scale / speed than a human brain. Whether LLMs already exceed human brains on specific metrics is debatable, but they are not bottlenecked on further scaling by biology.
And this is to say nothing of all the other kinds of cognition that happen in the brain. I see these brain components as analogous to LangChain or AutoGPT, if LangChain or AutoGPT were themselves written as ANNs that interfaced "natively" with the transformers of an LLM, instead of as Python code.
Finally, similarity of structure doesn’t imply similarity of function. I elaborated a bit on this in a comment thread here.
You might be able to get better predictions from an LLM by giving it more "time to think", using chain-of-thought prompting or other methods. But these are techniques humans apply when using LLMs as tools, rather than ideas that originate from within the LLM itself, so I don't think it's exactly fair to call them "LLM cognition" on their own.
Re the superhuman next-token prediction ability, there's an issue where the evaluations are distorted in ways that make humans look artificially worse at next-token prediction than they actually are; see here:
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next#wPwSND5mfQ7ncruWs