FWIW, back in the 1970s and 1980s David Marr and Thomas Poggio argued that large complex ‘information processing systems’ need to be analyzed and described on several levels, with the higher levels being implemented in the lower levels. Just what the levels are, and how many, has varied from one account to another, but the principle is alive and kicking. David Chapman makes a similar argument about levels: How to understand AI systems.
Why should this be the case? Because ANNs in general are designed to take on the structure inherent in the data they absorb during training. If they didn’t, they’d be of little value. If you want to understand how an LLM tells stories, consult a narratologist.
I’ve made the levels argument, with reference to Marr/Poggio, in two recent working papers:
ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking
ChatGPT tells stories, and a note about reverse engineering
Here’s the abstract of the second paper:
I examine a set of stories that are organized on three levels: 1) the entire story trajectory, 2) segments within the trajectory, and 3) sentences within individual segments. I conjecture that the probability distribution from which ChatGPT draws next tokens follows a hierarchy nested according to those three levels and that is encoded in the weights of ChatGPT’s parameters. I arrived at this conjecture to account for the results of experiments in which ChatGPT is given a prompt containing a story along with instructions to create a new story based on that story but changing a key character: the protagonist or the antagonist. That one change then ripples through the rest of the story. The pattern of differences between the old and the new story indicates how ChatGPT maintains story coherence. The nature and extent of the differences between the original story and the new one depends roughly on the degree of difference between the original key character and the one substituted for it. I conclude with a methodological coda: ChatGPT’s behavior must be described and analyzed on three levels: 1) The experiments exhibit surface level behavior. 2) The conjecture is about a middle level that contains the nested hierarchy of probability distributions. 3) The transformer virtual machine is the bottom level.
Add in some complex dynamics, perhaps some conceptual space semantics from Peter Gärdenfors, and who knows what else, and I expect the mystery about how LLMs work to dissipate before AGI arrives.
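For anyone who wants to try the character-substitution experiment from the abstract, here is a rough Python sketch of the surface-level comparison. Everything in it is my assumption for illustration, not the procedure from the papers: the prompt wording in `build_prompt`, the naive sentence splitter, and the use of `difflib` string similarity as a crude stand-in for "degree of difference" are all hypothetical. It does not call ChatGPT; you would paste the prompt in yourself and feed the reply back in as `rewritten`.

```python
import difflib
import re
from typing import List, Tuple


def build_prompt(story: str, old_character: str, new_character: str) -> str:
    """Hypothetical prompt wording (an assumption, not the papers' exact text)."""
    return (
        f"Here is a story about {old_character}. "
        f"Tell the same story, but change {old_character} to {new_character}. "
        "Make any other changes you need to keep the story coherent.\n\n"
        + story
    )


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def sentence_similarity(original: str, rewritten: str) -> List[Tuple[str, float]]:
    """For each original sentence, report its best character-level match
    (difflib ratio, 0.0 to 1.0) among the sentences of the rewritten story."""
    new_sentences = split_sentences(rewritten)
    results = []
    for sentence in split_sentences(original):
        best = max(
            (difflib.SequenceMatcher(None, sentence, n).ratio() for n in new_sentences),
            default=0.0,
        )
        results.append((sentence, best))
    return results


if __name__ == "__main__":
    # Invented toy stories, only to show the shape of the output.
    old = "Mira lived by the sea. A storm hit the village. Mira rowed out to help."
    new = "The robot lived by the sea. A storm hit the village. The robot flew out to help."
    print(build_prompt(old, "Mira", "a robot"))
    for sentence, score in sentence_similarity(old, new):
        print(f"{score:.2f}  {sentence}")
```

Low scores mark the places where the substitution rippled through the story; sentences that survive untouched score 1.0. String similarity only sees the sentence level, of course. Saying anything about segments or the whole trajectory would need a human reader, or that narratologist.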