Quick reply, after doing a bit of reading and recalling a thing or two: In a ‘classical’ machine we have a clean separation of process and memory. Memory is kept on the paper tape of our Turing Machine and processing is located in, well, the processor. In a connectionist machine process and memory are all smushed together. GPTs are connectionist virtual machines running on a classical machine. The “plan” I’m looking for is stored in the parameter weights, but it’s smeared over a bunch of them. So this classical machine has to visit every one of them before it can output a token.
So, yes, purely next token prediction. But the prediction cycle, in effect, involves ‘reassembling’ the plan each time through.
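To make that cycle concrete, here is a toy sketch, not how GPT is actually built. The names (`make_toy_model`, `next_token`, `generate`) and the averaging-plus-one-matrix "model" are made up for illustration. The point it shows: nothing is carried from one step to the next except the tokens themselves, and every step makes a full pass over the output weights.

```python
import random

def make_toy_model(vocab_size, dim, seed=0):
    """Build two random weight tables (hypothetical stand-ins for trained weights)."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
    unembed = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]
    return embed, unembed

def next_token(model, tokens):
    """Predict one token from the whole token list so far.

    Returns the chosen token and how many output weights were read.
    """
    embed, unembed = model
    dim = len(embed[0])
    # Rebuild the state from scratch: mean of the embeddings of all tokens so far.
    state = [sum(embed[t][i] for t in tokens) / len(tokens) for i in range(dim)]
    weights_read = 0
    logits = []
    for row in unembed:
        # Every row of the output matrix is visited on every step.
        logits.append(sum(w * s for w, s in zip(row, state)))
        weights_read += len(row)
    best = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
    return best, weights_read

def generate(model, prompt, n_new):
    """Autoregressive loop: the only thing passed between steps is the token list."""
    tokens = list(prompt)
    for _ in range(n_new):
        token, _ = next_token(model, tokens)
        tokens.append(token)
    return tokens

model = make_toy_model(vocab_size=8, dim=4)
story = generate(model, prompt=[1, 2], n_new=5)
print(story)
```

Whatever "plan" this toy has lives only in the two weight tables; each call to `next_token` recomputes the state from the tokens and the weights, which is the 'reassembling' I have in mind.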
To my mind, in order to say we “understand” how this puppy is telling a story, we need to say more than that it’s a next-token-prediction machine. We need to say something about how that “plan” is smeared over those weights. We need to come up with concepts we can use in formulating such explanations. Maybe the right concepts are just lying scattered about in dusty old file cabinets someplace. But, and I’m thinking this is likely, we’ll have to invent some new ones as well.
Wolfram was trained as a physicist. The language of complex dynamics is natural to him, whereas it’s a poorly learned third or fourth language for me. So he talks of basins of attraction and attractor landscapes. As far as I can tell, in his language, those 175B parameters can be said to have an attractor landscape. When ChatGPT tells a story it enters the Story Valley in that landscape and walks a path through that valley. When it’s done with the story, it exits that valley. There are all kinds of valleys (and valleys within valleys (and valleys within them)) in the attractor landscape, for all kinds of tasks.
FWIW, the human brain has roughly 86B neurons. Each of those is connected with roughly 10K other neurons. Those connections are mediated by upward of 100 different chemicals. And those neurons are surrounded by glial cells. In the old days researchers thought those glial cells were like packing peanuts for the neural net. We now know better and are beginning to figure out what they’re doing. Memory is definitely part of their story. So we’ve got to add them into the mix. How many glial cells per neuron? There might be a number in the literature, but I haven’t checked. Anyhow, the number of parameters we need to characterize a human brain is vast.
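For a rough sense of scale, here is the back-of-envelope arithmetic using only the round numbers above. Counting one connection as one "parameter" is my own crude assumption, and it ignores the chemicals and the glial cells entirely, so treat the result as a floor rather than an estimate.

```python
# Round figures from the text above; none of these are precise measurements.
NEURONS = 86_000_000_000              # roughly 86B neurons
CONNECTIONS_PER_NEURON = 10_000       # roughly 10K connections each
GPT_PARAMETERS = 175_000_000_000      # the 175B parameters mentioned earlier

# Assumption: one connection counts as one "parameter" (a crude simplification).
connections = NEURONS * CONNECTIONS_PER_NEURON
ratio = connections / GPT_PARAMETERS

print(f"connections: {connections:.1e}")          # 8.6e+14
print(f"connections per GPT parameter: {ratio:,.0f}")  # roughly 4,900
```

So even on this stripped-down count, the brain has several thousand connections for every one of those 175B parameters.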