Well, in the chess example we do not have any obvious map/territory relation. Chess seems to be a purely formal game, as the pieces do not seem to refer to anything in the external world. So it’s much less obvious that training on form alone would also work for learning natural language, which does exhibit a map/territory distinction.
For example, a few years ago, most people would have regarded it as highly unlikely that you could understand (decode) an intercepted alien message without any contextual information. But if you can understand text from form alone, as LLMs seem to prove, the message simply has to be long enough. Then you can train an LLM on it, which would then be able to understand the message. And it would also be able to translate it into English if it is additionally trained on English text.
That’s very counterintuitive, or at least it was counterintuitive until recently. I doubt EY meant to count raw words as “anticipated experience”, since “experience” typically refers to sensory data only. (In fact, I think Guessing the Teacher’s Password also suggests that he didn’t.)
To repeat, I don’t blame him, as the proposition that large amounts of raw text can replace sensory data, that a sufficient amount of symbols can ground themselves, was broadly considered unlikely until LLMs came along. But I do blame Bender insofar as she didn’t update even in light of strong evidence that the classical hypothesis (you can’t infer meaning from form alone) was wrong.
Well, in the chess example we do not have any obvious map/territory relation.
Yes, there is. The transcripts are of 10 million games that real humans played to cover the distribution of real games, and then were annotated by Stockfish, to provide superhuman-quality metadata on good vs bad moves. That is the territory. The map is the set of transcripts.
But if you can understand text from form alone, as LLMs seem to prove, the message simply has to be long enough.
I would say ‘diverse enough’, not ‘long enough’. (An encyclopedia will teach a LLM many things; a dictionary the same length, probably not.) Similar to meta-learning vs learning.
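One crude way to make ‘diverse enough’ concrete (an illustrative sketch with invented toy corpora, not a real measure): count the distinct adjacent word pairs in two token sequences of equal length — a repetitive, dictionary-like one versus running prose:

```python
def distinct_bigrams(tokens):
    """Count distinct adjacent word pairs -- a crude proxy for diversity."""
    return len({(a, b) for a, b in zip(tokens, tokens[1:])})

# Two toy corpora of equal token length (made-up stand-ins, not real data):
dictionary_like = ("aardvark : noun . abacus : noun . abandon : verb . "
                   "abase : verb . abate : verb . abbey : noun .").split()
prose_like = ("the river rose after heavy rain and flooded the old mill , "
              "so the miller moved his sacks of grain uphill before dawn .").split()
assert len(dictionary_like) == len(prose_like)

# The prose has far more distinct bigrams despite identical length:
print(distinct_bigrams(dictionary_like), distinct_bigrams(prose_like))
```

Same length, very different amounts of structure to learn from — which is the point: what matters is how much of the language’s distribution the text covers, not its raw size.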
the pieces do not seem to refer to anything in the external world.
What external world does our ‘external world’ itself refer to things inside of? If the ‘external world’ doesn’t need its own external world for grounding, then why does lots of text about the external world not suffice? (And if it does, what grounds that external external world, or where does the regress end?) As I like to put it, for an LLM, ‘reality’ is just the largest fictional setting—the one that encompasses all the other fictional settings it reads about from time to time.
As someone who doubtless does quite a lot of reading about things or writing to people you have never seen nor met in real life and have no ‘sensory’ way of knowing that they exist, this is a position you should find sympathetic.
Sympathy or not, the position that the meaning of natural language can be inferred from symbolic form alone wasn’t obvious to me in the past, as this is certainly not how humans learn language, and I don’t know of any evidence that someone thought this plausible before machine learning made it evident. It’s always easy to make something sound obvious after the fact, but that doesn’t mean it actually was obvious to anyone at the time.
Plenty of linguists and connectionists thought it was possible, if only to show those damned Chomskyans that they were wrong!
To be specific, some of the radical linguists believed in pure distributional semantics, or that there is no semantics beyond syntax. I can’t name anyone in particular, but considering how often Chomsky, Pinker, etc. were fighting against the “blank slate” theory, such people definitely existed.
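The core distributional-semantics claim can be shown in miniature — a hedged toy sketch with made-up sentences, nothing like a real LLM: build co-occurrence vectors from raw text, and words used in similar contexts end up with similar vectors, from form alone:

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a sparse vector of context-word counts."""
    vecs = defaultdict(Counter)
    for i in range(len(tokens)):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vecs[tokens[i]][tokens[j]] += 1
    return vecs

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: "cat" and "dog" occur in similar contexts; "sky" does not.
corpus = ("the cat chased the mouse . the dog chased the mouse . "
          "the cat ate the food . the dog ate the food . "
          "the sun lit the sky . the moon lit the sky .").split()
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["sky"]))  # → True
```

The model never sees a cat or a dog; the similarity falls out of distributional structure alone — which is exactly the “no semantics beyond syntax” bet, scaled up enormously in an LLM.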
The following people likely believed that it is possible to learn a language purely from reading using a general learning architecture like neural networks (blank-slate):
James L. McClelland and David Rumelhart.
They were the main proponents of neural networks in the “past tense debate”. Generally, anyone on the side of neural networks in the past tense debate probably believed this.
B. F. Skinner.
Radical syntacticians? Linguists have failed to settle the question of “Just what is semantics? How is it different from syntax?”, and some linguists have taken the radical position “There is no semantics. Everything is syntax.”. Once that is done, there simply is no difficulty: just learn all the syntax, and there is nothing left to learn.
Possibly some of the participants in the “linguistics wars” believed in it. Specifically, some believed in “generative semantics”, whereby semantics is simply yet more generative grammar, and thus not any different from syntax (also generative grammar). Chomsky, as you might imagine, hated that, and successfully beat it down.
Maybe some people in distributional semantics? Perhaps Leonard Bloomfield? I don’t know enough about the history of linguistics to tell what Bloomfield or the “Bloomfieldians” believed in exactly. However, considering that Chomsky was strongly anti-Bloomfield, it is a fair bet that some Bloomfieldians (or self-styled “neo-Bloomfieldians”) would have supported blank-slate learning of language, if only to show the Chomskyans that they were wrong.