Are LLMs especially vulnerable to bucket errors?
Bucket errors are when you use one word for multiple concepts without realizing it; that is, you keep multiple concepts in the same “mental bucket”, and whatever you learn about one of them, you automatically apply to the others.
As a human, if you notice the difference, you can split the bucket: maybe you start using different words for the different concepts, or at least add some qualifiers. And the longer ago you did this, the more opportunity the new mental buckets have had to grow apart.
But for an LLM, the words are the territory (I think), and it can be difficult to distinguish between two meanings of X if you only have a short chat history explaining the difference, while 99% of the material you were trained on does not distinguish them. (Also, you forget the chat history at the beginning of a new chat.)
*
This possibility occurred to me when I was discussing set theory with ChatGPT yesterday, and it kept using “countable sequence” for (1) a sequence of countable length, and (2) a sequence containing only countable values, as if those were synonyms. I tried to make it admit the mistake, but instead it… it’s difficult to describe… it briefly admitted that it made a mistake, then offered an alternative explanation that fundamentally included the same mistake, and came to an even crazier conclusion. I tried to make it see the contradiction, but as they say, one man’s modus ponens is another man’s modus tollens… until at the end it more or less said (didn’t say it explicitly, but said things that took this assumption for granted) that for every uncountable ordinal α, the value α+1 is countable. Why? Because the sequence (α, α+1) is countable (it has length 2), a limit of a countable sequence is either a countable ordinal or ω₁, and clearly α+1 cannot be ω₁, therefore it must be countable. It explicitly confirmed that ω₁+1 is countable. At that moment I just gave up.
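For reference, here is (what I believe to be) the correct theorem that ChatGPT was garbling, written so that the two senses of “countable” cannot be conflated:

```latex
% Correct: a countable-length sequence of COUNTABLE ordinals has a
% countable supremum (a countable union of countable sets is countable,
% assuming the axiom of choice):
\[
\langle \alpha_n : n < \omega \rangle,\ \alpha_n < \omega_1 \text{ for all } n
\;\Longrightarrow\; \sup_{n < \omega} \alpha_n < \omega_1 .
\]
% The hypothesis is about the values, not just the length. The two-term
% sequence (alpha, alpha+1) has countable length for EVERY ordinal alpha,
% but when alpha >= omega_1 its supremum alpha+1 is uncountable, since
% omega_1 + 1 = omega_1 \cup {omega_1} contains omega_1 as a subset:
\[
\omega_1 \subseteq \omega_1 + 1
\;\Longrightarrow\;
|\omega_1 + 1| \ge \aleph_1 .
\]
```

In other words, the “countable” doing the work in the limit theorem is the countability of the values; ChatGPT’s argument silently swapped it for countability of the length.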
I assume that in the literature, when countable sequences are mentioned, in 99% of situations they have both countable length and countable values, which makes it difficult for the LLM to distinguish between these two concepts. Even if I succeeded in explaining the difference, would it have a chance to use the term meaningfully? It probably cannot re-evaluate all the text it was trained on to carefully check which instances of the word correspond to which meaning.
I think so! And I think patch tokenization may resolve this; see note
This seems likely. Sequences with more than countably many terms are a tiny minority in the training data, as are sequences including any ordinals. As a result, you’re likely to get better results using less common but more specific language, where the model’s vocabulary is less overloaded, rather than trying to disambiguate “countable sequence”.
In my exposure to mathematical literature, almost all sequences have values for which the term “countable” is inapplicable, since the values aren’t sets. Even in the cases where the values themselves are sets, “countable sequence” almost always meant a sequence with countable domain (i.e. length), not one in which all elements of the codomain (the values) are countable. And it’s usually “countable” in the sense of “countably infinite” as opposed to “finite”, rather than as opposed to “uncountably infinite”.
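Spelled out formally (my notation, not anything standard in the thread), the readings of “countable sequence” in play for a sequence s are:

```latex
% Three readings of "countable sequence", for s : dom(s) -> V.
% (a) countable length, (b) countable values,
% (c) countably infinite (as opposed to finite) length:
\[
\text{(a)}\ \ |\mathrm{dom}(s)| \le \aleph_0
\qquad
\text{(b)}\ \ \forall i \in \mathrm{dom}(s)\,:\ |s(i)| \le \aleph_0
\qquad
\text{(c)}\ \ |\mathrm{dom}(s)| = \aleph_0
\]
```

Most textbook uses are (a) or (c); reading (b) only even makes sense when the values are sets, which is exactly why the training data gives an LLM so little signal to separate it from the others.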
ChatGPT is just bad at mathematical reasoning.