Good luck counting the R’s in token 101830. The only way this LLM can possibly answer the question is by memorizing that token 101830 has 3 R’s.
No, an LLM could simply print out “token 101830” letter by letter, and then count those letters. Even a 7B-scale LLM that predates the “strawberry test” can do that and will give the right answer, but only if you explicitly prompt it to use this strategy.
It wouldn’t use this strategy on its own, because it isn’t aware of its limitations. It doesn’t expect the task to be tricky or hard, and it lacks the right “coping mechanisms”, which leaves it up to the user to compensate with prompting.
I kind of wanted to write an entire post on LLM issues of this nature, but I’m not sure if I should.
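To make “explicitly prompt it to use this strategy” concrete, here is a minimal sketch contrasting the bare question with a prompt that forces the spell-then-count strategy, assuming a local chat model loaded through Hugging Face transformers; the model name is just a placeholder, and any instruction-tuned model of similar size should illustrate the same contrast.

```python
# Minimal sketch: contrast a bare "count the R's" question with a prompt that
# forces the spell-then-count strategy. The model name is a placeholder assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumption: any ~7B chat model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Bare question: the model has to answer from whatever it memorized about the token.
bare = "How many R's are in the word strawberry?"

# Explicit strategy: make the model spell the word out first, so the counting
# happens over characters it has itself written into the context.
guided = (
    "Spell the word 'strawberry' one letter per line. "
    "Then count how many of those letters are R and state the total."
)

for prompt in (bare, guided):
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```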
I should have made this clearer: the LLM can only do this by memorizing something about token 101830, not by counting the characters it sees. It can memorize that token 101830 is spelled s-t-r-a-w-b-e-r-r-y, print that out, and then count the characters. My point is just that it can’t count the characters in the input.
Sure. But my point is that it’s not that the LLM doesn’t have the knowledge to solve the task. It has that knowledge. It just doesn’t know how to reach it, and doesn’t know that it should try.
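For anyone who wants to see the tokenized view the model actually receives, here is a minimal sketch using tiktoken; the exact IDs, and whether “strawberry” comes out as a single token like 101830 or as several pieces, depend entirely on which vocabulary you load, so treat the cl100k_base choice as an assumption for illustration.

```python
# Minimal sketch: what the model receives is a list of integer token IDs, not
# characters. The character-level view only exists once the word is written
# out as text. Vocabulary choice (cl100k_base) is an assumption for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("strawberry")
print(ids)                             # a short list of integer IDs
print([enc.decode([i]) for i in ids])  # the text fragment each ID maps back to

# Counting characters is only possible over the spelled-out string.
print("strawberry".count("r"))         # 3
```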