If it were just math, then okay, sure. But GPT-3 and related LMs can learn a wide variety of linguistic skills at certain levels of compute/data scale, and I was explicitly referring to a wide benchmark of linguistic and related skills, with math being a stand-in example for something linguistic-adjacent.
And btw, from what I understand, GPT-3 learns math from having math problems in its training corpus, so it's not even a great example of a "side effect of being good at text prediction".