I think that most people who work on models like GPT-3 seem more interested in trendlines than you do here.
That said, it’s not super clear to me what you are saying, so I’m not sure I disagree. Your narrative sounds like a strawman, since people usually extrapolate performance on the downstream tasks they care about rather than on perplexity. But I do agree that the updates from GPT-3 came not from OpenAI’s marketing but from people’s legitimate surprise about how smart big language models seem to be.
As you say, I think the interesting claim in GPT-3 was basically that scaling trends would continue, where pessimists incorrectly expected they would break based on weak arguments. I think that looking at all the graphs, both of perplexity and performance on individual tasks, helps establish this as the story. I don’t really think this lines up with Eliezer’s picture of AGI but that’s presumably up for debate.
There are always a lot of people willing to confidently decree, without much argument, that trendlines will break down. (I do think that eventually the GPT-3 trendline will break if you don’t change the data, but for the boring reason that the entropy of natural language will eventually dominate the gradient noise and so lead to a predictable slowdown.)
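To make the parenthetical concrete: published scaling-law fits typically model loss as an irreducible entropy floor plus a power-law term, so predicted loss approaches the floor and marginal gains shrink as models grow. Here's a minimal sketch of that shape; the constants are illustrative stand-ins loosely in the style of published fits, not numbers from this discussion.

```python
def predicted_loss(n_params, E=1.69, A=406.4, alpha=0.34):
    """Toy scaling-law form: loss = irreducible entropy + power-law term.

    E, A, and alpha are hypothetical illustrative constants, not a real fit.
    As n_params grows, the A / n_params**alpha term vanishes and loss
    flattens toward E -- the 'predictable slowdown' once the entropy of
    natural language dominates.
    """
    return E + A / n_params ** alpha

# Gains per 1000x of scale shrink as loss nears the floor E.
for n in (1e6, 1e9, 1e12):
    print(f"{n:.0e} params -> loss {predicted_loss(n):.2f}")
```

The point of the toy model is just that the trendline "breaks" smoothly and predictably, rather than via some surprising failure of scaling.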