Thanks!! Quick question while I think over the rest:
What data are you plotting? Where exactly did you get it (i.e., what references)?
And why is the 2021 one better than the 2023 ones? Normally we would expect the other way around, right? Does DeepMind have so much secret sauce that it’s worth more than 2 years of public knowledge? Or are the other two groups making rookie mistakes? Or am I misunderstanding the plot?
Why is Gopher better than Pythia or Cerebras? Mostly no comment, but I think Pythia and Cerebras weren’t making any simple, obvious mistakes; they were just behind 2021-era DeepMind.
[Plot legend: Gopher, Cerebras-GPT, Pythia]