gwern comments on Recent AI model progress feels mostly like bullshit

gwern 7 Apr 2025 2:53 UTC
7 points

We have not yet tried 4.5 as it’s so expensive that we would not be able to deploy it, even for limited sections.

Still seems like potentially valuable information to know: how much does small-model smell cost you? What happens if you ablate reasoning? If the bottleneck is factual knowledge and GPT-4.5 performs much better, then that tells you things like ‘maybe finetuning is more useful than we think’, etc. If you are already set up to benchmark all these OpenAI models, then a datapoint from GPT-4.5 should be quite easy to get, and the cost is chump change in comparison to the insight, like a few hundred bucks.
- dimitry12 18 Apr 2025 17:11 UTC
  3 points
  Please help me understand: how do you suggest we “ablate reasoning”, and what’s the connection with “small-model smell”?