It’s evidence to the extent that the mere fact of publishing Figure 7 (hopefully) suggests the authors, who likely know the relevant OpenAI internal research, didn’t believe that the reasoning model’s pass@10K result is much worse than the language-monkey pass@10K result for the underlying non-reasoning model. So maybe it isn’t actually worse.