Aaron Staley comments on Burny’s Shortform

Aaron Staley 19 Jul 2025 18:47 UTC
1 point
Good point. This does update me downward on Deep Think outperforming matharena’s gemini-2.5-pro IMO run, since it is possible Deep Think was internally doing a similar selection process to begin with. It is difficult to know without randomly sampling gemini-2.5-pro’s answers and seeing how much the best-of-n selection lifted its score.
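
A rough sketch of the comparison described above, in Python. It assumes you have per-problem scores for every sampled answer plus the score of the answer the selector actually picked. The function name `selection_lift` and all the numbers are hypothetical, made up purely for illustration; they are not matharena's or Deep Think's real results.

```python
from statistics import mean

def selection_lift(sample_scores, selected_scores):
    """Estimate how many points best-of-n selection added over random sampling.

    sample_scores:   one list per problem, holding the score of each sampled answer.
    selected_scores: one score per problem, for the answer the selector picked.
    """
    # Expected total if one answer per problem were drawn uniformly at random.
    expected_random_total = sum(mean(scores) for scores in sample_scores)
    # Total actually achieved after best-of-n selection.
    selected_total = sum(selected_scores)
    return selected_total - expected_random_total

# Hypothetical data: two problems, four samples each, scored out of 7.
samples = [[7, 7, 2, 7], [0, 1, 0, 7]]
selected = [7, 7]
print(selection_lift(samples, selected))
```

A large lift would suggest most of the score comes from the selection step rather than from the typical single sample, which is the quantity needed to judge how comparable the two runs are.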