Jacob_Hilton comments on Vladimir_Nesov’s Shortform

Jacob_Hilton 7 Aug 2025 15:25 UTC
5 points
The model sizes were likely chosen based on typical inference constraints. With the size fixed, the developers mostly care about maximizing performance and aren’t too concerned about the training compute cost, since training such small models is very affordable for them. So it’s worth going a long way into the regime of diminishing returns from additional training.
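
To make the diminishing-returns point concrete, here is a rough sketch that assumes a Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, with the fitted constants reported by Hoffmann et al. (2022). None of this comes from the models under discussion: the functional form, the constants, the 20B model size, and the 20-tokens-per-parameter starting point are all assumptions for illustration. It only shows that, at a fixed model size, each doubling of training tokens buys a smaller loss reduction than the previous one.

```python
# Illustrative sketch only. Assumes a Chinchilla-style parametric loss
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# with the fitted constants reported by Hoffmann et al. (2022).
# These are NOT measurements of any model discussed in the thread.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def marginal_gains(n_params: float, token_budgets: list[float]) -> list[float]:
    """Loss reduction obtained from each successive increase in training tokens."""
    losses = [loss(n_params, d) for d in token_budgets]
    return [prev - cur for prev, cur in zip(losses, losses[1:])]


# Hypothetical fixed model size (set by inference constraints in the argument above).
N = 20e9
# Start at roughly 20 tokens per parameter (an assumed baseline), then keep doubling.
budgets = [N * 20 * 2**k for k in range(6)]
gains = marginal_gains(N, budgets)

for d, g in zip(budgets[1:], gains):
    print(f"{d / N:6.0f} tokens/param: loss improves by {g:.4f}")
```

Under these assumptions every doubling still helps, just less each time, which is why heavy overtraining can be worthwhile when the compute bill is small relative to the value of a better fixed-size model.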