Interesting work. Let me compare it with my estimates from three weeks ago. For all eight GPT-5 series models I considered (5, 5 Pro, 5.1, 5.2, 5.2 Pro, 5.3, 5.4, 5.4 Pro), my 2T total-parameter estimate falls within the 90% prediction interval, and it also fits four models I didn't consider (4o, o1, o3, 4.1). My 1.2T estimate for Sonnet is very close to Li's 1.7T, and my 4T estimate for the Opus 4 series falls within the 90% PI for all five versions. (As a reminder, with 90% intervals we should expect, on average, 1 true value out of 10 to fall outside.)
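To put the "1 in 10" reminder in numbers, here is a rough sketch of the calibration arithmetic. It treats each model/interval check as an independent Bernoulli trial with a 10% miss chance. That independence assumption is mine and almost certainly too strong here, since all the GPT models are compared against the same 2T figure. The count of 17 checks (12 GPT-family models plus 5 Opus 4-series versions) is just my tally from the comparison above. The function names are illustrative only.

```python
from math import comb


def expected_misses(n: int, coverage: float = 0.9) -> float:
    """Expected number of true values outside n calibrated intervals."""
    return n * (1 - coverage)


def prob_k_misses(n: int, k: int, coverage: float = 0.9) -> float:
    """Binomial probability of exactly k misses.

    Assumes the n interval checks are independent, which is a simplification.
    """
    p_miss = 1 - coverage
    return comb(n, k) * p_miss**k * coverage ** (n - k)


if __name__ == "__main__":
    n = 17  # 12 GPT-family models + 5 Opus 4-series versions
    print(f"expected misses: {expected_misses(n):.1f}")
    print(f"P(no misses):    {prob_k_misses(n, 0):.3f}")
```

Under these assumptions the expected number of misses is 1.7 and the chance of zero misses is about 0.167, so "everything fits" is not unusual, but it is also not strong evidence on its own given how correlated the checks are.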