Can we use similar methods to estimate the size and active parameters of GPT-4.5?
Naively extrapolating from the 1800B-A280B estimate for GPT-4 and the fact that GPT-4.5 costs about 2.5x as much, we get 4500B-A700B.
I have no idea if that’s a good guess, but hopefully someone can come up with a better one.
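For what it's worth, the extrapolation is just linear scaling by the cost ratio, assuming inference price is proportional to parameter count (a strong assumption, since pricing also reflects margins, hardware, and serving efficiency). A minimal sketch:

```python
# Back-of-envelope: scale GPT-4's estimated parameter counts by the
# ~2.5x cost ratio, assuming cost is proportional to parameters.
# All inputs are rough community estimates, not confirmed figures.

def scale_estimate(total_b: float, active_b: float, cost_ratio: float) -> tuple[float, float]:
    """Naively scale total and active parameter counts (in billions)."""
    return total_b * cost_ratio, active_b * cost_ratio

gpt4_total, gpt4_active = 1800, 280  # estimated GPT-4: 1800B total, A280B active
cost_ratio = 2.5                     # rough GPT-4.5 / GPT-4 price ratio

total, active = scale_estimate(gpt4_total, gpt4_active, cost_ratio)
print(f"~{total:.0f}B total, ~{active:.0f}B active")  # ~4500B total, ~700B active
```

Of course this assumes total and active parameters scale together, which need not hold if the MoE sparsity ratio changed between generations.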
This idea appears very similar to the paper “Reinforcement Learning Teachers of Test Time Scaling”: https://arxiv.org/abs/2506.08388