The claimed drop in inference cost at a given level of capability is about benchmark performance (I’m skeptical it truly applies to real world uses at a similar level), which is largely a matter of post-training (or mid-training with synthetic data) and doesn’t have much use for new pretrains. If there is already some pretrain of a relevant size from the last ~annual pretraining effort, and current post-training methods manage to make it good enough for a new role, labs can just reuse the older pretrain (possibly refreshing it with mid-training on natural data to a more recent cutoff date). Confusingly, mid-training updates are sometimes referred to as different base or foundation models, even when they share the same pretrain.
In 2025-2026, there is also (apart from post-training improvements) the transition from older 8-chip Nvidia servers to rack-scale systems with much more HBM per scale-up world, which enables serving the current largest models (sized for 2024 levels of pretraining compute) efficiently, and allows serving smaller models like GPT-5 notably cheaper. But that’s a one-time thing, and pretrains for even some of the largest models (not to mention the smaller ones) might’ve already been done in 2024. When updating to a significantly larger model, you probably wouldn’t just increment the minor version number. But incrementing just the minor version number might be in order when updating to a new pretrain of a similar size, or when switching to a similarly capable pretrain of a smaller size, and either could happen during the ~annual series of new pretraining runs, depending on how well they turn out.
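To put rough numbers on the HBM-per-scale-up-world point, here is a minimal back-of-envelope sketch; the per-GPU figures are approximate assumptions of mine (H100 at ~80 GB HBM3 in an 8-GPU server, Blackwell-class GPUs at ~192 GB HBM3e each in a GB200 NVL72 rack), not figures from this post.

```python
# Back-of-envelope comparison of HBM per NVLink scale-up domain,
# under assumed (approximate) per-GPU HBM capacities.

H100_HBM_GB = 80         # assumed per-GPU HBM in an 8-GPU HGX/DGX H100 server
NVL72_GPU_HBM_GB = 192   # assumed per-GPU HBM in a GB200 NVL72 rack

hbm_8chip_server_gb = 8 * H100_HBM_GB        # ~0.64 TB per scale-up world
hbm_nvl72_rack_gb = 72 * NVL72_GPU_HBM_GB    # ~13.8 TB per scale-up world

print(f"8-chip server: {hbm_8chip_server_gb / 1000:.2f} TB HBM")
print(f"NVL72 rack:    {hbm_nvl72_rack_gb / 1000:.2f} TB HBM")
print(f"ratio:         {hbm_nvl72_rack_gb / hbm_8chip_server_gb:.0f}x")
```

On these assumptions the rack offers on the order of 20x more HBM within one scale-up world, which is the kind of jump that lets the weights and KV cache of the largest 2024-scale pretrains sit inside a single fast interconnect domain instead of spilling across slower links, and lets smaller models be batched more aggressively per domain.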