Yeah, I think leading labs generally retrain their base models less often than every 6 months (though there’s a lot we don’t know for sure). I believe this most likely has to do with a production AI model being the result of a lot of careful tuning across pre-training, mid-training, post-training, etc. Swapping in a new base model might lead to a lot of post-training regressions that need to be fixed. And your old base model is a “lucky” one in some sense, because it was selected for doing well and/or required lots of experiments, derisking runs, etc. Even with all of your new algorithmic tricks, it might be hard to one-shot YOLO a base model that’s better than your SOTA model from nine months ago. But this is probably much easier for your model from 18 or 27 months ago.
Also, I’d guess staff costs are more important than compute costs here, but these considerations mean the compute costs of retraining are higher than one might think.