[Question] When and why did ‘training’ become ‘pretraining’?

Just an ML linguistic quirk I have wondered about for a while. When I started learning ML (in the 2016–2017 period), everybody referred to the process of training models as just ‘training’, which could then (optionally) be followed by finetuning. This usage makes sense to me and, as far as I know, was the standard ML terminology for about as long as people have been training neural networks.

Nowadays, we appear to call what used to be ‘training’ ‘pretraining’. From my perspective, this term appeared around 2021 and became basically ubiquitous by 2022. Where did it come from? What is the difference between ‘pretraining’ now and ‘training’ from before?

My feeling is that this usage started at the big LLM companies. But what are these companies doing such that ‘pretraining’ is the sensible term? As far as I know (especially around 2022, when the term really took off), LLM training followed the standard ‘pretraining’ → ‘finetuning’ → ‘alignment via RLHF’ pipeline. Why do we need the special term ‘pretraining’ when ‘training’ still seems perfectly fine? Is it because post-training (i.e. finetuning) phases became routine? But then why do we have ‘pretraining’ and ‘post-training’, yet no plain ‘training’?

Does anybody here know a good rationale for, or history of, ‘pretraining’? Or is this just an inexplicable linguistic quirk?