What do you expect to be expensive? The engineer hours to build the fine-tuning infra? Or the actual compute for fine-tuning?
Given the amount of internal fine-tuning experiments going on for safety stuff, I’d be surprised if the infra was a bottleneck, though maybe there is a large overhead in making these find-tuned models available through an API.
I’d be even more surprised if the cost of compute was significant compared to the rest of the activity the lab is doing (I think fine-tuning on a few thousand sequences is often enough for capabilities’ evaluations, you rarely need massive training runs).
Isn’t that only ~10x more expensive than running the forward-passes (even if you don’t do LoRA)? Or is it much more because of communications bottlenecks + the infra being taken by the next pretraining run (without the possibility to swap the model in and out).
What do you expect to be expensive? The engineer hours to build the fine-tuning infra? Or the actual compute for fine-tuning?
Given the amount of internal fine-tuning experiments going on for safety stuff, I’d be surprised if the infra was a bottleneck, though maybe there is a large overhead in making these find-tuned models available through an API.
I’d be even more surprised if the cost of compute was significant compared to the rest of the activity the lab is doing (I think fine-tuning on a few thousand sequences is often enough for capabilities’ evaluations, you rarely need massive training runs).
Compute for doing inference on the weights if you don’t have LoRA finetuning set up properly.
My implicit claim is that there maybe isn’t that much fine-tuning stuff internally.
Isn’t that only ~10x more expensive than running the forward-passes (even if you don’t do LoRA)? Or is it much more because of communications bottlenecks + the infra being taken by the next pretraining run (without the possibility to swap the model in and out).