What do you expect to be expensive? The engineer hours to build the fine-tuning infra? Or the actual compute for fine-tuning?
Given the amount of internal fine-tuning experiments going on for safety stuff, I’d be surprised if the infra was a bottleneck, though maybe there is a large overhead in making these fine-tuned models available through an API.
I’d be even more surprised if the cost of compute were significant compared to the rest of the lab’s activity (fine-tuning on a few thousand sequences is often enough for capability evaluations; you rarely need massive training runs).
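For scale, a rough back-of-the-envelope using the standard ≈6·N FLOPs per trained token (N = parameter count); every number below is an illustrative assumption, not a figure from any lab:

```python
# Rough cost of fine-tuning on "a few thousand sequences", using the standard
# ~6 * N FLOPs per trained token. All numbers are illustrative assumptions.

n_params = 1e12        # assume a ~1T-parameter model
n_sequences = 5_000    # "a few thousand sequences"
tokens_per_seq = 2_000
epochs = 3

train_tokens = n_sequences * tokens_per_seq * epochs   # 3e7 tokens
train_flops = 6 * n_params * train_tokens              # ~1.8e20 FLOPs

# One H100 is roughly 1e15 dense BF16 FLOP/s; assume ~40% utilization.
effective_flops_per_gpu_sec = 1e15 * 0.4
gpu_hours = train_flops / (effective_flops_per_gpu_sec * 3600)

print(f"fine-tuning FLOPs: {train_flops:.1e}")  # ~1.8e20
print(f"H100-hours:        {gpu_hours:.0f}")    # ~125, i.e. tiny for a lab
```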
Compute for doing inference on the weights if you don’t have LoRA fine-tuning set up properly.
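That is, without LoRA every fine-tuned variant needs its own full copy of the weights loaded to serve, whereas LoRA variants can share one copy of the base model and differ only by small low-rank adapters. A minimal sketch of the parameter counts (sizes below are illustrative assumptions, not any particular model):

```python
# Why serving without LoRA is expensive: each fully fine-tuned variant needs
# its own copy of the weights in accelerator memory, while LoRA variants share
# one base model and differ only in small low-rank adapters.

d_model = 8192    # hidden size (assumption)
n_layers = 64     # transformer layers (assumption)
rank = 16         # LoRA rank (typical choices: 4-64)

# Attention projections per layer: q, k, v, o, each a d_model x d_model matrix.
full_per_layer = 4 * d_model * d_model
# LoRA replaces each weight update with two low-rank factors:
# delta_W = B @ A, with A of shape (rank, d_model) and B of shape (d_model, rank).
lora_per_layer = 4 * 2 * d_model * rank

full_total = n_layers * full_per_layer
lora_total = n_layers * lora_per_layer

print(f"full weights per variant: {full_total / 1e9:.1f}B params")  # ~17.2B
print(f"LoRA adapter per variant: {lora_total / 1e6:.1f}M params")  # ~67.1M
print(f"adapter fraction:         {lora_total / full_total:.3%}")   # ~0.391%
```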
My implicit claim is that maybe there isn’t that much fine-tuning going on internally.
Fine-tuning for GPT-4 has been in an experimental access program since at least November, and OpenAI has written about fine-tuning GPT-4 for a telecom company.
Anthropic says “Our API does not currently offer fine-tuning, but please ask your Anthropic contact if you are interested in exploring this option.”
You can apparently fine-tune Gemini 1.0 Pro.
Maybe setting up custom fine-tuning is hard and labs often only set it up during deployment...
(Separately, it would be nice if OpenAI and Anthropic let some safety researchers do fine-tuning now.)
Isn’t that only ~10x more expensive than running the forward passes (even if you don’t do LoRA)? Or is it much more because of communication bottlenecks, plus the infra being taken up by the next pretraining run (without the possibility of swapping the model in and out)?
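For reference, in raw FLOPs the usual approximation gives only ~3x (forward ≈ 2·N FLOPs/token, forward + backward ≈ 6·N), so the rest of the gap up to ~10x would have to come from overheads like the optimizer step, lower utilization, and communication; a trivial sketch:

```python
# Per-token FLOPs of full fine-tuning vs. a plain forward pass, under the
# standard dense-transformer approximation (forward ~= 2*N, training ~= 6*N).
# Overheads (optimizer step, utilization, communication) are not modeled here.

def flops_per_token(n_params: float, training: bool) -> float:
    return (6.0 if training else 2.0) * n_params

n_params = 1e12  # illustrative parameter count
ratio = flops_per_token(n_params, training=True) / flops_per_token(n_params, training=False)
print(f"training / forward per-token FLOPs: {ratio:.0f}x")  # -> 3x
```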