I get it if you’re worried about leaks but I don’t get how it could be a hard engineering problem — just share API access early, with fine-tuning
Fine-tuning access can be extremely expensive if implemented naively, and it’s plausible that cheap (LoRA) fine-tuning isn’t even implemented internally for new models for a while at AI labs. If you make the third-party groups pay for it, then I suppose this isn’t a problem, but the costs could be vast.
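To make the "cheap (LoRA)" point concrete, here is a back-of-envelope sketch of how few parameters a LoRA fine-tune actually trains. All the numbers (hidden size, layer count, rank, which matrices get adapters) are illustrative assumptions about a hypothetical 70B model, not any lab's real configuration:

```python
# Illustrative LoRA parameter count for a hypothetical 70B transformer
# (all shapes are assumptions, not a real model's config).
d_model = 8192          # hidden size (assumed)
n_layers = 80           # transformer layers (assumed)
full_params = 70e9      # total parameters of the base model

# LoRA replaces each weight update dW (d x d) with two low-rank factors
# A (d x r) and B (r x d), so trainable params per matrix drop from
# d*d to 2*d*r.
rank = 16
matrices_per_layer = 4  # q, k, v, o projections (assumed targets)
lora_params = n_layers * matrices_per_layer * 2 * d_model * rank

print(f"LoRA trainable params: {lora_params / 1e6:.1f}M")
print(f"fraction of full model: {lora_params / full_params:.5f}")
```

Roughly 0.1% of the model's parameters are trained, which is why LoRA fine-tuning (once the infra exists) is so much cheaper than full fine-tuning.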
What do you expect to be expensive? The engineer hours to build the fine-tuning infra? Or the actual compute for fine-tuning?
Given the amount of internal fine-tuning experiments going on for safety work, I’d be surprised if the infra were a bottleneck, though maybe there is a large overhead in making these fine-tuned models available through an API.
I’d be even more surprised if the cost of compute were significant compared to the rest of the activity the lab is doing (I think fine-tuning on a few thousand sequences is often enough for capability evaluations; you rarely need massive training runs).
Isn’t that only ~10x more expensive than running the forward passes (even if you don’t do LoRA)? Or is it much more because of communication bottlenecks, plus the infra being taken up by the next pretraining run (without the possibility of swapping the model in and out)?
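For reference, the standard FLOPs approximation puts a full fine-tuning step at roughly 3x a forward pass (forward ≈ 2·P FLOPs per token, backward ≈ 2x forward), before any LoRA savings and before optimizer-state and communication overheads. A rough sketch, with an assumed 70B model and "a few thousand sequences":

```python
# Rough FLOPs accounting using the standard 2*P (inference) and 6*P
# (training) per-token approximations; ignores attention terms,
# optimizer state, and communication overhead.
P = 70e9                 # model parameters (assumed)
tokens = 4000 * 2048     # "a few thousand sequences" at 2k tokens each (assumed)

forward_flops = 2 * P * tokens
train_flops = 6 * P * tokens   # forward + ~2x-forward backward pass

print(f"train/forward ratio: {train_flops / forward_flops:.0f}x")
print(f"total fine-tune FLOPs: {train_flops:.2e}")
```

On this approximation the pure-compute multiplier is ~3x, not ~10x, so if fine-tuning access is much more expensive than that in practice, the extra cost is plausibly coming from infra and serving overheads rather than the training FLOPs themselves.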
I agree that inference access is cheap.
Compute for doing inference on the weights if you don’t have LoRA fine-tuning set up properly.
My implicit claim is that maybe there isn’t that much fine-tuning going on internally.
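The serving-cost asymmetry is easy to see with rough arithmetic: without LoRA, every customer fine-tune is a full copy of the weights that must be hosted on accelerators, whereas a LoRA adapter is tiny and can share the base weights across customers. Illustrative numbers for a hypothetical 70B model in 16-bit precision:

```python
# Why naive (non-LoRA) fine-tuning access is expensive to serve:
# each full fine-tune is an entire weight copy; a LoRA adapter is tiny.
# All figures are illustrative assumptions about a hypothetical model.
P = 70e9                     # base model parameters (assumed)
bytes_per_param = 2          # fp16/bf16
full_copy_gb = P * bytes_per_param / 1e9

lora_params = 84e6           # rank-16 adapter, rough figure (assumed)
adapter_mb = lora_params * bytes_per_param / 1e6

print(f"full fine-tune to host: {full_copy_gb:.0f} GB")
print(f"LoRA adapter to host:  {adapter_mb:.0f} MB")
```

Three orders of magnitude of difference in what has to sit in accelerator memory per customer is the kind of gap that could make naive fine-tuning access "vast" in cost while LoRA-based access stays cheap.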
Fine-tuning for GPT-4 has been in an experimental access program since at least November, and OpenAI has written about fine-tuning GPT-4 for a telecom company.
Anthropic says “Our API does not currently offer fine-tuning, but please ask your Anthropic contact if you are interested in exploring this option.”
You can apparently fine-tune Gemini 1.0 Pro.
Maybe setting up custom fine-tuning is hard and labs often only set it up during deployment...
(Separately, it would be nice if OpenAI and Anthropic let some safety researchers do fine-tuning now.)