Right. But the reports specifically on GPT-3.5-turbo fine-tuning announced in August were glowing, with people reporting being able to reach GPT-4-like levels of performance in narrow domains.
That’s why our expectations were high.
I am sure they do something relatively lightweight, like LoRA (https://arxiv.org/abs/2106.09685), which is what most people seem to be using for fine-tuning these days (I think).
And, of course, if one believes the rumors that GPT-4 is very different from a conventional GPT-3-style Transformer, difficulties could easily have emerged when trying to apply a LoRA-like method to it.
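To make the LoRA idea concrete: instead of updating a full pretrained weight matrix, one freezes it and trains only a low-rank additive correction. Nothing below reflects OpenAI's actual implementation; it is just a minimal numpy sketch of the mechanism from the linked paper, with illustrative names and dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # rank r is much smaller than the layer dims

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init

def lora_forward(x):
    # Effective weight is W + B @ A, but only A and B
    # (r * (d_in + d_out) parameters) would ever be trained.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted layer exactly matches
# the frozen pretrained layer before any training:
assert np.allclose(lora_forward(x), W @ x)
```

The appeal for a fine-tuning service is that each customer's adaptation is just the small (A, B) pair, which can be swapped in and out on top of one shared frozen model.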
Indeed, but only years after their original attempt. All of the early GPT-3 fine-tuning reports were very… meh. No one seemed terribly happy with it.
That’s my point: it seems like the first attempts did not go well for GPT-3 either, so it’s not clear that the first attempts going poorly for GPT-4 are anything different. Perhaps in another 3 years, OA will have a new GPT-4 fine-tuning service which doesn’t require “more work” and Just Works™. (One does hope it wouldn’t take that long the second time around.)
What are the rumors? I’m only aware of MoE.
Yes, the main rumor is that it’s a mixture-of-experts, which is already quite a departure from a single Transformer.
We presume that these experts are mostly built from components of a Transformer (with possible additions and modifications we don’t know about), but we don’t know how independent those experts are: whether they share a sizeable common initial computation and branch off from it, or whether it’s something else entirely, with some kind of dynamic sparse routing through a single network, and so on. I think it’s unlikely to be “just take a bunch of GPT-3s, run an appropriate subset of them in parallel, and combine the results”.
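The “dynamic sparse routing” notion can be illustrated with a toy top-k gated MoE layer: a learned gate scores the experts for each input, and only the top k of them are actually run. This is purely illustrative (the expert and gate names are made up, and nothing here is a claim about GPT-4’s architecture).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2

# Each "expert" here is just a distinct linear map; in a real MoE layer
# it would typically be a full feed-forward block of the Transformer.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
W_gate = rng.standard_normal((n_experts, d))  # learned routing weights

def moe_forward(x):
    logits = W_gate @ x
    top = np.argsort(logits)[-k:]   # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()        # softmax over the selected experts only
    # Only k of the n_experts run per input -- that's the compute saving
    # relative to a dense model of the same total parameter count.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(d))
print(y.shape)  # (16,)
```

Note that total parameter count scales with n_experts while per-token compute scales only with k, which is exactly why such rumors would make GPT-4’s economics plausible.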
There is a huge diversity of techniques combining MoE motifs with motifs associated with Transformers; see e.g. this collection of references: https://github.com/XueFuzhao/awesome-mixture-of-experts
So, we really don’t know; these rumors are only enough to make some partial guesses.
If we survive for a while, all this will eventually become public knowledge, and we’ll probably come to understand how the magic of GPT-4 is possible.