It was reasonable to think that maybe transformers would just work, and soon, while we were racing through GPT-2, GPT-3, and GPT-4. We just aren’t in that situation anymore.
There remains about 2,000x in scaling of raw compute from GPT-4 (2e25 FLOPs) to the $150bn training systems of 2028 (5e28 FLOPs), and more in effective compute from architectural improvements over those 6 years[1]. That’s exactly the kind of situation we were in between GPT-2, GPT-3, and GPT-4, not knowing what the subsequent levels of scaling would bring. So far the scaling experiment has demonstrated significantly increasing capabilities, and we are not even 100x up from GPT-4 yet, so there has been no opportunity for even a first negative result.
Scaling further than this on the same schedule would require much better capabilities, but this much seems plausible in any case; so this is the scale of the experiment we’ll get to see shortly, and the strength of the negative result in case capabilities actually stop improving.
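To put those raw-compute numbers in perspective, here is a minimal back-of-the-envelope sketch. The 2e25 and 5e28 FLOPs figures and the roughly 6-year gap are taken from the comment above; effective-compute gains from better architectures are not modeled.

```python
import math

# Back-of-the-envelope arithmetic for the compute figures quoted above.
gpt4_flops = 2e25            # quoted training compute for GPT-4
system_2028_flops = 5e28     # quoted compute for a ~$150bn training system in 2028
years = 6                    # roughly 2022 -> 2028

raw_ratio = system_2028_flops / gpt4_flops          # remaining raw-compute scaling
annual_factor = raw_ratio ** (1 / years)            # implied per-year growth factor

print(f"raw compute ratio: {raw_ratio:,.0f}x")           # ~2,500x (the "about 2,000x" above)
print(f"implied growth per year: {annual_factor:.1f}x")  # ~3.7x per year

# "Not even 100x up from GPT-4 yet": on a log scale, 100x is only part of the gap.
fraction_of_gap = math.log(100) / math.log(raw_ratio)
print(f"100x covers {fraction_of_gap:.0%} of the gap on a log scale")  # ~59%
```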
Yeah, that sentence may have been too strong.
It’s not just too strong; it’s also a reminder that we need to get used to waiting.
Even under short timelines, things will not move that fast, and we have not yet gotten large negative results, so the scaling case remains reasonable, and we kinda have to get used to hurrying up and waiting.