Very informative. You ignore inference in your cost breakdown, saying:
“Other possible costs, such as providing ChatGPT for free, would have been much smaller.”
But Semianalysis says: “More importantly, inference costs far exceed training costs when deploying a model at any reasonable scale. In fact, the costs to inference ChatGPT exceed the training costs on a weekly basis.” Why the discrepancy?
“In fact, the costs to inference ChatGPT exceed the training costs on a weekly basis.”
That seems quite wild: if the training cost was $50M, then the inference cost for a year would be about $2.5B.
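A quick sanity check of that arithmetic, taking the “weekly” claim literally (the $50M training figure is the assumption from the comment above, not a confirmed number):

```python
training_cost = 50e6   # assumed one-off training cost, $50M
weeks_per_year = 52

# If weekly inference spend matches the full training cost,
# the annualized inference bill is:
annual_inference = training_cost * weeks_per_year
print(annual_inference)  # 2.6e9, i.e. roughly $2.5-2.6B per year
```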
Whether inference dominates the total cost seems to depend on how you attribute the cost of building the supercomputer (buying the GPUs). If you fold the full cost of building the supercomputer into the training cost, then the inference cost (excluding the cost of building the computer) looks cheap. If instead you split the building cost between training and inference in proportion to “use time”, then the inference cost would dominate.
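A minimal sketch of the two accounting schemes. All numbers are made-up assumptions for illustration (not actual OpenAI or Microsoft figures); the point is only that the same underlying spend flips which side “dominates” depending on where the hardware cost is booked:

```python
# Hypothetical figures (assumptions, not real data):
hardware_cost = 500e6        # cost of building the cluster (buying GPUs)
training_opex = 10e6         # non-hardware training cost (power, staff)
inference_opex = 100e6       # non-hardware inference cost over the year
training_share_of_use = 0.1  # assume training occupies 10% of machine-hours

# Accounting 1: bundle the entire hardware cost into "training".
training_1 = hardware_cost + training_opex
inference_1 = inference_opex
print(training_1 > inference_1)   # True: training looks dominant

# Accounting 2: amortize hardware in proportion to use time.
training_2 = hardware_cost * training_share_of_use + training_opex
inference_2 = hardware_cost * (1 - training_share_of_use) + inference_opex
print(inference_2 > training_2)   # True: now inference dominates
```

Under scheme 1, training is $510M vs. $100M for inference; under scheme 2, the same total spend reads as $60M training vs. $550M inference.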
I ignored inference because I’m specifically talking about 2022: ChatGPT was only released at the very end of 2022, and GPT-4 wasn’t released until 2023.
Since OpenAI is renting Microsoft compute for both training and inference, it seems reasonable to think that inference >> training. Am I right?