Hardware for Transformative AI

I recently saw A breakdown of AI Chip companies linked on LessWrong. I thought it was interesting, and it inspired me to add my 2 cents on which AI chips might be used if transformative AI arrives in the relatively near future.

Background

Nvidia is undoubtedly the leader when it comes to AI chips. Nvidia GPUs are by far the most common way of training large neural nets. For example, GPT-3 was trained on Nvidia V100 GPUs.

According to A breakdown of AI chip companies, Nvidia’s GPUs aren’t that well optimized for AI, so the performance achieved in training falls well short of the theoretical teraflops. The article says 10x worse, but this seems like an exaggeration: according to this article, GPU cores were idle around 70% of the time, which works out to about 3.3x worse than theoretical performance.
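To make the 3.3x figure explicit, here is a minimal sketch of the arithmetic. It assumes that idle time translates directly into lost throughput, which is a simplification, and the function name is just illustrative.

```python
def slowdown_from_idle(idle_fraction: float) -> float:
    """Factor by which achieved throughput falls short of the theoretical peak.

    Assumption: cores that are idle contribute nothing, and busy cores
    run at full theoretical speed.
    """
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in [0, 1)")
    return 1.0 / (1.0 - idle_fraction)


# 70% idle time means only 30% of the theoretical performance is used.
print(f"{slowdown_from_idle(0.70):.1f}x worse than theoretical")
```

By the same logic, the 10x figure would correspond to the cores being idle 90% of the time.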

The second most used hardware for AI is Google's tensor processing units (TPUs), which are specialized for AI. I found this table useful, comparing the cost of training a small convolutional neural net:

Source: https://medium.com/bigdatarepublic/cost-comparison-of-deep-learning-hardware-google-tpuv2-vs-nvidia-tesla-v100-3c63fe56c20f

In this case the TPU was considerably cheaper, but I believe the relative costs vary significantly depending on the training task and hyperparameters.

The drawbacks of TPUs are that you can’t buy them (only rent them from Google) and that code has to be compiled in a specific way to be efficient, which decreases flexibility.

Cost of GPT-3 as a reference

GPT-3 is the most impressive language model to date, and it has about 175 billion parameters. Training it cost about 12 million USD, and according to Estimating GPT3 API cost, a request to generate 1024 tokens (about 700 words) costs about 0.001 USD assuming constant usage.

Cost of a theoretical 17 trillion parameter transformative AI

I once read an estimate that 5x-ing the size of GPT-3 would 10x the training cost. I can’t find the source now, so if anyone has a better estimate, please let me know.

(Note: Thank you Daniel Kokotajlo for pointing out my math mistake. I have updated the calculations below.)

Number of 5x increases needed to reach 100x parameters: 5^n = 100 ⇒ n = log(100)/log(5) ≈ 2.86

Increase in training cost: 10^n ≈ 727 (using the unrounded value of n)

Training a 17 trillion parameter GPT model would then cost around 727 times more than GPT-3, which gives a cost of around 8.7 billion USD. Of course, bigger models alone, without software improvements, might not result in transformative AI.
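The training cost extrapolation can be reproduced with a short script. This is only a sketch of the arithmetic in this post: it takes the unsourced "5x the size costs 10x as much" rule of thumb at face value, and the constant and function names are my own.

```python
import math

GPT3_TRAIN_COST_USD = 12e6   # training cost figure used in this post
SIZE_STEP = 5.0              # assumed rule of thumb: 5x the parameters...
COST_STEP = 10.0             # ...costs 10x as much to train


def training_cost_multiplier(param_multiplier: float) -> float:
    """Cost multiplier for scaling the parameter count by param_multiplier."""
    # Number of 5x steps needed to reach the target size: 5^n = param_multiplier
    n_steps = math.log(param_multiplier) / math.log(SIZE_STEP)
    # Each 5x step multiplies the cost by 10
    return COST_STEP ** n_steps


multiplier = training_cost_multiplier(100)  # 175 billion -> about 17 trillion
cost_usd = multiplier * GPT3_TRAIN_COST_USD

print(f"cost multiplier: {multiplier:.0f}x")
print(f"estimated training cost: {cost_usd / 1e9:.1f} billion USD")
```

The result is quite sensitive to the rule of thumb. If 5x the size only cost 5x as much, the multiplier would be 100 instead of roughly 727.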

With 100x the parameters, the cost of inference (using the model) would also increase roughly 100x, giving a cost of about 0.1 USD per 700 words generated. This seems very cheap compared to human labour. Even if the AI does a billion tasks like this per day, it would only come to around 36.5 billion USD per year in computational cost.
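The inference numbers follow from simple multiplication. Here is a sketch, assuming that inference cost scales linearly with parameter count and that usage is constant all year (the variable names are illustrative):

```python
GPT3_COST_PER_REQUEST_USD = 0.001   # 1024 tokens (about 700 words) on GPT-3
PARAM_MULTIPLIER = 100              # 175 billion -> about 17 trillion parameters
REQUESTS_PER_DAY = 1_000_000_000    # a billion tasks per day
DAYS_PER_YEAR = 365

# Assumption: cost per request grows in proportion to the parameter count.
cost_per_request_usd = GPT3_COST_PER_REQUEST_USD * PARAM_MULTIPLIER
annual_cost_usd = cost_per_request_usd * REQUESTS_PER_DAY * DAYS_PER_YEAR

print(f"cost per request: {cost_per_request_usd:.2f} USD")
print(f"annual cost: {annual_cost_usd / 1e9:.1f} billion USD")
```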

Hardware availability as a potential bottleneck for inference

Hardware might very well be a bottleneck for deploying transformative AI at large scale, even if deployment is spread over multiple years. Nvidia’s revenue in 2020 was just under 11 billion USD, and there is already a chip shortage. It does not seem likely, then, that they would be able to meet demand for tens of billions of USD in GPUs anytime soon.

If hardware is a bottleneck, the transformative AI would likely be adapted to run on many different types of hardware, including Google TPUs and maybe GPUs from other companies such as AMD.

One interesting idea is that decentralized computation might be usable, especially if a crypto crash sends miners looking for alternative ways to use their GPUs. There are several companies working on this, but I’ve not seen it done with any success yet.

Computational cost as a potential bottleneck for inference

If the computational cost is close to the cost of human labour in many cases, specialized chips are more likely to play a major role. Since there will probably be only one or a few transformative AIs, the AI chips they use wouldn’t actually have to be particularly flexible or easy to use. They just have to be cheap.