Estimating training compute of Deep Learning models

by Jaime Sevilla, Lennart Heim, Marius Hobbhahn, Tamay Besiroglu, and Anson Ho

You can find the complete article here. We provide a short summary below.

In short: To estimate the compute used to train a Deep Learning model, we can either 1) directly count the number of operations needed or 2) estimate it from GPU time.

Method 1: Counting operations in the model

$\underbrace{2 \times \text{\# of connections}}_{\text{Operations per forward pass}} \times \underbrace{3}_{\text{Backward-forward adjustment}} \times \underbrace{\text{\# training examples} \times \text{\# epochs}}_{\text{Number of passes}}$
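As a rough illustration, the formula above can be sketched in Python. The function name, parameter names, and example values are hypothetical and chosen only for this sketch; they do not describe any specific model.

```python
def training_compute_from_operations(n_connections, n_examples, n_epochs):
    """Estimate training compute (in FLOP) by counting operations.

    Mirrors the Method 1 formula:
    2 x connections x 3 x training examples x epochs.
    """
    # Operations per forward pass: 2 per connection, as in the formula.
    ops_per_forward_pass = 2 * n_connections
    # Backward-forward adjustment: the factor of 3 from the formula.
    backward_forward_adjustment = 3
    # Number of passes: every example is seen once per epoch.
    n_passes = n_examples * n_epochs
    return ops_per_forward_pass * backward_forward_adjustment * n_passes


# Hypothetical model: 1e9 connections, 1e6 training examples, 10 epochs.
example_flop = training_compute_from_operations(
    n_connections=1_000_000_000,
    n_examples=1_000_000,
    n_epochs=10,
)
print(f"{example_flop:.2e} FLOP")
```

For these made-up inputs the estimate is 6e16 FLOP: 2e9 operations per forward pass, times 3, times 1e7 passes.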

Method 2: GPU time

$\text{training time} \times \text{\# cores} \times \text{peak FLOP/s} \times \text{utilization rate}$

We are uncertain about what utilization rate is best, but our recommendation is 30% for Large Language Models and 40% for other models.
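The GPU-time formula can be sketched the same way. This is a minimal sketch that assumes training time is given in seconds and peak FLOP/s is a per-core figure; the function name, constant names, and hardware numbers below are hypothetical. Only the two default utilization rates come from the recommendation above.

```python
# Recommended utilization rates from the text above.
LLM_UTILIZATION = 0.30
OTHER_UTILIZATION = 0.40


def training_compute_from_gpu_time(training_time_s, n_cores, peak_flop_per_s,
                                   utilization=OTHER_UTILIZATION):
    """Estimate training compute (in FLOP) from hardware usage.

    Mirrors the Method 2 formula:
    training time x cores x peak FLOP/s x utilization rate.
    Assumes training_time_s is in seconds and peak_flop_per_s is per core.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in the interval (0, 1]")
    return training_time_s * n_cores * peak_flop_per_s * utilization


# Hypothetical run: 10 days on 8 cores, each with a peak of 1e14 FLOP/s,
# for a Large Language Model (30% utilization).
seconds_per_day = 24 * 60 * 60
example_flop = training_compute_from_gpu_time(
    training_time_s=10 * seconds_per_day,
    n_cores=8,
    peak_flop_per_s=1e14,
    utilization=LLM_UTILIZATION,
)
print(f"{example_flop:.3e} FLOP")
```

For these made-up inputs the estimate is about 2.07e20 FLOP. Because the utilization rate is uncertain, it can be useful to run the calculation at both 30% and 40% and treat the results as a range.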

You can read more about method 1 here and about method 2 here.

Complete Article

You can find the article here.
