The $6 million is disputed by a video arguing that DeepSeek used far more compute than they admit to.
The prior reference is a Dylan Patel tweet from Nov 2024, in the wake of R1-Lite-Preview release:
Deepseek has over 50k Hopper GPUs to be clear.
People need to stop acting like they only have that 10k A100 cluster.
They are omega cracked on ML research and infra management but they aren’t doing it with that many fewer GPUs
DeepSeek explicitly states that:

DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training.
This seems unlikely to be a lie; if it were, the reputational damage would’ve motivated not mentioning the amount of compute at all. But the most interesting thing about DeepSeek-V3 is precisely this claim, that its quality is possible with so little compute.
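For a quick sanity check on the arithmetic behind the headline number (a minimal sketch: the ~$2 per GPU-hour rental rate is the assumption DeepSeek’s own report uses, and the 2048-GPU cluster size is purely illustrative):

```python
# Sanity check: how the stated 2.788M H800 GPU-hours maps to dollars.
# Assumption: ~$2 per GPU-hour H800 rental rate (the rate the DeepSeek-V3
# report itself assumes); the cluster size below is illustrative only.

gpu_hours = 2.788e6      # H800 GPU-hours for the full training run (stated)
rate_usd = 2.0           # assumed rental price per GPU-hour

cost = gpu_hours * rate_usd
print(f"Implied training cost: ${cost / 1e6:.3f}M")   # -> $5.576M

# Wall-clock time on a hypothetical 2048-GPU cluster:
cluster_gpus = 2048
days = gpu_hours / cluster_gpus / 24
print(f"~{days:.0f} days on {cluster_gpus} GPUs")     # -> ~57 days
```

So the GPU-hours figure and the ~$6 million figure are consistent with each other under an ordinary rental-rate assumption.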
Certainly designing the architecture, the data mix, and the training process that made this possible required much more compute than the final training run, so in total it cost much more than $6 million to develop. And the 50K H100/H800 system is one way to go about that, though renting a bunch of 512-GPU instances from various clouds probably would’ve sufficed as well.
I see, thank you for the info!

I don’t actually know much about DeepSeek-V3; I just felt that if I pointed out the $6 million claim in my argument, I shouldn’t hide the fact that I had watched a video that made me doubt it. I wanted to include the video as a caveat just in case the $6 million figure was wrong.

Your explanation suggests the $6 million is still in the ballpark (for the final training run), so the concerns about a “software-only singularity” are still very realistic.