Relative to GPT-4o, which was trained at a time when 30K-H100 clusters were around, and so in BF16 could be expected to be around 8e25 FLOPs, possibly overtrained to a degree not too different from DeepSeek-V3 itself.
Amodei’s post you linked says a few tens of millions of dollars for Claude 3.5 Sonnet, which is maybe 4e25 FLOPs in BF16. But I think Claude 3.5 Sonnet is better than DeepSeek-V3, whereas GPT-4o and DeepSeek-V3 are closer in quality, which makes them easier to compare. Being better than GPT-4o with 2x fewer FLOPs, Claude 3.5 Sonnet has at least a 4x compute multiplier over it (under all these assumptions), but not necessarily more, while with DeepSeek-V3 there’s evidence for more. As DeepSeek-V3 was trained more than half a year later than Claude 3.5 Sonnet, it’s somewhat “on trend” in getting a compute multiplier of 16x instead of 4x, if we anchor to Amodei’s claim of 4x per year.
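A rough sketch of the arithmetic, to make the comparison explicit. All FLOP figures are the estimates above, the 4x-per-year trend is Amodei’s claim, and the 0.75-year gap is a hypothetical stand-in for “more than half a year”:

```python
# Assumed pretraining compute (BF16), per the estimates in this thread:
gpt4o_flops = 8e25    # GPT-4o, ~30K-H100-cluster era
sonnet_flops = 4e25   # Claude 3.5 Sonnet, "a few tens of millions of dollars"

# Sonnet is better than GPT-4o despite half the raw FLOPs:
flop_ratio = gpt4o_flops / sonnet_flops   # 2.0

# Anchoring to Amodei's ~4x/year compute-multiplier trend, a model trained
# ~0.75 years later (hypothetical gap) would be expected to gain roughly:
trend_per_year = 4.0
gap_years = 0.75
expected_gain = trend_per_year ** gap_years   # ~2.8x

# The multipliers discussed above: Sonnet at >=4x over GPT-4o,
# DeepSeek-V3 at ~16x, i.e. a 16/4 = 4x gain over Sonnet's multiplier,
# which is roughly in line with the trend prediction.
implied_gain = 16 / 4   # 4.0
```

This is only a consistency check on the stated numbers, not an independent estimate; the conclusion is just that a 4x gain over a bit more than half a year is close to what a 4x-per-year trend would predict.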