Update: the weights and paper are out. Tweet thread, GitHub, report (GitHub, HuggingFace). It’s big and mixture-of-experts-y; thread on notable stuff.
It was super cheap to train — they say 2.8M H800-hours or $5.6M.
It’s powerful:
It’s cheap to run:
This is deepseek-v3 not r1?
oops thanks
Update: the weights and paper are out. Tweet thread, GitHub, report (GitHub, HuggingFace). It’s big and mixture-of-experts-y; thread on notable stuff.
It was super cheap to train — they say 2.8M H800-hours or $5.6M.
It’s powerful:
It’s cheap to run:
This is deepseek-v3 not r1?
oops thanks