I disagree on DeepSeek and innovation. Yes, R1 is obviously a reaction to o1, but its MoE model is pretty innovative, and it is Llama 4 that obviously copied DeepSeek. I do agree that innovation is unpopular in China, but from interviews of DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate.
DeepSeek-V3’s MoE architecture is unusual in having high granularity, 8 active experts rather than the usual 1-2. Llama 4 Maverick doesn’t do that[1]. The closest thing is the recent Qwen3-235B-A22B, which also has 8 active experts.
[1] From the release blog post: “As an example, Llama 4 Maverick models have 17B active parameters and 400B total parameters. … MoE layers use 128 routed experts and a shared expert. Each token is sent to the shared expert and also to one of the 128 routed experts.”
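For concreteness, here is a minimal sketch of what the difference amounts to. It is a generic top-k routed MoE layer in PyTorch; the dimensions, expert counts, and router details are made up for illustration, not the real configs of either model. Maverick-style routing sends each token to 1 routed expert plus a shared expert, while DeepSeek-V3-style fine-grained routing sends each token to 8 (smaller) routed experts.

```python
# Illustrative sketch of token-level top-k MoE routing.
# All sizes and expert counts below are assumptions, not the actual model configs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model, d_ff, n_routed_experts, top_k, use_shared_expert):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_routed_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_routed_experts)
        ])
        # A shared expert processes every token regardless of routing.
        self.shared_expert = (
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            if use_shared_expert else None
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top_k routed experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        if self.shared_expert is not None:
            out = out + self.shared_expert(x)
        return out

# Maverick-style (per the quoted blog post): 128 routed experts,
# each token goes to 1 routed expert plus the shared expert.
maverick_moe = MoELayer(d_model=512, d_ff=2048, n_routed_experts=128,
                        top_k=1, use_shared_expert=True)

# DeepSeek-V3-style fine-grained routing (simplified): many smaller routed experts,
# 8 of them active per token (V3 also keeps a shared expert).
deepseek_moe = MoELayer(d_model=512, d_ff=256, n_routed_experts=64,
                        top_k=8, use_shared_expert=True)

tokens = torch.randn(10, 512)
print(maverick_moe(tokens).shape, deepseek_moe(tokens).shape)
```

The point of the granularity difference is that with 8 smaller active experts per token you get many more possible expert combinations for roughly the same active-parameter budget as 1-2 big experts.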
I would roughly punt it into the category of “optimization”, not “innovation”. “Innovation” is something like transformers, instruct-training, or RL-on-CoTs. MoE scaling is an incremental-ish improvement.
Or, to put it another way: it’s an innovation in the field of compute-optimal algorithms/machine learning. It’s not an AI innovation.
But from interviews of DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate
Yes, and we’re yet to see them succeed. And with the CCP having apparently turned its sights on them, that attempt may be thoroughly murdered already.