I would roughly punt it into the category of “optimization”, not “innovation”. “Innovation” is something like transformers, instruct-training, or RL-on-CoTs. MoE scaling is an incremental-ish improvement.
Or, to put it in other words: it’s an innovation in the field of compute-optimal algorithms/machine learning. It’s not an AI innovation.
But from interviews with DeepSeek founder Liang Wenfeng, we know DeepSeek was explicitly an attempt to overcome China’s unwillingness to innovate.
Yes, and we have yet to see them succeed. And with the CCP having apparently turned its sights on them, that attempt may be thoroughly murdered already.