I mostly agree with this, but the key point is not just many more players, but many more methods waiting to be tried. The current SOTA architecture is a MoE LLM with CoT and retrieval augmentation. If the paradigm has hit its limits[1] in a manner similar to the plateau from GPT-4 to GPT-4o, or that of Chinese models,[2] then researchers will likely begin exploring new architectures.
For example, there is neuralese, with a large internal memory and a lack of interpretability. Another candidate is a neuralese black box[3] that chooses the places in the CoT where the main model will pay attention. While the black box can be made to understand the context as well as one wishes, the main model stays fully transparent. A third candidate is Lee's proposal, and a fourth is the architecture first tried by Gemini Diffusion.
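To make the second candidate concrete: here is a minimal NumPy sketch, entirely my own illustration rather than any real proposal, of an opaque "selector" that scores CoT token positions and restricts the main model's attention to the top-k of them. The names (`selector_scores`, `masked_attention`) and all dimensions are invented for the example; the point is only that the CoT tokens themselves stay human-readable while the selector's weights need not be interpretable.

```python
# Toy sketch: a black-box selector picks which CoT tokens the
# transparent main model is allowed to attend to.
import numpy as np

rng = np.random.default_rng(0)

def selector_scores(cot_embeddings, w):
    """Opaque scorer: maps each CoT token embedding to a relevance score."""
    return cot_embeddings @ w  # shape: (n_tokens,)

def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention restricted to selected token positions."""
    scores = keys @ query / np.sqrt(query.shape[0])
    scores = np.where(mask, scores, -np.inf)   # unselected tokens get zero weight
    weights = np.exp(scores - scores[mask].max())
    weights /= weights.sum()
    return weights @ values

d, n_tokens, k = 8, 16, 4
cot = rng.normal(size=(n_tokens, d))   # embeddings of 16 CoT tokens
w = rng.normal(size=d)                 # selector's (uninterpretable) weights
query = rng.normal(size=d)             # main model's current query

top = np.argsort(selector_scores(cot, w))[-k:]  # selector picks k positions
mask = np.zeros(n_tokens, dtype=bool)
mask[top] = True

out = masked_attention(query, cot, cot, mask)
print("selected CoT positions:", sorted(top.tolist()))
```

Because the interface between the two parts is just a set of token positions, the same selector could in principle be trained once and bolted onto different main models.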
In a manner reminiscent of the Fermi paradox, this makes me wonder why none of these approaches has yet led to a new powerful model. Maybe Gemini Diffusion is already finishing its training run and will win the day?
[1] These limits also include trouble fitting context into the attention span: the IMO, which consists of short problems, mostly fell to unreleased LLMs. Ameliorating the limits would likely require a large memory processed deep inside the model, making neuralese internal thoughts a likely candidate.
[2] However, there is DeepSeek V3.1, released on August 20 or 21.
[3] This could also be a separate model working with the CoT only, allowing the black box to be integrated into many different models.