OpenAI delivering an iterative update rather than a revolutionary one has lengthened many people’s timelines. My take is that this incentivizes many more players to try for the frontier. xAI’s Grok has gone from non-existent in 2022 to a leading model, and improvements to the newest version of Grok are rolling out far more frequently than at other leading companies. Nvidia has also recently begun releasing larger open-source models along with the accompanying datasets. Meta is another player that is now all-in. The failure of Llama and the moderate updates from OpenAI likely pushed Zuckerberg into realizing that his favored approach of relentless A/B testing at scale could work. Twenty-nine billion dollars for new datacenters and huge payouts for top minds are a beacon for sovereign wealth and hedge funds to notice that the science-fiction reality is now here. When the prize is up for grabs, much more capital will be thrown into the arena than if the winner were a foregone conclusion.
So my timelines have shortened due to market sentiment and dawning realizations rather than benchmarks improving. While tech stocks may fall, bubbles may burst, and benchmarks could stagnate, I still believe the very idea of taking the lead in AGI trumps all.
I mostly agree with this, but the key aspect is not just many more players, but many more methods waiting to be tried. The SOTA AI architecture is a MoE LLM with a CoT and retrieval augmentation. If the paradigm has hit its limits[1] in a manner similar to the plateau from GPT-4 to GPT-4o, or to the Chinese models,[2] then researchers will likely begin to explore new architectures.
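For concreteness, here is a minimal sketch of the MoE part of that stack: a router picks the top-2 experts per token and mixes their outputs. The layer sizes, expert count, and plain softmax over the selected logits are illustrative assumptions; real frontier models add load-balancing losses and much larger experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Minimal sparse MoE sketch: route each token to its top-k experts."""

    def __init__(self, d_model=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: [tokens, d_model]
        gate_logits = self.router(x)                         # [tokens, n_experts]
        weights, chosen = gate_logits.topk(self.k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                routed = chosen[:, slot] == e                # tokens sent to expert e
                if routed.any():
                    out[routed] += weights[routed, slot:slot + 1] * expert(x[routed])
        return out

y = TopKMoELayer()(torch.randn(16, 512))                     # toy forward pass
```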
For example, there is neuralese with a big internal memory and a lack of interpretability. Another potential candidate is a neuralese black box[3] that chooses the places in the CoT where the main model will pay attention. While the black box can be constructed to understand the context as well as one wishes, the main model stays fully transparent. A third potential candidate is Lee’s proposal. And a fourth candidate is the architecture first tried by Gemini Diffusion.
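One possible reading of that second candidate (my interpretation, not a published design; the module name, the GRU encoder, and the fixed budget of kept positions are all assumptions) is a small opaque network that scores CoT positions and hands the main model a hard attention mask:

```python
import torch
import torch.nn as nn

class CoTAttentionSelector(nn.Module):
    """Hypothetical neuralese black box: reads CoT token embeddings and
    outputs a hard mask of the positions the main model may attend to.
    Its internal state is opaque, but its output is just a set of
    positions, so the main model's use of the CoT stays inspectable."""

    def __init__(self, d_model=512, d_hidden=256, keep=32):
        super().__init__()
        self.keep = keep
        self.encoder = nn.GRU(d_model, d_hidden, batch_first=True)  # opaque internal memory
        self.scorer = nn.Linear(d_hidden, 1)                        # relevance score per position

    def forward(self, cot_embeddings):                 # [batch, cot_len, d_model]
        states, _ = self.encoder(cot_embeddings)
        scores = self.scorer(states).squeeze(-1)       # [batch, cot_len]
        k = min(self.keep, scores.shape[-1])
        keep_idx = scores.topk(k, dim=-1).indices      # positions worth attending to
        mask = torch.zeros_like(scores)
        mask.scatter_(1, keep_idx, 1.0)
        return mask.bool()                             # fed to the main model's attention

selector = CoTAttentionSelector()
mask = selector(torch.randn(1, 200, 512))              # pick 32 of 200 CoT positions
```

Note that the hard top-k selection is non-differentiable, so such a design would presumably need a straight-through or RL-style training signal, which is part of why it is harder to train than plain CoT.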
In a manner similar to the Fermi paradox, this makes me wonder why none of these approaches has led to the creation of powerful new models. Maybe Gemini Diffusion is already finishing its training run and will win the day?
[1] These limits also include trouble with fitting context into the attention span, since the IMO, which consists of short problems, mostly fell to unreleased LLMs. Ameliorating the limits would likely require large memory processed deep inside the model, making neuralese internal thoughts a likely candidate.
[2] However, there is DeepSeek V3.1, released on August 20 or 21.
[3] This could also be a separate model working with the CoT only, allowing the black box to be integrated into many different models.