System 2 thinking is exactly what many people are trying to add to LLMs via mechanisms like think-step-by-step prompting, scratchpads, graph-of-thoughts, and lots of other agentic scaffolding.
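The scaffolding idea can be sketched in a few lines: keep an explicit scratchpad of intermediate steps, feed it back to the model, and loop until an answer appears. The `call_llm` function below is a hypothetical stand-in that replays canned steps, not any real API.

```python
# Minimal scratchpad scaffold sketch. call_llm is a hypothetical stand-in
# for a real model call; here it replays canned reasoning steps.

def call_llm(prompt: str) -> str:
    canned = {0: "Step: split 17 * 24 into 17 * 20 + 17 * 4.",
              1: "Step: 17 * 20 = 340 and 17 * 4 = 68, so 340 + 68.",
              2: "ANSWER: 408"}
    # Pretend the model conditions on how many steps are already written.
    return canned[prompt.count("Step:")]

def solve_with_scratchpad(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\nThink step by step.\n"
    for _ in range(max_steps):
        step = call_llm(scratchpad)
        scratchpad += step + "\n"   # System 2: accumulate explicit steps
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
    return "no answer"

print(solve_with_scratchpad("What is 17 * 24?"))  # → 408
```

Graph-of-thoughts and other agentic schemes elaborate this same loop with branching, scoring, and backtracking, but the core is still explicit intermediate text fed back into the model.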
Integration of the senses, and the lack of physical, mechanical, or intuitive understanding of many aspects of the world, is solvable by training a truly multi-modal model on a variety of other media: images, audio, video, motion logs, robot-manipulation logs, and so on. Judging from their publications, Google DeepMind has clearly been working in this direction for years.
Learning during problem solving requires feeding data back from inference runs to future training runs. While there are user privacy issues here, in practice most AI companies do this as much as they can. But the cycle length is long, and (from a capabilities point of view) it would be nice to shorten it by some form of incremental training.
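The cycle described above can be sketched as a toy pipeline: inference transcripts are scrubbed and queued, and a later training step drains the queue. All names here (`Transcript`, `scrub_pii`, `incremental_update`) are hypothetical illustrations, and the scrubber is deliberately crude.

```python
# Toy sketch of the inference -> training feedback cycle, with a crude
# privacy filter applied before transcripts enter the training buffer.
import re
from dataclasses import dataclass

@dataclass
class Transcript:
    prompt: str
    completion: str

def scrub_pii(text: str) -> str:
    # Illustrative only: redact e-mail addresses.
    return re.sub(r"\b[\w.]+@[\w.]+\b", "[EMAIL]", text)

training_buffer: list[Transcript] = []

def log_inference(t: Transcript) -> None:
    """Inference side: scrub and enqueue data for a future training run."""
    training_buffer.append(
        Transcript(scrub_pii(t.prompt), scrub_pii(t.completion)))

def incremental_update(buffer: list[Transcript]) -> int:
    """Training side: a real system would fine-tune the model here;
    this sketch just drains the buffer and reports how much it consumed."""
    n = len(buffer)
    buffer.clear()
    return n

log_inference(Transcript("Contact me at ada@example.com", "Sure."))
print(training_buffer[0].prompt)            # → Contact me at [EMAIL]
print(incremental_update(training_buffer))  # → 1
```

Shortening the cycle amounts to calling something like `incremental_update` continuously on a small buffer instead of batching months of transcripts into the next big training run.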
System 2 thinking that takes a detour through tokens is fundamentally limited compared to a process that continuously modifies a rich, high-dimensional vector representation.
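The bottleneck can be made concrete with a toy calculation: collapsing a hidden state to a single discrete token and re-embedding it throws away nearly all of the vector, while a latent update keeps it intact. The matrices below are random stand-ins, not weights from any real model.

```python
# Toy comparison: routing a "thought" through one discrete token vs.
# keeping it as a latent vector. Random matrices stand in for real weights.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 256, 1000
W_out = rng.normal(size=(d_model, vocab))  # hidden -> logits (toy)
W_emb = rng.normal(size=(vocab, d_model))  # token -> embedding (toy)

h = rng.normal(size=d_model)               # a hidden "thought" vector

# Detour through tokens: collapse the state to one discrete symbol...
token = int(np.argmax(h @ W_out))
# ...and re-embed it. Everything not captured by that one token is lost.
h_roundtrip = W_emb[token]

# Staying in latent space: the full vector passes through unchanged.
h_latent = h.copy()

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"cosine(h, token round-trip) = {cos(h, h_roundtrip):.3f}")
print(f"cosine(h, latent)           = {cos(h, h_latent):.3f}")  # → 1.000
```

A chain of thought emits many tokens, so it recovers some of this loss, but each step is still squeezed through a log2(vocab)-bit channel rather than a full d_model-dimensional state.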
Integration of senses will happen, but is the information density of non-text modalities high enough to contribute to the intelligence of future models?
What I call “learning during problem solving” relies on the ability to extract a lot of information from a single problem: to investigate and understand the structure of this one problem, and, in the process, to build a representation of it that can be leveraged in the future to solve problems with similar aspects.