The whole cortex is (more-or-less) a uniform randomly-initialized learning algorithm, and I think it’s basically the secret sauce of human intelligence.
I’m a bit surprised that you view the “secret sauce” as being in the cortical algorithm. My (admittedly quite hazy) view is that the cortex seems to be doing roughly the same “type of thing” as transformers, namely, building a giant predictive/generative world model. Sure, maybe it’s doing so more efficiently—I haven’t looked into all the various comparisons between LLM and human lifetime training data. But I would’ve expected the major qualitative gaps between humans and LLMs to come from the complete lack of anything equivalent to the subcortical areas in LLMs. (But maybe that’s just my bias from having worked on basal ganglia modeling and not the cortex.) In this view, there’s still some secret sauce that current LLMs are missing, but AGI will likely look like some extra stuff stapled to an LLM rather than an entirely new paradigm. So what makes you think that the key difference is actually in the cortical algorithm?
(If one of your many posts on the subject already answers this question, feel free to point me to it.)