Something I’m thinking about today: frontier LLMs have a pretty unusual capabilities profile. This means one of two things. Either I should think of LLMs as leveraging massive amounts of necessary compute, and the problems they can solve as much more compute-vulnerable than I thought they were (i.e. this is Deep Blue, and everything is kind of chess); or multiple-intelligences models are simply true, in that cognition has multiple parts that don’t necessarily have anything to do with each other. The latter predicts that step changes in capabilities are available and that LLMs probably will not scale to superintelligence without architecture changes; the former is agnostic on both propositions. I think there’s like a 60% chance that I will change my mind about something critical in this line of thinking if I think about it harder. If you know what I will change my mind about, please let me know.
Cognition could have multiple parts that have nothing to do with each other, yet evolution (which has no mind at all) still found them. It’s therefore difficult to rule out that scaling some general learning method might find more such parts, even if they aren’t found at lower levels of compute. While rapid scaling continues, it’s hard to tell whether there are genuine obstructions that won’t fall to feasible scale.
An LLM with a chain of thought is a general-purpose computer. In principle it could be implementing any cognitive algorithm, if the learning process gets the weights to do that. So a claim that an LLM can’t do something is really a claim that the learning methods won’t train it to do it. With more scale, learning methods work better. And using RLVR (reinforcement learning with verifiable rewards), it might be possible to teach LLMs to experiment with methods for training themselves to do more things (implementing more parts of cognition in the weights), much more efficiently than evolution did in discovering the human brain.
I agree that this is the relevant consideration. I think that if cognition has many parts, we should actually expect some parts that humans use to be completely missing in LLMs (and vice versa), and it’s not clear to me whether I should expect scaling an architecture to actually produce more parts in this way. I have some intuitions that say (for combinatorial reasons) that within a given architecture, training dynamics will eventually stop favoring the formation of circuits past a certain size, regardless of how many layers you stack, but I am not that confident in these intuitions, so I will need to think about it more. The point about discovery bootstrapping is well taken: I think I’ve been imagining that this sort of discovery shouldn’t be possible while LLMs are totally lacking some parts of human cognition, but if I take multiple intelligences really seriously, then I shouldn’t believe that. Thanks!
I think LLMs have a high proportion of fast/shallow/memorizing circuits, while humans have a high proportion of slow/deep/generalizing circuits. Increasing circuit depth and generalization in transformers requires a kind of phase change (which is actually very similar to phase changes in physics/chemistry; it’s not an inappropriate metaphor), and I’d be slightly shocked if there weren’t an analogous process in human brains (though possibly a more continuous one). Transformers seem to be worse at this phase change than human brains are, so they disproportionately lean on large amounts of data and memorization. This, for me, explains most of the important differences between LLMs and people.