Cognition could have multiple parts that have nothing to do with each other, yet evolution (which has no mind at all) still found them. Thus it’s difficult to rule out that scaling some general learning method might find more such parts, even when they aren’t found at lower levels of compute. While rapid scaling continues, it’s hard to tell whether there are genuine obstructions that won’t fall to feasible scale.
An LLM with a chain of thought is a general-purpose computer. In principle it could implement any cognitive algorithm, if the learning process gets the weights to do that. So a claim that an LLM can’t do something is a claim that the learning methods won’t train it to do it. With more scale, learning methods work better. And using RLVR, it might be possible to teach LLMs to experiment with methods for training themselves to do more things (implementing more parts of cognition in the weights), much more efficiently than evolution did when it discovered the human brain.
I agree that this is the relevant consideration. I think that if cognition has many parts, we should actually expect some parts that humans use to be completely missing in LLMs (and vice versa), and it’s not clear to me whether I should expect scaling an architecture to actually produce more parts in this way. I have some intuitions that say (for combinatorial reasons) that within a given architecture, training dynamics will eventually stop favoring the formation of circuits past a certain size regardless of how many layers you stack, but I am not that confident in these intuitions, so I will need to think about it more. The point about discovery bootstrapping is well taken. I think I’ve been imagining that this sort of discovery shouldn’t be possible while LLMs are totally lacking some parts of human cognition, but if I take multiple intelligences really seriously then I shouldn’t believe that. Thanks!