I agree that this is the relevant consideration. I think that if cognition has many parts, we should actually expect some parts that humans use to be completely missing in LLMs (and vice versa). It's not clear to me whether I should expect scaling the architecture to actually produce more parts in this way; I have some intuitions that say (for combinatorial reasons) that within a given architecture, training dynamics will eventually stop favoring the formation of circuits past a certain size regardless of how many layers you stack, but I am not that confident in these intuitions, so I will need to think about it more. The point about discovery bootstrapping is well taken. I think I've been imagining that this sort of discovery shouldn't be possible while LLMs are totally lacking some parts of human cognition, but if I take multiple intelligences really seriously, then I shouldn't believe that. Thanks!