If people are reading this thread and want to read this argument in more detail: the (excellent) book ‘The Secret of Our Success’ by Joseph Henrich (Slate Star Codex review/summary here: https://slatestarcodex.com/2019/06/04/book-review-the-secret-of-our-success/) makes this argument in a very compelling way. There is a lot of support for the idea that the crucial ‘Rubicon’ separating chimps from people is cultural transmission, which enables the gradual evolution of strategies over periods longer than an individual lifetime, rather than any ‘raw’ problem-solving intelligence. In fact, according to Henrich, humans are actually worse than chimps on some measures of raw intelligence: chimps have better working memory and faster reactions for complex tasks in some cases, and they are better than people at finding Nash equilibria which require randomising your strategy. But humans are uniquely able to learn behaviours from demonstration and to form larger groups, which enables the gradual accumulation of ‘cultural technology’, which in turn allowed a runway of cultural-genetic co-evolution (e.g. food processing technology → smaller stomachs and bigger brains → even more culture → bigger brains being even more of an advantage, etc.).
It’s hard to appreciate how much this kind of thing helps you think; for instance, most people can learn maths but few would have invented Arabic numerals by themselves. Similarly, having a large brain is not actually that useful by itself without the cultural superstructure: most people alive today would quickly die if dropped into the ancestral environment without the support of modern culture, unless they could learn from hunter-gatherers (see Henrich for many examples of this happening to European explorers!). For instance, I like to think I’m a pretty smart guy, but I have no idea how to make e.g. bronze or stone tools, and it’s not obvious that my physics degree would help me figure it out!
Henrich also makes the case for the importance of this with some slightly chilling examples of cultures that lost their ability to make complex technology (e.g. boats) when they fell below a critical population size and became isolated.
It’s interesting to consider the implications for AI, though I’m not very sure about them. On the one hand, LLMs clearly have a superhuman ability to memorise facts, but I’m not sure if this means they can learn new tasks or information particularly easily. On the other hand, it seems likely that LLMs are taking pretty heavy advantage of the ‘culture overhang’ of the internet! I don’t know if it really makes sense to think of their abilities here as strongly superhuman: if you magically had the compute and code to try to train GPT-n in 1950, it’s not obvious you could have got it to do very much without the internet for it to absorb.
I think it’s pretty intuitive that dropout inhibits superposition; there’s a sense in which superposition seems like a ‘delicate’ phenomenon, and adding random noise obviously raises the noise floor of what the model can represent. It might be possible to make some more quantitative predictions about this in the ReLU output model, which would be cool, though maybe not that important relative to doing more on the effect of dropout on superposition in more realistic models.
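To make the ‘noise floor’ intuition a bit more concrete, here is a rough sketch of the kind of quantitative prediction I have in mind. Everything in it is my own assumption rather than anything from a trained model: the weights are hand-picked (five unit vectors evenly spaced in a 2D hidden space), dropout is standard inverted dropout on the hidden layer, only one feature is active, and I only look at the pre-activation of another feature, ignoring the bias and the ReLU. Under those assumptions the dropout noise on an inactive feature j when feature a is on should have variance p/(1-p) · Σ_i (W_ij · W_ia)², and the function names below are just made up for illustration.

```python
import math
import random


def pentagon_weights(n_features=5):
    """Hypothetical hand-picked weights: n unit vectors evenly spaced in 2D.

    W[i][j] is the weight between feature j and hidden unit i.
    """
    angles = [2 * math.pi * k / n_features for k in range(n_features)]
    return [[math.cos(a) for a in angles], [math.sin(a) for a in angles]]


def predicted_noise_std(W, active, target, p_drop):
    """Predicted std of dropout noise on the pre-activation of `target`
    when only feature `active` is on (value 1), under inverted dropout."""
    s = sum((row[target] * row[active]) ** 2 for row in W)
    return math.sqrt(p_drop / (1.0 - p_drop) * s)


def simulated_noise_std(W, active, target, p_drop, n_samples=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    # Pre-activation without dropout: sum_i W[i][target] * h_i, with h_i = W[i][active].
    clean = sum(row[target] * row[active] for row in W)
    total = 0.0
    for _ in range(n_samples):
        pre = 0.0
        for row in W:
            # Keep each hidden unit with probability 1 - p_drop, rescaling survivors.
            if rng.random() >= p_drop:
                pre += row[target] * row[active] / (1.0 - p_drop)
        total += (pre - clean) ** 2
    return math.sqrt(total / n_samples)


if __name__ == "__main__":
    W = pentagon_weights()
    identity = [[1.0, 0.0], [0.0, 1.0]]  # two features, no superposition
    for p in (0.0, 0.1, 0.2, 0.5):
        print(p,
              predicted_noise_std(W, 0, 1, p),
              simulated_noise_std(W, 0, 1, p),
              predicted_noise_std(identity, 0, 1, p))
```

In this toy setup the axis-aligned, non-superposed case gets no cross-feature noise from dropout at all, while the five-features-in-two-dimensions case gets noise that grows with the dropout rate. This says nothing about what a trained model would actually do in response, which is the more interesting question.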