The reason there was a sharp left turn in the evolution of humans is that evolution was putting a tiny amount of compute into hyperparameter search, compared to the compute that brains were using for learning over their lifetimes. But all of the learning that the brains were doing was getting thrown out at the end of each life. So there was a huge low-hanging-fruit opportunity: stop throwing out the results of most of the compute usage. It’s like having giant, overpowered functions doing computations, but the part where they pass their results to the next function is broken.
Eventually humans developed enough culture that human civilization was able to preserve a small fraction of the learning accumulated by brains over their lifetimes. This effectively unlocked a gigantic long-term learning-accumulation process that had been sitting right there, latent. And as soon as it came online, it started out with around 6 orders of magnitude more compute than the only long-term learning process up until that point: evolution.
But modern ML doesn’t have anything like this huge discrepancy in compute allocation. When we train ML systems, the vast majority of the compute is spent on training. A comparatively small amount is spent (at least currently) on inference, which includes some in-context learning.
Even preserving 100% of the in-context learning would not thereby create or unlock an extended learning process that is much more powerful than SGD, because SGD is still using vastly more compute than that in-context learning. The massive low-hanging fruit that was waiting to be exploited in human evolution is just not there in modern machine learning.
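To make that compute gap concrete, here is a rough back-of-envelope sketch. It uses the standard approximations that training compute is about 6·N·D FLOPs (N parameters, D training tokens) and that a single forward pass over an L-token context costs about 2·N·L FLOPs; the specific values of N, D, and L below are illustrative assumptions, not measurements of any particular system.

```python
# Back-of-envelope comparison of training compute vs. the compute behind
# one in-context-learning episode. Uses the standard approximations
# C_train ~ 6 * N * D FLOPs and C_forward ~ 2 * N * L FLOPs.
# All concrete numbers below are illustrative assumptions.

N = 70e9    # model parameters (assumed)
D = 10e12   # training tokens (assumed)
L = 100e3   # tokens in one long context window (assumed)

train_flops   = 6 * N * D   # one full training run
context_flops = 2 * N * L   # one forward pass over a full context

ratio = train_flops / context_flops   # = 3 * D / L, independent of N
print(f"training / in-context compute ratio ≈ {ratio:.1e}")
# With these assumptions the ratio is ~3e8: SGD gets hundreds of millions
# of times more compute than any single in-context learning episode.
```

The ratio reduces to 3·D/L, so with typical training-set sizes it stays many orders of magnitude above one regardless of model size; there is no analogue of the evolution-vs-lifetime-learning imbalance to exploit.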
My summary: