All of this is interesting, but it seems to me that you did not make a strong case for the brain using a universal learning machine as its main system.
Specifically, I think you fail to address the evidence for evolved modularity:
The brain uses spatially specialized regions for different cognitive tasks.
This specialization pattern is mostly consistent across different humans and even across different species.
Damage to or malformation of some brain regions can cause specific forms of disability (e.g. face blindness). Sometimes the disability can be overcome but often not completely.
In various mammals, infants are capable of complex behavior straight out of the womb. Human infants exhibit only very simple behaviors and require many years to reach full cognitive maturity, so the human brain relies more on learning than the brains of other mammals. But the basic architecture is the same, so this is a difference of degree, not kind.
It seems more likely that if there is a general-purpose “universal” learning system in the human brain then it is used as an inefficient fall-back mechanism when the specialized modules fail, not as the core mechanism that handles most of the cognitive tasks.
I’m also wary of using the recent successes of deep learning to draw inferences about how the brain works.
Beware of the “ELIZA effect”: due to our over-active agency detection ability, we tend to anthropomorphize the behavior of even very simple AI systems.
There seems to be a trend in AI where for any technique that is currently hot there are people who say: “This is how the brain works. We don’t know all the details, but studies X, Y and Z clearly point in this direction.” After a few years and maybe an AI (mini)winter the brain seems to work in another way...
Specifically on deep learning:
For all the speculation, there is still no clear evidence that the brain uses anything similar to backpropagation.
Some of the most successful deep learning approaches, such as modern convnets for computer vision, rely on quite un-biological features such as weight sharing and rectified linear units.
“Deep learning” is a quite vague term anyway; it does not refer to any single algorithm or architecture. In fact, there are so many architectural variants and hyper-parameters that need to be adapted to each specific task that optimizing them can be considered a non-trivial learning problem on its own.
Perhaps most importantly, deep learning methods generally work in supervised learning settings and they have quite weak priors. They require a dataset as big as ImageNet to yield good image recognition performance (still with some characteristic error patterns), or a parallel corpus of millions of sentence pairs to yield sub-human machine translation quality, or days of continuous simulated gameplay on the ATARI 2600 emulator to obtain good scores (super-human for some games, sub-human for others). Clearly humans are able to learn effectively from a much smaller amount of evidence, indicating stronger priors and the ability to exploit minimal supervision.
Therefore I would say that deep learning methods, while certainly interesting from an engineering perspective, are probably not very relevant to the understanding of the brain, at least given the current state of the evidence.
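For readers who have not met the two features I called un-biological above, here is a minimal sketch of what weight sharing and rectified linear units amount to. It is a toy 1D example in plain Python, not how any real convnet library implements them, and the names `relu` and `conv1d_shared` are made up for illustration.

```python
def relu(x):
    # Rectified linear unit: pass positive values through, clamp negatives to zero.
    return max(0.0, x)


def conv1d_shared(signal, kernel):
    # Weight sharing: the SAME kernel weights are applied at every position
    # of the input, instead of each position having its own weights.
    k = len(kernel)
    return [
        relu(sum(w * signal[i + j] for j, w in enumerate(kernel)))
        for i in range(len(signal) - k + 1)
    ]


signal = [0.0, 1.0, 3.0, 2.0, 0.0]
kernel = [1.0, -1.0]  # one pair of weights, reused at all four positions
features = conv1d_shared(signal, kernel)
print(features)  # [0.0, 0.0, 1.0, 2.0]
```

The point of the sketch is that the two numbers in `kernel` are the only parameters, however long the input is. A biological analogue would need many spatially separate synapses to hold exactly identical weights, and it is not obvious what mechanism would enforce that.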