[Question] How Does the Human Brain Compare to Deep Learning on Sample Efficiency?

My impression is that within-lifetime human learning is orders of magnitude more sample-efficient than large language model training, but there are numerous caveats to this:

  1. We don’t have “an ecological evaluation objective” for language models (they weren’t actually optimised for the downstream language usage tasks on which we compare them to humans)

  2. Insofar as we do have an ecological evaluation objective (predictive loss on a held-out test set), language models are already very much superhuman, and apparently even GPT-1 was superhuman at next-token prediction

    • Though for similar reasons, next-token prediction is not an ecological training objective for humans

      • Humans who specialised at next-token prediction (the way some humans specialise at chess) might show markedly different results

  3. It’s plausible that most of the optimisation involved in producing the brain happened over the course of our evolutionary history, and that within-lifetime human learning is more analogous to fine-tuning than to training from scratch.

#3 notwithstanding, I’m curious whether we have any robust estimates of how within-lifetime human learning compares to deep learning on sample efficiency across various tasks of interest.
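To make the language case concrete, here is a minimal back-of-envelope sketch, where every figure is an assumption rather than a measurement: roughly 20,000 words of speech and text encountered per day over ~20 years for the human, versus roughly 3e11 training tokens for a GPT-3-scale model and ~1e13 for more recent frontier-scale models.

```python
# Rough back-of-envelope comparison of linguistic "training data" volumes.
# Every figure below is an order-of-magnitude assumption, not a measurement.

human_words_per_day = 20_000   # assumed words heard/read per day
human_years = 20               # assumed years of within-lifetime language exposure
tokens_per_word = 1.3          # assumed tokenizer overhead
human_tokens = human_words_per_day * 365 * human_years * tokens_per_word

llm_training_tokens = {
    "GPT-3-scale (~3e11 tokens, assumed)": 3e11,
    "frontier-scale (~1e13 tokens, assumed)": 1e13,
}

print(f"Human linguistic exposure: ~{human_tokens:.1e} tokens")
for name, tokens in llm_training_tokens.items():
    print(f"{name}: ~{tokens / human_tokens:,.0f}x the human exposure")
```

On these assumptions the gap for language alone is roughly three to five orders of magnitude, which is exactly the kind of figure I would like to see pinned down (or corrected) more rigorously.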


Why Does This Matter?

The brain is known to be very energy efficient compared to GPUs of comparable processing power.
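For concreteness, here is a minimal sketch of the usual comparison, where every number is an assumed round figure: ~20 W for the brain, ~700 W for a single modern data-centre GPU, and ~1e15 operations per second for each (the brain figure is a commonly cited point estimate that is uncertain by several orders of magnitude; the GPU figure is roughly dense low-precision throughput).

```python
# Crude ops-per-joule comparison; every figure is an assumed round number.
brain_watts = 20          # typical resting-power estimate for the human brain
brain_ops_per_s = 1e15    # commonly cited point estimate, uncertain by orders of magnitude

gpu_watts = 700           # rough TDP of a modern data-centre GPU
gpu_ops_per_s = 1e15      # rough dense low-precision throughput

brain_ops_per_joule = brain_ops_per_s / brain_watts
gpu_ops_per_joule = gpu_ops_per_s / gpu_watts

print(f"Brain: ~{brain_ops_per_joule:.1e} ops/J")
print(f"GPU:   ~{gpu_ops_per_joule:.1e} ops/J")
print(f"Advantage: ~{brain_ops_per_joule / gpu_ops_per_joule:.0f}x in the brain's favour")
```

On these (contestable) numbers the brain comes out a few tens of times more efficient per operation, before accounting for cooling and datacentre overhead.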

However, energy efficiency is a much less taut constraint for human engineering than it was for biology (electricity has far higher throughput than ATP, and we have a much larger energy budget). This relative energy abundance will likely remain the case (or indeed intensify) as AI systems become more capable.

Thus, the brain’s energy efficiency does not provide much evidence about whether advanced AGI will be neuromorphic.

On the other hand, it seems very plausible that data efficiency is part and parcel of general intelligence. It may even be that sufficiently powerful systems would necessarily be more data-efficient than the brain.

If deep learning is sufficiently less data-efficient than the brain, that may be evidence that deep learning won’t produce existentially dangerous systems.

We may thus have reason not to expect deep learning to scale to superhuman general intelligence.