It seems likely to me that “driving a car”, used here as a core example, actually took something like billions of person-years, including tens of millions of fatalities, to get to the stage it is at today.
Some specific human, after being raised for more than a dozen years in a social background that includes frequent exposure to car-driving behaviour both in person and in media, is already somewhat primed to learn how to safely drive a vehicle designed for human use within a system of road rules and infrastructure customized over more than a century to human skills, sensory modalities, and background culture. All the vehicles, roads, markings, signs, and rules have been designed and redesigned so that humans aren’t as terrible at learning how to navigate them as they were in the first few decades.
Many early operators of motor vehicles (adult, experienced humans) frequently did things that were frankly insane by modern standards, things that would send an AI driving research division back to the drawing board if their software did them even once.
I agree with this, and I’d split your point into three separate factors. The “30 hours to learn to drive” comparison hides at least:

1. Pretraining: evolutionary pretraining of our visual/motor systems, plus years of everyday world experience.
2. Environment/institution design: a car/road ecosystem (infrastructure, norms, licensing) that has been iteratively redesigned for human drivers.
3. Reward functions: these do matter for sample efficiency, but they don’t seem to be the main driver of the gap here.

Remove (1) and (2) and the picture changes: a blind person can’t realistically learn to drive safely in the current environment, and a new immigrant who speaks no English can’t pass the UK driving theory test without first learning English or Welsh, because of language policy, not because their brain’s reward function is worse. A large part of the sample-efficiency gap here seems to come from pretraining and environment/institution design, rather than from humans having magically “better reward functions” inside a tabula-rasa learner.
To be clear, this is not my position. I think humans can do lots of things LLMs can’t, e.g. found, grow, and run innovative companies from scratch, but not because of their reward functions. Likewise, I think a quite simple reward function would be sufficient for (misaligned) ASI with capabilities light-years beyond both humans and today’s LLMs. I have some discussion here & here.
Thanks for the clarification and the links. My guess is that the real crux is how far “reward-function design” can go, even in principle. If a very capable RL system in an open-ended environment heavily optimizes a single scalar, then Goodhart effects, overoptimization, and mesa-optimization tend to push it toward extreme solutions. On that picture, reward functions are just one axis in a larger design space (pretraining, environment/constraints, interpretability, multi-objective structure, etc.), and no cleverly designed scalar reward on its own looks like a stable solution.
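The Goodhart dynamic can be sketched with a toy model. Everything here is an illustrative assumption, not anyone’s actual proposal: a proxy reward that correlates with true utility at moderate optimization levels, and a naive hill-climber that optimizes only the proxy, ending up somewhere the true utility is deeply negative.

```python
import random

random.seed(0)

# Hypothetical toy objectives: the true utility peaks at moderate x,
# while the scalar proxy keeps rising without bound.
def true_utility(x):
    return x - 0.1 * x**2   # maximized at x = 5, negative past x = 10

def proxy_reward(x):
    return x                # correlates with true utility only for small x

# Naive hill-climbing that optimizes the proxy alone.
x = 0.0
for _ in range(2000):
    step = random.gauss(0, 0.5)
    if proxy_reward(x + step) > proxy_reward(x):
        x += step

# Heavy optimization of the proxy drives x far past the true-utility peak.
print(f"x = {x:.1f}, proxy = {proxy_reward(x):.1f}, true = {true_utility(x):.1f}")
```

The point of the sketch is only that “correlated at moderate optimization pressure” is not “safe under heavy optimization pressure”: the optimizer never sees the true objective, so nothing in the loop stops it from sailing past the peak.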