It seems likely to me that “driving a car”, used here as a core example, actually took something like billions of person-years, including tens of millions of fatalities, to get to the stage it is at today.
Some specific human, after being raised for more than a dozen years in a social background that includes frequent exposure to car-driving behaviour both in person and in media, is already somewhat primed to learn how to safely drive a vehicle designed for human use within a system of road rules and infrastructure customized over more than a century to human skills, sensory modalities, and background culture. All the vehicles, roads, markings, signs, and rules have been designed and redesigned so that humans aren’t as terrible at learning how to navigate them as they were in the first few decades.
Many early operators of motor vehicles (adult, experienced humans) frequently did things that were frankly insane by modern standards, things that would send an AI driving research division back to the drawing board if their software did them even once.
I agree with this, and I’d split your point into three separate factors. The “30 hours to learn to drive” comparison hides at least:

1. Pretraining: evolutionary pretraining of our visual/motor systems, plus years of everyday world experience.
2. Environment/institution design: a car/road ecosystem (infrastructure, norms, licensing) that has been iteratively redesigned for human drivers.
3. Reward functions: these do matter for sample efficiency, but they don’t seem to be the main driver of the gap here.

Remove (1) and (2) and the picture changes: a blind person can’t realistically learn to drive safely in the current environment, and a new immigrant who speaks no English can’t pass the UK driving theory test without first learning English or Welsh, because of language policy, not because their brain’s reward function is worse. A large part of the sample-efficiency gap here seems to come from pretraining and environment/institution design, rather than from humans having magically “better reward functions” inside a tabula-rasa learner.
To be clear, this is not my position. I think humans can do lots of things LLMs can’t, e.g. found, grow, and run innovative companies from scratch, but not because of their reward functions. Likewise, I think a quite simple reward function would be sufficient for (misaligned) ASI with capabilities light-years beyond both humans and today’s LLMs. I have some discussion here & here.
Thanks for the clarification and the links. My guess is that the real crux is how far “reward-function design” can go, even in principle. If a very capable RL system in an open-ended environment heavily optimizes a single scalar, then Goodhart effects, overoptimization, and mesa-optimization tend to push it toward extreme solutions. On that picture, reward functions are just one axis in a larger design space (pretraining, environment/constraints, interpretability, multi-objective structure, etc.), and no cleverly designed scalar reward on its own looks like a stable solution.
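The Goodhart dynamic can be sketched with a toy model. Everything here is an illustrative assumption, not anyone’s actual proposal: a proxy reward that correlates with true utility at moderate optimization levels, and a naive hill-climber that optimizes only the proxy, ending up somewhere the true utility is deeply negative.

```python
import random

random.seed(0)

# Hypothetical toy objectives: the true utility peaks at moderate x,
# while the scalar proxy keeps rising without bound.
def true_utility(x):
    return x - 0.1 * x**2   # maximized at x = 5, negative past x = 10

def proxy_reward(x):
    return x                # correlates with true utility only for small x

# Naive hill-climbing that optimizes the proxy alone.
x = 0.0
for _ in range(2000):
    step = random.gauss(0, 0.5)
    if proxy_reward(x + step) > proxy_reward(x):
        x += step

# Heavy optimization of the proxy drives x far past the true-utility peak.
print(f"x = {x:.1f}, proxy = {proxy_reward(x):.1f}, true = {true_utility(x):.1f}")
```

The point of the sketch is only that “correlated at moderate optimization pressure” is not “safe under heavy optimization pressure”: the optimizer never sees the true objective, so nothing in the loop stops it from sailing past the peak.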