How is this not an excellent example of how, under novel circumstances, inner optimizers (like human brains) can almost all diverge extremely far from the optimization process’s reward function (within-generation increase in allele frequencies) while pursuing other rewards? Serial sperm donors number perhaps hundreds out of billions; everyone else forfeits the within-generation fitness gains donation would buy (if forfeiting >10,000% is not diverging far, what would be?) in favor of whatever it is they enjoy doing while very busy not ever donating sperm.
I think it’s inappropriate to use technical terms like “reward function” in the context of evolution, because evolution’s selection criteria serve vastly different mechanistic functions from, e.g., a reward function in PPO.[1] Calling them both a “reward function” makes it harder to think precisely about the similarities and differences between AI RL and evolution, while invalidly making the two processes seem more similar than they are. Any such similarity is something which must be argued for, not implied through terminology.
The fact that we don’t have standard mechanistic models of optimization via selection (which is essentially what evolution, moral mazes, inadequate equilibria, and multipolar traps all are) is likely a fundamental source of confusion when trying to get people on the same page about the dangers of optimization and about how relevant evolution is as an analogy.
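For concreteness, the mechanistic difference being pointed at can be sketched in a toy simulation (my own construction, not anything from this thread; the bandit setup, the fitness peak, and all constants are invented for illustration). In PPO-style RL, reward is a scalar that enters the parameter update rule directly; under selection, "fitness" never appears in any update rule at all, it only gates which genomes get copied forward.

```python
# Toy contrast between the two mechanistic roles conflated under "reward
# function" (illustrative only; all setups and constants are hypothetical):
#   (1) RL: reward is a per-step scalar that directly shapes the parameter
#       update (here, a 2-armed bandit with a REINFORCE-style gradient step);
#   (2) evolution: "fitness" never enters an update rule; it only decides
#       which genomes are copied (with mutation) into the next generation.
import math
import random

random.seed(0)

def rl_train(steps=2000, lr=0.1):
    """Reward feeds a gradient update on the policy parameter theta."""
    theta = 0.0  # logit for picking arm 1 over arm 0
    for _ in range(steps):
        p1 = 1 / (1 + math.exp(-theta))
        action = 1 if random.random() < p1 else 0
        reward = 1.0 if action == 1 else 0.0  # arm 1 is the rewarded arm
        # REINFORCE: theta += lr * grad(log pi(action)) * reward.
        theta += lr * (action - p1) * reward
    return theta  # driven positive by the reward signal itself

def evolve(generations=200, pop_size=50, mut=0.1):
    """Fitness only ranks genomes; no genome is ever 'updated' by it."""
    pop = [random.gauss(0, 1) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: sort by distance from the (invented) fitness peak at 3.
        pop.sort(key=lambda g: (g - 3.0) ** 2)
        survivors = pop[: pop_size // 2]
        # Replication with blind mutation; no gradient, no credit assignment.
        pop = [g + random.gauss(0, mut) for g in survivors for _ in range(2)]
    return sum(pop) / len(pop)  # population mean drifts toward the peak
```

Both loops "optimize," but only the first has anything structurally resembling a reward function in the RL sense; in the second, the selection criterion is causally upstream of replication and nothing else.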
> I think it’s inappropriate to use technical terms like “reward function” in the context of evolution, because evolution’s selection criteria serve vastly different mechanistic functions from, e.g., a reward function in PPO. Calling them both a “reward function” makes it harder to think precisely about the similarities and differences between AI RL and evolution, while invalidly making the two processes seem more similar.
And yes, I wish that “reward function” weren’t also used for “the quantity which an exhaustive search RL agent argmaxes.” That’s bad too.
Yeah.