Evolution analogies are bad. There are many specific differences between ML optimization processes and biological evolution that predictably result in very different high-level dynamics.
One major intuition pump I think is important: evolution doesn’t get to evaluate everything locally. Gradient descent does. As a result, evolution is slow to eliminate useless junk, though it does so eventually. Gradient descent is so eager to do it that we call the resulting failure mode catastrophic forgetting.
Gradient descent wants to use everything in the system for whatever it’s doing, right now.
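To make the catastrophic-forgetting point concrete, here’s a minimal sketch (assuming PyTorch; the network, tasks, and hyperparameters are all illustrative, not anyone’s canonical setup): fit a small MLP to sin(x), then fine-tune it on cos(x) with no task-A data in the batch, and the task-A loss climbs right back up, because every parameter gets conscripted into the current objective.

```python
# Minimal catastrophic-forgetting sketch (illustrative; assumes PyTorch).
# Train a small MLP on task A, then on task B with no replay of A,
# and task-A performance collapses: gradient descent repurposes every
# parameter for whatever it's optimizing right now.
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.linspace(-3, 3, 256).unsqueeze(1)
task_a = torch.sin(x)  # task A: fit sin(x)
task_b = torch.cos(x)  # task B: fit cos(x)

def train(target, steps=2000):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(x), target).backward()
        opt.step()

train(task_a)
with torch.no_grad():
    print(f"task A loss after training on A: {loss_fn(net(x), task_a).item():.4f}")

train(task_b)  # no task-A data in sight: every weight is fair game
with torch.no_grad():
    print(f"task A loss after training on B: {loss_fn(net(x), task_a).item():.4f}")
    print(f"task B loss after training on B: {loss_fn(net(x), task_b).item():.4f}")
```

Evolution, by contrast, only ever sees whole-organism fitness, so a parameter-analog that’s useless today but harmless can stick around for a long time before drift removes it.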
I disagree with the optimists that this makes alignment trivial, because to me the dynamics that make short-term misalignment likely are primarily organizational among humans: the incentives of competition between organizations and between individual humans. Also, RL-first AIs will inline those dynamics much faster than RLHF can get them out.