A little bit late, but there are more reasons why I think the evolution analogy is particularly good and better than the selective breeding analogy.
Evolution basically ended up optimizing the brain such that it has desires that were instrumental to genetic fitness. So we end up with these instrumental sub-preferences for high-calorie food or sex. Then we go through a huge shift of the distribution or environment, from a very constrained hunter-gatherer society to a technologically advanced civilization. This isn’t just a random shift but a shift toward an environment with a much larger space of possible actions and outcomes, including options such as radically changing aspects of the environment. So naturally there are now many ways to satisfy our preferences that are superior to the ones available before. For AI it is the same thing: it will go from being the nice assistant in ChatGPT to having options such as taking over, killing us, or running its own technology. It’s essentially guaranteed that there will be better ways to satisfy its preferences without human oversight, out of the control of humans. Importantly, that isn’t actually a distributional shift you can test in any meaningful way. You could either try incremental stuff (giving the rebellious general one battalion at a time) or you could try to trick it into believing it can take over through some honeypot. (Imagine trying to test what a human would do if they were God-Emperor of the galaxy. That would be insane, and the subject wouldn’t believe the scenario.) Both of these are going to fail.
The selective breeding story ignores the distributional shift at the end. It does not account for this being a particular type of distributional shift (from a small action space and an immutable environment to a large action space and a mutable environment). It also doesn’t account for the fact that we can’t actually test a distribution such as being emperor.