But I hope you also agree that AI misbehavior and AI existential risk are qualitatively different.
Only in the sense that sufficiently large quantitative differences are qualitative differences. There’s not a fundamental difference in motivation between the Soviet manager producing worthless goods that will let them hit quota and the RL agent hitting score balloons instead of trying to win the race. AI existential risk is just AI misbehavior scaled up sufficiently: the same dynamic might lead an AI managing the global health system to make undesirable and unrecoverable changes to all humans.
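(To make the balloon dynamic concrete, here’s a toy sketch in Python; the names and numbers are invented for illustration, not anyone’s actual training setup. The point is that the optimizer is doing exactly its job, and the failure lives entirely in the gap between the proxy reward and the intended one.)

```python
# Toy illustration of reward misspecification: the proxy reward we
# programmed (points for hitting balloons) diverges from the goal we
# intended (finishing the race). All names here are hypothetical.

def proxy_reward(state):
    """What we actually programmed: points per balloon hit."""
    return 100 * state["balloons_hit"]

def intended_reward(state):
    """What we actually wanted: finish the race."""
    return 1000 if state["finished"] else 0

# Two candidate behaviors the optimizer could discover.
loop_on_balloons = {"balloons_hit": 20, "finished": False}
race_to_finish = {"balloons_hit": 2, "finished": True}

# The agent optimizes the proxy, so it prefers looping forever.
best = max([loop_on_balloons, race_to_finish], key=proxy_reward)
assert best is loop_on_balloons   # proxy says: circle the balloons
assert intended_reward(best) == 0  # ...while the real goal goes unmet
```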
And this is where I feel the existing arguments fall flat, and where practical demonstrations are required.
It seems to me like our core difference is that I look at a simple system and ask “what will happen when the simple system is replaced by a more powerful system?”, and you look at a simple system and ask “how do I replace this with a more powerful system?”
For example, it seems to me possible that someone could write code that is able to reason about code, and then use the resulting program to find security vulnerabilities in important systems, and then take control of those systems. (Say, finding a root exploit in server OSes and using this to hack into banks to steal info or funds.)
I don’t think there currently exist programs capable of this; I’m not aware of much that’s more complicated than optimizing compilers, or AI ‘proofreaders’ that detect common programmer mistakes (which hopefully wouldn’t be enough!). Demonstrating code that could do that would represent a major advance, and the underlying insights could be retooled to lead to significant progress in other domains. But that it doesn’t exist now doesn’t mean that it’s science fiction that might never come to pass; it just means we have a head start on thinking about how to deal with it.
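(For a sense of what I mean by ‘proofreaders’ that detect common programmer mistakes: here’s a minimal sketch using Python’s standard ast module to flag one classic bug, the mutable default argument. Real linters are far more thorough, but the distance between this kind of pattern-matching and autonomously finding root exploits is exactly the distance I have in mind.)

```python
import ast

# Hypothetical input: a snippet containing a classic Python mistake.
SOURCE = """
def log_event(event, history=[]):  # bug: mutable default argument
    history.append(event)
    return history
"""

def find_mutable_defaults(source):
    """Flag function defaults that are mutable literals ([], {}, set)."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for default in node.args.defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    warnings.append(
                        f"line {default.lineno}: mutable default in {node.name}()"
                    )
    return warnings

for warning in find_mutable_defaults(SOURCE):
    print(warning)  # -> line 2: mutable default in log_event()
```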