I think one of the main disjunctions is that neither self-improvement, nor high-level intelligence, nor control of the world is a necessary condition for human extinction caused by AI.
Imagine a computer which helps a terrorist design biological viruses. It is not AGI, not self-improving, not an agent; it has no values, and it is local and confined. Yet it could help calculate and create a perfect virus capable of wiping out humanity.
The fact that we have many very different scenarios means that there is (almost) no single intervention which could stop all of them. The exceptions are "destroy all computers" and "create a Singleton based on FAI as soon as possible."
In all other cases we should think not only about a correct AI safety theory, but also about ways to implement it all over the world. For example, we could prove that "multi-level AI boxing" creates enough uncertainty for an AI that it will always believe a real human could punish it for wrongdoing, which would (maybe) result in perfect alignment. But this proof would be useless if we did not also find ways to implement it across the whole AI field. (Likewise, we still cannot defeat computer viruses, even though we know a lot about how to prevent them, because many people invest effort in creating them.)
So we have three unknown and very complex tasks: AI, AI safety, and the delivery of AI safety theory to AI researchers. To solve the last one we need a system model of global AI research, which should show us where to intervene to make global research safer.
The best interventions of this kind will help solve all three hard problems simultaneously.
This is an excellent point! I’m intending to discuss non-superintelligence scenarios in a follow-up post.