Your bad example takes 5 minutes at a party. Your “good” example takes 8 weeks of work. It is not hard, in general, to get a better answer by investing more effort.
A specific example, worked out in full detail can exhibit the presence of security holes, but not their absence. If the system is a complicated mess, it can be very hard to find a security hole, but also very hard to prove it doesn’t have one. (And it’s quite likely it does have one)
When speculating about the risks of future AI, the easiest proofs of concept will be rather toy and of arguable relevance. More sophisticated proofs of concept on less toy examples might be dangerous to create.
If you see a bunch of potential threats, it’s not guaranteed that all those threats are real. But they are all likely enough to be real that you have to plan for them. The list of speculations will contain some false positives. The list of fully detail worked out exploits will contain false negatives.
It doesn’t take “8 weeks” to come up with a good example if you already understand the problem. Most of the “8 weeks” is spent realizing you didn’t understand the problem correctly enough to create the POC, or to propose a solution that works.
Your bad example takes 5 minutes at a party. Your “good” example takes 8 weeks of work. It is not hard, in general, to get a better answer by investing more effort.
A specific example, worked out in full detail can exhibit the presence of security holes, but not their absence. If the system is a complicated mess, it can be very hard to find a security hole, but also very hard to prove it doesn’t have one. (And it’s quite likely it does have one)
When speculating about the risks of future AI, the easiest proofs of concept will be rather toy and of arguable relevance. More sophisticated proofs of concept on less toy examples might be dangerous to create.
If you see a bunch of potential threats, it’s not guaranteed that all those threats are real. But they are all likely enough to be real that you have to plan for them. The list of speculations will contain some false positives. The list of fully detail worked out exploits will contain false negatives.
It doesn’t take “8 weeks” to come up with a good example if you already understand the problem. Most of the “8 weeks” is spent realizing you didn’t understand the problem correctly enough to create the POC, or to propose a solution that works.