Parable: The Bomb that doesn’t Explode
You’re an engineer working for a military contractor. One day, your project manager comes to you and asks you to design a container for plastic explosive. It is to contain several kilograms of C4, enough to destroy a building. This is pretty dangerous, but you know that C4 is actually pretty safe. It can’t be accidentally detonated by fire, impact or bullets. Only a detonator (another explosive) can trigger C4, so you don’t include a detonator in your design, for safety.
Your PM reviews the design and gives you some feedback. “It would be a lot more useful if you put a blasting cap inside.” You grimace. More useful, yes. But a lot less safe. Nevertheless, you do your job and install a blasting cap. The blasting cap has two electrical leads. If a voltage is applied across those leads, the C4 will explode, killing everyone around. To keep things safe, you snip off the leads and put the blasting cap inside a pill-shaped plastic container, inside the larger container that contains the C4.
“Well, that’s no good,” your PM replies, “What if someone does want to put a voltage across the leads?” You grimace even more. It is getting very difficult to make this design safe, you think. But you have a solution: inside the pill-shaped plastic container, you install a Raspberry Pi. You install SELinux on the Pi and set it up to connect to WiFi—but only secured networks. You program up a fancy web interface that allows the user to specify exactly what voltage they want to apply to the leads. The interface caps the voltage at a low level. It does not allow the user to apply a sufficient voltage to actually trigger the blasting cap—or at least you hope that’s how blasting caps work.
“Well, that’s no good,” your PM replies, “What if someone wants to apply a higher voltage?” You grimace again. Then it’ll explode! It’ll kill everyone! But this is not a workplace where you raise objections, so instead you just diligently do your job. If you don’t make this dangerous thing, the company will hire someone else without your moral scruples, and certainly that person would make something really dangerous! So you alter the web interface, allowing the user to specify any voltage up to the limit of what the Pi can output. You add a big red warning screen, explaining how C4 is dangerous, and forcing the user to click “Yes, I’m really sure I want to apply this voltage.” You also add a fancy cryptographic security key.
Your PM reviews the final design. She’s a bit confused why this bomb is so damn complicated, but no matter, it serves its purpose well. The bomb goes on to kill someone whose name you cannot pronounce, thus defending the American people from great evil. (/s)
The moral of the story is, if you don’t want a dangerous thing to get built, you have to actually not build it.
I don’t quite get it. The PM and the engineer seem incompetent, in that they didn’t simply specify under what conditions the bomb should and should not detonate. The engineer would be quite justified in being annoyed if asked to make “maximally safe storage of C4, requiring transfer to an actual detonation environment before use”, only for the requirement to change later to include detonation in place.
Also, I’m unsure what the parable is. C4 has very few other uses than detonation, so it’s not a very good analog for useful-but-not-always-violent things like nuclear power, gain-of-function biological research, or AI. Making a bomb, given C4, is less interesting as a comparison than being a chemical engineer deciding whether to manufacture C4 more efficiently.
While the engineer learned one lesson, the PM will learn a different lesson when a bunch of the bombs start installing operating system updates during the mission, or won’t work with the new wi-fi system, or something: the folly of trying to align an agent by applying a new special case patch whenever something goes wrong.
No matter how many patches you apply, the safety-optimizing agent keeps going for the nearest unblocked strategy, and if you keep applying patches eventually you get to a point where its solution is too complicated for you to understand how it could go wrong.
Congratulations. Now when the bomb is attached to the CPU running the malevolent AI, the AI can hack the Pi and prevent the bomb from going off.
Sometimes the world needs dangerous things, like weapons.
If you don’t want to build dangerous things, don’t become a munitions engineer. (But be aware that someone else will take that role.)
If you are a munitions engineer, be a good one. Build a bomb that goes ‘bang’ reliably when it’s supposed to, and not otherwise. Keep it simple.
What if it is a bomb so big that it will destroy the entire planet?
See above. Don’t become a munitions engineer, and, being aware that someone else will take that role, try to prevent anyone from taking that role. (Hint: That last part is very hard.)
The conclusions might change if planet-destroying bombs are necessary for some good reason, or if you have the option of safely leaving the planet and making sure nobody that comes with you will also want to build planet-destroying bombs. (Hint: That last part is still hard.)