Very good post!
I agree that experimentation with near-human level AI (assuming that it is possible) is unlikely to have catastrophic consequences as long as standard safety engineering practices are applied.
And in fact, experimentation is likely the only way to make any real progress in understanding the critical issues of AI safety and solving them.
In engineering, “provably safe” and “provably secure” designs typically aren’t, especially when dealing with novel technologies: once you build a physical system, there is always some aspect that wasn’t properly addressed by the theoretical model but turns out to make the system fail in unanticipated ways.
Careful experimentation is needed to gain knowledge of the critical issues of a design in a controlled environment, and even once the design has been perfected, you still can’t blindly trust it; rather, you need to apply extensive redundancy and impact-mitigation measures.
That’s how we have made productive use of potentially dangerous stuff such as fire, electricity, cars, trains, aeroplanes, nuclear power and microorganisms, without wiping ourselves out so far. I don’t think that AI should or could be an exception.
Gambling our future on a brittle mathematical proof, now that would be foolish, in my humble opinion.
Much of the current discussion about AI safety suggests to me an analogy with some hypothetical eighteenth-century people trying to discuss air traffic safety:
They could certainly imagine flying machines, they could understand that these machines would have to work according to Newtonian mechanics and known fluid dynamics, and they could probably foresee some of the inherent dangers of operating such machines. But obviously, their discussions wouldn’t produce any significant result, because they would lack knowledge of key facts about the architecture of actually workable designs. They wouldn’t know anything about internal combustion engines, aluminium, radio communication, radar, and so on.
Present-day self-proclaimed “AI risk experts” look much like those hypothetical eighteenth-century “aviation risk experts”: they have little or no idea of what an actual AI design is going to look like, and yet they attempt to argue from first principles (moral philosophy, economic theories and mathematical logic) about its safety.
It goes without saying that I don’t have much confidence in their approach.
The difference between AI safety and e.g. car safety is that humanity can survive a single car crash. Provably safe AI is needed because it’s the only way to get it right, not because it’s the easiest way.
Humanity can probably also survive an AI accident.
There is no reason to assume that any AI failure would lead by default to a catastrophic scenario where evil robots who look suspiciously like the former governor of California hunt down the last survivors of humanity after most of it has been wiped out by a global nuclear attack.
Not any failure. But the existence of failure modes like superintelligent paperclip maximizers is sufficient to make this technology different.
But it depends on how likely this failure mode is. Are we talking about something like the possibility of the nuclear tests igniting the atmosphere, or the LHC creating a black hole?
Actually, those are excellent examples. Those possibilities were ruled out theoretically. No one was crazy enough to check them experimentally first.
They were ruled out theoretically (with some probability of error) only once people already had working designs, using knowledge obtained from experimentation on smaller designs.
Nobody here is suggesting wiring the first experimental AGI to nuclear missile launch systems. The point is that you need a good idea of what a working AGI design will look like before you can say anything meaningful about its safety.
Experimentation, with reasonable safety measures, will most likely be needed before a full-fledged design can be produced.
It appears to me that before you start any given experiment you must have sufficient theoretical backing that this particular experiment is safe.