I just came across an analogy that seems applicable to AI safety.
AGI is like a super powerful sports car that only has an accelerator, no brake pedal. Such a car is cool. You’d think to yourself: “Nice! This is promising! Now we just have to find ourselves a brake pedal.”
You wouldn’t just hop in the car and drive somewhere. Sure, it’s possible that you’d make it to your destination, but it’s pretty unlikely, and it certainly isn’t worth the risk.
In this analogy, the solution to the alignment problem is the brake pedal, and we really need to find it.
(I’m not as confident in the following, plus it seems to fit better as a standalone comment than as part of the OP.)
Why do we really need to find it? Because we live in a world where people are seduced by the power of the sports car. They are in a competition to get to their destinations as fast as possible and are willing to be reckless in order to get there.
Well, that’s the conflict theory perspective. The mistake theory perspective is that people simply think they’ll be fine driving the car without the brakes.
That sounds crazy. And it is crazy! But think about it this way. (The analogy starts to break down a bit here.) These people are used to driving wayyyy less powerful cars. Sometimes these cars don’t have brakes at all; other times they have mediocre brake systems. Regardless, it’s not that dangerous. These people understand that the sports car is in a different category and is more dangerous, but they don’t have a good handle on just how much more dangerous it is, or on how totally insane it is to try to drive a car like that without brakes.
We can also extend the analogy in a different direction (although the analogy breaks down when pushed in this direction as well). Imagine that you develop brakes for this super powerful sports car. Awesome! What do you do next? You test them, in as many ways as you can.
However, with AI, we can’t actually do this. We only get one shot. We just have to install the brakes, hit the road, and hope they work. (Hm, maybe the analogy does work after all. IIRC, super powerful racing cars are built to be driven only once or a few times. There’s a trade-off between performance and how long the car lasts, and for races, they go all the way toward the performance end of the spectrum.)