The two-paragraph argument for AI risk

The very short version of the AI risk argument is that an AI that is *better than people at achieving arbitrary goals in the real world* would be a very scary thing, because whatever the AI tried to do would then actually happen. As stories of magically granted wishes and sci-fi dystopias point out, it’s really hard to specify a goal that can’t backfire, and current techniques for training neural networks are generally terrible at specifying goals precisely. If having a wish granted by a genie is dangerous, having a wish granted by a genie that can’t hear you clearly is even more dangerous.

Current AI systems certainly fall far short of being able to achieve arbitrary goals in the real world better than people, but there’s nothing in physics or mathematics that says such an AI is *impossible*, and progress in AI often takes people by surprise. Nobody knows what the actual time limit is, and unless we humans have a good plan *before* someone makes a scary AI that has a goal, things are going to go very badly.


I’ve posted versions of this two-paragraph argument in various places online and used it in person, and it usually goes over pretty well; I think it explains clearly and simply what the AI x-risk community is actually afraid of. I figured I’d post it here for everyone’s convenience.