Optimization & AI Risk

There are many ways to taxonomize AI risk. One interesting framing is ‘risks from optimization’. These are not new ideas: Eliezer wrote about this ~15 years ago, and many ‘theory folks’ have been saying it for years. I don’t understand these concepts deeply – I’m trying to improve my understanding by writing about them. Hopefully, I can add something new in the process.

Thanks to Jo Jiao for comments on a draft, and for nudging me to write this. Feedback is highly appreciated!

Epistemic status: Exploratory.

Tl;dr: intelligence is optimization, and (too much) optimization is bad.

First, what is optimization? It’s ‘squeezing’ the world into improbable states. Worlds where I have a quintillion dollars in my bank account are much less likely than worlds where I don’t, so I’d need to optimize strongly to make them real. This also illustrates degrees of optimization: earning a thousand dollars is much easier than earning a million dollars, so I’d need to optimize less hard to achieve the former. Optimizers don’t need to be ‘conscious’ entities. For instance, it’s the abstract forces of evolution that made complex, multicellular life possible.[1] In the real world, one’s ‘capacity to optimize’ corresponds to how much intelligence / money / power one has.
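One rough way to put numbers on ‘degrees of optimization’ is Eliezer’s idea of measuring optimization power in bits: roughly, −log₂ of the probability that the world would end up at least this good by default. The sketch below is my own toy illustration (the baseline probabilities are invented), but it shows why a quintillion-dollar bank account demands far more optimization than a thousand dollars.

```python
import math

def optimization_bits(p_default: float) -> float:
    """Bits of optimization needed to hit an outcome that would
    occur with probability p_default if no one were steering."""
    return -math.log2(p_default)

# Baseline probabilities are made up, purely for illustration.
outcomes = {
    "earn $1,000 this year": 0.5,
    "earn $1,000,000 this year": 1e-4,
    "hold $1 quintillion in my bank account": 1e-18,
}

for outcome, p in outcomes.items():
    print(f"{outcome}: ~{optimization_bits(p):.1f} bits of optimization")
```

Going from a thousand dollars (~1 bit, on these made-up numbers) to a quintillion (~60 bits) isn’t a little more effort; each extra bit halves the set of worlds you’re trying to land in.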

This framing helps unify risks from misuse & misalignment.[2] Paperclip maximizers are the prototypical example of misalignment: here, it’s the AI system that’s directly optimizing too hard. On the misuse end, take AI-enabled coups: here, it’s the person or group that uses the AI system to optimize strongly for their own ends.

Too much optimization seems generally bad, for two reasons. One, worlds that someone else optimizes strongly are unlikely to be worlds that you’d prefer as well. Eg. you don’t want to live in a dictatorship. Two, I’d be wary about optimizing too strongly even for *my* own goals. Human goals are often weird & inconsistent, so it’s easy for my stated preferences to become outer misaligned with what I actually want if I push too hard. Eg. if I asked a superintelligent genie to keep me safe, it would probably lock me up in a white room with soft walls.[3]
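To make the genie failure mode concrete, here’s a toy sketch (the actions and numbers are entirely made up): an agent that maximizes a ‘safety only’ proxy picks the padded room, while the true objective, which also values freedom, picks living normally.

```python
# Toy Goodhart-style example: optimize a proxy ("safety only")
# vs. the true objective ("safety plus freedom"), then compare argmaxes.
actions = {
    "live normally":            {"safety": 0.90,  "freedom": 1.0},
    "avoid all travel":         {"safety": 0.97,  "freedom": 0.5},
    "padded room, locked door": {"safety": 0.999, "freedom": 0.0},
}

proxy = lambda a: actions[a]["safety"]
true_goal = lambda a: actions[a]["safety"] + actions[a]["freedom"]

print("Proxy optimum:", max(actions, key=proxy))      # padded room, locked door
print("True optimum:", max(actions, key=true_goal))   # live normally
```

The proxy isn’t crazy as a description of my preferences; it only fails because it gets pushed to its maximum.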

And this is one way I view AI risk. Intelligence is an optimizer, one that can squeeze the world strongly into improbable states. Regardless of whether it’s a misaligned AI doing the squeezing or a malicious actor misusing the AI, it’s increasingly likely that we’ll get squished.

  1. ^

    Counterpoint: anthropic fallacy?

  2. ^

    Richard Ngo has a great talk on this.

  3. ^

    Counterpoint: the issue might also lie in incorrect goal specification, as opposed to optimization writ large (h/t Jo). It seems like it’s a bit of both.