Why No *Interesting* Unaligned Singularity?

Putting together two complementary ideas I’ve encountered.

The Question:

agentofuser:

Considering all humans dead, do you still think it’s going to be the boring paperclip kind of AGI to eat all reachable resources? Any chance that inscrutable large float vectors and lightspeed coordination difficulties will spawn godshatter AGI shards that we might find amusing or cool in some way? (Value is fragile notwithstanding)

Eliezer Yudkowsky:

Yep and nope respectively. That’s not how anything works.

Yudkowsky’s confidence here puzzled me at first. Why be so sure that powerful ML training won’t even produce something interesting from the human perspective?

Admittedly, the humane utility function is very particular and not one we’re likely to find easily. And a miss in modeling it along many value dimensions would mean a future optimized into a shape we see no value in. But what rules out a near miss in modeling the humane utility function: a model that’s off along only some value dimensions, and so loses some but not all of what humanity cares about? A near miss would entail a significant loss of value and interestingness from our perspective, but it’d still be somewhat interesting.

The Answer:

Scott Alexander:

The base optimizer is usually something stupid that doesn’t “know” in any meaningful sense that it has an objective—eg evolution, or gradient descent. The first thing it hits upon which does a halfway decent job of optimizing its target will serve as a mesa-optimizer objective. There’s no good reason this should be the real objective. In the human case, it was “a feeling of friction on the genitals”, which is exactly the kind of thing reptiles and chimps and australopithecines can understand. Evolution couldn’t have lucked into giving its mesa-optimizers the real objective (“increase the relative frequency of your alleles in the next generation”) because a reptile or even an australopithecine is millennia away from understanding what an “allele” is.
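
To make the mechanism in that quote concrete, here’s a toy sketch in Python. It is not a claim about what evolution or gradient descent literally does; every candidate objective, bit count, and score below is invented purely for illustration. The “stupid” base optimizer simply keeps the first candidate that does a halfway-decent job, and the simple proxies come up first:

```python
# A toy sketch, not a model of evolution or SGD: a "stupid" base optimizer
# that tries candidate objectives roughly from simplest to most complex
# and keeps the first one that does a halfway-decent job on its training
# signal. All names, bit counts, and scores are invented for illustration.

# (description length in bits, training score, candidate objective)
CANDIDATES = [
    (10, 0.62, "seek calories"),
    (12, 0.71, "seek pleasurable friction"),
    (18, 0.74, "seek status among conspecifics"),
    (400, 0.75, "increase the relative frequency of your alleles"),
]

HALFWAY_DECENT = 0.70  # arbitrary "good enough" threshold

def first_good_enough(candidates, threshold):
    """Return the simplest candidate objective whose score clears the bar."""
    for _bits, score, objective in sorted(candidates):
        if score >= threshold:
            return objective
    return None

print(first_good_enough(CANDIDATES, HALFWAY_DECENT))
# -> 'seek pleasurable friction': the true objective is never even reached,
#    because a far simpler proxy already cleared the bar.
```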

Even a “near miss” model occupies an impossibly tiny slice of simplicity-weighted model space. A search algorithm with a simplicity bias isn’t going to stumble all the way over to that particular model when so many far simpler models perform just as well on the training objective.
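
Here’s one crude way to picture that, assuming an Occam-style prior that gives a model costing k bits to describe a weight of 2^-k; the bit counts below are pure placeholders, and only the relative scale matters:

```python
# A toy sketch assuming a crude Occam prior: a model that takes k bits to
# describe gets prior weight 2**-k. The bit counts are invented
# placeholders; only the relative scale matters.

description_bits = {
    "trivial proxy of the training signal": 20,
    "slightly fancier proxy": 30,
    "near miss of the humane utility function": 10_000,
}

# Compare each model's prior weight to the simplest one's. The comparison
# is done with exponents because the raw ratio (2**-9980) underflows floats.
simplest = min(description_bits.values())
for name, bits in description_bits.items():
    print(f"{name}: 2^-{bits - simplest} times the simplest model's weight")
```

If the simpler proxies fit the training data just as well, nothing in the search pushes back against a factor like that.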

But isn’t there some chance that ML training will happen to find a near-miss model anyway?

No. Given those premises, that forlorn hope is one you could never expect to pan out, the same way you’d never expect a countertop puddle to spontaneously reassemble into an ice cube once you know thermodynamics. An agent that wants to maximize its inclusive genetic fitness simply wasn’t an option for evolution, and interesting utility functions won’t be an option for powerful ML training on an arbitrary task unless we seriously alter how we do that kind of search to make them so.