Optimization and Adequacy in Five Bullets


Context: Quite recently, a lot of ideas have sort of snapped together into a coherent mindset for me. Ideas I was familiar with, but whose importance I didn’t intuitively understand. I’m going to try to document that mindset real quick, in a way I hope will be useful to others.

Five Bullet Points

  1. By default, shit doesn’t work. The number of ways that shit can fail to work absolutely stomps the number of ways that shit can work.

  2. This means that we should expect shit to not work, unless something forces it into the narrow set of states that actually work and do something.

  3. The shit that does work generally still doesn’t work for humans. Our goals are pretty specific and complicated, so the non-human goals massively outnumber the human goals.

  4. This means that even when shit works, we should expect it to not be in our best interests unless something forces it into the narrow range of goal-space that we like.

  5. Processes that force the world into narrow, unlikely outcome ranges are called optimization processes—they are rare, and important, and not magic.
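
To make the counting behind #1 and #3 a bit more tangible, here’s a toy sketch. The operationalization is arbitrary and mine alone: a “machine” is a random function on a small set, “works at all” means the function is invertible, and “works for us” means it is the one specific function we wanted.

```python
import math
import random

# Toy model: a "machine" is a random function f: {0,...,n-1} -> {0,...,n-1}.
# "Works at all" = f is invertible (nothing gets smashed together), an
#   arbitrary stand-in for "has any coherent input-output behavior".
# "Works for us" = f is the one specific function we actually wanted
#   (here: increment mod n).
n, trials = 8, 200_000
target = [(i + 1) % n for i in range(n)]

works_at_all = works_for_us = 0
for _ in range(trials):
    f = [random.randrange(n) for _ in range(n)]
    if len(set(f)) == n:        # invertible: every output gets hit exactly once
        works_at_all += 1
        works_for_us += f == target

print(f"works at all: {works_at_all / trials:.5f} observed "
      f"(exact: {math.factorial(n) / n**n:.5f})")
print(f"works for us: {works_for_us / trials:.5f} observed "
      f"(exact: {1 / n**n:.2e})")   # you will basically never see one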

Main Implications

The biggest takeaway is: look for optimization processes. If you want to use a piece of the world (as a tool, as an ally, as evidence, as an authority to defer to, etc.), it is important to understand which functions it actually has. In general, the functions a thing is “supposed to have” can come wildly apart from the things it’s actually optimized to do. If you can’t find a mechanism that forces a particular thing to have a particular useful property, it probably doesn’t have it. Examples:

  • “Society” designates certain people/organizations/processes as authorities—sources of reliable evidence. It turns out that a lot of these are not in fact reliable, because nothing forces them to be. Nobody fires the news anchor when they get a prediction wrong.

  • A sort of internal version is checking when to use explicit modeling and when to trust intuition. Intuitions seem to come with a built-in feeling that they are truth-tracking and should be trusted, even in cases where you know they don’t have the evidence budget to make this possible. To a certain extent, you can check manually if an intuition is actually optimized to be truth-tracking.

  • The air conditioner thing.

The obvious first step when looking for optimization processes: learn to recognize optimization processes. This is the key to what Yudkowsky calls an adequacy argument, which is what I’ve been broadly calling “hey does this thing work the way I want it to?”

  • Evolutions and markets are the canonical examples. There is plenty of math out there about when these happen, how powerful they are, and how they work in general.

  • “Skin in the game” is often referenced as a thing-that-makes-stuff-work: the classic example is a role where you get fired if you make a sufficiently bad mistake. This is basically a watered-down version of evolution, with only the “differential survival” bit and no reproduction or heritable traits or any of that. Fitness doesn’t climb over time, but hey, more-fit participants will still be over-represented in the population (see the sketch right after this list).

  • Testing, recruiting, and other sorts of intentional selection of people definitely fit the definition, but in practice it seems they generally optimize for something different from what they are Supposed To optimize for.

  • Thankfully, people can also be optimizers! Sometimes. We definitely optimize for something. Consider: the vast majority of action-sequences lead to death, and the brain’s job is to identify the narrow slice that manages all of our many survival needs at the same time. But from there, I think it still requires a bit of hand-waving to justify arguments about exactly which tasks those optimization abilities generalize to and which they don’t.

  • There are definitely way more examples than this, but you get the idea. Maybe check out some Framing Practicum posts and find things that qualify as optimizers? Also, just read Inadequate Equilibria.
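
Here’s a minimal sketch of the watered-down-evolution point about skin in the game. It’s a toy model with made-up parameters: everyone has a fixed, unheritable skill; bad mistakes get you fired; vacancies are filled by fresh random draws.

```python
import random

# Toy "skin in the game": agents have a fixed skill in [0, 1] that is NOT
# heritable. Each round, an agent makes a firing-worthy mistake with
# probability (1 - skill) * 0.2; fired agents are replaced by fresh random
# draws. Differential survival only: no reproduction, no inherited traits.
random.seed(0)
POP, ROUNDS = 1000, 200

skill = [random.random() for _ in range(POP)]
tenure = [0] * POP
fresh_hires = []

for _ in range(ROUNDS):
    for i in range(POP):
        if random.random() < (1 - skill[i]) * 0.2:
            skill[i] = random.random()      # fired, replaced by a new random hire
            tenure[i] = 0
            fresh_hires.append(skill[i])
        else:
            tenure[i] += 1

def mean(xs): return sum(xs) / len(xs)

veterans = [s for s, t in zip(skill, tenure) if t >= 20]
print(f"mean skill of fresh hires:    {mean(fresh_hires):.2f}")  # ~0.50, never improves
print(f"mean skill of current staff:  {mean(skill):.2f}")        # noticeably higher
print(f"mean skill of 20+ round vets: {mean(veterans):.2f}")     # higher still
```

Selection alone makes the surviving pool look better than the applicant pool, but since nothing accumulates across “generations”, the system never gets any better at producing skilled people in the first place.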

Musings

  • Note that this is one framing out of many—I think it’s a subset of a broader sort of thing about mechanistic thinking and gears-level models. There are times when it doesn’t particularly make sense to frame things in terms of optimizers: consider your shoelaces. There are a bunch of ways you can maybe frame this in terms of adequacy arguments, but it’s kinda clunky and not necessary unless you really want to get into the details of why you trust your eyes.

  • Optimization is extremely related to the Bayesian definition of evidence. Left as an exercise for the reader.

  • You may notice some parallels between #2/#4 and capabilities/alignment in the context of AI safety. What a coincidence.

  • As evidence for #1: consider entropy. Of all the ways a set of gears can be arranged in space, how many of them form a machine with any recognizable input-output behavior at all? How many instead form a pile of gears?

    • Interesting aside on this: I think entropy does, in a sense, come from the way humans view the world. It’s not like piles of gears are somehow inherently pointless—there’s a huge space of variety and limitless possibilities that I, in my human closed-mindedness, shove under the label “eh, just a pile of gears”. When we say that some macro-states contain more micro-states than others, we’re really saying that there are huge swaths of micro-states we don’t care about enough to classify precisely into lots of macro-states, so we just sweep them all under one label. To me, a pile of gears is just a pile of gears—but that’s a fact about me, not about the gears. (There’s a toy numerical version of this at the end of the post.)

    • There’s also maybe a rebuttal that has to do with carving reality at its joints—in the real world, the distribution of physical systems is not totally uniform, meaning it has some cluster structure that suggests how to segment it into concepts. The point of the example above is that even without cluster structure, we can still segment reality based on our preferences, and it produces some familiar entropic behaviors.

  • As evidence for #3: consider how ridiculously massive goal-space is. Slightly more whimsically: of all the machines you formed in the previous bullet when throwing gears together randomly, how many of them would you actually want to own/use?

  • Optimization processes are themselves “things that work”—the number of non-optimizing possible systems dwarfs the number of optimizing ones. This rarity means they generally have to be created by other optimization processes. In fact, you can trace this storied lineage all the way back to its root, some Primordial Accident where all this tomfoolery first bootstrapped itself out of the mud by sheer dumb luck.

  • This view is sufficient to give us what we might fancifully call a Fundamental Thesis of Transhumanism: the current state of the world is partly optimized for things other than human flourishing, and mostly just not optimized for anything at all. This means we should expect a world optimized for human flourishing to look very different from today’s world, in basically all respects imaginable.

  • We should expect X-risk to be hard. In a sense, problems with the one-shot structure that X-risk has break the one tool we’ve got (iterative selection needs failed attempts to learn from, and one-shot problems don’t grant retries). I’m being fully literal when I say that nothing could possibly have prepared us for this. The challenge is not calibrated to our skill level—time to do the impossible.

  • I’m pretty curious about takes on what the false positive/negative rates of this heuristic might be. Are there likely to be lots of phenomena which are highly optimized, but in subtle/complicated ways I can’t notice? Phenomena which I think are optimized, but actually aren’t?
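
And here’s the toy numerical version of the gears/entropy aside promised above. The “machine” criterion is completely arbitrary; the point is just that almost all of the microstates land in whichever bucket you choose to lump together.

```python
import math
from itertools import product

# 6 gears, each in one of 4 positions -> 4**6 = 4096 microstates. Call a
# configuration a "machine" (arbitrarily) if each gear sits exactly one
# position past the previous one; lump everything else into "pile of gears".
# Boltzmann-style entropy of a macrostate is just log(microstates in it).
GEARS, POSITIONS = 6, 4

def is_machine(config):
    return all((b - a) % POSITIONS == 1 for a, b in zip(config, config[1:]))

machine = sum(is_machine(c) for c in product(range(POSITIONS), repeat=GEARS))
pile = POSITIONS**GEARS - machine

print(f"'machine' macrostate: {machine:5d} microstates, entropy ~ {math.log(machine):.2f}")
print(f"'pile'    macrostate: {pile:5d} microstates, entropy ~ {math.log(pile):.2f}")
```

The “pile” label carries almost all of the entropy, and that is entirely a consequence of how the labeling was chosen, not of anything the gears themselves are doing.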