Nontrivial pillars of IABIED

Epistemic status: Jotting down some thoughts about the underlying Yudkowskian “doomer” philosophy, not a book review, intended for the LessWrong audience only.

I believe that one of the primary goals of the Sequences was to impart certain principles which illuminate the danger from unaligned AI. Most of these principles are widely accepted here (humans are not peak intelligence, some form of the orthogonality thesis, etc.). However, not everyone agrees with IABIED. I think there are a few additional nontrivial pillars of the worldview which can be taken as cruxes. This is an idiosyncratic list of the ones that occur to me. I think that all of them are correct, but if my model of AI risk breaks, it is plausibly because one of these pillars shattered. Things look pretty bad on my model, which means that, conditioning on AI going well (without major changes to the status quo), my model breaks somewhere. Intuitively, I would be surprised but not totally shocked if my model broke, and in this post I am trying to find the source of that anticipated anti-surprisal.

General Intelligence Transfers Hard. A general system tends to outperform narrow systems even within their narrow domain by thinking outside the box, leaning on relevant analogies to a wider knowledge base, integrating recursive self-improvements (including rationality techniques and eventually self-modification), and probably via several other routes I have not thought of. This is related to the claim that training on many narrow tasks tends to spawn general (mesa-)optimizers, which seems to be pretty strongly vindicated by the rise of foundation models. It is also one reason that you can’t just stop the rogue AI’s nanobot swarm with a nanotechnology-specialist AI. This pillar opposes the vibe that the entire economy will accelerate in parallel, and supports the localized foom. The weakness of this pillar (meaning, my primary source of uncertainty about it) is that hardcoding the right answer for a particular task might make better use of limited resources for finding an effective local solution than general reasoning does: my intuition about non-uniform computation (say, with circuits) is that there are shockingly good but nearly incompressible/”inexplicable” solutions to many particular problems, which take advantage of “accidental” or contingent facts. Also, the more plausible outcome of most types of very hard optimization seems to be finding such messy solutions, which may prevent capabilities from generalizing out of distribution (in fact, for the same sort of reason that alignment will not generalize out of distribution by default).

Reality is Full of Side-channels. This is quite explicit in IABIED: a sufficiently smart adversary can “flip the game board” and do something that you thought was impossible, so it is very hard to box a superintelligence. This has the vibe of an attacker advantage, though I don’t think that is strictly central or required. It is often described as “security mindset.” This is another reason you can’t use a specialized system to stop the nanobots: even if you got that defense working, the superintelligence would just win in a slightly different way, perhaps by hacking the defender system and killing you with your own super-antibodies. Another way of phrasing this is that Yudkowsky seems to believe “keyholder power” is not very strong: you cannot robustly convert your current power into a durable privileged position. A pithier statement might be “privilege is fragile.”

The Gods are Weak. In particular, we can beat natural selection at ~everything in the next few years by building big enough computers. Think the human brain is mysterious? We can beat it by throwing together an equivalent number of crude simulated neurons and dumping in a bunch of heuristically selected data. Think bacteria are cool? A superintelligence can invent Drexlerian nanotechnology in a week and eat the biosphere. Think nation-state cryptography is secure? Please, broken before the end of this sentence (see also “Reality is Full of Side-channels”). A priori, this pillar still seems highly questionable to me. However, it does seem to be holding up surprisingly well empirically?? The biggest weakness in (my certainty about) this pillar is that when it comes to engineering flexible, robust physical stuff, the kind of stuff that grows rather than decays, evolution still seems to have us beat across the board. I can’t rule out the possibility that there are incredibly many tricks needed to make a long-running mind that adapts to a messy world. However, the natural counterargument is that the minimal training/learning rules probably are not that complicated. Usually, one argues that evolution by natural selection is a very simple optimizer. I am not sure how true this is: natural selection doesn’t run in the ether, it runs on physics (over vast reaches of space and time), and it seems hard to get the space/time/algorithmic complexity arguments about this confidently right.[1] So, one hidden sub-pillar here is a kind of computational substrate independence.

  1. ^

    See also the physical Church-Turing thesis / feasibility thesis: https://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis#Variations