Where’s the foom?

“The first catastrophe mechanism seriously considered seems to have been the possibility, raised in the 1940s at Los Alamos before the first atomic bomb tests, that fission or fusion bombs might ignite the atmosphere or oceans in an unstoppable chain reaction.”[1]

This is not our first rodeo. We have done risk assessments before. The best reference-class examples I could find were the bomb, vacuum decay, killer strangelets, and LHC black holes (all covered in [1]).

I had been looking for a few days and hadn’t finished my search, but I decided to publish this note now that Tyler Cowen is asking too: “Which is the leading attempt to publish a canonical paper on AGI risk, in a leading science journal, refereed of course. The paper should have a formal model or calibration of some sort, working toward the conclusion of showing that the relevant risk is actually fairly high. Is there any such thing?”

The three papers people replied with were:
- Is Power-Seeking AI an Existential Risk?
- The Alignment Problem from a Deep Learning Perspective
- Unsolved Problems in ML Safety

Places I have looked so far:
- The list of references for that paper[2]
- The references for the Muehlhauser and Salamon intelligence explosion paper[3]
- The Sandberg review of singularity models[4] and related papers (these are quite close to passing muster, I think)

Places I want to look further:

- Papers by Yampolskiy, e.g.[5]
- Papers by Schmidhuber mentioned in there (I haven’t gotten around to these yet)
- I haven’t thoroughly reviewed Intelligence Explosion Microeconomics; maybe that is the closest thing to fulfilling the criteria?

But if there is something concrete in, e.g., some of the papers by Yampolskiy and Schmidhuber, why hasn’t anyone fleshed it out in more detail?

For all the time people spend working on ‘solutions’ to the alignment problem, there still seems to be a serious lack of ‘descriptions’ of the alignment problem. Maybe the idea is that if you had the latter, you would automatically have the former?

I feel like something built on top of Intelligence Explosion Microeconomics and the Orthogonality Thesis could be super useful and convincing to a lot of people. And I think people like TC are perfectly justified in questioning why it doesn’t exist, for all the millions of words collectively written on this topic on LW etc.

I feel like a good simple model of this would be much more useful than another ten blog posts about the pros and cons of bombing data centers. This is the kind of thing that governments and lawyers and insurance firms can sink their teeth into.
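To gesture at what I mean by “a good simple model” (my own hedged sketch, not taken from any of the papers above, though the Sandberg review[4] surveys models of this general shape): start from a growth equation where capability feeds back into its own rate of improvement, and let the whole foom question hinge on one parameter.

```latex
% Toy recursive self-improvement model (illustrative sketch only; the
% parameter names I and k are mine, not drawn from any cited paper).
% Let I(t) be system capability and assume returns to cognitive
% reinvestment of the form
\[
  \frac{dI}{dt} = c\, I^{k}, \qquad c > 0 .
\]
% Solving by separation of variables:
%   k < 1  gives polynomial growth,
%   k = 1  gives exponential growth (no foom),
%   k > 1  gives divergence in finite time:
\[
  I(t) = \Bigl( I_0^{\,1-k} - (k-1)\,c\,t \Bigr)^{\frac{1}{1-k}},
  \qquad t^{*} = \frac{I_0^{\,1-k}}{(k-1)\,c} \quad (k > 1),
\]
% so "where's the foom?" compresses into an empirical argument about
% whether k > 1 once AI is doing the AI R&D.
```

That is obviously far too simple to be the canonical refereed paper Cowen is asking for, but it is the level of explicitness I mean: a model with parameters that a government, lawyer, or insurer could argue about, estimate, or reject.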

Where’s the foom?

Edit: I forgot to mention Clippy. Clippy is in many ways the most convincing of all the things I read while looking for this, and whenever I find myself getting skeptical of foom I read it again. Maybe a summary of the mechanisms described in there would be a step in the right direction?

  1. ^
  2. ^
  3. ^
  4. ^
  5. ^