The Big Picture Of Alignment (Talk Part 1)


I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not-at-all polished, but contains a lot of stuff for which I don’t currently know of any good writeup. Major pieces in part one:

  • Some semitechnical intuition-building for high-dimensional problem-spaces.

    • Optimization compresses information “by default”

    • Resources and “instrumental convergence” without any explicit reference to agents

  • A frame for thinking about the alignment problem which only talks about high-dimensional problem-spaces, without reference to AI per se.

    • The central challenge is to get enough bits-of-information about human values to narrow down a search-space to solutions compatible with human values (a toy numerical sketch of this framing follows the list below).

    • Details like whether an AI is a singleton, a tool AI, part of a multipolar system, an oracle, etc. are mostly irrelevant.

  • Fermi estimate: just how complex are human values?

  • Coherence arguments, presented the way I think they should be presented.

    • Also subagents!
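
To make the "bits of information" framing a bit more concrete, here is a toy sketch with made-up numbers (mine, not figures from the talk): if only one candidate in 2^40 of a large search space is compatible with human values, then any process which reliably lands on a compatible solution must effectively be using about 40 bits of information about human values.

```python
import math

# Toy illustration (made-up numbers, not from the talk): how many bits of
# information about human values does a search process need in order to
# reliably land on a value-compatible solution?
search_space_size = 2 ** 60       # hypothetical number of candidate solutions
compatible_fraction = 2 ** -40    # hypothetical fraction compatible with human values

# If only a fraction p of the space is acceptable, reliably picking an
# acceptable point requires roughly log2(1/p) bits of information about values.
bits_needed = math.log2(1 / compatible_fraction)

print(f"Candidate solutions: {search_space_size:.3e}")
print(f"Bits of information about values needed: {bits_needed:.0f}")
```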

Note that I don’t talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.

Here’s the video for part one:

Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.