Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]

Link post

[Epistemic status: improved redistillation of the infrabayesianism sequence.]

So you want to understand infrabayesianism, to hack your way to the center of that thorny wood and seek out and recover any treasures hidden there? You’ve come to the right creature for a guide. If you want to journey there, make sure you already have the necessary tools well in hand: some simple decision theory, the basics of topology and linear algebra, and a little measure theory. For that last, if you know how a Lebesgue integral is defined and why Lebesgue measure can’t be extended to the full power set of ℝ, then you’re already doing fine. If you find yourself struggling with such things, reach out to me on Discord or in PMs here and I’ll see what we can do.

Infrabayesianism seems like exactly what we might need as alignment researchers: a way to discuss all of our usual decision-theoretic questions while also accounting for uncertainty about the world, compensating for policy-dependent environments and adversarial selection, and even talking about UDT puzzles. It does this by being, at its core, a decision theory with explicit, principled machinery for handling Knightian uncertainty about the environment (arising from nonrealizable or nonlearnable hypotheses) while still permitting nontrivial inference and planning.

Three major brambly hedges block the way between you and understanding: the prickly, snagging notation of the original infrabayesianism sequence, which is frequently unclear, unintuitive, or just plain lacking; thorny philosophical tangles, both up front and scattered throughout; and the math itself, whose density of concepts and notation grows thicker the deeper we go. Follow me, though, and we’ll slip right through them with barely a scratch, and eat a couple of delicious berries right off their vines. In fact, I can tell you up front that if you haven’t read the original infrabayesianism sequence too closely and aren’t that familiar with its notation, that’s an active benefit, because we won’t need most of it here. We won’t cleave perfectly to its choices of notation or terminology, though I will eventually provide a dictionary between the two as a postscript.

(I really don’t feel like trying to port over 28 PDF pages’ worth of TeX and dense math writing here, let alone figure out how to divide it up into several posts, so rather than be redundant I’ve decided to link to the PDF instead. Please imagine that it was reproduced here in its entirety, and comment accordingly. This is a first-draft redistillation, of the kind that I might have been satisfied with as a minimum viable submission to the Infra-Bayes bounty. I may well edit, reorganize, revise, or remove parts of it as I see fit. If I feel like it, I might even turn this into an arXiv submission or a sequence of carefully ordered posts of my own.)