Understanding Eliezer’s “Any Fact Would Move Me in the Same Direction”

Eliezer says some confusing things in Biology-Inspired AGI Timelines:

Even without knowing the specifics [..] you ought to be able to reason abstractly about a directional update that you would make, if you knew any specifics instead of none.

A few lines down, he admits it might be tricky to get:

(Though I say this without much hope; I have not had very much luck in telling people about predictable directional updates they would make, if they knew something instead of nothing about a subject. I think it’s probably too abstract for most people to feel in their gut, or something like that, so their brain ignores it and moves on in the end.

I have had life experience with learning more about a thing, updating, and then going to myself, “Wow, I should’ve been able to predict in retrospect that learning almost any specific fact would move my opinions in that same direction.” But I worry this is not a common experience, for it involves a real experience of discovery, and preferably more than one to get the generalization.)

I can stomach abstract and confusing, but this also smelled like a violation of the law of conservation of expected evidence. An excerpt, from fifteen years ago:

There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before.

So why does he seem to be saying something different now? Stop reading here if you want to figure this out for yourself first.

Proof of Eliezer’s claim.

Long story short: this way of reasoning is in fact a refined application of CoEE, not a violation; it is a way of bringing yourself more tightly into accord with it. Your prior "should" equal your expected posterior. If it doesn't, something (the prior) needs to change.

It might help to see the (fairly trivial) math actually say “make your confidence higher than before”.

Setup

Let’s take:

$H$ := some hypothesis you’re reasoning about, like “There is a substantial difference between brain machinery and AGI machinery”.

$E$ := {worlds $w$ | learning that I’m in world $w$ would move me closer to $H$},

or, expressed more mathematically:

$E := \{\, w \;:\; P(H \mid w) > P(H) \,\}$.

  • Note: This is ever-so-slightly meta, so stare at it a little if you’re not used to this. As a reminder: an event is any collection of worlds. For example, the event “I win this round of chess” is the set of all worlds where you somehow ended up winning.

Lastly, recall that conservation of expected evidence (which is a simple tautology) says:

$P(H) \;=\; P(H \mid E)\,P(E) \;+\; P(H \mid \neg E)\,P(\neg E)$.
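As a quick sanity check (my own illustration, not from the post), here is the tautology verified numerically on a made-up table of worlds, with an arbitrary event $E$:

```python
# Numerical sanity check that CoEE is an identity, on a made-up table of worlds.
# Each world w gets a probability P(w) and a conditional credence P(H | w).

worlds = {
    "w1": {"p": 0.4, "p_H": 0.9},
    "w2": {"p": 0.3, "p_H": 0.7},
    "w3": {"p": 0.2, "p_H": 0.6},
    "w4": {"p": 0.1, "p_H": 0.2},
}

# Coherent prior: the expectation of the posterior over worlds.
p_H = sum(w["p"] * w["p_H"] for w in worlds.values())

# Pick any event E (a set of worlds) and verify P(H) = P(H|E)P(E) + P(H|~E)P(~E).
E = {"w1", "w2", "w3"}
p_E = sum(worlds[name]["p"] for name in E)
p_H_given_E = sum(worlds[name]["p"] * worlds[name]["p_H"] for name in E) / p_E
p_not_E = 1.0 - p_E
p_H_given_not_E = (
    sum(w["p"] * w["p_H"] for name, w in worlds.items() if name not in E) / p_not_E
)

assert abs(p_H - (p_H_given_E * p_E + p_H_given_not_E * p_not_E)) < 1e-12
print(p_H)  # 0.71
```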

Proof.

Suppose Eliezer’s conditional is true: “learning almost any specific fact would move me in the same direction.”

Notice how $E$ above is constructed to talk about exactly this: it is the set of worlds that would move you in the same direction, towards $H$. This conditional is effectively saying $P(E)$ is close to $1$.

That means that the CoEE reduces to $P(H) \approx P(H \mid E)$.

  • Question. Isn’t this trivial to see with just the definition of conditional probability (i.e., isn’t it clear that $P(H \mid E) \approx P(H)$ if $P(E) \approx 1$)?

    Answer. You’d still have to decompose $P(H \mid E)$ over the worlds in $E$ similarly (spelled out just below). Either way, I find CoEE more dynamically intuitive, as a balance between my future possible states of mind. Ultimately, they’re both trivial.
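    For completeness, here is that decomposition spelled out (my own filling-in of the step, using only the definitions above):

    $P(H \mid E) \;=\; \dfrac{P(H \cap E)}{P(E)} \;=\; \dfrac{1}{P(E)} \sum_{w \in E} P(H \mid w)\,P(w)$,

    so if $P(H \mid w) > P(H)$ for every $w \in E$, the weighted average on the right exceeds $P(H)$, and with $P(E) \approx 1$ you land in the same place as the CoEE route below.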

Now this “equation” is something you deliberately (and carefully) bring yourself into conformity with; it is not automatically true of your actual reasoning as it stands. You merely wish it were true. In other words, it’s more like a variable assignment than an equation:

$P(H) \;\leftarrow\; P(H \mid E)\,P(E) \;+\; P(H \mid \neg E)\,P(\neg E)$.

Since the worlds $w \in E$ are disjoint singletons whose union is $E$, we can re-write the first term of the RHS as a sum over them:

$P(H) \;\leftarrow\; \sum_{w \in E} P(H \mid w)\,P(w) \;+\; P(H \mid \neg E)\,P(\neg E)$.

But for each such $w$, we have $P(H \mid w) > P(H)$ by definition of $E$. Thus the value being assigned satisfies:

$\sum_{w \in E} P(H \mid w)\,P(w) + P(H \mid \neg E)\,P(\neg E) \;>\; \sum_{w \in E} P(H)\,P(w) + P(H \mid \neg E)\,P(\neg E) \;=\; P(H)\,P(E) + P(H \mid \neg E)\,P(\neg E) \;\approx\; P(H)$.

Thus, your prior weight on $H$ should be assigned something higher than it was before.

(You could have dropped the $\neg E$ term outright, since the assumption makes $P(\neg E) \approx 0$, but I carried it along so that you don’t have to wonder about any funny business with the approximations. As you can see, its contribution is bounded by $P(\neg E)$, so it washes out in the final step regardless.)
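To make the conclusion concrete, here is a toy numerical version of the whole argument (the numbers are mine, purely illustrative): a stated prior that sits below the posterior of every world except one rare exception, and the coherent prior that the CoEE assignment produces instead.

```python
# Toy version of the argument: the stated prior on H sits below P(H | w) for
# worlds carrying almost all the probability mass, so the coherent prior that
# CoEE assigns (the expected posterior) comes out strictly higher.

stated_prior = 0.30

# (P(world), P(H | world)); every world but the last moves you above the
# stated prior, and the exception carries only 1% of the mass.
worlds = [
    (0.50, 0.45),
    (0.30, 0.60),
    (0.19, 0.40),
    (0.01, 0.05),  # the rare world that would move you the other way
]

assert abs(sum(p_w for p_w, _ in worlds) - 1.0) < 1e-12

coherent_prior = sum(p_w * p_h for p_w, p_h in worlds)  # expected posterior

print(stated_prior, coherent_prior)  # 0.3 vs 0.4815
assert coherent_prior > stated_prior  # the predictable update is upward
```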

Conclusion

If your prior fails to (approximately) equal your expected posterior but you believe the conditional, then update now to match, before a Dutch gentleman finds you.

Examples?

This is a nice (albeit tautological) result, but I don’t have any realistic examples. How do you end up in this situation? I’m not even sure everyone would be convinced by the one Eliezer gives:

If you did know how both kinds of entity consumed computations, if you knew about specific machinery for human brains, and specific machinery for AGIs, you’d then be able to see the enormous vast specific differences between them, and go, “Wow, what a futile resource-consumption comparison to try to use for forecasting.”

One could argue that in most worlds we would find the machinery to be quite similar where it matters. Personally I’d be surprised by that, but a good example ought to be completely incontrovertible and at least somewhat non-trivial.

Help me out with some?