Comment on decision theory

A comment I made on social media last year about why MIRI cares about making progress on decision theory:


We aren’t working on decision theory in order to make sure that AGI systems are decision-theoretic, whatever that would involve. We’re working on decision theory because there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding of what high-quality reasoning is and how it works.

As an analogy: it might be possible to build a probabilistic reasoner without having a working understanding of classical probability theory, through sufficient trial and error. (Evolution “built” humans without understanding probability theory.) But you’d fundamentally be flying blind when it comes to designing the system. To a large extent, you couldn’t predict in advance which classes of design were likely to be most promising to consider, couldn’t look at particular proposed designs and make good advance predictions about safety/capability properties of the corresponding system, couldn’t identify and address the root causes of problems that crop up, etc.

The idea behind looking at (e.g.) counterfactual reasoning is that counterfactual reasoning is central to what we’re talking about when we talk about “AGI,” and going into the development process without a decent understanding of what counterfactual reasoning is and how it works means you’ll be flying blind to a significantly greater extent when it comes to designing, inspecting, and repairing your system. The goal is to put AGI developers in a position where they can make advance plans and predictions, shoot for narrow design targets, and understand what they’re doing well enough to avoid the kludgey, opaque, non-modular approaches that aren’t really compatible with how secure or robust software is developed.

Nate’s way of articulating it:

The reason why I care about logical uncertainty and decision theory problems is something more like this: The whole AI problem can be thought of as a particular logical uncertainty problem, namely, the problem of taking a certain function f : ℚ → ℝ and finding an input that makes the output large. To see this, let f be the function that takes the AI agent’s next action (encoded in ℚ) and determines how “good” the universe is if the agent takes that action. The reason we need a principled theory of logical uncertainty is so that we can do function optimization, and the reason we need a principled decision theory is so we can pick the right version of the “if the AI system takes that action...” function.
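To make the shape of that framing concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: `toy_goodness` is a hypothetical stand-in for f, the candidate actions are a small hand-picked set of rationals, and brute-force evaluation is only possible because the toy function is cheap to compute. For the f described above, evaluating the function directly is exactly what is not available, which is where logical uncertainty comes in.

```python
from fractions import Fraction


def best_action(f, candidates):
    """Return the candidate action whose value under f is largest."""
    return max(candidates, key=f)


def toy_goodness(action: Fraction) -> float:
    """Hypothetical stand-in for f: 'how good the universe is' after `action`.

    This toy version simply peaks at action = 1/3.
    """
    return -float((action - Fraction(1, 3)) ** 2)


# Assumed finite set of candidate actions, each encoded as a rational number.
candidates = [Fraction(k, 12) for k in range(13)]

chosen = best_action(toy_goodness, candidates)
print(chosen)
```

In this picture, the decision-theory question is about which function to hand to `best_action` in the first place, and the logical-uncertainty question is about how to optimize it when it cannot simply be evaluated on every candidate.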

The work you use to get to AGI presumably won’t look like probability theory, but it’s still the case that you’re building a system to do probabilistic reasoning, and understanding what probabilistic reasoning is is likely to be very valuable for doing that without relying on brute force and trial-and-error. Similarly, the work that goes into figuring out how to design a rocket, actually building one, etc. doesn’t look very much like the work that goes into figuring out that there’s a universal force of gravity that operates by an inverse square law; but you’ll have a vastly easier time approaching the rocket-building problem with foresight and an understanding of what you’re doing if you have a mental model of gravitation already in hand.

In pretty much the same way, developing an understanding of roughly what counterfactuals are and how they work won’t get you to AGI, and the work of implementing an AGI design won’t look like decision theory, but you want to have in mind an understanding of what “AGI-style reasoning” is (including “what probabilistic reasoning about empirical propositions is” but also “what counterfactual reasoning is”, “what probabilistic reasoning about mathematical propositions is”, etc.), and very roughly how/why it works, before you start making effectively irreversible design decisions.


Eliezer adds:

I do also remark that there are multiple fixpoints in decision theory. CDT does not evolve into FDT but into a weirder system Son-of-CDT. So, as with utility functions, there are bits we want that the AI does not necessarily generate from self-improvement or local competence gains.