FixDT is not a very new decision theory, but little has been written about it afaict, and it’s interesting. So I’m going to write about it.

TJ asked me to write this article to “offset” not engaging with Active Inference more. The name “fixDT” is due to Scott Garrabrant, and stands for “fixed-point decision theory”. Ideas here are due to Scott Garrabrant, Sam Eisenstat, me, Daniel Hermann, TJ, Sahil, and Martin Soto, in roughly that priority order; but heavily filtered through my own lens.

This post may provide some useful formalism for thinking about issues raised in The Parable of Predict-O-Matic.

Self-fulfilling prophecies & other spooky map-territory connections.

A common trope is for magic to work only when you believe in it. For example, in Harry Potter, you can only get to the magical train platform 9¾ if you believe that you can pass through the wall to get there.

A plausible normative-rationality rule, when faced with such problems: if you want the magic to work, you should believe that it will work (and you should not believe it will work, if you want it not to work).

Can we sketch a formal decision theory which handles such problems?

We can’t start by imagining that the agent has a prior probability distribution, like we normally would, since the agent would already be stuck—either it lucked into a prior which believed the magic could work, or, it didn’t.

Instead, the “beliefs” of the agent start out as maps from probability distributions to probability distributions. I’ll use “$P$” as the type for probability distributions (little $p$ for a specific probability distribution). So the type of “beliefs”, $B$, is a function type: $B = P \to P$ (little $b$ for a specific belief). You can think of these as “map-territory connections”: $b$ is a (causal?) story about what actually happens, if we believe $p$. A “normal” prior, where we don’t think our beliefs influence the world, would just be a constant function: it always outputs the same $p$ no matter what the input is.

Given a belief $b$, the agent then somehow settles on a probability distribution $p$. We can now formalize our rationality criteria:

Epistemic Constraint: The probability distribution $p$ which the agent settles on cannot be self-refuting according to the beliefs. It must be a fixed point of $b$: a $p$ such that $b(p) = p$.

Instrumental Constraint: Out of the options allowed by the epistemic constraint, $p$ should be as good as possible; that is, it should maximize expected utility.
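As a minimal sketch of the two constraints, assume the agent only tracks the probability of a single binary event, so that a "probability distribution" is just one number in [0, 1]. The selection rule is then an argmax of expected utility over the fixed points of the belief. The S-shaped `belief`, the utility, and the candidate grid below are illustrative assumptions, not part of FixDT itself.

```python
def belief(p):
    """Hypothetical belief b: an S-curve with fixed points at 0, 0.5 and 1."""
    return 3 * p**2 - 2 * p**3


def expected_utility(p):
    """Assumed utility: 1 if the event happens, 0 otherwise, so E_p[U] = p."""
    return p


def fixdt_choose(b, eu, candidates, tol=1e-9):
    # Epistemic constraint: keep only candidates with b(p) = p.
    fixed = [p for p in candidates if abs(b(p) - p) <= tol]
    if not fixed:
        raise ValueError("no fixed point among the candidates")
    # Instrumental constraint: among those, maximize expected utility.
    return max(fixed, key=eu), fixed


candidates = [i / 100 for i in range(101)]
chosen, fixed = fixdt_choose(belief, expected_utility, candidates)
print(fixed, chosen)
```

A grid search like this only finds fixed points that happen to lie on the grid; it is meant to show the shape of the decision rule, not a practical solver.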

We can also require that $b$ be a continuous function, to guarantee the existence of a fixed point[1], so that the agent is definitely able to satisfy these requirements. This might seem like an arbitrary requirement, from the perspective where $b$ is a story about map-territory connections; why should they be required to be continuous? But remember that $b$ is representing the subjective belief-formation process of the agent, not a true objective story. Continuity can be thought of as a limit to the agent’s own self-knowledge.

For example, the self-referential statement X: “$p(X) < 1/2$” suggests an “objectively true” belief which maps $p(X)$ to 1 if it’s below $1/2$, and maps it to 0 if it’s above or equal to $1/2$. But this belief has no fixed-point; an agent with this belief cannot satisfy the epistemic constraint on its rationality. If we require $b$ to be continuous, we can only approximate the “objectively true” belief function, by rapidly but not instantly transitioning from 1 to 0 as $p(X)$ rises from slightly less than $1/2$ to slightly more.
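To make the contrast concrete, here is a small sketch comparing the discontinuous belief with one possible continuous approximation. The linear ramp and its half-width `eps` are assumptions chosen for illustration; any sufficiently steep continuous transition would do.

```python
def b_sharp(p):
    """The "objectively true" belief for X: "p(X) < 1/2" (discontinuous)."""
    return 1.0 if p < 0.5 else 0.0


def b_smooth(p, eps=0.01):
    """Continuous approximation: a linear ramp from 1 down to 0 around 1/2."""
    return min(1.0, max(0.0, (0.5 + eps - p) / (2 * eps)))


grid = [i / 1000 for i in range(1001)]

# The sharp belief never returns its own input, so no grid point is a fixed point.
sharp_fixed = [p for p in grid if b_sharp(p) == p]

# The ramp passes through (1/2, 1/2), so p = 1/2 is a fixed point (up to rounding).
smooth_gap = abs(b_smooth(0.5) - 0.5)
print(sharp_fixed, smooth_gap)
```

The continuous version settles on exactly the probability at which the agent cannot tell which side of the threshold it is on.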

These “beliefs” are a lot like “trading strategies” from Garrabrant Induction.

We can also replace the continuity requirement with a Kakutani requirement, to get something more like Paul’s self-referential probability theory.

“Beliefs” are mathematically nice!

This section isn’t even about the decision theory; I suppose it’s skippable.

But this notion of “beliefs” is more useful than it may first appear.

First, notice that you can combine beliefs by weighted sum, in much the same way you can combine probability distributions into mixture models: $(w_1 b_1 + w_2 b_2)(p) = w_1 b_1(p) + w_2 b_2(p)$. This means we can represent our overall beliefs as a “mixture of hypotheses”, just like with probabilities. The weights $w_i$ are analogous to probabilities; but we can also think of them as “wealths” to reflect the Garrabrant Induction idea.

As I mentioned already, we can think of “normal priors” as a special case of beliefs, where the belief is just a constant function, outputting the same probability distribution regardless of input. In this case, weighted sums of beliefs behave exactly like regular weighted sums of probability distributions.

However, while regular probabilistic mixture models only act like “alternative possibilities”, belief mixtures can also combine constraints.

Let’s focus on two events, $A$ and $B$. The belief $b_1$ knows that $p(A) = c$ for some particular value $c$, and knows nothing else. So it reacts to a given $p$ by Jeffrey-updating the probabilities so that $p(A) = c$ but the probability distribution is otherwise changed as little as possible. The belief $b_2$ knows that $A \leftrightarrow B$ and nothing else. It reacts to a given $p$ by updating on this, to rule out worlds where the two events differ; but it is agnostic about what exact probabilities the two events should have.

Any mixture of these two beliefs will result in a belief which enforces both constraints; its only fixed points will have $p(A) = c$, and $p(A \leftrightarrow B) = 1$. The set of fixed points will not depend on the relative weight of the two hypotheses; relative weight only comes into play when you mix together inconsistent constraints.
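The following sketch checks this numerically on the four possible worlds for two events. The target value `C = 0.3`, the mixture weights, and the use of naive iteration from a full-support starting point are all assumptions made for illustration; boundary distributions where the updates degenerate are not examined here.

```python
# Worlds are indexed as (A and B, A and not B, not A and B, neither).
C = 0.3  # hypothetical value that the first belief assigns to p(A)


def b1(p):
    """Jeffrey update: rescale so p(A) = C, keeping ratios inside A and inside not-A."""
    p_a = p[0] + p[1]
    return [p[0] * C / p_a, p[1] * C / p_a,
            p[2] * (1 - C) / (1 - p_a), p[3] * (1 - C) / (1 - p_a)]


def b2(p):
    """Condition on A <-> B: drop worlds where A and B differ, then renormalize."""
    z = p[0] + p[3]
    return [p[0] / z, 0.0, 0.0, p[3] / z]


def mixture(w):
    """The belief w * b1 + (1 - w) * b2."""
    return lambda p: [w * x + (1 - w) * y for x, y in zip(b1(p), b2(p))]


def iterate(b, p, steps=2000):
    for _ in range(steps):
        p = b(p)
    return p


start = [0.25, 0.25, 0.25, 0.25]  # full-support starting point
results = {w: iterate(mixture(w), start) for w in (0.1, 0.5, 0.9)}
for w, p in results.items():
    print(w, "p(A) =", round(p[0] + p[1], 6), "p(A<->B) =", round(p[0] + p[3], 6))
```

Under these assumptions, all three weightings settle on the same distribution, which satisfies both constraints at once.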

So, belief functions allow us to represent abstract beliefs which are agnostic about some details of the probability distribution, as well as concrete beliefs which are fully detailed, and combine all of these things together with simple arithmetic. You could say that they can represent beliefs at multiple granularities. For this reason, Scott calls these things “multigrain models”, which is a much better term for general use than the term “beliefs” I’m using in this essay.

Can this be the whole decision theory?

So we’ve got a nice generalized notion of “belief”, and a proposed decision procedure which takes that generalized notion and chooses the best fixed-point, to handle self-fulfilling prophecies (as well as self-refuting beliefs and other spooky map-territory connections).

But we still have to make “normal” decisions; that is, we need to take “external” actions, not just decide on probabilities. The standard picture is that probabilities are an input to the action-deciding process. So it sounds like the new pipeline is: beliefs → FixDT ‘decision’ → probabilities → ordinary ‘decision’ → actions.

This is a bit complex and inelegant. It would be nice if we could “make a decision” just once, instead of twice. So, let’s suppose that actions are controlled by self-fulfilling prophecies. For example, if a robot has a motor that can turn on or off, we want to wire it directly to the robot’s belief about the motor. Maybe the motor turns on or off with precisely the probability given by the belief. Or perhaps there’s a threshold; strong enough beliefs turn the motor on, and otherwise it shuts off. The details don’t matter too much, so long as there’s a consistent fixed-point where the motor is on, and a consistent fixed-point where the motor is off. (Although we will explore some problems with this soon.)

Great! Now we’ve unified all decisions into one type. All we need is FixDT; once the probabilities have been chosen, all of the decisions are already made. This picture has other advantages, too. The agent no longer needs to have a special category of “actions” which it can take. “Actions” are just things in the world that are influenced by the agent’s probabilities. This results in a picture of agency where there’s no ontologically special “output” or “action” type! Actuators are just parts of the world which somehow pay attention to the agent.

We can also use the “belief” datatype to unify the notion of input (observation/evidence) with the notion of “hypothesis”, although this deserves its own write-up. The short version: imagine that a belief $b$ is defined in reference to the world; that is, it modifies probabilities not by guessing, but rather, by looking at the world and reporting what it sees. Under some additional assumptions, $b$’s influence will behave like a Bayesian update in the limit of having infinite weight with which to influence the probability distribution.

So we’ve dissolved the usual notions of “input” and “output”—now we’ve just got a market of beliefs, “observations” are just things which influence the market, and “actions” are just things which are influenced by the market.

This seems like a great picture.

  • We’ve reversed the common picture that we first figure out what we believe, and then figure out what to do. The decision lives inside the computation of probabilities.

  • We can represent something resembling a Lobian handshake in a probabilistic setting: if I believe that your probability of cooperation is tied to mine, I can select a fixed-point with a high probability of cooperation for both of us. And if I’m right in my beliefs, you’ll do the same.[2]

  • We don’t need to consider “actions” at all. Instead, there are just parts of the environment which react to our chosen probabilities; and we choose our probabilities with this in mind. Me choosing to type these words is no different in kind from a general choosing where to station troops; the fingers react to what I expect them to type, and the troops react to where I expect them to go.

Sadly, this nice picture falls apart when we look at learning-theoretic considerations.

Reasons for pessimism.

For the picture to work out, we need to be able to learn what we can control.

Eliminating the traditional decision-theoretic need for a list of possible actions to choose from doesn’t do us much good if we still have to hard-code the beliefs which say that the robot’s motors listen to the robot’s probabilities in a particular way. Instead, we’d like the robot to be able to notice this for itself. This would also give us reassurance that it is controlling other aspects of the environment as appropriate.

To make discussion of this simple, I’m going to imagine that there is a “true” belief, $b^*$, which tells us the “actual” counterfactual relationship between our probabilities and reality. This is metaphysically questionable, but it makes sense in practice. For example, if I hook up my robot’s motor to turn on if the robot’s probability of the motor turning on is above some threshold $t$, then $b^*$ should map any $p$ for which $p(\text{on}) > t$ to some $p'$ such that $p'(\text{on}) = 1$.

If it helps, you can think of the true belief $b^*$ as a “calibration” function which maps uncalibrated probabilities to the probabilities which would be calibrated. Normally, we think of calibration functions as representing underconfidence and overconfidence: if, when I say “90%”, the event actually occurs an average of 80% of the time, then I’m overconfident and should adjust my probabilities downward. The idea here is exactly the same, except that here we’re considering a case where the 80% observed frequency we see in the world might be a reaction to the 90% probability; so if we move down to 80%, the world might move down further, to 70%, or might move up to 100%, etc. (This is why we need to select a fixed point of the calibration function, rather than just naively adjust in the right direction.)

Seeing $b^*$ as a calibration function will be more comfortable for a frequentist, who can consider all of this well-defined so long as we can place situations into sequences of random experiments. Causal decision theorists may prefer to think of $b^*$ as giving the true causal relationship between our probabilities and the world.[3]

So, basically, we want our beliefs $b$ to approximate the true belief $b^*$ as we learn. More specifically, our beliefs should approximate the set of fixed points for $b^*$.

This implies some kind of iterated setting, where the agent updates its beliefs over time and selects fixed points repeatedly, rather than just once. I will assume that things look similar to Garrabrant Induction, in that respect. But this is not a formal impossibility proof! I am sketching reasons for pessimism, not formally showing that FixDT will never work. So don’t worry about the details—make up your own assumptions if my reasoning doesn’t make sense to you. Let me know if you get it to work!

It would be easy if we could try out different probabilities $p$ and see $b^*(p)$ for each. It would just be a regression problem. The problem is, we don’t get to observe probabilities. We only observe what happens.[4]

Imagine that our beliefs $b$ are a weighted mixture of hypotheses $b_i$, and the true belief $b^*$ is already one of the $b_i$. (This is usually the easiest case for learning, the “realizable” case. If this doesn’t work, there would seem to be little hope more generally.) How can we reward $b^*$ for getting things right?

Our chosen probabilities $p$ will be a fixed-point of the mixture $b$, but will not necessarily be a fixed-point of every $b_i$ in our mixture. We can reward beliefs which were pushing in the right direction. If $p(X)$ was $1/2$ for some event $X$, and $b_i(p)(X) < 1/2$, we could say that $b_i$ was trying to pull the probability down. If we then observe that $X$ turned out to be false, then $b_i$ should get rewarded with a higher weight in our mixture.

Now, here’s the problem: we can’t, in general, reward beliefs which correctly identify fixed-points of the true belief $b^*$, or punish beliefs which incorrectly rule out $p$ which are fixed points of $b^*$.

Suppose that the true belief $b^*$ has two fixed-points, a good one $p_g$ and a bad one $p_b$. Our only other hypothesis, $b_1$, is defined as follows: $b_1(p) = \frac{1}{2}(p + p_b)$; that is, it drags things halfway from wherever they are to $p_b$. This can (with enough weight relative to other hypotheses) completely eliminate $p_g$ as a fixed point, leaving only $p_b$. $b_1$ will never lose credibility for doing this, since at $p_b$ it makes the same prediction as $b^*$; which is to say, neither of them wants to make any corrections to the probabilities at that point, so no learning will happen no matter what gets observed.
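Here is a one-dimensional sketch of this failure mode. The S-curve standing in for the true belief (with the bad fixed point at 0 and the good one at 1), the weights, and the grid-plus-bisection root finder are all assumptions for illustration.

```python
def b_true(p):
    """Hypothetical true belief: an S-curve with fixed points 0 (bad), 0.5, 1 (good)."""
    return 3 * p**2 - 2 * p**3


P_BAD = 0.0


def b_drag(p):
    """The rival hypothesis: move halfway from p to the bad fixed point."""
    return (p + P_BAD) / 2


def market(w):
    """Mixture giving weight w to b_drag and 1 - w to b_true."""
    return lambda p: w * b_drag(p) + (1 - w) * b_true(p)


def fixed_points(f, n=1000, tol=1e-12):
    """Approximate fixed points of f on [0, 1] from zeros and sign changes of f(p) - p."""
    g = lambda p: f(p) - p
    roots = []

    def add(r):
        if not roots or abs(r - roots[-1]) > 1e-6:
            roots.append(r)

    grid = [i / n for i in range(n + 1)]
    for a, b in zip(grid, grid[1:]):
        if abs(g(a)) <= tol:
            add(a)
        elif g(a) * g(b) < 0:
            lo, hi = a, b
            for _ in range(60):  # bisection on the bracketing interval
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            add((lo + hi) / 2)
    if abs(g(1.0)) <= tol:
        add(1.0)
    return roots


print(fixed_points(b_true))       # fixed points of the true belief alone
print(fixed_points(market(0.1)))  # light weight on b_drag: a point near the good one survives
print(fixed_points(market(0.5)))  # heavy weight on b_drag: only the bad point is left
# At the bad point both hypotheses leave p unchanged, so no observation separates them.
print(b_drag(P_BAD), b_true(P_BAD))
```

In this toy setup, a weight of one half on the dragging hypothesis is already enough to remove the good fixed point from the market entirely.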

In general, if we are at some fixed-point of a belief $b_i$, then $b_i$ will not be making any correction to that fixed-point; so it seems difficult to reward or punish $b_i$. FixDT chooses some probability; then we observe what happens; it seems like we can only reward beliefs which were trying to push the probability towards the thing that happened (and punish those which pulled in the other direction).

Attraction & Repulsion

Actually, we can distinguish between fixed-points of the true belief $b^*$ which are attractor points vs those which are repulsive. (More generally, points can be varyingly attractive/repulsive when approached from different directions.)

For example, suppose I wire up a motor response like this:

The 50% point will be a fixed-point, but it will be repulsive: beliefs very close to the fixed-point would map to beliefs a bit further away, so that if we iterated $b^*$, points initially near 50% would shoot away.

Similarly, 100% and 0% are attractive fixed-points; probabilities near to them rapidly converge toward them if we iterate $b^*$.
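One response curve consistent with this description is an S-curve through 0%, 50% and 100%. The specific polynomial below is an assumption, not necessarily the wiring originally illustrated; the sketch iterates it and estimates its slope at each fixed point (a slope of magnitude below 1 indicates attraction, above 1 repulsion).

```python
def motor_response(p):
    """Hypothetical response: chance the motor turns on, given belief p that it will."""
    return 3 * p**2 - 2 * p**3


def iterate(f, p, steps=50):
    for _ in range(steps):
        p = f(p)
    return p


def slope(f, p, h=1e-6):
    """Central-difference estimate of the derivative of f at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


print(iterate(motor_response, 0.49))  # drifts away from 0.5, toward 0
print(iterate(motor_response, 0.51))  # drifts away from 0.5, toward 1
print(slope(motor_response, 0.5))     # about 1.5: repulsive
print(slope(motor_response, 0.0), slope(motor_response, 1.0))  # about 0: attractive
```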

If the full market’s fixed-point ends up being close to an attractive point of the true belief $b^*$, then reality will respond by being even closer to the attractive point. This suggests that we can learn such points! Beliefs which are pushing toward the fixed-point will be increasingly vindicated (in expectation, if we use a proper scoring rule to reward/punish beliefs).

On the other hand, belief in repulsive fixed-points will be correspondingly punished.

This suggests that we can get some positive learning-theoretic results if we limit our aspirations: perhaps we cannot learn $b^*$ in general, but can learn its attractive fixed-points.

(But don’t forget that this can be a big disappointment from a decision-theoretic perspective. The attractive fixed-points can be terrible, and the repulsive fixed-points can be wonderful.)

Active inference to the rescue?

Some might say that the problem, here, is that I am using some of the ideas from Active Inference without adopting the full package.

Specifically, FixDT has in common with Active Inference that motor outputs are a function of what the agent believes its motor outputs will be, rather than the more common idea of being a function of expected utility.

But FixDT is trying to get away with this move without the accompanying Active Inference idea of skewing beliefs toward success.

Can we fix FixDT by adding in more ideas from Active Inference? Sort of, but I don’t find it very satisfying.

Friendly Actuators?

I observed that attractive fixed-points appear to be learnable, while repulsive fixed-points appear unlearnable. But whether a point is attractive vs repulsive depends on the true belief $b^*$, which is to say, it depends on how the environment reacts to our beliefs. For example, we could wire up the motor responses to be as follows instead of the suggestion I illustrated earlier:

The important thing to note, here, is that I’ve flipped which fixed-points are attractive vs repulsive. This is not very nice for the agent; it means the 50% point is learnable, but properly turning the motors on/​off is no longer learnable.
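As a sketch of such a flipped wiring, the hypothetical response below keeps fixed points at 0%, 50% and 100% but reverses their stability, so that iterating from almost anywhere ends up at 50%. The polynomial is an assumption chosen only to have this property.

```python
def flipped_response(p):
    """Hypothetical flipped wiring: fixed points 0, 0.5, 1 with reversed stability."""
    return 2 * p - (3 * p**2 - 2 * p**3)


def iterate(f, p, steps=200):
    for _ in range(steps):
        p = f(p)
    return p


# 0 and 1 are still fixed points, but nearby beliefs are pushed away from them.
print(flipped_response(0.0), flipped_response(1.0))
print(iterate(flipped_response, 0.01))  # approaches 0.5
print(iterate(flipped_response, 0.99))  # approaches 0.5
```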

So we could define “friendly actuators” as ones which have been designed so as to be easy for the agent to learn how to use. Is there a systematic way to design friendly actuators?

Well, we could take the idea from Active Inference. Rather than copy the action probabilities from the agent’s chosen probabilities (which would make every distribution over actions a fixed-point of $b^*$, but neither attractive nor repulsive, and therefore not very learnable) we should instead take the agent’s probabilities, bias them toward success, and copy those probabilities. Since action-probabilities will always be shifted toward better outcomes, only optimal actions will be fixed-points.
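A toy comparison of the two actuator designs, under loud assumptions: three actions with made-up utilities, and a "bias toward success" modeled as shifting a fixed fraction `eps` of probability mass onto the best action. Other forms of bias are possible; this one is chosen because it makes the optimal action the unique fixed point.

```python
UTILITIES = [0.2, 1.0, 0.5]  # hypothetical utilities of three actions
BEST = max(range(len(UTILITIES)), key=UTILITIES.__getitem__)


def copy_actuator(p):
    """Act with exactly the believed action probabilities: every p is a fixed point."""
    return list(p)


def biased_actuator(p, eps=0.1):
    """Assumed form of the bias: move a fraction eps of the mass onto the best action."""
    return [(1 - eps) * x + (eps if i == BEST else 0.0) for i, x in enumerate(p)]


def iterate(f, p, steps=500):
    for _ in range(steps):
        p = f(p)
    return p


p0 = [0.6, 0.1, 0.3]
print(copy_actuator(p0) == p0)           # True: nothing is attractive or repulsive
final = iterate(biased_actuator, p0)
print(final)                             # approaches [0, 1, 0]
print(biased_actuator([1.0, 0.0, 0.0]))  # a suboptimal action is no longer a fixed point
```

Note that `biased_actuator` has to know `UTILITIES` in order to do its job.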

(This prevents us from learning full control; but who cares about failing to learn suboptimal fixed-points? We really only need to be able to learn the ones we actually want to choose.)

My problem with this idea is that we’re introducing “actuator decision theory”—the actuator is now asked to be intelligent itself, in order to cooperate with the agent. We might as well have the actuator just make the best decision based on the beliefs, then! This returns us to classical decision theory.

Biased Reporting?

A different way to try and import the Active Inference idea is to bias the agent’s probabilities themselves, rather than putting that responsibility on the actuators. Again, the idea is to make better outcomes learnable by helping them to be attractive fixed-points.

For example, imagine Popular News Network (PNN) finds itself regularly reporting on bank runs. Bank runs have become a big problem, and PNN is doing a service to its viewers by reporting on expert predictions about which banks are in the process of collapsing, which banks are on unstable ground and might be next, which banks seem secure, etc.

PNN is not blind to the fact that its reports can actually cause or prevent bank-runs. Thus far, PNN’s ethical position has been that they’re doing fine so long as they (1) report the truth as accurately as they are able (the epistemic constraint) and (2) when the accuracy constraint allows for multiple possible reports to be fixed-points, they choose whichever report results in the fewest bank-runs (the instrumental constraint).

However, PNN has noticed that despite their judicious adherence to the above, more and more bank-runs seem to be happening. Their expert analysts have figured out that bank runs are attractive fixed-points, but non-bank-runs are repulsive; the number of bank-runs in a given week roughly tracks however many PNN forecasts, but looking at the details, there are about 5% more on average than whatever is forecast.

As a result, the reporters, bound by honesty, keep sliding in the direction of predicting more bank runs, since the numbers tend to prove their previous forecasts to be underestimates.

Taking an idea from Active Inference, PNN executives ask reporters to reduce their forecasted numbers by 10% from whatever the honest forecast would be, in the hopes of putting pressure against bank-runs.

I have a couple of problems with this approach.

First, if we violate the epistemic constraint, are the reported numbers really “probabilities” any more? They’re just some numbers we made up. By bending epistemic rationality, we lose the nice properties we invented it for. Why invoke probability theory at all, if you’re no longer trying to make your probabilities calibrated?[5]

Second, and relatedly: the viewers of PNN can pick up on the biased reporting and adjust the numbers back up by 10%.
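The dynamics of the PNN story can be sketched with a toy simulation. Everything here is an assumption made for illustration: the world produces 5% more bank runs than viewers expect, the honest forecast each week is simply last week's observed count, and viewers who notice the bias undo a 10% reduction exactly by dividing by 0.9.

```python
def actual_runs(perceived_forecast):
    """Assumed world response: about 5% more bank runs than viewers expect."""
    return 1.05 * perceived_forecast


def simulate(report_bias, viewer_correction, weeks=20, honest=100.0):
    """Each week the honest forecast is reset to last week's observed count."""
    for _ in range(weeks):
        reported = report_bias * honest           # what PNN broadcasts
        perceived = viewer_correction * reported  # what viewers take it to mean
        honest = actual_runs(perceived)           # next week's honest forecast
    return honest


honest_reporting = simulate(report_bias=1.0, viewer_correction=1.0)
biased_reporting = simulate(report_bias=0.9, viewer_correction=1.0)
biased_but_decoded = simulate(report_bias=0.9, viewer_correction=1 / 0.9)
print(honest_reporting, biased_reporting, biased_but_decoded)
```

In this toy model, honest reporting slides upward, biased reporting slides downward as long as viewers take the numbers at face value, and the bias achieves nothing once viewers decode it.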

This gets us into murky philosophical issues behind FixDT. The idea of FixDT is that the world might somehow react to our probabilities. But how does the world zero in on “our probabilities” to react to them? If we’re settled on a specific version of FixDT, we don’t care; FixDT just tracks how the world reacts, and chooses fixed-points accordingly.

But if we’re trying to decide between versions of FixDT (or between FixDT and other options), it might start to matter how the world detects our probabilities in order to react. If we violate the Epistemic Constraint and adjust some numbers up by 10%, will the world adjust those numbers back down before reacting to them?

Obviously, it depends. In some cases, the Active Inference idea will work fine. But in many cases of interest, it won’t. That’s really all I can say, here.

Connection to the “futarchy hack”.

Earlier, in my heuristic argument that FixDT can’t learn the true belief $b^*$, I divided the problem into two parts: (a) we can’t reward traders who successfully make fixed-points of $b^*$ into fixed-points of the market; (b) we can’t punish traders who successfully rule out fixed-points of $b^*$ as market fixed-points.

The second problem is very similar to the untaken actions problem, often called the “futarchy hack” (at least within the in-person LWDT community), because it is a way to control the decisions of a futarchy without risking any money: if you can bet enough money that the option you don’t want will be bad for everyone, then that action won’t get taken, so you’ll simply get your money back. You put your money where your mouth was, but your predictions didn’t get empirically tested.

One of the best remedies to this problem (perhaps the best remedy) is Decision Markets (aka BRIA), by Caspar Oesterheld. But I don’t have a specific proposal for how to combine that with FixDT.

Future work?

  1. Combining updateless reasoning with FixDT.

  2. Further work on the learning-theoretic issues for FixDT.

  3. Spelling out the “dissolve the notion of evidence” thing I mentioned.

  4. Exploring the combination of BRIA and FixDT.

  5. FixDT can be seen as going up a single meta-level, from probabilities to maps. But what if the world reacts to your “belief” (your map)? Can we somehow deal with the implied infinite regress?

  6. FixDT game theory. Perhaps FixDT hierarchical game theory.

  7. Removing talk of “calibration” and $b^*$; motivating similar ideas in less ontologically questionable ways.

  8. Capitalizing on the nice ontology FixDT offers, to somehow further clarify “agent boundaries” stuff, or other issues in embedded agency?

  9. If we squint, we can see the Futarchy Hack as a failure of preference aggregation. We could say “the beliefs may actually have preferences” and attempting to rule out a fixed-point is a kind of vote. This is similar to the Active Inference idea, really. We can model Active Inference’s way of biasing beliefs toward success by putting in a belief which pushes things toward success (rather than my much grosser, but basically similar, proposal of biasing things toward success after the fixed-point is chosen). Thus we can see “beliefs” as actually having a value component (based on which fixed-points they push things to). Can this get us anywhere??

  1. We also need to assume that the space of probability distributions being considered is compact (and convex), to apply Brouwer’s fixed point theorem.

  2. This isn’t a super-great “handshake” really; I think it is little better than what EDT offers by allowing agents to believe that they are correlated with one another. The problem with both pictures is that there isn’t a learning-theoretic story showing that agents can converge toward cooperation on such a basis, as far as I know.

  3. If neither of these pictures is satisfying to you, well… I think many conclusions one can reach by pretending there’s a true belief $b^*$ can be defended more carefully by other means, but I fully admit I’m not doing the work here.

  4. Of course, we only get to observe what happens for some observable things; I can’t directly observe whether my beliefs impact eddies in the currents deep within the sun, for example. But I don’t even expect that problem to be solvable in principle; agents just have to make do with some irreducible uncertainty about such things. But it does feel like I should be able to learn the calibration function for motor-control problems, in order for FixDT to be considered a success.

  5. Or, we could make this point in other ways, if “calibration” is meaningless to you. For example, biased probabilities will no longer maximize expected accuracy.