Infra-Bayesianism naturally leads to the monotonicity principle, and I think this is a problem


The monotonicity principle is a famously uncomfortable consequence of Infra-Bayesian Physicalism: an IBP agent can only act as if its utility function never assigned negative value to any event. This strongly contradicts the intuition that creating suffering people is actively bad.

In this post, I explain in layman's terms how IBP leads to this conclusion, and I argue that this feature is not unique to Physicalism: given certain reasonable extra assumptions, the monotonicity principle follows naturally from infra-Bayesianism itself. In my opinion, this points to a significant flaw in the applicability of infra-Bayesianism.

A very simplified overview of Infra-Bayesianism

An infra-Bayesian agent assumes that the world is controlled by a malevolent deity, Murphy, who tries to minimize the agent’s utility.[1] However, Murphy is constrained by some laws. The agent has some hypotheses about what these laws might be. As time goes on, the agent learns that some things don’t go maximally badly for it, which must be because some law constrains Murphy in that regard. The agent slowly learns about the laws in this way, then acts in a way that maximizes its utility under these assumptions. I explain this in more detail and try to give more intuition for the motivations behind it here, and in even more detail here.

A moral philosophy assumption

Imagine a perfect simulation of a person being tortured. Presumably, running this simulation is bad. Is running the exact same simulation on two computers twice as bad as running it only once? My intuition is that no, there doesn’t seem to be that much of a difference between running the program once or twice. After all, what if we run it on only one computer, but the computer double-checks every computation-step? Then we basically run the program in two instances in parallel. Is this twice as bad as not double-checking the steps? And what if we run the program on a computer with wider wires?

I also feel that this being a simulation doesn’t change much. If a person is tortured, and a perfect clone of him with the exact same memories and thoughts is tortured in the exact same way, that’s not twice as bad. And the difference between n clones and n+1 clones being tortured is definitely not the same as the difference between the torture happening once and not happening at all.

In fact, the only assumption we need is sub-linearity.

Assumption: The badness of torturing n perfect clones of a person in the exact same way grows sublinearly with n.

This can be a simple indicator function (after the first person is tortured, torturing the clones doesn’t make it any worse, as suggested by the simulation analogy), or this can be a compromise position where the badness grows with, let’s say, the logarithm or square root of n.

I think this is a reasonable assumption in moral philosophy that I personally hold, that most people I asked agreed with, and that Vanessa herself also strongly subscribes to: it’s an integral assumption of infra-Bayesian Physicalism that the exact same experience happening twice is no different from it happening only once.

One can disagree and take an absolute utilitarian position, but I think mine is a common enough intuition that I would want a good decision theory to successfully accommodate utility functions that scale sublinearly with copying an event.
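To make the sublinearity assumption concrete, here is a minimal sketch of the three aggregation rules mentioned above. The function names and scales are my own illustrative choices, not anything from the infra-Bayesianism literature; the only property that matters is that doubling the number of copies less than doubles the badness.

```python
import math

# Three candidate "badness" functions for torturing n identical copies of a
# person; all are sublinear in n (names and scales are illustrative choices):

def badness_indicator(n):
    # After the first copy, extra copies add nothing.
    return 1.0 if n >= 1 else 0.0

def badness_log(n):
    # Badness grows logarithmically with the number of copies.
    return math.log(1 + n)

def badness_sqrt(n):
    # Badness grows with the square root of the number of copies.
    return math.sqrt(n)

# Sublinearity: doubling the number of copies less than doubles the badness.
for f in (badness_indicator, badness_log, badness_sqrt):
    assert f(2) < 2 * f(1)
    assert f(200) < 2 * f(100)
```

The indicator function is the strong version used in Infra-Bayesian Physicalism; the log and square-root versions are the compromise positions, and everything below goes through for any of them.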

The monotonicity principle

Take an infra-Bayesian agent whose utility function is sublinear in the above-described way. The agent is offered a choice: if it creates a new person and tortures him for a hundred years, then a child gets an ice cream.

The infra-Bayesian agent assumes that everything it has no information about is maximally horrible. This means it assumes that every part of the universe it hasn’t observed yet (say, the inside of quarks) is filled with a gazillion copies of all possible suffering and negative-utility events imaginable.

In particular, this new person it would create and torture already has a gazillion tiny copies inside the quarks, tortured in exactly the way the agent plans to torture him. The sublinearity assumption then means that the marginal badness of torturing one more instance of this person is negligible.

On the other hand, as ice creams are good, the agent assumes that the only ice creams in the universe are the ones it knows about, and there is no perfect copy of this particular child getting this particular ice cream. There are no gazillion copies of that inside the quarks, as that would be a good thing, and an infra-Bayesian agent always assumes the worst.

Therefore, the positive utility of the child getting the ice cream outweighs the negative utility of creating a person and torturing him for a hundred years. This is the monotonicity principle: the agent acts as if no event had negative value.
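The arithmetic behind this conclusion can be sketched in a few lines. All the numbers here are hypothetical toy values of my own choosing (even a "mere" trillion hidden copies suffices, far short of a gazillion), and I use the logarithmic version of the sublinear badness function:

```python
import math

# Toy numbers, all hypothetical, to illustrate the worst-case comparison.
# The agent's worst-case hypothesis: the unobserved universe already holds
# N identical copies of the planned torture, but zero copies of the ice
# cream (hidden copies of a *good* event would raise utility, so Murphy
# doesn't provide any).
N = 10**12                 # hidden torture copies; a trillion already suffices
TORTURE_SCALE = 1000.0     # one isolated torture is 1000x worse than an ice cream
ICE_CREAM_UTILITY = 1.0

def torture_badness(n):
    # Sublinear (logarithmic) aggregation of n identical tortures.
    return TORTURE_SCALE * math.log(1 + n)

# Marginal badness of adding one more torture on top of the N assumed copies:
marginal_torture = torture_badness(N + 1) - torture_badness(N)

# Worst-case change in utility from accepting the deal:
utility_change = ICE_CREAM_UTILITY - marginal_torture

# The marginal badness is on the order of TORTURE_SCALE / N, so the ice
# cream wins and the agent creates and tortures the person.
assert marginal_torture < 1e-6
assert utility_change > 0
```

Note that the conclusion doesn’t depend on the specific constants: for any sublinear aggregation, a large enough N drives the marginal badness below any fixed positive utility.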

Vanessa also acknowledges that the monotonicity principle is a serious problem, although she sometimes suggests that we should bite this bullet, and that creating an AGI adhering to the monotonicity principle might not actually be horrible: creating suffering has the opportunity cost of not creating happiness in its place, so the AGI still wouldn’t create much suffering. I strongly oppose this kind of reasoning, as I explain here.

Infinite ethics

Okay, but doesn’t every utilitarian theory break down, or at least get really confusing, when we consider a universe that might be infinite in one way or another and might contain lots of (even infinitely many) copies of every conceivable event? Shouldn’t a big universe make us ditch the sublinearity assumption anyway and go back to absolute utilitarianism, as it might not lead to as many paradoxes?

I’m still confused about all of this, and the only thing I’m confident in is that I don’t want to build a sovereign AGI that tries to maximize some kind of utility, using all kinds of philosophical assumptions. This is one of my disagreements with Vanessa’s alignment agenda, see here.

I also believe that infra-Bayesianism handles these questions even less gracefully than other decision processes would, because of its in-built asymmetry. Normally, I would assume that there might be many tortures and ice creams in our big universe, but I see no reason why there would be more copies of the torture than of the ice cream, so I still choose to avoid the torture. An infra-Bayesian agent, on the other hand, assumes that the quarks are full of torture but not ice cream, which leads to the monotonicity principle.

This whole problem can be patched by getting rid of the sublinearity assumption and subscribing to full absolute utilitarianism (although in that case Infra-Bayesian Physicalism needs to be reworked as it heavily relies on a strong version of the sublinearity assumption), but I think that even then the existence of this problem points at a serious weakness of infra-Bayesianism.

  1. ^

    Well, it behaves so as to maximize its utility in the worst-case scenario allowed by the laws. But “acting as if it assumes the worst” is functionally equivalent to assuming the worst, so I describe it this way.