Comments on Pascal’s Mugging

There seems to be some continuing debate about whether it is rational to appease a Pascal's Mugger. Some are saying that, due to scope insensitivity and other biases, we really should just trust what decision theory + Solomonoff induction tells us. I have been thinking about this a lot, and I think I have something to contribute to the discussion.

Consider the Pascal's Mugging: “Immediately begin to work only on increasing my utility, according to my utility function ‘X’, from now on, or my powers from outside the matrix will make minus 3^^^^3 utilons happen to you and yours.”

Any agent can commit this Pascal’s mugging (PM) against any other agent, at any time. A naive decision-theoretic expected-utility optimizer will always appease the mugger. Consider what the world would be like if all intelligent beings were this kind of agent.
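
To make that failure concrete, here is a minimal sketch of the calculation such an optimizer runs. All the numbers are made up for illustration, and THREAT is only a stand-in, since 3^^^^3 cannot actually be represented:

```python
# A minimal sketch (toy numbers, not anyone's actual agent design) of why a
# naive expected-utility maximizer appeases. 3^^^^3 is far too large to
# represent directly, so THREAT is a stand-in "absurdly large" disutility.

THREAT = 10 ** 100           # stand-in for 3^^^^3 utilons of threatened harm
P_MUGGER_HONEST = 1e-30      # tiny credence that the threat is real
COST_OF_APPEASING = 1e6      # utilons lost by serving the mugger's utility function

def eu_appease() -> float:
    # Pay the cost for sure; the threatened harm never happens.
    return -COST_OF_APPEASING

def eu_refuse() -> float:
    # With tiny probability, suffer the enormous threatened harm.
    return P_MUGGER_HONEST * -THREAT

print(eu_appease())  # -1000000.0
print(eu_refuse())   # -1e+70, far worse, so the naive agent appeases
# Whatever fixed prior the agent assigns, the mugger can always name a number
# big enough to make refusal come out worse.
```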

When you see an agent, any agent, your only strategy would be to try to PM it before it PMs you. More likely, you will PM each other simultaneously, in which case the agent that finishes its mugging first ‘wins’. If you finish your muggings at the same time, the mugger that uses the larger integer in its threat ‘wins’. (So you’ll use the most compact notation possible and say things like “minus the Busy Beaver function of Graham’s number utilons”.)

This may continue until every agent in the community/world/universe has been PMed. Or maybe there could be one agent, a Pascal Highlander, who manages to escape being mugged and has his utility function come to dominate...

Except, there is nothing stipulating that the mugging has to be delivered in person. With a powerful radio source, you can PM everyone in your future light-cone unfortunate enough to decode your message, potentially hijacking entire distant civilizations of decision-theory users.

Pascal’s mugging doesn’t have to be targeted. You can claim to be a Herald of Omega and address your mugging “to whoever receives this transmission”.

Another strategy might be to build a self-replicating robot (itself too dumb to be mugged) with a radio that broadcasts a continuous, fully general PM, and send it out into space. Then you commit suicide to avoid the fate of being mugged.

Now consider a hypothetical agent which completely ignores muggers. And mugs them back.

Consider what could happen if we build an AI which is friendly in every possible respect except that it appeases PMers.

To avoid this, you might implement a heuristic that ignores PMs on account of the prior improbability of being able to decide the fate of so many utilons, as Robin Hanson suggested. But an AI using naive expected utility + Solomonoff induction may well have other failure modes roughly analogous to PM that we won’t think of until it’s too late. You might get agents to agree to pre-commit to ignoring muggers, or to killing them, but to me this seems unstable: a band-aid that doesn’t address the heart of the issue. I think an AI which can envision itself being PMed repeatedly by every other agent on the planet and still evaluate appeasement as the lesser evil cannot possibly be a Friendly AI, even if it has some heuristic or ad hoc patch that says it can ignore the PM.
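
For concreteness, here is one rough way a Hanson-style penalty could look: shrink the prior in proportion to the claimed stakes, so that the product of the two stays bounded. The constant and the exact scaling rule below are illustrative guesses, not anything Hanson or anyone else has actually specified:

```python
# A rough sketch of a leverage-penalty-style heuristic: scale credence down in
# proportion to the claimed stakes, so that (prior * claimed utilons) stays
# bounded. BASE_CREDENCE and the scaling rule are illustrative assumptions.

BASE_CREDENCE = 1e-10  # credence in an "I control the matrix" claim with modest stakes

def penalized_prior(claimed_utilons: float) -> float:
    # The bigger the claimed leverage, the smaller the credence.
    return min(BASE_CREDENCE, BASE_CREDENCE / claimed_utilons)

def expected_harm_if_refused(claimed_utilons: float) -> float:
    # Capped at roughly BASE_CREDENCE, no matter how large a number the mugger names.
    return penalized_prior(claimed_utilons) * claimed_utilons

for claim in (1e3, 1e30, 1e100):
    print(claim, expected_harm_if_refused(claim))
# Each line prints an expected harm of roughly 1e-10: the threat no longer
# scales with the mugger's choice of notation, so the agent can ignore it.
```

This bounds the damage from this particular exploit, but as argued above it is a patch on the symptom rather than a fix for whatever lets an arbitrarily large number dominate the calculation in the first place.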

Of course, there’s the possibility that we are in a simulation which is occasionally visited by agents from the mother universe, which really does contain 3^^^^3 utilons/people/dust specks. I’m not convinced that acknowledging this possibility changes anything. There’s nothing of value that we, as simulated people, could give our Pascal-Mugging simulation overlords. Their only motivation would be sheer sadistic sociopathy, but if that’s the reality of the multiverse, then in the long term we’re screwed no matter what we do, even with Friendly AI. And we certainly wouldn’t be in any way morally responsible for their actions.

Edit 1: fixed typos