# St. Petersburg Mugging Implies You Have Bounded Utility

This post describes an infinite gamble that, under some reasonable assumptions, will motivate people who act to maximize an unbounded utility function to send me all their money. In other words, if you understand this post and it doesn’t motivate you to send me all your money, then you have a bounded utility function, or perhaps even upon reflection you are not choosing your actions to maximize expected utility, or perhaps you found a flaw in this post.

Briefly, I take the St. Petersburg Paradox, convert it to a mugging along the lines of Pascal’s Mugging, and then tweak it to extract all of the money instead of just a fixed sum.

I have always wondered if any actual payments have resulted from Pascal’s Mugging, so I intend to track payments received for this variation. If anyone does have unbounded utility and wants to prove me wrong by sending money, send it with Paypal to tim at fungible dot com. Annotate the transfer with the phrase “St. Petersburg Mugging”, and I’ll edit this article periodically to say how much money I received. In order to avoid confusing the experiment, and to exercise my spite, I promise I will not spend the money on anything you will find especially valuable. SIAI would be a better charity, if you want to do charity, but don’t send that money to me.

Here’s the hypothetical (that is, false) offer to persons with unbounded utility:

Let’s call your utility function “UTILITY”. We assume it takes a state of the universe as an argument.

Define DUT to be UTILITY(the present situation plus you receiving $1000)-UTILITY(the present situation). Here DUT stands for Difference in UTility. We assume DUT is positive.

You have unbounded utility, so for each nonnegative integer N there is a universe UN(N) such that UTILITY(UN(N)) is at least DUT * 2**N. Here UN stands for “universe”.

The phrase “I am a god” is defined to mean that I am able to change the universe to any state I choose. I may not be a god after I make the change.

The offer is: For every dollar you send me, I will flip a coin. If it comes out Tails, or I am not a god, I will do nothing. If it comes out Heads and I am a god, I will flip the coin repeatedly until I see it come up Heads again. Let T be the number of times it was Tails. I will then change the universe to UN(T).

If I am lying and the offer is real, and I am a god, what utility will you receive from sending me a dollar? Conditional on the first flip coming up Heads (which happens with probability 1/2 and only rescales the result), the probability of me then seeing N Tails followed by Heads is (1/2)**(N + 1), and your utility for the resulting universe is UTILITY(UN(N)) >= DUT * 2**N, so the contribution to your expected utility from the outcome with N Tails is (1/2)**(N + 1) * UTILITY(UN(N)) >= (1/2)**(N + 1) * DUT * 2**N = DUT/2. There are infinitely many possible values for N, each contributing at least DUT/2, so the sum diverges and your total expected utility is positive infinity.
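As a rough check on that arithmetic, here is a minimal Python sketch. It assumes the offer is real, that I am a god, and that the first flip came up Heads; it also normalizes DUT to 1 and uses the lower bound DUT * 2**N in place of the true UTILITY(UN(N)). The function names are made up for illustration.

```python
from fractions import Fraction

DUT = Fraction(1)  # assumption: utility gain from $1000, normalized to 1


def outcome_probability(n: int) -> Fraction:
    """Probability of seeing exactly n Tails and then Heads."""
    return Fraction(1, 2) ** (n + 1)


def utility_lower_bound(n: int) -> Fraction:
    """Lower bound on UTILITY(UN(n)) used in the post: DUT * 2**n."""
    return DUT * 2 ** n


def partial_expected_utility(terms: int) -> Fraction:
    """Sum of the first `terms` contributions to the expected utility."""
    return sum(
        (outcome_probability(n) * utility_lower_bound(n) for n in range(terms)),
        Fraction(0),
    )


if __name__ == "__main__":
    # Every term equals DUT/2, so the partial sums grow without bound.
    for terms in (1, 10, 100, 1000):
        print(terms, partial_expected_utility(terms))
```

Each term is exactly DUT/2 under these assumptions, so the partial sum over the first K outcomes is K * DUT/2, which has no finite limit.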

I hope we agree that it is unlikely that I am a god, but it’s consistent with what you have observed so far, so unless you were born with certain knowledge that I am not a god, you have to assign positive probability to it. Similarly, the probability that I’m lying and the above offer is real is also positive. The product of two positive numbers is positive. Combining this with the result from the previous paragraph, your expected utility from sending me a dollar is infinitely positive.

If you send me one dollar, there will probably be no result. Perhaps I am a god, and the above offer is real, but I didn’t do anything beyond flipping the first coin because it came out Tails. In that case, nothing happens. Your expected utility for the next dollar is also infinitely positive, so you should send the next dollar too. By induction you should send me all your dollars.

If you don’t send money because you have bounded utility, that’s my desired outcome. If you do feel motivated to send me money, well, I suppose I lost the argument. Remember to send all of it, and remember that you can always send me more later.

As of 7 June 2011, nobody has sent me any money for this.

ETA: Some interesting issues keep coming up. I’ll put them here to decrease the redundancy:

Yes, you can justify not giving me money because I might be a god by claiming that there are lots of other unlikely gods that have a better claim on your resources. My purpose in writing this post is to find a good reason not to be jerked around by unlikely gods in general. Finding a reason to be jerked around by some other unlikely god is missing the point.

I forgot to mention that if I am a god, I can stop time while I flip coins, so we aren’t resource-constrained on the number of times I can flip the coin.

Yes, you can say that your prior probability of me being a god is zero. If you want to go that way, can you say what that prior probability distribution looks like in general? I’m actually more worried about making a Friendly AI that gets jerked around by an unlikely god that we did not plan for, so having a special case about me being god doesn’t solve an interesting portion of the problem. For what it’s worth, I believe the Universal Prior would give positive small probability to many scenarios that have a god, since universes with a god are not incredibly much more complex than universes that don’t have a god.

This doesn’t work with an unbounded utility function, for standard reasons:

1) The mixed strategy. If there is at least one lottery with infinite expected utility, then any combination of taking that lottery and other actions also has infinite expected utility. For example, in the traditional Pascal’s Wager involving taking steps to believe in God, you could instead go around committing Christian sins: since there would be nonzero probability that this would lead to your ‘wagering for God’ anyway, it would also have infinite expected utility. See Alan Hajek’s classic article “Waging War on Pascal’s Wager.”

Given the mixed strategy, taking and not taking your bet both have infinite expected utility, even if there are no other infinite expected utility lotteries.

2) To get a decision theory that actually would take infinite expected utility lotteries with high probability we would need to use something like the hyperreals, which would allow for differences in the expected utility of different probabilities of infinite payoff. But once we do that, the fact that your offer is so implausible penalizes it. We can instead keep our money and look for better opportunities, e.g. by acquiring info, developing our technology, etc. Conditional on there being any sources of infinite utility, it is far more likely that they will be better obtained by other routes than by succumbing to this trick. If nothing else, I could hold the money in case I encounter a more plausible Mugger (and your version is not the most plausible I have seen). Now if you demonstrated the ability to write your name on the Moon in asteroid craters, turn the Sun into cheese, etc, etc, taking your bet might win for an agent with an unbounded utility function.

Also see Nick Bostrom’s infinitarian ethics paper.

As it happens I agree that human behavior and intuitions (as I weight them) in these situations are usually better summed up with a bounded utility function, which may include terms like the probability of attaining infinite welfare, or attaining a large portion of hyperreal expected welfare that one could, etc, than an unbounded utility function. I also agree that St Petersburg lotteries and the like do indicate our bounded preferences. The problem here is technical, in the construction of your example.

I agree that there are many bets with infinite expected utility for a person who has unbounded utility. If the subject takes those bets into account, it’s unlikely that I’ll win in the sense of getting the subject to send me money. However, if the subject takes them into account, it’s very likely that the subject will lose, in the sense that the subject’s estimated utility from all of these infinite expected utility bets is going to swamp utility from ordinary things. If someone is hungry and has an apple, they should pay attention to the apple to decide whether to eat it, and not pay attention to the relative risk of Tim being a god who disapproves of apple-eating or Carl being a god who promotes apple-eating.

I don’t care much whether someone accepts my offer. I really care whether they can pay attention to an apple when they’re hungry to decide whether to eat the apple, as opposed to considering obscure consequences of how various possible unlikely gods might react to the eating of the apple. I am not convinced that hyperreals solve that problem—so far as I can tell, the outcome would be unchanged. Can you explain why you think hyperreals might help?

(ETA: hyperreals, AKA the non-standard reals, aren’t mysterious. Imagine the real numbers, imagine a new one we might call “infinity”, then add other new numbers as required so all of the usual first-order properties still hold. So we’d have infinity − 3 and 5 * infinity − 376/infinity and so forth. So far as I can tell, if you do the procedure described in the OP with hyperreal utilities, you still conclude that the utility of giving me money exceeds the utility of keeping the money and spending it on something ordinary.)

Perhaps you meant unbounded instead of infinite there.

I’m concerned that the tricky routes will dominate the non-tricky routes. I don’t really expect anyone to fall for my specific trick.

Last I read that was long ago. I glanced at it just now and it seems to be concerned with ethics, rather than an individual deciding what to do, so I’m having doubts about it being directly relevant. It’s probably worth looking at anyway, but if you can say specifically how it’s relevant and cite a specific page it would help.

Does the problem still exist if we assume the purpose of my example is to show that people with unbounded utility lose, rather than to make people send me money?

The point of mixed strategies is that without distinctions between lotteries with infinite expected utility all actions have the same infinite (or undefined) expected utility, so on that framework there is no reason to prefer one action over another. Hyperreals or some other modification to the standard framework (see discussion of “infinity shades” in Bostrom) are necessary in order to say that a 50% chance of infinite utility is better than a 1/3^^^3 chance of infinite utility. Read the Hajek paper for the full details.
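A small Python sketch of that point, under loud assumptions: lotteries are reduced to a probability of an infinite payoff plus a finite fallback utility, and the "shaded" value is a hypothetical lexicographic pair standing in for hyperreals or infinity shades, not an implementation of either.

```python
import math
from typing import Tuple


def naive_expected_utility(p_infinite: float, finite_utility: float) -> float:
    """Expected utility in the ordinary extended reals: any positive
    chance of an infinite payoff makes the whole expectation infinite."""
    if p_infinite > 0:
        return math.inf
    return finite_utility


def shaded_expected_utility(p_infinite: float, finite_utility: float) -> Tuple[float, float]:
    """Hypothetical stand-in for a hyperreal-style value: compare the
    probability of the infinite payoff first, the finite part second."""
    return (p_infinite, (1 - p_infinite) * finite_utility)


if __name__ == "__main__":
    likely, unlikely = 0.5, 1e-30
    # The naive framework cannot tell the two lotteries apart.
    print(naive_expected_utility(likely, 0.0) == naive_expected_utility(unlikely, 0.0))
    # The lexicographic pair prefers the likelier infinite payoff.
    print(shaded_expected_utility(likely, 0.0) > shaded_expected_utility(unlikely, 0.0))
```

Python compares tuples lexicographically, so the second comparison ranks lotteries by their probability of the infinite payoff and only falls back to the finite part on ties.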

“Empirical stabilizing assumptions” (naturalistic), page 34.

No it isn’t, unless like Hajek you think there’s something ‘not blindingly obvious’ about the ‘modification to the standard framework’ that consists of stipulating that probability p of infinite utility is better than probability q of infinite utility whenever p > q.

This sort of ‘move’ doesn’t need a name. (What does he call it? “Vector valued utilities” or something like that?) It doesn’t need to have a paper written about it. It certainly shouldn’t be pretended that we’re somehow ‘improving on’ or ‘fixing the flaws in’ Pascal’s original argument by explicitly writing this move down.

A system which selects actions so as to maximize the probability of receiving infinitely many units of some good, without differences in the valuation of different infinite payouts, approximates to a bounded utility function, e.g. assigning utility 1 to world-histories with an infinite payout of the good, and 0 to all other world-histories.
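To make that concrete, here is a minimal Python sketch. It assumes each action is summarized only by its probability of leading to an infinite payout; the action names and numbers in the usage example are invented.

```python
from typing import Dict


def bounded_utility(has_infinite_payout: bool) -> float:
    """Utility 1 for world-histories with an infinite payout, 0 otherwise."""
    return 1.0 if has_infinite_payout else 0.0


def expected_bounded_utility(p_infinite: float) -> float:
    """Expected value of the 0/1 utility; it equals the probability itself."""
    return (
        p_infinite * bounded_utility(True)
        + (1 - p_infinite) * bounded_utility(False)
    )


def best_action(actions: Dict[str, float]) -> str:
    """Pick the action whose expected bounded utility is largest."""
    return max(actions, key=lambda name: expected_bounded_utility(actions[name]))


if __name__ == "__main__":
    # Hypothetical probabilities of reaching an infinite payout.
    print(best_action({"pay the mugger": 1e-30, "keep the money": 1e-20}))
```

Maximizing this bounded expectation picks the same action as maximizing the probability of an infinite payout, which is the sense in which the two approaches coincide.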

We are making the argument more formal. Doing so is a good idea in a wide variety of situations.

Do you disagree with any of these claims?

Introducing hyperreals makes the argument more formal.

Making an argument more formal is often good.

Here, making the argument more formal is more likely good than bad.

Sigh, we seem to be talking past each other. You’re talking about choosing which unlikely god jerks you around, and I’m trying to say that it’s eventually time to eat lunch. If you have infinite utilities, how can you ever justify prioritizing something finite and likely, like eating lunch, over something unlikely but infinite? Keeping a few dollars is like eating lunch, so if you can’t rationally decide to eat lunch, the question is which unlikely god you’ll give your money to. I agree that it probably won’t be me.

I think you mean Arguments for—and against—probabilism. If you meant something else, please correct me.

Why is eating lunch “finite”, given that we have the possibility of becoming gods ourselves, and eating lunch makes that possibility more likely (compared to not eating lunch)?

ETA: Suppose you saw convincing evidence that skipping lunch would make you more productive at FAI-building (say there’s an experiment showing that lunch makes people mentally slow in the afternoon), you would start skipping lunch, right? Even if you wouldn’t, would it be irrational for someone else to do so?

There are two issues here: 1) What the most plausible cashing-out of an unbounded utility function recommends 2) Whether that cashing-out is a sensible summary of someone’s values. I agree with you on 2) but think that you are giving bogus examples for 1). As with previous posts, if you concoct examples that have many independent things wrong with them, they don’t clearly condemn any particular component.

My understanding is that you want to say, with respect to 2), that you don’t want to act in accord with any such cashing-out, i.e. that your utility function is bounded insofar as you have one. Fine with me, I would say my own utility function is bounded too (although some of the things I assign finite utility to involve infinite amounts of stuff, e.g. I would prefer living forever to living 10,000 years, although boundedly so). Is that right?

But you also keep using what seem to be mistaken cashing-outs in response to 1). For instance, you say that:

But any decision theory/prior/utility function combination that gives in to Pascal’s Mugging will also recommend eating lunch (if you don’t eat lunch you will be hungry and have reduced probability of gaining your aims, whether infinite or finite). Can we agree on that?

If we can, then you should use examples where a bounded utility function and an unbounded utility function actually give conflicting recommendations about which action to take. As far as I can see, you haven’t done so yet.

I meant the paper that I already linked to earlier in this thread.

I agree that we agree on 2).

The conflict here seems to be that you’re trying to persist and do math after getting unbounded utilities, and I’m inclined to look at ridiculous inputs and outputs from the decision making procedure and say “See? It’s broken. Don’t do that!”. In this case the ridiculous input is a guess about the odds of me being god, and the ridiculous output is to send me money, or divert resources to some other slightly less unlikely god if I don’t win the contest.

Maybe. I don’t know what it would conclude about eating lunch. Maybe the decision would be to eat lunch, or maybe some unknown interaction of the guesses about the unlikely gods would lead to performing bizarre actions to satisfy whichever of them seemed more likely than the others. Maybe there’s a reason people don’t trust fanatics.

Well, if we can exclude all but one of the competing unlikely gods, the OP is such an example. A bounded utility function would lead to a decision to keep the money rather than send it to me.

Otherwise I don’t have one. I don’t expect to have one because I think that working with unbounded utility functions is intractable even if we can get it to be mathematically well-defined, since there are too many unlikely gods to enumerate.

But at this point I think I should retreat and reconsider. I want to read that paper by Hajek, and I want to understand the argument for bounded utility from Savage’s axioms, and I want to understand where having utilities that are surreal or hyperreal numbers fails to match those axioms. I found a few papers about how to avoid paradoxes with unbounded utilities, too.

This has turned up lots of stuff that I want to pay attention to. Thanks for the pointers.

ETA: Readers may want to check my earlier comment pointing to a free substitute for the paywalled Hajek article.

Personal survival makes it a lot easier to please unlikely gods, so eating the apple is preferred. For more general situations, some paths to infinity are much more probable than others. For example, perhaps we can build a god.

Eating an apple was meant to be an example of a trivial thing. Inflating it to personal survival misses the point. Eating an apple should be connected to your own personal values concerning hunger and apples, and there should be a way to make a decision about eating an apple or not eating it and being slightly hungry based on your personal values about apples and hunger. If we have to think about unlikely gods to decide whether to eat an apple, something is broken.

That’s a likely god, not an unlikely god, so it’s a little bit different. Even then, low-probability interactions between eating an apple and the nature of the likely god seem likely to lead to bizarre decision processes about apple-eating, unless you have bounded utilities.

I don’t see why this is a problem. What causes you to find it so unlikely that our desires could work this way?

Pay attention next time you eat something. Do you look at the food and eat what you like or what you think will improve your health, or do you try to prioritize eating the food against sending me money because I might be a god, and against giving all of the other unlikely gods what they might want?

We are human and cannot really do that. With unbounded utilities, there are an absurdly large number of possible ways that an ordinary action can have very low-probability influence on a wide variety of very high-utility things, and you have to take them all into account and balance them properly to do the right thing. If an AI is doing that, I have no confidence at all that it will weigh these things the way I would like, especially given that it’s not likely to search all of the space. Someone who thinks about a million unlikely gods to decide whether to eat an apple is broken. In practice, they won’t be able to do that, and their decision about whether to eat the apple will be driven by whatever unlikely gods have been brought to their attention in the last minute. (As I said before, an improbable minor change to a likely god is an unlikely god, for the purposes of this discussion.)

If utilities are bounded, then the number of alternatives you have to look at doesn’t grow so pathologically large, and you look at the apple to decide whether to eat the apple. The unlikely gods don’t enter into it because you don’t imagine that they can make enough of a difference to outweigh their unlikeliness.

Why can’t they either estimate or prove that eating an apple has more expected utility (by pleasing more gods overall than not eating an apple, say), without iterating over each god and considering them separately? And if for some reason you build an AI that does compute expected utility by brute force iteration of possibilities, then you obviously would not want it to consider only possibilities that “have been brought to their attention in the last minute”. That’s going to lead to trouble no matter what kind of utility function you give it.

(ETA: I think it’s likely that humans do have bounded utility functions (if we can be said to have utility functions at all) but your arguments here are not very good. BTW, have you seen The Lifespan Dilemma?)

I would like to do whichever of these two alternatives leads to more utility.

Are you saying that we shouldn’t maximize utility because it’s too hard?

If your actual utility function is unbounded and thinking about a million “unlikely gods” is worth the computational resources that could be spent on likely gods (though you specified that small changes to likely gods are unlikely gods, there is a distinction in that there are not a metaphorical million of them), then that is your actual preference. The utility function is not up for grabs.

Your argument seems to be that maximizing an unbounded utility function is impractical, so we should maximize a bounded utility function instead. I find it improbable that you would make this argument, so, if I am missing anything, please clarify.

Yes, the utility function is not up for grabs, but introspection doesn’t tell you what it is either. In particular, the statement “endoself acts approximately consistently with utility function U” is an empirical statement for any given U (and any particular notion of “approximately”, but let’s skip that part for now). I believe I have provided fine arguments that you are not acting approximately consistently with an unbounded utility function, and that you will never be able to do so. If those arguments are valid, and you say you have an unbounded utility function, then you are wrong.

If those arguments are valid, and you say you want to have an unbounded utility function, then you’re wanting something impossible because you falsely believe it to be possible. The best I could do in that case if I were helping you would be to give you what you would want if you had true beliefs. I don’t know what that would be. What would you want from an unbounded utility function that you couldn’t get if the math turned out so that only bounded utility functions can be used in a decision procedure?

There are many paths by which small actions taken today might in unlikely ways influence the details of how a likely god is built. If those paths have infinite utility, you have to analyze them to decide what to do.

I am currently researching logical uncertainty. I believe that the increased chance of FAI due to this research makes it the best way to act according to my utility function, taking into account the limits to my personal rationality (part of this is personal; I am particularly interested in logical uncertainty right now, so I am more likely to make progress in it than on other problems). This is because, among other things, an FAI will be far better at understanding the difficulties associated with unbounded utility functions than I am.

You have not demonstrated it to be impossible, you have just shown that the most obvious approach to it does not work. Given how questionable some of the axioms we use are, this is not particularly surprising.

An actual description of my preferences. I am unsure whether my utility function is actually unbounded but I find it probable that, for example, my utility function is linear in people. I don’t want to rule this out just because that current framework is insufficient for it.

Some paths are far more likely than others. Actively researching FAI in a way that is unlikely to significantly increase the probability of UFAI provides far more expected utility than unlikely ways to help the development of FAI.

Predicting your preferences requires specifying both the utility function and the framework, so offering a utility function without the framework as an explanation for your preferences does not actually explain them. I actually don’t know if my question was hypothetical or not. Do we have a decision procedure that gives reasonable results for an unbounded utility function?

The phrase “rule this out” seems interesting here. At any given time, you’ll have a set of explanations for your behavior. That doesn’t rule out coming up with better explanations later. Does the best explanation you have for your preferences that works with a known decision theory have bounded utility?

Perhaps I see what’s going on here: people who want unbounded utility are feeling loss when they imagine giving up that unbounded goodness in order to avoid bugs like the one described in the OP. I, on the other hand, feel loss when people dither over difficult math problems when the actual issues confronting us have nothing to do with difficult math. Specifically, dealing effectively with the default future, in which one or more corporations make AIs that optimize for something having no connection to the preferences of any individual human.

Not one compatible with a Solomonoff prior. I agree that a utility function alone is not a full description of preferences.

The best explanation that I have for my preferences does not, AFAICT, work with any known decision theory. However, I know enough of what such a decision theory would look like if it were possible to say that it would not have bounded utility.

I disagree that I am doing such. Whether or not the math is relevant to the issue is a question of values, not fact. Your estimates of your values do not find the math relevant; my estimates of my values do.

downvoted because you actually said “I would like to do whichever of these two alternatives leads to more utility.”

A) no one or almost no one thinks this way, and advice based on this sort of thinking is useless to almost everyone.

B) The entire point of the original post was that, if you try to do this, then you immediately get completely taken over by consideration of any gods you can imagine. When you say that thinking about unlikely gods is not “worth” the computational resources, you are sidestepping the very issue we are discussing. You have already decided it’s not worth thinking about tiny probabilities of huge returns.

I think he actually IS making the argument that you assign a low probability to, but instead of dismissing it I think it’s actually extremely important to decide whether to take certain courses based on how practical they are. The entire original purpose of this community is research into AI, and while you can’t choose your own utility function, you can choose an AI’s. If this problem is practically insoluble, then we should design AIs with only bounded utility functions.

Tim seemed to be implying that it would be absurd for unlikely gods to be the most important motive for determining how to act, but I did not see how anything that he said showed that doing so is actually a bad idea.

What? I did not say that; I said that thinking about unlikely gods might just be one’s actual preference. I also pointed out that Tim did not prove that unlikely gods are more important than likely gods, so one who accepts most of his argument might still not be motivated by “a million unlikely gods”.

That article is paywalled. It was published in 2003. Hajek’s entry about Pascal’s Wager in the Stanford Encyclopedia of Philosophy is free and was substantively revised (hopefully by Hajek) in 2008, so there’s a good chance the latter contains all the good ideas in the former and is easier to get to. The latter does mention the idea that utilities should be bounded, and many other things potentially wrong with Pascal’s wager. There’s no neat list of four items that looks like an obvious match to the title of the paywalled article.

You can find it here though.

Thanks for the pointer to a free version of Hajek’s “Waging War on Pascal’s Wager” paper. One of his alternative formulations uses surreal numbers for utilities, much to my surprise.

The main thrust is that either the utility of Heaven isn’t the best possible thing, or it is the best possible thing and a mixed strategy of betting on heaven with probability p and betting on nothing with probability 1-p also gives infinite utility, for positive p. Thus, if Heaven is the best possible thing, Pascal’s Wager doesn’t rule out mixed strategies.

If someone could check my math here—I don’t think surreal numbers let you assign a utility to the St. Petersburg paradox. The expected utility received at each step is 1, so the total utility is 1 + 1 + 1 + … . Suppose that sum is X. Then X + 1 = X. This is not true for any surreal number, right?

Alan Hajek’s article is one of the stupidest things I’ve ever read, and a depressing indictment of the current state of academic philosophy. It is a bunch of pointless mathematical gimmicks which he only thinks are impressive because he himself barely understands them.

Or the surreals?

You might also be interested in Peter de Blanc’s paper on this, which is essentially a formal version of the arguments you discussed here.

Why should I not attach a probability of zero to the claim that you are able to grant unbounded utility?

Let GOD(N) be the claim that you are a god with the power to grant utility at least up to 2**N. Let P(GOD(N)) be the probability I assign to this. This is a nonincreasing function of N, since GOD(N+1) implies GOD(N).

If I assign a probability to GOD(N) of 4**(-N), then the mugging fails. Of course, this implies that I have assigned GOD(infinity), the conjunction of GOD(N) over all N, a probability of zero, popularly supposed to be a sin. But while I can appreciate the reason for not assigning zero to ordinary, finite claims about the world, such as the existence of an invisible dragon in your garage, I do not see a reason to avoid this zero.
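Here is a rough Python sketch of why that prior defuses the mugging. It rests on simplifying assumptions of mine: utility is measured in units of DUT, the N-Tails outcome pays exactly 2**N and only when GOD(N) holds, and the coin probabilities are the (1/2)**(N + 1) from the original post. The function names are illustrative only.

```python
from fractions import Fraction


def p_god(n: int) -> Fraction:
    """Assumed prior that the mugger can grant utility up to 2**n."""
    return Fraction(1, 4) ** n


def p_tails(n: int) -> Fraction:
    """Probability of exactly n Tails followed by Heads."""
    return Fraction(1, 2) ** (n + 1)


def term(n: int) -> Fraction:
    """Contribution of the n-Tails outcome: 4**(-n) * 2**(-(n+1)) * 2**n."""
    return p_god(n) * p_tails(n) * 2 ** n


def partial_sum(terms: int) -> Fraction:
    """Expected utility summed over the first `terms` outcomes."""
    return sum((term(n) for n in range(terms)), Fraction(0))


if __name__ == "__main__":
    # The terms shrink geometrically, so the sum stays below 2/3.
    for terms in (1, 3, 10, 50):
        print(terms, float(partial_sum(terms)))
```

Under these assumptions each term is 4**(-N)/2, a geometric series whose partial sums approach 2/3 of DUT, so the expected gain from a dollar is finite and smaller than DUT itself.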

If extraordinary claims demand extraordinary evidence, what do infinite claims require?

Assigning zero probability to claims is bad because then one can’t ever update to accept the claim no matter what evidence one has. Moreover, this doesn’t seem to have much to do with “infinite claims” given that there are claims involving infinity that you would probably accept. For example, if we got what looked like a working Theory of Everything that implied that the universe is infinite, you’d probably assign a non-zero probability to the universe being infinite. You can’t assign all hypotheses involving infinity zero probability if you want to be able to update to include them.

Suppose I randomly pick a coin from all of Coinspace and flip it. What probability do you assign to the coin landing heads? Probably around 1/2.

Now suppose I do the same thing, but pick N coins and flip them all. The probability that they all come up heads is roughly 1/2^N.

Suppose I halt time to allow this experiment to continue as long as we want, then keep flipping coins randomly picked from Coinspace until I get a tail. What is the probability I will never get a tail? It should be the limit of 1/2^N as N goes to infinity, which is 0. Events with probability of 0 are allowed—indeed, expected—when you are dealing with infinite probability spaces such as this one.

It’s also not true that we can’t ever update if our prior probability for something is 0. It is just that we need infinite evidence, which is a scary way of saying that the probability of receiving said evidence is also 0. For instance, if you flip coins infinitely many times, and I observe all but the first 10 and never see “tails” (which has a probability of 0 of happening) then my belief that all the coins landed “heads” has gone up from 0 to 1/2^10 = 1/1024.
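The same update can be checked on a finite truncation. This Python sketch assumes M independent fair flips, of which all but the first 10 are observed to be heads; M and the function names are illustrative, and the infinite case is only the limit as M grows.

```python
from fractions import Fraction


def prior_all_heads(m: int) -> Fraction:
    """Prior probability that all m independent fair flips land heads."""
    return Fraction(1, 2) ** m


def posterior_all_heads(m: int, unobserved: int = 10) -> Fraction:
    """P(all m heads | the last m - unobserved flips were all heads)."""
    # Probability of the evidence: the observed flips all land heads.
    evidence = Fraction(1, 2) ** (m - unobserved)
    # "All heads" implies the evidence, so the joint equals the prior.
    return prior_all_heads(m) / evidence


if __name__ == "__main__":
    for m in (20, 100, 1000):
        print(m, float(prior_all_heads(m)), posterior_all_heads(m))
```

The prior shrinks toward 0 as M grows, while the posterior stays at 1/1024 for every M of at least 10, matching the figure above.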

There are only countably many hypotheses that one can consider. In the coin flip context as you’ve constructed the probability space there are uncountably many possible results. If one presumes that there’s really a Turing-computable (or even just explicitly definable in some axiomatic framework like ZFC) set of possibilities for the behavior of the coin, then there are only countably many, each with a finite probability. Obviously, this in some respects makes the math much ickier, so for most purposes it is more helpful to assume that the coin is really random.

Note also that your updating took an infinite amount of evidence (since you observed all but the first 10 flips). So it is at least fair to say that if one assigns probability zero to something then one can’t ever update in finite time, which is about as bad as not being able to update.

I introduced the concept of Coinspace to make it clear that all the coinflips are independent of each other: if I were actually flipping a single coin I would assign it a nonzero (though very small) probability that it never lands “tails”. Possibly I should have just stated the independence assumption.

And yes, I agree that if we postulate a finite time condition, then P(X) = 0 means one can’t ever update on X. However, in the context of this post, we don’t have a finite time condition: God-TimFreeman explicitly needs to stop time in order to be able to flip the coin as many times as necessary. Once we have that, then we need to be able to assign probabilities of 0 to events that almost never happen.

I linked to the article expressing that view. It makes a valid point.

I am not saying anything about all claims involving infinity. I am addressing the particular claim in the original post.

Yes, assigning GOD(infinity) a probability of zero means that no finite amount of evidence will shift that. For this particular infinite claim I don’t see a problem with that.

Thoroughgoing rejection of 0 and 1 as probabilities means that you have to assign positive probability to P(A & ~A). You also have to reject real-valued variables—the probability of a randomly thrown dart hitting a particular number on the real line is zero. Unless you can actually do these things—actually reconstruct probability theory in a way that makes P(A|B) and P(~A|B) sum to less than 1, and prohibit uncountable measure spaces—then claiming that you should do them anyway is to make the real insight of Eliezer’s article into an empty slogan.

So how do you determine which claims you are giving a prior probability of zero and which you don’t?

This connects to a deep open problem: how do we assign probabilities to the chances that we’ve made a logical error or miscalculated? However, even if one is willing to assign zero probability to events that contain inherent logical contradictions, that’s not at all the same as assigning zero probability to a claim about the empirical world.

If claims about the empirical world can have arbitrarily small probability, then a suitable infinite conjunction of such claims has probability zero, just as surely as P(A&~A) does.

For Pascal’s Mugging scenarios it just seems a reasonable thing to do. Gigantic promises undermine their own credibility, converging to zero in the limit. I don’t have a formally expressed rule, but if I was going to work on decision theory I’d look into the possibility of codifying that intuition as an axiom.

What if we came up with a well-evidenced theory of everything that implied GOD(infinity)?

It’s not just contrived scenarios; see http://arxiv.org/abs/0712.4318. If utility is unbounded, infinitely many hypotheses can result in utility higher than N for any N.

How is this any different than saying “until you can actually make unbounded utility functions converge properly as discussed in Peter de Blanc’s paper, using expected utility maximization is an empty slogan”?

I’m not convinced by expected utility maximization either, and I can see various possibilities of ways around de Blanc’s argument besides bounding utility, but those are whole nother questions.

ETA: Also, if someone claims their utility function is bounded, does that mean they’re attaching probability zero to it being unbounded? If they attach non-zero probability, they run into de Blanc’s argument, and if they attach zero, they’ve just used zero as a probability. Or is having a probability distribution over what one’s utility function actually is too self-referential? But if you can’t do that, how can you model uncertainty about what your utility function is?

Do you reject the VNM axioms? I have my own quibbles with them (I don’t like the way they just assume that probability exists and is a real number, and I don’t like axiom 3 because it rules out unbounded utility functions), but they do apply in some contexts.

Can you elaborate on these?

There is no good theory of this yet. One wild speculation is to model each possible utility function as a separate agent and have them come to an agreement. Unfortunately, there is no good theory of bargaining yet either.

Not with any great weight, it’s just a matter of looking at each hypothesis and thinking up a way of making it fail.

Maybe utility isn’t bounded below by a computable function (and a fortiori is not itself computable). That might be unfortunate for the would-be utility maximizer, but if that’s the way it is, too bad.

Or (this is a possibility that de Blanc himself mentions in the 2009 version) maybe the environment should not be allowed to range over all computable functions. That seems quite a strong possibility to me. Known physical bounds on the density of information processing would appear to require it. Of course, those bounds apply equally to the utility function, which might open the way for a complexity-bounded version of the proof of bounded utility.

Good point, but I find it unlikely.

This requires assigning zero probability to the hypothesis that there is no limit on the density of information processing.

I don’t see any reason to dispute Axioms 2 (transitivity) and 4 (independence of alternatives), although I know some people dispute Axiom 4.

For Axiom 3 (continuity), I don’t have an argument against, but it feels a bit dodgy to me. The lack of inferential distance between the construction of lotteries and the conclusion of the theorem gives me the impression of begging the question. But that isn’t my main problem with the axioms.

The sticking point for me is Axiom 1, the totality of the preference relation. Why should an ideal rational agent, whatever that is, have a preference—even one of indifference—between every possible pair of alternatives?

“An ideal rational agent, whatever that is.” Does the concept of an ideal rational agent make sense, even as an idealisation? An ideal rational agent, as described by the VNM axioms, cannot change its utility function. It cannot change its ultimate priors. These are simply what they are and define that agent. It is logically omniscient and can compute anything computable in constant time. What is this concept useful for?

It’s the small world/large world issue again. In small situations, such as industrial process control, that are readily posed as optimisation problems, the VNM axioms are trivially true. This is what gives them their plausibility. In large situations, constructing a universal utility function is as hard a problem as constructing a universal prior.

How would it act if asked to choose between two options that it does not have a preference between?

It can, it just would not want to, ceteris paribus.

It is a starting point (well, a middle point). I see no reason to change my utility function or my priors; I do not desire those almost by definition. Infinite computational ability is an approximation to be corrected in the future, as is, IMO, VNM axiom 3. This is what we have so far and we are working on improving it.

The point is that there will be options that it could never be asked to choose between.

I become less and less convinced that utility maximisation is a useful place to start. An ideal rational agent must be an idealisation of real, imperfectly rational agents—of us, that is. What can I do with a preference between steak and ice cream? Sometimes one of those will satisfy a purpose for me and sometimes the other; most of the time neither is in my awareness at all. I do not need to have a preference, even between such everyday things, because I will never be faced with a choice between them. So I find the idea of a universal preference uncompelling.

When faced with practical trolley problems, the practical rational first response is not to weigh the two offered courses of action, but to look for other alternatives. They don’t always exist, but they have to be looked for. Hard-core Bayesian utility maximisation requires a universal prior that automatically thinks of all possible alternatives. I am not yet persuaded (e.g. by AIXI) that a practical implementation of such a prior is possible.

Does this involve probabilities of zero or just ignoring sufficiently unlikely events?

I’m not sure I understand this; is this a choice between objects or between outcomes? If it is between outcomes, it can occur. If it is between objects, it is not the kind of thing described by the frameworks that we are discussing since it is not actually a choice that anyone makes; one may choose for an object to exist or to be possessed, but it is a category error to choose an object (though that phrase can be used as a shorthand for a different type of choice, I think it is clear what it means).

I don’t think there’s any way to avoid probabilities of zero. Even the Solomonoff universal prior assigns zero probability to uncomputable hypotheses. And you never have probabilities at the meta-level, which is always conducted in the language of plain old logic.

Between outcomes. How is this choice going to occur?

More generally, what is an outcome? In large-world reasoning, it seems to me that an outcome cannot be anything less than the entire history of one’s forward light-cone, or in TDT something even larger. Those are the things you are choosing between, when you make a choice. Decision theory on that scale is very much a work in progress, which I’m not going to scoff at, but I have low expectations of AGI being developed on that basis.

There are people working on this. EY explained his position here.

However, that is somewhat tangential. Are you proposing that decision making should be significantly altered by ignoring certain computable hypotheses—since Solomonoff induction, despite its limits, does manifest this problem—in order to make utility functions converge? That sounds horribly ad-hoc (see second paragraph of this).

I agree.

Any decision process that does not explicitly mention outcomes is only useful insofar as its outputs are correlated with our actual desires, which are about outcomes. If outcomes are not part of an AGI’s decision process, they are therefore still necessary for the design of the AGI. They are probably also necessary for the AGI to know which self-modifications are justified, since we cannot foresee which modifications could at some point be considered.

If I was working on that, I could say it was being worked on. I agree that an ad-hoc hack is not what’s called for. It needs to be a principled hack. :-)

Are they really? That is, about outcomes in the large-world sense we just agreed on. Ask people what they want, and few will talk about the entire future history of the universe, even if you press them to go farther than what they want right now. I’m sure Eliezer would, and others operating in that sphere of thought, including many on LessWrong, but that is a rather limited sense of “us”.

Can you come up with a historical example of a mathematical or scientific problem being solved—not made to work for some specific purpose, but solved completely—with a principled hack?

I don’t see your point. Other people don’t care about outcomes but a) their extrapolated volitions probably do and b) if people’s extrapolated volitions don’t care about outcomes, I don’t think I’d want to use them as the basis of a FAI.

Limited comprehension in ZF set theory is the example I had in mind in coining the term “principled hack”. Russell said to Frege, “what about the set of sets not members of themselves?”, whereupon Frege was embarrassed, and eventually a way was found of limiting self-reference enough to avoid the contradiction. There’s a principle there—unrestricted self-reference can’t be done—but all the methods of limiting self-reference that have yet been devised look like hacks. They work, though. ZF appears to be consistent, and all of mathematics can be expressed in it. As a universal language, it completely solves the problem of formalising mathematics.

(I am aware that there are mathematicians who would disagree with that triumphalist claim, but as far as I know none of them are mainstream.)

Being a mathematician who at least considers himself mainstream, I would think that ZFC and the existence of a large cardinal is probably the minimum one would need to express a reasonable fragment of mathematics.

If you can’t talk about the set of all subsets of the set of all subsets of the real numbers, I think analysis would become a bit… bondage and discipline.

Surely the power set axiom gets you that?

That it exists, yes. But what good is that without choice?

Ok, ZFC is a more convenient background theory than ZF (although I’m not sure where it becomes awkward to do without choice). That’s still short of needing large cardinal axioms.

The idea of programming ZF into an AGI horrifies my aesthetics, but that is no reason not to use it (well it is an indication that it might not be a good idea but in this specific case ZF does have the evidence on its side). If expected utility, or anything else necessary for an AGI, could benefit from a principled hack as well-tested as limited comprehension, I would accept it.

The hypothesis that the universe is infinite is equivalent to the hypothesis that no matter how far you travel (in a straight line through space), you can be infinitely certain that it won’t take you someplace you’ve been. Convincing you that the universe is infinite should be roughly as hard as convincing you that there’s zero probability that the universe is infinite, because they’re both claims of infinite certainty in something. (I think.)

I’d like to be able to boil that down to “infinite claims require infinite evidence”, but it seems to be not quite true.

The probability is roughly the probability of consistent combined failure of all the mental systems you can use to verify that (knowable?) actual infinities are impossible; similar to your probability that 2 + 2 = 3.

Even if you do assign zero probability, what makes you think that in this specific case zero times infinity should be thought of as zero?

Because otherwise you get mugged.

You don’t literally multiply 0 by infinity, of course, you take the limit of (payoff of N)*probability(you actually get that payoff) as N goes to infinity. If that limit blows up, there’s something wrong with either your probabilities or your utilities. Bounding the utility is one approach; bounding the probability is another.
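To make that limit concrete, here is a small sketch of the term payoff(N) × probability(N) under three assumptions. The payoff 2^N and probability 2^-N follow the post's DUT * 2**N setup with DUT set to 1; the utility cap of 1000 and the extra 2^-N credibility discount are hypothetical numbers chosen only for illustration.

```python
from fractions import Fraction

CAP = 1000  # hypothetical utility bound, chosen only for illustration

def unbounded_term(n: int) -> Fraction:
    """Payoff 2**n times probability 2**-n: every term equals 1."""
    return Fraction(2) ** n * Fraction(1, 2) ** n

def bounded_utility_term(n: int) -> Fraction:
    """Same probabilities, but the payoff is capped at CAP."""
    return min(Fraction(2) ** n, Fraction(CAP)) * Fraction(1, 2) ** n

def bounded_probability_term(n: int) -> Fraction:
    """Unbounded payoff, but credibility falls by an extra factor of 2**-n."""
    return Fraction(2) ** n * Fraction(1, 4) ** n

# Columns: N, unbounded, bounded utility, bounded probability.
for n in (1, 10, 40):
    print(n,
          float(unbounded_term(n)),
          float(bounded_utility_term(n)),
          float(bounded_probability_term(n)))
```

In the first column of results the terms never shrink, so no finite expected utility is available; either of the other two repairs drives the terms to zero fast enough for the sum to converge.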

Your priors are what they are, so yes, you can attach a prior probability of zero to me being a god. In practice, I highly recommend that choice.

I think the universal prior (a la Solomonoff induction) would give it positive probability, FWIW. A universe that has a GOD(infinity) seems to me describable by a shorter program than one that has GOD(N) for N large enough to actually be godlike. God simply stops time, reads the universe state (with some stub substituted for himself to avoid regression), writes a new one, then continues the new one.

I thought this, but now I’m not sure. Surely, if you were God, you would be able to instantly work out BB(n) for any n. This would make you uncomputable, which would indeed mean the Solomonoff prior assigns you being God a probability of zero.

There is quite a good argument that this treatment of uncomputables is a flaw rather than a feature of the Solomonoff prior, although right now it does seem to be working out quite conveniently for us.

I agree that the Solomonoff prior isn’t going to give positive probability to me having any sort of halting oracle. Hmm, I’m not sure whether inferring someone’s utility function is computable. I suppose that inferring the utility function for a brain of fixed complexity when arbitrarily large (but still finite) computational capacity can be brought to bear could give an arbitrarily close approximation, so the OP could be revised to fix that. It presently doesn’t seem worth the effort though; the added verbiage would obscure the main point without adding anything obviously useful.

A bigger problem is your ability to hand out arbitrarily large amounts of utility. Suppose the universe can be simulated by an N-state Turing machine; this limits the number of possible states it can occupy to a finite (but probably very large) number. This in turn bounds the amount of utility you can offer me, since each state has finite utility and the maximum of a finite set of finite numbers is finite. (The reason why this doesn’t automatically imply a bounded utility function is that we are uncertain of N.)

As a result of this:

P(you can offer me k utility) > 0 for any fixed k

but

P(you can offer me x utility for any x) = 0

To be honest though, I’m not really comfortable with this, and I think Solomonoff needs to be fixed (I don’t feel like I believe with certainty that the universe is computable). The real reason why you haven’t seen any of my money is that I think the maths is bullshit, as I have mentioned elsewhere.

Thinking about it more, this isn’t a serious problem for the dilemma. While P(you can offer me k utility) goes to zero as k goes to infinity, there’s no reason to suppose it goes faster than 1/k does.

This means you can still set a similar dilemma, with a probability of you being able to offer me 2^n utility eventually becoming greater than (1/2)^n for sufficiently large n, satisfying the conditions for a St Petersburg Lottery.

That’s just Pascal’s mugging, though; the problem that “the utility of a Turing machine can grow much faster than its prior probability shrinks”.

By Rice’s theorem, inferring utility functions is uncomputable in general, but it is probably possible to do for humans. If not, that would be quite a problem for FAI designers.

Counterargument #1: If you are god, then the universe allows for “gods” which can arbitrarily alter the state of the universe. Therefore, any utility gains I make have an unknown duration: it’s entirely possible that an instant after you grant my utility, you’ll take it away. Furthermore, if you are god, you’re (a) flipping a coin and (b) requiring a donation, so I strongly suspect you are neither friendly nor omni-benevolent. Therefore, I have no reason to favour “god will help me for $1” over “god will hurt me for $1”; you could just as easily be trying to trap me, and punish anyone who irrationally sends you $1.

1b) I have no reason to select you as a likely god candidate, compared to the ~infinite number of people who exist across all of space-time and all Everett branches.

Counterargument #2: There are finitely many states of “N”.

2a) Eventually the universe will succumb to heat death. Entropy means that we can’t gain information from the coin flip without approaching this state. 2b) Even if you flip coins incredibly fast and in parallel, I will still eventually die, so we can only count the number of coin flips that happen before then.

Counterargument #3: Assume a utility function which is finite but unbounded. It cannot handle infinity, and thus your mugging relies on an invalid input (infinite utility), and is discarded as malformed.

3b: Assume that my utility function fails in a universe as arbitrary as the one implied by you being god, since I would have witnessed a proof that state(t+1) does not naturally follow state(t).

Counterargument #4: Carefully assign p(you are god) = 1/N, where N approaches infinity in such a way as to cancel out the infinite sum you are working with. This seems contrived, but my mind assigns p(you are god) = “bullshit, prove it”, and this is about the closest I can come to expressing that mathematically ;)

Counterargument #5: Assign probabilities by frequency of occurrence. There have been no instances of god yet, so p(god) = 0. Once god has been demonstrated, I can update off of this 0, unlike with Bayesian statistics. My utility function may very well be poorly designed, and I believe this can still allow for FAI research, etc.: social standing, an interest in writing code, peer pressure, etc. all provide motivations even if p(FAI) = 0. One could also assume that even where p(x) = 0, a different function rewards utility for investigating and trying to update even zero-probability events (in which case I’d get some utility from mailing you $1 to satisfy my curiosity, although I suspect not enough to overcome the cost of setting up a PayPal account and losing $1)

Counterargument 3(b) is the most convincing of these to me.

If my decision theory is predicated on some kind of continuity in states of the universe, and my decision is based on some discontinuity in the state of the universe, my decision theory can’t handle this.

This is troubling, but to try to make it more formal: if I believe something like “all mathematically possible universes exist” then promising to “change universes to UN(N)” is a meaningless statement. Perhaps the wager should be rephrased as “increase the measure of universes of higher utility”?

Counterargument #1 is similar to the argument against Pascal’s wager that weights Christianity and anti-Christianity equally. Carl’s comment addresses this sort of thing pretty well. That TimFreeman has asserted that you should suspect he is a god is (very small) positive evidence that he is one, that he has the requisite power and intelligence to write a lesswrong post is also very small but positive evidence, &c.

Counterargument #2 implies the nonexistence of gods. I agree that gods are implausible given what we know, but on the other hand, the necessity of entropy and heat-death need not apply to the entire range of UN(X).

I don’t understand Counterargument #3. Could you elaborate a little?

Counterargument #4 seems similar to Robin Hanson’s argument against the 3^^^3 dust specks variant of Pascal’s Mugging, where if I recall correctly he said that you have to discount by the improbability of a single entity exercising such power over N distinct persons, a discount that monotonically scales positively with N. If the discount scales up fast enough, it may not be possible to construct a Pascal’s Mugging of infinite expected value. You could maybe justify a similar principle for an otherwise unsupported claim that you can provide N utilons.

Counterargument #5 raises an interesting point: the post implicitly assumes a consistent utility function that recognizes the standard laws of probability, an assumption that is not satisfied by the ability to update from 0.

It’s playing on the mathematical difference between infinite and unbounded.

In plain but debatably-accurate terms, infinity isn’t a number. If my utility function only works on numbers, you can no more give it “infinity” than you can give it an apple.

As a couple of examples: any given polygon has ‘n’ sides, and there are thus infinitely many polygons, but no polygon has ‘infinity’ sides. Conversely, there are infinitely many real numbers such that 0 < x < 1, but x is bounded (it has finite limits).

So I’m asserting that while I cannot have “infinity” utility, there isn’t any finite bound on my utility: it can be 1, a million, 3^^^3, but not “infinity” because “infinity” isn’t a valid input.

Utility doesn’t have to take infinity as an argument in order to be infinite. It just has to have a finite output that can be summed over possible outcomes. In other words, if p × U(a) + (1-p) × U(~a) is a valid expression of expected utility, then by induction, Sum_{i=1 to n} p_i × U(i) should also be a valid expression for any finite n. When you take the limit as n->infinity you run into the problem of no finite expectation, but an arbitrarily large finite sum (which you can get with a stopping rule) ought to be able to establish the same point.
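The stopping-rule point can be illustrated with a truncated gamble. This is a sketch under assumptions the comment does not fix: outcome i is taken to have probability 2^-i and utility 2^i, and the game is cut off after n outcomes. The function name is hypothetical.

```python
from fractions import Fraction

def truncated_expected_utility(n: int) -> Fraction:
    """Expected utility of the gamble restricted to outcomes 1..n.

    Assumes outcome i occurs with probability 2**-i and pays utility 2**i,
    so each outcome contributes exactly 1 to the sum.
    """
    terms = (Fraction(1, 2) ** i * Fraction(2) ** i for i in range(1, n + 1))
    return sum(terms, Fraction(0))

for n in (1, 10, 1000):
    print(n, truncated_expected_utility(n))
```

Every input and output is finite, yet any target expected utility can be exceeded by choosing n large enough, which is all the argument needs.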

I still don’t understand 3b. TimFreeman wasn’t postulating an acausal universe, just one in which there are things we weren’t expecting.

magfrump seems to have nailed it. I find it interesting how controversial that one has been :)

For infinite sums, basically, if the sum is infinite, then any finite probability gives it infinite expected utility (infinity × [1/N] = infinity). If both the sum and probability are finite, then one can argue the details (N × [1/N^2] < 1). The math is different between an arbitrarily large finite and an infinite. Or, at least, I’ve always assumed Pascal’s Wager relied on that, because otherwise I don’t see how it produces an infinite expected utility regardless of scepticism.

If the utility can be arbitrarily large depending on N, then an arbitrarily large finite skepticism discount can be overcome by considering a sufficiently large N.

Of course a skepticism discount factor that scales with N might be enough to obviate Pascal’s Wager.
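The contrast between a fixed discount and one that scales with N can be sketched as follows. The promised utility of 2^N, the one-in-10^30 fixed discount, the 4^-N scaling discount, and the stake of 1 are all hypothetical values picked for illustration; the search is capped at `max_n`, so a `None` result is a finite check rather than a proof.

```python
from fractions import Fraction
from typing import Callable, Optional

def first_profitable_n(discount: Callable[[int], Fraction],
                       stake: Fraction = Fraction(1),
                       max_n: int = 1_000) -> Optional[int]:
    """Smallest N with discount(N) * 2**N > stake, or None if none up to max_n."""
    for n in range(1, max_n + 1):
        if discount(n) * Fraction(2) ** n > stake:
            return n
    return None

def fixed(n: int) -> Fraction:
    """Constant skepticism: one in 10**30, whatever the promise."""
    return Fraction(1, 10 ** 30)

def scaling(n: int) -> Fraction:
    """Skepticism that grows with the promise: 4**-N."""
    return Fraction(1, 4) ** n

print(first_profitable_n(fixed))    # 100, since 2**100 > 10**30 > 2**99
print(first_profitable_n(scaling))  # None: the product is 2**-N, always below 1
```

With the fixed discount the mugger only has to name a large enough N; with the scaling discount the discounted payoff shrinks as the promise grows, so no N ever clears the stake.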

Agreed. However, you also have no reason to carry on your business dealing with ordinary things rather than focusing exclusively on the various unlikely gods that might be trying to jerk you around. I don’t win, but you lose.

Yes, I forgot to mention that if I’m a god I can stop time while I’m flipping coins.

If you play by those rules, you can’t assign a utility to the infinite gamble, so you can’t make decisions about it. If the infinite gamble is possible, your utility function is failing to do its job, which is to help you make decisions. Tell me how you want to fix that without bounded utility.

p(I am god) = 0 is simpler and gets the job done. That appears to be more restrictive than the Universal Prior—I think the universal prior would give positive probability to me being god. There might be a general solution here to specifying a prior that doesn’t fall into these pits, but I don’t know what it is. Do you?

How would this work in general? How could you plan for landing on the moon if it hasn’t been done before? You need to distinguish “failure is certain because we put a large bomb in the rocket that will blow up before it gets anywhere” from “failure is certain because it hasn’t been done before and thus p(success) = 0”.

Yes I do. Dealing with ordinary things has a positive expected utility. Analysing anything that looks like a Pascal’s Mugging has ~zero expected utility as far as the wager itself goes, plus that derived from curiosity and a desire to study logical problems. I believe that Counterargument #5 can be tuned and expanded to apply to all such muggings, so I’ll be writing that up in a bit :)

Assuming Bayesian probability, p=0 means “I refuse to consider new evidence”, which is contrary to the goal of “bullshit, prove it” (I suspect that p=1/infinity might have practically the same issue unless dealing with a god who can provide infinite bits of evidence; fortunately in this case you are making exactly that claim :))

This falls back to 3b, then: My utility function isn’t calibrated to a universe where you can ignore physics. Furthermore, it also falls back to 1b: Once we assume physics doesn’t apply, we get an infinite number of theories to choose from, all with equal likelihood, so once again why select your theory out of that chaos?

p(moon landing) = 0. p(I will enjoy trying despite the inevitable failure) > 0. p(I will feel bad if I ignore the math saying this IS possible) > 0. p(People who did the moon landing had different priors) > 0. etc.

It’s not elegant, but it occurred to me as a seed of a thought, and I should have a more robust version in a little bit :)

I agree with your conclusion, but don’t follow the reasoning. Can you say more about how you identify something that looks like a Pascal’s Mugging?

If something looks like a Pascal’s Mugging when it involves ridiculously large utilities, then maybe you agree with me that you should have bounded utilities.

The laws of physics are discovered, not known a-priori, so you can’t really use that as a way to make decisions.

Not equal likelihood. Universal Prior, Solomonoff induction.

Once you have chaos, you have a problem. Selecting my theory over the others is only an issue for me if I want to collect money, but the chaos is a problem for you even if you don’t select my theory. You’ll end up being jerked around by some other unlikely god.

I’ll be interested to read about it. Good luck. I hope there’s something there for you to find.

“Pascal’s Mugging” seems to be any scam that involves ridiculously large utilities, and probably specifically those that try to exploit the payoff vs likelihood ratio in that way. A scam is approximately “an assertion that you should give me something, despite a lack of strong evidence supporting my assertion”. So if you offered me $1,000, it’d be just a scam. If you offer me eternal salvation, it’s Pascal’s Mugging.

My utility function takes the entire history of the universe as an argument (past and future). You could call that a “state” but in that context I’d need a clearer definition of:

“the present situation”

“I am able to change the universe to any state I choose”

Bullshit it does. How does it even get access to the entire history of the universe? :)

The utility function is a mathematical function. It simply evaluates whatever hypothetical universe-history you feed it.

The question of where the agent gets its expected future-universe-history from is more interesting though, and it’s something you’re right to be sceptical about. Here we’re talking about bounded rationality and all sorts of wonderful things beyond the scope of the original post (also for the purposes of discussion I’m pretending to be something more closely resembling an expected-utility-maximizer than what I actually am).

History books.

Yeah, there’s uncertainty over it, but it’s not harder than regular expected utility.

Emphasis mine. I know I went to public school, but I do not recall any future-history lessons :)

They’re called physics, and probability theory. And, for that matter, history—teaches you to compute the probability that Queen Whatsherface will be assassinated given that the LHC doesn’t work and you had juice for breakfast.

This problem is the reason for most of the headache that LW is causing me and I appreciate any attention it receives.

Note that when GiveWell, a charity evaluation service, interviewed the SIAI, they hinted at the possibility that one could consider the SIAI to be a sort of Pascal’s Mugging:

Could this be part of the reason why Eliezer Yudkowsky wrote that the SIAI is only a worthwhile charity if the odds of being wiped out by AI are larger than 1%?

Even mathematicians like John Baez are troubled by the unbounded maximization of expected utility.

Could it be that we do not have bounded utility but rather only accept a limited degree of uncertainty?

When people buy insurance, they often plan for events that are less probable than 1%. The intuitive difficulty here is not that you act on an event with probability of 1%, but that you act on an event where the probability (be it 1% or 10% or 0.1%) is estimated intuitively, so that you have no frequency statistics to rely on, and there remains great uncertainty about the value of the probability.

People fear acting on uncertainty that is about to be resolved, for if it’s resolved not in their favor, they will be faced with wide agreement that in retrospect their action was wrong. Furthermore, if the action is aimed to mitigate an improbable risk, they even expect that the uncertainty will resolve not in their favor. But this consideration doesn’t make the estimated probability any lower, and estimation is the best we have.

The analogy with insurance isn’t exact. One could argue (though I think one would be wrong) that diminishing returns related to bounded utility start setting in on scales larger than the kinds of events people typically insure against, but smaller than whatever fraction of astronomical waste justifies investing in combating 1% existential risk probabilities.

Me too. Would vote you up twice if I could.

I don’t think he mentioned “unbounded” in the post you’re citing. He talked about risk aversion, and that can be encoded by changing the utility function.

The SIAI seems to be progressing slowly. It is difficult to see how their “trust us” approach will get anywhere. The plan of writing code in secret in a basement looks pretty crazy to me. On the more positive side, they do have some money and some attention. ...but overall, why consider the possibility of the SIAI taking over the world? That is not looking as though it is too likely an outcome.

This doesn’t necessarily show that humans have bounded utility, just that the heuristics we use to estimate our utility break down in some circumstances. We already know that. Does the fact that people have non-transitive preferences for certain bets indicate that they don’t have utility functions? If not, how is that argument different from this one?

I don’t see where heuristics came into play in the OP. Heuristics are generally about approximation, and in this case the math is broken even before you start trying to approximate it.

I can imagine fixing humans so they don’t have the non-transitive preferences for finite bets. Roughly speaking, making them into a utility-maximizer would fix that. The scenario described in the OP breaks a rational utility-maximizer with unbounded utility and a reasonable prior, so far as I can tell, so it’s different.

I think that it is possible for me to have unbounded utility, yet still to assign a rather small utility to every outcome in any world in which TimFreeman is God (and I am not).

The same applies to Omega. If I do, in fact, live in a universe in which an omnipotent maniac performs psychological experiments, then much of my joy in living is lost.

There is an implicit assumption in all of these mugging scenarios that the existence of an all-powerful mugger who can intervene at any time has no effect on relative cardinal utilities of outcomes. That assumption seems unjustified.

Taken care of in the OP’s stipulations: as God he will change the universe to one in which he need not be. LCPW applies.

Ah! I missed that. Thx. But I’m really not all that happy living in a world where TimFreeman *was* God, either. I suppose that means that I am not a real consequentialist.

A consequentialist whose utility function’s domain is world-histories instead of world-states is still a consequentialist...

That leaves me curious as to what extraneous information a *non-consequentialist* sneaks into the utility function’s domain. The world’s state and the history of that state strike me as *all there is*.

I think non-consequentialists as Wei Dai uses the term don’t use utility functions.

Ah, yes. That works. Thanks.

They could focus on different information. A consequentialist discards information about virtue, a virtue theorist discards consequences.

Ok, but it seems to me that a virtue theorist must believe that information about virtue is a part of information about the state of the world. So does the consequentialist deny that all this virtue information is *real* information, information that can “pay rent” by generating correct anticipation of future experiences?

Odd, a few hours ago I thought I knew what a consequentialist was. But now I can’t seem to understand the concept regardless of whether I accept or reject Wei_Dai’s claim.

But if he was a god, your choice not to give him money wouldn’t change it. To be immune to his argument means that the restriction of a generally unbounded utility to the subset of states in which TF is a god is bounded, which is strange, although probably consistent.

I’m sceptical of the maths. It seems like you may have committed the grave sin of taking an infinity without using a limit, though I’m not sure. Certainly there is something very funny going on when it is a mathematical certainty that my actual winnings will be less than my ‘expected winnings’.

Also, even if I bought the maths, why should I give the money to you? You’ve explicitly claimed not to be God, I’m sure there’s some crazy guy I can find who’ll happily claim the opposite, it seems like I should have significantly better odds with him :)

We can contrive perfectly reasonable finite examples in which your actual winnings are 99.99% certain to be less than your expected winnings. Why is this okay, but bumping the percentage up to 100% is suddenly suspicious?
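A minimal sketch of such a finite example, using a made-up lottery (the prize and odds are illustrative assumptions, not taken from the thread): a single ticket that pays a large prize with probability 1/10,000 and nothing otherwise.

```python
from fractions import Fraction

# Hypothetical lottery: win `prize` with probability `p_win`, otherwise win nothing.
p_win = Fraction(1, 10_000)
prize = 1_000_000

# The mean payout is pulled up entirely by the rare winning outcome.
expected_payout = p_win * prize

# Every losing ticket pays 0, which is below the mean.
prob_below_expected = 1 - p_win

print(float(expected_payout))      # 100.0
print(float(prob_below_expected))  # 0.9999
```

The non-zero chance of doing at least as well as the average is still there; it is just very small.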

In general, 100% is always much more suspicious than 99.99%. For example, if you tell me that a machine you’ve built has a 99.99% chance of working I might be worried about overconfidence but in principle you could be right and if you show me enough justification I might agree. If you tell me it has a 100% chance of working then something very fishy is going on, most likely you are just lying.

For averages, it is a trivial theorem of finite probability theory that I have non-zero probability of receiving at least the average. When your infinite reasoning starts violating laws like that you lose your right to make use of the other laws, like expected utility theory, because you may have broken them in the process.

Infinity is not a real number. It violates at least one axiomatic principle of real numbers (that every non-zero number has a reciprocal). This means you can’t just go and use it in expected utility calculations, since von Neumann and Morgenstern proved their theorem while assuming real numbers would be used (and also assuming that there would only be finitely many outcomes).

I can’t articulate rigorously exactly what is going wrong beyond what I said above, because the reasoning I am criticising is itself non-rigorous. However, the basic idea is that an average works by various possible outcomes each sort of ‘pulling’ the outcome towards themselves. This doesn’t explain how it can get above everything, which suggests there must be a sort of hidden, probability zero pay-off infinity outcome doing the actual work (this also makes sense, there is indeed a possibility, with probability zero, that the pay-off will be infinite). My utility function doesn’t accept infinity pay-offs so I reject the offer.

We’re getting this infinity as a limit though, which means that we can approach the infinity case by perfectly reasonable cases. In the case of the St. Petersburg lottery, suppose that the lottery stops after N coin flips, but you get to choose N. In that case, you can still get your payout arbitrarily large by choosing N sufficiently high. “Arbitrarily large” seems like a well-behaved analogue of infinity.

In the case of the OP, I’m sure that if TimFreeman were a god, he would be reasonably accommodating about special requests such as “here’s $1, but please, if you’re a god, don’t flip the coin more than N times.” Suddenly, there’s no infinity, but by choosing N sufficiently high, you can make the arbitrarily large payout in the unlikely case that TimFreeman is a god counterbalance the certain loss of $1.
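The capped lottery can be sketched numerically. The rules below are an assumption chosen for illustration (the classic St. Petersburg payout of 2**k if the first heads lands on flip k, and nothing if no heads appears within N flips); the function names and the one-in-a-billion credence are hypothetical.

```python
import math
from fractions import Fraction


def truncated_expected_payout(max_flips: int) -> Fraction:
    """Expected payout of a St. Petersburg lottery capped at max_flips flips.

    Assumed rules: the payout is 2**k if the first heads is on flip k
    (probability 2**-k), and nothing if no heads appears in time.
    """
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, max_flips + 1)),
        Fraction(0),
    )


def flips_needed(cost: Fraction, prob_honest: Fraction) -> int:
    """Smallest N for which prob_honest * (expected payout with N flips) exceeds cost."""
    # Each allowed flip adds exactly 1 to the expected payout, so we need N > cost / prob_honest.
    return math.floor(cost / prob_honest) + 1


print(truncated_expected_payout(10))                # 10
print(flips_needed(Fraction(1), Fraction(1, 10**9)))  # 1000000001
```

Under these assumed rules every extra permitted flip adds the same amount to the expectation, so for any fixed non-zero credence there is a finite N that makes the gamble look favourable, with no infinity involved.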

Okay, that is definitely more reasonable. It’s now essentially become analogous to a Pascal’s mugging, where a guy comes up to me in the street and says that if I give him £5 then he will give me whatever I ask in the unlikely event that he is God. So why waste time with a lottery, why not just say that?

I don’t have a really convincing answer. Pascal’s Mugging is a problem that needs to be solved, but I suspect I can find a decision-theory answer without needing to give up on what I want just because it’s not convenient.

The best I can manage right now is that there is a limit to how much I can specify in my lifetime, and the probability of Tim being God multiplied by that limit is too low to be worthwhile.

The reason the lottery is there is that you don’t *have* to specify N. Sure, if you do, it makes the scary infinities go away, but it seems natural that you shouldn’t *improve* your expected outcome by adding a limit on how much you can win, so it seems that the outcome you get is at least as good as any outcome you could specify by specifying N.

True, “seems natural” isn’t a good guideline, and in any case it’s obvious that there’s something fishy going on with our intuitions. However, if I had to point to something that’s probably wrong, it probably wouldn’t be the intuition that the infinite lottery is at least as good as any finite version.

My poor feeble meat brain can only represent finitely many numbers. My subjective probability that you’ll pay off on the bet rounds to zero as the utilities get big. So the sum converges to less than a dollar, even without hitting an upper bound on utilities.

You don’t have to do the sum explicitly. As a turing-complete being (well, probably), you can do all sorts of cool things that fall under the category of mathematical proof. So if you haven’t sent Tim your money you either have to not be capable of mathematical proofs, you have to have a bounded utility function, or you have to have no well-defined utility function at all.

Okay, so it’s probably the third one for all humans. But what if you were designing an AI that you knew could do mathematical proofs and had a well-defined utility function? Should it send Tim its money or not?

Or we accept that the premise is flawed; I can have a defined, unbounded utility function, and I can certainly do mathematical proofs, without sending god all my money :)

But you don’t. Why should I believe you can?

*blinks*. I’m honestly not sure why you’d assume I *don’t*, but you seem pretty certain. Let’s start there?

Let’s see the definition, then.

To be less aggravating, I’ll pre-explain: nothing personal, of course. I don’t believe *any* person has a defined utility function. As for unbounded: there’s a largest number your brain can effectively code. I can buy an unbounded (except by mortality) sequence of equally subjectively strong preferences for a sequence of new states, each one equally better than the last, with time elapsed between so as for the last improved state to become the present baseline. But I don’t see how you’d want to call that an “unbounded utility function”. I’d appreciate a precise demonstration of how it is one. Maybe you could say that the magnitude of each preference is the same as would be predicted by a particular utility function.

If I’m charitable, I can believe a similar claim to your original: you don’t know of or accept any reason why it shouldn’t be possible that you actually have an (approximation to?) an unbounded utility function. Okay, but that’s not the same as knowing it’s possible.

(speculation aired to ward off possible tedious game-playing. let me know if I missed the mark)

If your argument is that I can’t have a defined utility function, and concede that therefore I can’t be gamed by this, then I don’t think we actually disagree on anticipations, just linguistics and possibly some philosophy. Certainly nothing I’d be inclined to argue there, yeah :)

Close enough (I didn’t have any therefore in mind, just disagreement with what I thought you claimed), though I wouldn’t call the confusion linguistics or philosophy.

It does seem like I attempted to understand you too literally. I’m not entirely sure exactly what you meant (if you’d offered a reason for your belief, it might have been clearer what that belief was).

Thanks for helping us succeed in not arguing over nothing—probably a bigger coup than whatever it was we were intending to contribute.

You buy into this nonsense?!? What “mathematical proof” says to send Tim all your money?

I think the aforementioned feeble brain would be better off if it only represented utilities up to a certain size too. Imagine you find yourself with a choice between eating an apple and taking an action with a large payoff L and low probability P. If P rounds to zero, then you know that P is between zero and zero + epsilon, so L * P is somewhere you don’t know between zero and L * epsilon. If L * epsilon is larger than the expected utility of eating the apple, you won’t know which to do, and you don’t even know how much utility you might be giving up by taking the wrong choice. In practice you need the maximum possible L to be smaller than the utilities you typically care about divided by epsilon.

Yes, certainly. I wasn’t supplying evidence against the “bounded utility” conclusion, just suggesting that there are alternate interpretations under which bounded utility doesn’t come up.

Having thought about it, I’m pretty sure that my preferences can be modeled with bounded utilities. I would justify that conclusion using the fact that there are only finitely many distinguishable worlds or mental states. If each of those has a finite utility, then my overall utility function is bounded.

One very critical factor you forgot is goal uncertainty! Your argument is actually even better than you think it is. If you assign an extremely low but non-zero probability that your utility function is unbounded, then you must still multiply it by infinity. And 1 is not a probability… There is no possible state that represents sufficient certainty that your utility function is bounded to justify not giving all your money to the mugging.

I WOULD send you my money, except the SIAI is a lot of orders of magnitude more likely than you to be a god (you didn’t define it’d be instant or direct) and they have a similar offer, so I’m mugged into maximizing amount of help given to the SIAI instead. But I DO bite the bullet of small probabilities of extremely large utilities, however repugnant and counter-intuitive it seems.

I suspect that calling your utility function itself into question like that isn’t valid in terms of expected utility calculations.

I think what you’re suggesting is that on top of our utility function we have some sort of meta-utility function that just says “maximize your utility function, whatever it is.” That would fall into your uncertainty trap, but I don’t think that is the case, I don’t think we have a meta-function like that, I think we just have our utility function.

If you were allowed to cast your entire utility function into doubt you would be completely paralyzed. How do you know you don’t have an unbounded utility function for paperclips? How do you know you don’t have an unbounded utility function for, and assign infinite utility to, the universe being exactly the way it would be if you never made a fully rational decision again and just went around your life on autopilot? The end result is that there are a number of possible courses of action that would all generate infinity utility and no way to choose between them because infinity=infinity. The only reason your argument sounds logical is because you are allowing the questioning of the boundedness of the utility function, but not its contents.

I think that knowledge of your utility function is probably a basic, prerational thing, like deciding to use expected utility maximization and Bayesian updating in the first place. Attempting to insert your utility function itself into your calculations seems like a basic logical error.

You are, in this very post, questioning it, saying that your utility function is PROBABLY this, and that you don’t *think* there’s uncertainty about it… That is, you display uncertainty about your utility function. Checkmate.

Also, “infinity=infinity” is not the case. Infinity is not a number, and the problem goes away if you use limits. Otherwise, yes, I even *probably* have unbounded but very slow-growing factors for a bunch of things like that.

Even if I was uncertain about my utility function, you’re still wrong. The factor you are forgetting about is uncertainty. With a bounded utility function infinite utility scores the same as a smaller amount of utility. So you should always assume a bounded utility function, because unbounded utility functions don’t offer any more utility than bounded ones and bounded ones outperform unbounded ones in situations like Pascal’s Mugging. There’s really no point to believing you have an unbounded function.

I just used the same logic you did. But the difference is that I assumed a bounded utility function was the default standard for comparison, whereas you assumed, for no good reason, that the unbounded one was.

I don’t know what the proper way to calculate utility is when you are uncertain about your utility function. But I know darn well that doing an expected-utility calculation about what utility each function will yield and using one of the two functions that are currently in dispute to calculate that utility is a crime against logic.

If you do that you’re effectively assigning “having an unbounded function” a probability of 1. And 1 isn’t a probability.

Your formulation of “unbounded utility function always scores infinity so it always wins” is not the correct way to compare two utility functions under uncertainty. You could just as easily say “unbounded and bounded both score the same, except in Pascal’s mugging where bounded scores higher, so bounded always wins.”

I think that using expected utility calculations might be valid for things like deciding whether you assign any utility at all to an object or consequence. But for big meta-level questions about what your utility function *even is*, attempting to use them is a huge violation of logic.

If I am a god, then it will be instant and direct; also, I’ll break the laws of physics/the Matrix/the meta-Matrix/etc. to reach states the SIAI can’t reach. If I am a god and you do not give me any money, then I’ll change the universe into the most similar universe where SIAI’s probability of success is divided by 2.

Can I get money?

The probability of the AI doing all of that (hey, time travel) is still much much larger.

So I explained this to my girlfriend, and she agreed to send you $1.00. Sadly, I apparently managed to completely lock myself out of PayPal the last time I had a grudge against them (they’ve made the news a few times for shady practices...), so I can’t provide the $1.

But, um, congratulations on mugging my girlfriend for $1! :)

(Her comment was “I was going to spend this on soda anyway; giving it away is a net utility gain since it means I won’t have it available”)

I simply *must* get into the habit of asking for money.

Not doing this is probably my greatest failing.

First lesson of sales is that you have to ask to make the sale.

“I am a god” is too simplistic. I can model it better as a probability, varying with N, that you are able to move the universe to UN(N). This tracks how good a god you are, and seems to make the paradox disappear.

How? Are you assuming that P(N) goes to zero?

Yes. This avoids assuming there is a non-zero probability that someone has infinite power; even the dark lords of the matrix couldn’t grant me unlimited utility. I think the single “I am a god” forced one into an over-strict dichotomy.

(there is a certain similarity to the question as to whether we should give non-zero probability to there existing non-computable oracles)
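Whether a vanishing P(N) is enough depends on how fast it vanishes. The sketch below assumes a lottery in which outcome N has probability 2**-(N+1) and pays 2**N units of DUT if the god can actually reach UN(N); those odds, the function name and the two example choices of P(N) are illustrative assumptions, not taken from the post.

```python
from fractions import Fraction


def partial_expected_utility(p_of_n, terms: int) -> Fraction:
    """Sum of the first `terms` contributions to expected utility, in units of DUT.

    Assumed lottery: outcome N has probability 2**-(N+1) and pays 2**N,
    delivered only with probability p_of_n(N).
    """
    total = Fraction(0)
    for n in range(terms):
        total += Fraction(1, 2 ** (n + 1)) * p_of_n(n) * 2**n
    return total


def harmonic(n: int) -> Fraction:
    """P(N) = 1/(N+1): goes to zero, but too slowly for the sum to converge."""
    return Fraction(1, n + 1)


def geometric(n: int) -> Fraction:
    """P(N) = 2**-N: goes to zero fast enough for the sum to converge."""
    return Fraction(1, 2**n)


for terms in (10, 100, 1000):
    print(terms,
          float(partial_expected_utility(harmonic, terms)),
          float(partial_expected_utility(geometric, terms)))
```

Under these assumptions each term reduces to P(N)/2, so the expected utility is finite exactly when the values P(N) have a finite sum. A P(N) that merely tends to zero, such as 1/(N+1), still gives partial sums that grow without bound.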

Are you certain that the likelihood of all your claims being true is not inversely proportional to the size of the change in the universe you are claiming to effect?

Almost any person can reasonably claim to be a utility generating god for small values of n for some set of common utility functions (and we don’t even have to give up our god-like ability). That is how most of us are able to find gainful employment.

The implausible claim is the ability to generate universe changes of *arbitrary* utility value.

My proposal is that any claim of utility generation ability has plausibility in inverse proportion to the size of effect that one claims to be able to produce. If I say I can produce delta-U ~$1000, that is somewhat plausible. If I say I can produce delta-U of $1,000,000 that might be plausible for some very high skill people, or given a long time to do it, but as a random person with little time, it’s extremely implausible. If I claim to be able to produce delta-U ~ (some amount of wealth > world GDP), that’s exceedingly implausible no matter who I am.

And of course, in order to make your mugging function, you would need to be able to produce unbounded utility. Your claim to unbounded utility generation is unboundedly implausible.

Admittedly, this is somewhat unsatisfactory as it effectively treats the unbounded implausibility of a classic omnipotent God figure as an axiom. But this is essentially the same trick as using a Bayesian Occam’s Razor to demonstrate atheism. If you aren’t happy with this line of reasoning, then I can’t see how you’d be happy with Occam’s Razor as an axiom, nor how you could legitimately claim that there’s a solid rational case for hard atheism.

Upvoted because the objection makes me uncomfortable, and because none of the replies satisfy my mathematical/aesthetic intuition.

However, requiring utilities to be bounded also strikes me as mathematically ugly and practically dangerous: what if the universe turns out to be much larger than previously thought, and the AI says “I’m at 99.999% of achievable utility already, it’s not worth it to expand farther or live longer”?

Thus I view this as a currently unsolved problem in decision theory, and a better intuition-pump version than Pascal’s Mugging. Thanks for posting.

It’s not worth *what?*

A small risk of losing the utility it was previously counting on.

Of course you can do intuition pumps either way- I don’t feel like I’d want the AI to sacrifice everything in the universe we know for a 0.01% chance of making it in a bigger universe- but some level of risk has to be worth a vast increase in potential fun.

It seems to me that expanding further would reduce the risk of losing the utility it was previously counting on.

LCPW isn’t even necessary: do you *really* think that it wouldn’t make a difference that you’d care about?

LCPW cuts two ways here, because there are two universal quantifiers in your claim. You need to look at every possible bounded utility function, not just every possible scenario. At least, if I understand you correctly, you’re claiming that *no* bounded utility function reflects your preferences accurately.

resources, whether physical or computational. Presumably the AI is programmed to utilize resources in a parsimonious manner, with terms governing various applications of the resources, including powering the AI, and deciding on what to do. If the AI is programmed to limit what it does at some large but arbitrary point, because we don’t want it taking over the universe or whatever, then this point might end up actually being before we want it to stop doing whatever it’s doing.

That doesn’t sound like an expected utility maximizer.

What’s wrong with this one? Would you be comfortable with that reply if it was backed by rigorous math?

Once again, this only implies that utility has to be controlled by probability, not that utility has to be bounded.

“I am a god” may *sound* like a simple enough hypothesis to have positive probability, but if it entails that you can grant arbitrary amounts of utility, and if probability approaches 0 as utility approaches infinity, then there is no escaping the fact that the probability that you are a god is 0.

This doesn’t seem to say anything about the boundedness of human utility functions (which I think is pretty likely) that Pascal’s mugging doesn’t. And heck, Pascal’s mugger can just say “give me all your money.”

Pascal’s mugging requires the victim to say what probability he assigns to the mugger being honest, and this one doesn’t, so with this one I can fleece people en masse without having to have a conversation with each one.

Also, Pascal’s Wager as presented on LW involved creating other people, so this version avoids Hanson’s suggestion of assuming that you don’t control whether you’re the preexisting person or one of the new persons. This version works with an unbounded utility function that does not involve creating other people to get large utilities.

Otherwise, I agree.

ETA: Another advantage of the scenario in the OP over Pascal’s Mugging as presented on LessWrong is that the latter is extortion and the former is not, and people seem really keen on manipulating the extortioner when there is extortion. The OP managed not to trigger that.

So how about avoiding your version by saying that all the terms in my utility function are bounded except for the ones that scale linearly with the number of people?

That seems hackish, but given a mathematical definition of “person” it might be implementable. I don’t know what that definition would be. Given that we don’t have a clear definition for “person”, Hanson’s proposal (and anthropic arguments in general) seem like bunk to me.

Ah, true, because your promise is infinite expected utility without actually saying “infinite utility,” which might put some people off.

The point is that unbounded utility and infinite gambles lead to infinite utility, and infinite utility breaks the simplest version of the math. So putting people off is the purpose, but I meant for people to blame that on the unbounded utility, since I think that’s where the blame belongs.

If you are a god, how do I know you won’t just make the coin come out tails so you don’t have to pay up?

(ETA: But yes, my utility function

isbounded.)I’d say it’s an error to give weight to any particular highly-improbable scenario without any evidence to distinguish it from the other highly-improbable scenarios. Here’s why.

There is a nonzero possibility that some entity will acquire (or already have) godlike powers later today (as per your “I am a god” definition), and decide to use them to increase utility exponentially in response to a number derived somehow from an arbitrary combination of actions by any arbitrary combination of people in the past and the ever-moving present (and let’s remember that the requirement could equally well be “condition y is met” or “condition y is not met”). I can’t figure out a way to make the number of permutations actually infinite, but considering the negative as well as the positive options makes them cancel out anyway—we have no reason to believe that my posting this comment is more or less likely to trigger a utility increase than my (hypothetically) not posting this comment. The theoretically-possible outcome (huge increase in utility) does not depend on our actions in any predictable way, so there is no reason to modify our actions on this basis.

This leads to the following rational ‘conclusion’ (specifically considering this issue only) about taking any particular action, on a scale from −1 (definitely don’t do) through 0 (indifferent) to 1 (definitely do):

±1/n, n->infinity

(Edit: Actually, this should just be 0. I should lay off the maths when I’m tired.)

(where n is the number of different possible sequences of actions which could possibly trigger the utility increase, and n therefore is unthinkably huge and continues to grow exponentially with each passing second)

Alternatively:

There is also a nonzero possibility etc. etc. *decrease* utility etc. etc. Every scenario which could lead to massive increase in utility could instead lead to massive decrease in utility, and we have no way to determine which is less likely.

Tim, you want “a good reason not to be jerked around by unlikely gods in general”. Personally I much prefer my first answer (and I suspect you will too), but my alternative answer offers a much more concise rebuttal for any claim of infinite utility increase from an unlikely god:

“Your unlikely god will grant arbitrarily large increase in utility if I take the specified action? Well, my unlikely god will wreak arbitrarily large decrease in utility if I take the specified action. Give me evidence that makes your god and its claim of positive utility more likely than my god and its claim of negative utility, and we can talk. Until then the probabilities exactly balance out, so for now I’ll just carry on regardless.”

I don’t see that “being jerked around by unlikely gods” is necessarily a problem. Doesn’t the good sense in donating to SIAI basically boil down to betting on the most-plausible god?

We constantly face a car that is about to run us over. We can see it and are very confident that the collision will be lethal if we do not jump out of its way. But there are people standing on the road curb yelling, “ignore the car, better think about how to solve this math problem”, or, “you can save a galactic civilization if you sacrifice your life”, or, “it’s just a simulated car, just ignore it”. Those people might be really smart and their arguments convincing, or they outweigh the utility of your own life with their predictions.

The problem is that we are really bad at making risk estimations under uncertainty, given limited resources and time to do so. Those people on the road curb might be right, and the expected utility formula does suggest that on average we are better off believing them, but we also know that they might be wrong, that in no possible branch of the multiverse we are actually going to receive the payoff or that our actions achieve what they claimed.

On one side we got our intuitions that tell us to ignore those people and jump, and on the other side we got higher cognition approved rules and heuristics that tell us to ignore the car.

I have an unbounded utility function, but my priors are built in such a way that expected utility is the same regardless of how you calculate it. For example, if there was a 2^-n chance of getting 2^n/n utility and a 2^-n chance of getting −2^n/n utility (before normalizing), you could make the expected utility add to whatever you want by changing the order. As such, my priors don’t allow that to happen.

This has two interesting side effects. First, given any finite amount of evidence, my posteriors would follow those same laws, and second, pascal’s mugging and the like are effectively impossible.

Edit: fixed utility example

Interesting. But it has been a while since I studied divergent series and the games that can be played with them. So more detail on your claim (“make the expected utility add to whatever you want by changing the order.”) would be appreciated.

It seems that you are adding one more axiom to the characterization of rationality (while at the same time removing the axiom that forces bounded utility.) Could you try to spell that new axiom out somewhat formally?

Expected utility can be thought of as an infinite sum. Specifically, sum(P(X_n)*U(X_n)). I’m assuming expected utility is unconditionally convergent.

Take the series 1+1/2+1/3+1/4+… and −1−1/2−1/3−1/4−… Both of those diverge. Pick an arbitrary number. Let’s say, 100. Now add terms from the first series until the running total is above 100, subtract terms from the second until it’s below, and repeat. The total will now converge to 100. Because of this, 1−1+1/2−1/2+… is conditionally convergent.
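The add-until-above, subtract-until-below procedure described above can be sketched in a few lines. The function name and the targets are illustrative; small targets are used because reaching 100 from the harmonic series alone would need on the order of e^100 terms, far too many to run.

```python
def rearranged_partial_sum(target: float, steps: int) -> float:
    """Greedily interleave +1/1, +1/2, ... and -1/1, -1/2, ... to chase `target`.

    Every term of both series is eventually used exactly once, so this is a
    rearrangement of 1 - 1 + 1/2 - 1/2 + ..., just taken in a different order.
    """
    total = 0.0
    next_pos = 1  # denominator of the next unused positive term
    next_neg = 1  # denominator of the next unused negative term
    for _ in range(steps):
        if total <= target:
            total += 1.0 / next_pos
            next_pos += 1
        else:
            total -= 1.0 / next_neg
            next_neg += 1
    return total


for target in (3.0, -1.5, 0.0):
    print(target, rearranged_partial_sum(target, 100_000))
```

After the running total first crosses the target, it can never be further from it than the size of the most recent term used, and those terms shrink toward zero. That is why the same collection of terms can be made to sum to any chosen value.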

Yes, I understand conditional and unconditional convergence. What I don’t understand is how you get conditional convergence from

I also do not understand how your “priors don’t allow that to happen”.

It almost seems that you are claiming to have unbounded utility but bounded *expected* utility. That is, no plausible sequence of events can make you confident that you will receive a big payoff, but you cannot completely rule it out.

I just noticed, that should have been a 2^-n chance of getting 2^n/n utility and a 2^-n chance of getting −2^n/n.

Anyway, 2^-n*2^n/n = 1/n, so the expected utility from that possibility is 1/n, so you get an unbounded expected utility. Do it with negative too, and you get conditionally converging expected utility.

When I thought about it, I realized this seemed very similar to a standard hack used on people that we already rely on computers to defend us against. To be specific, it follows an incredibly similar framework to one of those Lottery/Nigerian 419 Scam emails.

Opening Narrative: Attempt to establish some level of trust and believability. Things with details tend to be more believable than things without details, although the conjunction fallacy can be tricky here. Present the target with two choices: (Hope they don’t realize it’s a false dichotomy)

Choice A: Send in a small amount of utility. (If Choice A is selected, repeat False dichotomy)

Choice B: Allow a (fake) opportunity to acquire a large amount of (fake) utility to slip away.

Background: Make a MASSIVE number of attempts. Even if the first 10,000 attempts fail, the cost of trying to hack is minimal compared to the rewards.

So to reduce it to a simpler problem, the first question seems to be, how do we create the best known spam filter we have right now?

And then the second question seems to be “How can we make a spam filter MUCH better than that?” to protect our Lovable Senile Billionaire Grandpa who has Nuclear Weapons and a tendency to believe everything, and who relies on emails for critical world-altering decisions, so that a single false positive or false negative means terrible costs. That calls for epic levels of spam filtering.

So to attempt to help that, I’ll try to list all of the Antispam tactics I can find, to at least help with the first part.

Bayesian Spam filtering: I was going to try to summarize this, but honestly, the Wikipedia article does a better job than I can: http://en.wikipedia.org/wiki/Bayesian_spam_filtering
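For readers who want the gist without following the link, here is a minimal naive Bayes sketch in Python. It is an illustration under simplifying assumptions (whitespace tokenization, add-one smoothing, at least one training message in each class), not a production filter, and the names `train` and `spam_probability` are hypothetical.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, is_spam) pairs. Returns per-class word and message counts."""
    word_counts = {True: Counter(), False: Counter()}
    message_counts = {True: 0, False: 0}
    for text, is_spam in messages:
        message_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, message_counts


def spam_probability(text, word_counts, message_counts):
    """Posterior P(spam | words), treating words as independent given the class."""
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Log prior from class frequencies (assumes both classes were seen in training).
        score = math.log(message_counts[label] / sum(message_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the whole product.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Normalise in log space to avoid floating-point underflow on long messages.
    top = max(log_scores.values())
    spam = math.exp(log_scores[True] - top)
    ham = math.exp(log_scores[False] - top)
    return spam / (spam + ham)


if __name__ == "__main__":
    training = [
        ("send money now to claim your prize", True),
        ("claim your free prize money", True),
        ("meeting notes for the project", False),
        ("lunch at noon with the project team", False),
    ]
    counts = train(training)
    print(spam_probability("claim your prize money", *counts))
    print(spam_probability("project meeting at noon", *counts))
```

The filter only knows what its training data shows it, which is why the length and hostility of the training phase matter so much in this setting.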

Training Phase: The Training Phase for a Bayesian Spam Filter which needs to go live in a super hostile environment should be as long and thorough as it possibly can. You know how some places use validation and some use bounty testing? We should use both.

Sysadmin: Multiple someones need to begin by reviewing everything. Then, they need to continue by reviewing everything. There’s a built-in human tendency to ignore risks after you’ve been dealing with them for a while and nothing has happened. I don’t know what it’s called exactly, but presumably there are countermeasures in place at top-secret-type facilities that need extremely vigilant security guards at all times. We need to begin by doing those, and then again, validate and offer bounties to hackers while the system isn’t live.

Secrecy: An explicit, open list of countermeasures can generally be planned around by a determined hacker. Hackers can’t plan for security measures they aren’t aware of. The secrecy ALSO needs to be validated and bounty tested.

At first, this does sound a bit contradictory (how do you do an open source test of “Secrecy”?). But you’d want to do that first, before, say, having the FAI develop its own spam filtering that the public shouldn’t know. Google has this problem sometimes where they battle with Search Engine Optimizers who are trying to fake having genuine good content when they are, in fact, irrelevant and trying to sell you on lies to make money (much like our Muggers, really). We need to find additional ways to fight spamdexing as well. This has another good Wikipedia page: http://en.wikipedia.org/wiki/Spamdexing

The Bounty system is important because we want to take advantage of temporal discounting. People will frequently take small amounts of utility now over large amounts of utility later even to irrational levels, so we need to offer bribes so that the kinds of people who might try to trick the FAI later come to trick it during development while we would still be actively fixing problems and it didn’t have massive responsibilities.

From my personal experience coding, another good way to make sure your code is developed well enough to withstand all sorts of attacks and problems is to have an incredibly robust set of test data. Problems that can’t be seen with a single version and 50 records and 2 users often pop up across multiple versions and 1 million records and 70 users. So that as well, but more so.

A lot of this may be common knowledge already, but I thought listing everything I knew would be a good starting point for additional security measures.

Couldn’t I just believe with equal probability that you are a god and will do exactly the opposite?

You have independent knowledge of psychology, evolution, game theory, etc. (and hypotheses under which they matter, in which the being with vast power evolved naturally and then discovered that physics in its universe allowed unbounded acquisition of free energy to generate apparent worlds like ours, etc.), so it would be an extraordinary coincidence if those probabilities were really exactly equal (unless one were engaged in motivated cognition to avoid the issue by any means whatsoever; this is not necessary, as no one can force you to act like you have an unbounded utility function if you don’t have one).

Both probabilities are well below the threshold at which I can distinguish them by any means I know of. Both are also below the probability that I should instead take Pascal’s wager and become a Christian so as to gain unbounded utility the old-fashioned way.

In fact, I don’t have an unbounded utility function anyway, so my money is safe.

Yeah, right. Anyone can attempt muggings this way. You don’t need bounded utility to defeat it, just a sense for bullshit.

Aye, but the interesting question is, how do you teach an AI to discount this, without it then concluding that there is no threat from FAI, asteroid collisions, nano-technology, and black swan catastrophes?

What went on in my head does not seem terribly complicated:

Hypothesis: he’s just saying that so I will send him my money!

What other reason is there to believe him?

No reason.

Hypothesis promotion.

Upvoted this and the parent because I think this is an issue that is useful to address. What goes on in your head NEVER seems terribly complicated. See: http://lesswrong.com/lw/no/how_an_algorithm_feels_from_inside/

The interestingness of the issue comes not from how a random person would commonsensically be able to resist the mugging, but from how you ACTUALLY define the mental traits that allow you to resist it. What you call a sense for bullshit is, more strictly, a sense for claims that have extremely low probability. How does this sense work? How would you go about defining it, and how would you go about making an AI with a sense for bullshit? This is CRUCIAL, and just saying that you won’t get fooled by this because of your sense for bullshit, which seems internally simple, doesn’t cut it.

Speaking personally, that just isn’t true.

You do that by solving the general-purpose inductive inference problem. Once you have a solution to that, the rest of the problem will gradually unravel.

Assuming you are a god, I would assign a higher probability to the proposition that you are just testing me and will punish me with eternal damnation for my greed should I accept. Thus my expected utility is in fact infinitely negative.