# [Question] Counterfactual Mugging: Why should you pay?

Update: I believe that the Counterfactual Prisoner’s Dilemma, which Cousin_it and I discovered independently, resolves this question.

The LessWrong Wiki defines Counterfactual Mugging as follows:

Omega appears and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don’t want to give up your $100. But Omega also tells you that if the coin came up heads instead of tails, it’d give you $10000, but only if you’d agree to give it $100 if the coin came up tails. Do you give Omega $100?

I expect that most people would say that you should pay because a 50% chance of $10000 for $100 is an amazing deal according to expected value. I lean this way too, but it is harder to justify than you might think.

After all, if you are being asked for $100, you know that the coin came up tails and you won’t receive the $10000. Sure, refusing means that if the coin had come up heads you wouldn’t have gained the $10000, but you know the coin wasn’t heads, so you don’t lose anything. It’s important to emphasise: this doesn’t deny that if the coin had come up heads, a refusal to pay would have made you miss out on $10000. Instead, it claims that this point is irrelevant, so merely repeating the point again isn’t a valid counter-argument.

You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn’t pre-commit and you didn’t know about it ahead of time, so the burden is on you to justify why you should act as though you did. In Newcomb’s problem you want to have pre-committed and if you act as though you were pre-committed then you will find that you actually were pre-committed. However, here it is the opposite. Upon discovering that the coin came up tails, you want to act as though you were not pre-committed to pay and if you act that way, you will find that you actually were indeed not pre-committed.

We could even channel Yudkowsky from Newcomb’s Problem and Regret of Rationality: “Rational agents should WIN… It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way, without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning… Unreasonable? I am a rationalist: what do I care about being unreasonable? I don’t have to conform to a particular ritual of cognition. I don’t have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just… take only box B.” You can just not pay the $100. (Vladimir Nesov makes this exact same argument here.)
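The two perspectives in tension here can be put side by side with a little arithmetic. The sketch below is only an illustration: it assumes a fair coin, a perfectly reliable Omega, and utility that is linear in dollars, and the function names are made up for this example rather than taken from anywhere.

```python
# Illustrative sketch only: assumes a fair coin, a perfectly reliable Omega,
# and utility that is linear in dollars. All names here are hypothetical.

COST = 100       # what Omega asks for on tails
PRIZE = 10_000   # what Omega gives on heads, if you would have paid on tails
P_HEADS = 0.5


def value_before_flip(pays_on_tails: bool) -> float:
    """Expected dollars for a policy, judged before the coin is tossed."""
    heads_payoff = PRIZE if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_after_seeing_tails(pays_on_tails: bool) -> float:
    """Dollars for the same policy once you already know the coin came up tails."""
    return -COST if pays_on_tails else 0


print(value_before_flip(True), value_before_flip(False))                # 4950.0 0.0
print(value_after_seeing_tails(True), value_after_seeing_tails(False))  # -100 0
```

Judged before the flip, the paying policy looks far better; judged after seeing tails, refusing looks better. The arithmetic cannot say which of the two evaluations is the right one to act on, and that is exactly what the question is asking.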

Here’s another common reason I’ve heard, as described by Cousin_it: “I usually just think about which decision theory we’d want to program into an AI which might get copied, its source code inspected, etc. That lets you get past the basic stuff, like Newcomb’s Problem, and move on to more interesting things. Then you can see which intuitions can be transferred back to problems involving humans.”

That’s actually a very good point. It’s entirely possible that solving this problem doesn’t have any relevance to building AI. However, I want to note that: a) it’s possible that a counterfactual mugging situation could have been set up before an AI was built b) understanding this could help deconfuse what a decision is—we still don’t have a solution to logical counterfactuals c) this is probably a good exercise for learning to cut through philosophical confusion d) okay, I admit it, it’s kind of cool and I’d want an answer regardless of any potential application.

Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map? So why care about that which isn’t real? Or even if they are real, why can’t we just imagine that you are an agent that doesn’t care about counterfactual selves? If we can imagine an agent that likes being hit on the head with a hammer, why can’t we manage that?

Then there’s the philosophical uncertainty approach. Even if there’s only a 1/50 chance of your analysis being wrong, you should pay. This is great if you face the decision in real life, but not if you are trying to delve into the nature of decisions.
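To see how small the chance of being wrong can get before paying stops looking worthwhile, here is a back-of-the-envelope sketch. It rests on an assumption of mine that the text above does not state: if the "don't pay" analysis is wrong, paying is worth the ex-ante expected value of the paying policy, and if the analysis is right, paying simply loses $100. The names are hypothetical.

```python
from fractions import Fraction

COST = 100
PRIZE = 10_000

# Assumption: if the "don't pay" analysis is wrong, paying is worth the
# ex-ante value of the paying policy; if it is right, paying just loses COST.
GAIN_IF_WRONG = Fraction(PRIZE - COST, 2)   # 0.5 * 10000 - 0.5 * 100 = 4950
LOSS_IF_RIGHT = COST


def paying_is_worth_it(p_wrong: Fraction) -> bool:
    """True if paying has positive expected value given credence p_wrong."""
    return p_wrong * GAIN_IF_WRONG - (1 - p_wrong) * LOSS_IF_RIGHT > 0


# Credence at which paying and refusing break even.
break_even = Fraction(LOSS_IF_RIGHT, GAIN_IF_WRONG + LOSS_IF_RIGHT)
print(break_even)                               # 2/101
print(paying_is_worth_it(Fraction(1, 50)))      # True
print(paying_is_worth_it(Fraction(1, 100)))     # False
```

Under this particular model the break-even credence is just under 1/50. A different way of pricing the "analysis is wrong" branch would move the threshold.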

So given all of this, why should you pay?

• I’m most fond of the precommitment argument. You say:

You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn’t pre-commit and you didn’t know about it ahead of time, so the burden is on you to justify why you should act as though you did. In Newcomb’s problem you want to have pre-committed and if you act as though you were pre-committed then you will find that you actually were pre-committed. However, here it is the opposite. Upon discovering that the coin came up tails, you want to act as though you were not pre-committed to pay and if you act that way, you will find that you actually were indeed not pre-committed.

I do not think this gets at the heart of the precommitment argument. You mention cousin_it’s argument that what we care about is what decision theory we’d prefer a benevolent AI to use. You grant that this makes sense for that case, but you seem skeptical that the same reasoning applies to humans. I argue that it does.

When reasoning abstractly about decision-making, I am (in part) thinking about how I would like myself to make decisions in the future. So it makes sense for me to say to myself, “Ah, I’d want to be counterfactually mugged.” I will count being-counterfactually-mugged as a point in favor of proposed ways of thinking about decisions; I will count not-being-mugged as a point against. This is not, in itself, a precommitment; this is just a heuristic about good and bad reasoning as it seems to me when thinking about it ahead of time. A generalization of this heuristic is, “Ah, it seems any case where a decision procedure would prefer to make a commitment ahead of time but would prefer to do something different in the moment is a point against that decision procedure”. I will, thinking about decision-making in the abstract as things seem to me now, tend to prefer decision procedures which avoid such self-contradictions.

In other words, thinking about what constitutes good decision-making in the abstract seems a whole lot like thinking about how we would want a benevolent AI to make decisions.

You could argue that I might think such things now, and might think up all sorts of sophisticated arguments which fit that picture, but later, when Omega asks me for $100, if I re-think my decision-theoretic concepts at that time, I’ll know better. But, based on what principles would I be reconsidering? I can think of some. It seems to me now, though, that those principles are mistaken, and I should instead reason using principles which are more self-consistent: principles which, when faced with the question of whether to give Omega $100, arrive at the same answer I currently think to be right.

Of course this cannot be a general argument that I prefer to reason by principles which will arrive at conclusions consistent with my current beliefs. What I can do is consider the impact which particular ways of reasoning about decisions have on my overall expected utility (assuming I start out reasoning with some version of expected utility theory). Doing so, I will prefer UDT-like ways of reasoning when it comes to problems like counterfactual mugging.

You might argue that beliefs are for true things, so I can’t legitimately discount ways-of-thinking just because they have bad consequences. But, these are ways-of-thinking-about-decisions. The point of ways-of-thinking-about-decisions is winning. And, as I think about it now, it seems preferable to think about it in those ways which reliably achieve higher expected utility (the expectation being taken from my perspective now).

Nor is this a quirk of my personal psychology, that I happen to find these arguments compelling in my current mental state, and so, when thinking about how to reason, prefer methods of reasoning which are more consistent with precommitments I would make. Rather, this seems like a fairly general fact about thinking beings who approach decision-making in a roughly expected-utility-like manner.

Perhaps you would argue, like the CDT-er sometimes does in response to Newcomb, that you cannot modify your approach to reasoning about decisions so radically. You see that, from your perspective now, it would be better if you reasoned in a way which made you accept future counterfactual muggings. You’d see, in the future, that you are making a choice inconsistent with your preferences now. But this only means that you have different preferences then and now. And anyway, the question of decision theory should be what to do given preferences, right?

You can take that perspective, but it seems you must do so regretfully: you should wish you could self-modify in that way. Furthermore, to the extent that a theory of preferences sits in the context of a theory of rational agency, it seems like preferences should be the kind of thing which tends to stay the same over time, not the sort of thing which changes like this.

Basically, it seems that assuming preferences remain fixed, beliefs about what you should do given those preferences and certain information should not change (except due to bounded rationality). IE: certainly I may think I should go to the grocery store but then change my mind when I learn it’s closed. But I should not start out thinking that I should go to the grocery store even in the hypothetical where it’s closed, and then, upon learning it’s closed, go home instead. (Except due to bounded rationality.) That’s what is happening with CDT in counterfactual mugging: it prefers that its future self should, if asked for $100, hand it over; but, when faced with the situation, it thinks it should not hand it over.

The CDTer response (“alas, I cannot change my own nature so radically”) presumes that we have already figured out how to reason about decisions. I imagine that the real crux behind such a response is actually that CDT feels like the true answer, so that the non-CDT answer does not seem compelling even once it is established to have a higher expected value. The CDTer feels as if they’d have to lie to themselves to 1-box. The truth is that they could modify themselves so easily, if they thought the non-CDT answer was right! They protest that Newcomb’s problem simply punishes rationality. But this argument presumes that CDT defines rationality.

An EDT agent who asks how best to act in future situations to maximize expected value in those situations will arrive back at EDT, since expected-value-in-the-situation is the very criterion which EDT already uses. However, this is a circular way of thinking: we can make a variant of that kind of argument which justifies any decision procedure.

A CDT or EDT agent who asks itself how best to act in future situations to maximize expected value as estimated by its current self will arrive at UDT. Furthermore, that’s the criterion it seems an agent ought to use when weighing the pros and cons of a decision theory; not the expected value according to some future hypothetical, but the expected value of switching to that decision theory now.

And, remember, it’s not the case that we will switch back to CDT/EDT if we reconsider which decision theory is highest-expected-utility when we are later faced with Omega asking for $100. We’d be a UDT agent at that point, and so, would consider handing over the $100 to be the highest-EV action.

I expect another protest at this point, namely that the question of which decision theory gets us the highest expected utility by our current estimation isn’t the same as which one is true or right. To this I respond that, if we ask what highly capable agents would do (“highly intelligent”/“highly rational”), we would expect them to be counterfactually mugged, because highly capable agents would (by the assumption of their high capability) self-modify if necessary in order to behave in the ways they would have precommitted to behave. So, this kind of decision theory / rationality seems like the kind you’d want to study to better understand the behavior of highly capable agents; and, the kind you would want to imitate if trying to become highly capable. This seems like an interesting enough thing to study. If there is some other thing, “the right decision theory”, to study, I’m curious what that other thing is, but it does not seem likely to make me lose interest in this thing (the normative theory I currently call decision theory, in which it’s right to be counterfactually mugged).

a) it’s possible that a counterfactual mugging situation could have been set up before an AI was built

My perspective now already includes some amount of updateless reasoning, so I don’t necessarily find that compelling. However, I do agree that even according to UDT there’s a subjective question of how much information should be incorporated into the prior. So, for example, it seems sensible to refuse counterfactual mugging on the first digit of pi.

Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map?

It seems worth pointing out that we might deal with this via anthropic reasoning. We don’t need to believe that the counterfactual selves literally exist; rather, we are unsure whether we are being simulated. If we are being simulated, then the other self (in a position to get $1000) really does exist.

## Caveat

There are a few hedge-words and qualifiers in the above which the casual reader might underestimate the importance of. For example, when I say

(except due to bounded rationality)

I really mean that many parts of the argument I’m making crumble to dust in the face of bounded rationality, not that bounded rationality is a small issue which I set aside for convenience in the argument above. Keep in mind that I’ve recently been arguing against UDT. However, I do still think it is right to be counterfactually mugged, for something resembling the reasons I gave. It’s just that many details of the argument I’m making really don’t work for embedded agents, to such a large extent that I’ve become pessimistic about UDT-like ideas.

• You grant that this makes sense for that case, but you seem skeptical that the same reasoning applies to humans

I ultimately don’t see much of a distinction between humans and AIs, but let me clarify. If we had the ability to perfectly pre-commit then we’d make pre-commitments that effectively would be the same as an AI self-modifying. Without this ability, this argument is slightly harder to make, but I think it still applies. I’ve attempted making it in the past although I don’t really feel I completely succeeded.

Ah, it seems any case where a decision procedure would prefer to make a commitment ahead of time but would prefer to do something different in the moment is a point against that decision procedure

I agree that it’s a point against the decision procedure, but it isn’t necessarily conclusive. This could persuade someone to part with $100, but maybe not to allow themselves to be tortured.

You might argue that beliefs are for true things, so I can’t legitimately discount ways-of-thinking just because they have bad consequences

I actually agree with Eliezer’s argument that winning is more important than abstract conventions of thought. It’s just it’s not always clear which option is winning. Indeed here, as I’ve argued, winning seems to match more directly to not paying and abstract conventions of thought to the arguments about the counterfactual.

A CDT or EDT agent who asks itself how best to act in future situations to maximize expected value as estimated by its current self will arrive at UDT

Yeah I’m not disputing pre-committing to UDT for future actions, the question is more difficult when it comes to past actions. One thought: even if you’re in a counterfactual mugging that was set up before you came into existence, before you learn about it you might have time to pre-commit to paying in any such situations.

However, I do agree that even according to UDT there’s a subjective question of how much information should be incorporated into the prior

Well, this is the part of the question I’m interested in. As I said, I have no objection to pre-committing to UDT for future actions.

If we are being simulated, then the other self (in a position to get $1000) really does exist

I’ve commented on this in the past and I still see this as imprecise reasoning. I think I should write a post addressing this directly soon.

• I actually agree with Eliezer’s argument that winning is more important than abstract conventions of thought. It’s just it’s not always clear which option is winning. Indeed here, as I’ve argued, winning seems to match more directly to not paying and abstract conventions of thought to the arguments about the counterfactual.

It seems to me as if you’re ignoring the general thrust of my position, which is that the notion of winning that’s important is the one we have in-hand when we are thinking about what decision procedure to use. This seems to strongly favor paying up in counterfactual mugging, except for some finite set of counterfactual muggings which we already know about at the time when we consider this.

Yeah I’m not disputing pre-committing to UDT for future actions, the question is more difficult when it comes to past actions. One thought: even if you’re in a counterfactual mugging that was set up before you came into existence, before you learn about it you might have time to pre-commit to paying in any such situations.

It seems right to focus on future actions, because those are the ones which our current thoughts about which decision theory to adopt will influence.

Well, this is the part of the question I’m interested in. As I said, I have no objection to pre-committing to UDT for future actions

So is it that we have the same position with respect to future counterfactual muggings, but you are trying to figure out how to deal with present ones?

I think that since no agent can be perfect from the start, we always have to imagine that an agent will make some mistakes before it gets on the right track. So if it refuses to be counterfactually mugged a few times before settling on a be-mugged strategy, we cannot exactly say that was rational or irrational; it depends on the prior. An agent might assent or refuse to pay up on a counterfactual mugging on the 5th digit of π. We can’t absolutely call that right or wrong.

So, I think how an agent deals with a single counterfactual mugging is kind of its own business. It is only clear that it should not refuse mugging forever. (And if it refuses mugging for a really long time, this feels not so good, even if it would eventually start being mugged.)
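The "refuse a few times, then settle on a be-mugged strategy" picture can be made concrete with a rough sketch. It assumes something the comment does not specify: a sequence of independent muggings, each with a fair coin, where Omega rewards heads only if the agent would pay on that round. The function and parameter names are invented for illustration.

```python
# Illustrative sketch: assumes independent muggings, a fair coin each time,
# and an Omega that rewards heads only if the agent would pay on that round.

COST = 100
PRIZE = 10_000


def expected_total(n_muggings: int, refusals_before_paying: int) -> float:
    """Expected dollars over n_muggings for an agent that refuses the first
    `refusals_before_paying` muggings and is a payer from then on."""
    paying_rounds = max(0, n_muggings - refusals_before_paying)
    per_round_if_payer = 0.5 * PRIZE - 0.5 * COST   # 4950.0
    return paying_rounds * per_round_if_payer       # refusing rounds contribute 0


for n in (10, 100, 1000):
    learner = expected_total(n, refusals_before_paying=5) / n
    never = expected_total(n, refusals_before_paying=n) / n
    print(n, learner, never)
```

Under these assumptions, the per-mugging average of an agent that refuses only finitely many times climbs toward 4950 as the number of muggings grows, while an agent that refuses forever stays at 0. This says nothing about the single mugging already in front of you, which is the case the thread goes on to argue about.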

• It seems to me as if you’re ignoring the general thrust of my position, which is that the notion of winning that’s important is the one we have in-hand when we are thinking about what decision procedure to use

Why can’t I use this argument for CDT in Newcomb’s?

It seems right to focus on future actions, because those are the ones which our current thoughts about which decision theory to adopt will influence.

What I meant to say instead of future actions is that it is clear that we should commit to UDT for future muggings, but less clear if the mugging was already set up.

I think that since no agent can be perfect from the start, we always have to imagine that an agent will make some mistakes before it gets on the right track

The agent should still be able to solve such scenarios given a sufficient amount of time to think and the necessary starting information. Such as reliable reports about what happened to others who encountered counterfactual muggers

• Why can’t I use this argument for CDT in Newcomb’s?

From my perspective right now, CDT does worse in Newcomb’s. So, choosing between CDT and EDT as ways of thinking about Newcomb, EDT and other 1-boxing DTs are better.

What I meant to say instead of future actions is that it is clear that we should commit to UDT for future muggings, but less clear if the mugging was already set up.

Even UDT advises to not give in to muggings if it already knows, in its prior, that it is in the world where Omega asks for the $10. But you have to ask: who would be motivated to create such a UDT? Only “parents” who already knew the mugging outcome themselves, and weren’t motivated to act updatelessly about it. And where did they come from?

At some point, more-rational agency comes from less-rational agency. In the model where a CDT agent self-modifies to become updateless, which counterfactual muggings the UDT agent will and won’t be mugged by gets baked in at that time. With evolved creatures, of course it is more complicated.

I’m not sure, but it seems like our disagreement might be around the magnitude of this somehow. Like, I’m saying something along the lines of “Sure, you refuse some counterfactual muggings, but only finitely many. From the outside, that looks like making a finite number of mistakes and then learning.” While you’re saying something like, “Sure, you’d rather get counterfactually mugged for all future muggings, but it still seems like you want to take the one in front of you.” (So from my perspective you’re putting yourself in the shoes of an agent who hasn’t “learned better” yet.)

The analogy is a little strained, but I am thinking about it like a Bayesian update. If you keep seeing things go a certain way, you eventually predict that. But that doesn’t make it irrational to hedge your bets for some time. So it can be rational in that sense to refuse some counterfactual muggings. But you should eventually take them.

The agent should still be able to solve such scenarios given a sufficient amount of time to think and the necessary starting information. Such as reliable reports about what happened to others who encountered counterfactual muggers

Basically, I don’t think that way of thinking completely holds when we’re dealing with logical uncertainty. A counterlogical mugging is a situation where time to think can, in a certain sense, hurt (if you fully update on that thinking, anyway). So there isn’t such a clear distinction between thinking-from-starting-information and learning from experience.

• I’m not sure, but it seems like our disagreement might be around the magnitude of this somehow

My interest is in the counterfactual mugging in front of you, as this is the hardest part to justify. Future muggings aren’t a difficult problem.

Basically, I don’t think that way of thinking completely holds when we’re dealing with logical uncertainty. A counterlogical mugging is a situation where time to think can, in a certain sense, hurt (if you fully update on that thinking, anyway)

Are you saying that it will pre-commit to something before it receives all the information?

• My interest is in the counterfactual mugging in front of you, as this is the hardest part to justify. Future muggings aren’t a difficult problem.

I’m not sure exactly what you’re getting at, though. Obviously counterfactual mugging in front of you is always the one that matters, in some sense. But if I’ve considered things ahead of time already when confronted with my very first counterfactual mugging, then I may have decided to handle counterfactual mugging by paying up in general.

And further, there’s the classic argument that you should always consider what you would have committed to ahead of time. I’m kind of feeling like you’re ignoring those arguments, or something? Or they aren’t interesting for your real question?

Basically I keep talking about how “yes you can refuse a finite number of muggings” because I’m trying to say that, sure, you don’t end up concluding you should accept every mugging, but generally the argument via treat-present-cases-as-if-they-were-future-cases seems pretty strong. And the response I’m hearing from you sounds like “but what about present cases?”

• “Basically I keep talking about how ‘yes you can refuse a finite number of muggings’”: considering I’m considering the case when you are only mugged once, that sounds an awful lot like saying it’s reasonable to choose not to pay.

“But if I’ve considered things ahead of time”: a key part of counterfactual mugging is that you haven’t considered things ahead of time. I think it is important to engage with this aspect or explain why this doesn’t make sense.

“And further, there’s the classic argument that you should always consider what you would have committed to ahead of time”: imagine instead of $50 it was your hand being cut off to save your life in the counterfactual. It’s going to be awfully tempting to keep your hand. Why is what you would have committed to, but didn’t, relevant?

My goal is to understand versions that haven’t been watered down or simplified.

• considering I’m considering the case when you are only mugged once, that sounds an awful lot like saying it’s reasonable to choose not to pay.

The perspective I’m coming from is that you have to ask how you came to be in the epistemic situation you’re in. Setting agents up in decision problems “from nothing” doesn’t tell us much, if it doesn’t make sense for an agent to become confident that it’s in that situation.

An example of this is smoking lesion. I’ve written before about how the usual version doesn’t make very much sense as a situation that an agent can find itself in.

The best way to justify the usual “the agent finds itself in a decision problem” way of working is to have a learning-theoretic setup in which a learning agent can successfully learn that it’s in the scenario. Once we have that, it makes sense to think about the one-shot case, because we have a plausible story whereby an agent comes to believe it’s in the situation described.

This is especially important when trying to account for logical uncertainty, because now everything is learned—you can’t say a rational agent should be able to reason in a particular way, because the agent is still learning to reason.

If an agent is really in a pure one-shot case, that agent can do anything at all. Because it has not learned yet. So, yes, “it’s reasonable to choose not to pay”, BUT ALSO any behavior at all is reasonable in a one-shot scenario, because the agent hasn’t had a chance to learn yet.

This doesn’t necessarily mean you have to deal with an iterated counterfactual mugging. You can learn enough about the universe to be confident you’re now in a counterfactual mugging without ever having faced one before. But

a key part of counterfactual mugging is that you haven’t considered things ahead of time. I think it is important to engage with this aspect or explain why this doesn’t make sense.

This goes along with the idea that it’s unreasonable to consider agents as if they emerge spontaneously from a vacuum, face a single decision problem, and then disappear. An agent is evolved or built or something. This ahead-of-time work can’t be in principle distinguished from “thinking ahead”.

As I said above, this becomes especially clear if we’re trying to deal with logical uncertainty on top of everything else, because the agent is still learning to reason. The agent has to have experience reasoning about similar stuff in order to learn.

We can give a fresh logical inductor a bunch of time to think about one thing, but how it spends that time is by thinking about all sorts of other logical problems in order to train up its heuristic reasoning. This is why I said all games are iterated games in logical time—the logical inductor doesn’t literally play the game a bunch of times to learn, but it simulates a bunch of parallel-universe versions of itself who have played a bunch of very similar games, which is very similar.

imagine instead of $50 it was your hand being cut off to save your life in the counterfactual. It’s going to be awfully tempting to keep your hand. Why is what you would have committed to, but didn’t relevant? One way of appealing to human moral intuition (which I think is not vacuous) is to say, what if you know that someone is willing to risk great harm to save your life because they trust you the same, and you find yourself in a situation where you can sacrifice your own hand to prevent a fatal injury from happening to them? It’s a good deal; it could have been your life on the line. But really my justification is more the precommitment story. Decision theory should be reflectively endorsed decision theory. That’s what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions. I’m fine with imagining hypothetically that we haven’t thought about things ahead of time, as an exercise to help us better understand how to think. But that means my take-away from the exercise is based on which ways of thinking seemed to help get better outcomes, in the hypothetical situations envisioned! • If an agent is really in a pure one-shot case, that agent can do anything at all You can learn about a situation other than by facing that exact situation yourself. For example, you may observe other agents facing that situation or receive testimony from an agent that has proven itself trustworthy. You don’t even seem to disagree with me here as you wrote: “you can learn enough about the universe to be confident you’re now in a counterfactual mugging without ever having faced one before” “This goes along with the idea that it’s unreasonable to consider agents as if they emerge spontaneously from a vacuum, face a single decision problem, and then disappear”—I agree with this. 
I asked this question because I didn’t have a good model of how to conceptualise decision theory problems, although I think I have a clearer idea now that we’ve got the Counterfactual Prisoner’s Dilemma. One way of appealing to human moral intuition Doesn’t work on counter-factually selfish agents Decision theory should be reflectively endorsed decision theory. That’s what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions Thinking about decisions before you make them != thinking about decisions timelessly • You can learn about a situation other than by facing that exact situation yourself. For example, you may observe other agents facing that situation or receive testimony from an agent that has proven itself trustworthy. You don’t even seem to disagree with me here as you wrote: “you can learn enough about the universe to be confident you’re now in a counterfactual mugging without ever having faced one before” Right, I agree with you here. The argument is that we have to understand learning in the first place to be able to make these arguments, and iterated situations are the easiest setting to do that in. So if you’re imagining that an agent learns what situation it’s in more indirectly, but thinks about that situation differently than an agent who learned in an iterated setting, there’s a question of why that is. It’s more a priori plausible to me that a learning agent thinks about a problem by generalizing from similar situations it has been in, which I expect to act kind of like iteration. Or, as I mentioned re: all games are iterated games in logical time, the agent figures out how to handle a situation by generalizing from similar scenarios across logic. So any game we talk about is iterated in this sense. >One way of appealing to human moral intuition Doesn’t work on counter-factually selfish agents I disagree. 
Reciprocal altruism and true altruism are kind of hard to distinguish in human psychology, but I said “it’s a good deal” to point at the reciprocal-altruism intuition. The point being that acts of reciprocal altruism can be a good deal without having considered them ahead of time. It’s perfectly possible to reason “it’s a good deal to lose my hand in this situation, because I’m trading it for getting my life saved in a different situation; one which hasn’t come about, but could have.”

I kind of feel like you’re just repeatedly denying this line of reasoning. Yes, the situation in front of you is that you’re in the risk-hand world rather than the risk-life world. But this is just question-begging with respect to updateful reasoning. Why give priority to that way of thinking over the “but it could just as well have been my life at stake” world? Especially when we can see that the latter way of reasoning does better on average?

Decision theory should be reflectively endorsed decision theory. That’s what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions

Thinking about decisions before you make them != thinking about decisions timelessly

Ah, that’s kind of the first reply from you that’s surprised me in a bit. Can you say more about that? My feeling is that in this particular case the equality seems to hold.

• The argument is that we have to understand learning in the first place to be able to make these arguments, and iterated situations are the easiest setting to do that in

Iterated situations are indeed useful for understanding learning. But I’m trying to abstract out over the learning insofar as I can. I care that you get the information required for the problem, but not so much how you get it.

Especially when we can see that the latter way of reasoning does better on average?

The average includes worlds that you know you are not in.
So this doesn’t help us justify taking these counterfactuals into account; indeed, for us to care about the average we need to already have an independent reason to care about these counterfactuals.

I kind of feel like you’re just repeatedly denying this line of reasoning. Yes, the situation in front of you is that you’re in the risk-hand world rather than the risk-life world. But this is just question-begging with respect to updateful reasoning.

I’m not saying you should reason in this way. You should reason updatelessly. But in order to get to the point of finding the Counterfactual Prisoner’s Dilemma, which I consider a satisfactory justification, I had to rigorously question every other solution until I found one which could withstand the questioning. This seems like a better solution as it is less dependent on tricky-to-evaluate philosophical claims.

Ah, that’s kind of the first reply from you that’s surprised me in a bit

Well, thinking about a decision after you make it won’t do you much good. So you’re pretty much always thinking about decisions before you make them. But timelessness involves thinking about decisions before you end up facing them.

• Iterated situations are indeed useful for understanding learning. But I’m trying to abstract out over the learning insofar as I can. I care that you get the information required for the problem, but not so much how you get it.

OK, but I don’t see how that addresses my argument.

The average includes worlds that you know you are not in. So this doesn’t help us justify taking these counterfactuals into account,

This is the exact same response again (ie the very kind of response I was talking about in my remark you’re responding to), where you beg the question of whether we should evaluate from an updateful perspective. Why is it problematic that we already know we are not in those worlds? Because you’re reasoning updatefully?
My original top-level answer explained why I think this is a circular justification in a way that the updateless position isn’t.

I’m not saying you should reason in this way. You should reason updatelessly.

Ok. So what’s at stake in this discussion is the justification for updatelessness, not the whether of updatelessness. I still don’t get why you seem to dismiss my justification for updatelessness, though. All I’m understanding of your objection is a question-begging appeal to updateful reasoning.

• You feel that I’m begging the question. I guess I take only thinking about this counterfactual as the default position, as where an average person is likely to be starting from. And I was trying to see if I could find an argument strong enough to displace this. So I’ll freely admit I haven’t provided a first-principles argument for focusing just on this counterfactual.

OK, but I don’t see how that addresses my argument.

Your argument is that we need to look at iterated situations to understand learning. Sure, but that doesn’t mean that we have to interpret every problem in iterated form. If we need to understand learning better, we can look at a few iterated problems beforehand, rather than turning this one into an iterated problem.

The average includes worlds that you know you are not in. So this doesn’t help us justify taking these counterfactuals into account,

Let me explain more clearly why this is a circular argument:

a) You want to show that we should take counterfactuals into account when making decisions

b) You argue that this way of making decisions does better on average

c) The average includes the very counterfactuals whose value is in question.
So b depends on a already being proven ⇒ circular argument

• Let me explain more clearly why this is a circular argument:

a) You want to show that we should take counterfactuals into account when making decisions

b) You argue that this way of making decisions does better on average

c) The average includes the very counterfactuals whose value is in question.

So b depends on a already being proven ⇒ circular argument

That isn’t my argument though. My argument is that we ARE thinking ahead about counterfactual mugging right now, in considering the question. We are not misunderstanding something about the situation, or missing critical information. And from our perspective right now, we can see that agreeing to be mugged is the best strategy on average.

We can see that if we update on the value of the coin flip being tails, we would change our mind about this. But the statement of the problem requires that there is also the possibility of heads. So it does not make sense to consider the tails scenario in isolation; that would be a different decision problem (one in which Omega asks us for $100 out of the blue with no other significant backstory).

So we (right now, considering how to reason about counterfactual muggings in the abstract) know that there are the two possibilities, with equal probability, and so the best strategy on average is to pay. So we see behaving updatefully as bad.
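To make the averaging explicit, here is a minimal sketch of the calculation. It uses the $100 and $10000 figures from the problem statement and assumes Omega predicts the policy perfectly; the function and variable names are illustrative only.

```python
from fractions import Fraction

# Payoffs from the problem statement: pay $100 on tails, receive $10000 on
# heads, but only if Omega predicts you would have paid on tails.
COST_ON_TAILS = 100
PRIZE_ON_HEADS = 10_000
P_HEADS = Fraction(1, 2)  # fair coin


def payoff(pays_when_asked: bool, coin: str) -> int:
    """Payoff of a fixed policy in one branch, assuming a perfect predictor."""
    if coin == "heads":
        return PRIZE_ON_HEADS if pays_when_asked else 0
    return -COST_ON_TAILS if pays_when_asked else 0


def expected_value(pays_when_asked: bool) -> Fraction:
    """Average over both branches, evaluated before the coin is seen."""
    return (P_HEADS * payoff(pays_when_asked, "heads")
            + (1 - P_HEADS) * payoff(pays_when_asked, "tails"))


# Evaluated ahead of time, over both branches.
ev_pay = expected_value(True)
ev_refuse = expected_value(False)

# The updateful view conditions on tails and compares only that branch.
tails_pay = payoff(True, "tails")
tails_refuse = payoff(False, "tails")
```

The two comparisons come apart: paying wins on average over both branches, while refusing wins once tails is taken as given. That gap is what the disagreement in this thread is about.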

So my argument for considering the multiple possibilities is, the role of thinking about decision theory now is to help guide the actions of my future self.

You feel that I’m begging the question. I guess I take only thinking about this counterfactual as the default position, as where an average person is likely to be starting from. And I was trying to see if I could find an argument strong enough to displace this. So I’ll freely admit I haven’t provided a first-principles argument for focusing just on this counterfactual.

I think the average person is going to be thinking about things like duty, honor, and consistency which can serve some of the purpose of updatelessness. But sure, updateful reasoning is a natural kind of starting point, particularly coming from a background of modern economics or Bayesian decision theory.

But my argument is compatible with that starting point, if you accept my “the role of thinking about decision theory now is to help guide future actions” line of thinking. In that case, starting from updateful assumptions now, decision-theoretic reasoning makes you think you should behave updatelessly in the future.

Whereas the assumption you seem to be using, in your objection to my line of reasoning, is “we should think of decision-theoretic problems however we think of problems now”. So if we start out as an updateful agent, we would think about decision-theoretic problems and think “I should be updateful”. If we start out as a CDT agent, then when we think about decision-theoretic problems we would conclude that we should reason causally. EDT agents would think about problems and conclude we should reason evidentially. And so on. That’s the reasoning I’m calling circular.

Of course an agent should reason about a problem using its best current understanding. But my claim is that when doing decision theory, the way that best understanding should be applied is to figure out what decision theory does best, not to figure out what my current decision theory already does. And when we think about problems like counterfactual mugging, the description of the problem requires that there’s both the possibility of heads and tails. So “best” means best overall, not just down the one branch.

If the act of doing decision theory were generally serving the purpose of aiding in making the current decision, then my argument would not make sense, and yours would. Current-me might want to tell the me in that universe to be more updateless about things, but alternate-me would not be interested in hearing it, because alternate-me wouldn’t be interested in thinking ahead in general, and the argument wouldn’t make any sense with respect to alternate-me’s current decision.

So my argument involves a fact about the world which I claim determines which of several ways to reason, and hence, is not circular.

• My argument is that we ARE thinking ahead about counterfactual mugging right now, in considering the question

When we think about counterfactual muggings, we naturally imagine the possibility of facing a counterfactual mugging in the future. I don’t dispute the value of pre-committing either to taking a specific action or to acting updatelessly. However, instead of imagining a future mugging, we could also imagine a present mugging where we didn’t have time to make any pre-commitments. I don’t think it is immediately obvious that we should think updatelessly; instead, I believe that it requires further justification.

The role of thinking about decision theory now is to help guide the actions of my future self

This is effectively an attempt at proof-by-definition

I think the average person is going to be thinking about things like duty, honor, and consistency which can serve some of the purpose of updatelessness. But sure, updateful reasoning is a natural kind of starting point, particularly coming from a background of modern economics or Bayesian decision theory

If someone’s default is already updateless reasoning, then there’s no need for us to talk them into it. It’s only people with an updateful default that we need to convince (until recently I had an updateful default).

And when we think about problems like counterfactual mugging, the description of the problem requires that there’s both the possibility of heads and tails

It requires a counterfactual possibility, not an actual possibility. And a counterfactual possibility isn’t actual, it’s counter to the factual. So it’s not clear this has any relevance.

It looks to me like you’re tripping yourself up with verbal arguments that aren’t at all obviously true. The reason why I believe that the Counterfactual Prisoner’s Dilemma is important is that it is a mathematical result that doesn’t require much in the way of assumptions. Sure, it still has to be interpreted, but it seems hard to find an interpretation that avoids the conclusion that the updateful perspective doesn’t quite succeed on its own terms.
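For readers who haven’t seen it, a rough sketch of the payoff table follows. This thread does not spell the setup out, so the rules encoded here are an assumption based on the usual statement of the problem: Omega is a perfect predictor, asks for $100 whichever way the coin lands, and pays $10000 only if you would also have paid in the other branch. All names are illustrative.

```python
COST = 100      # asked for in both branches (assumed rule)
PRIZE = 10_000  # paid if you would have paid in the other branch (assumed rule)


def other(coin: str) -> str:
    return "tails" if coin == "heads" else "heads"


def payoff(policy: dict, coin: str) -> int:
    """policy maps each coin result to whether the agent pays in that branch."""
    total = 0
    if policy[coin]:
        total -= COST
    if policy[other(coin)]:  # Omega rewards paying in the *other* branch
        total += PRIZE
    return total


always_pay = {"heads": True, "tails": True}
never_pay = {"heads": False, "tails": False}

outcomes = {
    name: {coin: payoff(policy, coin) for coin in ("heads", "tails")}
    for name, policy in [("always_pay", always_pay), ("never_pay", never_pay)]
}
```

Under these assumed rules the always-paying policy comes out ahead in whichever branch actually occurs, so the comparison does not lean on averaging over a branch that is known not to be actual.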

• I’m new here. May I ask what’s the core difference between the UDT and the FDT? Also, which is better and why?

• Here is my understanding. I was not really involved in the events, so, take this with a grain of salt; it’s all third hand.

FDT was attempting to be an umbrella term for “MIRI-style decision theories”, ie decision theories which 1-box on Newcomb, cooperate in twin prisoner’s dilemma, accept counterfactual muggings, grapple with logical uncertainty rather than ignoring it, and don’t require free will (ie, can be implemented as deterministic algorithms without conceptual problems that the decision theory doesn’t provide the tools to handle). The two main alternatives for which FDT was trying to be an umbrella term were UDT and TDT (timeless decision theory). However, the FDT paper leaned far toward TDT ways of describing things: specifically, giving diagrams which look like causal models, and describing the decision procedure as making an intervention on the node corresponding to the output of the decision algorithm. This was too far from how Wei Dai envisions UDT. So FDT ended up being mostly a re-branding of TDT, but with less concrete detail (so FDT is an umbrella term for a family of TDT-like decision theories, but not an umbrella large enough to encompass UDT).

I think of TDT and UDT as about equally capable, but only if TDT does anthropic reasoning. Otherwise, UDT is strictly more capable, because TDT will not pay in counterfactual mugging, because it updates on its observations.

FDT cannot be directly compared, because it is simply more vague than TDT.

• I find that the “you should pay” answer is confused and self-contradictory in its reasoning. As in all the OO (Omniscient Omega) setups, you, the subject, have no freedom of choice as far as OO is concerned; you are just another deterministic automaton. So any “decision” you make to precommit to a certain action has already been predicted (or could have been predicted) by OO, including any influence exerted on your thought process by other people telling you about rationality and precommitment. To make it clearer, anyone telling you to one-box in Newcomb’s problem in effect uses classical CDT (which advises two-boxing), because they assume that you have the freedom to make a decision in a setup where your decisions are predetermined. If that were so, two-boxing would make more sense, defying the OO infallibility assumption.

So, the whole reasoning advocating for one-boxing and for paying the mugger does not hold up to basic scrutiny. A self-consistent answer would be “you are a deterministic automaton, whatever you feel or think or pretend to decide is an artifact of the algorithm that runs you, so the question whether to pay is meaningless, you either will pay or will not, you have no control over it.”

Of course, this argument only applies to OO setups. In “reality” there are no OO that we know of, the freedom of choice debate is far from resolved, and if one assumes that we are not automatons whose actions are set in stone (or in the rules of quantum mechanics), then learning to make better decisions is not a futile exercise. One example is the twin prisoner’s dilemma, where the recommendation to cooperate with one’s twin is self-consistent.

• Newcomb’s paradox still works if Omega is not infallible, just right a substantial proportion of the time. Between the two extremes you have described, of free choice, unpredictable by Omega, and deterministic absence of choice, lies people’s real psychology.
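A quick sketch of how reliable the predictor would need to be. This thread gives no payoffs for Newcomb’s problem, so the conventional $1,000,000 and $1,000 are assumed here, and the predictor is assumed to be right with the same probability whichever way you choose. Note that this is the expected-value calculation a one-boxer would cite; a causal decision theorist would dispute conditioning on one’s own choice in this way.

```python
from fractions import Fraction

BIG = 1_000_000  # opaque box, filled only if one-boxing was predicted (assumed)
SMALL = 1_000    # transparent box, always present (assumed)


def ev_one_box(accuracy: Fraction) -> Fraction:
    # The big box is full exactly when the predictor correctly foresaw one-boxing.
    return accuracy * BIG


def ev_two_box(accuracy: Fraction) -> Fraction:
    # The big box is full only when the predictor wrongly expected one-boxing.
    return SMALL + (1 - accuracy) * BIG


# Break-even accuracy: accuracy * BIG == SMALL + (1 - accuracy) * BIG
break_even = Fraction(BIG + SMALL, 2 * BIG)
```

With these assumed payoffs the break-even accuracy is only slightly above one half, so on this calculation Omega does not need to be anywhere near infallible for one-boxing to come out ahead.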

Just what is my power to sever links of a causal graph that point towards me? If I am faced with a wily salesman, how shall I be sure of making my decision to buy or not by my own values, taking into account what is informative from the salesman, but uninfluenced by his dark arts? Do I even know what my own values are? Do I have values? When QRO (Quite Reliable Omega) faces me, and I choose one box or two, how can I tell whether I really made that decision?

Interactions between people are mostly Newcomb-like. People are always thinking about who the other person is and what they may be thinking, and aiming their words to produce desired results. It is neither easy nor impossible, but a difficult thing, to truly make a decision.

• Again, we seem to just have foundational disagreements here. Free will is one of those philosophical topics that I lost interest in a long time ago, so I’m happy to leave it to others to debate.

• It’s not philosophical in an OO setup, it’s experimentally testable, so you cannot simply ignore it.

• One way of experimenting with this would be to use simulable agents (such as RL agents). We could set up the version where Omega is perfectly infallible (simulate the agent 100% accurately, including any random bits) and watch what different decision procedures do in this situation.

So, we can set up OO situations in reality. If we did this, we could see agents both 1-boxing and 2-boxing. We would see 1-boxers get better outcomes. Furthermore, if we were designing agents for this task, designing them to 1-box would be a good strategy.
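A toy version of that experiment, as a sketch. Agents here are plain deterministic Python functions, Omega predicts by calling the agent, and the payoffs are the conventional Newcomb amounts, which are assumed rather than taken from this thread.

```python
from typing import Callable

Agent = Callable[[], str]  # returns "one-box" or "two-box"

BIG, SMALL = 1_000_000, 1_000  # assumed conventional payoffs


def one_boxer() -> str:
    return "one-box"


def two_boxer() -> str:
    return "two-box"


def run_newcomb(agent: Agent) -> int:
    # Omega "predicts" by simulating the agent exactly.
    prediction = agent()
    opaque_box = BIG if prediction == "one-box" else 0
    # The agent then makes its actual choice; being deterministic, it matches.
    choice = agent()
    return opaque_box if choice == "one-box" else opaque_box + SMALL


results = {agent.__name__: run_newcomb(agent) for agent in (one_boxer, two_boxer)}
```

Both kinds of agent can be run through the same setup, and the one-boxing function ends up with the larger payoff, which is all the claim about designing agents for this task needs.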

This seems to undermine your position that OO situations are self-contradictory (since we can implement them on computers), and that the advice to 1-box is meaningless. If we try to write a decision-making algorithm based on

“you are a deterministic automaton, whatever you feel or think or pretend to decide is an artifact of the algorithm that runs you, so the question whether to pay is meaningless, you either will pay or will not, you have no control over it.”

we would not have an easy time.

• Yes, we could definitely implement that!

If we did this, we could see agents both 1-boxing and 2-boxing. We would see 1-boxers get better outcomes. Furthermore, if we were designing agents for this task, designing them to 1-box would be a good strategy.

Absolutely! Sadly, you don’t get to design yourself. You come predesigned in this setup (from the OO point of view, who knows you better than you know yourself), so you either one-box or two-box.

This seems to undermine your position that OO situations are self-contradictory (since we can implement them on computers), and that the advice to 1-box is meaningless.

Quite the opposite. OO knows what you will decide before you even consider the question, so you cannot optimize for an interaction with OO.

If we try to write a decision-making algorithm based on
“you are a deterministic automaton, whatever you feel or think or pretend to decide is an artifact of the algorithm that runs you, so the question whether to pay is meaningless, you either will pay or will not, you have no control over it.”
we would not have an easy time.

Who are the “we” in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm?

If you are a sufficiently smart NPC in the OO world, you will find that the only self-consistent approach is to act while knowing that you are just acting out your programming and that “decisions” are an illusion you cannot avoid.

Basically this comes down to whether you accept that, from the OO’s view, you are an NPC, or fight against this uncomfortable notion.

• Ok, this helps me understand your view better. But not completely. I don’t think there is such a big difference between the agent and the agent-designer.

Who are the “we” in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm?

We (as humans) are (always) still figuring out how to make decisions. From our perspective, we are still inventing the decision algorithm. From OO’s perspective, we were always going to behave a certain way. But, this does not contradict our perspective; OO just knows more.

In the computer-programmed scenario, there is a chain of decision points:

we think of the idea → we start programming, and design various bots → the bots themselves learn (in the case of ML bots), which selects between various strategies → the strategies themselves perform some computation to select actions

In the OO case, it does not matter so much where in this chain a particular computation occurs (because omniscient omega can predict the entire chain equally well). So it might be that I implement a bit of reasoning when writing a bot; or it might be the learning algorithm that implements that bit of reasoning; or it might be the learned strategy.
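The chain can be sketched as nested computations, with Omega predicting the outcome by rerunning the whole thing. Everything here is hypothetical: the two strategies, the crude “learner” that picks between them using seeded random trials, and the seed standing in for the agent’s random bits.

```python
import random


def cautious(observation: int) -> str:
    return "pay" if observation % 2 == 0 else "refuse"


def bold(observation: int) -> str:
    return "pay"


STRATEGIES = [cautious, bold]


def learn(rng: random.Random):
    """Learner: scores each strategy on random trials and keeps the best."""
    def score(strategy) -> int:
        return sum(strategy(rng.randrange(10)) == "pay" for _ in range(20))
    return max(STRATEGIES, key=score)


def whole_chain(seed: int, observation: int) -> str:
    """Designer's choice of seed -> learner -> learned strategy -> action."""
    rng = random.Random(seed)
    strategy = learn(rng)
    return strategy(observation)


def omega_predicts(seed: int, observation: int) -> str:
    # Omega does not care which link does the reasoning; it reruns all of them.
    return whole_chain(seed, observation)


prediction = omega_predicts(seed=42, observation=3)
action = whole_chain(seed=42, observation=3)
```

Moving a piece of reasoning from the strategy into the learner, or from the learner into the designer’s choices, would change nothing about Omega’s ability to predict the final action in this sketch.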

Similarly, we have a chain which includes biological evolution, cultural innovation, our parents meeting, our conception, our upbringing, what we learn in school, what we think about at various points in our lives, leading up to this moment.

Who are the “we” in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm?

I do not think there is a special point in the chain. Well—it’s true that different points in the chain have varying degrees of agency. But any point in the chain which is performing important computation “could”, from its perspective, do something differently, changing the remainder of the chain. So we, the bot designer, could design the bot differently (from our perspective when choosing how to design the bot). The bot’s learning algorithm could have selected a different strategy (from its perspective). And the strategy could have selected a different action.

Of course, from our perspective, it is a little difficult to imagine the learning algorithm selecting a different strategy, if we understand how the learning algorithm works. And it is fairly difficult to imagine the strategy selecting a different action, since it is going to be a relatively small computation. But this is the same way that OO would have difficulty thinking of us doing something different, since OO can predict exactly what we do and exactly how we arrive at our decision. The learning algorithm’s entire job is to select between different alternative strategies; it has to “think as if it has a choice”, or else it could not perform the computation it needs to perform. Similarly, the learned strategy has to select between different actions; if there is a significant computational problem being solved by doing this, it must be “thinking as if it had a choice” as well (though, granted, learned strategies are often more like lookup tables, in which case I would not say that).

This does not mean choice is an illusion at any point in the chain. Choice is precisely the computation which chooses between alternatives. The alternatives are an illusion, in that counterfactuals are subjective.

So that’s my view. I’m still confused about aspects of your view. Particularly, this:

If you are a sufficiently smart NPC in the OO world, you will find that the only self-consistent approach is to act while knowing that you are just acting out your programming and that “decisions” are an illusion you cannot avoid.

How is this consistent with your assertion that OO-problems are inconsistent because “you cannot optimize for an interaction with OO”? As you say, the NPC is forced to consider the “illusion” of choice; it is an illusion which cannot be avoided. Furthermore, this is due to the real situation which it actually finds itself in. (Or at least, the realistic scenario which we are imagining it is in.) So it seems to me it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem (e.g., UDT-like thinking will tend to produce better results). So,

• The alternatives are fake (counterfactuals are subjective), but,

• The problem is real,

• The agent has to make a choice,

• There are better and worse ways of reasoning about that choice—we can see that agents who reason in one way or another do better/​worse,

• It helps to study better and worse ways of reasoning ahead of time (whether that’s by ML algorithms learning, or humans abstractly reasoning about decision theory).

So it seems to me that this is very much like any other sort of hypothetical problem which we can benefit from reasoning about ahead of time (e.g., “how to build bridges”). The alternatives are imaginary, but the problem is real, and we can benefit from considering how to approach it ahead of time (whether we’re human or sufficiently advanced NPC).

• I don’t think there is such a big difference between the agent and the agent-designer.

Hmm. Seems to me there is a crucial difference: the former is in scope for OO, the latter is not.

We (as humans) are (always) still figuring out how to make decisions. From our perspective, we are still inventing the decision algorithm. From OO’s perspective, we were always going to behave a certain way. But, this does not contradict our perspective; OO just knows more.

If you know that someone has predicted your behavior, then you accept that you are a deterministic algorithm, and the inventing of the decision algorithm is just a deterministic subroutine of it. I don’t think we disagree there. The future is set, you are relegated to learning about what it is, and to feel the illusion of inventing the decision algorithm and/​or acting on it. A self-consistent attitude in the OO setup is more like “I am just acting out my programming, and it feels like making decisions”.

we think of the idea → we start programming, and design various bots → the bots themselves learn (in the case of ML bots), which selects between various strategies → the strategies themselves perform some computation to select actions

Yes and no. “We” in this case are the agent designers, and the bots are agents acting out their programming, but we are neither OO, nor are we in the OO’s scope of predictability. In fact maybe there is no OO in that world, especially if the agent has access to quantum randomness or freebits, or is otherwise too hard to predict. That applies to complicated enough automata, like AlphaZero.

Of course, from our perspective, it is a little difficult to imagine the learning algorithm selecting a different strategy, if we understand how the learning algorithm works. And it is fairly difficult to imagine the strategy selecting a different action, since it is going to be a relatively small computation.

Right, the more OO-like we are, the less agenty the algorithm feels to us.

The learning algorithm’s entire job is to select between different alternative strategies; it has to “think as if it has a choice”, or else it could not perform the computation it needs to perform.

Well. I am not sure that “it has to ‘think as if it has a choice’”. Thinking about having a choice seems like it requires an internal narrator, a degree of self-awareness. It is an open question whether an internal narrator necessarily emerges once the algorithm complexity is large enough. In fact, that would be an interesting open problem to work on, and if I were to do research in the area of agency and decision making, I would look into this as a project.

If an internal narrator is not required, then there is no thinking about choices, just following the programming that makes a decision. A bacterium following a sugar gradient probably doesn’t think about choices. Not sure what counts as thinking for a chess program and whether it has the quale of having a choice.

This does not mean choice is an illusion at any point in the chain. Choice is precisely the computation which chooses between alternatives. The alternatives are an illusion, in that counterfactuals are subjective.

Yes, action is a part of the computation, and sometimes we anthropomorphize this action as making a choice. The alternatives are an illusion indeed, and I am not sure what you mean by counterfactuals there: potential future choices, or paths not taken because they could never have been taken given the agent’s programming.

How is this consistent with your assertion that OO-problems are inconsistent because “you cannot optimize for an interaction with OO”? As you say, the NPC is forced to consider the “illusion” of choice; it is an illusion which cannot be avoided. Furthermore, this is due to the real situation which it actually finds itself in. (Or at least, the realistic scenario which we are imagining it is in.) So it seems to me it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem (e.g., UDT-like thinking will tend to produce better results).

Yep, “it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem”, and these “ways of approaching the problem” are coded by the agent designer, whether explicitly, or by making it create and apply a “decision theory” subroutine. Once the algorithm is locked in by the designer (who is out of scope for OO), in this world an OO already knows what decision theory the agent will discover and use.

TL;DR: the agent is in scope of OO, while the agent designer is out of scope and so potentially has the grounds of thinking of themselves as “making a (free) decision” without breaking self-consistency, while the agent has no such luxury. That’s the “special point in the chain”.

I am making no claims here whether in the “real world” we are more like agents or more like agent designers, since there are no OOs that we know of that could answer the question.

• Yep, “it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem”, and these “ways of approaching the problem” are coded by the agent designer, whether explicitly, or by making it create and apply a “decision theory” subroutine. Once the algorithm is locked in by the designer (who is out of scope for OO), in this world an OO already knows what decision theory the agent will discover and use.
TL;DR: the agent is in scope of OO, while the agent designer is out of scope and so potentially has the grounds of thinking of themselves as “making a (free) decision” without breaking self-consistency, while the agent has no such luxury. That’s the “special point in the chain”.

What exactly does in-scope / out-of-scope mean? The OO has access to what the designer does (since the designer’s design is given to the OO), so for practical purposes, the OO is predicting the designer perfectly. Just not by simulating the designer. Seems like this is what is relevant in this case.

I am making no claims here whether in the “real world” we are more like agents or more like agent designers, since there are no OOs that we know of that could answer the question.

But you are making the claim that there is an objective distinction. It seems to me more like a subjective one: I can look at an algorithm from a number of perspectives; some of them will be more like OO (seeing it as “just an algorithm”), while others will regard the algorithm as an agent (unable to calculate exactly what the algorithm will do, they’re forced to take the intentional stance).

IE, for any agent you can imagine an OO for that agent, while you can also imagine a number of other perspectives. (Even if there are true-random bits involved in a decision, we can imagine an OO with access to those true-random bits. For quantum mechanics this might involve a violation of physics (e.g. no-cloning theorem), which is important in some sense, but doesn’t strike me as so philosophically important.)

I don’t know what it means for there to be a more objective distinction, unless it is the quantum randomness thing, in which case maybe we largely agree on questions aside from terminology.

Well. I am not sure that “it has to ‘think as if it has a choice’”. Thinking about having a choice seems like it requires an internal narrator, a degree of self-awareness. It is an open question whether an internal narrator necessarily emerges once the algorithm complexity is large enough. In fact, that would be an interesting open problem to work on, and if I were to do research in the area of agency and decision making, I would look into this as a project.
If an internal narrator is not required, then there is no thinking about choices, just following the programming that makes a decision. A bacterium following a sugar gradient probably doesn’t think about choices. Not sure what counts as thinking for a chess program and whether it has the quale of having a choice.

I want to distinguish “thinking about choices” from “awareness of thinking about choices” (which seems approximately like “thinking about thinking about choices”, though there’s probably more to it).

I am only saying that it is thinking about choices, ie computing relative merits of different choices, not that it is necessarily consciously aware of doing so, or that it has an internal narrator.

It “has a perspective” from which it has choices in that there is a describable epistemic position which it is in, not that it’s necessarily self-aware of being in that position in a significant sense.

If you know that someone has predicted your behavior, then you accept that you are a deterministic algorithm, and the inventing of the decision algorithm is just a deterministic subroutine of it. I don’t think we disagree there.

(correct)

The future is set, you are relegated to learning about what it is, and to feel the illusion of inventing the decision algorithm and/​or acting on it. A self-consistent attitude in the OO setup is more like “I am just acting out my programming, and it feels like making decisions”.

This seems to be where we disagree. It is not like there is a separate bit of clockwork deterministically ticking away and eventually spitting out an answer, with “us” standing off to the side and eventually learning what decision was made. We are the computation which outputs the decision. Our hand is not forced. So it does not seem right to me to say that the making-of-decisions is only an illusion. If we did not think through the decisions, they would in fact not be made the same. So the thing-which-determines-the-decision is precisely such thinking. There is not a false perception about what hand is pulling the strings in this scenario; so what is the illusion?

• What exactly does in-scope /​ out-of-scope mean? The OO has access to what the designer does (since the designer’s design is given to the OO), so for practical purposes, the OO is predicting the designer perfectly.

I was definitely unclear there. What I meant is something like a (deterministic) computer game: the game designer is outside the game, the agent is an NPC inside the game, and the OO is an entity with access to the game engine. So the OO can predict the agent perfectly, but not whoever designed the agent’s algorithm. That’s the natural edge of the chain of predictability.

But you are making the claim that there is an objective distinction. It seems to me more like a subjective one: I can look at an algorithm from a number of perspectives; some of them will be more like OO (seeing it as “just an algorithm”), while others will regard the algorithm as an agent (unable to calculate exactly what the algorithm will do, they’re forced to take the intentional stance).

Yes, it’s more like Dennett’s intentional stance vs physical (or, in this case, algorithmic, since the universe’s physics is fully encoded in the algorithms). Definitely there are perspectives where one has to settle for the intentional stance (like the human game players do when dealing with high-level NPCs, because they are unable to calculate the NPC’s actions precisely). Whether this hypothetical game situation is isomorphic to the universe we live in is an open problem, and I do not make definite claims that it is.

I want to distinguish “thinking about choices” from “awareness of thinking about choices” (which seems approximately like “thinking about thinking about choices”, though there’s probably more to it).

It’s a good distinction, definitely. “Thinking about choices” is executing the decision making algorithm, including generating the algorithm itself. I was referring to thinking about the origin of both of those. It may or may not be what you are referring to.

This seems to be where we disagree. It is not like there is a separate bit of clockwork deterministically ticking away and eventually spitting out an answer, with “us” standing off to the side and eventually learning what decision was made. We are the computation which outputs the decision. Our hand is not forced.

Yes, that’s where we differ, in the very last sentence. There is no separate bit of an algorithm, we (or, in this case, the agents in the setup) are the algorithm. Yes, we are the computation which outputs the decision. And that’s precisely why our hand is forced! There is no other output possible even if it feels like it is.

So it does not seem right to me to say that the making-of-decisions is only an illusion. If we did not think through the decisions, they would in fact not be made the same.

Looks like this is the crux of the disagreement. The agents have no option not to think through the decisions. Once the universe is set in motion, the agents will execute their algorithms, including thinking through the decisions, generating the relevant abstractions, including the decision theory, then executing the decision to pay or not pay the counterfactual mugger. “If we did not think through the decisions” is not an option in this universe, except potentially as a (useless) subroutine in the agent’s algorithm. The agent will do what it is destined to do, and, while the making-of-decisions is not an illusion, since the decision is eventually made, the potential to make a different decision is definitely an illusion, just like the potential to not think through the decisions.

So, a (more) self-consistent approach to “thinking about thinking” is “Let’s see what decision theory, if any, my algorithm will generate, and how it will apply it to the problem at hand.” I am not sure whether there is any value in this extra layer, or if there is anything that can be charitably called “value” in this setup from an outside perspective. Certainly the OO does not need the abstraction we call “value” to predict anything, they can just emulate (or analyze) the agent’s algorithm.

So, my original point stands: “Do you give Omega $100?” is not a meaningful question as stated, since it assumes you have a choice in the matter. You can phrase the question differently, and more profitably, as “Do you think that you are the sort of agent who gives Omega $100?” or “Which agents gain more expected value in this setup?” There is no freedom to “self-modify” to be an agent that pays or doesn’t pay. You are one of the two, you just don’t yet know which. The best you can do is try to discover it ahead of time.

• Sounds testable in theory, but not in practice

• The test is the fact that OOs exist in that universe.

• Just ask which algorithm wins then. At least in these kinds of situations UDT does better. The only downside is the algorithm has to check if it’s in this kind of situation; it might not be worth practicing.

• If you are in this situation you have the practical reality that paying the $100 loses you $100 and a theoretical argument that you should pay anyway. If you apply “just ask which algorithm wins” and you mean the practical reality of the situation described, then you wouldn’t choose UDT. If you instead take “just ask which algorithm wins” to mean setting up an empirical experiment, then you’d have to decide whether to consider all agents who encounter the coin flip, or only those who see a tails, at which point there is no need to run the experiment. If you instead are proposing figuring out which algorithm wins according to theory, then that’s a bit of a tautology as that’s what I’m already trying to do.
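To make the choice of reference class concrete, here is a minimal sketch. It assumes a perfectly accurate Omega and the $100 / $10000 stakes from the problem statement; the function names are illustrative and not from the thread. It averages each policy's payoff first over both coin outcomes, then over the tails branch only.

```python
def payoff(pays: bool, coin: str) -> int:
    """Payoff of a fixed policy (pay or refuse) in the standard counterfactual mugging.

    Assumption: Omega predicts the policy perfectly.
    """
    if coin == "heads":
        # Omega rewards only agents who would have paid on tails.
        return 10000 if pays else 0
    # On tails the agent is asked for $100.
    return -100 if pays else 0


def average_payoff(pays: bool, coins: list[str]) -> float:
    """Average payoff over an equally weighted set of coin outcomes."""
    return sum(payoff(pays, c) for c in coins) / len(coins)


all_agents = ["heads", "tails"]  # everyone who encounters the coin flip
tails_only = ["tails"]           # only those who are actually asked to pay

for label, coins in (("all agents", all_agents), ("tails only", tails_only)):
    print(label, "payer:", average_payoff(True, coins),
          "refuser:", average_payoff(False, coins))
```

Which policy “wins” is fixed entirely by which list is passed in, so the experiment cannot settle the question on its own.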

• Is it forbidden to ask about Quantum Mechanics and Decision Theory? I got banned with the other account and I don’t understand why. It was a serious question.

• Hey, moderator here. The reason for banning your previous account was mostly just that we get a lot of quantum-theory crackpots, and your post had a lot of markings of someone in that reference class. The posts you wrote on this account seem a bit better, though the use of multiple punctuation marks in a row, and a somewhat unclear structure still make me hesitant. I will approve one of your two posts for now, and we will see how it goes.

Sorry for putting you under this additional scrutiny, but we get enough people who are really confused about various aspects of quantum mechanics and want to tell anyone about their opinions that we need to have somewhat high barriers for entry in that domain.

• Sure, it is fine!! I imagine that is a big problem. As a physicist, although not someone in quantum mechanics, I tried to be precise.

Anyway, your answer has been good. It seems the paper has been debunked.

• why can’t we just imagine that you are an agent that doesn’t care about counterfactual selves?

Caring about counterfactual selves is part of UDT, though. If you simply assume that it doesn’t hold, and ask proponents of UDT to argue under that assumption, I’m not sure there’s a good answer.

• Interesting. Do you take caring about counterfactual selves as foundational, in the sense that there is no why, you either do or do not?

• No, not like that. I think there is an argument for caring about counterfactual selves. But it cannot be carried out from the assumption that the agent doesn’t care about counterfactual selves. You’re just asking me to do something impossible.

• I guess my argument is based on imagining at the start that agents either can care about counterfactual selves or not. But agents that don’t are a bit controversial, so let’s imagine such an agent and see if we run into any issues. So imagine a consistent agent that doesn’t care about counterfactual selves except insofar as they “could be it” from its current epistemic position. I can’t see any issues with this—it seems consistent. And my challenge is for you to answer why this isn’t a valid set of values to have.

• Let’s imagine a kind of symmetric counterfactual mugging. In case of heads, Omega says: “The coin came up heads, now you can either give me $100 or refuse. After that, I’ll give you $10000 if you would’ve given me $100 in case of tails”. In case of tails, Omega says the same thing, but with heads and tails reversed. In this situation, an agent who doesn’t care about counterfactual selves always gets 0 regardless of the coin, while an agent who does care always gets $9900 regardless of the coin.

I can’t think of any situation where the opposite happens (the non-caring agent gets more with certainty). To me that suggests the caring agent is more rational.
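The payoffs in this symmetric version can be checked with a short sketch. It assumes Omega predicts the agent's policy perfectly; the function name and the representation of a policy as two booleans are illustrative choices, not part of the original comment.

```python
def symmetric_mugging_payoff(pays_on_heads: bool, pays_on_tails: bool, coin: str) -> int:
    """Payoff in the symmetric counterfactual mugging for a given policy.

    The agent is asked for $100 in the branch it actually sees, and is given
    $10000 if it would have paid $100 in the other branch.
    """
    pays_here = pays_on_heads if coin == "heads" else pays_on_tails
    pays_in_other_branch = pays_on_tails if coin == "heads" else pays_on_heads
    cost = 100 if pays_here else 0
    reward = 10000 if pays_in_other_branch else 0
    return reward - cost


for coin in ("heads", "tails"):
    caring = symmetric_mugging_payoff(True, True, coin)        # always pays
    non_caring = symmetric_mugging_payoff(False, False, coin)  # never pays
    print(coin, "caring:", caring, "non-caring:", non_caring)
```

Under these assumptions the always-paying policy comes out $9900 ahead of the never-paying policy on both coin outcomes.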

• Yeah, I actually stumbled upon this argument myself this morning. Has anyone written this up beyond this comment? It seems like the most persuasive argument for paying. This suggests that never caring is not a viable position.

I was thinking today about whether there are any intermediate positions, but I don’t think they are viable. Only caring about counterfactuals when you have a prisoner’s dilemma-like situation seems an unprincipled fudge.

• Yeah. I don’t remember seeing this argument before, it just came to my mind today.

• Do you think you’ll write a post on it? Because I was thinking of writing a post, but if you were planning on doing this then that would be even better as it would probably get more attention.

• No, wasn’t planning. Go ahead and write the post, and maybe link to my comment as independent discovery.

• Of course

• In this situation, an agent who doesn’t care about counterfactual selves always gets 0 regardless of the coin

Since the agent is very correlated with its counterfactual copy, it seems that superrationality (or even just EDT) would make the agent pay $100, and get the $10000.

• Actually, the counterfactual agent makes a different observation (heads instead of tails) so their actions aren’t necessarily linked

• I just thought of another argument. Imagine that before being faced with counterfactual mugging, the agent can make a side bet on Omega’s coin. Let’s say the agent who doesn’t care about counterfactual selves chooses to bet X dollars on heads, so the income is X in case of heads and -X in case of tails. Then the agent who cares about counterfactual selves can bet X-5050 on heads (or if that’s negative, bet 5050-X on tails). Since this agent agrees to pay Omega, the income will be 10000+X-5050=4950+X in case of heads, and 5050-X-100=4950-X in case of tails. So in both cases the caring agent gets 4950 dollars more than the non-caring agent. And the opposite is impossible: no matter how the two agents bet, the caring agent always gets more in at least one of the cases.
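The arithmetic of the side bet can be verified with a minimal sketch. It assumes an even-odds bet (a stake of b on heads pays +b on heads and -b on tails), a perfectly accurate Omega, and a non-caring agent who refuses to pay; the function names are illustrative.

```python
REWARD, COST = 10000, 100


def non_caring_income(x: float, coin: str) -> float:
    """Bets x on heads and refuses to pay Omega, so the mugging contributes 0."""
    return x if coin == "heads" else -x


def caring_income(x: float, coin: str) -> float:
    """Bets x - 5050 on heads (a negative stake means betting on tails) and pays Omega."""
    bet_on_heads = x - 5050
    if coin == "heads":
        return REWARD + bet_on_heads  # 10000 + x - 5050 = 4950 + x
    return -bet_on_heads - COST       # 5050 - x - 100 = 4950 - x


for x in (0, 1000, 5050, 20000):
    for coin in ("heads", "tails"):
        gap = caring_income(x, coin) - non_caring_income(x, coin)
        print(x, coin, gap)
```

For every value of X tried here, the gap is 4950 on both coin outcomes, matching the comment's calculation.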

• “Imagine that before being faced with counterfactual mugging, the agent can make a side bet on Omega’s coin”—I don’t know if that works. Part of counterfactual mugging is that you aren’t told before the problem that you might be mugged, otherwise you could just pre-commit.

• If Omega didn’t know the outcome of the flip in advance (and is telling the truth), then you should pay if 1/2*U(x+$10,000) + 1/2*U(x-$100) > U(x). You could also tell Omega that the bet is riskier than you would have agreed to, but you would have been fine with winning $1,000 if you won, and paying $10 if you lost. (This doesn’t work with anyone other than Omega though. Omega can predict what you’d agree to, and give you $1,000 if you win, and ask for $10 if you lose. This would also have to be consistent with you paying the $10 though.)
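As a rough illustration of how risk aversion enters this inequality, here is a sketch that evaluates it for one particular utility function. The choice of logarithmic utility is purely an assumption for the example (the comment leaves U unspecified), and x is taken to be current wealth in dollars.

```python
import math


def should_pay(wealth: float, reward: float = 10000, cost: float = 100) -> bool:
    """Check 1/2*U(x + reward) + 1/2*U(x - cost) > U(x) for U = natural log.

    Assumption: log utility, which is only defined for positive wealth,
    so an agent who cannot cover the cost is treated as not paying.
    """
    if wealth <= cost:
        return False
    expected = 0.5 * math.log(wealth + reward) + 0.5 * math.log(wealth - cost)
    return expected > math.log(wealth)


for wealth in (101, 102, 1000, 100000):
    print(wealth, should_pay(wealth))
```

With log utility the condition reduces to (x + 10000)(x - 100) > x², so only an agent whose wealth barely exceeds the $100 cost declines; a more risk-averse U would move that threshold.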

• Good point about risk also being a factor, but the point in question isn’t how to perform an expected utility calculation, but the justification of it

So, would you pay if the agreement was made, not cleanly ‘in the past’, but time travel was involved?

• No. I don’t know the accuracy of the prediction. It’s just that I already know the result of the coin flip.

• Is there a way to prove that the coin toss was fair? In a broader (math/​physics) sense, is it possible to prove that a historical event with a known outcome ‘was’ the result of a random process using only the observation of the outcome?

• What do you mean by random?

• In the event of the ‘many worlds’ theory being true, there should exist a world where the coin flipped in the other direction, and ‘parallel me’ has been gifted $10,000.

If parallel me were to call me on my many worlds quantum iphone (this is my hypothetical, I get to have one), and confirm that he is calling from the universe where the coin went the other way, and he did in fact get paid, presumably contingent on me paying the person in front of me, I would probably pay.

Now, if I dial my many worlds quantum phone, and get an operator error, that means no parallel universe where parallel me won exists, and the ‘coin flip’ either did not happen, or was actually a predetermined event designed to win my mugger $100, in which case, I should not pay, and should probably clobber him on general principle.

Without the use of a hypothetical ‘many worlds quantum iphone’, is there a way to observe a coin lying on the ground displaying a ‘heads’ and prove that the coin was flipped (and therefore had the opportunity to be tails) vs was intentionally placed with the heads facing up?

• Replace the coin with a suitable quantum random number generator.