# Caspar42 (Caspar Oesterheld)

Karma: 278
• Sorry for taking some time to reply!

>You might wonder why am I spouting a bunch of wrong things in an unsuccessful attempt to attack your paper.

Nah, I’m a frequent spouter of wrong things myself, so I’m not too surprised when other people make errors, especially when the stakes are low, etc.

Re 1, 2: I guess a lot of this comes down to convention. People have found that one can productively discuss these things without always giving the formal models (in part because people in the field know how to translate everything into formal models). That said, if you want mathematical models of CDT and Newcomb-like decision problems, you can check the Savage or Jeffrey–Bolker formalizations. See, for example, the first few chapters of Arif Ahmed’s book, “Evidence, Decision and Causality”. Similarly, people in decision theory (and game theory) usually don’t specify what is common knowledge, because usually it is assumed (implicitly) that the entire problem description is common knowledge / known to the agent (Buyer). (Since this is decision and not game theory, it’s not quite clear what “common knowledge” means. But presumably, to achieve 75% accuracy on the prediction, the seller needs to know that the buyer understands the problem...)

3: Yeah, *there exist* agent models under which everything becomes inconsistent, though IMO this just shows these agent models to be unimplementable. For example, take the problem description from my previous reply (where Seller just runs an exact copy of Buyer’s source code). Now assume that Buyer knows his source code and is logically omniscient. Then Buyer knows what his source code chooses and therefore knows the option that Seller is 75% likely to predict. So he will take the other option. But of course, this is a contradiction. As you’ll know, this is a pretty typical logical paradox of self-reference. But to me it just says that this logical omniscience assumption about the buyer is implausible and that we should consider agents who aren’t logically omniscient. Fortunately, CDT doesn’t assume knowledge of its own source code and such.

Perhaps one thing to help sell the plausibility of this working: For the purpose of the paper, the assumption that Buyer uses CDT in this scenario is pretty weak, formally simple, and doesn’t have much to do with logic. It just says that Buyer assigns some probability distribution over box states (i.e., some distribution over the mutually exclusive and collectively exhaustive s1 = “money only in box 1”, s2 = “money only in box 2”, s3 = “money in both boxes”), and that given such a distribution, Buyer takes an action that maximizes (causal) expected utility. So you could forget agents for a second and just prove the formal claim that for every probability distribution over the three states s1, s2, s3, it is the case for i=1 or i=2 (or both) that

(P(si) + P(s3)) · $3 − $1 > 0.

I assume you don’t find this strange/risky in terms of contradictions, but mathematically speaking, nothing more is really going on in the basic scenario.
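To see this concretely, here is a quick numerical sketch (my own code, not from the paper) that checks the inequality over randomly sampled distributions on the three box states:

```python
import random

# Check the claim: for any distribution (P(s1), P(s2), P(s3)) over the
# three box states, buying box 1 or box 2 (or both) has positive causal
# expected utility.

def best_buying_eu(p1, p2, p3):
    """Larger of the two CDT expected utilities of buying a $1 box
    that contains $3 with the stated unconditional probability."""
    eu1 = (p1 + p3) * 3 - 1  # box 1 holds $3 in states s1 and s3
    eu2 = (p2 + p3) * 3 - 1  # box 2 holds $3 in states s2 and s3
    return max(eu1, eu2)

random.seed(0)
for _ in range(100_000):
    # Sample (P(s1), P(s2), P(s3)) uniformly from the probability simplex.
    a, b = sorted(random.random() for _ in range(2))
    assert best_buying_eu(a, b - a, 1 - b) > 0

# Worst case: P(s3) = 0 and P(s1) = P(s2) = 0.5; the better box still
# yields 0.5 * 3 - 1 dollars in expectation.
print(best_buying_eu(0.5, 0.5, 0.0))  # 0.5
```

Since (P(s1) + P(s3)) + (P(s2) + P(s3)) = 1 + P(s3) ≥ 1, the larger of the two sums is at least 0.5, which is why the worst case is exactly $0.50.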

The idea is that everyone agrees (hopefully) that orthodox CDT satisfies the assumption. (I.e., assigns some unconditional distribution, etc.) Of course, many CDTers would claim that CDT satisfies some *additional* assumptions, such as the probabilities being calibrated or “correct” in some other sense. But of course, if “A ⇒ B”, then “A and C ⇒ B”. So adding assumptions cannot help the CDTer avoid the loss-of-money conclusion if they also accept the more basic assumptions. Of course, *some* added assumptions lead to contradictions. But that just means that they cannot be satisfied in the circumstances of this scenario if the more basic assumption is satisfied and if the premises of the Adversarial Offer hold. So they would have to either adopt some non-orthodox CDT that doesn’t satisfy the basic assumption or require that their agents cannot be copied/predicted. (Both of which I also discuss in the paper.)

>you assume that Buyer knows the probabilities that Seller assigned to Buyer’s actions.

No, if this were the case, then I think you would indeed get contradictions, as you outline. So Buyer does *not* know what Seller’s prediction is. (He only knows her prediction is 75% accurate.) If Buyer uses CDT, then of course he assigns some (unconditional) probabilities to what the predictions are, but of course the problem description implies that these predictions aren’t particularly good. (Like: if he assigns 90% to the money in box 1, then it immediately follows that *no* money is in box 1.)

• As I mentioned elsewhere, I don’t really understand...

>I think (1) is a poor formalization, because the game tree becomes unreasonably huge

What game tree? Why represent these decision problems as any kind of trees, or game trees in particular? At least some problems of this type can be represented efficiently, using various methods to represent functions on the unit simplex (including decision trees)… Also: Is this decision-theoretically relevant? That is, are you saying a good decision theory doesn’t have to deal with (1) because it is cumbersome to write out (some) problems of this type? But *why* is this decision-theoretically relevant?

>some strategies of the predictor (like “fill the box unless the probability of two-boxing is exactly 1”) leave no optimal strategy for the player.

Well, there are less radical ways of addressing this. E.g., expected utility-type theories just assign a preference order to the set of available actions. We could be content with that and accept that in some cases, there is no optimal action. As long as our decision theory ranks the available options in the right order… Or we could restrict attention to problems where an optimal strategy exists despite this dependence.

>And (3) seems like a poor formalization because it makes the predictor work too hard. Now it must predict all possible sources of randomness you might use, not just your internal decision-making.

For this reason, I always assume that predictors in my Newcomb-like problems are compensated appropriately and don’t work on weekends! Seriously, though: what does “too hard” mean here? Is this just the point that it is in practice easy to construct agents that cannot be realistically predicted in this way when they don’t want to be predicted? If so: I find that at least somewhat convincing, though I’d still be interested in developing theory that doesn’t hinge on this ability.

• On the more philosophical points. My position is perhaps similar to Daniel K’s. But anyway...

Of course, I agree that problems that punish the agent for using a particular theory (or using float multiplication, or feeling a little wistful, or stuff like that) are “unfair”/“don’t lead to interesting theory”. (Perhaps more precisely, I don’t think our theory needs to give algorithms that perform optimally in such problems in the way I want my theory to “perform optimally” in Newcomb’s problem. Maybe we should still expect our theory to say something about them, in the way that causal decision theorists feel like CDT has interesting/important/correct things to say about Newcomb’s problem, despite Newcomb’s problem being designed to (unfairly, as they allege) reward non-CDT agents.)

But I don’t think these are particularly similar to problems with predictions of the agent’s distribution over actions. The distribution over actions is behavioral, whereas performing floating point operations or whatever is not. When randomization is allowed, the subject of your choice is which distribution over actions you play. So to me, which distribution over actions you choose in a problem where randomization is allowed is just like the question of which action you take when randomization is not allowed. (Of course, if you randomize to determine which action’s expected utility to calculate first, but this doesn’t affect what you do in the end, then I’m fine with not allowing this to affect your utility, because it isn’t behavioral.)

I also don’t think this leads to uninteresting decision theory. But I don’t know how to argue for this here, other than by saying that CDT, EDT, UDT, etc. don’t really care whether they choose from/rank a set of distributions or a set of three discrete actions. I think ratificationism-type concepts are the only ones that break when allowing discontinuous dependence on the chosen distribution, and I don’t find these very plausible anyway.

To be honest, I don’t understand the arguments against predicting distributions and predicting actions that you give in that post. I’ll write a comment on this to that post.

>Can your argument be extended to this case?

No, I don’t think so. Consider the following class of problems: The agent can pick any distribution over actions. The final payoff is determined only as a function of the implemented action and some finite number of samples generated by Omega from that distribution. Note that the expectation is continuous in the distribution chosen. It can therefore be shown (using, e.g., Kakutani’s fixed-point theorem) that there is always at least one ratifiable distribution. See Theorem 3 at https://users.cs.duke.edu/~ocaspar/NDPRL.pdf .

(Note that the above is assuming the agent maximizes expected vNM utility. If, e.g., the agent maximizes some lexical utility function, then the predictor can just take, say, two samples and if they differ use a punishment that is of a higher lexicality than the other rewards in the problem.)
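Here is a toy instance (my own, not from the paper or the linked note) of the sample-based setup above: the agent picks a distribution over two actions, Omega draws one sample from it and pays $1 if the implemented action differs from the sample. The payoff is continuous in the chosen distribution, so a ratifiable distribution must exist; a grid search locates it.

```python
# p = probability assigned to action 1; Omega samples from the declared p.

def eu_of_action(a, p):
    """Expected payoff of implementing action a while Omega samples from
    the declared distribution: pay 1 iff the sample differs from a."""
    return (1 - p) if a == 1 else p

def is_ratifiable(p):
    eus = [eu_of_action(0, p), eu_of_action(1, p)]
    best = max(eus)
    # Every action played with positive probability must be optimal given p.
    return all(eus[a] == best for a in (0, 1) if (p if a == 1 else 1 - p) > 0)

ratifiable = [p for p in (i / 100 for i in range(101)) if is_ratifiable(p)]
print(ratifiable)  # [0.5] -- mixing 50-50 is the unique ratifiable choice
```

Any pure choice is punished (the sample then matches it for sure), so only the symmetric mixture survives, illustrating why continuity of the payoff in the distribution is what drives the fixed-point result.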

• Note that while people on this forum mostly reject orthodox, two-boxing CDT, many academic philosophers favor CDT. I doubt that they would view this problem as out of CDT’s scope, since it’s pretty similar to Newcomb’s problem.

>How does this CDT agent reconcile a belief that the seller’s prediction likelihood is different from the buyer’s success likelihood?

Good question!

• I agree with both of Daniel Kokotajlo’s points (both of which we also make in the paper in Sections IV.1 and IV.2): Certainly for humans it’s normal to not be able to randomize; and even if it was a primarily hypothetical situation without any obvious practical application, I’d still be interested in knowing how to deal with the absence of the ability to randomize.

Besides, as noted in my other comment, insisting on the ability to randomize doesn’t get you that far (cf. Sections IV.1 and IV.4 on Ratificationism): even if you always have access to some nuclear decay noise channel, your choice of whether to consult that channel (or of whether to factor the noise into your decision) is still deterministic. So you can set up scenarios where you are punished for randomizing. In the particular case of the Adversarial Offer, the seller might remove all money from both boxes if she predicts the buyer to randomize.

The reason why our main scenario just assumes that randomization isn’t possible is that our target of attack in this paper is primarily CDT, which is fine with not being allowed to randomize.

• I think some people may have their pet theories which they call CDT and which require randomization. But CDT as it is usually/traditionally described doesn’t ever insist on randomizing (unless randomizing has a positive causal effect). In this particular case, even if a randomization device were made available, CDT would either uniquely favor one of the boxes or be indifferent between all distributions over the two boxes. Compare Section IV.1 of the paper.

What you’re referring to are probably so-called ratificationist variants of CDT. These would indeed require randomizing 50-50 between the two boxes. But one can easily construct scenarios which trip these theories up. For example, the seller could put no money in any box if she predicts that the buyer will randomize. Then no distribution is ratifiable. See Section IV.4 for a discussion of Ratificationism.
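To illustrate the no-ratifiable-distribution point, here is my own toy formalisation (the exact seller policy is an assumption, not taken from the paper): the buyer picks a distribution over {buy nothing, buy box 1, buy box 2}; the seller predicts that distribution exactly, empties both boxes if it randomises, and otherwise fills every box the buyer is predicted not to buy. A grid search over the simplex finds nothing ratifiable.

```python
from itertools import product

def box_contents(q):
    """Seller's response to the predicted distribution q = (q_none, q1, q2).
    Boxes cost $1 and contain $3 when filled."""
    q_none, q1, q2 = q
    if sum(x > 0 for x in q) > 1:  # predicted to randomise: punish
        return (0, 0)
    if q1 == 1:                    # predicted to buy box 1
        return (0, 3)
    if q2 == 1:                    # predicted to buy box 2
        return (3, 0)
    return (3, 3)                  # predicted to buy nothing

def is_ratifiable(q):
    c1, c2 = box_contents(q)
    eus = (0, c1 - 1, c2 - 1)  # EU of {none, buy box 1, buy box 2} given q
    best = max(eus)
    # Ratifiable: every option in q's support must be optimal given q.
    return all(eus[i] == best for i in range(3) if q[i] > 0)

# Grid search over the simplex in steps of 0.05.
grid = [i / 20 for i in range(21)]
qs = [(a, b, round(1 - a - b, 10)) for a, b in product(grid, grid) if a + b <= 1]
print(any(is_ratifiable(q) for q in qs))  # False: no distribution is ratifiable
```

Whatever the buyer declares, some deviation looks strictly better once the seller has responded to the declaration, so ratifiability fails everywhere.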

• Yeah, basically standard game theory doesn’t really have anything to say about the scenarios of the paper, because they don’t fit the usual game-theoretical models.

By the way, the paper has some discussion of what happens if you insist on having access to an unpredictable randomization device, see Sections IV.1 and the discussion of Ratificationism in Section IV.4. (The latter may be of particular interest because Ratificationism is somewhat similar to Nash equilibrium. Unfortunately, the section doesn’t explain Ratificationism in detail.)

• >I think information “seller’s prediction is accurate with probability 0,75” is supposed to be common knowledge.

Yes, correct!

>Is it even possible for a non-trivial probabilistic prediction to be a common knowledge? Like, not as in some real-life situation, but as in this condition not being logical contradiction? I am not a specialist on this subject, but it looks like a logical contradiction. And you can prove absolutely anything if your premise contains contradiction.

Why would it be a logical contradiction? Do you think Newcomb’s problem also requires a logical contradiction? Note that in neither of these cases does the predictor tell the agent the result of a prediction about the agent.

>What kinds of mistakes does seller make?

For the purpose of the paper it doesn’t really matter what beliefs anyone has about how the errors are distributed. But you could imagine that the buyer is some piece of computer code and that the seller has an identical copy of that code. To make a prediction, the seller runs the code. Then she flips a coin twice. If the coin does not come up Tails twice, she just uses that prediction and fills the boxes accordingly. If the coin does come up Tails twice, she uses a third coin flip to determine whether to (falsely) predict one of the two other options that the agent can choose from. And then you get the 0.75, 0.125, 0.125 distribution you describe. And you could assume that this is common knowledge.

Of course, for the exact CDT expected utilities, it does matter how the errors are distributed. If the errors are primarily “None” predictions, then the boxes should be expected to contain more money and the CDT expected utilities of buying will be higher. But for the exploitation scheme, it’s enough to show that the CDT expected utilities of buying are strictly positive.
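A quick Monte Carlo sketch of this error model (my own code; fixing the buyer's choice to box 1 and the option names are illustrative):

```python
import random

# The seller's copy of the buyer predicts correctly; she flips two coins,
# uses that prediction unless both come up tails (prob. 0.25), and in that
# case a third flip picks one of the two other options. She then fills
# every box she predicts the buyer will NOT take (both if she predicts
# "none").
random.seed(0)
OPTIONS = ("box1", "box2", "none")
buyer_choice = "box1"
correct = 0
money_in_chosen_box = 0
N = 200_000

for _ in range(N):
    if random.random() < 0.75:  # not two tails: use the copy's choice
        prediction = buyer_choice
    else:                       # two tails: mispredict uniformly
        prediction = random.choice([o for o in OPTIONS if o != buyer_choice])
    filled = {o for o in ("box1", "box2") if o != prediction}
    correct += prediction == buyer_choice
    money_in_chosen_box += buyer_choice in filled

print(round(correct / N, 2))              # ~0.75: prediction accuracy
print(round(money_in_chosen_box / N, 2))  # ~0.25: P(money in Bi | buyer chooses Bi)
```

The chosen box is filled exactly when the seller mispredicts (either as the other box or as “None”), which happens with probability 0.125 + 0.125 = 0.25.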

>When you write “$1 − P(money in Bi | buyer chooses Bi) · $3 = $1 − 0.25 · $3 = $0.25.”, you assume that P(money in Bi | buyer chooses Bi) = 0.75.

I assume you mean that I assume P(money in Bi | buyer chooses Bi) = 0.25? Yes, I assume this, although really I assume that the seller’s prediction is accurate with probability 0.75 and that she fills the boxes according to the specified procedure. From this, it then follows that P(money in Bi | buyer chooses Bi) = 0.25.

>That is, if buyer chooses the first box, seller can’t possibly think that buyer will choose none of the boxes.

I don’t assume this / I don’t see how this would follow from anything I assume. Remember that if the seller predicts the buyer to choose no box, both boxes will be filled. So even if all false predictions were “None” predictions (when the buyer buys a box), it would still be the case that P(money in Bi | buyer chooses Bi) = 0.25.

• >Then the CDTheorist reasons:
>(1 − 0.75) = 0.25
>0.25 · 3 = 0.75
>0.75 − 1 = −0.25
>‘Therefore I should not buy a box—I expect to lose (expected) money by doing so.’

Well, that’s not how CDT as it is typically specified reasons about this decision. The expected value 0.25 · 3 = 0.75 is the EDT expected amount of money in box Bi for both i = 1 and i = 2. That is, it is the expected content of box Bi, conditional on taking Bi. But when CDT assigns an expected utility to taking box Bi, it doesn’t condition on taking Bi. Instead, because it cannot causally affect how much money is in box Bi, it uses its unconditional estimate of how much is in box Bi. As I outlined in the post, this must be at least $1.50 for at least one of the boxes.

• >If I win I get $6. If I lose, I get $5.

I assume you meant to write: “If I lose, I lose $5.”

Yes, these are basically equivalent. (I even mention rock-paper-scissors bots in a footnote.)

# Extracting Money from Causal Decision Theorists

28 Jan 2021 17:58 UTC
20 points
(doi.org)
• Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization is not associated with any cost), do randomize (and choose the probability according to some additional theory—for example, you could have the decision procedure: “follow CDT, but when indifferent between multiple actions, choose a distribution over these actions that is ratifiable”.). The updated version of our paper—which has now been published Open Access in The Philosophical Quarterly—actually contains some extra discussion of this in Section IV.1, starting with the paragraph “Nonetheless, what happens if we grant the buyer in Adversarial Offer access to a randomisation device...”.

• Sorry for taking an eternity to reply (again).

On the first point: Good point! I’ve now finally fixed the SSA probabilities so that they sum to 1, which they really should if this is to be a version of EDT.

>prevents coordination between agents making different observations.

Yeah, coordination between different observations is definitely not optimal in this case. But I don’t see an EDT way of doing it well. After all, there are cases where given one observation, you prefer one policy and given another observation you favor another policy. So I think you need the ex ante perspective to get consistent preferences over entire policies.

>(Oh, I ignored the splitting up of probabilities of trajectories into SSA probabilities and then adding them back up again, which may have some intuitive appeal but ends up being just a null operation. Does anyone see a significance to that part?)

The only significance is to get a version of EDT, which we would traditionally assume to have self-locating beliefs. From a purely mathematical point of view, I think it’s nonsense.

• Not super important but maybe worth mentioning in the context of generalizing Pavlov: the strategy Pavlov for the iterated PD can be seen as an extremely shortsighted version of the law of effect, which basically says: repeat actions that have worked well in the past (in similar situations). Of course, the LoE can be applied in a wide range of settings. For example, in their reinforcement learning textbook, Sutton and Barto write that LoE underlies all of (model-free) RL.

• > I tried to understand Caspar’s EDT+SSA but was unable to figure it out. Can someone show how to apply it to an example like the AMD to help illustrate it?

Sorry about that! I’ll try to explain it some more. Let’s take the original AMD. Here, the agent only faces a single type of choice—whether to EXIT or CONTINUE. Hence, in place of a policy we can just condition on the CONTINUE-probability p when computing our SSA probabilities. Now, when using EDT+SSA, we assign probabilities to being a specific instance in a specific possible history of the world. For example, we assign probabilities of the form P(⟨CONTINUE, EXIT⟩, X1 | p), which denotes the probability that, given I choose to CONTINUE with probability p, the history ⟨CONTINUE, EXIT⟩ is actual and that I am the instance at X1 (i.e., the first intersection). Since we’re using SSA, these probabilities are computed as follows:

P(⟨CONTINUE, EXIT⟩, X1 | p) = P(⟨CONTINUE, EXIT⟩ | p) · 1/2 = p(1 − p) · 1/2.

That is, we first compute the probability that the history itself is actual (given p). Then we multiply it by the probability that within that history I am the instance at X1, which is just 1 divided by the number of instances of myself in that history, i.e. 2.

Now, the expected value according to EDT+SSA given p can be computed by just summing over all possible situations, i.e. over all combinations of a history and a position within that history, and multiplying the probability of that situation with the utility given that situation:

EU(p) = Σ_h Σ_{i in h} P(h | p) · (1/n_h) · U(h) = Σ_h P(h | p) · U(h),

where n_h is the number of instances of the agent in history h.

And that’s exactly the ex ante expected value (or UDT-expected value, I suppose) of continuing with probability p. Hence, EDT+SSA’s recommendation in the AMD is the ex ante optimal policy (or UDT’s recommendation, I suppose). This realization is not original to myself (though I came up with it independently in collaboration with Johannes Treutlein) -- the following papers make the same point:

My comment generalizes these results a bit to include cases in which the agent faces multiple different decisions.
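To make this concrete, here is a small numerical sketch of the claim (my own code; the payoffs 0/4/1 are the standard textbook absent-minded driver numbers, which the comment above doesn’t specify): splitting each history’s probability evenly over its instances and summing utility over (history, instance) pairs reproduces the ex ante value.

```python
def histories(p):
    # (probability of history, utility, number of agent-instances in it),
    # assuming the standard AMD payoffs: exit at the first intersection -> 0,
    # exit at the second -> 4, continue twice -> 1. (These are assumptions.)
    return [
        (1 - p,       0, 1),  # EXIT at the first intersection
        (p * (1 - p), 4, 2),  # CONTINUE, then EXIT
        (p * p,       1, 2),  # CONTINUE, CONTINUE
    ]

def edt_ssa_value(p):
    # Sum over situations (history h, instance i in h): SSA probability
    # P(h | p) * 1/n_h, each weighted by the history's utility U(h).
    total = 0.0
    for prob, u, n in histories(p):
        for _instance in range(n):
            total += prob * (1 / n) * u
    return total

def ex_ante_value(p):
    return sum(prob * u for prob, u, _ in histories(p))

# The two values agree for every CONTINUE-probability p.
for p in (0.0, 0.25, 0.5, 2 / 3, 1.0):
    assert abs(edt_ssa_value(p) - ex_ante_value(p)) < 1e-9

best_p = max((i / 1000 for i in range(1001)), key=ex_ante_value)
print(best_p)  # close to the well-known optimum p = 2/3
```

The inner loop makes the “null operation” visible: each history’s probability is divided by n_h and then added back n_h times, so the EDT+SSA value collapses to the ex ante expectation.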

• Caspar Oesterheld is working on similar ideas.

For anyone who’s interested, Abram here refers to my work with Vincent Conitzer which we write about here.

ETA: This work has now been published in The Philosophical Quarterly.