# Diffractor

What I mean is, the players hit each other up and are like, “yo, let’s decide on which ROSE point in the object-level game we’re heading towards.”

Of course, they don’t manage to settle on what equilibrium to go for in the resulting bargaining game, because, again, multiple ROSE points might show up in the bargaining game. *But*, the ROSE points in the bargaining game are *in a restricted enough zone* (what with that whole “must be better than the random-dictator point” thing) to seriously constrain the possibilities in the object-level game. “Worst ROSE point for Alice in the object-level game” is a whole lot worse for her than “Worst ROSE point for Alice in the bargaining game about what to do in the object-level game”.

So, the players should be able to ratchet up their disagreement point and go “even if the next round of bargaining fails, at least we can promise that everyone does this well, right? Sure, everyone’s going for their own idea of fairness, but even if Alice ends up with her worst ROSE point in the bargaining game, her utility is going to be at least *this* high, and similar for everyone else.”

And so, each step of bargaining that successfully happens ratchets up the disagreement point closer to the Pareto frontier, in a way that should quickly converge. If someone disagrees on step 3, then the step-3 disagreement point gets played, which isn’t *that* short of the Pareto frontier. And if someone doesn’t have time for all this bargaining, they can just break things off at step 10 or something; that’s just about as good as going all the way to infinity.

Or at least, it *should* work like this. I haven’t proved that it *does*, and it depends on things like “what does ROSE bargaining look like for n players” and “does the random-dictator-point-dominance thing still hold in the n-player case” and “what’s the analogue of that strategy where you block your foe from getting more than X utility, when there are multiple foes?”. But this disagreement-point ratcheting *is* a strategy that addresses your worries, via “ever-smaller pieces of the problem live on higher meta-levels, so the process of adding layers of meta actually converges to solving the problem, and breaking it off early solves most of the problem”.
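As a purely illustrative toy (not a model of ROSE bargaining, and not something the argument above has proven), suppose each successful round closes a fixed fraction of the remaining gap between the disagreement point and the Pareto frontier. The hypothetical `contraction` parameter stands in for that fraction. Under this assumption the gap shrinks geometrically, which is the sense in which stopping around step 10 is about as good as going to infinity.

```python
# Toy model only: assumes each successful bargaining round closes a fixed
# fraction of the remaining gap to the Pareto frontier. The contraction factor
# is a made-up parameter, not something derived from ROSE.

def ratchet_gaps(initial_gap: float, contraction: float, rounds: int) -> list[float]:
    """Return the remaining gap to the Pareto frontier after each round (round 0 first)."""
    if not 0.0 <= contraction < 1.0:
        raise ValueError("contraction must lie in [0, 1)")
    gaps = [initial_gap]
    for _ in range(rounds):
        gaps.append(gaps[-1] * contraction)
    return gaps

gaps = ratchet_gaps(initial_gap=1.0, contraction=0.5, rounds=10)
print(f"gap after round 3: {gaps[3]:.4f}")    # 0.1250
print(f"gap after round 10: {gaps[10]:.6f}")  # 0.000977
```

With a contraction of 0.5, breaking off at round 10 leaves under a thousandth of the original gap. A slower contraction only changes how many rounds that takes.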

Regarding your last comment, yes, you could always just have a foe that’s a jerk, but you can at least act so they don’t *gain* from being a jerk, in a way robust against you and a foe having slightly different definitions of “jerk”.

I think I have a contender for something which evades the conditional-threat issue stated at the end, as well as obvious variants and strengthenings of it, and which would be threat-resistant in a dramatically stronger sense than ROSE.

There’s still a lot of things to check about it that I haven’t done yet. And I’m unsure how to generalize to the n-player case. And it still feels unpleasantly hacky, according to my mathematical taste.

But the task at least feels possible, now.

EDIT: it turns out it was still susceptible to the conditional-threat issue, but then I thought for a while and came up with a *different* contender that feels a lot less hacky, and that provably evades the conditional-threat issue. Still lots of work left to be done on it, though.

For 1, it’s just intrinsically mathematically appealing (continuity is always really nice when you can get it), and also because of an intuition that if your foe experiences a tiny preference perturbation, you should be able to use small conditional payments to replicate their original preferences/incentive structure and start negotiating with *that*, instead. I should also note that continuity is not used anywhere in the visual proof of the ROSE value for the toy case. Continuity just happens to appear.

For 2, yes, it’s part of the game setup. The buttons are of whatever intensity you want (but they have to be intensity-capped *somewhere* for technical reasons regarding compactness). Looking at the setup, for each player pair i,j, there’s a cap on how much of j’s utility i can destroy. These caps can vary, as long as they’re nonnegative and finite. From this, it’s clear that “Alice has a powerful button, Bob has a weak one” is one of the possibilities; that would just mean Alice’s cap on destroying Bob’s utility is higher than Bob’s cap on destroying Alice’s. There isn’t an assumption that everyone has an equally powerful button, because then you could argue that everyone just has an equal-strength threat, and then it wouldn’t be much of a threat-resistance desideratum, now would it? Heck, you can even give one player a powerful button and the other a zero-strength button that has no effect; that fits in the formalism.

So the theorem is actually saying “for all members of the family of destruction games with the button caps set wherever the heck you want, the payoffs are the same as the original game”.

My preferred way of resolving it is treating the process of “arguing over which equilibrium to move to” as a bargaining game, and just finding a ROSE point from that bargaining game. If there’s multiple ROSE points, well, fire up another round of bargaining. This repeated process should very rapidly have the disagreement points close in on the Pareto frontier, until everyone is just arguing over very tiny slices of utility.

This is imperfectly specified, though, because I’m not entirely sure what the disagreement points would be, because I’m not sure how the “don’t let foes get more than what you think is fair” strategy generalizes to >2 players. Maaaybe disagreement-point-invariance comes in clutch here? If everyone agrees that an outcome as bad or worse than their least-preferred ROSE point would happen if they disagreed, then disagreement-point-invariance should come in to have everyone agree that it doesn’t really matter exactly where that disagreement point is.

Or maybe there’s some nice principled property that some equilibria have, which others don’t, that lets us winnow down the field of equilibria somewhat. Maybe that could happen.

I’m still pretty unsure, but “iterate the bargaining process to argue over which equilibria to go to, you don’t get an infinite regress because you rapidly home in on the Pareto frontier with each extra round you add” is my best bad idea for it.

EDIT: John Harsanyi had the same idea. He apparently had some example where there were multiple CoCo equilibria, and his suggestion was that a second round of bargaining could be initiated over which equilibrium to pick, but that in general, it’d be so hard to compute the n-person Pareto frontier for large n that an equilibrium might be stable because nobody can find a different equilibrium nearby to aim for.

So this problem isn’t unique to ROSE points in full generality (CoCo equilibria have the exact same issue), it’s just that ROSE is the only one that produces multiple solutions for bargaining games, while CoCo only returns a single solution for bargaining games. (bargaining games are a subset of games in general)

# Threat-Resistant Bargaining Megapost: Introducing the ROSE Value

So, if you are limited to only pure strategies, for some reason, then yes, Chinese would be on the Pareto frontier.

But if you can implement randomization, then Chinese is *not* on the Pareto frontier, because both sides agree that “flip a coin, Heads for Sushi, Tails for Italian” is just strictly better than Chinese.

The convex shape consists of all the payoff pairs you can get if you allow randomization.
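Here is a small sketch of the randomization point. It uses made-up payoffs, since the text gives no numbers: Alice prefers Sushi, Bob prefers Italian, and both rate Chinese at 0.3. Among pure outcomes nothing dominates Chinese, but the 50/50 coin flip strictly beats it for both players.

```python
from fractions import Fraction

# Hypothetical (Alice, Bob) payoffs; the post does not give numbers for this example.
PAYOFFS = {
    "Sushi": (Fraction(1), Fraction(0)),
    "Italian": (Fraction(0), Fraction(1)),
    "Chinese": (Fraction(3, 10), Fraction(3, 10)),
}

def mix(lottery):
    """Expected (Alice, Bob) payoff of a lottery given as {option: probability}."""
    alice = sum(p * PAYOFFS[o][0] for o, p in lottery.items())
    bob = sum(p * PAYOFFS[o][1] for o, p in lottery.items())
    return alice, bob

def strictly_dominates(a, b):
    """True if payoff pair a is strictly better than b for both players."""
    return all(x > y for x, y in zip(a, b))

coin_flip = mix({"Sushi": Fraction(1, 2), "Italian": Fraction(1, 2)})
chinese = mix({"Chinese": Fraction(1)})
print(strictly_dominates(coin_flip, chinese))  # True
```

The set of all `mix` outputs over every lottery is exactly the convex hull of the three pure payoff pairs.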

Alright, this is kind of a Special Interest, so here’s your relevant thought dump.

First up, the image is kind of misleading, in the sense that you can always tack on extra orders of magnitude. You could tack on another thousand orders of magnitude and make it look even longer, or just go “this is 900 OOM’s of literally nothing happening, let’s clip that off and focus on the interesting part”

Assuming proton decay is a thing (that free protons decay with a ridiculously long half-life)....

ok, I was *planning* on going “as a ludicrous upper bound, here’s the number”, but, uh, the completely ludicrous upper bound wound up being a WHOLE LOT longer than I thought. I… I didn’t even think it was possible to stall till the evaporation of even a small black hole. But this calculation indicates that if you’re aiming *solely* at living ludicrously long, you can stall about a googol years, enough for even the largest black holes to evaporate, and to get to the end of the black hole era. I’m gonna need to rethink some stuff.

EDIT: rethought some stuff, realized it doesn’t change my conclusions from when I last looked into this. The fundamental problem is that, for any *remotely* realistic numbers, if you’re trying to catch the final evaporation of a black hole to harvest its mass-energy, you’ll blow a lot more than the amount of mass-energy that you could gain, in order to wait that long.

Final conclusion: If proton decay is a thing, it’s definitely not worth waiting to the end of a black hole, you’ll want to have things wrapped up far earlier. If proton decay isn’t a thing, you’ll want to wait till the black hole evaporates to catch that final party and last kg of mass-energy. If proton decay is a thing and you’re willing to blow completely ridiculous cosmic amounts of resources on it, you can last till the late parts of the black hole era.

The rough rationale is as follows. Start with 10x the mass of the largest black holes in the universe, around solar masses stockpiled. If they’re spinning fast enough, you can extract energy from them; assume you can extract all of it (it’s over 10 percent, so let’s round it up to 100 percent). Assuming that proton decay is years (a high estimate), and that we use the energy at 100 percent efficiency to make matter (also a high estimate), you can take out one proton, wait for around a proton decay time, take out the next proton, and so on. Then you can take out around protons, and each one lasts you around years, getting around years (high uncertainty). And, coincidentally, natural Hawking radiation finishes off the black hole of that size in years, leaving a small margin left over for silly considerations like “maybe the intelligence needs more than one proton to physically implement”.

So, not remotely practical, but maybe something like years would actually be doable? That extra 29 OOM’s of wiggle room patches over a lot of sins in this calculation.

But, in terms of what would actually be *practical* for the far future of humanity, it’d be the strat of “dump as much mass into a fast-spinning black hole as possible. Like, eat the entire Laniakea supercluster complex. Wait a trillion years for the cosmic microwave background radiation to cool to its floor temperature. You’d be in the late Stelliferous era at this point, with a few red dwarfs around, if you didn’t dump all the stars in the mega-black-hole already. Set up some infrastructure around the mega-hole, and use the Blandford-Znajek mechanism to convert the mega-hole spin into electrical power. You should be able to get about a gigawatt of power for the next years to run a whole lotta computation and a little bit of maintenance, and if proton decay is messing with things, chop however many OOM’s you need off the time and add those OOM’s to the power output. Party for a trillion trillion trillion eons with your super-optimized low-temperature computing infrastructure”

# Infra-Exercises, Part 1

Yeah, “transferable utility games” are those where there is a resource, and the utilities of all players are linear in that resource (so that everyone’s utilities can be redenominated in that resource, modulo a shift factor). I believe the post mentioned this.

# Less Threat-Dependent Bargaining Solutions?? (3/2)

Task completed.

Agreed. The bargaining solution for the entire game can be very different from adding up the bargaining solutions for the subgames. If there’s a subgame where Alice cares very much about victory in that subgame (interior decorating choices) and Bob doesn’t care much, and another subgame where Bob cares very much about it (food choice) and Alice doesn’t care much, then the bargaining solution of the entire relationship game will end up being something like “Alice and Bob get some relative weights on how important their preferences are, and in all the subgames, the weighted sum of their utilities is maximized. Thus, Alice will be given Alice-favoring outcomes in the subgames where she cares the most about winning, and Bob will be given Bob-favoring outcomes in the subgames where he cares the most about winning”

And in particular, since it’s a sequential game, Alice can notice if Bob isn’t being fair, and enforce the bargaining solution by going “if you’re not aiming for something sorta like this, I’ll break off the relationship”. So, from Bob’s point of view, aiming for any outcome that’s too Bob-favoring has really low utility since Alice will inevitably catch on. (this is the time-extended version of “give up on achieving any outcome that drives the opponent below their BATNA”) Basically, in terms of raw utility, it’s still a bargaining game deep down, but once both sides take into account how the other will react, the payoff matrix for the restaurant game (taking the future interactions into account) will look like “it’s a really bad idea to aim for an outcome the other party would regard as unfair”
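Here is a minimal sketch of the weighted-sum picture. It assumes hypothetical utilities for the two subgames and equal relative weights; the weights an actual bargaining solution would produce are not computed here. Maximizing the weighted sum separately in each subgame gives each player the subgame they care about most.

```python
# Hypothetical (Alice, Bob) utilities per subgame. Alice cares a lot about
# decorating and little about food; Bob is the reverse.
SUBGAMES = {
    "decorating": {"Alice's style": (1.0, 0.4), "Bob's style": (0.0, 0.5)},
    "food": {"Alice's pick": (0.5, 0.0), "Bob's pick": (0.4, 1.0)},
}

def weighted_choice(outcomes, w_alice, w_bob):
    """Pick the outcome maximizing w_alice * u_alice + w_bob * u_bob."""
    return max(outcomes, key=lambda o: w_alice * outcomes[o][0] + w_bob * outcomes[o][1])

# Equal weights are an assumption for illustration only.
choices = {name: weighted_choice(outs, 1.0, 1.0) for name, outs in SUBGAMES.items()}
print(choices)
```

Alice wins decorating (1.4 vs 0.5 weighted) and Bob wins food (1.4 vs 0.5). Shifting the weights moves the cutoff for which subgames each player wins.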

Actually, they apply anyway in *all* circumstances, not *just* after the rescaling and shifting is done! Scale-and-shift invariance means that no matter how you stretch and shift the two axes, the bargaining solution always hits the same probability-distribution over outcomes, so monotonicity means “if you increase the payoff numbers you assign for some or all of the outcomes, the Pareto frontier point you hit will give you an increased number for your utility score over what it’d be otherwise” (no matter how you scale-and-shift). And independence of irrelevant alternatives says “you can remove any option that you have 0 probability of taking and you’ll still get the same probability-distribution over outcomes as you would in the original game” (no matter how you scale-and-shift).

# Unifying Bargaining Notions (2/2)

# Unifying Bargaining Notions (1/2)

If you’re looking for curriculum materials, I believe that the most useful reference would probably be my “Infra-exercises”, a sequence of posts containing all the math exercises you need to reinvent a good chunk of the theory yourself. Basically, it’s the textbook’s exercise section, and working through interesting math problems and proofs on one’s own has a *much* better learning feedback loop and retention of material than slogging through the old posts. The exercises are short on motivation and philosophy compared to the posts, though, much like how a functional analysis textbook takes for granted that you want to learn functional analysis and doesn’t bother motivating it.

The primary problem is that the exercises aren’t particularly calibrated in terms of difficulty, and in order for me to get useful feedback, someone has to actually work through all of them, so feedback has been a bit sparse. So I’m stuck in a situation where I keep having to link *everyone* to the infra-exercises over and over, and it’d be really good to just get them out and publicly available, but if they’re as important as I think, then the best move is something like “release them one at a time and have a bunch of people work through them as a group” like the fixpoint exercises, instead of “just dump them all as public documents”.

I’ll ask around about speeding up the publication of the exercises and see what can be done there.

I’d strongly endorse linking this introduction even if the exercises are linked as well, because this introduction serves as the table of contents to all the other applicable posts.

So, if you make Nirvana infinite utility, yes, the fairness criterion becomes “if you’re mispredicted, you have any probability at all of entering the situation where you’re mispredicted” instead of “have a significant probability of entering the situation where you’re mispredicted”, so a lot more decision-theory problems can be captured if you take Nirvana as infinite utility. But, I talk in another post in this sequence (I think it was “the many faces of infra-beliefs”) about why you want to do Nirvana as 1 utility instead of infinite utility.

Parfit’s Hitchhiker with a perfect predictor is a perfectly fine acausal decision problem; we can still represent it, it just cannot be represented as an infra-POMDP/causal decision problem.

Yes, the fairness criterion is tightly linked to the pseudocausality condition. Basically, the acausal->pseudocausal translation is the part where the accuracy of the translation might break down, and once you’ve got something in pseudocausal form, translating it to causal form from there by adding in Nirvana won’t change the utilities much.

So, the flaw in your reasoning is after updating we’re in the city, doesn’t go “logically impossible, infinite utility”. We just go “alright, off-history measure gets converted to 0 utility”, a perfectly standard update. So updates to (0,0) (ie, there’s 0 probability I’m in this situation in the first place, and my expected utility for not getting into this situation in the first place is 0, because of probably dying in the desert)

As for the proper way to do this analysis, it’s a bit finicky. There’s something called “acausal form”, which is the fully general way of representing decision-theory problems. Basically, you just give an infrakernel that tells you your uncertainty over which history will result, for each of your policies. So, you’d have

Ie, if you pay, 99 percent chance of ending up alive but paying and 1 percent chance of dying in the desert, if you don’t pay, 99 percent chance of dying in the desert and 1 percent chance of cheating them, no extra utility juice on either one.

You update on the event “I’m alive”. The off-event utility function is like “being dead would suck, 0 utility”. So, your infrakernel updates to (leaving off the scale-and-shift factors, which don’t affect anything)

Because the probability mass on “die in desert” got burned and turned into utility juice, 0 of it since it’s the worst thing. Let’s say your utility function assigns 0.5 utility to being alive and rich, and 0.4 utility to being alive and poor. So the utility of the first policy is $0.99\cdot 0.4=0.396$, and the utility of the second policy is $0.01\cdot 0.5=0.005$, so it returns the same answer of paying up. It’s basically thinking “if I don’t pay, I’m probably not in this situation in the first place, and the utility of ‘I’m not in this situation in the first place’ is also about as low as possible.”
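The acausal-form calculation above can be checked in a few lines. This is a simplification: it treats each policy’s infrakernel as a single probability distribution, which is all this example uses. It takes the utilities stated in the text and omits the scale-and-shift factors. The names `KERNEL` and `updated_value` are illustrative only.

```python
# Parfit's Hitchhiker in acausal form with the text's numbers: 99% predictor
# accuracy, 0.5 for alive-and-rich, 0.4 for alive-and-poor, and off-event
# utility 0 for dying in the desert. Scale-and-shift factors are omitted.
U_RICH, U_POOR, U_DEAD = 0.5, 0.4, 0.0

# Simplification: one probability distribution over histories per policy.
KERNEL = {
    "pay": {"alive_poor": 0.99, "dead": 0.01},
    "dont_pay": {"dead": 0.99, "alive_rich": 0.01},
}
UTILITY = {"alive_rich": U_RICH, "alive_poor": U_POOR}

def updated_value(dist, event=("alive_rich", "alive_poor"), off_event_utility=U_DEAD):
    """Value after updating on `event`: on-event mass keeps its utility,
    off-event mass is converted into utility juice at off_event_utility."""
    on = sum(p * UTILITY[h] for h, p in dist.items() if h in event)
    off = sum(p for h, p in dist.items() if h not in event) * off_event_utility
    return on + off

values = {policy: updated_value(dist) for policy, dist in KERNEL.items()}
print(values)  # pay is about 0.396, dont_pay is about 0.005
```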

BUT

There’s a very mathematically natural way to translate any decision theory to “causal form”, and as it turns out, the process which falls directly out of the math is that thing where you go “hard-code in all possible policies, go to Nirvana if I behave differently from the hard-coded policy”. This has an advantage and a disadvantage. The advantage is that now your decision-theory problem is in the form of an infra-POMDP, a much more restrictive form, so you’ve got a much better shot at actually developing a practical algorithm for it. The disadvantage is that not all decision-theory problems survive the translation process unchanged. Speaking informally, the “fairness criterion” to translate a decision-theory problem into causal form without too much loss in fidelity is something like “if I was mispredicted, would I actually have a good shot at entering the situation where I was mispredicted to prove the prediction wrong”.

Counterfactual mugging fits this. If Omega flubs its prediction, you’ve got a 50 percent chance of being able to prove it wrong.

XOR blackmail fits this. If the blackmailer flubs its prediction and thinks you’ll pay up, you’ve got like a 90 percent chance of being able to prove it wrong.

Newcomb’s problem fits this. If Omega flubs its prediction and thinks you’ll 2-box, you’ll definitely be able to prove it wrong.

Transparent Newcomb and Parfit’s Hitchhiker don’t fit this “fairness property” (especially for 100 percent accuracy), and so when you translate them to a causal problem, it ruins things. If the predictor screws up and thinks you’ll 2-box on seeing a filled transparent box/won’t pay up on seeing you got saved, then the transparent box is empty/you die in the desert, and you don’t have a significant shot at proving them wrong.

Let’s see what’s going wrong. Our two a-environments are

Update on the event “I didn’t die in the desert”. Then, neglecting scale-and-shift, our two a-environments are

Letting N be the utility of Nirvana,

If you pay up, then the expected utilities of these are and

If you don’t pay up, then the expected utilities of these are and

Now, if N is something big like 100, then the worst-case utilities of the policies are 0.396 vs 0.005, as expected, and you pay up. But if N is something like 1, then the worst-case utilities of the policies are 0.01 vs 0.005, which… well, it technically gets the right answer, but those numbers are suspiciously close to each other; the agent isn’t thinking properly. And so, without too much extra effort tweaking the problem setup, it’s possible to generate decision-theory problems where the agent just straight-up makes the wrong decision after changing things to the causal setting.
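The formulas for the two a-environments are missing from the text, so the sketch below rests on an assumed reconstruction:

- Each a-environment hard-codes one policy.
- A 99%-accurate predictor acts on that hard-coded policy.
- Behaving differently from the hard-coded policy after being saved sends you to Nirvana, with utility N.
- Dying in the desert is off-event and is worth 0 after the update.

Under these assumptions, the sketch reproduces the worst-case numbers quoted above.

```python
# Assumed reconstruction, not taken from the text: each a-environment hard-codes
# a policy, a 99%-accurate predictor acts on it, and deviating from the
# hard-coded policy after being saved yields Nirvana (utility n). Death in the
# desert is off-event and contributes 0 after updating on survival.
ACCURACY, U_RICH, U_POOR = 0.99, 0.5, 0.4

def env_value(hardcoded: str, actual: str, n: float) -> float:
    """Updated value of policy `actual` in the a-environment hard-coding `hardcoded`."""
    p_saved = ACCURACY if hardcoded == "pay" else 1 - ACCURACY
    if actual != hardcoded:
        return p_saved * n  # behaved differently from the hard-coded policy: Nirvana
    return p_saved * (U_POOR if actual == "pay" else U_RICH)

def worst_case(actual: str, n: float) -> float:
    """Worst case over the two a-environments."""
    return min(env_value(h, actual, n) for h in ("pay", "dont_pay"))

for n in (100.0, 1.0):
    print(n, round(worst_case("pay", n), 3), round(worst_case("dont_pay", n), 3))
```

The gap between the two policies at N = 1 comes entirely from the 0.01·N Nirvana term, which is why shrinking N crowds the numbers together.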

I have a reduction of this problem to a (hopefully) simpler problem. First up, establish the notation used.

$[n]$ refers to the set $\{1,\dots,n\}$. $n$ is the number of candidates. Use $C$ as an abbreviation for the space $\Delta[n]$; it’s the space of probability distributions over the candidates. View $C$ as embedded in $\mathbb{R}^{n-1}$, and set the origin at the center of $C$.

At this point, we can note that we can biject the following:

1: Functions of type [n]→[0,1]

2: Affine functions of type C→[0,1]

3: Functions of the form $\lambda x.\langle a,x\rangle+c$, where $x,a\in\mathbb{R}^{n-1}$ and $c\in\mathbb{R}$, and everything’s suitably set so that these functions are bounded in $[0,1]$ over $C$. (Basically, we extend our affine function to the entire space with the Hahn-Banach theorem, and use that every affine function can be written as a linear function plus a constant.) We can reexpress our distribution $V$ over utility functions as a distribution over these normal vectors $a$.
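Here is a sketch that makes the bijection concrete, using exact rational arithmetic:

- A utility vector over candidates gives an affine function on the simplex.
- Evaluating that function at the vertices recovers the vector.
- Recentering at the middle of the simplex gives the $\lambda x.\langle a,x\rangle+c$ form, with $a$ summing to zero so it lives in the $(n-1)$-dimensional space $C$ sits in.

The function names and the example vector are hypothetical.

```python
from fractions import Fraction

def affine_from_vector(u):
    """Affine function on the simplex C: U(p) = sum_i u_i * p_i."""
    return lambda p: sum(ui * pi for ui, pi in zip(u, p))

def vector_from_affine(U, n):
    """Recover u by evaluating U at the vertices of C (the pure candidates)."""
    return [U([Fraction(1 if i == j else 0) for j in range(n)]) for i in range(n)]

def centered_form(u):
    """Rewrite U(p) as <a, p - center> + c with the origin at the center of C.

    c is U's value at the center (the mean of u); a = u - c sums to zero,
    so it lies in the (n-1)-dimensional subspace parallel to C."""
    c = sum(u) / len(u)
    a = [ui - c for ui in u]
    return a, c

u = [Fraction(1), Fraction(1, 4), Fraction(0)]        # hypothetical utility vector
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # a lottery over 3 candidates
U = affine_from_vector(u)
a, c = centered_form(u)
center = [Fraction(1, 3)] * 3
x = [pi - ci for pi, ci in zip(p, center)]            # p in centered coordinates
print(U(p), sum(ai * xi for ai, xi in zip(a, x)) + c)  # both 7/12
```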

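Now, we can reexpress the conjecture as follows. Is it the case that there exists a $\mu:\Delta C$ s.t. for all $\nu:\Delta C$, we have

$$\mathbb{E}_{x,y,a\sim\mu\times\nu\times V}\left[\operatorname{sgn}(\langle a,x-y\rangle)\right]\ge 0$$

where $\operatorname{sgn}$ is the function that’s $-1$ if the quantity is negative, $0$ if $0$, and $1$ if the quantity is positive. To see the equivalence to the original formulation, we can rewrite things as

$$\mathbb{E}_{x,y,a\sim\mu\times\nu\times V}\left[\mathbf{1}_{\langle a,x\rangle>\langle a,y\rangle}-\mathbf{1}_{\langle a,y\rangle>\langle a,x\rangle}\right]\ge 0$$

where the bold $\mathbf{1}$ is an indicator function. Split up the expectation and note that each term is a probability, so we get

$$\mathbb{P}_{x,y,a\sim\mu\times\nu\times V}[\langle a,x\rangle>\langle a,y\rangle]-\mathbb{P}_{x,y,a\sim\mu\times\nu\times V}[\langle a,y\rangle>\langle a,x\rangle]\ge 0$$

$$\mathbb{P}_{x,y,a\sim\mu\times\nu\times V}[\langle a,x\rangle>\langle a,y\rangle]\ge\mathbb{P}_{x,y,a\sim\mu\times\nu\times V}[\langle a,y\rangle>\langle a,x\rangle]$$

And this then rephrases as

$$\mathbb{P}_{x,y,U\sim\mu\times\nu\times V}[U(x)>U(y)]\ge\mathbb{P}_{x,y,U\sim\mu\times\nu\times V}[U(y)>U(x)]$$

which was the original formulation of the problem.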
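The expectation in the reformulated conjecture can be computed exactly when $\mu$, $\nu$ and $V$ have finite support. In the sketch below, the function name `f` follows the abbreviation the text introduces next, and utilities are written directly as vectors over candidates, which is equivalent by the bijection above. The three voters are a hypothetical Condorcet cycle: each pure candidate loses to another by 1/3, yet the uniform lottery-lottery over the pure candidates scores exactly 0 against every pure-candidate foe.

```python
from fractions import Fraction

def sgn(z):
    """-1, 0 or 1 according to the sign of z."""
    return (z > 0) - (z < 0)

def expected_utility(u, x):
    """U(x) for a utility vector u over candidates and a lottery x."""
    return sum(ui * xi for ui, xi in zip(u, x))

def f(mu, nu, V):
    """E_{x~mu, y~nu, u~V}[sgn(U(x) - U(y))] for finite-support distributions,
    each given as a list of (weight, item) pairs."""
    return sum(
        wm * wn * wv * sgn(expected_utility(u, x) - expected_utility(u, y))
        for wm, x in mu
        for wn, y in nu
        for wv, u in V
    )

third, half = Fraction(1, 3), Fraction(1, 2)
E = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # pure candidates as degenerate lotteries
V = [(third, (1, half, 0)), (third, (0, 1, half)), (third, (half, 0, 1))]  # hypothetical voters

def point(x):
    """Point mass on a single lottery x."""
    return [(1, x)]

uniform = [(third, e) for e in E]

cycle = [f(point(E[i]), point(E[(i + 1) % 3]), V) for i in range(3)]
vs_pure = [f(uniform, point(e), V) for e in E]
print(cycle)    # each entry equals 1/3: E[0] beats E[1] beats E[2] beats E[0]
print(vs_pure)  # each entry equals 0
```

Note that `f` is antisymmetric, `f(mu, nu, V) == -f(nu, mu, V)`, which is what makes $f(\nu,\nu)=0$ and lets "just pick $\nu$ itself" count as a good response.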

Abbreviating the function $\mathbb{E}_{x,y,a\sim\mu\times\nu\times V}[\operatorname{sgn}(\langle a,x-y\rangle)]$ as $f(\mu,\nu)$, a *necessary* condition to have a $\mu:\Delta C$ that dominates everything is that

$$\sup_{\mu\in\Delta C}\inf_{\nu\in\Delta C}f(\mu,\nu)\ge 0$$

If you have this property, then you might not necessarily have an *optimal* $\mu$ that dominates everything, but there are $\mu$ that get a worst-case expectation arbitrarily close to 0. Namely, even if the worst possible $\nu$ is selected, the violation of the defining domination inequality happens with arbitrarily small magnitude. There might not be an *optimal* lottery-lottery, but there are lottery-lotteries arbitrarily close to optimal where this closeness-to-optimality is uniform over every foe. Which seems good enough to me. So I’ll be focused on proving this slightly easier statement and glossing over the subtle distinction between that and the existence of truly optimal lottery-lotteries.

As it turns out, this slightly easier statement (that sup inf is 0 or higher) can be outright proven assuming the following conjecture.

*Stably-Good-Response Conjecture:* For every $\nu:\Delta C$ and $\epsilon>0$, there exists a $\mu:\Delta C$ and a $\delta>0$ s.t.

$$\inf_{\nu':d(\nu,\nu')<\delta}f(\mu,\nu')>-\epsilon$$

Pretty much, for any desired level of suckage and any foe $\nu$, there’s a probability distribution $\mu$ you can pick which isn’t just a *good response* (this always exists, just pick $\nu$ itself), but a *stably good response*, in the sense that there’s some nonzero level of perturbation to the foe where $\mu$ *remains* a good response no matter how the foe is perturbed.

*Theorem 1:* Assuming the Stably-Good-Response Conjecture, $\sup_\mu\inf_\nu f(\mu,\nu)\ge 0$.

I’ll derive a contradiction from the assumption that $0>\sup_\mu\inf_\nu f(\mu,\nu)$. Accordingly, assume the strict inequality.

In such a case, there is some $\epsilon$ s.t. $0>-\epsilon>\sup_\mu\inf_\nu f(\mu,\nu)$. Let the set $A_\mu:=\{\nu\mid f(\mu,\nu)>-\epsilon\}$. Now, every $\nu$ lies in the interior of $A_\mu$ for some $\mu$, by the Stably-Good-Response Conjecture. Since $\Delta C$ is a compact set, we can isolate a finite subcover and get some finite set $M$ of probability distributions $\mu$ s.t. $\forall\nu\,\exists\mu\in M: f(\mu,\nu)>-\epsilon$.

Now, let the set $B_\nu:=\{\mu\in\mathrm{c.h}(M)\mid f(\mu,\nu)<-\epsilon\}$. Since $-\epsilon>\sup_\mu\inf_\nu f(\mu,\nu)$, this family of sets manages to cover all of $\mathrm{c.h}(M)$ (the convex hull of our finite set). Further, for any fixed $\nu$, $f(\mu,\nu)$ is a continuous function $\mathrm{c.h}(M)\to\mathbb{R}$ (a bit nonobvious, but true nonetheless because there are only finitely many vertices to worry about). Due to continuity, all the sets $B_\nu$ will be open. Since we have an open cover of $\mathrm{c.h}(M)$, which is the convex hull of finitely many points (and thus compact), we can isolate a finite subcover, to get a finite set $N$ of $\nu$ s.t. $\forall\mu\in\mathrm{c.h}(M)\,\exists\nu\in N: f(\mu,\nu)<-\epsilon$. And now we can go

$$-\epsilon>\max_{\mu\in\mathrm{c.h}(M)}\min_{\nu\in N}f(\mu,\nu)\ge\max_{\mu\in\mathrm{c.h}(M)}\min_{\nu\in\mathrm{c.h}(N)}f(\mu,\nu)=\min_{\nu\in\mathrm{c.h}(N)}\max_{\mu\in\mathrm{c.h}(M)}f(\mu,\nu)\ge\min_{\nu\in\mathrm{c.h}(N)}\max_{\mu\in M}f(\mu,\nu)>-\epsilon$$

The first strict inequality was from how all $\mu\in\mathrm{c.h}(M)$ had some $\nu\in N$ which made $f(\mu,\nu)$ get a bad score. The $\ge$ was from expanding the set of options. The $=$ was from how $f$ is continuous and linear in each argument (bilinear) when restricted to $\mathrm{c.h}(M)\times\mathrm{c.h}(N)$, both of which are compact convex sets, so the minimax theorem can be applied. Then the next $\ge$ was from restricting the set of options, and the $>$ was from how every $\nu\in\Delta C$ had some $\mu\in M$ that’d make $f(\mu,\nu)$ get a good score, by construction of $M$ (and compactness to make the inequality a strict one).

But wait, we just showed $-\epsilon>-\epsilon$, that’s a contradiction. Therefore, our original assumption must have been wrong. Said original assumption was that $0>\sup_\mu\inf_\nu f(\mu,\nu)$, so negating it, we’ve proved that

$$\sup_\mu\inf_\nu f(\mu,\nu)\ge 0$$

as desired.
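The minimax step in the proof, the $=$ in the chain, can be sanity-checked on a small finite game. The sketch below uses rock-paper-scissors, chosen for illustration because its payoff matrix is antisymmetric: like $f$, swapping the players' roles flips the sign. On an exact rational grid over both simplices, max-min and min-max both come out to 0. Against a fixed mixed strategy, the best response can be taken pure, because the payoff is linear in each argument.

```python
from fractions import Fraction

# Rock-paper-scissors: an antisymmetric payoff matrix (A[i][j] == -A[j][i]),
# used here only as an illustrative stand-in for a bilinear game like f.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def simplex_grid(k):
    """All 3-entry probability vectors whose entries are multiples of 1/k."""
    return [(Fraction(i, k), Fraction(j, k), Fraction(k - i - j, k))
            for i in range(k + 1) for j in range(k + 1 - i)]

def payoff(p, q):
    """Bilinear payoff p^T A q."""
    return sum(p[i] * A[i][j] * q[j] for i in range(3) for j in range(3))

grid = simplex_grid(30)
pure = [tuple(Fraction(1 if i == j else 0) for j in range(3)) for i in range(3)]
maxmin = max(min(payoff(p, q) for q in pure) for p in grid)
minmax = min(max(payoff(p, q) for p in pure) for q in grid)
print(maxmin, minmax)
```

The grid includes the uniform strategy (10/30 each), which attains the value exactly, so both numbers are exactly 0 rather than approximately 0.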