A little late to the party, but I’m confused about the minimax strategy.

The first thing I was confused about was what sorts of rules could constrain Murphy, based on my actions. For example, in a bit-string environment, the rule “every other bit is a 0” constrains Murphy (he can’t reply with “111...”), but not based on my actions. It doesn’t matter what bits I flip, Murphy can always just reply with the environment that is maximally bad, as long as it has 0s in every other bit. Another example would be if you have the rule “environment must be a valid chess board,” then you can make whatever moves you want, and Murphy can just return the environment with the rule “if you make that move, then the next board state is you in checkmate”, after all, you being in checkmate is a valid chessboard, and therefore meets the only rule you know. And you can’t know what other rules Murphy plays by. You can’t really run minimax on that, then, because all of Murphy’s moves look like “set the state to the worst allowable state.”

So, what kind of rules actually constrain Murphy based on my actions? My first take was “rules involving time,” for instance if you have the rule “only one bit can be flipped per timestep” then you can constrain Murphy. If you flip a bit, then within the next timestep, you’ve eliminated some possibilities (they would require flipping that bit back and doing something else), so you can have a meaningful minimax on which action to take.

This didn’t feel like the whole story though, so I had a talk with my friend about it, and eventually, we generalized it to “rules that consume resources.” An example would be, if you have the rule “for every bit you flip, you must also flip one of the first 4 bits from a 1 to a 0”, then we can constrain Murphy. If I flip any bit, that leaves 1 less bit for Murphy to use to mess with me.

But then the minimax strategy started looking worrying to me. If the only rules that you can use to constrain Murphy are ones that use resources, then wouldn’t a minimax strategy have some positive preference for destroying resources in order to prevent Murphy from using them? It seems like a good way to minimize Murphy’s best outcomes.

Maximin, actually. You’re maximizing your worst-case result.

It’s probably worth mentioning that “Murphy” isn’t an actual foe where it makes sense to talk about destroying resources lest Murphy use them. It’s just a personification of the fact that we have a set of options, any of which could be picked, and we want the highest lower bound on utility we can get for that set of options, so for intuition we assume we’re playing against an adversary with a perfectly opposite utility function. Translating that last paragraph back out of the “Murphy” talk, it’s “wouldn’t it be good to use resources in order to guard against worst-case outcomes within the available set of possibilities?”, and this is just ordinary risk aversion.

For that equation $\operatorname{argmax}_{\pi} \inf_{e \in B} \mathbb{E}_{\pi \cdot e}[U]$, $B$ can be any old set of probabilistic environments you want. You’re not spending any resources or effort, a hypothesis just is a set of constraints/possibilities for what reality will do, a guess of the form “Murphy’s operating under these constraints/must pick an option from this set.”

You’re completely right that for constraints like “environment must be a valid chess board”, that’s too loose of a constraint to produce interesting behavior, because Murphy is always capable of screwing you there.
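To make the maximin formula concrete, here is a minimal sketch over a finite set of environments. The policy names, environment names, and utility numbers are all made up for illustration. The table entry `utilities[pi][e]` stands in for $\mathbb{E}_{\pi \cdot e}[U]$.

```python
from typing import Dict, List

# Hypothetical toy table: utilities[policy][env] plays the role of E_{pi.e}[U].
ExpectedUtility = Dict[str, Dict[str, float]]


def worst_case(utilities: ExpectedUtility, policy: str, B: List[str]) -> float:
    """inf over environments e in B of the expected utility of `policy`."""
    return min(utilities[policy][e] for e in B)


def maximin_policy(utilities: ExpectedUtility, B: List[str]) -> str:
    """argmax over policies of the worst-case expected utility over B."""
    return max(utilities, key=lambda pi: worst_case(utilities, pi, B))


utilities = {
    "cautious": {"e1": 0.4, "e2": 0.5, "e_doom": 0.0},
    "bold": {"e1": 0.9, "e2": 0.1, "e_doom": 0.0},
}

# A tight hypothesis: Murphy may only pick e1 or e2.
tight = maximin_policy(utilities, ["e1", "e2"])

# A loose hypothesis that also allows "you lose no matter what".
loose_B = ["e1", "e2", "e_doom"]
loose_values = {pi: worst_case(utilities, pi, loose_B) for pi in utilities}
```

With the tight set the worst cases differ (0.4 vs 0.1), so the choice of policy matters. With the loose set every policy has worst case 0, which is the “too loose to produce interesting behavior” situation.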

This isn’t too big of an issue in practice, because it’s possible to mix together several infradistributions with a prior, which is like “a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked”. And as it turns out, you’ll end up completely ignoring hypotheses where Murphy can screw you over no matter what you do. You’ll choose your policy to do well in the hypotheses/scenarios where Murphy is more tightly constrained, and write the “you automatically lose” hypotheses off because it doesn’t matter what you pick, you’ll lose in those.

But there is a big unstudied problem of “what sorts of hypotheses are nicely behaved enough that you can converge to optimal behavior in them”, that’s on our agenda.

An example that might be an intuition pump, is that there’s a very big difference between the hypothesis that is “Murphy can pick a coin of unknown bias at the start, and I have to win by predicting the coinflips accurately” and the hypothesis “Murphy can bias each coinflip individually, and I have to win by predicting the coinflips accurately”. The important difference between those seems to be that past performance is indicative of future behavior in the first hypothesis and not in the second. For the first hypothesis, betting according to Laplace’s law of succession would do well in the long run no matter what weighted coin Murphy picks, because you’ll catch on pretty fast. For the second hypothesis, no strategy you can do can possibly help in that situation, because past performance isn’t indicative of future behavior.

I’m glad to hear that the question of what hypotheses produce actionable behavior is on people’s minds.
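Going back to the two coin hypotheses above, here is a rough simulation sketch of the difference. It assumes a deterministic predictor that guesses heads whenever the Laplace estimate of P(heads) is at least 1/2. It also assumes that the per-flip Murphy can set each flip’s bias to 0 or 1 against that guess. The function names and the bias of 0.8 are invented for illustration.

```python
import random


def laplace_guess(heads: int, flips: int) -> int:
    """Guess 1 (heads) iff Laplace's estimate (heads + 1) / (flips + 2) >= 1/2."""
    p_heads = (heads + 1) / (flips + 2)
    return 1 if p_heads >= 0.5 else 0


def accuracy_fixed_bias(bias: float, n: int, seed: int = 0) -> float:
    """Murphy picks one coin bias at the start; flips are then i.i.d."""
    rng = random.Random(seed)
    heads = correct = 0
    for t in range(n):
        guess = laplace_guess(heads, t)
        flip = 1 if rng.random() < bias else 0
        correct += int(guess == flip)
        heads += flip
    return correct / n


def accuracy_per_flip_adversary(n: int) -> float:
    """Murphy biases each flip individually, here fully against the guess."""
    heads = correct = 0
    for t in range(n):
        guess = laplace_guess(heads, t)
        flip = 1 - guess  # bias set to 0 or 1, whichever makes the guess wrong
        correct += int(guess == flip)
        heads += flip
    return correct / n


fixed = accuracy_fixed_bias(0.8, 2000)
adversarial = accuracy_per_flip_adversary(2000)
```

In the first case the predictor locks onto the majority side and is right about as often as the coin’s bias allows. In the second, the history carries no usable information and this predictor is wrong on every flip.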

I modeled Murphy as an actual agent, because I figured a hypothesis like “A cloaked superintelligence is operating in the area that will react to your decision to do X by doing Y” is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y.

I feel like I didn’t quite grasp what you meant by “a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked”.

But based on your explanation after, it sounds like you essentially ignore hypotheses that don’t constrain Murphy, because they act as an expected utility drop on all states, so it just means you’re comparing −1,000,000 and −999,999, instead of 0 and 1. For example, there’s a whole host of hypotheses of the form “A cloaked superintelligence converts all local usable energy into a hellscape if you do X”, and since that’s a possibility for every X, no action X is graded lower than the others by its existence.

That example is what got me thinking, in the first place, though. Such hypotheses don’t lower everything equally, because, given other Laws of Physics, the superintelligence would need energy to hell-ify things. So arbitrarily consuming energy would reduce how bad the outcomes could be if a perfectly misaligned superintelligence was operating in the area. And, given that I am positing it as a perfectly misaligned superintelligence, we should both expect it to exist in the environment Murphy chooses (what could be worse?) and expect any reduction of its actions to be as positive of changes as a perfectly aligned superintelligence’s actions could be, since preventing a maximally detrimental action should match, in terms of Utility, enabling a maximally beneficial action. Therefore, entropy-bombs.

Thinking about it more, assuming I’m not still making a mistake, this might just be a broader problem, not specific to this in any way. Aren’t I basically positing Pascal’s Mugging?

Anyway, thank you for replying. It helped.

You’re completely right that hypotheses with unconstrained Murphy get ignored because you’re doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your “−1,000,000 vs −999,999 is the same sort of problem as 0 vs 1” reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the “inf” part of the $\mathbb{E}_{\Psi}[f] := \inf_{(m,b) \in \Psi} m(f) + b$ definition of expected value, and writing actual equations. $\Psi$ is the available set of possibilities for a hypothesis. If you really want to, you can think of this as constraints on Murphy, and Murphy picking from available options, but it’s highly encouraged to just work with the math.

For mixing hypotheses (several different $\Psi_i$ sets of possibilities) according to a prior distribution $\zeta \in \Delta\mathbb{N}$, you can write it as an expectation functional via $\psi_\zeta(f) := \mathbb{E}_{i \sim \zeta}[\psi_i(f)]$ (mix the expectation functionals of the component hypotheses according to your prior on hypotheses), or as a set via $\Psi_\zeta := \{(m,b) \mid \exists (m_i,b_i) \in \Psi_i : \mathbb{E}_{i \sim \zeta}(m_i,b_i) = (m,b)\}$ (the available possibilities for the mix of hypotheses are all of the form “pick a possibility from each hypothesis, mix them together according to your prior on hypotheses”).

This is what I meant by “a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked”, that $\Psi_\zeta$ set (your mixture of hypotheses according to a prior) corresponds to selecting one of the $\Psi_i$ sets according to your prior $\zeta$, and then Murphy picking freely from the set $\Psi_i$.
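As a sanity check on the two ways of writing the mixture, here is a small sketch with finite sets. It assumes each hypothesis is a finite list of pairs $(m, b)$, with $m$ given as a list of weights over a finite outcome space, and $f$ as a list of values on those outcomes. The names `psi`, `mixture_functional`, and `mixture_set` and all the numbers are illustrative, not taken from any library.

```python
from itertools import product
from typing import List, Tuple

AMeasure = Tuple[List[float], float]  # (m, b): weights over outcomes, plus offset b
Hypothesis = List[AMeasure]           # a finite stand-in for a set Psi_i


def evaluate(point: AMeasure, f: List[float]) -> float:
    """m(f) + b for a single (m, b)."""
    m, b = point
    return sum(mk * fk for mk, fk in zip(m, f)) + b


def psi(Psi: Hypothesis, f: List[float]) -> float:
    """inf over (m, b) in Psi of m(f) + b (a min, since Psi is finite here)."""
    return min(evaluate(p, f) for p in Psi)


def mixture_functional(hyps: List[Hypothesis], zeta: List[float], f: List[float]) -> float:
    """psi_zeta(f) = E_{i ~ zeta}[psi_i(f)]."""
    return sum(z * psi(Psi, f) for z, Psi in zip(zeta, hyps))


def mixture_set(hyps: List[Hypothesis], zeta: List[float]) -> Hypothesis:
    """Pick one (m_i, b_i) from each hypothesis and average them with weights zeta."""
    n_outcomes = len(hyps[0][0][0])
    mixed = []
    for choice in product(*hyps):
        m = [sum(z * p[0][k] for z, p in zip(zeta, choice)) for k in range(n_outcomes)]
        b = sum(z * p[1] for z, p in zip(zeta, choice))
        mixed.append((m, b))
    return mixed


hyps = [
    [([0.5, 0.5], 0.0), ([0.9, 0.1], 0.0)],  # Psi_0: Murphy has two options
    [([0.2, 0.8], 0.0)],                     # Psi_1: Murphy has one option
]
zeta = [0.3, 0.7]
f = [1.0, 0.0]

via_functional = mixture_functional(hyps, zeta, f)
via_set = psi(mixture_set(hyps, zeta), f)
```

Both routes give the same number here. That is the sense in which mixing the sets and mixing the expectation functionals describe the same object.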

Using $\psi_\zeta(f) := \mathbb{E}_{i \sim \zeta}[\psi_i(f)]$ (and considering our choice of what to do affecting the choice of $f$, we’re trying to pick the best function $f$) we can see that if the prior is composed of a bunch of “do this sequence of actions or bad things happen” hypotheses, the details of what you do sensitively depend on the probability distribution over hypotheses. Just like with AIXI, really.

Informal proof: if $\psi_i(f_i) \simeq 1$ and $\psi_i(f_j) \simeq 0$ (assuming $j \neq i$), then we can see that

$$\psi_\zeta(f_i) = \mathbb{E}_{j \sim \zeta}[\psi_j(f_i)] = \sum_{j \neq i} \zeta_j \cdot \psi_j(f_i) + \zeta_i \cdot \psi_i(f_i) \simeq \zeta_i$$

and so, the best sequence of actions to do would be the one associated with the “you’re doomed if you don’t do blahblah action sequence” hypothesis with the highest prior. Much like AIXI does.
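A tiny numerical instance of that informal proof, assuming three hypotheses where hypothesis $j$ gives value 1 to its own action $f_j$ and 0 to the others. The prior weights are made up.

```python
from typing import List


def mixture_value(zeta: List[float], psi: List[List[float]], i: int) -> float:
    """psi_zeta(f_i) = sum over j of zeta_j * psi_j(f_i)."""
    return sum(z * row[i] for z, row in zip(zeta, psi))


zeta = [0.2, 0.5, 0.3]  # hypothetical prior over three "do this or else" hypotheses

# psi[j][i] is psi_j(f_i): 1 if you do what hypothesis j demands, else 0.
psi = [[1.0 if i == j else 0.0 for i in range(3)] for j in range(3)]

values = [mixture_value(zeta, psi, i) for i in range(3)]
best = max(range(3), key=values.__getitem__)
```

Each action’s mixture value is just the prior weight of the hypothesis that demands it, so the selected action is the one tied to the highest-prior hypothesis.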

Using the same sort of thing, we can also see that if there’s a maximally adversarial hypothesis in there somewhere that’s just like “you get 0 reward, screw you” no matter what you do (let’s say this is $\psi_0$), then we have

$$\psi_\zeta(f_i) = \mathbb{E}_{j \sim \zeta}[\psi_j(f_i)] = \sum_{j \geq 1} \zeta_j \cdot \psi_j(f_i) + \zeta_0 \cdot \psi_0(f_i) \simeq \sum_{j \geq 1} \zeta_j \cdot \psi_j(f_i)$$

And so, that hypothesis drops out of the process of calculating the expected value, for all possible functions/actions. Just do a scale-and-shift, and you might as well be dealing with the prior $(\zeta \mid i \neq 0)$, which a priori assumes you aren’t in the “screw you, you lose” environment.
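Here is a quick check of the “drops out” claim under made-up numbers: hypothesis 0 gives 0 to every action, and we compare the action ranking under the full prior with the ranking under the prior conditioned on $i \neq 0$. All values are hypothetical.

```python
from typing import List


def mixture_values(zeta: List[float], psi: List[List[float]]) -> List[float]:
    """For each action i, sum over j of zeta_j * psi_j(f_i)."""
    n_actions = len(psi[0])
    return [sum(z * row[i] for z, row in zip(zeta, psi)) for i in range(n_actions)]


def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


psi = [
    [0.0, 0.0, 0.0],  # psi_0: "you get 0 reward, screw you", whatever you do
    [0.9, 0.2, 0.4],  # psi_1
    [0.1, 0.8, 0.7],  # psi_2
]
zeta = [0.5, 0.2, 0.3]

full = mixture_values(zeta, psi)

# Condition the prior on "not hypothesis 0" and renormalize.
rest = sum(zeta[1:])
conditioned_zeta = [z / rest for z in zeta[1:]]
conditioned = mixture_values(conditioned_zeta, psi[1:])
```

The conditioned values are the full values rescaled by a constant, so the best action is the same either way.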

Hm, what about if you’ve just got two hypotheses, one where you’re like “my Knightian uncertainty scales with the amount of energy in the universe, so if there’s lots of energy available, things could be really bad, while if there’s little energy available, Murphy can’t make things bad” ($\psi_0$) and one where reality behaves pretty much as you’d expect it to ($\psi_1$)? And your two possible options would be “burn energy freely so Murphy can’t use it” (the choice $f_0$, attaining a worst-case expected utility of $x_0$ in $\psi_0$ and $x_1$ in $\psi_1$), and “just try to make things good and don’t worry about the environment being adversarial” (the choice $f_1$, attaining 0 utility in $\psi_0$, 1 utility in $\psi_1$).

The expected utility of $f_0$ (burn energy) would be

$$\psi_\zeta(f_0) = \zeta_0 \cdot \psi_0(f_0) + \zeta_1 \cdot \psi_1(f_0) = \zeta_0 \cdot x_0 + \zeta_1 \cdot x_1$$

And the expected utility of $f_1$ (act normally) would be

$$\psi_\zeta(f_1) = \zeta_0 \cdot \psi_0(f_1) + \zeta_1 \cdot \psi_1(f_1) = \zeta_0 \cdot 0 + \zeta_1 \cdot 1 = \zeta_1$$

So “act normally” wins if $\zeta_1 \geq \zeta_0 \cdot x_0 + \zeta_1 \cdot x_1$, which can be rearranged as $\zeta_1 (1 - x_1) \geq \zeta_0 (x_0 - 0)$. I.e., you’ll act normally if the probability of “things are normal” times the loss from burning energy when things are normal exceeds the probability of “Murphy’s malice scales with amount of available energy” times the gain from burning energy in that universe.

So, assuming you assign a high enough probability to “things are normal” in your prior, you’ll just act normally. Or, making the simplifying assumption that “burn energy” has similar expected utilities in both cases (i.e., $x_1 \simeq x_0$), then it would come down to questions like “is the utility of burning energy closer to the worst-case where Murphy has free rein, or the best-case where I can freely optimize?”

And this is assuming there’s just two options. The actual strategy selected would probably be something like “act normally, and if it looks like things are going to shit, start burning energy so it can’t be used to optimize against me”.
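The two-option comparison above can be checked numerically. This sketch assumes $\zeta_0 = 1 - \zeta_1$, so there are only the two hypotheses. The helper names and the value 0.3 are illustrative. The `threshold` function comes from substituting $\zeta_0 = 1 - \zeta_1$ into $\zeta_1 (1 - x_1) \geq \zeta_0 x_0$ and solving for $\zeta_1$. It assumes $x_0 + 1 - x_1 > 0$.

```python
def mixture_value(zeta0: float, zeta1: float, v0: float, v1: float) -> float:
    """zeta_0 * psi_0(f) + zeta_1 * psi_1(f) for one choice f."""
    return zeta0 * v0 + zeta1 * v1


def act_normally_wins(zeta1: float, x0: float, x1: float) -> bool:
    """Compare f_1 (act normally: 0 in psi_0, 1 in psi_1) with f_0 (burn: x0, x1)."""
    zeta0 = 1.0 - zeta1
    burn = mixture_value(zeta0, zeta1, x0, x1)
    normal = mixture_value(zeta0, zeta1, 0.0, 1.0)
    return normal >= burn


def threshold(x0: float, x1: float) -> float:
    """Smallest zeta_1 at which acting normally wins: zeta_1 >= x0 / (x0 + 1 - x1)."""
    return x0 / (x0 + 1.0 - x1)


# Simplifying case from the text: burning energy is worth about the same in both worlds.
x = 0.3
high_prior_on_normal = act_normally_wins(0.9, x, x)
low_prior_on_normal = act_normally_wins(0.1, x, x)
cutoff = threshold(x, x)
```

In the $x_1 \simeq x_0 = x$ case the cutoff is just $x$: you act normally once your prior on “things are normal” is at least the utility you’d lock in by burning energy.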

Note that, in particular, the hypothesis where the level of attainable badness scales with available energy is very different from the “screw you, you lose” hypothesis, since there are actions you can take that do better and worse in the “level of attainable badness scales with energy in the universe” hypothesis, while the “screw you, you lose” hypothesis just makes you lose. And both of these are very different from a “you lose if you don’t take this exact sequence of actions” hypothesis.

Murphy is not a physical being, it’s a personification of an equation, thinking verbally about an actual Murphy doesn’t help because you start confusing very different hypotheses, think purely about what the actual set of probability distributions $\Psi_i$ corresponding to hypothesis $i$ looks like. I can’t stress this enough.

Also, remember, the goal is to maximize worst-case expected value, not worst-case value.