Any idea of how well this would generalize to stuff like Chicken or games with more than 2-players, 2-moves?

# Diffractor

# Countable Factored Spaces

I was subclinically depressed, acquired some bupropion from Canada, and it’s been extremely worthwhile.

# Confusions re: Higher-Level Game Theory

I don’t know, we’re hunting for it. Relaxations of dynamic consistency would be extremely interesting if found, and I’ll let you know if we turn up anything nifty.

Looks good.

Re: the dispute over normal bayesianism: For me, “environment” denotes “thingy that can freely interact with any policy in order to produce a probability distribution over histories”. This is a different type signature than a probability distribution over histories, which doesn’t have a degree of freedom corresponding to which policy you pick.

But for infra-bayes, we can associate a classical environment with the *set* of probability distributions over histories (for various possible choices of policy), and then the two distinct notions become the same sort of thing (set of probability distributions over histories, some of which can be made to be inconsistent by how you act), so you can compare them.
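
A minimal sketch of that difference in type signature, using a made-up one-step environment. The names `environment`, `ACTIONS`, `OBSERVATIONS`, and the observation probabilities are all assumptions for illustration, not anything from the posts. The environment is a function from policies to distributions over histories, and hard-coding each policy gives a set of such distributions:

```python
from itertools import product

ACTIONS = (0, 1)
OBSERVATIONS = ("lo", "hi")

def environment(policy):
    """Toy one-step environment: takes a policy, returns a distribution over histories.

    A policy is a dict {action: probability}; a history is (action, observation).
    The observation probabilities below are invented for illustration.
    """
    p_hi_given_action = {0: 0.2, 1: 0.9}
    dist = {}
    for action, obs in product(ACTIONS, OBSERVATIONS):
        p_hi = p_hi_given_action[action]
        p_obs = p_hi if obs == "hi" else 1 - p_hi
        dist[(action, obs)] = policy[action] * p_obs
    return dist

# Hard-code each deterministic policy to get a *set* of distributions over histories.
deterministic_policies = [{0: 1.0, 1: 0.0}, {0: 0.0, 1: 1.0}]
history_distributions = [environment(pi) for pi in deterministic_policies]
```

Each element of `history_distributions` assigns probability 0 to every history containing the action its policy never takes, which is the sense in which a distribution can be made inconsistent by how you act.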

I’d say this is mostly accurate, but I’d amend number 3. There’s still a sort of non-causal influence going on in pseudocausal problems, you can easily formalize counterfactual mugging and XOR blackmail as pseudocausal problems (you need acausal specifically for *transparent Newcomb*, not vanilla Newcomb). But it’s specifically a sort of influence that’s like “reality will adjust itself so contradictions don’t happen, and there may be correlations between what happened in the past, or other branches, and what your action is now, so you can exploit this by acting to make bad outcomes inconsistent”. It’s purely action-based, in a way that manages to capture some but not all weird decision-theoretic scenarios.

In normal bayesianism, you do *not* have a pseudocausal-causal equivalence. Every ordinary environment is straight-up causal.

Re points 1, 2: Check this out. For the specific case of 0 to even bits, ??? to odd bits, I think Solomonoff can probably get that, but not more general relations.

Re: point 3, Solomonoff is about stochastic environments that just take your action as an input, and aren’t reading your policy. For infra-Bayes, you can deal with policy-dependent environments without issue, as you can consider hard-coding in every possible policy to get a family of stochastic environments, and UDT behavior naturally falls out as a result from this encoding. There’s still some open work to be done on which sorts of policy-dependent environments like this are learnable (inferrable from observations), but it’s pretty straightforward to cram all sorts of weird decision-theory scenarios in as infra-Bayes hypotheses, and do the right thing in them.

Ah. So, low expected utility alone isn’t too much of a problem. The amount of weight a hypothesis has in a prior after updating depends on the *gap* between the best-case values and worst-case values. Ie, “how much does it matter what happens here”. So, the stuff that withers in the prior as you update are the hypotheses that are like “what happens now has negligible impact on improving the worst-case”. So, hypotheses that are like “you are screwed no matter what” just drop out completely, as if it doesn’t matter what you do, you might as well pick actions that optimize the *other* hypotheses that aren’t quite so despondent about the world.

In particular, if all the probability distributions in a set are like “this thing that just happened was improbable”, the hypothesis takes a big hit in the posterior, as all the a-measures are like “ok, we’re in a low-measure situation now, what happens after this point has negligible impact on utility”.

I still need to better understand how updating affects hypotheses which are a big set of probability distributions so there’s always one probability distribution that’s like “I correctly called it!”.

The motivations for different g are:

If g is your actual utility function, then updating with g as your off-event utility function grants you dynamic consistency. Past-you never regrets turning over the reins to future you, and you act just as UDT would.

If g is the constant-1 function, then that corresponds to updates where you don’t care at all what happens off-history (the closest thing to normal updates), and both the “diagonalize against knowing your own action” behavior in decision theory and the Nirvana trick pop out for free from using this update.

“mixture of infradistributions” is just an infradistribution, much like how a mixture of probability distributions is a probability distribution.

Let’s say we’ve got a prior $\zeta$, a probability distribution over indexed hypotheses.

If you’re working in a vector space, you can take any countable collection of sets in said vector space, and mix them together according to a prior giving a weight to each set. Just make the set of all points which can be made by the process “pick a point from each set, and mix the points together according to the probability distribution $\zeta$”.

For infradistributions as sets of probability distributions or a-measures or whatever, that’s a subset of a vector space. So you have a bunch of sets $\Psi_i$, and you just mix the sets together according to $\zeta$, that gives you your set $\Psi_\zeta$.

If you want to think about the mixture in the concave functional view, it’s even nicer. You have a bunch of $\psi_i$ which are “hypothesis $i$ can take a function $f$ and output what its worst-case expectation value is”. The mixture of these, $\psi_\zeta$, is simply defined as $\psi_\zeta(f) := \mathbb{E}_{i\sim\zeta}[\psi_i(f)]$. This is just mixing the functions together!

Both of these ways of thinking of mixtures of infradistributions are equivalent, and recover mixture of probability distributions as a special case.
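
A small numerical sketch of that equivalence, under simplifying assumptions: each hypothesis is a finite list of probability distributions over a finite outcome set (standing in for its convex hull, which has the same worst-case expectations), and the helper names `worst_case`, `mix_functionals`, and `mix_sets` are made up for this example.

```python
from itertools import product

def expectation(dist, f):
    """Expected value of f under one probability distribution."""
    return sum(p * f[x] for x, p in dist.items())

def worst_case(dists, f):
    """Worst-case expectation of f over a finite set of distributions."""
    return min(expectation(d, f) for d in dists)

def mix_functionals(prior, hypotheses, f):
    """Concave-functional view: mix the worst-case expectations by the prior."""
    return sum(w * worst_case(h, f) for w, h in zip(prior, hypotheses))

def mix_sets(prior, hypotheses):
    """Set view: pick one point from each set, mix the picks by the prior."""
    mixed = []
    for picks in product(*hypotheses):
        outcomes = set().union(*(d.keys() for d in picks))
        mixed.append(
            {x: sum(w * d.get(x, 0.0) for w, d in zip(prior, picks)) for x in outcomes}
        )
    return mixed

prior = [0.25, 0.75]
hypotheses = [
    [{"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}],  # total uncertainty between a and b
    [{"a": 0.5, "b": 0.5}],                        # an ordinary probability distribution
]
f = {"a": 1.0, "b": 0.0}

via_functionals = mix_functionals(prior, hypotheses, f)
via_sets = worst_case(mix_sets(prior, hypotheses), f)
```

Both routes give 0.375 here: the first hypothesis contributes its worst case of 0 with weight 0.25, and the second contributes 0.5 with weight 0.75. When every hypothesis is a single distribution, `mix_sets` returns a single distribution, which is the ordinary mixture.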

The concave functional view is “the thing you do with a probability distribution is take expectations of functions with it. In fact, it’s actually possible to identify a probability distribution with the function mapping a function to its expectation. Similarly, the thing we do with an infradistribution is take expectations of functions with it. Let’s just look at the behavior of the function we get, and neglect the view of everything as a set of a-measures.”

As it turns out, this view makes proofs a whole lot cleaner and tidier, and you only need a few conditions on a function like that for it to have a corresponding set of a-measures.

# The Many Faces of Infra-Beliefs

Sounds like a special case of crisp infradistributions (ie, all partial probability distributions have a unique associated crisp infradistribution)

Given some partial probability distribution $p$, we can consider the (nonempty) set of probability distributions equal to $p$ where $p$ is defined. This set is convex (clearly, a mixture of two probability distributions which agree with $p$ about the probability of an event will also agree with $p$ about the probability of that event).

Convex (compact) sets of probability distributions = crisp infradistributions.

You’re completely right that hypotheses with unconstrained Murphy get ignored because you’re doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your “-1,000,000 vs −999,999 is the same sort of problem as 0 vs 1” reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the “inf” part of the definition of expected value, and writing actual equations. $\Psi_i$ is the available set of possibilities for a hypothesis. If you really want to, you can think of this as constraints on Murphy, and Murphy picking from available options, but it’s highly encouraged to just work with the math.

For mixing hypotheses (several different sets of possibilities $\Psi_i$) according to a prior distribution $\zeta$, you can write it as an expectation functional via $\psi_\zeta(f) := \mathbb{E}_{i\sim\zeta}[\psi_i(f)]$ (mix the expectation functionals $\psi_i$ of the component hypotheses according to your prior on hypotheses), or as a set via $\Psi_\zeta := \{\mathbb{E}_{i\sim\zeta}[x_i] \mid \forall i: x_i \in \Psi_i\}$ (the available possibilities for the mix of hypotheses are all of the form “pick a possibility from each hypothesis, mix them together according to your prior on hypotheses”).

This is what I meant by “a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked”: that set $\Psi_\zeta$ (your mixture of hypotheses according to a prior) corresponds to selecting one of the $\Psi_i$ sets according to your prior $\zeta$, and then Murphy picking freely from the set $\Psi_i$.

Using $\psi_\zeta(f) = \mathbb{E}_{i\sim\zeta}[\psi_i(f)]$ (and considering our choice of what to do affecting the choice of $f$, we’re trying to pick the best function $f$) we can see that if the prior is composed of a bunch of “do this sequence of actions or bad things happen” hypotheses, the details of what you do sensitively depend on the probability distribution over hypotheses. Just like with AIXI, really.

Informal proof: if $\psi_i(f_i) = 1$ and $\psi_i(f_j) = 0$ (assuming $i \neq j$), then we can see that

$$\psi_\zeta(f_i) = \mathbb{E}_{j\sim\zeta}[\psi_j(f_i)] = \zeta_i$$

and so, the best sequence of actions to do would be the one associated with the “you’re doomed if you don’t do blahblah action sequence” hypothesis with the highest prior. Much like AIXI does.

Using the same sort of thing, we can also see that if there’s a maximally adversarial hypothesis in there somewhere that’s just like “you get 0 reward, screw you” no matter what you do (let’s say this is $\psi_0$), then we have

$$\psi_\zeta(f) = \mathbb{E}_{i\sim\zeta}[\psi_i(f)] = \zeta_0 \cdot 0 + \sum_{i\neq 0}\zeta_i\,\psi_i(f) = (1-\zeta_0)\,\mathbb{E}_{i\sim\zeta|_{i\neq 0}}[\psi_i(f)]$$

And so, that hypothesis drops out of the process of calculating the expected value, for all possible functions/actions. Just do a scale-and-shift, and you might as well be dealing with the prior $\zeta|_{i\neq 0}$, which a-priori assumes you aren’t in the “screw you, you lose” environment.
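
A quick numerical check of that claim, as a sketch with made-up numbers. The policy names, the value table, and the prior are all hypothetical; each row lists a policy’s worst-case expected value under each hypothesis, with column 0 being the “0 reward no matter what” hypothesis.

```python
def mixture_value(prior, hypothesis_values):
    """Prior-weighted mix of one policy's worst-case expected values."""
    return sum(w * v for w, v in zip(prior, hypothesis_values))

# Worst-case expected value of each policy under each hypothesis (invented numbers).
# Column 0 is the doomed hypothesis: 0 reward whatever you do.
values = {
    "policy_a": [0.0, 0.9, 0.1],
    "policy_b": [0.0, 0.2, 0.8],
    "policy_c": [0.0, 0.5, 0.5],
}
prior = [0.5, 0.2, 0.3]

# Condition the prior on not being in the doomed hypothesis.
rest = 1.0 - prior[0]
conditioned = [w / rest for w in prior[1:]]

full = {name: mixture_value(prior, v) for name, v in values.items()}
reduced = {name: mixture_value(conditioned, v[1:]) for name, v in values.items()}

best_full = max(full, key=full.get)
best_reduced = max(reduced, key=reduced.get)
```

Every entry of `full` equals `rest` times the matching entry of `reduced`, so the ranking of policies, and hence the chosen policy, is the same with or without the doomed hypothesis.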

Hm, what about if you’ve just got two hypotheses, one where you’re like “my knightian uncertainty scales with the amount of energy in the universe so if there’s lots of energy available, things could be really bad, while if there’s little energy available, Murphy can’t make things bad” ($\psi_1$) and one where reality behaves pretty much as you’d expect it to ($\psi_2$)? And your two possible options would be “burn energy freely so Murphy can’t use it” (the choice $f_1$, attaining a worst-case expected utility of $a$ in $\psi_1$ and $b$ in $\psi_2$), and “just try to make things good and don’t worry about the environment being adversarial” (the choice $f_2$, attaining 0 utility in $\psi_1$, 1 utility in $\psi_2$).

With $\zeta_1$ and $\zeta_2$ the prior probabilities of $\psi_1$ and $\psi_2$, the expected utility of $f_1$ (burn energy) would be

$$\psi_\zeta(f_1) = \zeta_1 a + \zeta_2 b$$

And the expected utility of $f_2$ (act normally) would be

$$\psi_\zeta(f_2) = \zeta_1 \cdot 0 + \zeta_2 \cdot 1 = \zeta_2$$

So “act normally” wins if $\zeta_2 > \zeta_1 a + \zeta_2 b$, which can be rearranged as $\zeta_2 (1-b) > \zeta_1 a$. Ie, you’ll act normally if the probability of “things are normal” times the loss from burning energy when things are normal exceeds the probability of “Murphy’s malice scales with amount of available energy” times the gain from burning energy in that universe.

So, assuming you assign a high enough probability to “things are normal” in your prior, you’ll just act normally. Or, making the simplifying assumption that “burn energy” has similar expected utilities in both cases (ie, $a \approx b$), then it would come down to questions like “is the utility of burning energy closer to the worst-case where Murphy has free rein, or the best-case where I can freely optimize?”
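
The comparison can be sketched in a few lines. The parameter names are assumptions for this example: `p_normal` is the prior probability of “things are normal”, `a` is the worst-case expected utility of burning energy in the “Murphy’s malice scales with energy” hypothesis, and `b` is the utility of burning energy when things are normal. Acting normally is taken to score 0 and 1 in the two hypotheses, as in the text.

```python
def burn_value(p_normal, a, b):
    """Prior-weighted worst-case expected utility of burning energy."""
    return (1 - p_normal) * a + p_normal * b

def normal_value(p_normal):
    """Acting normally scores 0 if Murphy scales with energy, 1 if things are normal."""
    return (1 - p_normal) * 0.0 + p_normal * 1.0

def act_normally_wins(p_normal, a, b):
    """Direct comparison of the two mixture values."""
    return normal_value(p_normal) > burn_value(p_normal, a, b)

def rearranged_condition(p_normal, a, b):
    """P(normal) * (loss from burning when normal) > P(Murphy) * (gain from burning)."""
    return p_normal * (1 - b) > (1 - p_normal) * a
```

For example, `act_normally_wins(0.9, 0.6, 0.7)` is true while `act_normally_wins(0.2, 0.6, 0.7)` is false, and `rearranged_condition` agrees in both cases.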

And this is assuming there are just two options. The actual strategy selected would probably be something like “act normally, if it looks like things are going to shit, start burning energy so it can’t be used to optimize against me”.

Note that, in particular, the hypothesis where the level of attainable badness scales with available energy is very different from the “screw you, you lose” hypothesis, since there are actions you can take that do better and worse in the “level of attainable badness scales with energy in the universe” hypothesis, while the “screw you, you lose” hypothesis just makes you lose. And both of these are very different from a “you lose if you don’t take this exact sequence of actions” hypothesis. *Murphy is not a physical being, it’s a personification of an equation, thinking verbally about an actual Murphy doesn’t help because you start confusing very different hypotheses, think purely about what the actual set of probability distributions $\Psi_i$ corresponding to hypothesis $i$ looks like*. I can’t stress this enough.

Also, remember, the goal is to maximize worst-case **expected** value, not worst-case value.

# Inframeasures and Domain Theory

# Infra-Domain proofs 1

# Infra-Domain Proofs 2

There’s actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things, I can send you a draft if you want. “Belief Functions and Decision Theory” is a post that hasn’t held up nearly as well to time as “Basic Inframeasure Theory”.

If you use the Anti-Nirvana trick, your agent just goes “nothing matters at all, the foe will mispredict and I’ll get -infinity reward” and rolls over and cries since all policies are optimal. Don’t do that one, it’s a bad idea.

For the concave expectation functionals: Well, there’s another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. Ie, all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment works and assuming you get worst-case outcomes from the parts you can’t predict.

For your concrete example, that’s why you have multiple hypotheses that are learnable. Sure, one of your hypotheses might have complete knightian uncertainty over the odd bits, but another hypothesis might not. Betting on the odd bits is advised by a more-informative hypothesis, for sufficiently good bets. And the policy selected by the agent would probably be something like “bet on the odd bits occasionally, and if I keep losing those bets, stop betting”, as this wins in the hypothesis where some of the odd bits are predictable, and doesn’t lose too much in the hypothesis where the odd bits are completely unpredictable and out to make you lose.

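In the proof of Lemma 3, it should be

“Finally, since $\chi^F_C(z,z)=z$, we have that $\operatorname{poly}^F_C(z)\cdot\operatorname{poly}^F_{B\setminus C}(z)=Q^F_z$.

Thus, $Q^F_z\cdot Q^F_{x\cap y\cap z}$ and $Q^F_{x\cap z}\cdot Q^F_{y\cap z}$ are both equal to $\operatorname{poly}^F_C(x\cap z)\cdot\operatorname{poly}^F_{B\setminus C}(y\cap z)\cdot\operatorname{poly}^F_C(z)\cdot\operatorname{poly}^F_{B\setminus C}(z)$.”

instead.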