Proof Section to Formalizing Newcombian Problems with Fuzzy Infra-Bayesianism

This proof section accompanies Formalizing Newcombian problems with fuzzy infra-Bayesianism. We prove the following result.

Theorem [Alexander Appel (@Diffractor), Vanessa Kosoy (@Vanessa Kosoy)]:

Let be a Newcombian problem of horizon that satisfies pseudocausality. Let denote the associated supra-POMDP with infinite time horizon and time discount Then

Furthermore, if is a family of policies such that then

Proof: Let denote the empty history. Given a supracontribution , let denote the set of maximal extreme points of First we remark that for any supra-POMDP, without loss of generality, a set of copolicies can always be replaced by

Given an episode policy let denote the episode copolicy that initializes the state to i.e. Let denote the distribution over outcomes determined by the interaction of and Note that the expected loss with respect to is equal to the expected loss for the Newcombian problem, i.e.

Recall that throughout this sequence, we assume that is finite. By the remark at the beginning of the proof, the expected loss in one episode for the corresponding supra-POMDP can be written as a maximum expected loss over a finite set of -copolicies Namely,

Then

and thus for any episode policy

We now extend this analysis to the optimal loss over episodes for [1] Let denote the episode optimal loss for Let be an arbitrary policy for episodes of Then as before,

where the maximum is over a finite set of -episode copolicies By the single episode case,

and thus

It remains to show that the opposite inequality holds in the many-episode and limit.
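The upper-bound half of the argument above rests on writing the worst-case expected loss as a maximum of ordinary expected losses over a finite set of copolicies. The following is a toy numerical sketch of that single step, not part of the original proof. It assumes each copolicy is represented only by the outcome distribution it induces against a fixed policy; the names `expected_loss`, `worst_case_loss`, and the two example distributions are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

Outcome = Hashable
Distribution = Dict[Outcome, float]  # outcome -> probability (assumed to sum to 1)


def expected_loss(dist: Distribution, loss: Callable[[Outcome], float]) -> float:
    """Ordinary expected loss of a single outcome distribution."""
    return sum(prob * loss(outcome) for outcome, prob in dist.items())


def worst_case_loss(
    dists: Sequence[Distribution], loss: Callable[[Outcome], float]
) -> float:
    """Maximum expected loss over a finite, non-empty set of distributions.

    Each distribution stands in for the outcomes induced by one copolicy
    interacting with a fixed policy (an assumption of this sketch).
    """
    if not dists:
        raise ValueError("need at least one distribution")
    return max(expected_loss(d, loss) for d in dists)


# Hypothetical toy data: two copolicies, two outcomes, losses in [0, 1].
loss_table = {"good": 0.0, "bad": 1.0}
candidate_dists = [
    {"good": 0.75, "bad": 0.25},  # induced by a first hypothetical copolicy
    {"good": 0.5, "bad": 0.5},    # induced by a second hypothetical copolicy
]

worst = worst_case_loss(candidate_dists, loss_table.__getitem__)
print(worst)
```

Because the set is finite, the maximum is attained, which is the property the proof uses when it replaces a supremum by a maximum.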

Recall that given we define

Recall that since satisfies pseudocausality, there exists a -optimal policy such that for all if then is also optimal for Consequently, for any episode copolicy either or To see this, suppose there exists an episode copolicy such that Then there exists a policy such that and . By pseudocausality, Thus

Define

By the remark at the beginning of the proof, the relevant set of copolicies in the definition of is finite, and thus is well-defined. If then Thus

Consider the iterated Newcombian problem over episodes. Let denote the multi-episode policy such that restricted to every episode is Let denote an arbitrary copolicy that interacts with Furthermore, let denote the number of episodes for which the episode-restriction of interacting with satisfies [2]

We have

Furthermore,

We leave it to the reader to verify that

  1. Recall that if and is given by then the loss over episodes with geometric time discount is defined by

  2. A copolicy can depend on the past, meaning it can depend on the policy. Thus can depend on .
