This is very cool, and I haven’t digested it yet, but I wonder if it might be open to the criticism that you’re effectively postulating the favored answer to Newcomb’s Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb’s Problem (and similarly in the Prisoner’s Dilemma) is to justify the inference “If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)” rather than “If I choose both boxes, then the simulation doesn’t necessarily match my choice (even though in fact it does)”. It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it, an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie’s (1991) proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I’m not mistaken, your proposal seems to be a formalization of that idea.)
Here’s an alternative proposal.
Metacircular Decision Theory (MCDT)
For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI—that is, we’ll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in nonesoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).
Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, has a preference scheme that assigns a value to the set of facts, and is wired to select an action: specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.
But without further constraint, this process often leads to a contradiction. Suppose the agent’s repertoire of actions is A1, …An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: “Suppose I choose A6. I know I’m a utility-maximizing agent, and I already know there’s another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7.” But that inference, while valid, contradicts the fact that A6’s value is 6.
Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).
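To make the dependence on omitted facts concrete, here is a minimal Python sketch of the A1, …An toy example with n = 7. The function name `infer_value_bounds` and the two boolean flags are my own illustrative assumptions, not part of any established formalism; the sketch only tracks the lower and upper bounds on val(Ai) that the retained facts license.

```python
# Toy model: actions A1..A7, where the true value of Ai is i.
N = 7
TRUE_VALUE = {i: i for i in range(1, N + 1)}


def infer_value_bounds(hypothetical, keep_own_value, keep_maximizer):
    """Return (lower, upper) bounds on val(A_hypothetical) under retained facts.

    keep_own_value: retain the fact val(Ai) = i for the hypothetical action.
    keep_maximizer: retain the fact "I am a utility maximizer".
    Both flags are hypothetical names introduced for this illustration.
    """
    lower, upper = float("-inf"), float("inf")
    if keep_own_value:
        lower = upper = TRUE_VALUE[hypothetical]
    if keep_maximizer:
        # "I chose this action and I maximize, so it is at least as good
        # as the best alternative whose value I already know."
        best_other = max(v for a, v in TRUE_VALUE.items() if a != hypothetical)
        lower = max(lower, best_other)
    return lower, upper


def is_consistent(bounds):
    """Bounds are contradictory when the lower bound exceeds the upper."""
    lower, upper = bounds
    return lower <= upper


# Retaining both facts about A6 yields the contradiction (>= 7 and == 6).
both = infer_value_bounds(6, keep_own_value=True, keep_maximizer=True)
# Omitting val(A6) = 6 licenses val(A6) >= 7.
no_value = infer_value_bounds(6, keep_own_value=False, keep_maximizer=True)
# Omitting the maximizer fact licenses val(A6) = 6.
no_maximizer = infer_value_bounds(6, keep_own_value=True, keep_maximizer=False)
```

Only the first case is contradictory; the other two are each internally consistent but disagree about val(A6), which is the sense in which different omissions yield different preferred actions.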
So this is the usual full-blown problem of counterfactual inference: which things do we “hold fixed” when contemplating a counterfactual antecedent, and which do we “let vary” for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent’s internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).
MCDT’s proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It’s ok if that’s intractable or uncomputable; the agent can muddle through with some approximate algorithm.)
EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)
EDIT2: Or rather, that consistency check can probably substitute for the additional meta-iterations.
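As a rough illustration of the meta-choice plus the consistency check from EDIT1, here is a Python sketch over the same A1, …A7 toy example. Everything named here (`inferred_value`, `consistent_with_all_facts`, `mcdt`, the two policy labels, and the tie-breaking rule) is an assumption I am introducing for the sketch; it performs a single meta-level pass, in the spirit of EDIT2, rather than iterating to a fixed point.

```python
N = 7
VALUE = {i: i for i in range(1, N + 1)}  # true value of action Ai is i


def inferred_value(policy, action):
    """Value the agent infers for a hypothetical action under a policy."""
    if policy == "omit_own_value":
        # Keeps "I maximize": the action must match the best alternative.
        return max(v for a, v in VALUE.items() if a != action)
    if policy == "omit_maximizer":
        # Keeps val(Ai) = i and drops the maximizer fact.
        return VALUE[action]
    raise ValueError(f"unknown policy: {policy}")


def consistent_with_all_facts(action):
    """With nothing omitted, a maximizer can only pick a top-valued action."""
    return VALUE[action] == max(VALUE.values())


def first_order_choice(policy, tie_break):
    """Pick the action with the highest inferred value under the policy."""
    best = max(inferred_value(policy, a) for a in VALUE)
    winners = [a for a in VALUE if inferred_value(policy, a) == best]
    return tie_break(winners)


def mcdt(policies, tie_break=max):
    """Return (policy, action), or None if every policy is eliminated."""
    surviving = {}
    for policy in policies:
        choice = first_order_choice(policy, tie_break)
        if consistent_with_all_facts(choice):  # the EDIT1 check
            surviving[policy] = choice
    if not surviving:
        return None
    best_policy = max(surviving, key=lambda p: VALUE[surviving[p]])
    return best_policy, surviving[best_policy]


result = mcdt(["omit_own_value", "omit_maximizer"])
```

Under "omit_own_value" with this tie-break the first-order winner is A6, which the consistency check rejects, so that meta-level candidate is eliminated and "omit_maximizer" with A7 survives.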
So e.g. in Newcomb’s Problem or the Prisoner’s Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation’s/other’s (but omits facts about which particular choice is made by both) than if it makes the CDT choice and omits the fact about equivalence, but keeps the facts about the simulation’s/other’s choice (or keeps some probability distribution about the simulation’s/other’s choice).
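The comparison for Newcomb’s Problem can be sketched in a few lines of Python. The payoffs ($1,000,000 and $1,000) are the conventional ones and are an assumption on my part, as are all the function names; the sketch assumes the simulation in fact matches the agent’s choice.

```python
BIG, SMALL = 1_000_000, 1_000  # conventional payoffs, assumed for illustration
CHOICES = ["one-box", "two-box"]


def payoff(my_choice, predicted):
    """Money received given my choice and the simulation's choice."""
    opaque_box = BIG if predicted == "one-box" else 0
    return opaque_box + (SMALL if my_choice == "two-box" else 0)


def choose_retaining_equivalence():
    """Keep the equivalence fact; omit which particular choice is made.

    The simulation's choice then covaries with the hypothetical choice.
    """
    return max(CHOICES, key=lambda c: payoff(c, c))


def choose_retaining_prediction(p_one_box):
    """Omit the equivalence fact; hold a distribution over the simulation's
    choice fixed (the CDT-style evaluation)."""
    def expected(c):
        return (p_one_box * payoff(c, "one-box")
                + (1 - p_one_box) * payoff(c, "two-box"))
    return max(CHOICES, key=expected)


def actual_outcome(choice):
    """What the agent really gets, given that the simulation matches it."""
    return payoff(choice, choice)


equivalence_choice = choose_retaining_equivalence()
cdt_choice = choose_retaining_prediction(0.5)
meta_comparison = {
    "retain_equivalence": actual_outcome(equivalence_choice),
    "retain_prediction": actual_outcome(cdt_choice),
}
```

Holding the prediction fixed recommends two-boxing for any value of `p_one_box`, yet the agent that evaluates that way ends up with the smaller amount, which is the basis for the meta-level preference for retaining the equivalence fact.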
In other words, metacircular consistency isn’t just a test that we’d like the decision theory to pass. Metacircular consistency is the theory; it is the algorithm.
Replied at http://lesswrong.com/lw/164/timeless_decision_theory_and_metacircular/
To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).