Breaking Newcomb’s Problem with Non-Halting states
Note: I dug around quite a bit and haven’t found anyone who previously discussed this, but if it’s been talked about previously somewhere links would be appreciated.
Epistemic Status: Interesting Conjecture, I think my logic is all sound but please check me
I think I discovered a toy decision theory agent that generates non-halting states under certain conditions and is capable of breaking various game theory problems in some novel ways.
Imagine an idealized agent employing what I’m informally calling conditional decision theory. Lets name her Marion. Marion goes into Newcomb’s Problem with the following algorithm:
“If I Predict that the Predictor has put money in both boxes, I will take both boxes, because that would result in gaining $1,001,000, however, if I predict that the Predictor has only put money in one box, I will only take box B, since this will result in the accurate prediction that I will take one box, which will mean that both boxes contain money and allow me to take both boxes.”
The Predictor wants to fill Box B conditional on its prediction that Marion doesn’t also take Box A. Marion wants to take Box A conditional on her prediction that Box B is filled.
If Marion takes both boxes, The Predictor will have predicted that she takes both boxes, causing it to not fill box B. Marion knows that the Predictor will predict this, so she only takes box B. The Predictor knows Marion will only take Box B, therefore box B is filled. Marion knows this and thus takes both boxes. The Predictor knows this so it leaves box B empty. Marion knows box B will be empty and thus only takes Box B. The Predictor knows this so it fills box B. Marion knows this so she takes both boxes.
This is, as far as I can tell, non-halting under the premise that both agents have identical predictive power. Marion can’t win using this algorithm, but she can keep the Predictor from winning. If Marion and The Predictor have identical predictive and reasoning capability, then the answer as to “What is the outcome?” is unfalsifiable, the retrocausal inferences recurse to infinity and neither one of them can accurately predict the other.
This only works assuming you’re basically playing two Predictors against one another, in the real world the predictive errors stack up with each successive recursion until one of them falls out of equilibrium and fails to accurately predict the other.
If you add a third conditional clause to Marion you can patch this out:
”″If I Predict that the Predictor has put money in both boxes, I will take both boxes, because that would result in gaining $1,001,000, however, if I predict that the Predictor has only put money in one box, I will only take box B, since this will result in the accurate prediction that I will take one box, which will mean that both boxes contain money and allow me to take both boxes. I will execute this algorithm unless doing so produces non-halting states, in which case I will only take Box B since this creates a schelling point at my next most preferred worldstate.”
Lets try out Marion in Solomon’s Problem:
”If I predict that chewing gum reduces throat lesions, I will chew gum, if I predict that chewing gum increases throat lesions, I will not chew gum.”
Huh, this is unusually straightforward and basically cashes out to the litany of Tarski. Wonder what’s up with that? Lets try her in Parfit’s Hitchhiker:
“If I predict that Paul Ekman will give me a ride out of the desert, I will not pay him $100, however, if predict that he will not give me a ride out of the desert unless I pay him $100, I will pay him $100.”
Paul Ekman accurates predicts that Marion won’t pay him $100, so he doesn’t agree to give her a ride, so she will pay him $100, so he will give her a ride, so she won’t pay him $100, so he won’t give her a ride. This is again, non-halting, and can be patched the same way we patched Newcomb’s problem. We just say “If this produces non-halting states, move on to the best outcome that doesn’t.” The next best outcome for Marion to getting out of the desert for free is getting out of the desert at all, so Marion gets a ride out of the desert and pays Paul.
Alright, last one, lets test Marion against Ziz’s Bear Attack
“If Evil Emperor Paul Ekman believes me when I tell him that I will bare my neck and let myself be mauled to death, then I will let myself be mauled to death. However, if Emperor Ekman doesn’t believe me when I tell him that I will bare my neck, than I will run from the bear and attempt to escape.”
Parable of the Dagger, throw her to the bears at once!
Sure, in a real world scenario, you can’t usually one up your captors with semantic games like this. However, if we’re treating this as a purely Newcomblike hypothetical, then this breaks Paul’s ability to predict Marion while also optimizing for what he wants (entertainment via bear maulings). If Emperor Ekman believes Marion, than he doesn’t throw her to the bears because he’s not entertained by it. If he doesn’t believe Marion, he gets entertained, so he should want to believe that he doesn’t believe Marion. However this would require him to predict her inaccurately, and Paul Ekman is famously accurate in his predictions. In the real world he can just toss that out and Parable of the Dagger her to the bears anyway “well let’s test it and see” but if he’s purely operating on his predictions, then he accurately predicts that Marion will just stand there and he doesn’t throw her to the bears. Okay that is halting. However that’s also not the scenario.
In the original scenario, logical time has already ticked forward to a place after Paul Ekman has made his predictions. In the original scenario, he already doesn’t believe Marion. A typical timeless decision theory agent would not have a conditional attached to their timeless algorithm specifying that they should run if Paul Ekman doesn’t believe them. This agent stands there lamenting that they wish they had a different decision theory and lets themselves be mauled to death by a bear in order to delete the counterfactual timeline.
Wait...what? That can’t be right, counterfactuals are by definition not real. If you think you’re living in a counterfactual timeline, you are incorrect. There is only one flow of time, you can’t go back and anthropic principle a different version of yourself into existence at the expense of your current world, there is just the world.
In this scenario, Marion knows that Paul Ekman has already incorrectly predicted her, so the non-halting state created by the indeterminability of the conditional has already collapsed and her best option is to try and run from the bear. What is the difference between this scenario and Newcomb’s problem?
In Newcomb’s Problem, the non-halting state is collapsed by Marion concluding that taking box B alone is the best option, which means that the Predictor has filled Box B. In Ziz’s Bear Attack, the non-halting state is collapsed by Paul Ekman predicting either correctly or incorrectly that Marion will bare her neck to the bear. In both scenarios, the act of deciding collapses the non-halting state, however in Newcomb’s the decider is Marion, and in Ziz’s its Paul Ekman, who essentially forces Marion’s conditional computation into one state or another based on his prediction of her. The timeless agent is forced to iterate their decision tree in this scenario as if the non-halting state has not already collapsed. This forced entanglement means the timeless agent believes they should let themselves be eaten in order to maximize their chances of survival, despite having strong evidence for not being in a world where this would help and this being a completely incoherent position.
The Ziz’s Bear Attack scenario is interesting because it presents what is to a naive reasoner clearly a bad move and calls it actually the rational thing to do, in a very similar way to how Casual theorists will insist that actually two-boxing Newcomb’s problem is the rational choice. Well my goal isn’t to do the rational thing it’s to win! Getting eaten by a bear to avoid a counterfactual that I am already in the midst of isn’t winning, it’s fractally engineered losing.
So Marion with third conditional chews gum and one-boxes but still runs from bears released by evil emperors and gets rescued from the desert in exchange for paying $100. This seems really solid. I don’t think conditional decision theory Marion is anything like an ideal rational agent in the sense of always winning, but she seems pretty good at getting the right outcomes on these games or else breaking them in interesting ways.
I don’t know what you can do with all this, and really I just wanted to start a discussion around the topic and see if it sparked any interesting insights or if I’m wrong on something in how I’m thinking about these scenarios. I will close with this postulate: an actually ideal decision theory should let you actually win Newcomb’s problem. That is, I don’t believe it is logically impossible to create a decision theory which lets you reliably win $1,001,000 on Newcomb’s problem.