I can’t help but notice that Transparent Newcomb seems flawed: namely, it seems impossible to have a very accurate predictor, even if the predictor is capable of perfectly simulating your brain.
Someone who doesn’t care about the money and only wants to spite the predictor could precommit to the following strategy:

If I see that the big box is empty, I’ll take one box. If I see that the big box is full, I’ll take both boxes.

Then, the predictor has a 0% chance of being correct, which is far from “very accurate”. (Of course, there could be some intervention which forces you to choose against your will, but that would defeat the whole point of the thought experiment if you can’t enforce your decisions.)
Anyway, this is just poking holes in Transparent Newcomb and probably unrelated to the reflexive inconsistency and preference-changing mentioned in the post, as I suspect you could find some other thought experiment that arrives at the same conclusions. But I’m curious whether anyone has mentioned this apparent paradox in Transparent Newcomb before, and whether there’s an agreed-upon “solution” to it.
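To make this concrete, here’s a toy sketch under the naive reading that the big box is full iff the predictor expects you to one-box in the situation you actually face (the function and variable names here are mine, purely illustrative):

```python
# Toy check of the "0% accurate" claim under the naive reading described above.
# All names are illustrative, not part of the thought experiment's statement.

def spite_strategy(big_box_full: bool) -> str:
    # One-box on seeing an empty big box, two-box on seeing a full one.
    return "one-box" if not big_box_full else "two-box"

for predicts_one_box in (True, False):
    big_box_full = predicts_one_box  # naive reading: full iff one-boxing predicted
    predicted = "one-box" if predicts_one_box else "two-box"
    actual = spite_strategy(big_box_full)
    print(predicted, actual, predicted == actual)  # never equal
```

Whichever way the predictor calls it, the spite player does the opposite, which is the sense in which it can never be “very accurate” against this strategy.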
Isn’t this identical to the proof for why there’s no general algorithm for solving the Halting Problem?
The Halting Problem asks for an algorithm A(S, I) that, when given the source code S and input I for another program, will report whether S(I) halts (vs. runs forever).

There is a proof that says A does not exist. There is no general algorithm for determining whether an arbitrary program will halt. “General” and “arbitrary” are important keywords because it’s trivial to consider specific algorithms and specific programs and say, yes, we can determine that this specific program will halt via this specific algorithm.

That proof of the Halting Problem (for a general algorithm and arbitrary programs!) works by defining a pathological program S that inspects what the general algorithm A would predict and then does the opposite.

What you’re describing above seems almost word-for-word the same construction used for the pathological program S, except the algorithm A for “will this program halt?” is replaced by the predictor “will this person one-box?”.

I’m not sure that this necessarily matters for the thought experiment. For example, perhaps we can pretend that the predictor works on all strategies except the pathological case described here, and other strategies isomorphic to it.
If you precommit to act this way, then it’s not the case that [the predictor predicts that you wouldn’t take the small box regardless of what you see in the big one] (since you do take it if the big box is full, so in that case you can’t be predicted not to take the small box). By the stated algorithm of box-filling, this results in the big box being empty. The predictor is not predicting what happens in actuality; it’s predicting what happens in the hypothetical situation where the big box is full (regardless of whether this situation actually takes place), and what happens in the hypothetical situation where the big box is empty (also regardless of what happens in reality). The predictor is not deciding what to do in these hypothetical situations; it’s deciding what to do in reality.
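A toy version of that box-filling rule, as I read it (predicts_one_box is an illustrative stand-in for the predictor’s simulation of you, returning True iff you leave the small box in the given hypothetical):

```python
# The big box gets filled only if you are predicted to one-box regardless
# of what you see in it. `predicts_one_box` is purely illustrative.

def big_box_gets_filled(predicts_one_box) -> bool:
    return (predicts_one_box(big_box_full=True)
            and predicts_one_box(big_box_full=False))

# The spite strategy one-boxes only when it sees an empty big box:
spite = lambda big_box_full: not big_box_full

print(big_box_gets_filled(spite))  # False -> the big box is left empty
```

So against the spite strategy the big box stays empty, you one-box in the situation that actually occurs, and the only unverifiable part is the prediction about the counterfactual where the box was full.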
Even if the big box is empty and you one-box anyway, the predictor can just say “Yes, but if the big box had been full, you would have two-boxed,” and it’s unclear whether the predictor is accurate or not, since you weren’t in that situation.
The predictor acts based on your behavior in both hypotheticals, and from either hypothetical you don’t get to observe your own decision in the other, so you can’t verify that it was taken into account correctly.
If the big box is full and you one-box, the predictor can say “Yes, and if the big box had been empty, you would have also one-boxed.” And it’s unclear whether the predictor is accurate or not since you weren’t in that situation. Being wrong in your favor is also a possibility.
You don’t get to verify that your decision was taken into account correctly anyway. If the big box is full and you two-box, the predictor can say “Yes, so you are currently in a hypothetical, in reality the big box is empty.”
This objection to Newcomb-like problems (that IF I’m actually predicted, THEN what I think I’d do is irrelevant—either the question is meaningless or the predictor is impossible) does get brought up occasionally, and it’s usually ignored or shouted down as “fighting the hypothetical”. The fact that humans don’t precommit, and that if they could the question would be uninteresting, is pretty much ignored.

Replacing the human with a simple, transparent, legible decision process makes this a lot more applicable, but also a lot less interesting. Whatever can be predicted as one-boxing makes more money, and per the setup, that implies actually one-boxing. Done.
This doesn’t follow. Your estimate of your actions can be correct or relevant even if you’ve been predicted.
Humans can precommit just like simple machines—just run the algorithm in your mind and do what it says. There is nothing more to it.
Huh? You break the simulation if you act differently than the prediction. Sure you can estimate or say whatever you want, but you can be wrong, and Omega can’t.
This really does not match my lived experience of predicting and committing myself, nor the vast majority of fiction or biographical work I’ve read. Actual studies on commitment levels and follow-through are generally more complicated, so it’s a little less clear how strongly counter-evident they are, but they’re certainly not evidence that humans are rational in these dimensions. You can claim to precommit. You can WANT to precommit. You can even believe it’s in your best interest to have precommitted. But when the time comes, that commitment is weaker than you thought.
I didn’t say you could act differently than the prediction. It’s correct that you can’t, but that’s not relevant for either variant of the problem.
Precommitment is a completely different concept from commitment. Commitment involves feelings, strength of will, etc. Precommitment involves none of those, and it only means running the simple algorithm. It doesn’t have a strength—it’s binary (either I run it, or not).
It’s this running of the simple algorithm in your mind that gives you the pseudomagical powers in Newcomb’s problem that manifest as the seeming ability to influence the past. (Omega already left, but because I’m precommitted to one-box, his prediction will have been that I would one-box. This goes both ways, of course—if I would take both boxes, I will lose, even though Omega already left.)
You could use the word precommitment to mean something else—like wishing really hard to execute action X beforehand, and then updating on evidence and doing whatever appears to result in most utility. We could call this precommitment_2 (and the previous kind precommitment_1). The problem is that precommitting_2 to one-box implies precommitting_1 to two-box, and so it guarantees losing.
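A toy illustration of that last point, under my reading (the payoff numbers are the usual illustrative ones, not from the post):

```python
# Contrast between the two senses of "precommitment" described above.
# Payoffs are illustrative; the box contents are already fixed when you choose.

BIG, SMALL = 1_000_000, 1_000

def precommitment_1() -> str:
    # Run the fixed algorithm, with no updating at decision time.
    return "one-box"

def precommitment_2(big_box_is_full: bool) -> str:
    # "Wish really hard to one-box beforehand", then at decision time take
    # whatever maximizes utility given the already-fixed contents.
    one_box = BIG if big_box_is_full else 0
    two_box = one_box + SMALL
    return "one-box" if one_box > two_box else "two-box"

# Whatever the contents, the decision-time calculation favors both boxes,
# so the algorithm you actually run is the two-boxing one.
print(precommitment_2(True), precommitment_2(False))  # two-box two-box
```

Which is the sense in which precommitting_2 to one-box just is precommitting_1 to two-box.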
That doesn’t seem like something a human being could do.
Then you’re wrong as a matter of biology. Neural networks can do that in general.
I could see an argument being made that if the precommitment algorithm contains a line “jump off a cliff,” the human might freeze in fear instead of being capable of doing that.
But if that line is “take one box,” I don’t see why a human being couldn’t do it.
You mean artificial neural networks? Which can also do things like running forever without resting. I think a citation is needed.
An algorithm would be, to put it simply, a list of instructions.
So are you saying that a human isn’t capable of following a list of instructions, and if so, do you mean any list of instructions at all, or only some specific ones?
A human isn’t capable of following a list of instructions perfectly, relentlessly, forever. The problem with a precommitment is sticking to it... whether you think of it as an algorithm or a resolution or a promise or an oath.
So you’re saying humans can’t follow an algorithm that would need to be followed perfectly, relentlessly, and forever.
But one-boxing is neither relentless, nor forever. That leaves perfection.
Are you suggesting that humans can’t perfectly one-box? If so, are you saying they can only imperfectly one-box?