The first calculator is like calculators we are used to. The second calculator is from a foreign land: it’s identical except that the numbers it outputs always come with a negative sign (‘–’) in front of them when you’d expect there to be none, and no negative sign when you expect there to be one. Are these calculators running the same algorithm or not?
Yes: Knowing what output the one gives helps you figure out what output the other gives.
“But wait, don’t we already know what output the other gives? It’s a calculator.”
Yes, this test admittedly only works for programs that are hard enough to run that you notice a shortcut.
Fortunately, the test only needs to work for finding instances of yourself, and while you are still calculating the consequences of your possible actions, you don’t yet know what you will do. So, to figure out whether to one-box: start with the assumption that running your source code on your inputs outputs “one-box”, notice that you can now infer more about what you were predicted to do, and act accordingly. Changing the language in which you are being predicted does not break this.
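To make the calculator test concrete, here is a minimal Python sketch. The function names (`familiar_calculator`, `foreign_calculator`, `infer_foreign_output`) are made up for illustration, and the three-operator calculator is an assumption, not anything from the original post. The point is only that once you know the one output, you get the other without running the second calculator.

```python
import operator

# Toy arithmetic supported by both calculators (an illustrative assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def familiar_calculator(a: int, op: str, b: int) -> int:
    """The calculator we are used to."""
    return OPS[op](a, b)


def foreign_calculator(a: int, op: str, b: int) -> int:
    """Identical, except every output has its sign flipped."""
    return -familiar_calculator(a, op, b)


def infer_foreign_output(familiar_output: int) -> int:
    """The shortcut: deduce the foreign output from the familiar one,
    without running the foreign calculator at all."""
    return -familiar_output


known = familiar_calculator(6, "*", 7)
print(known, infer_foreign_output(known))  # prints "42 -42"
```

For a calculator the shortcut saves nothing, which is the objection above; it only matters when the program is expensive enough that you cannot simply run it.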
I explain at length why you can’t just rely on correlations between two algorithms wrt their outputs.
I’m not trying to rely on statistical correlation. Suppose I’m in a Prisoner’s Dilemma with a twin; I’m sitting in a red room and he’s sitting in a blue room. I am implemented in C and he is implemented in c, which is the same programming language except that all the lower- and uppercase characters have the opposite interpretation. In order to figure out whether to cooperate, I assume that FDT cooperates in my situation and see what I can deduce. Let us reason about the computation trace t by which FDT cooperated in my situation. (We can’t write out t fully, since then it would be longer than itself.) If in t we swap all occurrences of “blue” and “red”, it is still a legal computation trace; thus we can infer that FDT also cooperates in the situation where the agent is sitting in a blue room and its twin is sitting in a red room. Next, we can prove that translating a program between C and c never changes behavior; thus we can infer that FDT also cooperates when implemented in c, which is exactly my twin’s situation: he is sitting in a blue room and I am sitting in a red room. Now we have shown that if FDT cooperates in my situation, we achieve mutual cooperation. Similarly, if FDT defects in my situation, we achieve mutual defection. Since mutual cooperation is better than mutual defection, I cooperate.
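The translation step can be illustrated with a toy Python model. Everything here is a hypothetical stand-in: plain Python plays the role of “C”, a dialect with inverted letter case plays the role of “c”, and `decide` is a placeholder policy, not an implementation of FDT. The sketch only shows that a case-swapping translation paired with a case-swapping interpreter leaves behavior unchanged, and that a policy which never branches on the room colour gives the same answer in both rooms.

```python
AGENT_SOURCE = '''
def decide(room):
    # Placeholder policy: the room colour is only a label and is never inspected.
    return "cooperate"
'''


def run_upper(source: str, room: str) -> str:
    """Interpreter for the toy 'C' dialect: ordinary Python defining decide(room)."""
    namespace = {}
    exec(source, namespace)
    return namespace["decide"](room)


def translate(source: str) -> str:
    """Translate between the two dialects by inverting the case of every letter."""
    return source.swapcase()


def run_lower(source: str, room: str) -> str:
    """Interpreter for the toy 'c' dialect: reads letters with the opposite case."""
    return run_upper(source.swapcase(), room)


mine = run_upper(AGENT_SOURCE, "red")                # me: dialect 'C', red room
twin = run_lower(translate(AGENT_SOURCE), "blue")    # twin: dialect 'c', blue room
print(mine, twin)  # prints "cooperate cooperate"
```

Because `translate` is its own inverse, `run_lower(translate(s), room)` always equals `run_upper(s, room)`, whatever program `s` is. That is the toy analogue of proving that translation between C and c never changes behavior.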