In UDT1, I would model this problem using the following world program. (For those not familiar with programming convention, 0=False, and 1=True.)
def P(i):
    E = (Pi(i) == 0)
    D = Omega_Predict(S, i, "box contains $1M")
    if D ^ E:
        C = S(i, "box contains $1M")
        payout = 1001000 - C * 1000 + E * 1e9
    else:
        C = S(i, "box is empty")
        payout = 1000 - C * 1000 + E * 1e9
We then ask: what function S maximizes the expected payout at the end of P? When S sees “box is empty”, clearly it should return 0. What should it do when it sees “box contains $1M”?
If it returns 0 (i.e. two-boxes), then
with probability .1, E=1, D^E=1, and payout = 1e9 + 1001000,
with probability .9, E=0, D^E=0, and payout = 1000
If it returns 1 (i.e. one-boxes), then
with probability .1, E=1, D^E=0, and payout = 1e9 + 1000,
with probability .9, E=0, D^E=1, and payout = 1000000
So returning 1 maximizes expected payout. If S=UDT1, then whenever it’s called, it performs the above computation to determine what the optimal S is, and then returns the same value that this optimal S would return given that input.
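As a check on the arithmetic above, here is a minimal Python sketch that enumerates the four deterministic choices of S and computes each expected payout exactly. It assumes Omega_Predict is a perfect predictor (so D is simply what S returns on “box contains $1M”) and that E is true with probability .1, as in the case analysis. The names `payout`, `expected_payout`, and `policies` are illustrative and not part of the world program itself.

```python
from fractions import Fraction
from itertools import product

FULL, EMPTY = "box contains $1M", "box is empty"
P_E = Fraction(1, 10)  # probability that E is true, taken from the case analysis

def payout(policy, E):
    """Payout of world program P for a fixed policy and a fixed value of E.

    policy maps each observation to 1 (one-box) or 0 (two-box).
    Assumption: Omega predicts perfectly, so D equals S's answer on FULL.
    """
    D = policy[FULL]
    if D ^ E:
        C = policy[FULL]
        return 1001000 - C * 1000 + E * 10**9
    C = policy[EMPTY]
    return 1000 - C * 1000 + E * 10**9

def expected_payout(policy):
    # Average over the two possible values of E.
    return P_E * payout(policy, 1) + (1 - P_E) * payout(policy, 0)

policies = [{FULL: f, EMPTY: e} for f, e in product((0, 1), repeat=2)]
best = max(policies, key=expected_payout)
for p in policies:
    print(p, expected_payout(p))
print("best:", best)
```

Under these assumptions the best policy one-boxes on seeing $1M and two-boxes on seeing the empty box, with an expected payout of 100,900,100 versus 100,101,000 for always two-boxing.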
The updateless part of the solution is that when determining the counterfactual dependencies that are necessary to find the optimal S*, UDT1 doesn’t look at its input, so that even when called with “box contains $1M”, it still doesn’t “know” that D^E=1, in which case E is clearly independent of what it returns.
I can’t follow the payouts here. For example, 1001000 - C * 1000 + E * 1e9 seems to indicate that the payout could be over $2 million. How is that possible?
The “E * 1e9” (note that 1e9 is a billion) part is supposed to model “Thus, if I happen to have a strong enough preference that E output True”. Does that help?
That’s very elegant! But the trick here, it seems to me, lies in the rules for setting up the world program in the first place.
First, the world-program’s calling tree should match the structure of TDT’s graph, or at least match the graph’s (physically-)causal links. The physically-causal part of the structure tends to be uncontroversial, so (for present purposes) I’m ok with just stipulating the physical structure for a given problem.
But then there’s the choice to use the same variable S in multiple places in the code. That corresponds to a choice (in TDT) to splice in a logical-dependency link from the Platonic decision-computation node to other Platonic nodes. In both theories, we need to be precise about the criteria for this dependency. Otherwise, the sense of dependency you’re invoking might turn out to be wrong (it makes the theory prescribe incorrect decisions) or question-begging (it implicitly presupposes an answer to the key question that the theory itself is supposed to figure out for us, namely what things are or are not counterfactual consequences of the decision-computation).
So the question, in UDT1, is: under what circumstances do you represent two real-world computations as being tied together via the same variable in a world-program?
That’s perhaps straightforward if S is implemented by literally the same physical state in multiple places. But as you acknowledge, you might instead have distinct Si’s that diverge from one another for some inputs (though not for the actual input in this case). And the different instances need not have the same physical substrate, or even use the same algorithm, as long as they give the same answers when the relevant inputs are the same, for some mapping between the inputs and between the outputs of the two Si’s. So there’s quite a bit of latitude as to whether to construe two computations as “logically equivalent”.
So, for example, for the conventional transparent-boxes problem, what principle tells us to formulate the world program as you proposed, rather than having:
def P1(i):
    const S1;
    E = (Pi(i) == 0)
    D = Omega_Predict(S1, i, "box contains $1M")
    if D ^ E:
        C = S(i, "box contains $1M")
        payout = 1001000 - C * 1000
    else:
        C = S(i, "box is empty")
        payout = 1000 - C * 1000
(along with a similar program P2 that uses constant S2, yielding a different output from Omega_Predict)?
This alternative formulation ends up telling us to two-box. In this formulation, if S and S1 (or S and S2) are in fact the same, they would (counterfactually) differ if a different answer (than the actual one) were output from S—which is precisely what a causalist asserts. (A similar issue arises when deciding what facts to model as “inputs” to S—thus forbidding S to “know” those facts for purposes of figuring out the counterfactual dependencies—and what facts to build instead into the structure of the world-program, or to just leave as implicit background knowledge.)
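A minimal sketch of why a P1-style program prescribes two-boxing: hold the output of Omega_Predict(S1, ...) fixed, vary only what S returns, and compare payouts. The function name `payout_p1` and its explicit `d`, `e`, `c` arguments are assumptions made for this illustration.

```python
from itertools import product

def payout_p1(d, e, c):
    """Payout of P1 when Omega's prediction d comes from the constant S1.

    d: output of Omega_Predict(S1, ...), fixed independently of S
    e: truth value of Pi(i) == 0
    c: what S returns on the observation it is shown (1 = one-box, 0 = two-box)
    """
    if d ^ e:
        return 1001000 - c * 1000  # box contains $1M
    return 1000 - c * 1000         # box is empty

# For every fixed (d, e), compare two-boxing (c=0) with one-boxing (c=1).
gains = {(d, e): payout_p1(d, e, 0) - payout_p1(d, e, 1)
         for d, e in product((0, 1), repeat=2)}
print(gains)
```

Because d no longer moves with S’s output, two-boxing comes out exactly $1000 ahead in every case, which is the causalist’s dominance argument.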
So my concern is that UDT1 may covertly beg the question by selecting, among the possible formulations of the world-program, a version that turns out to presuppose an answer to the very question that UDT1 is intended to figure out for us (namely, what counterfactually depends on the decision-computation). And although I agree that the formulation you’ve selected in this example is correct and the above alternative formulation isn’t, I think it remains to explain why.
(As with my comments about TDT, my remarks about UDT1 are under the blanket caveat that my grasp of the intended content of the theories is still tentative, so my criticisms may just reflect a misunderstanding on my part.)
First, to clear up a possible confusion, the S in my P is not supposed to be a variable. It’s a constant, more specifically a piece of code that implements UDT1 itself. (If I sometimes talk about it as if it’s a variable, that’s because I’m trying to informally describe what is going on inside the computation that UDT1 does.)
As for the more general question of how we know the structure of the world program, the idea is that for an actual AI, we would program it to care about all possible world programs (or more generally, mathematical structures; see example 3 in my UDT1 post, but also Nesov’s recent post for a critique). The implementation of UDT1 in the AI would then figure out which world programs it’s in by looking at its inputs (which would contain all of the AI’s memories and sensory data) and checking which world programs call it with those inputs.
For these sample problems, the assumption is that somehow Omega has previously provided us with enough evidence for us to trust its word on what the structure of the current problem is. So in the actual P, ‘S(i, “box contains $1M”)’ is really something like ‘S(memories, omegas_explanations_about_this_problem, i, “box contains $1M”)’ and these additional inputs allow S to conclude that it’s being invoked inside this P, and not some other world program.
(An additional subtlety here is that if we consider all possible world programs, there are bound to be some other world programs where S is being called with these exact same inputs, for example ones where S is being instantiated inside a Boltzmann brain, but presumably those worlds/regions have very low weights, meaning that the AI doesn’t care much about them.)
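A rough sketch of the “checking which world programs call it with those inputs” step, under heavy simplifying assumptions: each world program is reduced to a hypothetical record listing the inputs it passes to S, and the weights are made-up numbers. `WorldProgram`, `worlds_calling`, and the example worlds are all illustrative names, not anything defined in UDT1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorldProgram:
    name: str
    weight: float           # how much the agent cares about this world (made-up numbers)
    inputs_to_s: frozenset  # inputs with which this world program calls S

def worlds_calling(worlds, observed):
    """Return the world programs that invoke S with exactly the observed input."""
    return [w for w in worlds if observed in w.inputs_to_s]

observed = ("memories", "omegas_explanations_about_this_problem", "i", "box contains $1M")
other = ("memories", "omegas_explanations_about_this_problem", "i", "box is empty")

worlds = [
    WorldProgram("P", 1.0, frozenset({observed, other})),
    WorldProgram("boltzmann_brain", 1e-30, frozenset({observed})),
    WorldProgram("unrelated_world", 1.0, frozenset()),
]

matching = worlds_calling(worlds, observed)
for w in matching:
    print(w.name, w.weight)
```

Both P and the Boltzmann-brain world call S with the observed input, but the latter carries so little weight that it barely affects the decision.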
Let me know if that answers your questions/concerns. I didn’t answer you point by point because I’m not sure which questions/concerns remain after you see my general answers. Feel free to repeat anything you still want me to answer.
First, to clear up a possible confusion, the S in my P is not supposed to be a variable. It’s a constant, more specifically a piece of code that implements UDT1 itself. (If I sometimes talk about it as if it’s a variable, that’s because I’m trying to informally describe what is going on inside the computation that UDT1 does.)
Then it should be S(P), because S can’t make any decisions without getting to read the problem description.
Note that since our agent is considering possible world-programs, these world-programs are in some sense already part of the agent’s program (and the agent is in turn part of some of these world-programs-inside-the-agent, which reflects the recursive character of the definition of the agent-program). The agent is a much better top-level program to consider than all-possible-world-programs, which is even more of a simplification if these world-programs somehow “exist at the same time”. When the (prior) definition of the world is seen as already part of the agent, a lot of the ontological confusion goes away.
def P1(i):
    const S1;
    E = (Pi(i) == 0)
    D = Omega_Predict(S1, i, "box contains $1M")
    if D ^ E:
        C = S(i, "box contains $1M")
        payout = 1001000 - C * 1000
    else:
        C = S(i, "box is empty")
        payout = 1000 - C * 1000
(along with a similar program P2 that uses constant S2, yielding a different output from Omega_Predict)?
This alternative formulation ends up telling us to two-box. In this formulation, if S and S1 (or S and S2) are in fact the same, they would (counterfactually) differ if a different answer (than the actual one) were output from S—which is precisely what a causalist asserts. (A similar issue arises when deciding what facts to model as “inputs” to S—thus forbidding S to “know” those facts for purposes of figuring out the counterfactual dependencies—and what facts to build instead into the structure of the world-program, or to just leave as implicit background knowledge.)
So my concern is that UDT1 may covertly beg the question by selecting, among the possible formulations of the world-program, a version that turns out to presuppose an answer to the very question that UDT1 is intended to figure out for us (namely, what counterfactually depends on the decision-computation). And although I agree that the formulation you’ve selected in this example is correct and the above alternative formulation isn’t, I think it remains to explain why.
(As with my comments about TDT, my remarks about UDT1 are under the blanket caveat that my grasp of the intended content of the theories is still tentative, so my criticisms may just reflect a misunderstanding on my part.)
It seems to me that the world-program is part of the problem description, not the analysis. It’s equally tricky whether it’s given in English or in a computer program; Wei Dai just translated it faithfully, preserving the strange properties it had to begin with.
My concern is that there may be several world-programs that correspond faithfully to a given problem description, but that correspond to different analyses, yielding different decision prescriptions, as illustrated by the P1 example above. (Upon further consideration, I should probably modify P1 to include “S()=S1()” as an additional input to S and to Omega_Predict, duly reflecting that aspect of the problem description.)
If there are multiple translations, then either the translations are all mathematically equivalent, in the sense that they agree on the output for every combination of inputs, or the problem is underspecified. (This seems like it ought to be the definition for the word underspecified. It’s also worth noting that all game-theory problems are underspecified in this sense, since they contain an opponent you know little about.)
Now, if two world programs were mathematically equivalent but a decision theory gave them different answers, then that would be a serious problem with the decision theory. And this does, in fact, happen with some decision theories; in particular, it happens to theories that work by trying to decompose the world program into parts, when those parts are related in a way that the decision theory doesn’t know how to handle. If you treat the world-program as an opaque object, though, then all mathematically equivalent formulations of it should give the same answer.
I assume (please correct me if I’m mistaken) that you’re referring to the payout-value as the output of the world program. In that case, a P-style program and a P1-style program can certainly give different outputs for some hypothetical outputs of S (for the given inputs). However, both programs’ payout-outputs will be the same for whatever turns out to be the actual output of S (for the given inputs).
P and P1 have the same causal structure. And they have the same output with regard to (whatever is) the actual output of S (for the given inputs). But P and P1 differ counterfactually as to what the payout-output would be if the output of S (for the given inputs) were different than whatever it actually is.
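The claim that P and P1 agree on the actual output of S but come apart counterfactually can be checked by brute force. The sketch below assumes a perfect predictor, drops the E * 1e9 term, and assumes S returns 0 on “box is empty”; `branch_payout`, `p_style`, and `p1_style` are hypothetical helper names.

```python
from itertools import product

def branch_payout(d, e, s_full, s_empty=0):
    """Shared payout rule of P and P1 (without the E * 1e9 term)."""
    if d ^ e:
        return 1001000 - s_full * 1000
    return 1000 - s_empty * 1000

def p_style(s_out, e):
    # D is computed from S itself, so it moves with S's hypothetical output.
    return branch_payout(d=s_out, e=e, s_full=s_out)

def p1_style(s_out, s1_out, e):
    # D is computed from the constant S1 and stays put when S's output is varied.
    return branch_payout(d=s1_out, e=e, s_full=s_out)

for actual, e in product((0, 1), repeat=2):
    other = 1 - actual  # the hypothetical output S did not actually produce
    same_on_actual = p_style(actual, e) == p1_style(actual, actual, e)
    same_on_counterfactual = p_style(other, e) == p1_style(other, actual, e)
    print(actual, e, same_on_actual, same_on_counterfactual)
```

In all four cases the two programs agree when S1 matches S’s actual output and disagree once S’s output is hypothetically flipped while S1 is held fixed.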
So I guess you could say that what’s unspecified are the counterfactual consequences of a hypothetical decision, given the (fully specified) physical structure of the scenario. But figuring out the counterfactual consequences of a decision is the main thing that the decision theory itself is supposed to do for us; that’s what the whole Newcomb/Prisoner controversy boils down to. So I think it’s the solution that’s underspecified here, not the problem itself. We need a theory that takes the physical structure of the scenario as input, and generates counterfactual consequences (of hypothetical decisions) as outputs.
PS: To make P and P1 fully comparable, drop the “E*1e9” terms in P, so that both programs model the conventional transparent-boxes problem without an extraneous pi-preference payout.
This conversation is a bit confused. Looking back, P and P1 aren’t the same at all; P1 corresponds to the case where Omega never asks you for any decision at all! If S must be equal to S1 and S1 is part of the world program, then S must be part of the world program, too, not chosen by the player. If choosing an S such that S!=S1 is allowed, then it corresponds to the case where Omega simulates someone else (not specified).
The root of the confusion seems to be that Wei Dai wrote “def P(i): …”, when he should have written “def P(S): …”, since S is what the player gets to control. I’m not sure where making i a parameter to P came from, since the English description of the problem had i as part of the world-program, not a parameter to it.
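One way to read the “def P(S)” suggestion is sketched below for the conventional transparent-boxes problem. It assumes Omega predicts by running the very S that is passed in, and it leaves out the pi-dependent E to keep the example short; `one_boxer` and `two_boxer` are hypothetical sample strategies.

```python
FULL, EMPTY = "box contains $1M", "box is empty"

def P(S):
    """Conventional transparent-boxes world program, parameterized by the player's S.

    Assumptions: Omega predicts by running S itself, and the pi-dependent E
    is left out to keep the sketch short.
    """
    D = S(FULL)  # Omega's prediction of S's answer on seeing $1M
    if D:
        C = S(FULL)
        return 1001000 - C * 1000
    C = S(EMPTY)
    return 1000 - C * 1000

def one_boxer(observation):
    # One-box on seeing $1M, two-box on seeing the empty box.
    return 1 if observation == FULL else 0

def two_boxer(observation):
    # Always take both boxes.
    return 0

print(P(one_boxer), P(two_boxer))
```

With S as the parameter, the prediction and the choice necessarily come from the same function, so there is no separate S1 that could be held fixed.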
Ah, thanks, that makes sense now!