This is an unfinished part of the theory that I’ve also thought about,
though your example puts it very crisply (you might consider posting
it to LW?)
My current thoughts on resolution tend to see two main avenues:
1) Construct a full-blown DAG of math and Platonic facts, an account
of which mathematical facts make other mathematical facts true, so
that we can compute mathematical counterfactuals.
2) Treat differently mathematical knowledge that we learn by
genuinely mathematical reasoning and by physical observation. In this
case we know (D xor E) not by mathematical reasoning, but by
physically observing a box whose state we believe to be correlated
with D xor E. This may justify constructing a causal DAG with a node
descending from D and E, so a counterfactual setting of D won’t affect
the setting of E.
Currently I’d say that (2) looks like the better avenue. Can you come
up with an improper mathematical dependency where E is inferred from
D, and shouldn’t be seen as counterfactually affected, based on
mathematical reasoning only without postulating the observation of a
physical variable that descends from both E and D?
Incidentally, note that an unsolvable problem that should stay
unsolvable is as follows: I’m asked to pick red or green, and told “A
simulation of you given this information as well picked the wrong
color and got shot.” Whichever choice I make, I deduce that the other
choice was better. The exact details here will depend on how I
believe the simulator chose to tell me this, but ceteris paribus it’s
an unsolvable problem.
2) Treat differently mathematical knowledge that we learn by genuinely mathematical reasoning and by physical observation. In this case we know (D xor E) not by mathematical reasoning, but by physically observing a box whose state we believe to be correlated with D xor E. This may justify constructing a causal DAG with a node descending from D and E, so a counterfactual setting of D won’t affect the setting of E.
Perhaps I’m misunderstanding you here, but D and E are Platonic computations. What does it mean to construct a causal DAG among Platonic computations? [EDIT: Ok, I may understand that a little better now; see my edit to my reply to (1).] Such a graph links together general mathematical facts, so the same issues arise as in (1), it seems to me: Do the links correspond to logical inference, or something else? What makes the graph acyclic? Is mathematical causality even coherent? And if you did have a module that can detect (presumably timeless) causal links among Platonic computations, then why not use that module directly to solve your decision problems?
Plus I’m not convinced that there’s a meaningful distinction between math knowledge that you gain by genuine math reasoning, and math knowledge that you gain by physical observation.
Let’s say, for instance, that I feed a particular conjecture to an automatic theorem prover, which tells me it’s true. Have I then learned that math fact by genuine mathematical reasoning (performed by the physical computer’s Platonic abstraction)? Or have I learned it by physical observation (of the physical computer’s output), and am I hence barred from using that math fact for purposes of TDT’s logical-dependency-detection? Presumably the former, right? (Or else TDT will make even worse errors.)
But then suppose the predictor has simulated the universe sufficiently to establish that U (the universe’s algorithm, including physics and initial conditions) leads to there being $1M in the box in this situation. That’s a mathematical fact about U, obtained by (the simulator’s) mathematical reasoning. Let’s suppose that when the predictor briefs me, the briefing includes mention of this mathematical fact. So even if I keep my eyes closed and never physically see the $1M, I can rely instead on the corresponding mathematically derived fact.
(Or more straightforwardly, we can view the universe itself as a computer that’s performing mathematical reasoning about how U unfolds, in which case any physical observation is intrinsically obtained by mathematical reasoning.)
Logical uncertainty has always been more difficult to deal with than physical uncertainty; the problem with logical uncertainty is that if you analyze it enough, it goes away. I’ve never seen any really good treatment of logical uncertainty.
But if we depart from TDT for a moment, then it does seem clear that we need to have causelike nodes corresponding to logical uncertainty in a DAG which describes our probability distribution. There is no other way you can completely observe the state of a calculator sent to Mars and a calculator sent to Venus, and yet remain uncertain of their outcomes yet believe the outcomes are correlated. And if you talk about error-prone calculators, two of which say 17 and one of which says 18, and you deduce that the “Platonic answer” was probably in fact 17, you can see that logical uncertainty behaves in an even more causelike way than this.
So, going back to TDT, my hope is that there’s a neat set of rules for factoring our logical uncertainty in our causal beliefs, and that these same rules also resolve the sort of situation that you describe.
If you consider the notion of the correlated error-prone calculators, two returning 17 and one returning 18, then the most intuitive way to handle this would be to see a “Platonic answer” as its own causal node, and the calculators as error-prone descendants. I’m pretty sure this is how my brain is drawing the graph, but I’m not sure it’s the correct answer; it seems to me that a more principled answer would involve uncertainty about which mathematical fact affects each calculator—physically uncertain gates which determine which calculation affects each calculator.
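To make the "Platonic answer as its own causal node" picture concrete, here is a minimal sketch of the three-calculator case. The numbers are assumptions made only for illustration and are not from the discussion: a uniform prior over candidate answers 0..99, and calculators that independently report the true answer with probability 0.9 and otherwise a uniformly random wrong answer. The function name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(readings, candidates, p_correct):
    """Posterior over the 'Platonic answer' given independent noisy readings.

    Assumed model: each calculator is a causal child of the answer node and
    reports it correctly with probability p_correct, else a uniform wrong value.
    """
    candidates = list(candidates)
    n_wrong = len(candidates) - 1
    weights = {}
    for answer in candidates:
        w = Fraction(1, len(candidates))  # uniform prior (assumption)
        for r in readings:
            w *= p_correct if r == answer else (1 - p_correct) / n_wrong
        weights[answer] = w
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Two calculators say 17, one says 18.
post = posterior([17, 17, 18], range(100), Fraction(9, 10))
print(float(post[17]), float(post[18]))
```

Under these assumed numbers nearly all of the probability mass lands on 17, which is the causelike behavior described above: the readings are conditionally independent given the hidden answer node, and correlated without it.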
For the (D xor E) problem, we know the behavior we want the TDT calculation to exhibit; we want (D xor E) to be a descendant node of D and E. If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we’ll find that we can affect E, which is not what we want. Conversely if we view D as having a physical effect, and E as having a physical effect, and the node D xor E as a physical descendant of D and E, we’ll get the behavior we want. So the question is whether there’s any principled way of setting this up which will yield the second behavior rather than the first, and also, presumably, yield epistemically correct behavior when reasoning about calculators and so on.
That’s if we go down avenue (2). If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn’t quite nailed down—after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.
These sorts of issues are something I’m still thinking through, as I think I’ve mentioned, so let me think out loud for a bit.
In order to observe anything that you think has already been controlled by your decision—any physical thing in which a copy of D has already played a role—then (leaving aside the question of Omega’s strategy that simulated alternate versions of you to select a self-consistent problem, and whether this introduces conditional-strategy-dependence rather than just decision-dependence into the problem) there have to be other physical facts which combine with D to yield our observation.
Some of these physical facts may themselves be affected by mathematical facts, like an implemented computation of E; but the point is that in order to have observed anything controlled by D, we already had to draw a physical, causal diagram in which other nodes descended from D.
So suppose we introduce the rule that in every case like this, we will have some physical node that is affected by D, and if we can observe info that depends on D in any way, we’ll view the other mathematical facts as combining with D’s physical node. This is a rule that tells us not to draw the diagram with a physical node being determined by the mathematical fact D xor E, but rather to have a physical node determined by D, and then a physical descendent D xor E. (Which in this particular problem should descend from a physical node E that descends from the mathematical fact E, because the mathematical fact E is correlated with our uncertainty about other things, and a factored causal graph should have no remaining correlated sources of background uncertainty; but if E didn’t correlate to anything else in particular, we could just have D descending to (D xor E) via the (xor with E) rule.)
When I evaluate this proposed solution for ad-hoc-ness, it does admittedly look a bit ad-hoc, but it solves at least one other problem than the one I started with, and which I didn’t think of until now. Suppose Omega tells me that I make the same decision in the Prisoner’s Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X. X and I could have made the same decision for different (uncorrelated) reasons, and Omega could have simply found out by simulating the two of us that X and I gave the same response. In this case, presumably defecting; but if I cooperated, X wouldn’t do anything differently. X is just a piece of paper with “Defect” written on it.
If I draw a causal diagram of how I came to learn this correlation from Omega, and I follow the rule of always drawing a causal boundary around the mathematical fact D as soon as it physically affects something, then, given the way Omega simulated both of us to observe the correlation, I see that D and X separately physically affected the correlation-checker node.
On the other hand, if I can analyze the two pieces of code D and X and see that they return the same output, without yet knowing the output, then this knowledge was obtained in a way that doesn’t involve D producing an output, so I don’t have to draw a hard causal boundary around that output.
If this works, the underlying principle that makes it work is something along the lines of “for D to control X, the correlation between our uncertainty about D and X has to emerge in a way that doesn’t involve anyone already computing D”. Otherwise D has no free will (said firmly tongue-in-cheek). I am not sure that this principle has any more elegant expression than the rule, “whenever, in your physical model of the universe, D finishes computing, draw a physical/causal boundary around that finished computation and have other things physically/causally descend from it”.
If this principle is violated then D ends up “correlated” to all sorts of other things we observe, like the price of fish and whether it’s raining outside, via the magic of xor.
When you use terms like “draw a hard causal boundary” I’m forced to imagine you’re actually drawing these things on the back of a cocktail napkin somewhere using some sorts of standard symbols. Are there such standards, and do you have such diagrams scanned in online somewhere?
ETA: A note for future readers: Eliezer below is referring to Judea Pearl (simply “Pearl” doesn’t convey much via google-searching, though I suppose “pearl causality” does at the moment)
Hmm… Pearl uses a lot of diagrams but they all seem pretty ad-hoc. Just the sorts of arrows and dots and things that you’d use to represent any graph (in the mathematics sense). Should I infer from this description that the answer is, “No, there isn’t a standard”?
I was picturing something like a legend that would tell someone, “Use a dashed line for a causal boundary, and a red dotted line to represent a logical inference, and a pink squirrel to represent postmodernism”
Um… I’m not sure there’s much I can say to that beyond “Read Probabilistic Reasoning in Intelligent Systems, or Causality”.
Pearl’s system is not ad-hoc. It is very not ad-hoc. It has a metric fuckload of math backing up the simple rules. But Pearl’s system does not include logical uncertainty. I’m trying to put logical uncertainty into it, while obeying the rules. This is a work in progress.
Pearl’s system is not ad-hoc. It is very not ad-hoc. It has a metric fuckload of math backing up the simple rules.
Thomblake’s observation may be that while Pearl’s system is extremely rigorous the diagrams used do not give an authoritative standard style for diagram drawing.
I’m rereading past discussions to find insights. This jumped out at me:
Suppose Omega tells me that I make the same decision in the Prisoner’s Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X.
I was referring to the example Eliezer gives with your opponent being a DefectBot, in which case cooperating makes Omega’s claim false, which may just mean that you’d make your branch of the thought experiment counterfactual, instead of convincing DefectBot to cooperate:
X is just a piece of paper with “Defect” written on it.
Winning is about how alternatives you choose between compare. By cooperating against a same-action DefectBot, you are choosing nonexistence over a (D,D), which is not obviously a neutral choice.
I don’t think this is how it works. Particular counterfactual instances of you can’t influence whether they are counterfactual or exist in some stronger sense. They can only choose whether there are more real instances with identical experiences (and their choices can sometimes acausally influence what happens with real instances, which doesn’t seem to be the case here since the real you will choose defect either way as predicted by Omega). Hypothetical instances don’t lose anything by being in the branch that chooses the opposite of what the real you chooses unless they value being identical to the real you, which IMO would be silly.
Particular counterfactual instances of you can’t influence whether they are counterfactual or exist in some stronger sense.
What can influence things like that? Whatever property of a situation can mark it as counterfactual (more precisely, given by a contradictory specification, or not following from a preceding construction, assumed-real past state for example), that property could as well be a decision made by an agent present in that situation. There is nothing special about agents or their decisions.
Why do you think something can influence it? Whether you choose to cooperate or defect, you can always ask both “what would happen if I cooperated?” and “what would happen if I defected?”. Insofar as being counterfactual makes sense, the alternative to being the answer to “what would happen if I cooperated?” is being the answer to “what would happen if I defected?”, even if you know that the real you defects.
Compare Omega telling you that your answer will be the same as the Nth digit of Pi. That doesn’t allow you to choose the Nth digit of Pi.
Winning is about how alternatives you choose between compare. By cooperating against a same-action DefectBot, you are choosing nonexistence over a (D,D), which is not obviously a neutral choice.
This becomes a (relatively) straightforward matter of working out where the (potentially counterfactual—depending what you choose) calculation is being performed to determine exactly what this ‘nonexistence’ means. Since this particular thought experiment doesn’t seem to specify any other broader context I assert that cooperate is clearly the correct option. Any agent which doesn’t cooperate is broken.
Basically, if you ever find yourself in this situation then you don’t matter. It’s your job to play chicken with the universe and not exist so the actual you can win.
I don’t see this argument making sense. Omega’s claim reduces to negligible chances that a choice of Defection will be advantageous for me, because Omega’s claim makes it of negligible probability that either (D,C) or (C, D) will be realized. So I can only choose between the worlds of (C, C) and (D, D). Which means that the Cooperation world is advantageous, and that I should Cooperate.
In contrast, if Omega had claimed that we’d make the opposite decisions, then I’d only have to choose between the worlds of (D, C) or (C, D) -- with the worlds of (C, C) and (D, D) now having negligible probability. In which case, I should, of course, Defect.
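The arithmetic behind this argument can be sketched as follows. The payoff numbers are an assumption (a conventional Prisoner's Dilemma ordering, not given in the thread), and `best_move` is a hypothetical helper. The sketch only restates the comparison made in this comment; whether restricting attention to the worlds that Omega's claim leaves open is the right way to form the counterfactual is exactly what the rest of the thread disputes.

```python
# Assumed payoffs to me, keyed by (my move, X's move); illustrative values only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_move(allowed_worlds):
    """Pick my move with the highest payoff among the worlds left open."""
    options = {mine: PAYOFF[(mine, theirs)] for mine, theirs in allowed_worlds}
    return max(options, key=options.get)

same = [("C", "C"), ("D", "D")]      # Omega: "you decide the same as X"
opposite = [("C", "D"), ("D", "C")]  # Omega: "you decide the opposite of X"

print(best_move(same), best_move(opposite))
```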
The reasons for the correlation between me and Agent X are irrelevant when the fact of their correlation is known.
Agent X is a piece of paper with “Defect” written on it.
Sorry, was this intended as part of the problem statement, like “Omega tells you that agent X is a DefectBot that will play the same as you”? If yes, then ok. But if we don’t know what agent X is, then I don’t understand why a DefectBot is a priori more probable than a CooperateBot. If they are equally probable, then it cancels out (edit: no it doesn’t, it actually makes cooperating a better choice, thx ArisKatsaris). And there’s also the case where X is a copy of you, where cooperating does help. So it seems to be a better choice overall.
If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn’t quite nailed down—after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.
I agree this sounds intuitive. As I mentioned earlier, though, nailing this down is tantamount to circling back and solving the full-blown problem of (decision-supporting) counterfactual reasoning: the problem of how to distinguish which facts to “hold fixed”, and which to “let vary” for consistency with a counterfactual antecedent.
In any event, is the idea to try to build a separate graph for math facts, and use that to analyze “logical dependency” among the Platonic nodes in the original graph, in order to carry out TDT’s modified “surgical alteration” of the original graph? Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl’s decision procedure without further modification?
If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we’ll find that we can affect E, which is not what we want.
Wait, isn’t it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we’re concerned about here? It’s the logical dependents of C that get surgically altered in the graph when C gets surgically altered, right? (I know C and D are logically equivalent, but you’re talking about inserting a physical node after D, not C, so I’m a bit confused.)
I’m having trouble following the gist of avenue (2) at the moment. Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact? Or is there something in particular about node (D xor E) that makes it forbidden? (It would be circular to cite the node’s dependence on C in the very sense of “dependence” that the new rule is helping us to compute.)
Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl’s decision procedure without further modification?
I definitely want one big graph if I can get it.
Wait, isn’t it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we’re concerned about here?
Sorry, yes, C.
Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact?
No, but whenever we see a physical fact F that depends on a decision C/D we’re still in the process of making plus Something Else (E), then we express our uncertainty in the form of a causal graph with directed arrows from C to D, D to F, and E to F. Thus when we compute a counterfactual on C, we find that F changes, but E does not.
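A minimal sketch of the two readings, under stated assumptions: D simply copies the decision C, F is the physical box state computed as D xor E, and the concrete bit values are made up for illustration. The function names are hypothetical. The "causal" version performs Pearl-style surgery on the graph C -> D -> F <- E; the "logical" version instead treats the observed value of (D xor E) as a fixed mathematical fact and infers E from it.

```python
def f_physical(d, e):
    """Physical node F (the box state): a causal child of D and E."""
    return d ^ e

# Actual world; the bit values are illustrative assumptions.
c_actual, e_actual = 1, 0
d_actual = c_actual                      # D is a copy of the decision C
f_observed = f_physical(d_actual, e_actual)

def counterfactual_causal(c_new):
    """Set C, propagate to D, keep the non-descendant E, recompute F."""
    d = c_new
    return {"D": d, "E": e_actual, "F": f_physical(d, e_actual)}

def counterfactual_logical(c_new):
    """Hold the observed (D xor E) fixed as a math fact and infer E from it."""
    d = c_new
    return {"D": d, "E": d ^ f_observed, "F": f_observed}

print(counterfactual_causal(0))   # F changes, E does not
print(counterfactual_logical(0))  # E "changes", which is the unwanted behavior
```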
No, but whenever we see a physical fact F that depends on a decision C/D we’re still in the process of making plus Something Else (E),
Wait, F depends on decision computation C in what sense of “depends on”? It can’t quite be the originally defined sense (quoted from your email near the top of the OP), since that defines dependency between Platonic computations, not between a Platonic computation and a physical fact. Do you mean that D depends on C in the original sense, and F in turn depends on D (and on E) in a different sense?
then we express our uncertainty in the form of a causal graph with directed arrows from C to D, D to F, and E to F.
Ok, but these arrows can’t be used to define the relevant sense of dependency above, since the relevant sense of dependency is what tells us we need to draw the arrows that way, if I understand correctly.
Sorry to keep being pedantic about the meaning of “depends”; I know you’re in thinking-out-loud mode here. But the theory gives wildly different answers depending (heh) on how that gets pinned down.
In my view, the chief forms of “dependence” that need to be discriminated are inferential dependence and causal dependence. If earthquakes cause burglar alarms to go off, then we can infer an earthquake from a burglar alarm or infer a burglar alarm from an earthquake. Logical reasoning doesn’t have the kind of directionality that causation does (or at least, classical logical reasoning does not): there’s no preferred form among ~A->B, ~B->A, and A \/ B.
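The claim that classical logic has no preferred direction among these three forms can be checked by brute force over the truth table. This is only a small sketch; the helper `implies` is a hypothetical name for material implication.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

rows = list(product([False, True], repeat=2))
forms = {
    "~A -> B": [implies(not a, b) for a, b in rows],
    "~B -> A": [implies(not b, a) for a, b in rows],
    "A or B": [a or b for a, b in rows],
}

# All three columns of the truth table coincide.
all_equal = len({tuple(column) for column in forms.values()}) == 1
print(all_equal)
```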
The link between the Platonic decision C and the physical decision D might be different from the link between the physical decision D and the physical observation F, but I don’t know of anything in the current theory that calls for treating them differently. They’re just directional causal links. On the other hand, if C mathematically implies a decision C-2 somewhere else, that’s a logical implication that ought to symmetrically run backward to ~C-2 → ~C, except of course that we’re presumably controlling/evaluating C rather than C-2.
Thinking out loud here, the view is that your mathematical uncertainty ought to be in one place, and your physical uncertainty should be built on top of your mathematical uncertainty. The mathematical uncertainty is a logical graph with symmetric inferences, the physical uncertainty is a directed acyclic graph. To form controlling counterfactuals, you update the mathematical uncertainty, including any logical inferences that take place in mathland, and watch it propagate downward into the physical uncertainty. When you’ve already observed facts that physically depend on mathematical decisions you control but you haven’t yet made and hence whose values you don’t know, then those observations stay in the causal, directed, acyclic world; when the counterfactual gets evaluated, they get updated in the Pearl, directional way, not the logical, symmetrical inferential way.
Okay, then we have a logical link from C-platonic to D-platonic, and causal links descending from C-platonic to C-physical, E-platonic to E-physical, and D-platonic to D-physical to F-physical = D-physical xor E-physical. The idea being that when we counterfactualize on C-platonic, we update D-platonic and its descendents, but not E-platonic or its descendents.
I suppose that as written, this requires a rule, “for purposes of computing counterfactuals, keep in the causal graph rather than the logical knowledge base, any mathematical knowledge gained by observing a fact descended from your decision-output or any logical implications of your decision-output”. I could hope that this is a special case of something more elegant, but it would only be hope.
Ok. I think it would be very helpful to sketch, all in one place, what TDT2 (i.e., the envisioned avenue-2 version of TDT) looks like, taking care to pin down any needed sense of “dependency”. And similarly for TDT1, the avenue-1 version. (These suggestions may be premature, I realize.)
This is a rule that tells us not to draw the diagram with a physical node being determined by the mathematical fact D xor E, but rather to have a physical node determined by D, and then a physical descendent D xor E...
When I evaluate this proposed solution for ad-hoc-ness, it does admittedly look a bit ad-hoc, but it solves at least one other problem than the one I started with, and which I didn’t think of until now. Suppose Omega tells me that I make the same decision in the Prisoner’s Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X. X and I could have made the same decision for different (uncorrelated) reasons, and Omega could have simply found out by simulating the two of us that X and I gave the same response. In this case, presumably defecting; but if I cooperated, X wouldn’t do anything differently. X is just a piece of paper with “Defect” written on it.
If X isn’t like us, we can’t “control” X by making a decision similar to what we would want X to output*. We shouldn’t go from being an agent that defects in the prisoner’s dilemma with Agent X when told we “make the same decision in the Prisoner’s Dilemma as Agent X” to being one that does not defect, just as we do not unilaterally switch from natural to precision bidding when in contract bridge a partner opens with two clubs (which signals a good hand under precision bidding, and not under natural bidding).
However, there do exist agents who should cooperate every time they hear they “make the same decision in the Prisoner’s Dilemma as Agent X”, those who have committed to cooperate in such cases. In some such cases, they are up against pieces of paper on which “cooperate” is written (too bad they didn’t have a more discriminating algorithm/clear Omega), in others, they are up against copies of themselves or other agents whose output depends on what Omega tells them. In any case, many agents should cooperate when they hear that.
Yes? No?
Why shouldn’t one be such an agent? Do we know ahead of time that we are likely to be up against pieces of paper with “cooperate” on them, and Omega would unhelpfully tell us we “make the same decision in the Prisoner’s Dilemma as Agent X” in all such cases, though if we had a different strategy we could have gotten useful information and defected in that case?
*Other cases include us defecting to get X to cooperate, and others where X’s play depends on ours, but this is the natural case to use when considering if the Agent X’s action depends on ours, a not strategically incompetent Agent X that has a strategy at least as good as always defecting or cooperating and does not try to condition his cooperating on our defecting or the like.
1) Construct a full-blown DAG of math and Platonic facts, an account of which mathematical facts make other mathematical facts true, so that we can compute mathematical counterfactuals.
“Makes true” means logically implies? Why would that graph be acyclic?
[EDIT: Wait, maybe I see what you mean. If you take a pdf of your beliefs about various mathematical facts, and run Pearl’s algorithm, you should be able to construct an acyclic graph.]
Although I know of no worked-out theory that I find convincing, I believe that counterfactual inference (of the sort that’s appropriate to use in the decision computation) makes sense with regard to events in universes characterized by certain kinds of physical laws. But when you speak of mathematical counterfactuals more generally, it’s not clear to me that that’s even coherent.
Plus, if you did have a general math-counterfactual-solving module, why would you relegate it to the logical-dependency-finding subproblem in TDT, and then return to the original factored causal graph? Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C’s one-boxing counterfactually entails (Platonic) $1M? (Then do the argmax over the respective math-counterfactual consequences of C’s candidate outputs.)
I’ve been reviewing some of this discussion, and noticed that Eliezer hasn’t answered the question in your last paragraph. Here is his answer to one of my questions, which is similar to yours. But I’m afraid I still don’t have a really good understanding of the answer. In other words, I’m still not really sure why we need all the extra machinery in TDT, when having a general math-counterfactual-solving module (what I called “mathematical intuition module”) seems both necessary and sufficient.
I wonder if you, or anyone else, understands this well enough to try to explain it. It might help me, and perhaps others, to understand Eliezer’s approach to see it explained in a couple of different ways.
Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C’s one-boxing counterfactually entails (Platonic) $1M?
This is basically the approach I took in (what I now call) UDT1.
For now, let me just reply to your incidental concluding point, because that’s brief.
I disagree that the red/green problem is unsolvable. I’d say the solution is that, with respect to the available information, both choices have equal (low) utility, so it’s simply a toss-up. A correct decision algorithm will just flip a coin or whatever.
Having done so, will a correct decision algorithm try to revise its choice in light of its (tentative) new knowledge of what its choice is? Only if it has nothing more productive to do with its remaining time.
Actually, one can do even better than that. As (I think) Eliezer implied, the key is Omega saying those words. (about the simulated you getting it wrong)
Did the simulated version receive that message too? (if yes, and if we assume Omega is always truthful, this implies an infinite recursion of simulations… let us not go invoking infinite nested computations willy-nilly.) If there was only a single layer of simulation, then Omega either gave that statement as input to it or did not. If yes, Omega is untruthful, which throws pretty much all of the standard reasoning about Omega out the window and we can simply take into account the possibility that Omega is blatantly lying.
If Omega is truthful, even to the simulations, then the simulation would not have received that prefix message. In which case you are in a different state than simulated you was. So all you have to do is make the decision opposite to what you would have done if you hadn’t heard that particular extra message. This may be guessed by simply one iteration of “I automatically want to guess color1… but wait, simulated me got it wrong, so I’ll guess color2 instead” since “actual” you has the knowledge that the previous version of you got it wrong.
If Omega lies to simulations and tells truth to “actuals” (and can somehow simulate without the simulation being conscious, so there’s no ambiguity about which you are, yet still be accurate… (am skeptical but confused on that point)), then we have an issue. But then it would require Omega to take a risk: if when telling the lie to the simulation, the simulation then gets it right, then what does Omega tell “actual” you?
(“actual” in quotes because I honestly don’t know whether or not one could be modeled with sufficient accuracy, however indirectly, without the model being conscious. I’m actually kind of skeptical of the prospect of a perfectly accurate model not being conscious, although a model that can determine some properties/approximations of the person without being conscious is probably possible)
TL;DR: even without access to coinflips beyond Omega’s predictive power, one might be able to do better in the red/green problem simply by noting that the nature of the additional information Omega provided you opens up the possibility that Omega’s simulation of you was a bit different than the actual situation you are in.
“Simulate telling the human that they got the answer wrong. If in this case they get the answer wrong, actually tell them that they get the answer wrong. Otherwise say nothing.”
This ought to make it relatively easy for Omega to truthfully put you in a “you’re screwed” situation a fair amount of the time. Albeit, if you know that this is Omega’s procedure, the rest of the time you should figure out what you would have done if Omega said “you’re wrong” and then do that.
This kind of thinking is, I think, outside the domain of current TDT, because it involves strategies that depend on actions you would have taken in counterfactual branches. I think it may even be outside the domain of current UDT for the same reason.
I don’t see why this is outside of UDT’s domain. It seems straightforward to model and solve the decision problem in UDT1. Here’s the world program:
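def P(color):
    # `color` is the correct color in this world.
    outcome = "die"
    if Omega_Predict(S, "you're wrong") == color:
        # The simulated S got it right, so Omega says nothing.
        if S("") == color:
            outcome = "live"
    else:
        # The simulated S got it wrong, so Omega says "you're wrong".
        if S("you're wrong") == color:
            outcome = "live"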
Assuming a preference to maximize the occurrence of outcome="live" averaged over P("green") and P("red"), UDT1 would conclude that the optimal S returns a constant, either “green” or “red”, and do that.
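For anyone who wants to check the "optimal S returns a constant" claim by brute force, here is a minimal runnable sketch. It assumes Omega predicts perfectly (so `omega_predict` just runs S), and it passes S to P explicitly instead of leaving it as a global; the helper names are mine, not part of UDT1.

```python
from itertools import product

COLORS = ("red", "green")
MESSAGES = ("", "you're wrong")

def make_strategy(table):
    """Turn a {message: color} table into a strategy S(message)."""
    return lambda message: table[message]

def omega_predict(S, message):
    # Assumption: Omega is a perfect predictor, so predicting is running S.
    return S(message)

def P(S, color):
    # Same branching as the world program above, with S made explicit.
    outcome = "die"
    if omega_predict(S, "you're wrong") == color:
        if S("") == color:
            outcome = "live"
    else:
        if S("you're wrong") == color:
            outcome = "live"
    return outcome

def survival_counts():
    """Count, for every deterministic S, in how many of the worlds it lives."""
    counts = {}
    for answers in product(COLORS, repeat=len(MESSAGES)):
        S = make_strategy(dict(zip(MESSAGES, answers)))
        counts[answers] = sum(P(S, c) == "live" for c in COLORS)
    return counts

if __name__ == "__main__":
    for answers, lives in survival_counts().items():
        print(dict(zip(MESSAGES, answers)), "->", lives, "of", len(COLORS))
```

Under these assumptions only the two constant strategies survive in one of the two worlds; a strategy that answers differently after hearing "you're wrong" survives in neither.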
BTW, do you find this “world program” style analysis useful? I don’t want to over-do them and get people annoyed. (I refrained from doing this for the problem described in Gary’s post, since it doesn’t mention UDT at all, and therefore I’m assuming you want to find a TDT-only solution.)
(I refrained from doing this for the problem described in Gary’s post, since it doesn’t mention UDT at all, and therefore I’m assuming you want to find a TDT-only solution.)
Yes, I was focusing on a specific difficulty in TDT, but I certainly have no objection to bringing UDT into the thread too. (I myself haven’t yet gotten around to giving UDT the attention I think it deserves.)
I was modeling what Eliezer wrote in the comment that I was responding to:
“Simulate telling the human that they got the answer wrong. If in this case they get the answer wrong, actually tell them that they get the answer wrong. Otherwise say nothing.”
BTW, if you add a tab in front of each line of your program listing, it will get formatted correctly.
Ah, I see. Then it seems that you are really solving the problem of minimizing the probability that Omega presents this problem in the first place.
What about the scenario, where Omega uses the strategy: Simulate telling the human that they got the answer wrong. Define the resulting answer as wrong, and the other as right.
This is what I modeled.
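A rough sketch of that second scenario, as I read it: Omega simulates the agent hearing the message, declares whatever the simulation answered to be wrong, and the real agent then hears the same message. The function names and the two-color setup are illustrative assumptions, not anyone's actual model program from this thread.

```python
COLORS = ("red", "green")

def other(color):
    return "green" if color == "red" else "red"

def world(S):
    # Omega simulates S hearing the message and defines that answer as wrong.
    wrong = S("you're wrong")
    right = other(wrong)
    # The real agent hears the same message, so it gives the same answer.
    return "live" if S("you're wrong") == right else "die"

def all_deterministic_strategies():
    """Yield every deterministic S over the two possible inputs."""
    for said_nothing in COLORS:
        for told_wrong in COLORS:
            table = {"": said_nothing, "you're wrong": told_wrong}
            yield table, (lambda message, t=table: t[message])

if __name__ == "__main__":
    for table, S in all_deterministic_strategies():
        print(table, "->", world(S))
```

Every deterministic S dies here, because the "right" color is defined in terms of S's own output on the very input it receives.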
BTW, if you add a tab in front of each line of your program listing, it will get formatted correctly.
Thanks. Is there an easier way to get a tab into the comment input box than copy paste from an outside editor?
What about the scenario, where Omega uses the strategy: Simulate telling the human that they got the answer wrong. Define the resulting answer as wrong, and the other as right.
Is there an easier way to get a tab into the comment input box than copy paste from an outside editor?
Not that I’m aware of.
Are you guys talking about getting code to indent properly? You can do that by typing four spaces in front of each line. Each quadruple of spaces produces a further indentation.
Are you guys talking about getting code to indent properly? You can do that by typing four spaces in front of each line.
Spaces? Think of the wasted negentropy! I say we make tab the official Less Wrong indentation symbol, and kick out anyone who disagrees. Who’s with me? :-)
Hm, I think the difference in our model programs indicates something that I don’t understand about UDT, like a wrong assumption that justified an optimization. But it seems they both produce the same result for P(S(“you’re wrong”)), which is outcome="die" for all S.
Do you agree that this problem is, and should remain, unsolvable? (I understand “should remain unsolvable” to mean that any supposed solution must represent some sort of confusion about the problem.)
The input to P is supposed to contain the physical randomness in the problem, so P(S(“you’re wrong”)) doesn’t make sense to me. The idea is that both P(“green”) and P(“red”) get run, and we can think of them as different universes in a multiverse. Actually in this case I should have written “def P():” since there is no random correct color.
wrong assumption that justified an optimization
I’m not quite sure what you mean here, but in general I suggest just translating the decision problem directly into a world program without trying to optimize it.
Do you agree that this problem is, and should remain, unsolvable? (I understand “should remain unsolvable” to mean that any supposed solution must represent some sort of confusion about the problem.)
No, like I said, it seems pretty straightforward to solve in UDT. It’s just that even in the optimal solution you still die.
The input to P is supposed to contain the physical randomness in the problem, so P(S(“you’re wrong”)) doesn’t make sense to me. The idea is that both P(“green”) and P(“red”) get run, and we can think of them as different universes in a multiverse. Actually in this case I should have written “def P():” since there is no random correct color.
Ok, now I understood why you wrote your program the way you did.
It’s just that even in the optimal solution you still die.
By solve, I meant find a way to win. I think that after getting past different word use, we agree on the nature of the problem.
I’m not sure the algorithm you describe here is necessarily outside current TDT though. The counterfactual still corresponds to an actual thing Omega simulated. It’d be more like this: Omega did not add the “you are wrong” prefix. Therefore, conditioning on the idea that Omega always tries simulating with that prefix and only states the prefix if I (or whoever Omega is offering the challenge to) was wrong in that simulation, the simulation in question then did not produce the wrong answer.
Therefore a sufficient property for a good answer (one with higher expected utility) is that it should have the same output as that simulation. Therefore determine what that output was...
I.e., TDT shouldn’t have much more problem (in principle) with that than with being told that it needs to guess the Nth digit of Pi. If possible, it would simply compute the Nth digit of Pi. In this case, it has to simply compute the outcome of a certain different algorithm which happens to be equivalent to its own decision algorithm when faced with a certain situation. I don’t THINK this would be inherently outside of current TDT as I understand it.
I may be completely wrong on this, though, but that’s the way it seems to me.
As far as stuff like the problem in the OP, I suspect though that the Right Way for dealing with things analogous to counterfactual mugging (and extended to the problem in the OP) and such amounts to a very general precommitment… Or a retroactive precommitment.
My thinking here is rather fuzzy. I do suspect though that the Right Way probably looks something like the TDT, in advance, doing a very general precommitment to be the sort of being that tends to have high expected utility when faced with counterfactual muggers and whatnot… (Or retroactively deciding to be the sort of being that effectively has the logical implication of being mathematically “precommitted” to be such.)
By “unsolvable” I mean that you’re screwed over in final outcomes, not that TDT fails to have an output.
The interesting part of the problem is that, whatever you decide, you deduce facts about the background such that you know that what you are doing is the wrong thing. However, if you do anything differently, you would have to make a different deduction about the background facts, and again know that what you were doing was the wrong thing. Since we don’t believe that our decision is capable of affecting the background facts, the background facts ought to be a fixed constant, and we should be able to alter our decision without affecting the background facts… however, as soon as we do so, our inference about the unalterable background facts changes. It’s not 100% clear how to square this with TDT.
Actually, there is an optimal solution to this dilemma. Rather than use any internal process to decide, using a truly random process gives a 50% chance of survival. If you base your decision on a quantum randomness source, in principle no simulation can predict your choice (or rather, a complete simulation would correctly predict you fail in 50% of possible worlds).
Knowing how to use randomness against an intelligent adversary is important.
Gary postulated an infallible simulator, which presumably includes your entire initial state and all pseudorandom algorithms you might run. Known quantum randomness methods can only amplify existing entropy, not manufacture it ab initio. So you have no recourse to coinflips.
EDIT: Oops! pengvado is right. I was thinking of the case discussed here, where the random bits are provided by some quantum black box.
Quantum coinflips work even if Omega can predict them. It’s like a branch-both-ways instruction. Just measure some quantum variable, then measure a noncommuting variable, and voila, you’ve been split into two or more branches that observe different results and thus can perform different strategies. Omega’s perfect predictor tells it that you will do both strategies, each with half of your original measure. There is no arrangement of atoms (encoding the right answer) that Omega can choose in advance that would make both of you wrong.
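To make the branch-counting concrete, here is a small sketch. It assumes Omega has to fix the "right" color before the measurement and that the agent's policy is just a distribution of measure over colors; both the setup and the names are illustrative.

```python
from fractions import Fraction

COLORS = ("red", "green")

def surviving_measure(policy, right_color):
    """policy maps each color to the measure of branches that choose it."""
    return policy.get(right_color, Fraction(0))

def worst_case(policy):
    # Omega picks whichever fixed color leaves the least surviving measure.
    return min(surviving_measure(policy, c) for c in COLORS)

# A deterministic agent puts all of its measure on one color.
deterministic = {"red": Fraction(1)}
# A quantum coinflip splits the agent's measure evenly across both colors.
quantum_coin = {"red": Fraction(1, 2), "green": Fraction(1, 2)}

if __name__ == "__main__":
    print("deterministic:", worst_case(deterministic))
    print("quantum coin: ", worst_case(quantum_coin))
```

Whatever color Omega commits to, half of the coinflipping agent's measure survives, whereas the deterministic agent can always be made wrong.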
If Omega wants to smack down the use of randomness, I can’t stop it. But there are a number of game theoretic situations where the optimal response is random play, and any decision theory that can’t respond correctly is broken.
1) A black box RNG is still useless despite being based on a quantum mechanism, or
2) That a quantum device will necessarily manufacture random bits.
Counterexamples to 2 are pretty straightforward (quantum computers), so I’m assuming you mean 1. I’m operating at the edge of my knowledge here (as my original mistake shows), but I think the entire point of Pironio et al’s paper was that you can verify random bits obtained from an adversary, subject to the conditions:
Bell inequality violations are observable (i.e., it’s a quantum generator).
The adversary can’t predict your measurement strategy.
By “unsolvable” I mean that you’re screwed over in final outcomes, not that TDT fails to have an output.
Oh ok. So it’s unsolvable in the same sense that “Choose red or green. Then I’ll shoot you.” is unsolvable. Sometimes choice really is futile. :) [EDIT: Oops, I probably misunderstood what you’re referring to by “screwed over”.]
The interesting part of the problem is that, whatever you decide, you deduce facts about the background such that you know that what you are doing is the wrong thing.
Yes, assuming that you’re the sort of algorithm that can (without inconsistency) know its own choice here before the choice is executed.
If you’re the sort of algorithm that may revise its intended action in response to the updated deduction, and if you have enough time left to perform the updated deduction, then the (previously) intended action may not be reliable evidence of what you will actually do, so it fails to provide sound reason for the update in the first place.
1) Construct a full-blown DAG of math and Platonic facts, an account of which mathematical facts make other mathematical facts true, so that we can compute mathematical counterfactuals.
If mathematical truths were drawn in a DAG, it’s unclear how counterfactuals would work. Since math is consistent, then, by the principle of explosion, asserting the negation of any true statement makes all statements derivable. The counterfactual graph would therefore be completely uninformative.
Or, perhaps, it would just generate another system of math. But then you have to know the inferential relationship between that new math and the rest of the world.
Treating same inputs on duplicate functions also arises in the treatment of counterfactuals (since one duplicates the causal graph across worlds of interest). The treatment I am familiar with is systematic merges of portions of the counterfactual graph which can be proved to be the same. I don’t really understand why this issue is about logic (rather than about duplication).
What was confusing me, however, was the remark that it is possible to create causal graphs of mathematical facts (presumably with entailment functioning as a causal relationship between facts). I really don’t see how this can be done. In particular the result is highly cyclic, infinite for most interesting theories, and it is not clear how to define interventions on such graphs in a satisfactory way.
I was going to suggest (2) myself, but then I realized that it seems to follow directly from your definition of “dependent on”, so you must have thought of it yourself:
For D to depend on C means that if C has various logical outputs, we can infer new logical facts about D’s logical output in at least some cases, relative to our current state of non-omniscient logical knowledge. [emphasis added]
And this was my reply:
This is an unfinished part of the theory that I’ve also thought about, though your example puts it very crisply (you might consider posting it to LW?)
My current thoughts on resolution tend to see two main avenues:
1) Construct a full-blown DAG of math and Platonic facts, an account of which mathematical facts make other mathematical facts true, so that we can compute mathematical counterfactuals.
2) Treat differently mathematical knowledge that we learn by genuinely mathematical reasoning and by physical observation. In this case we know (D xor E) not by mathematical reasoning, but by physically observing a box whose state we believe to be correlated with D xor E. This may justify constructing a causal DAG with a node descending from D and E, so a counterfactual setting of D won’t affect the setting of E.
Currently I’d say that (2) looks like the better avenue. Can you come up with an improper mathematical dependency where E is inferred from D, and shouldn’t be seen as counterfactually affected, based on mathematical reasoning only without postulating the observation of a physical variable that descends from both E and D?
Incidentally, note that an unsolvable problem that should stay unsolvable is as follows: I’m asked to pick red or green, and told “A simulation of you given this information as well picked the wrong color and got shot.” Whichever choice I make, I deduce that the other choice was better. The exact details here will depend on how I believe the simulator chose to tell me this, but ceteris paribus it’s an unsolvable problem.
Perhaps I’m misunderstanding you here, but D and E are Platonic computations. What does it mean to construct a causal DAG among Platonic computations? [EDIT: Ok, I may understand that a little better now; see my edit to my reply to (1).] Such a graph links together general mathematical facts, so the same issues arise as in (1), it seems to me: Do the links correspond to logical inference, or something else? What makes the graph acyclic? Is mathematical causality even coherent? And if you did have a module that can detect (presumably timeless) causal links among Platonic computations, then why not use that module directly to solve your decision problems?
Plus I’m not convinced that there’s a meaningful distinction between math knowledge that you gain by genuine math reasoning, and math knowledge that you gain by physical observation.
Let’s say, for instance, that I feed a particular conjecture to an automatic theorem prover, which tells me it’s true. Have I then learned that math fact by genuine mathematical reasoning (performed by the physical computer’s Platonic abstraction)? Or have I learned it by physical observation (of the physical computer’s output), and hence am I barred from using that math fact for purposes of TDT’s logical-dependency-detection? Presumably the former, right? (Or else TDT will make even worse errors.)
But then suppose the predictor has simulated the universe sufficiently to establish that U (the universe’s algorithm, including physics and initial conditions) leads to there being $1M in the box in this situation. That’s a mathematical fact about U, obtained by (the simulator’s) mathematical reasoning. Let’s suppose that when the predictor briefs me, the briefing includes mention of this mathematical fact. So even if I keep my eyes closed and never physically see the $1M, I can rely instead on the corresponding mathematically derived fact.
(Or more straightforwardly, we can view the universe itself as a computer that’s performing mathematical reasoning about how U unfolds, in which case any physical observation is intrinsically obtained by mathematical reasoning.)
Logical uncertainty has always been more difficult to deal with than physical uncertainty; the problem with logical uncertainty is that if you analyze it enough, it goes away. I’ve never seen any really good treatment of logical uncertainty.
But if we depart from TDT for a moment, then it does seem clear that we need to have causelike nodes corresponding to logical uncertainty in a DAG which describes our probability distribution. There is no other way you can completely observe the state of a calculator sent to Mars and a calculator sent to Venus, and yet remain uncertain of their outcomes yet believe the outcomes are correlated. And if you talk about error-prone calculators, two of which say 17 and one of which says 18, and you deduce that the “Platonic answer” was probably in fact 17, you can see that logical uncertainty behaves in an even more causelike way than this.
So, going back to TDT, my hope is that there’s a neat set of rules for factoring our logical uncertainty in our causal beliefs, and that these same rules also resolve the sort of situation that you describe.
If you consider the notion of the correlated error-prone calculators, two returning 17 and one returning 18, then the most intuitive way to handle this would be to see a “Platonic answer” as its own causal node, and the calculators as error-prone descendants. I’m pretty sure this is how my brain is drawing the graph, but I’m not sure it’s the correct answer; it seems to me that a more principled answer would involve uncertainty about which mathematical fact affects each calculator—physically uncertain gates which determine which calculation affects each calculator.
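As a worked illustration of the "Platonic answer as a causal node with error-prone descendants" picture, here is a minimal sketch. The numbers are assumptions of mine and not from the discussion: a uniform prior over the candidate answers, each calculator independently right with probability 9/10, and wrong answers spread uniformly over the other candidates.

```python
from fractions import Fraction

def posterior(readings, candidates, accuracy):
    """Posterior over the Platonic answer given independent noisy readings.

    Assumptions: uniform prior over `candidates`; each calculator reports the
    true answer with probability `accuracy`, and otherwise reports one of the
    other candidates uniformly at random.
    """
    miss = (1 - accuracy) / (len(candidates) - 1)
    weights = {}
    for answer in candidates:
        w = Fraction(1)
        for r in readings:
            w *= accuracy if r == answer else miss
        weights[answer] = w
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

beliefs = posterior([17, 17, 18], candidates=(17, 18), accuracy=Fraction(9, 10))

if __name__ == "__main__":
    for answer, p in beliefs.items():
        print(answer, p)
```

With these made-up numbers the posterior on 17 comes out to 9/10. The point is only that the readings are conditionally independent given the hidden answer node, which is what makes it behave like a common cause.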
For the (D xor E) problem, we know the behavior we want the TDT calculation to exhibit; we want (D xor E) to be a descendant node of D and E. If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we’ll find that we can affect E, which is not what we want. Conversely if we view D as having a physical effect, and E as having a physical effect, and the node D xor E as a physical descendant of D and E, we’ll get the behavior we want. So the question is whether there’s any principled way of setting this up which will yield the second behavior rather than the first, and also, presumably, yield epistemically correct behavior when reasoning about calculators and so on.
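The two ways of setting this up can be contrasted in a few lines. This is only a toy sketch with D and E as bits; the function names are hypothetical and it is not meant as TDT's actual counterfactual procedure.

```python
def counterfactual_math_fact(d_actual, e_actual, d_new):
    """Factoring 1: hold the observed fact (D xor E) fixed and re-infer E."""
    observed = d_actual ^ e_actual
    e_inferred = d_new ^ observed
    return {"D": d_new, "E": e_inferred, "D xor E": observed}

def counterfactual_physical_node(d_actual, e_actual, d_new):
    """Factoring 2: E is a parent; (D xor E) is recomputed downstream."""
    return {"D": d_new, "E": e_actual, "D xor E": d_new ^ e_actual}

if __name__ == "__main__":
    # Actual world: D = 1, E = 0. Counterfactually set D = 0.
    print(counterfactual_math_fact(1, 0, 0))
    print(counterfactual_physical_node(1, 0, 0))
```

In the first factoring, changing D flips the inferred E (the unwanted behavior); in the second, E stays put and only the descendant (D xor E) changes.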
That’s if we go down avenue (2). If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn’t quite nailed down—after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.
These sorts of issues are something I’m still thinking through, as I think I’ve mentioned, so let me think out loud for a bit.
In order to observe anything that you think has already been controlled by your decision—any physical thing in which a copy of D has already played a role—then (leaving aside the question of Omega’s strategy that simulated alternate versions of you to select a self-consistent problem, and whether this introduces conditional-strategy-dependence rather than just decision-dependence into the problem) there have to be other physical facts which combine with D to yield our observation.
Some of these physical facts may themselves be affected by mathematical facts, like an implemented computation of E; but the point is that in order to have observed anything controlled by D, we already had to draw a physical, causal diagram in which other nodes descended from D.
So suppose we introduce the rule that in every case like this, we will have some physical node that is affected by D, and if we can observe info that depends on D in any way, we’ll view the other mathematical facts as combining with D’s physical node. This is a rule that tells us not to draw the diagram with a physical node being determined by the mathematical fact D xor E, but rather to have a physical node determined by D, and then a physical descendant D xor E. (Which in this particular problem should descend from a physical node E that descends from the mathematical fact E, because the mathematical fact E is correlated with our uncertainty about other things, and a factored causal graph should have no remaining correlated sources of background uncertainty; but if E didn’t correlate to anything else in particular, we could just have D descending to (D xor E) via the (xor with E) rule.)
When I evaluate this proposed solution for ad-hoc-ness, it does admittedly look a bit ad-hoc, but it solves at least one problem other than the one I started with, one which I didn’t think of until now. Suppose Omega tells me that I make the same decision in the Prisoner’s Dilemma as Agent X. This does not necessarily imply that I should cooperate with Agent X. X and I could have made the same decision for different (uncorrelated) reasons, and Omega could have simply found out by simulating the two of us that X and I gave the same response. In this case, presumably defecting; but if I cooperated, X wouldn’t do anything differently. X is just a piece of paper with “Defect” written on it.
If I draw a causal diagram of how I came to learn this correlation from Omega, and I follow the rule of always drawing a causal boundary around the mathematical fact D as soon as it physically affects something, then, given the way Omega simulated both of us to observe the correlation, I see that D and X separately physically affected the correlation-checker node.
On the other hand, if I can analyze the two pieces of code D and X and see that they return the same output, without yet knowing the output, then this knowledge was obtained in a way that doesn’t involve D producing an output, so I don’t have to draw a hard causal boundary around that output.
If this works, the underlying principle that makes it work is something along the lines of “for D to control X, the correlation between our uncertainty about D and X has to emerge in a way that doesn’t involve anyone already computing D”. Otherwise D has no free will (said firmly tongue-in-cheek). I am not sure that this principle has any more elegant expression than the rule, “whenever, in your physical model of the universe, D finishes computing, draw a physical/causal boundary around that finished computation and have other things physically/causally descend from it”.
If this principle is violated then D ends up “correlated” to all sorts of other things we observe, like the price of fish and whether it’s raining outside, via the magic of xor.
When you use terms like “draw a hard causal boundary” I’m forced to imagine you’re actually drawing these things on the back of a cocktail napkin somewhere using some sorts of standard symbols. Are there such standards, and do you have such diagrams scanned in online somewhere?
ETA: A note for future readers: Eliezer below is referring to Judea Pearl (simply “Pearl” doesn’t convey much via google-searching, though I suppose “pearl causality” does at the moment)
Read Pearl. I think his online intros should give you a good idea of what the cocktail napkin looks like.
Hmm… Pearl uses a lot of diagrams but they all seem pretty ad-hoc. Just the sorts of arrows and dots and things that you’d use to represent any graph (in the mathematics sense). Should I infer from this description that the answer is, “No, there isn’t a standard”?
I was picturing something like a legend that would tell someone, “Use a dashed line for a causal boundary, and a red dotted line to represent a logical inference, and a pink squirrel to represent postmodernism”
Um… I’m not sure there’s much I can say to that beyond “Read Probabilistic Reasoning in Intelligent Systems, or Causality”.
Pearl’s system is not ad-hoc. It is very not ad-hoc. It has a metric fuckload of math backing up the simple rules. But Pearl’s system does not include logical uncertainty. I’m trying to put logical uncertainty into it, while obeying the rules. This is a work in progress.
I’d just like to register a general approval of specifying that one’s imaginary units are metric.
FWIW
Thomblake’s observation may be that, while Pearl’s system is extremely rigorous, the diagrams used do not give an authoritative standard style for diagram drawing.
That’s correct—I was looking for a standard style for diagram drawing.
I’m rereading past discussions to find insights. This jumped out at me:
Do you still believe this?
Playing chicken with Omega may result in you becoming counterfactual.
Why is cooperation more likely to qualify as “playing chicken” than defection here?
I was referring to the example Eliezer gives with your opponent being a DefectBot, in which case cooperating makes Omega’s claim false, which may just mean that you’d make your branch of the thought experiment counterfactual, instead of convincing DefectBot to cooperate:
So? That doesn’t hurt my utility in reality. I would cooperate because that wins if agent X is correlated with me, and doesn’t lose otherwise.
Winning is about how alternatives you choose between compare. By cooperating against a same-action DefectBot, you are choosing nonexistence over a (D,D), which is not obviously a neutral choice.
I don’t think this is how it works. Particular counterfactual instances of you can’t influence whether they are counterfactual or exist in some stronger sense. They can only choose whether there are more real instances with identical experiences (and their choices can sometimes acausally influence what happens with real instances, which doesn’t seem to be the case here since the real you will choose defect either way as predicted by Omega). Hypothetical instances don’t lose anything by being in the branch that chooses the opposite of what the real you chooses unless they value being identical to the real you, which IMO would be silly.
What can influence things like that? Whatever property of a situation can mark it as counterfactual (more precisely, given by a contradictory specification, or not following from a preceding construction, assumed-real past state for example), that property could as well be a decision made by an agent present in that situation. There is nothing special about agents or their decisions.
Why do you think something can influence it? Whether you choose to cooperate or defect, you can always ask both “what would happen if I cooperated?” and “what would happen if I defected?”. Insofar as being counterfactual makes sense, the alternative to being the answer to “what would happen if I cooperated?” is being the answer to “what would happen if I defected?”, even if you know that the real you defects.
Compare Omega telling you that your answer will be the same as the Nth digit of Pi. That doesn’t allow you to choose the Nth digit of Pi.
This becomes a (relatively) straightforward matter of working out where the (potentially counterfactual, depending on what you choose) calculation is being performed to determine exactly what this ‘nonexistence’ means. Since this particular thought experiment doesn’t seem to specify any other broader context I assert that cooperate is clearly the correct option. Any agent which doesn’t cooperate is broken.
Basically, if you ever find yourself in this situation then you don’t matter. It’s your job to play chicken with the universe and not exist so the actual you can win.
Agent X is a piece of paper with “Defect” written on it. I defect against it. Omega’s claim is true and does not imply that I should cooperate.
I don’t see this argument making sense. Omega’s claim reduces to negligible chances that a choice of Defection will be advantageous for me, because Omega’s claim makes it of negligible probability that either (D,C) or (C, D) will be realized. So I can only choose between the worlds of (C, C) and (D, D). Which means that the Cooperation world is advantageous, and that I should Cooperate.
In contrast, if Omega had claimed that we’d make the opposite decisions, then I’d only have to choose between the worlds of (D, C) or (C, D) -- with the worlds of (C, C) and (D, D) now having negligible probability. In which case, I should, of course, Defect.
The reasons for the correlation between me and Agent X are irrelevant when the fact of their correlation is known.
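The reasoning in this comment can be written out as a tiny calculation. The payoff numbers are the usual textbook Prisoner's Dilemma values and are an assumption here, and the sketch only formalizes this comment's reading, in which my choice selects among the worlds Omega's claim leaves open. It does not address the DefectBot objection above.

```python
# My payoff for (my move, X's move); illustrative numbers only.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def best_move(consistent_worlds):
    """Pick my move with the best payoff among the worlds the claim allows."""
    options = {}
    for mine, theirs in consistent_worlds:
        options[mine] = PAYOFF[(mine, theirs)]
    return max(options, key=options.get)

# Omega says "X plays the same as you" versus "X plays the opposite".
same = [("C", "C"), ("D", "D")]
opposite = [("C", "D"), ("D", "C")]

if __name__ == "__main__":
    print("same decision:    ", best_move(same))
    print("opposite decision:", best_move(opposite))
```

Under that reading, cooperation wins when the decisions are claimed to match and defection wins when they are claimed to differ.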
Sorry, was this intended as part of the problem statement, like “Omega tells you that agent X is a DefectBot that will play the same as you”? If yes, then ok. But if we don’t know what agent X is, then I don’t understand why a DefectBot is a priori more probable than a CooperateBot. If they are equally probable, then it cancels out (edit: no it doesn’t, it actually makes cooperating a better choice, thx ArisKatsaris). And there’s also the case where X is a copy of you, where cooperating does help. So it seems to be a better choice overall.
There is also a case where X is an anticopy (performs opposite action), which argues for defecting in the same manner.
Edit: This reply is wrong.
No it doesn’t. If X is an anticopy, the situation can’t be real and your action doesn’t matter.
Why can’t it be real?
Because Omega has told you that X’s action is the same as yours.
OK.
I agree this sounds intuitive. As I mentioned earlier, though, nailing this down is tantamount to circling back and solving the full-blown problem of (decision-supporting) counterfactual reasoning: the problem of how to distinguish which facts to “hold fixed”, and which to “let vary” for consistency with a counterfactual antecedent.
In any event, is the idea to try to build a separate graph for math facts, and use that to analyze “logical dependency” among the Platonic nodes in the original graph, in order to carry out TDT’s modified “surgical alteration” of the original graph? Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl’s decision procedure without further modification?
Wait, isn’t it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we’re concerned about here? It’s the logical dependents of C that get surgically altered in the graph when C gets surgically altered, right? (I know C and D are logically equivalent, but you’re talking about inserting a physical node after D, not C, so I’m a bit confused.)
I’m having trouble following the gist of avenue (2) at the moment. Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact? Or is there something in particular about node (D xor E) that makes it forbidden? (It would be circular to cite the node’s dependence on C in the very sense of “dependence” that the new rule is helping us to compute.)
I definitely want one big graph if I can get it.
Sorry, yes, C.
No, but whenever we see a physical fact F that depends on a decision C/D we’re still in the process of making plus Something Else (E), then we express our uncertainty in the form of a causal graph with directed arrows from C to D, D to F, and E to F. Thus when we compute a counterfactual on C, we find that F changes, but E does not.
Wait, F depends on decision computation C in what sense of “depends on”? It can’t quite be the originally defined sense (quoted from your email near the top of the OP), since that defines dependency between Platonic computations, not between a Platonic computation and a physical fact. Do you mean that D depends on C in the original sense, and F in turn depends on D (and on E) in a different sense?
Ok, but these arrows can’t be used to define the relevant sense of dependency above, since the relevant sense of dependency is what tells us we need to draw the arrows that way, if I understand correctly.
Sorry to keep being pedantic about the meaning of “depends”; I know you’re in thinking-out-loud mode here. But the theory gives wildly different answers depending (heh) on how that gets pinned down.
In my view, the chief forms of “dependence” that need to be distinguished are inferential dependence and causal dependence. If earthquakes cause burglar alarms to go off, then we can infer an earthquake from a burglar alarm or infer a burglar alarm from an earthquake. Logical reasoning doesn’t have the kind of directionality that causation does (or at least, classical logical reasoning does not): there’s no preferred form among ~A->B, ~B->A, and A \/ B.
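To make the symmetry point concrete, here is a minimal truth-table check that the three forms are classically equivalent. The names `implies`, `forms`, and `truth_tables` are just illustrative, not anything from the theory.

```python
from itertools import product

def implies(p, q):
    """Classical material implication: p -> q."""
    return (not p) or q

# The three forms mentioned above, as functions of A and B.
forms = {
    "~A -> B": lambda a, b: implies(not a, b),
    "~B -> A": lambda a, b: implies(not b, a),
    "A \\/ B": lambda a, b: a or b,
}

# Evaluate each form on every assignment of (A, B).
truth_tables = {
    name: tuple(f(a, b) for a, b in product((False, True), repeat=2))
    for name, f in forms.items()
}

# All three have the same truth table, so classical logic gives
# no reason to prefer one direction of inference over another.
all_equivalent = len(set(truth_tables.values())) == 1
print(all_equivalent)
```

A causal link, by contrast, is not invariant under this kind of rewriting: intervening on the alarm does not produce an earthquake.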
The link between the Platonic decision C and the physical decision D might be different from the link between the physical decision D and the physical observation F, but I don’t know of anything in the current theory that calls for treating them differently. They’re just directional causal links. On the other hand, if C mathematically implies a decision C-2 somewhere else, that’s a logical implication that ought to symmetrically run backward to ~C-2 → ~C, except of course that we’re presumably controlling/evaluating C rather than C-2.
Thinking out loud here, the view is that your mathematical uncertainty ought to be in one place, and your physical uncertainty should be built on top of your mathematical uncertainty. The mathematical uncertainty is a logical graph with symmetric inferences, the physical uncertainty is a directed acyclic graph. To form controlling counterfactuals, you update the mathematical uncertainty, including any logical inferences that take place in mathland, and watch it propagate downward into the physical uncertainty. When you’ve already observed facts that physically depend on mathematical decisions you control but you haven’t yet made and hence whose values you don’t know, then those observations stay in the causal, directed, acyclic world; when the counterfactual gets evaluated, they get updated in the Pearl, directional way, not the logical, symmetrical inferential way.
No, D was the Platonic simulator. That’s why the nature of the C->D dependency is crucial here.
Okay, then we have a logical link from C-platonic to D-platonic, and causal links descending from C-platonic to C-physical, E-platonic to E-physical, and D-platonic to D-physical to F-physical = D-physical xor E-physical. The idea being that when we counterfactualize on C-platonic, we update D-platonic and its descendants, but not E-platonic or its descendants.
I suppose that as written, this requires a rule, “for purposes of computing counterfactuals, keep in the causal graph rather than the logical knowledge base, any mathematical knowledge gained by observing a fact descended from your decision-output or any logical implications of your decision-output”. I could hope that this is a special case of something more elegant, but it would only be hope.
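A toy sketch of the difference this rule is meant to make, assuming one-bit values for every node. The function names and the one-bit encoding are hypothetical, chosen only to mirror the graph described above.

```python
def physical_world(c_platonic, e_platonic):
    """Propagate Platonic values down the causal links described above."""
    d_platonic = c_platonic  # logical link: D is a simulation of C
    return {
        "C_physical": c_platonic,
        "E_physical": e_platonic,
        "D_physical": d_platonic,
        "F_physical": d_platonic ^ e_platonic,  # F = D xor E
    }

# Actual situation: we have observed F but not yet "made" decision C.
actual = physical_world(c_platonic=1, e_platonic=0)
observed_f = actual["F_physical"]

# Causal (Pearl-style) counterfactual on C: E is not a descendant of C,
# so it is held fixed and F is recomputed.
causal = physical_world(c_platonic=0, e_platonic=actual["E_physical"])

def improper_inferred_e(c_platonic, f_observed):
    """The inference the rule is meant to block: keep the observed F in
    the logical knowledge base and re-derive E = D xor F."""
    d_platonic = c_platonic
    return d_platonic ^ f_observed

improper_e = improper_inferred_e(0, observed_f)

print(causal["E_physical"], causal["F_physical"], improper_e)
```

Under the causal treatment E stays put and F flips; under the purely inferential treatment F stays put and E appears to flip, which is the unwanted result.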
Ok. I think it would be very helpful to sketch, all in one place, what TDT2 (i.e., the envisioned avenue-2 version of TDT) looks like, taking care to pin down any needed sense of “dependency”. And similarly for TDT1, the avenue-1 version. (These suggestions may be premature, I realize.)
If X isn’t like us, we can’t “control” X by making a decision similar to what we would want X to output*. We shouldn’t go from being an agent that defects in the prisoner’s dilemma with Agent X when told we “make the same decision in the Prisoner’s Dilemma as Agent X” to being one that does not defect, just as we do not unilaterally switch from natural to precision bidding when in contract bridge a partner opens with two clubs (which signals a good hand under precision bidding, and not under natural bidding).
However, there do exist agents who should cooperate every time they hear they “make the same decision in the Prisoner’s Dilemma as Agent X”, those who have committed to cooperate in such cases. In some such cases, they are up against pieces of paper on which “cooperate” is written (too bad they didn’t have a more discriminating algorithm/clear Omega), in others, they are up against copies of themselves or other agents whose output depends on what Omega tells them. In any case, many agents should cooperate when they hear that.
Yes? No?
Why shouldn’t one be such an agent? Do we know ahead of time that we are likely to be up against pieces of paper with “cooperate” on them, and Omega would unhelpfully tell us we “make the same decision in the Prisoner’s Dilemma as Agent X” in all such cases, though if we had a different strategy we could have gotten useful information and defected in that case?
*Other cases include us defecting to get X to cooperate, and others where X’s play depends on ours, but this is the natural case to use when considering if the Agent X’s action depends on ours, a not strategically incompetent Agent X that has a strategy at least as good as always defecting or cooperating and does not try to condition his cooperating on our defecting or the like.
“Makes true” means logically implies? Why would that graph be acyclic? [EDIT: Wait, maybe I see what you mean. If you take a pdf of your beliefs about various mathematical facts, and run Pearl’s algorithm, you should be able to construct an acyclic graph.]
Although I know of no worked-out theory that I find convincing, I believe that counterfactual inference (of the sort that’s appropriate to use in the decision computation) makes sense with regard to events in universes characterized by certain kinds of physical laws. But when you speak of mathematical counterfactuals more generally, it’s not clear to me that that’s even coherent.
Plus, if you did have a general math-counterfactual-solving module, why would you relegate it to the logical-dependency-finding subproblem in TDT, and then return to the original factored causal graph? Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C’s one-boxing counterfactually entails (Platonic) $1M? (Then do the argmax over the respective math-counterfactual consequences of C’s candidate outputs.)
I’ve been reviewing some of this discussion, and noticed that Eliezer hasn’t answered the question in your last paragraph. Here is his answer to one of my questions, which is similar to yours. But I’m afraid I still don’t have a really good understanding of the answer. In other words, I’m still not really sure why we need all the extra machinery in TDT, when having a general math-counterfactual-solving module (what I called “mathematical intuition module”) seems both necessary and sufficient.
I wonder if you, or anyone else, understands this well enough to try to explain it. It might help me, and perhaps others, to understand Eliezer’s approach to see it explained in a couple of different ways.
This is basically the approach I took in (what I now call) UDT1.
For now, let me just reply to your incidental concluding point, because that’s brief.
I disagree that the red/green problem is unsolvable. I’d say the solution is that, with respect to the available information, both choices have equal (low) utility, so it’s simply a toss-up. A correct decision algorithm will just flip a coin or whatever.
Having done so, will a correct decision algorithm try to revise its choice in light of its (tentative) new knowledge of what its choice is? Only if it has nothing more productive to do with its remaining time.
Actually, one can do even better than that. As (I think) Eliezer implied, the key is Omega saying those words (about the simulated you getting it wrong).
Did the simulated version receive that message too? (if yes, and if we assume Omega is always truthful, this implies an infinite recursion of simulations… let us not go invoking infinite nested computations willy-nilly.) If there was only a single layer of simulation, then Omega either gave that statement as input to it or did not. If yes, Omega is untruthful, which throws pretty much all of the standard reasoning about Omega out the window and we can simply take into account the possibility that Omega is blatantly lying.
If Omega is truthful, even to the simulations, then the simulation would not have received that prefix message. In which case you are in a different state than simulated you was. So all you have to do is make the decision opposite to what you would have done if you hadn’t heard that particular extra message. This may be guessed by simply one iteration of “I automatically want to guess color1… but wait, simulated me got it wrong, so I’ll guess color2 instead” since “actual” you has the knowledge that the previous version of you got it wrong.
If Omega lies to simulations and tells truth to “actuals” (and can somehow simulate without the simulation being conscious, so there’s no ambiguity about which you are, yet still be accurate… (am skeptical but confused on that point)), then we have an issue. But then it would require Omega to take a risk: if when telling the lie to the simulation, the simulation then gets it right, then what does Omega tell “actual” you?
(“actual” in quotes because I honestly don’t know whether or not one could be modeled with sufficient accuracy, however indirectly, without the model being conscious. I’m actually kind of skeptical of the prospect of a perfectly accurate model not being conscious, although a model that can determine some properties/approximations of the person without being conscious is probably possible)
TL;DR: even without access to coinflips beyond Omega’s predictive power, one might be able to do better in the red/green problem simply by noting that the nature of the additional information Omega provided you opens up the possibility that Omega’s simulation of you was a bit different than the actual situation you are in.
Omega can use the following algorithm:
“Simulate telling the human that they got the answer wrong. If in this case they get the answer wrong, actually tell them that they get the answer wrong. Otherwise say nothing.”
This ought to make it relatively easy for Omega to truthfully put you in a “you’re screwed” situation a fair amount of the time. Albeit, if you know that this is Omega’s procedure, the rest of the time you should figure out what you would have done if Omega said “you’re wrong” and then do that.
This kind of thinking is, I think, outside the domain of current TDT, because it involves strategies that depend on actions you would have taken in counterfactual branches. I think it may even be outside the domain of current UDT for the same reason.
I don’t see why this is outside of UDT’s domain. It seems straightforward to model and solve the decision problem in UDT1. Here’s the world program:
Assuming a preference to maximize the occurrence of outcome=”live” averaged over P(“green”) and P(“red”), UDT1 would conclude that the optimal S returns a constant, either “green” or “red”, and do that.
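As a hedged illustration only (not the listing from this comment), here is one way the procedure Eliezer described could be written as a world program and brute-forced. It assumes S is a deterministic function from Omega's message to a color, that the empty string stands for "Omega says nothing", and that `world`, `all_strategies`, and `score` are hypothetical helper names.

```python
from itertools import product

COLORS = ("green", "red")
MESSAGES = ("you're wrong", "")  # "" models Omega saying nothing

def world(correct_color, S):
    """One run of a hypothetical world program P(correct_color)."""
    # Omega simulates telling the agent that it gets the answer wrong.
    simulated = S("you're wrong")
    if simulated != correct_color:
        # The simulation was wrong, so Omega can truthfully say so.
        answer = S("you're wrong")
    else:
        # Otherwise Omega says nothing.
        answer = S("")
    return "live" if answer == correct_color else "die"

def all_strategies():
    """Every deterministic S, as a lookup table from message to color."""
    for choices in product(COLORS, repeat=len(MESSAGES)):
        yield dict(zip(MESSAGES, choices))

def score(table):
    """Number of worlds (one per correct color) in which the agent lives."""
    return sum(world(c, lambda msg: table[msg]) == "live" for c in COLORS)

best = max(score(t) for t in all_strategies())
optimal = [t for t in all_strategies() if score(t) == best]
print(best, optimal)
```

Under these assumptions the best score is one world out of two, and the strategies that reach it are exactly the ones that return the same color for both messages.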
BTW, do you find this “world program” style analysis useful? I don’t want to over-do them and get people annoyed. (I refrained from doing this for the problem described in Gary’s post, since it doesn’t mention UDT at all, and therefore I’m assuming you want to find a TDT-only solution.)
Yes, I was focusing on a specific difficulty in TDT, but I certainly have no objection to bringing UDT into the thread too. (I myself haven’t yet gotten around to giving UDT the attention I think it deserves.)
The world program I would use to model this scenario is:
The else branch seems unreachable, given color = S(“you’re wrong”) and the usual assumptions about Omega.
I don’t understand what your nested if statements are modeling.
I was modeling what Eliezer wrote in the comment that I was responding to:
BTW, if you add a tab in front of each line of your program listing, it will get formatted correctly.
Ah, I see. Then it seems that you are really solving the problem of minimizing the probability that Omega presents this problem in the first place.
What about the scenario, where Omega uses the strategy: Simulate telling the human that they got the answer wrong. Define the resulting answer as wrong, and the other as right.
This is what I modeled.
Thanks. Is there an easier way to get a tab into the comment input box than copy paste from an outside editor?
In that case it should be modeled like this:
Not that I’m aware of.
Are you guys talking about getting code to indent properly? You can do that by typing four spaces in front of each line. Each quadruple of spaces produces a further indentation.
http://daringfireball.net/projects/markdown/syntax#precode
Spaces? Think of the wasted negentropy! I say we make tab the official Less Wrong indention symbol, and kick out anyone who disagrees. Who’s with me? :-)
Hm, I think the difference in our model programs indicates something that I don’t understand about UDT, like a wrong assumption that justified an optimization. But it seems they both produce the same result for P(S(“you’re wrong”)), which is outcome=”die” for all S.
Do you agree that this problem is, and should remain, unsolvable? (I understand “should remain unsolvable” to mean that any supposed solution must represent some sort of confusion about the problem.)
The input to P is supposed to contain the physical randomness in the problem, so P(S(“you’re wrong”)) doesn’t make sense to me. The idea is that both P(“green”) and P(“red”) get run, and we can think of them as different universes in a multiverse. Actually in this case I should have written “def P():” since there is no random correct color.
I’m not quite sure what you mean here, but in general I suggest just translating the decision problem directly into a world program without trying to optimize it.
No, like I said, it seems pretty straightforward to solve in UDT. It’s just that even in the optimal solution you still die.
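For the variant where Omega simply defines the simulated answer as the wrong one, a minimal sketch (again assuming a deterministic S and a hypothetical `world` function, not either listing from this thread) shows why every strategy ends the same way:

```python
COLORS = ("green", "red")

def world(S):
    """Hypothetical world program with no random correct color."""
    # Omega simulates the agent hearing "you're wrong" and defines
    # whatever the simulation answers as the wrong color.
    wrong_color = S("you're wrong")
    # The actual agent hears the same message and runs the same S.
    answer = S("you're wrong")
    return "die" if answer == wrong_color else "live"

# Try every deterministic response to the message.
outcomes = {color: world(lambda msg, color=color: color) for color in COLORS}
print(outcomes)
```

Since the simulation and the actual agent compute the same thing, the two calls always agree, so no choice of S changes the outcome.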
Ok, now I understood why you wrote your program the way you did.
By solve, I meant find a way to win. I think that after getting past different word use, we agree on the nature of the problem.
Fair enough.
I’m not sure the algorithm you describe here is necessarily outside current TDT though. The counterfactual still corresponds to an actual thing Omega simulated. It’d be more like this: Omega did not add the “you are wrong” prefix. Therefore, conditioning on the idea that Omega always tries simulating with that prefix and only states the prefix if I (or whoever Omega is offering the challenge to) was wrong in that simulation, the simulation in question then did not produce the wrong answer.
Therefore a sufficient property for a good answer (one with higher expected utility) is that it should have the same output as that simulation. Therefore determine what that output was...
I.e., TDT shouldn’t have much more problem (in principle) with that than with being told that it needs to guess the Nth digit of Pi. If possible, it would simply compute the Nth digit of Pi. In this case, it has to simply compute the outcome of a certain different algorithm which happens to be equivalent to its own decision algorithm when faced with a certain situation. I don’t THINK this would be inherently outside of current TDT as I understand it.
I may be completely wrong on this, though, but that’s the way it seems to me.
As far as stuff like the problem in the OP, I suspect though that the Right Way for dealing with things analogous to counterfactual mugging (and extended to the problem in the OP) and such amounts to a very general precommitment… Or a retroactive precommitment.
My thinking here is rather fuzzy. I do suspect though that the Right Way probably looks something like the TDT, in advance, doing a very general precommitment to be the sort of being that tends to have high expected utility when faced with counterfactual muggers and whatnot… (Or retroactively deciding to be the sort of being that effectively has the logical implication of being mathematically “precommitted” to be such.)
By “unsolvable” I mean that you’re screwed over in final outcomes, not that TDT fails to have an output.
The interesting part of the problem is that, whatever you decide, you deduce facts about the background such that you know that what you are doing is the wrong thing. However, if you do anything differently, you would have to make a different deduction about the background facts, and again know that what you were doing was the wrong thing. Since we don’t believe that our decision is capable of affecting the background facts, the background facts ought to be a fixed constant, and we should be able to alter our decision without affecting the background facts… however, as soon as we do so, our inference about the unalterable background facts changes. It’s not 100% clear how to square this with TDT.
This is like trying to decide whether this statement is true:
“You will decide that this statement is false.”
There is nothing paradoxical about this statement. It is either true or false. The only problem is that you can’t get it right.
Actually, there is an optimal solution to this dilemma. Rather than use any internal process to decide, using a truly random process gives a 50% chance of survival. If you base your decision on a quantum randomness source, in principle no simulation can predict your choice (or rather, a complete simulation would correctly predict you fail in 50% of possible worlds).
Knowing how to use randomness against an intelligent adversary is important.
Gary postulated an infallible simulator, which presumably includes your entire initial state and all pseudorandom algorithms you might run. Known quantum randomness methods can only amplify existing entropy, not manufacture it ab initio. So you have no recourse to coinflips.
EDIT: Oops! pengvado is right. I was thinking of the case discussed here, where the random bits are provided by some quantum black box.
Quantum coinflips work even if Omega can predict them. It’s like a branch-both-ways instruction. Just measure some quantum variable, then measure a noncommuting variable, and voila, you’ve been split into two or more branches that observe different results and thus can perform different strategies. Omega’s perfect predictor tells it that you will do both strategies, each with half of your original measure. There is no arrangement of atoms (encoding the right answer) that Omega can choose in advance that would make both of you wrong.
I agree, and for this reason whenever I make descriptions I make Omega’s response to quantum smart-asses and other randomisers explicit and negative.
If Omega wants to smack down the use of randomness, I can’t stop it. But there are a number of game theoretic situations where the optimal response is random play, and any decision theory that can’t respond correctly is broken.
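A small sketch of the standard game-theoretic point, under the assumption that the adversary knows your mixing probability but cannot predict the individual coin outcome (this is the classical matching-pennies analysis, not the branching argument above; `worst_case_survival` is a hypothetical name):

```python
def worst_case_survival(p_green):
    """Survival probability when you pick green with probability p_green
    and the adversary, knowing p_green, names the worse color as wrong."""
    survive_if_wrong_is_green = 1.0 - p_green  # you live when you pick red
    survive_if_wrong_is_red = p_green          # you live when you pick green
    return min(survive_if_wrong_is_green, survive_if_wrong_is_red)

# Scan mixing probabilities in steps of 0.01.
candidates = [i / 100 for i in range(101)]
best_p = max(candidates, key=worst_case_survival)

print(best_p, worst_case_survival(best_p))
print(worst_case_survival(1.0))  # any deterministic choice
```

Any deterministic choice has a worst case of zero, while the even mix guarantees one half, which is why a decision theory that cannot randomize loses in games of this shape.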
Does putting the ‘quantum’ in a black box change anything?
Not sure I know which question you’re asking:
1) A black box RNG is still useless despite being based on a quantum mechanism, or
2) That a quantum device will necessarily manufacture random bits.
Counterexamples to 2 are pretty straightforward (quantum computers), so I’m assuming you mean 1. I’m operating at the edge of my knowledge here (as my original mistake shows), but I think the entire point of Pironio et al.’s paper was that you can verify random bits obtained from an adversary, subject to the conditions:
Bell inequality violations are observable (i.e., it’s a quantum generator).
The adversary can’t predict your measurement strategy.
Am I misunderstanding something?
Oh ok. So it’s unsolvable in the same sense that “Choose red or green. Then I’ll shoot you.” is unsolvable. Sometimes choice really is futile. :) [EDIT: Oops, I probably misunderstood what you’re referring to by “screwed over”.]
Yes, assuming that you’re the sort of algorithm that can (without inconsistency) know its own choice here before the choice is executed.
If you’re the sort of algorithm that may revise its intended action in response to the updated deduction, and if you have enough time left to perform the updated deduction, then the (previously) intended action may not be reliable evidence of what you will actually do, so it fails to provide sound reason for the update in the first place.
If mathematical truths were drawn in a DAG, it’s unclear how counterfactuals would work. Since math is consistent, then, by the principle of explosion, the inversion of any statement makes all statements true. The counterfactual graph would therefore be completely uninformative.
Or, perhaps, it would just generate another system of math. But then you have to know the inferential relationship between that new math and the rest of the world.
I don’t see how logical entailment acts as functional causal dependence in Pearl’s account of causation. Can you explain?
Pearl’s account doesn’t include logical uncertainty at all so far as I know, but I made my case here
http://lesswrong.com/lw/15z/ingredients_of_timeless_decision_theory/
that Pearl’s account has to be modified to include logical uncertainty on purely epistemic grounds, never mind decision theory.
If this isn’t what you’re asking about then please further clarify the question?
Treating same inputs on duplicate functions also arises in the treatment of counterfactuals (since one duplicates the causal graph across worlds of interest). The treatment I am familiar with is systematic merges of portions of the counterfactual graph which can be proved to be the same. I don’t really understand why this issue is about logic (rather than about duplication).
What was confusing me, however, was the remark that it is possible to create causal graphs of mathematical facts (presumably with entailment functioning as a causal relationship between facts). I really don’t see how this can be done. In particular the result is highly cyclic, infinite for most interesting theories, and it is not clear how to define interventions on such graphs in a satisfactory way.
I was going to suggest (2) myself, but then I realized that it seems to follow directly from your definition of “dependent on”, so you must have thought of it yourself: