I’m still really keen for footnotes. They allow people to make more nuanced arguments by addressing objections or misunderstandings that people may have without breaking the flow.
Chris_Leong
“The problem is that principle F elides”—Yeah, I was noting that principle F doesn’t actually get us there and I’d have to assume a principle of independence as well. I’m still trying to think that through.
Hmm… that’s a fascinating argument. I’ve been having trouble figuring out how to respond to you, so I’m thinking that I need to make my argument more precise and then perhaps that’ll help us understand the situation.
Let’s start from the objection I’ve heard against Counterfactual Mugging. Someone might say, well I understand that if I don’t pay, then it means I would have lost out if it had come up heads, but since I know it didn’t came up heads, I don’t care. Making this more precise, when constructing counterfactuals for a decision, if we know fact F about the world before we’ve made our decision, F must be true in every counterfactual we construct (call this Principle F).
Now let’s consider Counterfactual Prisoner’s Dilemma. If the coin comes up HEADS, then principle F tells us that the counterfactuals need to have the COIN coming up HEADS as well. However, it doesn’t tell us how to handle the impact of the agent’s policy if they had seen TAILS. I think we should construct counterfactuals where the agent’s TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.
You justify your construction by noting that the agent can figure out that it will make the same decision in both the HEADS and TAILS case. In contrast, my tendency is to exclude information about our decision making procedures. So, if you knew you were a utility maximiser this would typically exclude all but one counterfactual and prevent us saying choice A is better than choice B. Similarly, my tendency here is to suggest that we should be erasing the agent’s self-knowledge of how it decides so that we can imagine the possibility of the agent choosing PAY/NOT PAY or NOT PAY/PAY.
But I still feel somewhat confused about this situation.
One of the biggest considerations would be the process for activating “crunch time”. In what situations should crunch time be declared? Who decides? How far out would we want to activate and would there be different levels? Are there any downsides of such a process including unwanted attention?
If these aren’t discussed in advance, then I imagine that far too much of the available time could be taken up by whether to activate crunch time protocols or not.PS. I actually proposed here that we might be able to get a superintelligence to solve most of the problem of embedded agency by itself. I’ll try to write it up into a proper post soon.
Strongly agree. There’s lots of people who might not seem worth funding at the moment, but when it comes to crunch time, EA should be prepared to burn money.
You’re correct that paying in Counterfactual Prisoner’s Dilemma doesn’t necessarily commit you to paying in Counterfactual Mugging.
However, it does appear to provide a counter-example to the claim that we ought to adopt the principle of making decisions by only considering the branches of reality that are consistent with our knowledge as this would result in us refusing to pay in Counterfactual Prisoner’s Dilemma regardless of the coin flip result.
(Interestingly enough, blackmail problems seem to also demonstrate that this principle is flawed as well).
This seems to suggest that we need to consider policies rather than completely separate decisions for each possible branch of reality. And while, as I already noted, this doesn’t get us all the way, it does make the argument for paying much more compelling by defeating the strongest objection.
He suggests that he subscribes to such a maximum, but also proposes a weaker maxim—that we should give unless the bad that would happen to us is comparable in moral significance to the harm we would prevent (note: this says comparable in moral significance, not equal in morally significance).
I would encourage you and other commentators to read Singer’s article Poverty, Affluence and Morality in which he develops his argument in more detail. It’s an extremely easy read and not overly long either (less than 20 pages). He explicitly adresses the potentially overwhelming nature of our obligations. He also addresses the issue of overpopulation that Vanilla Cabs raises. If I had more time, I’d provide a summary of what he says, but unfortunately I’ll have to leave this for someone else.
So it’s worth understanding that this analogy is where his argument begins and not where it ends. I don’t think he would claim that you should make your decision upon the basis of this thought experiment by itself.
Oh that makes a lot more sense. Is delta v1 hat the change of v1 rather than a infintesimal? (Asking because if it was then it’d be easier to understand how it is calculated).
I’m confused. The bottom diagram seems to also involve bidirectional flow of information.
vhat?
Underlying Newcomb’s Problem, then, lurks the is-ought problem.
It’s interesting that you say this. I’ve been thinking a lot about what counterfactuals are at their base and one of the possibilities I’ve been considering is that maybe we can’t provide an objective answer due to the is-ought problem.
Thanks Eric for writing this post, I found it fascinating.
I imagine that there are are lot of lessons from General Semantics or analytic philosophy that might not have made it into rational-sphere, so if you ever find time to share some of that with us, I imagine it would be well-received.
Well I’ve got another post arguing backwards causation isn’t necessarily absurd. But we don’t need to depend on it.
It’s not so much people arguing against this as being confused about how to explain backwards causation. So like tying up loose ends.
Hmm… it’s interesting that you’re writing this comment. I suppose it indicates that I didn’t make this point clearly enough?
I guess I probably should have expanded on this sentence: “From the view of pure reality, it’s important to understand that you can only make the choice you made and the past can only be as it was”.
I think the best way to explain this is to imagine characterise the two views as slightly different functions both of which return sets. Of course, the exact type representations isn’t the point. Instead, the types are just there to illustrate the difference between two slightly different concepts.
possible_world_pure() returns {x} where x is either <study & pass> or <beach & fail>, but we don’t know which one it will be
possible_world_augmented() returns {<study & pass>, <beach & fail>}
Once we’ve defined possible worlds, it naturally provides us a definition of possible actions and possible outcomes that matches what we expect. So for example:
size(possible_world_pure()) = size(possible_action_pure()) = size(possible_outcome_pure()) = 1
size(possible_world_augmented()) = size(possible_action_augmented()) = size(possible_outcome_augmented()) = 2
And if we have a decide function that iterates over all the counterfactuals in the set and returns the highest one, we need to call it on possible_world_augmented() rather than possible_world_pure().
Note that they aren’t always this similar. For example, for Transparent Newcomb they are:
possible_world_pure() returns {<1-box, million>}
possible_world_augmented() returns {<1-box, million>, <2-box, thousand>}
The point is that if we remain conscious of the type differences then we can avoid certain errors.
For example possible_outcome_pure() = {”PASS”}, doesn’t mean that possible_outcome_augmented() = {”PASS”}. It’s that later which would imply it doesn’t matter what the student does, not the former.
Excellent question. Maybe I haven’t framed this well enough.
We need a way of talking about the fact that both your outcome and your action are fixed by the past.We also need a way of talking about the fact that we can augment the world with counterfactuals (Of course, since we don’t have complete knowledge of the world, we typically won’t know which is the factual and which are the counterfactuals).
And that these are two distinct ways of looking at the world.
I’ll try to think about a cleaner way of framing this, but do you have any suggestions?
(For the record, the term I used before was Raw Counterfactuals—meaning consistent counterfactuals—and that’s a different concept than looking at the world in a particular way).
(Something that might help is that if we are looking at multiple possible pure realities, then we’ve introduced counterfactuals as only one is true and “possible” is determined by the map rather than the territory)
I believe that we need to take a Conceputal Engineering approach here. That is, I don’t see counterfactuals as intrinsically part of the world, but rather someone we construct. The question to answer is what purpose are we constructing these for? Once we’ve answer this question, we’ll be 90% of the way towards constructing them.
As far as I can see, the answer is that we imagine a set of possible worlds and we notice that agents that use certain notions of counterfactuals tend to perform better than agents that don’t. Of course, this raises the question of which possible worlds to consider, at which point we notice that this whole thing is somewhat circular.
However, this is less problematic than people think. Just as we can only talk about what things are true after already having taken some assumptions to be true (see Where Recursive Justification hits Bottom), it seems plausible that we might only be able to talk about possibility after having already taken some things to be possible.