But if a stupid agent is asked to write a smart agent, it will want to write an agent that will agree to pay.
Wait, I’m afraid I’m already lost and this question seems so simple as to suggest I’m missing some important premise of the hypothetical scenario: Why would the stupid agent want this? Why wouldn’t it want to write a smart agent that calculates the millionth digit and makes the winning choice?
Restatement of what I understand about the problem:
You offer me lots of money if the millionth digit of pi is even and a small loss if it is odd. I should take the bet, since I can’t calculate the answer and it might as well be random.
You offer me lots of money if the millionth digit of pi is even and a small loss if it is odd, and the chance to build a calculator to calculate the answer. I should still take the bet, even if my calculator tells me that it’s odd.
If I’m rephrasing it correctly, then why?! If you’re given the chance to make a calculator to solve the problem, why wouldn’t you use it?
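The reasoning in the restatement above can be made concrete with a small sketch (not from the original discussion; the dollar amounts are illustrative assumptions): if you treat the unknown parity of pi’s millionth digit as a 50/50 logical coin, accepting the plain bet has positive expected value whenever the reward outweighs the small loss.

```python
# Sketch: treating the unknown parity of pi's millionth digit as a
# 50/50 logical coin, the plain bet is worth taking whenever the
# reward exceeds the loss. Payoff numbers here are illustrative only.

def bet_expected_value(reward, loss, p_even=0.5):
    """Expected value of accepting the bet under logical uncertainty."""
    return p_even * reward - (1 - p_even) * loss

# Example: win $10000 if the digit is even, lose $100 if it is odd.
print(bet_expected_value(10000, 100))  # 4950.0
```

Of course, once a calculator resolves the uncertainty, the 50/50 prior no longer applies, which is exactly the point of contention in the replies below.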
What you’re describing is not Counterfactual Mugging, it’s just a bet, and the right decision is indeed to use the calculator. The interesting feature of Counterfactual Mugging is that Omega is using counterfactual reasoning to figure out what you would have done if the coin had come out differently. You get the money only if you would have paid up in the counterfactual branch. In that case the right decision is to not use the calculator, I think. Though other people might have different intuitions, I’m sort of an outlier in how much I’m willing to follow UDT-ish reasoning.
The setup is such that muggings and rewards are grouped in pairs, for each coin there is a reward and a mugging, and the decision in the mugging only affects the reward of that same coin. So even if you don’t know where the coin comes from, or whether there are other coins with the same setup, or other coins where you don’t have a calculator, your decision on a mugging for a particular coin doesn’t affect them. If you can manage it, you should pay up only in counterfactuals, situations where you hypothetically observe Omega asserting an incorrect statement.
Recognizing counterfactuals requires that the calculator can be trusted to be more accurate than Omega. If you trust the calculator, the algorithm is that if the calculator disagrees with Omega, you pay up, but if the calculator confirms Omega’s correctness, you refuse to pay (so this confirmation of Omega’s correctness translates into a different decision than just observing Omega’s claim without checking it).
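The decision rule in the paragraph above can be written out as a short sketch (a paraphrase of the comment, not a formalization; it assumes, as stated, that the calculator is strictly more reliable than Omega’s claim):

```python
# Sketch of the rule described above: pay up exactly when the trusted
# calculator contradicts Omega's asserted parity (i.e. you are in the
# counterfactual branch), and refuse when it confirms Omega.

def should_pay(calculator_parity, omega_claimed_parity):
    """Pay iff the calculator disagrees with Omega's claim."""
    return calculator_parity != omega_claimed_parity

# Calculator says odd, Omega asserts even: you are in the
# counterfactual, so pay up.
print(should_pay("odd", "even"))   # True
# Calculator confirms Omega: refuse to pay.
print(should_pay("even", "even"))  # False
```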
Perhaps in the counterfactual where the logical coin is the opposite of what’s true, the calculator should be assumed to also report the incorrect answer, so that its result will still agree with Omega’s. In this case, the calculator provides no further evidence, there is no point in using it, and you should unconditionally pay up.
Yeah, that’s pretty much the assumption made in the post, which goes on to conclude (after a bunch of math) that you should indeed pay up unconditionally. I can’t tell if there’s any disagreement between us...
The origin of the logical coin seems relevant if you can compute it. Even if you know which side is counterfactual according to a particular logical coin, you might still be uncertain about why (or whether) this coin (puzzle) was selected and not another coin that might have a different answer. This uncertainty, if allowed by the boundaries of the game, would motivate still paying up where you know the reward to be logically impossible (according to the particular coin/puzzle), because it might still be possible according to other possible coins that you can’t rule out a priori.
It seems to me that if you have a calculator, you should pay up exactly when you are in a counterfactual (i.e. you hypothetically observe Omega asserting an incorrect statement about the logical coin), but refuse to pay up if the alternative (Omega paying you) is counterfactual (in this case, you know that the event of being paid won’t be realized, assuming these are indeed the boundaries of the game). There doesn’t appear to be a downside to this strategy, if you do have a calculator and are capable of not exploding in the counterfactual that you know to be counterfactual (according to whatever dynamic is used to “predict” you in the counterfactual).
(Intuitively, a possible downside is that you might value situations that are contradictory, but I don’t see how this would not be a semantic confusion, seeing a situation itself as contradictory as opposed to merely its description being contradictory, a model that might have to go through all of the motions for the real thing, but eventually get refuted.)
Hm, yeah, that sounds really odd.
I think the reason it sounds so odd is: how the hell is Omega calculating what your answer would have been if 1=0?
If what Omega is really calculating is what you would have done if you were merely told something equivalent to 1=0, then sure, paying up can make sense.
It seems to me that the relevant difference between “1=0” and “the billionth digit of pi is even” is that the latter statement has a really long disproof, but there might be a much shorter proof of what the agent would do if that statement were true. Or at least I imagine Omega to be doing the same sort of proof-theoretic counterfactual reasoning that’s described in the post. Though maybe there’s some better formalization of Counterfactual Mugging with a logical coin that we haven’t found...
Even if you’re cutting off Omega’s proofs at some length, there are plenty of math problems that people can’t do that are shorter than high-probability predictions that people will or won’t pay up. Certainly when I imagine the problem, I imagine it in the form of predicting someone who’s been told that the trillionth digit of pi is even and then paying out to that person depending on their counterfactual actions.
Of course, that leads to odd situations when the agent being predicted can do the math problem, but Omega still says “no bro, trust me, the trillionth digit of pi really is even.” But an agent who can do the math will still give Omega the money because decision theory, so does it really matter?
If you’re proposing to treat Omega’s words as just observational evidence that isn’t connected to math and could turn out one way or the other with probability 50%, I suppose the existing formalizations of UDT already cover such problems. But how does the agent assign probability 50% to a particular math statement made by Omega? If it’s more complicated than “the trillionth digit of pi is even”, then the agent needs some sort of logical prior over inconsistent theories to calculate the probabilities, and needs to be smart enough to treat these probabilities updatelessly, which brings us back to the questions asked at the beginning of my post… Or maybe I’m missing something, can you specify your proposal in more detail?
Well, I was thinking more in terms of a logical prior over single statements, see my favorite here.
But yeah I guess I was missing the point of the problem.
Also: suppose Omega comes up to you and says “If 1=0 was true I would have given you a billion dollars if and only if you would give me 100 dollars if 1=1 was true. 1=1 is true, so can you spare $100?” Does this sound trustworthy? Frankly not, it feels like there’s a principle of explosion problem that insists that Omega would have given you all possible amounts of money at once if 1=0 was true.
A formulation that avoids the principle of explosion is “I used some process that I cannot prove the outcome of to pick a digit of pi. If that digit of pi was odd I would have given you a billion dollars iff [etc].”
Are you saying that Omega won’t even offer you the deal unless it used counterfactual reasoning to figure out what you’ll do once it offers?
So if Omega has already offered you the deal and you know the coin came out against your favor, and you find you are physically capable of rejecting the deal, you should reject the deal. You’ve already fooled Omega into thinking you’ll take the deal.
It’s just that if you’ve successfully “pre-committed” to the extent that a 100% accurate Omega has predicted you will take the offer, you’ll be physically incapable of not taking the offer. It’s just like Newcomb’s problem.
And if that’s true, it means that the problem we are facing is how to make an algorithm that can’t go back on its pre-commitments even after it gains the knowledge of how the bet came out.
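One way to read that requirement (a hypothetical sketch, not a formalization proposed anywhere in this thread; the payoff numbers are assumed for illustration): the agent fixes its policy before the coin’s value is revealed, scoring each candidate policy by its expected utility averaged over both possible outcomes, and then follows the chosen policy even after learning how the coin came out.

```python
# Hypothetical sketch of updateless pre-commitment: choose the policy
# before learning the coin, by averaging utility over both outcomes,
# then follow it regardless of what is later observed.
# Payoffs assumed for illustration: $10000 reward, $100 payment.

REWARD, COST = 10000, 100

def utility(policy_pays, coin_favors_you):
    """Omega rewards you on the favorable outcome only if your policy
    would pay on the unfavorable one."""
    if coin_favors_you:
        return REWARD if policy_pays else 0
    return -COST if policy_pays else 0

def choose_policy():
    """Pick pay/refuse by expected utility over both coin outcomes,
    before the outcome is known."""
    scores = {
        pays: 0.5 * utility(pays, True) + 0.5 * utility(pays, False)
        for pays in (True, False)
    }
    return max(scores, key=scores.get)

# The committed policy is followed even after the coin is revealed:
# paying nets 0.5 * 10000 - 0.5 * 100 = 4950 > 0, so the agent pays.
print(choose_policy())  # True
```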
Retraction was unintentional—I thought this was a duplicate comment and “unretract” isn’t a thing.
You can delete and then re-post a retracted comment if it has no replies yet.