You seem to have misunderstood the problem statement. If you commit to doing “FDT, except that if the predictor makes a mistake and there’s a bomb in the Left box, take Right instead”, then you will almost surely have to pay $100 (since the predictor predicts that you will take Right), whereas if you commit to using pure FDT, then you will almost surely pay nothing (with a small chance of death). There really is no “strategy that, if the agent commits to it before the predictor makes her prediction, does better than FDT”.
Which is fair enough, as the problem wasn’t actually specified correctly: the predictor is trying to predict whether you will take Left or Right given that it leaves its helpful note, not in the general case. But this assumption has to be added, since otherwise FDT says to take Right.
-”Charging a toll for a bridge you didn’t build is not okay; that’s pure extraction.”
This is probably just a nitpick, but as worded this doesn’t take into account the scenario where the builder of the bridge sells the rights to charge a toll to another party, who can then legitimately charge the toll even though they didn’t build the bridge.
Yes they do. For simplicity suppose there are only two hosts, and suppose host A precommits to not putting money in host B’s box, while host B makes no precommitments about how much money he will put in host A’s box. Then the human’s optimal strategy is “pick host A’s box with probability 1 - x·epsilon, where x is the amount of money in host A’s box”. This incentivizes host B to maximize the amount in host A’s box (resulting in a payoff of ~101 for the human), but it would have been better for him if he had precommitted to do the same as A, since then by symmetry his box would have been picked half the time instead of 101·epsilon of the time.
Couldn’t you equally argue that they will do their best not to have the smallest box by not putting any money in any of their opponents’ boxes? After all, “second-fullest” is the same as “third-emptiest”.
Ah, you’re right. That makes more sense now.
Why would precommitting to pick the second-fullest box give an incentive for predictors to put money in everyone else’s boxes?
If the hosts move first logically, then TDT will lead to the same outcomes as CDT, since it’s in each host’s interest to precommit to incentivizing the human to pick their own box—once the host has precommitted to doing this, the incentive works regardless of what decision theory the human uses.

In math terms: if x is the choice of which box to incentivize (with “incentivize your own box” being interpreted as “don’t place any money in any of the other boxes”), the human gets to choose a box f(x) on the basis of x, and the host gets to choose x = g(f) on the basis of the function f, which is known to the host since it is assumed to be superintelligent enough to simulate the human’s choices in hypothetical simulations. By definition, the host moving first in logical time means that g is chosen before f, and that f is then chosen according to the human’s best interest given that the host will incentivize box g(f). But then the host’s optimal strategy is to make g a constant function.
Regarding $100 and $200, I think I missed the part where you said the human picks the box with the maximum amount of money—I was assuming he picked a random box.
Regarding the question of how to force all the incentives into one box, what about the following strategy: choose box 1 with probability 1 - (400 - x)·epsilon, where x is the payoff of box 1. Then it is obviously in each host’s interest to predict box 1, since it has the largest probability of any box, but then it is also in each host’s interest to minimize 400 - x, i.e. to maximize x. This is true even though the hosts’ competition is zero-sum.
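Here’s a quick numeric sanity check of that last step. The assumptions are mine, not from the original setup: x ranges over 0–400 dollars, epsilon is small, and whatever probability isn’t on box 1 is what the other boxes share between them.

```python
# Sanity check of the strategy above, under assumptions I'm adding:
# box 1's payoff x ranges over 0..400 dollars, epsilon is a small positive
# number, and the leftover probability (400 - x) * epsilon is shared among
# the other boxes (so it bounds any single other box's probability).
epsilon = 1e-4  # any epsilon < 1/800 works

for x in range(0, 401):
    p_box1 = 1 - (400 - x) * epsilon       # probability of picking box 1
    p_other_bound = (400 - x) * epsilon    # upper bound on any other box's probability
    assert p_box1 > p_other_bound          # box 1 is always the most likely pick

# p_box1 increases with x, so a host that has predicted box 1 gains by
# making x (box 1's payoff) as large as possible.
print("box 1 is always the most likely pick, and more so the larger x is")
```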
You seem to be assuming the human moves first in logical time, before the superintelligent hosts. You also seem to be assuming that the superintelligent hosts are using CDT (if they use FDT, then by symmetry considerations all of their possible actions have equal payoff, so what they do is arbitrary). Any particular reason for these assumptions?
Where do the numbers $152 and $275 come from? I would have thought they should be $100 and $200, respectively.
In the 5 box problem, why doesn’t FDT force all of the incentives into box 1, thus getting $400?
-”The main question is: In the counter-factual scenario in which TDT recommends action X to agent A, what would another agent B do?”
This is actually not the main issue. If you fix an algorithm X for agent A to use, then the question “what would agent B do if he is using TDT and knows that agent A is using algorithm X?” has a well-defined answer, say f(X). The question “what would agent A do if she knows that whatever algorithm X she uses, agent B will use counter-algorithm f(X)” then also has a well-defined answer, say Z. So you could define “the result of TDT agents A and B playing against each other” to be where A plays Z and B plays f(Z). The problem is that this setup is not symmetric, and would yield a different result if we switched the order of A and B.
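To see the order-dependence concretely, here is a toy sketch of the construction above. The simplifications are mine, not part of the original setup: an “algorithm” is collapsed to a single action, B’s TDT response is replaced by a plain best response, and the payoffs are an arbitrary Battle-of-the-Sexes-style game chosen only to make the two orderings come apart.

```python
# Toy illustration (my own simplifications): "algorithm" = a single action,
# and the agent who moves second simply best-responds to the agent whose
# algorithm was fixed first.

ACTIONS = ["opera", "football"]

# payoffs[a][b] = (A's utility, B's utility)
payoffs = {
    "opera":    {"opera": (2, 1), "football": (0, 0)},
    "football": {"opera": (0, 0), "football": (1, 2)},
}

def f(a):
    # f(X): B's choice, knowing that A's (constant) algorithm plays a
    return max(ACTIONS, key=lambda b: payoffs[a][b][1])

def Z():
    # Z: A's choice, knowing that B will answer any algorithm X with f(X)
    return max(ACTIONS, key=lambda a: payoffs[a][f(a)][0])

def f_swapped(b):
    # the mirror image: A's choice, knowing B's (constant) algorithm plays b
    return max(ACTIONS, key=lambda a: payoffs[a][b][0])

def Z_swapped():
    return max(ACTIONS, key=lambda b: payoffs[f_swapped(b)][b][1])

a1 = Z()
print("A's algorithm fixed first:", (a1, f(a1)), payoffs[a1][f(a1)])
b2 = Z_swapped()
print("B's algorithm fixed first:", (f_swapped(b2), b2), payoffs[f_swapped(b2)][b2])
```

In this toy game the A-first construction ends at (opera, opera) and the B-first construction at (football, football), so the definition really does depend on which agent you put first.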
-”In a blackmail scenario it’s not so obvious, but I do think there is a certain symmetry between rejecting all blackmail and sending all blackmail.”
The symmetry argument only works when you have exact symmetry, though. To recall, the argument is that by controlling the output of the TDT algorithm in player A’s position, you are also by logical necessity controlling the output in player B’s position, hence TDT can act as though it controls player B’s action. If there is even the slightest difference between player A and player B then there is no logical necessity and the argument doesn’t work. For example, in a prisoner’s dilemma where the payoffs are not quite symmetric, TDT says nothing.
-”So I no longer believe the claim that TDT agents simply avoid all negative-sum trades.”
I agree with you, but I think that’s because TDT is actually undefined in scenarios where negative-sum trading might occur.
If I understand correctly, TDT is not well-defined in these kinds of scenarios. By definition, the output of TDT is the argmax of utility over possible algorithms for computing the actions of the agent you are trying to compute TDT for; but if there are multiple agents in a scenario, then utility is not a function solely of that agent’s algorithm, but also of the algorithms determining the actions of the other agents.
If I remember right, some people have expressed optimism that TDT could be extended to such scenarios in a way that allows for “positive-sum” acausal trade but not for “negative-sum” trade such as acausal blackmail. However, I am not aware (please correct me if I am wrong!) of any meaningful progress on defining such an extension.
In the bank transfer example, the combined action still seems to violate deontology to me because you are still hacking into the bank’s computer.
I’m not really sure you can treat the button-presser as hostile in the same sense as someone you are playing poker against is hostile. Someone might, for example, just think it’s funny to take down the frontpage; that doesn’t mean they have an incentive to minimize the information we get out of it.
What exactly is the proposal here? Have next year’s Petrov Day celebration only go down to 300 karma, or what?
Yes, but the field of order 4 is not the same thing as the integers modulo 4. The field of order 4 has the four elements 0, 1, x, x+1, where x satisfies the equation x^2 + x + 1 = 0 and where 2 = 0 (the field has characteristic 2). For more details see https://en.wikipedia.org/wiki/Finite_field#Field_with_four_elements
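If it helps, here’s a small sketch (my own illustration, not code from the linked article) that constructs the field of order 4 as pairs (a, b) standing for a + b·x, with coefficients mod 2 and x^2 reduced using x^2 = x + 1:

```python
# The field of order 4, with elements represented as pairs (a, b) meaning a + b*x.
# Coefficients live in the integers mod 2, and x^2 is reduced via x^2 = x + 1
# (which is x^2 + x + 1 = 0 rearranged, using -1 = 1 in characteristic 2).

elements = [(0, 0), (1, 0), (0, 1), (1, 1)]            # 0, 1, x, x+1
names = {(0, 0): "0", (1, 0): "1", (0, 1): "x", (1, 1): "x+1"}

def add(p, q):
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2)

def mul(p, q):
    a0, a1 = p
    b0, b1 = q
    # (a0 + a1*x) * (b0 + b1*x) = a0*b0 + (a0*b1 + a1*b0)*x + a1*b1*x^2,
    # then substitute x^2 = x + 1 and reduce mod 2.
    return ((a0 * b0 + a1 * b1) % 2, (a0 * b1 + a1 * b0 + a1 * b1) % 2)

assert add((1, 0), (1, 0)) == (0, 0)   # 1 + 1 = 0 here, unlike in the integers mod 4
for e in elements:
    assert add(e, e) == (0, 0)         # every element is its own negative
    if e != (0, 0):                    # every nonzero element is invertible
        assert any(mul(e, other) == (1, 0) for other in elements)

print("  *  " + "  ".join(f"{names[q]:>3}" for q in elements))
for p in elements:
    print(f"{names[p]:>4} " + "  ".join(f"{names[mul(p, q)]:>3}" for q in elements))
```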
The formula x=-x is valid for all fields of order 2^k, not just the field of order 2.
Sure, that patch wouldn’t have the problem I described.
Anyway, do whatever works for you—if you find this exercise helps people train their calibration, then I suppose that’s a good thing. My main point would be not to take too seriously what this method tells us about who is “best” at calibration—and I guess you’re saying people already don’t take it seriously in the case of someone who is doing badly at the trivia portion, but I think the failure mode is a bit more general than that. Anyway, I guess it doesn’t matter too much.
The people with the best calibration scores will not be those with the most skill at calibration. They will be those who “don’t guess” on the trivia questions—they either know it or they don’t (100% or 0% chance of getting it right). This is because if you guess and have (e.g.) a 50% chance of getting it right, then even if you are perfectly calibrated about that 50%, you will still get a Brier score of 0.25, as opposed to a score of 0 for someone who “doesn’t guess”.
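To spell the arithmetic out (assuming the scoring rule in question is the standard Brier score, (prediction - outcome)^2):

```python
# Expected Brier score for a perfectly calibrated forecaster who answers a binary
# trivia question with confidence p (so they are right with probability p).
# Assumes the scoring rule is the standard Brier score, (prediction - outcome)^2.
def expected_brier(p):
    return p * (p - 1) ** 2 + (1 - p) * (p - 0) ** 2   # simplifies to p * (1 - p)

print(expected_brier(0.5))   # 0.25 -- a perfectly calibrated 50% guess still costs you
print(expected_brier(1.0))   # 0.0  -- "I know it"
print(expected_brier(0.0))   # 0.0  -- "I don't know it" (name an answer you're sure is wrong)
```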
Consequently, I don’t really see this game as being very useful at measuring calibration.
The problem I see with this proposal is that the concept of “information” that you are using here is not a physical concept but an epistemological one. The lamp is information to your friend, but it would not be information to another person in the same location who was unaware of your plan. So trying to reduce this concept of information to fundamental physics seems incorrect—except in the sense that we can reduce your friend’s brain to fundamental physics.
This may be missing the point, but when talking about a population of 600 people, “This intervention will save 200 lives” and “This intervention will result in 400 deaths” actually mean different things. The former means that the number of deaths will be decreased by 200 if you do the intervention, whereas the latter means that the number of deaths will be increased by 400 if you do the intervention, in both cases relative to not doing the intervention. I realize you meant them to be “relative to everyone dying” and “relative to no one dying”, respectively, but that is not what the sentences mean in English if you interpret them naturally.