You can analyze problems like this in the framework of my UDT/CDT/SIA post to work out how CDT under Bayesian updating and SIA is compatible with (but does not necessarily imply) the policy you would get from UDT-like policy selection. (Note: SIA is irrelevant for the non-anthropic version of the problem.)
Consider the policy of always saying “no”, which is what UDT policy selection gives you. If this is your policy in general, then on the margin as a “random” green person (under SIA), your decision makes no difference: the payout requires unanimous “yes” answers, and everyone else is already saying “no”. Therefore it’s CDT-compatible to say “no” (“locally optimal” in the post).
Consider alternatively the policy of always saying “yes”. If this is your policy in general, then on the margin as a “random” green person (under SIA), you should say yes: you’re definitely pivotal, and when you’re pivotal it’s usually good to decide “yes” (updating on being green makes the 18-green world nine times as likely as the 2-green world). This means always saying yes is also “locally optimal”. But that’s compatible with the general result, because the result only says that every globally optimal policy is also locally optimal, not the converse.
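Here’s a minimal numeric sketch of both pure policies. The payoff numbers are my assumption, since they aren’t restated above: heads gives 18 greens and tails gives 2, and if every green says “yes” the group nets +$12 under heads and −$52 under tails, else $0.

```python
# Sketch only; the payoff numbers are assumptions, not from the comment.
# Heads -> 18 greens, tails -> 2 greens; the group is paid only if every
# green says "yes": +$12 under heads, -$52 under tails, $0 otherwise.
P_HEADS = 0.5
ALL_YES_PAYOFF = {"heads": 12.0, "tails": -52.0}

# Global (UDT / policy-selection) view: compare the two pure policies.
ev_always_yes = sum(prob * ALL_YES_PAYOFF[world]
                    for world, prob in [("heads", P_HEADS),
                                        ("tails", 1 - P_HEADS)])
ev_always_no = 0.0
print(ev_always_yes, ev_always_no)  # -20.0 vs 0.0, so UDT picks "no"

# Local (CDT + SIA) view of one green person: SIA weights worlds by the
# number of greens, so P(heads | I'm green) = 18 / (18 + 2) = 0.9.
p_heads_given_green = 0.9
# Under "always yes" every other green says yes, so you're certainly
# pivotal, and switching your own vote to "yes" is worth the full payoff:
gain_when_pivotal = (p_heads_given_green * ALL_YES_PAYOFF["heads"]
                     + (1 - p_heads_given_green) * ALL_YES_PAYOFF["tails"])
print(gain_when_pivotal)  # +5.6, so "always yes" is locally optimal too
# Under "always no" you're never pivotal, so "no" is weakly locally optimal.
```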
Let’s also consider “trembling hand” logic where your policy is to almost always say no (say, with 99% probability). In this case, the probability that you are pivotal if there are 18 greens is $0.01^{17} = 10^{-34}$ (all 17 other greens must tremble to “yes”), whereas if there are 2 greens it’s 1/100. So conditional on being pivotal, you’re overwhelmingly likely to be in the second scenario, and given the second scenario you shouldn’t say “yes”. So under trembling hand logic you’d move from almost always saying “no” to always saying “no”, as is compatible with UDT.
If on the other hand you almost always say “yes” (say, with 99% probability), you’d move towards saying yes more often (since you’re probably pivotal, and probably in the first scenario), which is compatible with the result, since the result only says that UDT is globally optimal.
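A self-contained sketch of the trembling-hand computation, with the same assumed payoffs as above: you’re pivotal exactly when every other green says “yes”, so the pivotality probabilities are $p^{17}$ with 18 greens and $p$ with 2 greens.

```python
# Sketch only; payoffs are the same assumption as in the previous snippet.
N_GREEN = {"heads": 18, "tails": 2}
ALL_YES_PAYOFF = {"heads": 12.0, "tails": -52.0}
P_HEADS_GIVEN_GREEN = 0.9  # SIA posterior for a random green person

def ev_of_yes_given_pivotal(p: float) -> float:
    """Expected gain from voting 'yes' rather than 'no', conditional on
    being pivotal, when each other green votes 'yes' with probability p."""
    # Weight each world by P(world | green) * P(pivotal | world).
    w = {world: prior * p ** (N_GREEN[world] - 1)
         for world, prior in [("heads", P_HEADS_GIVEN_GREEN),
                              ("tails", 1 - P_HEADS_GIVEN_GREEN)]}
    total = w["heads"] + w["tails"]
    return sum(w[world] / total * ALL_YES_PAYOFF[world] for world in w)

print(ev_of_yes_given_pivotal(0.01))  # ~ -52: pushes toward always "no"
print(ev_of_yes_given_pivotal(0.99))  # ~ +4.6: pushes toward always "yes"
```

The two signs match the two attractors above: small “yes” trembles die out, while a mostly-“yes” population drifts to pure “yes”.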
The overall framework of the post can be converted to normal-form game theory in the finite case (such as this one). In that language, what I am saying is that always saying “no” is a trembling-hand perfect equilibrium of the Bayesian game.
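Concretely (still with my assumed payoffs), the perfection condition can be checked in closed form: against a tremble probability $p$ of saying “yes”, “no” remains the best response exactly when the pivotal-weighted gain from “yes” is negative,

$$0.9\,p^{17}\cdot 12 \;<\; 0.1\,p\cdot 52 \;\iff\; p^{16} < \tfrac{5.2}{10.8} \;\iff\; p \lesssim 0.955,$$

which holds for every sufficiently small tremble, so always-“no” survives the trembling-hand refinement.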
I’ve just put up this post, before having read your comment:
https://www.lesswrong.com/posts/aSXMM8QicBzTyxTj3/reflective-consistency-randomized-decisions-and-the-dangers
I think my conclusion is similar to yours above, but I consider randomized strategies in more detail, for both this problem and its variation with negated rewards.
I’ll be interested to have a look at your framework.
Yeah, agree with your analysis.