Interesting. The natural approach is to imagine that you just have a 3-sided die with 2, 4, 6 on the sides, and if you do that, then I compute A = 12 and B = 6[1]. But, as the top Reddit comment’s edit points out, the difference between that problem and the one you posed is that your version heavily weights the probability towards short sequences—that weighting being 1/2^n for a sequence of length n. (Note that the numbers I got, A=12 and B=6, are so much higher than the A≈2.7 and B=3 you get.) It’s an interesting selection effect.
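To make the 1/2^n weighting concrete, here is a minimal sketch (my own illustration, not code from the post). It builds the stopping-time distribution for the 3-sided die and then reweights each length n by 1/2^n. The names `stop_length_pmf` and `mean_length` and the truncation at `max_len` are assumptions made for the example.

```python
def stop_length_pmf(consecutive, max_len=400):
    """P(stopping at exactly n rolls) for a 3-sided die with faces 2, 4, 6.

    consecutive=True  -> stop at two 6s in a row (problem A)
    consecutive=False -> stop at the second 6, anywhere (problem B)
    The sum is truncated at max_len; the tail is negligible here.
    """
    no_six, one_six = 1.0, 0.0  # mass still rolling, split by progress
    pmf = {}
    for n in range(1, max_len + 1):
        pmf[n] = one_six / 3  # a 6 on roll n completes the pair
        if consecutive:
            # A: a 2 or 4 throws away the pending 6
            no_six, one_six = (no_six + one_six) * 2 / 3, no_six / 3
        else:
            # B: a 2 or 4 leaves the first 6 banked
            no_six, one_six = no_six * 2 / 3, no_six / 3 + one_six * 2 / 3
    return pmf


def mean_length(pmf, weight=lambda n: 1.0):
    """Mean stopping length after reweighting length n by weight(n)."""
    total = sum(p * weight(n) for n, p in pmf.items())
    return sum(n * p * weight(n) for n, p in pmf.items()) / total


if __name__ == "__main__":
    for name, consecutive in (("A", True), ("B", False)):
        pmf = stop_length_pmf(consecutive)
        plain = mean_length(pmf)
        reweighted = mean_length(pmf, weight=lambda n: 0.5 ** n)
        print(name, round(plain, 3), round(reweighted, 3))
```

Unweighted, this gives the 3-sided-die answers. With the 1/2^n weight, A falls below B, because A's long sequences are penalized more heavily than B's.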
The thing is that, if you roll a 6 and then a non-6, in an “A” sequence you’re likely to just die due to rolling an odd number before you succeed in getting the double 6, and thus exclude the sequence from the surviving set; whereas in a “B” sequence there’s a much higher chance you’ll roll a 6 before dying, and thus include this longer “sequence of 3+ rolls” in the set.
To illustrate with an extreme version, consider:
A: The expected number of rolls of a fair die until you roll two 6s in a row, given that you succeed in doing this. You ragequit if it takes more than two rolls.
Obviously that’s one way to reduce A to 2.
Excluding odd rolls completely, so the die has a 1⁄3 chance of rolling 6 and a 2⁄3 chance of rolling an even number that’s not 6, we have:
A = 1 + 1⁄3 * A2 + 2⁄3 * A
Where A2 represents “the expected number of die rolls until you get two 6s in a row, given that the last roll was a 6”. Subtracting 2⁄3 * A from both sides and multiplying by 3 then yields:
A = 3 + A2
And if we consider rolling a die from the A2 state, we get:
A2 = 1 + 1⁄3 * 0 + 2⁄3 * A
= 1 + 2⁄3 * A
Substituting:
A = 3 + 1 + 2⁄3 * A
=> (subtract)
1⁄3 * A = 4
=> (multiply)
A = 12
For B, a similar approach yields the equations:
B = 1 + 1⁄3 * B2 + 2⁄3 * B
B2 = 1 + 1⁄3 * 0 + 2⁄3 * B2
And the reader may solve for B = 6.
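For anyone who would rather check the algebra than redo it, here is a small sketch that solves both pairs of equations with exact fractions. The function name `three_sided_expectations` is made up for the example; the equations are the ones written above.

```python
from fractions import Fraction


def three_sided_expectations():
    """Solve the first-step equations for the 3-sided die (faces 2, 4, 6)."""
    third = Fraction(1, 3)
    two_thirds = Fraction(2, 3)

    # A = 3 + A2 and A2 = 1 + 2/3 * A  =>  A = 4 + 2/3 * A
    A = 4 / (1 - two_thirds)
    A2 = 1 + two_thirds * A

    # B2 = 1 + 2/3 * B2  =>  B2 = 1 / (1 - 2/3)
    B2 = 1 / (1 - two_thirds)
    # B = 1 + 1/3 * B2 + 2/3 * B  =>  B = (1 + 1/3 * B2) / (1 - 2/3)
    B = (1 + third * B2) / (1 - two_thirds)

    # Sanity check: plug the solutions back into the original equations.
    assert A == 1 + third * A2 + two_thirds * A
    assert B == 1 + third * B2 + two_thirds * B
    return A, A2, B, B2


if __name__ == "__main__":
    print(three_sided_expectations())
```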
Yes! This kind of kills the “paradox”. It’s approaching an apples-and-oranges comparison.
Surviving sequences with n=100 simulated sequences (for illustrative purposes):
[6, 6]
[6, 6]
[2, 6, 6]
[6, 6]
[2, 6, 6]
[6, 6]
Estimate for A: 2.333
[6, 6]
[4, 4, 6, 2, 2, 6]
[6, 6]
[6, 2, 4, 4, 6]
[6, 4, 6]
[4, 4, 6, 4, 6]
[6, 6]
[6, 6]
Estimate for B: 3.375
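A minimal sketch of how estimates like these can be reproduced. This is not the code from the original post: `surviving_lengths` and its parameters are names I made up, and it drops a sequence at the first odd roll. That gives the same surviving set as rolling to the end and filtering afterwards.

```python
import random


def surviving_lengths(consecutive, trials, seed=0):
    """Lengths of the all-even sequences among `trials` simulated runs.

    consecutive=True  -> stop at two 6s in a row (A)
    consecutive=False -> stop at the second 6, anywhere (B)
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(trials):
        rolls = 0
        sixes = 0  # A: current run of 6s; B: total 6s so far
        while True:
            r = rng.randint(1, 6)
            rolls += 1
            if r % 2 == 1:
                break  # odd roll: this sequence does not survive
            if r == 6:
                sixes += 1
                if sixes == 2:
                    lengths.append(rolls)
                    break
            elif consecutive:
                sixes = 0  # a 2 or 4 breaks the run in version A
    return lengths


if __name__ == "__main__":
    for name, consecutive in (("A", True), ("B", False)):
        kept = surviving_lengths(consecutive, trials=100_000)
        print(f"Estimate for {name}: {sum(kept) / len(kept):.3f} "
              f"({len(kept)} surviving sequences)")
```

With only n=100 the estimates are noisy, since only a handful of sequences survive. Larger `trials` values should land close to the A≈2.7 and B=3 quoted above.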
If you rephrase:
A: The probability that all rolls are even when you roll a fair die until you roll two 6s in a row.
B: The probability that all rolls are even when you roll a fair die until you roll two 6s (not necessarily in a row).
This changes the code to:
A_estimate = num_sequences_without_odds/n
B_estimate = num_sequences_without_odds/n
And the result (n=100000)
Estimate for A: 0.045
Estimate for B: 0.062
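These two estimates can be cross-checked exactly with the same first-step reasoning used for the expectations. This is a sketch under my own naming (`survival_probabilities`, `p`, `q`): p is the chance of finishing with no odd roll from the start, and q is the same chance once one 6 is in hand.

```python
from fractions import Fraction


def survival_probabilities():
    """Exact P(no odd roll before stopping) for A and B with a fair die."""
    six = Fraction(1, 6)        # chance of rolling a 6
    other_even = Fraction(1, 3) # chance of rolling a 2 or 4

    # B: q = 1/6 + 1/3 * q (a 2 or 4 keeps the banked 6)
    q_b = six / (1 - other_even)
    #    p = 1/6 * q + 1/3 * p
    p_b = six * q_b / (1 - other_even)

    # A: q = 1/6 + 1/3 * p (a 2 or 4 sends you back to the start)
    #    p = 1/6 * q + 1/3 * p
    # Substituting q into p: p * (1 - 1/3 - 1/18) = 1/36
    p_a = six * six / (1 - other_even - six * other_even)
    return p_a, p_b


if __name__ == "__main__":
    p_a, p_b = survival_probabilities()
    print(p_a, float(p_a))
    print(p_b, float(p_b))
```

The exact values agree with the simulated 0.045 and 0.062 to the precision shown.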
I guess this is what most people were thinking when reading the problem, i.e., that there’s a bigger chance of getting two 6s when they don’t have to be consecutive. But the wording of the “paradox” (see above) asks instead for the average number of rolls among the surviving sequences, which is larger for B; on the other hand, B has more surviving sequences, hence the higher probability.
It’s worth highlighting that the two expectations do not condition on the same event. This explains why we can have E[A | all even] < E[B | all even] even though A ≥ B almost surely: the two “all even”s actually refer to different events.