Hm, I’m not following your definitions of P and Q. Note that, as far as I know, there’s no easy closed-form expression for the likelihoods of various sequences under these chains; I had to calculate them using dynamic programming on the Markov chains.
The relevant effect driving it is that the degree of shiftiness (how far it deviates from a 50%-heads rate) builds up over a streak. In any given case where Switchy and Sticky deviate (say there’s a streak of 2, and Switchy has a 30% chance of continuing while Sticky has a 70% chance), they have the same degree of divergence. But Switchy makes it less likely that you’ll run into these long streaks of divergences, while Sticky makes it extremely likely. Neither Switchy nor Sticky gives a constant rate of switching; it depends on the streak length. (Compare a hypergeometric distribution.)
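For concreteness, here is a minimal Python sketch of this kind of calculation. It is not the notebook's code, and the continuation schedule is an assumption: I've made the bias grow by 0.1 per step of the streak and capped it at 0.9, chosen only so that a streak of 2 gives the 30%/70% figures above. The function names and the fair first flip are likewise hypothetical. The sketch computes sequence likelihoods directly, then gets the full-sequence KL divergence by dynamic programming over the current streak length, with a brute-force enumeration as a cross-check.

```python
import math
from itertools import product


def continue_prob(streak, kind):
    """Chance that the current streak continues.

    ASSUMPTION: hypothetical schedule, not taken from the paper. Sticky's
    bias grows by 0.1 per streak step and is capped at 0.9; Switchy mirrors it.
    """
    sticky = min(0.5 + 0.1 * streak, 0.9)
    return sticky if kind == "sticky" else 1.0 - sticky


def sequence_likelihood(seq, kind):
    """Probability of a full H/T sequence (first flip assumed fair)."""
    prob, streak = 0.5, 1
    for prev, cur in zip(seq, seq[1:]):
        p = continue_prob(streak, kind)
        if cur == prev:
            prob *= p
            streak += 1
        else:
            prob *= 1.0 - p
            streak = 1
    return prob


def kl_full_sequence(n, p_kind, q_kind):
    """KL(P || Q) in bits over length-n sequences, via DP on streak length.

    Uses the chain rule for KL: sum the per-flip divergence, weighted by
    how likely P is to be in each streak state at that flip.
    """
    dist = {1: 1.0}  # P's distribution over streak length after the first flip
    total = 0.0
    for _ in range(n - 1):
        nxt = {}
        for streak, mass in dist.items():
            p = continue_prob(streak, p_kind)
            q = continue_prob(streak, q_kind)
            step = p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))
            total += mass * step
            nxt[streak + 1] = nxt.get(streak + 1, 0.0) + mass * p
            nxt[1] = nxt.get(1, 0.0) + mass * (1 - p)
        dist = nxt
    return total


def kl_brute_force(n, p_kind, q_kind):
    """Same quantity by enumerating all 2**n sequences (small n only)."""
    total = 0.0
    for seq in product("HT", repeat=n):
        p = sequence_likelihood(seq, p_kind)
        q = sequence_likelihood(seq, q_kind)
        total += p * math.log2(p / q)
    return total
```

Comparing `kl_full_sequence(n, "sticky", "switchy")` with `kl_full_sequence(n, "switchy", "sticky")` shows the asymmetry under this assumed schedule: the per-state divergence is identical in both directions, but the two chains put different weight on the long-streak states where that divergence is largest.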
Take a look at §4 of the paper and the “Limited data (full sequence): asymmetric closeness and convergence” section of the Mathematica Notebook linked from the paper to see how I calculated their KL divergences. Let me know what you think!