Hm, I’m not following your definitions of P and Q. Note that, as far as I know, there’s no easy closed-form expression for the likelihoods of various sequences under these chains; I had to calculate them using dynamic programming on the Markov chains.
The relevant effect driving it is that the degree of shiftiness (how far it deviates from a 50%-heads rate) builds up over a streak. In any given case where Switchy and Sticky deviate (say there’s a streak of 2, and Switchy has a 30% chance of continuing while Sticky has a 70% chance), they have the same degree of divergence. But Switchy makes it less likely that you’ll run into these long streaks of divergences, while Sticky makes it extremely likely. Neither Switchy nor Sticky gives a constant rate of switching; it depends on the streak length. (Compare a hypergeometric distribution.)
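To make the build-up effect concrete, here’s a rough Python sketch. It is not the Mathematica calculation from the paper, just an illustration under assumptions: the continuation probabilities at streak lengths 1 and 3+ are made up (only the 30%/70% values at a streak of 2 come from the example above), streaks are capped at 3, the first flip is 50/50, and the function names are hypothetical. It computes the likelihood of a sequence under a streak-dependent chain, and uses a small dynamic program over streak length to get the KL divergence of each chain from a fair coin.

```python
import math

# Hypothetical schedules: probability that the next flip repeats the previous
# one, keyed by current streak length (capped at MAX_STREAK). Only the
# streak-of-2 values (0.3 and 0.7) come from the discussion; the rest are assumed.
MAX_STREAK = 3
SWITCHY = {1: 0.4, 2: 0.3, 3: 0.2}
STICKY = {1: 0.6, 2: 0.7, 3: 0.8}
FAIR = {1: 0.5, 2: 0.5, 3: 0.5}


def sequence_likelihood(seq, cont):
    """Probability of a string of 'H'/'T' under a streak-dependent chain."""
    prob, streak = 0.5, 1  # assumed: the first flip is 50/50
    for prev, cur in zip(seq, seq[1:]):
        p = cont[min(streak, MAX_STREAK)]
        if cur == prev:
            prob *= p        # streak continues
            streak += 1
        else:
            prob *= 1 - p    # streak breaks
            streak = 1
    return prob


def kl_from_chain(p_cont, q_cont, n):
    """KL(P || Q) in bits over length-n sequences, via DP over streak length."""
    dist = {1: 1.0}  # distribution over (capped) streak length under P
    total = 0.0
    for _ in range(n - 1):
        nxt = {}
        for s, mass in dist.items():
            p, q = p_cont[s], q_cont[s]
            # Expected per-flip divergence, weighted by how often P is in state s.
            total += mass * (
                p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))
            )
            up = min(s + 1, MAX_STREAK)
            nxt[up] = nxt.get(up, 0.0) + mass * p
            nxt[1] = nxt.get(1, 0.0) + mass * (1 - p)
        dist = nxt
    return total


if __name__ == "__main__":
    for name, chain in [("Switchy", SWITCHY), ("Sticky", STICKY)]:
        print(name, kl_from_chain(chain, FAIR, 20))
```

At any given streak length the two schedules are equally far from 50%, so the per-flip divergence is the same in each state. The difference in the totals comes entirely from how much time each chain spends in the long-streak states, which is the effect described above.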
Take a look at §4 of the paper and the “Limited data (full sequence): asymmetric closeness and convergence” section of the Mathematica Notebook linked from the paper to see how I calculated their KL divergences. Let me know what you think!
I think it depends on what we mean by assuming the truth is in the center of the spectrum. In the model at the end, we assume the truth is at the extreme left of the initial distribution (i.e. µ=40, while everyone’s estimates are higher than 40). Even then, we end up with a spread where those who end up in the middle (ish; not exactly the middle) are both more accurate and less biased.
What we do need is that wherever the truth is, people will end up being on either side of it. Obviously in some cases that won’t hold. But in many cases it will—it’s basically inevitable if people’s estimates are subject to noise and people’s priors aren’t in the completely wrong region of logical space.