“each program is further weighted by its fit to all data observed so far. This gives you a weighted mixture of experts that can predict future bits.”
I don’t see it explained anywhere which algorithm is used to weight the experts in this measure. Does the choice matter? And how are the “fit” probabilities and the “complexity” probabilities combined? Are they multiplied and then normalized?
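For what it is worth, one standard reading of the quoted sentence is a plain Bayesian mixture, which is indeed "multiply and normalize": each expert's complexity prior is multiplied by the probability it assigned to the bits seen so far, and the products are rescaled to sum to 1. The sketch below is only an illustration under that assumption, not something confirmed by the post. The names `posterior_weights` and `predict_next`, the three toy experts, and their description lengths are all made up for the example, and the prior is assumed to be 2^(-length in bits).

```python
from fractions import Fraction


def posterior_weights(experts, observed_bits):
    """Return normalized weights: prior(expert) * likelihood(data | expert).

    `experts` is a list of (length_in_bits, predict) pairs, where
    predict(history) returns the probability that the next bit is 1.
    Both the pair layout and the 2**-length prior are assumptions.
    """
    unnormalized = []
    for length, predict in experts:
        prior = Fraction(1, 2 ** length)  # "complexity" term
        likelihood = Fraction(1)          # "fit" term
        for i, bit in enumerate(observed_bits):
            p_one = Fraction(predict(observed_bits[:i]))
            likelihood *= p_one if bit == 1 else 1 - p_one
        unnormalized.append(prior * likelihood)
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("no expert assigns nonzero probability to the data")
    return [w / total for w in unnormalized]


def predict_next(experts, observed_bits):
    """Mixture prediction: weighted average of each expert's P(next bit = 1)."""
    weights = posterior_weights(experts, observed_bits)
    return sum(
        w * Fraction(predict(observed_bits))
        for w, (_, predict) in zip(weights, experts)
    )


# Hypothetical toy experts: (description length in bits, predictor).
experts = [
    (2, lambda history: Fraction(9, 10)),  # "mostly ones"
    (1, lambda history: Fraction(1, 2)),   # fair coin
    (2, lambda history: Fraction(1, 10)),  # "mostly zeros"
]

weights = posterior_weights(experts, [1, 1, 1])
p_next = predict_next(experts, [1, 1, 1])
print(weights, p_next)
```

Under this reading the choice of weighting algorithm is not a free parameter: the weights are just Bayes' rule applied with the complexity prior, so after three 1s the "mostly ones" expert dominates even though the fair coin started with a larger prior.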
Although it has been years, and Anonymous may never see this, I just want to point out to any future readers who have their best thoughts in the shower that decent waterproof notepads now exist. “AquaNotes” is one I have tried, and it works exactly as advertised. And the paper isn’t unreasonably thick either...