In my opinion, this is a poor choice of problem for demonstrating the generator/predictor simplicity gap.
If we are not restricted to Markov-model-based predictors, we can do a lot better simplicity-wise.
A simple Bayesian predictor tracks a single real-valued probability B in the range 0...1; the probability of state A is implicitly 1-B.
This is initialized to B = p/(p+q) as a prior, the equilibrium probability of the B state after many time steps.
P("1")=qA is our prediction with P("0")=1-P("1") implicitly.
Then update in the usual Bayesian way:
if "1": B := 0 (a known state transition into A);
if "0": A, B := (A*(1-p), A*p + B*(1-q)), then normalise by dividing both by their sum (the standard Bayesian update, discarding the falsified B-->A state transition).
In one step, after simplification: B := (p + B(1-p-q))/(1 - qB).
That's a lot more practical than having infinitely many states. Achieving numerical stability and acceptable accuracy in a real, implementable predictor is straightforward but not trivial. A near-perfect predictor is only slightly larger than the generator.
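To make the recursion concrete, here is a minimal sketch (my own illustration, not part of the original comment), assuming the generator is the two-state machine with A-->B transition probability p, B-->A transition probability q, and a "1" emitted only on the B-->A transition:

```python
class OneNumberPredictor:
    """Tracks a single belief B = P(currently in state B)."""

    def __init__(self, p: float, q: float):
        self.p = p
        self.q = q
        self.B = p / (p + q)  # equilibrium prior on being in state B

    def predict_one(self) -> float:
        # P(next symbol is "1"): a "1" is emitted only on a B-->A transition.
        return self.q * self.B

    def update(self, symbol: str) -> None:
        if symbol == "1":
            self.B = 0.0  # the "1" reveals the B-->A transition, so we are in A
        else:
            # One-step simplified Bayesian update after observing "0":
            # B := (p + B*(1-p-q)) / (1 - q*B)
            self.B = (self.p + self.B * (1.0 - self.p - self.q)) / (1.0 - self.q * self.B)
            self.B = min(1.0, max(0.0, self.B))  # guard against floating-point drift
```

The only persistent state is the single float B; the clamp at the end is one cheap way of handling the floating-point drift mentioned above.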
A perfect predictor can use one bit (have we ever observed a 1?) plus ceil(log2(n)) bits to count n, the number of zeroes observed since the last 1, and compute the exactly correct prediction from that. Technically, as n-->infinity this requires infinitely many bits, but the scaling is logarithmic, so a practical predictor will never need more than ~500 bits given known physics.
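A sketch of that counter-based predictor (again my own illustration): the only state carried between steps is the have-we-seen-a-1 flag and the run-length counter, and the belief is reconstructed from the counter on demand. With exact rational arithmetic (e.g. fractions.Fraction) in place of floats, the reconstruction, and hence the prediction, is exactly correct.

```python
class CounterPredictor:
    """Persistent state: one flag plus a zero-run counter (~log2(n) bits)."""

    def __init__(self, p: float, q: float):
        self.p, self.q = p, q
        self.seen_one = False
        self.n = 0  # zeroes observed since the last "1" (or since the start)

    def _belief_B(self) -> float:
        # Reconstruct the belief in state B implied by the counter alone.
        B = 0.0 if self.seen_one else self.p / (self.p + self.q)
        for _ in range(self.n):
            B = (self.p + B * (1 - self.p - self.q)) / (1 - self.q * B)
        return B

    def predict_one(self) -> float:
        return self.q * self._belief_B()

    def update(self, symbol: str) -> None:
        if symbol == "1":
            self.seen_one, self.n = True, 0
        else:
            self.n += 1
```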
Yes, this is specifically staying within the framework of hidden Markov chains.
Even if you go outside that framework, though, it seems you agree there is a generative/predictive gap; you're just saying it's not infinite.
Eggsyntax below gives the canonical hash-function example, where prediction is harder than generation, which holds for general computable processes.
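A generic version of that hash-based construction, sketched with my own illustrative choices (SHA-256, an 8-byte counter) rather than Eggsyntax's exact example: generating the next bit is one cheap hash call, while a predictor that does not know the seed cannot beat chance without effectively recovering it, which is believed to be computationally infeasible.

```python
import hashlib

def next_bit(seed: bytes, t: int) -> int:
    # Generator: one hash evaluation per emitted bit.
    digest = hashlib.sha256(seed + t.to_bytes(8, "big")).digest()
    return digest[0] & 1

# The generator's program and state are tiny; an optimal predictor of this
# stream that lacks `seed` would effectively have to invert or guess it.
stream = [next_bit(b"secret seed", t) for t in range(16)]
```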