It’s not obvious to me that they got the Bayesian analysis right in that blog post. If you can have “no observation” for Y, it seems like what we actually observe is some Y′ that can take on the values {0, 1, null}, and the probability distribution over our observations of the variables (X, R, Y′) is P(X) · P(R|X) · P(Y′|X, R).
EDIT: Never mind, it’s not a problem. And even if it were, it wouldn’t have changed their case that the Bayesian update won’t give you this “uniform consistency” property, which does seem like something worth looking into.
As for this “low information” bull-hockey, let us put an MML prior over theta(x) and never speak of it again.
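To make the (X, R, Y′) observation model above concrete, here’s a minimal generative sketch in Python. All the parameter values are made up for illustration; the only point is the factorization P(X) · P(R|X) · P(Y′|X, R), where R is the “was Y observed?” indicator and Y′ collapses to null whenever R = 0:

```python
import random

# Hypothetical parameters, chosen only to make the factorization concrete.
P_X1 = 0.5                          # P(X = 1)
P_R1_GIVEN_X = {0: 0.9, 1: 0.3}     # P(R = 1 | X): probability Y is observed
P_Y1_GIVEN_X = {0: 0.2, 1: 0.8}     # P(Y = 1 | X): the underlying outcome

def sample_obs(rng=random):
    """Draw one observation (X, R, Y') from P(X) * P(R|X) * P(Y'|X,R)."""
    x = 1 if rng.random() < P_X1 else 0
    r = 1 if rng.random() < P_R1_GIVEN_X[x] else 0
    if r == 1:
        y_obs = 1 if rng.random() < P_Y1_GIVEN_X[x] else 0
    else:
        y_obs = None  # "no observation": Y' takes its third value, null
    return x, r, y_obs
```

Under this model P(Y′ = null | X, R = 0) = 1, so the null value carries no information beyond R itself, which is presumably why treating Y′ as a three-valued variable ends up not changing their analysis.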