The matter seems terribly complex and interesting to me.
Notions of Accuracy?
Suppose p1 is a prior which has uncertainty about ϕ(x1) and uncertainty about ϕ(x2). This is the more ignorant prior. Consider some prior p2 which has the same beliefs about the universal statement -- ∀x ϕ(x) -- but which knows ϕ(x1) and ϕ(x2).
We observe that p1 can increase its credence in the universal statement by observing the first two instances, ϕ(x1) and ϕ(x2), while p2 cannot do this -- it needs to wait for further evidence. This is interpreted as a defect.
The moral is apparently that a less ignorant prior can be worse than a more ignorant one; more specifically, it can learn more slowly.
However, I think we need to be careful about the informal notion of “more ignorant” at play here. We can formalize this by imagining a numerical measure of the accuracy of a prior. We might want it to be the case that more accurate priors are always better to start with. Put more precisely: a more accurate prior should also imply a more accurate posterior after updating. Paul’s example challenges this notion, but he does not prove that no plausible notion of accuracy will have this property; he only relies on an informal notion of ignorance.
So I think the question is open: when can a notion of accuracy fail to follow the rule “more accurate priors yield more accurate posteriors”? EG, can a proper scoring rule fail to meet this criterion? This question might be pretty easy to investigate.
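One way to start investigating: a toy model in which accuracy is measured by the Brier score, which is a proper scoring rule. The model and all the specific numbers below are my own illustrative assumptions, not part of the original example -- but the reversal does show up, so at least one proper scoring rule can fail the criterion in at least one toy setting:

```python
# Toy check of "does a more accurate prior always yield a more accurate
# posterior?" using the Brier score. The setup mimics the structure of the
# example: a latent universal "law" plus two observed instances phi1, phi2.
# All numbers are assumptions chosen for illustration.

def brier(probs, truths):
    """Mean squared distance between stated probabilities and the truth."""
    return sum((p - t) ** 2 for p, t in zip(probs, truths)) / len(probs)

# True world: the law holds, so both instances are true.
truth = [1, 1, 1]  # (law, phi1, phi2)

# p1 (more ignorant): P(law) = 0.3; given no law, each instance is a fair
# coin flip, so P(phi_i) = 0.3 + 0.7 * 0.5 = 0.65.
p1_prior = [0.3, 0.65, 0.65]
# After conditioning on phi1 and phi2: P(law) = 0.3 / (0.3 + 0.7 * 0.25).
p1_post = [0.3 / 0.475, 1.0, 1.0]

# p2 (less ignorant): already knows phi1 and phi2, with the same P(law) = 0.3
# -- which forces its conditional P(law | phi1, phi2) to be 0.3, lower than
# p1's conditional.
p2_prior = [0.3, 1.0, 1.0]
p2_post = [0.3, 1.0, 1.0]  # updating on already-known facts changes nothing

print(brier(p1_prior, truth), brier(p2_prior, truth))  # p2's prior is better
print(brier(p1_post, truth), brier(p2_post, truth))    # p1's posterior is better
```

So in this toy model, p2's prior is strictly more accurate, yet after both priors update on the same evidence, p1's posterior is strictly more accurate. This doesn't settle the general question, but it shows the phenomenon is compatible with at least one proper scoring rule.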
Conditional probabilities also change?
I think the example rests on an intuitive notion that we can construct p2 by imagining p1 but modifying it to know ϕ(x1) and ϕ(x2). However, the most obvious way to make that modification is to update on those sentences -- and this fails to meet the conditions of the example, since p2 would then already have an increased probability for the universal statement.
So, in order to move the probability of ϕ(x1) and ϕ(x2) up to 1 without also increasing the probability of the universal, we must do some damage to the probabilistic relationship between the instances and the universal. The prior p2 doesn't just know ϕ(x1) and ϕ(x2); it also believes the conditional probability of the universal statement given those two sentences to be lower than p1 believes it to be.
It doesn’t think it should learn from them!
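To see why naive updating can't produce p2, here is a sketch of a prior with p1's structure as an explicit joint distribution. The model (a "law" with prior 0.3 forcing every instance true; otherwise instances are independent fair coins) is my own illustrative assumption:

```python
from itertools import product

# A toy version of the more-ignorant prior p1 as a joint distribution over
# (law, phi1, phi2): with probability P_LAW a "law" holds and forces both
# instances true; otherwise each instance is an independent coin flip.
# (The numbers 0.3 and 0.5 are illustrative assumptions, not from the post.)
P_LAW = 0.3

def p1(law, phi1, phi2):
    if law:
        return P_LAW if (phi1 and phi2) else 0.0
    return (1 - P_LAW) * 0.5 * 0.5

worlds = list(product([False, True], repeat=3))

def prob(pred):
    return sum(p1(*w) for w in worlds if pred(*w))

prior_universal = prob(lambda law, a, b: law)
posterior_universal = (prob(lambda law, a, b: law and a and b)
                       / prob(lambda law, a, b: a and b))

print(prior_universal)      # 0.3
print(posterior_universal)  # 0.3 / (0.3 + 0.7 * 0.25) ≈ 0.632
```

Conditioning on the two instances pushes the universal from 0.3 up to about 0.63. So any prior that assigns probability 1 to the instances while keeping the universal at 0.3 must have a *lower* conditional probability for the universal given the instances than p1 does -- it has been modified somewhere other than by updating.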
This supports Alexander’s argument that there is no paradox, I think. However, I am not ultimately convinced. Perhaps I will find more time to write about the matter later.
Explanations?
Alexander analyzes the difference between p1 and p2 in terms of the famous “explaining away” effect. He supposes that p2 has learned some “causes”: a C1 explaining ϕ(x1) and a C2 explaining ϕ(x2).
Postulating these causes adds something to the scenario. One possible view is that Alexander is correct as far as his argument goes, but incorrect if there are no such Cj to consider.
However, I do not find myself endorsing Alexander’s argument even that far.
If C1 and C2 have a common form, or are correlated in some way -- so there is an explanation which tells us why the first two sentences, ϕ(x1) and ϕ(x2), are true, and which does not apply to ϕ(xn) for n > 2 -- then I agree with Alexander’s argument.
If C1 and C2 are uncorrelated, then it starts to look like a coincidence. If I find a similarly uncorrelated C3 for ϕ(x3), C4 for ϕ(x4), and a few more, then it will feel positively unexplained. Although each explanation is individually satisfying, nowhere do I have an explanation of why all of them are turning up true.
I think the probability of the universal sentence should go up at this point.
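One toy way to cash out this intuition: suppose there might be a hidden common factor G behind the individual causes C1, C2, ... (everything below, including the numbers, is my own illustrative assumption). If G is real, causes keep turning up; if not, each cause found is a mild coincidence. Then accumulating individually-satisfying but a-priori-uncorrelated causes raises P(G), and with it the probability that future instances -- hence the universal -- hold:

```python
# Hidden-common-factor sketch: each cause C_j is more likely if some hidden
# factor G holds. Finding cause after cause is evidence for G, which in turn
# predicts that the pattern continues. All numbers are illustrative.

P_G = 0.1          # prior on the hidden common factor
P_C_GIVEN_G = 0.9  # chance each cause C_j turns up if G holds
P_C_NO_G = 0.3     # chance it turns up as a mere coincidence otherwise

def p_g_given_k_causes(k):
    """P(G | the first k causes were all found to hold)."""
    num = P_G * P_C_GIVEN_G ** k
    return num / (num + (1 - P_G) * P_C_NO_G ** k)

def p_next_cause(k):
    """P(the next cause holds | k causes found so far)."""
    pg = p_g_given_k_causes(k)
    return pg * P_C_GIVEN_G + (1 - pg) * P_C_NO_G

for k in range(5):
    print(k, round(p_g_given_k_causes(k), 3), round(p_next_cause(k), 3))
```

Both quantities rise monotonically with k: each new cause is locally explained, yet the run of successes is itself evidence for something law-like. This is one way to model why the credence in the universal should go up even when every instance has its own explanation.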
So, what about my “conditional probabilities also change” variant of Alexander’s argument? We might intuitively think that ϕ(x1) and ϕ(x2) should be evidence for the universal generalization, but p2 does not believe this -- its conditional probabilities indicate otherwise.
I find this ultimately unconvincing because the point of Paul’s example, in my view, is that more accurate priors do not imply more accurate posteriors. I still want to understand what conditions can lead to this (including whether it holds for all notions of “accuracy” satisfying some reasonable assumptions, EG proper scoring rules).
Another reason I find it unconvincing is that even if we accepted this answer to the paradox of ignorance, I think it is not at all convincing for the problem of old evidence.
What is the ‘problem’ in the problem of old evidence?
… to be further expanded later …