Again, Bayesians would start with a very low prior for Atlantis, assess the likelihood of the evidence under it as very low, and end up with a probability distribution something like Khafre 80%, Khufu 19.999999%, Atlantis 0.000001%.
This isn’t quite how a pure Bayesian analysis would work. We should end up with higher probability for Khafre/Khufu, even if the prior starts with comparable weight on all three.
We want to calculate the probability that the sphinx was built by Atlanteans, given the evidence: P[atlantis | evidence]. By Bayes’ rule, that’s proportional to P[evidence | atlantis] times the prior P[atlantis]. Let’s just go ahead and fix the prior at 1⁄3 for the sake of exposition, so that the heavy lifting will be done by P[evidence | atlantis].
The key piece: what does P[evidence | atlantis] mean? If the new-agers say “ah, the Atlantis theory predicts all of this evidence perfectly”, does that mean that P[evidence | atlantis] is very high? No, because we expect that the new-agers would have said that regardless of what evidence was found. A theory cannot assign high probability to all possible evidence, because the theory’s evidence-distribution must sum to one. To properly compute P[evidence | atlantis], we have to step back and ask “before seeing this evidence, what probability would I assign it, assuming the sphinx was actually built by Atlanteans?”
What matters most for computing P[evidence | atlantis] is that the Atlantis theory puts nonzero probability on all sorts of unusual hypothetical evidence-scenarios. For instance, if somebody ran an ultrasound on the sphinx, and found that it contained pure aluminum, or a compact nuclear reactor, or a cavity containing tablets with Linear A script on them, or anything else that Egyptians would definitely not have put in there… the Atlantis theory would put nonzero probability on all those crazy possibilities. But there are a lot of crazy possibilities, and allocating probability to all of them means that there can’t be very much left for the boring possibilities—remember, it all has to add up to one, so we’re on a limited probability budget here. On the other hand, Khafre/Khufu both assign basically-zero probability to all the crazy possibilities, which leaves basically their entire probability budget on the boring stuff.
So when the evidence actually ends up being pretty boring, P[evidence | atlantis] has to be a lot lower than P[evidence | khafre] or P[evidence | khufu].
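The argument above can be sketched numerically. All the specific numbers here are made-up for illustration; the only structural assumptions are a uniform 1/3 prior (as stipulated above) and that the Atlantis theory, having spread its probability budget over many crazy evidence-scenarios, has little left for the boring evidence actually observed:

```python
# Illustrative numbers only -- a sketch of the "probability budget" argument.
priors = {"khafre": 1/3, "khufu": 1/3, "atlantis": 1/3}

# Likelihood of the actually-observed (boring) evidence under each theory.
# Atlantis must reserve probability for aluminum, nuclear reactors,
# Linear A tablets, etc., so little remains for the boring outcome;
# Khafre/Khufu concentrate nearly all their mass on boring outcomes.
likelihoods = {"khafre": 0.5, "khufu": 0.5, "atlantis": 0.01}

unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(unnormalized.values())
posterior = {h: unnormalized[h] / total for h in priors}

for h, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.4f}")
```

Even with an equal prior, the posterior on Atlantis collapses to around 1%, purely because P[evidence | atlantis] is small.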
I feel like this is mostly a question of what you mean by “atlantis”.
If you want to calculate P(evidence | the_specific_atlantis_that_newagers_specified_after_hearing_the_evidence) * P(the_specific_atlantis_that_newagers_specified_after_hearing_the_evidence), then the first term is going to be pretty high, and the second term very low (because the hypothesis specifies a lot about what the Atlanteans did).
But if you want to calculate P(evidence | the_type_of_atlantis_that_people_mostly_associate_to_before_thinking_about_the_sphinx) * P(the_type_of_atlantis_that_people_mostly_associate_to_before_thinking_about_the_sphinx), the first term would be very low, while the second term would be somewhat higher.
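The two decompositions above can be sketched with numbers. These probabilities are made-up for illustration; the point is only the structure, namely that each decomposition trades a high term against a very low one:

```python
# Illustrative numbers only -- the two ways of decomposing
# P(evidence | atlantis) * P(atlantis).

# Case 1: the specific post-hoc Atlantis hypothesis. It "predicts" the
# evidence well, but specifying so much detail about what the Atlanteans
# did makes its prior tiny.
p_e_given_specific = 0.9
p_specific = 1e-6
joint_specific = p_e_given_specific * p_specific

# Case 2: the generic pre-evidence notion of Atlantis. Its prior is
# higher, but it spreads likelihood over many possible evidence
# outcomes, so P(evidence | atlantis) is very low.
p_e_given_generic = 1e-4
p_generic = 0.01
joint_generic = p_e_given_generic * p_generic

print(joint_specific, joint_generic)
```

Either way the product P(E|T) * P(T) comes out tiny, so the choice of decomposition changes the bookkeeping but not the conclusion.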
The difference between the two cases is whether you think of the new-agers as holding exactly one hypothesis and lying about what it predicts (since, as you note, it cannot assign high probability to everything—the probabilities must sum to 1), or as switching to a new hypothesis every time they discover a new fact about the sphinx, or every time they’re asked a new question.
In this particular article, Scott mostly wants to make a point about cases where theories have similar P(E|T) but differ in the prior probabilities, so he focused on the first case.
Ah, I see. Thanks.