This seems to me like a valuable post, both on the object level, and as a particularly emblematic example of a category (“Just-so-story debunkers”) that would be good to broadly encourage.

The tradeoff view of manioc production is an excellent insight, and is an important objection to encourage. The original post and book (which I haven’t read in its entirety) appear to have leaned too heavily on what might be described as a special case of a just-so story, in which the phenomenon (a behavior difference) is explained as an absolute using a post-hoc framework, and the meaning of the narrative is never evaluated beyond its intended explanatory effect.

This is incredibly important, because just-so stories have a high potential to deceive a careless agent. Let’s look at the recent example of AstraZeneca’s vaccine. Due to a mistake, one section of the vaccine arm of the trial was dosed with a half dose followed by a full dose, and that group appeared to be better protected than the full-dose group. Science isn’t completely broken, so the possibility that this is a fluke is being considered, but potential causes for why a half-dose full-dose regime (HDFDR) would be more effective have also been proposed. Figuring out how much to update on these pieces of evidence is somewhat difficult, because selection effects (here, whether an explanation was proposed before or after the result was seen) are normally not crucial to evaluating hypotheses that come with a supporting theory.

To put it mathematically, let A be “HDFDR is more effective than a normal regime,” B be “AstraZeneca’s groups with HDFDR were more COVID-safe than the standard-dose treatment group,” C be “post-B, an explanation that predicts A is accepted as fact,” and D be “pre-B, an explanation that predicts A is accepted as the scientific consensus.”

We’re interested in P(A|B), P(A|(B&C)), and P(A|(B&D)). P(A|B) is fairly straightforward. By simple application of Bayes’s theorem, P(A|B)=P(B|A)*P(A)/(P(B|A)*P(A)+P(B|¬A)*P(¬A)). Plugging in toy numbers, let P(B|A)=0.9 (if HDFDR was more effective, we’re pretty sure that the HDFDR would have been more effective in AstraZeneca’s trial), P(A)=0.05 (this is a weird result that was not anticipated, but isn’t totally insane), and P(B|¬A)=0.1 (this one is a bit arbitrary, and it depends on the size/power of the trials, but a brief google suggests that this is not totally insane). Then P(A|B)=0.9*0.05/(0.9*0.05+0.1*0.95)=0.32.
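The arithmetic for P(A|B) can be checked with a few lines of Python. This is only a sketch using the toy numbers above; the helper name `posterior` is my own and not from any library.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) for a binary hypothesis H using Bayes's theorem."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


# Toy numbers from the text: P(A)=0.05, P(B|A)=0.9, P(B|not A)=0.1
p_a_given_b = posterior(prior=0.05, p_e_given_h=0.9, p_e_given_not_h=0.1)
print(f"P(A|B) = {p_a_given_b:.2f}")  # prints P(A|B) = 0.32
```

Changing P(B|¬A) is the quickest way to see how much the answer depends on the trial’s power, which is the most arbitrary of the three inputs.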

Next, let’s look at P(A|B&C). We’re interested in finding the updated probability of A after observing B and then observing C, meaning we can use P(A|B) as our updated prior: P(A|B&C)=P(C|A&B)*P(A|B)/(P(C|A&B)*P(A|B) + P(C|¬A&B)*P(¬A|B)). Suppose we slightly exaggerate how broken the world is for the sake of this example, and say that P(C|A&B)=0.99 and P(C|¬A&B)=0.9 (if there is a real scientific explanation, we are almost certain to find it; if there is not, then we’ll likely still find something that looks right). Then this evaluates to 0.99*0.32/(0.99*0.32+0.9*0.68), or 0.34. Post-hoc evidence adds very little credence in a complex system in which there are sufficient effects that any result can be explained.
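One way to see why C moves the estimate so little is to redo the update in odds form, where each piece of evidence multiplies the odds by its likelihood ratio. The sketch below uses the same toy numbers; the helper names `to_odds` and `to_prob` are hypothetical.

```python
def to_odds(p):
    """Convert a probability to odds in favour."""
    return p / (1.0 - p)


def to_prob(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1.0 + odds)


prior_odds = to_odds(0.05)   # P(A)=0.05
lr_b = 0.9 / 0.1             # P(B|A) / P(B|not A): the trial result
lr_c = 0.99 / 0.9            # P(C|A&B) / P(C|not A&B): the post-hoc explanation

odds_after_b = prior_odds * lr_b
odds_after_bc = odds_after_b * lr_c
p_a_given_bc = to_prob(odds_after_bc)

print(f"likelihood ratio of B: {lr_b:.1f}")
print(f"likelihood ratio of C: {lr_c:.2f}")
print(f"P(A|B&C) = {p_a_given_bc:.2f}")
```

The trial result multiplies the odds by 9, while the post-hoc explanation multiplies them by only about 1.1, because an explanation would probably have been found either way.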

This should not, however, be taken as a suggestion to disregard all theories or scientific explorations in complex systems as evidence. Pre-hoc evidence is very valuable: P(A|D&B) can be evaluated by first computing P(A|D)=P(D|A)*P(A)/(P(D|A)*P(A)+P(D|¬A)*P(¬A)). As before, P(A)=0.05. Filling in other values with roughly reasonable numbers: P(D|¬A)=0.05 (coming up with an incorrect explanation with no motivation is very unlikely), P(D|A)=0.5 (there’s a fair chance we’ll find a legitimate explanation with no prior motivation). These choices also roughly preserve the gap in log-odds between P(C|A&B) and P(C|¬A&B). Plugging in, P(A|D)=0.5*0.05/(0.5*0.05+0.05*0.95)=0.34. Already, this is a 34% chance of A, which further demonstrates the value of pre-registering trials and testing hypotheses.

P(A|B&D) then equals P(B|A&D)*P(A|D)/(P(B|A&D)*P(A|D) + P(B|¬A&D)*P(¬A|D)). Notably, D has no impact on B once A is fixed (assuming a well-run trial, which allows further generalization), meaning P(B|A&D)=P(B|A) and P(B|¬A&D)=P(B|¬A), simplifying this to P(B|A)*P(A|D)/(P(B|A)*P(A|D) + P(B|¬A)*P(¬A|D)), or 0.9*0.34/(0.9*0.34+0.1*0.66), or 0.82. This is a stark difference from the previous case, and suggests that the timing of theories is crucial in determining how a Bayesian reasoner ought to evaluate statements. Unfortunately, this information is often hard to acquire, and must be carefully interrogated.
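The two paths can be put side by side in a short script. This is a sketch under the same toy numbers and the same assumption that B is independent of D given A; `posterior` is a hypothetical helper. Carrying full precision through gives roughly 0.83 for the pre-hoc path, since the 0.82 above comes from rounding P(A|D) to 0.34 before the second update.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) for a binary hypothesis H using Bayes's theorem."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


P_A = 0.05  # prior that HDFDR is genuinely more effective

# Pre-hoc path: the explanation D is accepted first, then the trial result B arrives.
p_a_given_d = posterior(P_A, 0.5, 0.05)            # P(D|A)=0.5, P(D|not A)=0.05
p_a_given_bd = posterior(p_a_given_d, 0.9, 0.1)    # P(B|A)=0.9, P(B|not A)=0.1

# Post-hoc path: the trial result B arrives first, then the explanation C is found.
p_a_given_b = posterior(P_A, 0.9, 0.1)
p_a_given_bc = posterior(p_a_given_b, 0.99, 0.9)   # P(C|A&B)=0.99, P(C|not A&B)=0.9

print(f"pre-hoc:  P(A|D) = {p_a_given_d:.2f}, P(A|B&D) = {p_a_given_bd:.2f}")
print(f"post-hoc: P(A|B) = {p_a_given_b:.2f}, P(A|B&C) = {p_a_given_bc:.2f}")
```

The same trial result lands at very different posteriors depending only on when the explanation appeared, which is the whole point of the comparison.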

In case the analogy isn’t clear: here, the equivalent of an unexpected regime being more effective is that reason apparently breaks down and yields severely suboptimal results. The hypothesis that reason is actually less useful than culture in problems with non-monotonically increasing rewards as the solution progresses is a possible one, but because it was likely arrived at to explain the results of the manioc story, the existence of this hypothesis is only weak evidence for preferring it over the hypothesis with more prior probability mass: that different cultures value time in different ways.

Obviously, this Bayesian approach isn’t particularly novel, but I think it’s a useful reminder as to why we have to be careful about the types of problems outlined in this post, especially in the case of complex systems where multiple strategies are potentially legitimate. I strongly support collation on a meta-level to express approval for the debunking of just-so stories, which allows better reasoning. This is especially true when the just-so story has a ring of truth and meshes well with cultural narratives.
