A long time ago on Overcoming Bias, there was a thread started by Eliezer that linked to a post on someone else’s blog. The linked post posed a question, something like: “Consider two scientists. One does twenty experiments, and formulates a theory that explains all twenty results. Another does ten experiments, formulates a theory that adequately explains all ten results, does another ten experiments, and finds that eir theory correctly predicted the results. Which theory should we trust more and why?”
I remember Eliezer said he thought he had an answer to the question but was going to wait before posting it. I’ve since lost the post. Does anyone remember what post this is or whether anyone ever gave a really good formal answer?
Here’s the post:
I vaguely remember a discussion of Bayesianism vs. frequentism in which the frequentists held that a study where the experimenters resolve to keep making observations until they have observed Foo X times (ending up with Y total observations) must be treated statistically differently from a study where the experimenters resolve to make Y observations and happen to observe Foo X times. From the Bayesian perspective, these two studies, with their identical objective facts, ought to be treated the same.
Does this sound like it?
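The stopping-rule point can be made concrete with the classic textbook numbers (9 heads in 12 coin flips, standing in for the abstract Foo/X/Y — these specifics are my illustration, not from the thread): the two frequentist p-values disagree, while the Bayesian posterior is identical under either design.

```python
from math import comb

# Identical data: 9 heads and 3 tails in 12 flips of a coin with
# unknown bias p; test H0: p = 0.5 (one-sided, "too many heads").
heads, Y = 9, 12
tails = Y - heads

# Design A: flip exactly 12 times (binomial stopping rule).
# p-value = P(>= 9 heads in 12 flips | p = 0.5)
p_binom = sum(comb(Y, k) for k in range(heads, Y + 1)) / 2**Y

# Design B: flip until the 3rd tail appears (negative binomial rule).
# p-value = P(>= 12 flips needed | p = 0.5)
#         = P(at most 2 tails in the first 11 flips)
p_negbinom = sum(comb(Y - 1, k) for k in range(tails)) / 2**(Y - 1)

print(f"binomial p-value:     {p_binom:.4f}")    # 299/4096 ~ 0.073
print(f"neg-binomial p-value: {p_negbinom:.4f}")  # 67/2048  ~ 0.033

# Bayesian view: both likelihoods are proportional to p^9 (1-p)^3,
# so a Beta(a, b) prior yields the same Beta(a+9, b+3) posterior
# under either stopping rule -- same data, same conclusion.
a, b = 1, 1  # uniform prior, an arbitrary illustrative choice
posterior = (a + heads, b + tails)
print("posterior under either design: Beta", posterior)
```

The two likelihoods differ only by a constant factor (C(12,9) vs. C(11,2)), which is why any posterior — and any likelihood-ratio comparison — comes out the same, even though the p-values straddle the conventional 0.05 line.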
How valuable was each experiment? If you formulate a theory after ten experiments, you can design the next ten to probe that theory specifically: if the theory is right, they come out one way; if it is wrong, they come out very differently.
It’s like writing test cases for software: once you know exactly what you’re testing, you can write test cases that aim at any potential weak spots, in a deliberate and targeted attempt to break it. So if we’re assuming these scientists are actual people (rather than purely hypothetical abstractions), I would give more credence to the guy who did ten experiments, formulated a theory, and did ten more experiments, iff those later ten experiments look like they were designed to give new information and really stress-test the theory.
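In the software analogy, the gap between broad checks and targeted ones looks something like this (a toy sketch with a deliberately buggy, hypothetical function):

```python
def is_leap_year(year):
    # Deliberately buggy hypothetical implementation: it handles the
    # common cases but omits the "divisible by 400" exception.
    return year % 4 == 0 and year % 100 != 0

# Broad, untargeted checks: all of these pass, giving false confidence.
assert is_leap_year(2024)
assert not is_leap_year(2023)
assert not is_leap_year(1900)

# A targeted test aimed at the known weak spot (century years):
print(is_leap_year(2000))  # prints False, but the correct answer is True
```

Random years would catch this bug about once in 400 tries; a test written by someone who knows where leap-year rules get tricky catches it immediately. Experiments designed after formulating a theory can be targeted the same way.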
If we’re talking about the exact same 20 experiments, then I would generally favor doing maybe 13 experiments, making a theory, and then doing the other 7, to avoid overfitting. Or split the experiments into two sets of ten, and have two scientists each look at ten of the experiments, make a theory, then test it against the other ten. This kind of thinking would have killed Ptolemy’s theory of epicycles, which is a classic case of overcomplicating the theory to fit the observations.
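A quick numerical sketch of the hold-out idea (all the specifics — the quadratic “true law”, the noise level, the degree-9 “epicycle” model — are illustrative assumptions on my part):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 20 "experiments" measure y = f(x) + noise, where the
# true law is quadratic. A flexible model fit to the data can chase
# the noise, the way epicycles chased Ptolemy's observations.
x = rng.uniform(0, 1, size=20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.3, size=20)

# Theorize on 13 experiments, then test predictions on the other 7.
train, held_out = np.arange(13), np.arange(13, 20)

errs = {}
for degree in (2, 9):  # a parsimonious model vs. an "epicycle" model
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x)
    errs[degree] = (
        np.sqrt(np.mean((pred[train] - y[train]) ** 2)),      # train RMSE
        np.sqrt(np.mean((pred[held_out] - y[held_out]) ** 2)),  # held-out RMSE
    )
    print(f"degree {degree}: train RMSE {errs[degree][0]:.3f}, "
          f"held-out RMSE {errs[degree][1]:.3f}")
```

The flexible model always explains the 13 training experiments at least as well as the simple one — that is guaranteed by least squares — so “explains everything I’ve seen” is weak evidence by itself. Only the held-out experiments can tell the two models apart.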
I know that’s hardly a formal answer, but I think the original question was oversimplifying.