Formalizing Vijay’s answer here:
The short answer is that you should put more of your probability mass on T1’s prediction because experts vary, and an expert’s past performance is at least somewhat predictive of his future performance.
We need to assume that all else is symmetrical: you had equal priors over the results of the next experiment before you heard the scientists’ theories; the scientists were of equal apparent caliber; P(the first twenty experimental results | T1) = P(the first twenty experimental results | T2); neither theorist influenced the process by which the next experiment was chosen; etc.
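In odds form, writing D for the first twenty experimental results (notation introduced here for brevity), the likelihood assumption makes the Bayes factor contributed by the data equal to 1:

```latex
\frac{P(T_1 \mid D)}{P(T_2 \mid D)}
  = \frac{P(D \mid T_1)}{P(D \mid T_2)} \cdot \frac{P(T_1)}{P(T_2)}
  = \frac{P(T_1)}{P(T_2)}
```

So the fit to results 1-20 cannot by itself separate the two theories; any preference has to enter through the prior odds, that is, through what we know about how each theory was produced.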
Suppose we have a bag of experts, each of whom embodies a function for generating theories from data. We draw a first expert from the bag at random and show him data points 1-10; Expert 1 generates theory T1. We draw a second expert from the bag at random and show him data points 1-20; Expert 2 generates theory T2.
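One way to make the bag-of-experts setup concrete is a small Monte Carlo simulation. The model is a hypothetical simplification, not something the argument commits to: each expert has a hidden quality q drawn uniformly from [0, 1], any expert can produce a theory consistent with the data it has already seen, and each prediction about an unseen experiment is independently correct with probability q. The function `simulate` and its parameters are illustrative names.

```python
import random


def simulate(n_trials=100_000, seed=0):
    """Estimate each expert's chance of predicting experiment 21 correctly.

    Hypothetical model: an expert's quality q is uniform on [0, 1], any expert
    can fit the data it has seen, and each prediction about an unseen
    experiment is independently correct with probability q.
    """
    rng = random.Random(seed)
    e1_kept = e1_hits = e2_hits = 0
    for _ in range(n_trials):
        # Expert 1 fits results 1-10, then its theory T1 must predict 11-20.
        q1 = rng.random()
        if all(rng.random() < q1 for _ in range(10)):
            # Keep only the draws in which T1 got all of 11-20 right.
            e1_kept += 1
            e1_hits += int(rng.random() < q1)  # T1's call on experiment 21
        # Expert 2 fits results 1-20, which any expert can do, so no filtering.
        q2 = rng.random()
        e2_hits += int(rng.random() < q2)  # T2's call on experiment 21
    return e1_hits / e1_kept, e2_hits / n_trials


e1_acc, e2_acc = simulate()
print(f"Expert 1, after predicting 11-20: {e1_acc:.3f}")
print(f"Expert 2, fit to 1-20:            {e2_acc:.3f}")
```

Under these assumptions the exact answers are 11/12 ≈ 0.917 for Expert 1 (a uniform prior on q updated on ten successes) and 1/2 for Expert 2, so the estimates should land close to those values.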
Given the ways real human experts vary (some know more than others about a given domain; some aim for accuracy while others aim to support their own political factions; etc.), it is reasonable to suppose that some experts have priors that are well aligned with the problem at hand (or behave as if they do) while others have priors that are poorly aligned. Expert 1 distinguished himself by accurately predicting the results of experiments 11-20 from the results of experiments 1-10; many predictive processes would not have done so well. Expert 2 has shown only that he can find some theory consistent with the results of experiments 1-20, and many predictive processes could do that: plenty of them put a non-zero prior on some such theory even though, given only the results of experiments 1-10, they would not have ranked the results of experiments 11-20 as the “most expected” outcome. We should therefore expect better future performance from Expert 1, all else equal.
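The reasoning here is an update on the expert rather than on the theory. A minimal sketch, assuming a hypothetical two-type population (half the experts well aligned and right 90% of the time on unseen results, half poorly aligned and right 50% of the time; these numbers are illustrative, not from the original): Expert 1's ten correct out-of-sample predictions are strong evidence that he is well aligned, while Expert 2's fit to data he had already seen is, in this model, no evidence at all, so his expected accuracy stays at the population average.

```python
from fractions import Fraction


def predictive_accuracy(prior_good, q_good, q_poor, n_correct):
    """P(the expert's next prediction is right | n_correct earlier unseen hits).

    Hypothetical two-type model: with probability prior_good an expert is
    well aligned (each unseen prediction right with probability q_good);
    otherwise poorly aligned (right with probability q_poor).
    """
    like_good = q_good ** n_correct
    like_poor = q_poor ** n_correct
    # Bayes' rule: posterior probability that the expert is well aligned.
    post_good = prior_good * like_good / (
        prior_good * like_good + (1 - prior_good) * like_poor
    )
    return post_good * q_good + (1 - post_good) * q_poor


half, q_good, q_poor = Fraction(1, 2), Fraction(9, 10), Fraction(1, 2)
e1 = predictive_accuracy(half, q_good, q_poor, n_correct=10)  # predicted 11-20
e2 = predictive_accuracy(half, q_good, q_poor, n_correct=0)   # only fit 1-20
print(float(e1), float(e2))  # e2 is exactly 7/10
```

With these numbers Expert 1's expected accuracy on the next experiment comes out just under 0.9, against exactly 0.7 for Expert 2. Setting `n_correct=0` for Expert 2 encodes the assumption that fitting already-seen data says nothing about alignment; relaxing that assumption would narrow the gap.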
The problem at hand is slightly complicated by the fact that we are judging theories, not experts, and the two experts generated their theories at different times from different amounts of information. If Expert 1 would have assigned a probability less than 1 to results 11-20 (even though his theory predicted those results), then Expert 2 is working from more information than Expert 1, which gives Expert 2 at least a slight advantage. Still, given how much human experts vary and the fact that Expert 1 did predict results 11-20, I would expect Expert 1’s demonstrated track record to outweigh Expert 2’s informational advantage.
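The closing trade-off can be made concrete in a hypothetical two-type model: experts are either well aligned (right with probability `q_good` on each unseen result) or poorly aligned (right with probability `q_poor`). As a deliberately crude assumption, Expert 2's extra data is treated as a flat additive boost to his expected accuracy. The sketch computes the boost at which Expert 2 would break even with Expert 1, and shows how that threshold depends on how much experts vary. All names and numbers are illustrative.

```python
from fractions import Fraction


def break_even_boost(prior_good, q_good, q_poor, n_correct=10):
    """Boost to expected accuracy Expert 2 needs to match Expert 1.

    Hypothetical model: experts are well aligned (accuracy q_good) with
    probability prior_good, else poorly aligned (accuracy q_poor). Expert 1
    has made n_correct successful out-of-sample predictions; Expert 2 none.
    """
    def accuracy(p_good):
        # Expected accuracy given probability p_good of being well aligned.
        return p_good * q_good + (1 - p_good) * q_poor

    like_good = q_good ** n_correct
    like_poor = q_poor ** n_correct
    post_good = prior_good * like_good / (
        prior_good * like_good + (1 - prior_good) * like_poor
    )
    # Expert 1 is judged on the posterior, Expert 2 on the prior alone.
    return accuracy(post_good) - accuracy(prior_good)


half = Fraction(1, 2)
for q_good in (Fraction(11, 20), Fraction(7, 10), Fraction(9, 10)):
    boost = break_even_boost(half, q_good, q_poor=half)
    print(f"q_good={float(q_good):.2f}: break-even boost {float(boost):.3f}")
```

With `q_poor` fixed at 0.5, the break-even boost is about 0.011 when `q_good` is 0.55 and about 0.199 when it is 0.9: the more experts vary, the larger the informational edge Expert 2 would need. When `q_good` equals `q_poor` the threshold is zero, so in this model any information advantage at all would favor T2.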