Vaniver comments on A follow-up probability question: Data samples with different priors

Vaniver 26 Oct 2012 21:05 UTC
0 points
This is my current model of your problem:

You have a set S of start sites, each of which we can make propositions about. Each one of them has some position on the genome.

You’re interested in looking at each of the start sites and assessing some property: “does this start site overlap the previous gene’s stop site?” If that’s true for the particular start site s, we say Q(s)=1; otherwise, Q(s)=0 (using 1 and 0 as synonyms for true and false). This is unknown, so we describe our uncertainty as P(Q(s)=1), abbreviated P(Q(s)), which might start off as 1/|S| for all s, or might vary with the start site. Knowing P(Q(i)) doesn’t tell us anything about P(Q(j)) for a different site j.

When we do an experiment, we get back an observation O(s) about s; suppose it signals either “heads” or “tails,” which I’ll shorten to H or T. We can calculate P(O(s)=H|Q(s)=1) and P(O(s)=H|Q(s)=0), and from those we can calculate the likelihood ratio used to update P(Q(s)). Note that the likelihood ratio depends only on these two conditional probabilities of the observation, and thus is totally independent of the prior probability on Q(s).
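As a minimal sketch of that calculation: the function name `likelihood_ratio` and the values 4/5 and 2/5 below are made up for illustration, not taken from your data. Nothing about the prior appears anywhere in it.

```python
from fractions import Fraction

def likelihood_ratio(observation, p_heads_if_true, p_heads_if_false):
    """Return P(O=observation | Q=1) / P(O=observation | Q=0).

    observation is "H" or "T"; the two probabilities are
    P(O=H | Q=1) and P(O=H | Q=0). The prior on Q is not an input.
    """
    if observation == "H":
        return p_heads_if_true / p_heads_if_false
    if observation == "T":
        # P(T | .) is just the complement of P(H | .)
        return (1 - p_heads_if_true) / (1 - p_heads_if_false)
    raise ValueError("observation must be 'H' or 'T'")

# Hypothetical values: P(H | Q=1) = 4/5 and P(H | Q=0) = 2/5.
p_true = Fraction(4, 5)
p_false = Fraction(2, 5)

print(likelihood_ratio("H", p_true, p_false))  # 2
print(likelihood_ratio("T", p_true, p_false))  # 1/3
```

With these invented numbers, a heads reading doubles the odds on Q(s)=1 and a tails reading cuts them to a third, whatever the odds were beforehand.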

We can do the experiments in batches, on, say, four sites at once. Each batch gives an H or T reading for each start site, and the probabilities may depend on the number of sites measured at once. Thus, the likelihood ratio will differ with batch size, so we label each result with it: H2 is a “heads” result from a batch of 2 sites, and T4 is a “tails” result from a batch of 4.

Thus, we want to figure out, say, P(Q(s)|&,H2,T4,H4,H4). “&” stands for “all background knowledge,” which will basically be our prior. Assuming the experiments are independent conditional on Q(s), we can just multiply the odds contributed by the prior and by each of the tests to get one final estimate for Q(s). Suppose we started off with prior odds of 1:20, each H4 contributes 2:1, T4 contributes 1:2, and H2 contributes 4:1. Then we end up with 1*4*1*2*2 : 20*1*2*1*1 = 16:40 = 2:5, and so P(Q(s)|&,H2,T4,H4,H4) = 2/(2+5) = 2/7.
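The same update can be sketched in a few lines of Python, assuming the experiments really are conditionally independent given Q(s). The odds values are the ones from the example above; the names `LIKELIHOOD_RATIOS`, `posterior_odds` and `odds_to_probability` are just illustrative.

```python
from fractions import Fraction

# Likelihood ratios from the example, keyed by (outcome, batch size):
# P(observation | Q=1) / P(observation | Q=0).
LIKELIHOOD_RATIOS = {
    ("H", 4): Fraction(2, 1),
    ("T", 4): Fraction(1, 2),
    ("H", 2): Fraction(4, 1),
}

def posterior_odds(prior_odds, observations):
    """Multiply the prior odds by the likelihood ratio of each observation.

    Only valid if the observations are independent conditional on Q(s).
    """
    odds = prior_odds
    for observation in observations:
        odds *= LIKELIHOOD_RATIOS[observation]
    return odds

def odds_to_probability(odds):
    """Convert odds a:b (stored as the fraction a/b) to probability a/(a+b)."""
    return odds / (1 + odds)

prior = Fraction(1, 20)  # prior odds of 1:20
observations = [("H", 2), ("T", 4), ("H", 4), ("H", 4)]

odds = posterior_odds(prior, observations)
print(odds)                       # 2/5
print(odds_to_probability(odds))  # 2/7
```

Because the update is a plain product, the order of the observations doesn't matter, and a site with a different prior reuses the same likelihood ratios unchanged.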