Question about Bayesian updates.
Say Jane goes to get a cancer screening. She has a 5% prior probability of having cancer; the machine has a success rate of 80% and a false positive rate of 9%. Jane gets a positive result on the test, and so she now has a ~30% chance of having cancer.
Jane goes to get a second opinion across the country. A second cancer screening (same success/false positive rates) says she doesn’t have cancer. What is her probability for having cancer now?
According to your percentages, out of every 10,000 women, 5% = 500 have cancer and 95% = 9,500 do not.
Of those 500 women with cancer, 80% = 400 will get a positive test and 20% = 100 will get a negative one.
Out of those 9,500 women without cancer, 9% = 855 will get a positive test and 91% = 8,645 will get a negative one.
After taking the first test, Jane belongs to the group of 1,255 women out of every 10,000 who have a positive test.
Of those 1,255 women, 400 have cancer. Jane’s likelihood of having cancer is 400⁄1,255 = 31.87%.
If we take those 1,255 women to a second test, 80% = 320 of the 400 women with cancer will get a positive test and 20% = 80 will get a negative test.
Of those same 1,255 women with a first positive test, 9% = 77 of the 855 women without cancer will get a positive test and 91% = 778 will get a negative test.
After taking the second test, Jane belongs to the group of 858 women out of every 10,000 whose first test was positive and whose second test was negative.
Of those 858 women, 80 have cancer. Now Jane’s likelihood of having cancer is 80⁄858 = 9.32%.
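The counting argument above can be sketched in a few lines of Python (variable names are mine; the whole-women rounding in the comment is dropped, which shifts the final figure only in the last decimal):

```python
# Frequency-counting version of the two-test update, per 10,000 women.
population = 10_000
prior = 0.05   # base rate of cancer
sens = 0.80    # "success rate": P(positive | cancer)
fpr = 0.09     # false positive rate: P(positive | no cancer)

cancer = population * prior    # 500 women with cancer
healthy = population - cancer  # 9,500 women without

# First test: keep only the women who test positive.
cancer_pos = cancer * sens     # 400
healthy_pos = healthy * fpr    # 855
after_first = cancer_pos / (cancer_pos + healthy_pos)
print(round(after_first, 4))   # 0.3187

# Second test: of those, keep only the women who now test negative.
cancer_pos_neg = cancer_pos * (1 - sens)    # 80
healthy_pos_neg = healthy_pos * (1 - fpr)   # 778.05
after_second = cancer_pos_neg / (cancer_pos_neg + healthy_pos_neg)
print(round(after_second, 4))  # 0.0932
```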
Are we assuming the two tests are independent?
If so, the original cancer rate was 5:95. Multiply that by 80:9 for the likelihood ratio of getting a positive to get 400:855, which is ~30% as you said. Then, you multiply by the likelihood ratio of getting the second negative 20:91, to get 8000:77805, which as a probability is 8000/(8000+77805)~9.3%.
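The odds calculation above can be checked exactly with Python's `fractions` module (a minimal sketch, using the same ratios):

```python
from fractions import Fraction

# Odds-form update: prior odds times one likelihood ratio per test.
prior_odds = Fraction(5, 95)     # 5:95 cancer rate
lr_positive = Fraction(80, 9)    # P(+|cancer) : P(+|no cancer)
lr_negative = Fraction(20, 91)   # P(-|cancer) : P(-|no cancer)

posterior_odds = prior_odds * lr_positive * lr_negative  # 8000:77805
p = posterior_odds / (1 + posterior_odds)  # odds -> probability
print(round(float(p), 4))  # 0.0932
```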
(Assuming that the two tests are independent, which is a rather unrealistic assumption in this case.) If you know how to calculate the ~30% answer to the first part of the question, then this problem is pretty straightforward to solve. Just use Bayes' rule again, treating the posterior from your first calculation (~30%) as your prior for the next calculation.
If Kim came from a population that had a ~30% prior of having cancer and took one test which came out negative, then her probability after that one test would be the same as Jane’s probability after both tests.
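The posterior-becomes-prior recipe can be sketched with one small helper function (names are mine, not from the thread):

```python
# Sequential Bayes update: the posterior from one test is the prior
# for the next (assumes the tests are conditionally independent).
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    numer = prior * p_evidence_given_h
    denom = numer + (1 - prior) * p_evidence_given_not_h
    return numer / denom

p = bayes_update(0.05, 0.80, 0.09)  # first test positive -> ~0.3187
p = bayes_update(p, 0.20, 0.91)     # second test negative -> ~0.0932
print(round(p, 4))  # 0.0932
```

Starting Kim at the intermediate prior of ~0.3187 and running only the second update gives the same answer, which is the point being made above.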
Doing this with probabilities is a bit more complicated than what Coscott did, but to illustrate it anyhow...
where A is cancer and B and C are the two test results, P(A|BC) = P(A) P(BC|A) / P(BC). P(A) is our prior of 5%. Because B and C are independent given whether Jane has cancer, P(BC|A) is just 0.8 * 0.2.
P(BC) is where using probabilities is more complicated than using odds, because it's not just the probability of false positives; it's the total prior probability of seeing B and then C. Using the product rule, P(BC) = P(B)*P(C|B). Then, splitting the possibilities up into cancer and not-cancer, this becomes (P(AB)+P(¬A B))*(P(AC|B)+P(¬A C|B)). Because B and C are independent given cancer status, the second factor becomes (product rule) P(A|B)*P(C|A)+P(¬A|B)*P(C|¬A) - note that even when we added both pieces of evidence at once, we still had to calculate the intermediate probability P(A|B)! Stupid non-time-saving grumble grumble.
Anyhow, if you plug in the numbers, it's ~0.093.
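For what it's worth, there is a slightly shorter route to P(BC) that avoids the intermediate P(A|B): marginalize directly over cancer status, P(BC) = P(A)P(BC|A) + P(¬A)P(BC|¬A). A sketch with the thread's numbers:

```python
# Joint-evidence form: P(A|BC) = P(A) P(BC|A) / P(BC), with B and C
# conditionally independent given cancer status.
p_a = 0.05
p_bc_a = 0.80 * 0.20       # P(B|A) * P(C|A): positive then negative, cancer
p_bc_nota = 0.09 * 0.91    # P(B|¬A) * P(C|¬A): same results, no cancer

# Marginalize over A to get the total probability of the evidence.
p_bc = p_a * p_bc_a + (1 - p_a) * p_bc_nota
posterior = p_a * p_bc_a / p_bc
print(round(posterior, 4))  # 0.0932
```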
What does “success rate” mean?
Accurately detecting a cancer that does exist.
The accuracy of a test is generally defined as (Σ true positives + Σ true negatives) / Σ total population. That's something different from the sensitivity of a test.
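To make the distinction concrete with the counts from the walkthrough above (per 10,000 women):

```python
# Accuracy vs. sensitivity for the first test in this thread.
tp, fn = 400, 100     # women with cancer: tested positive / negative
fp, tn = 855, 8645    # women without cancer: tested positive / negative
total = tp + fn + fp + tn

accuracy = (tp + tn) / total   # fraction of all results that are correct
sensitivity = tp / (tp + fn)   # fraction of actual cancers detected
print(round(accuracy, 4), sensitivity)  # 0.9045 0.8
```

So a test can be 90% "accurate" while still being only 80% sensitive, because the healthy majority dominates the accuracy figure.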
I think it's useful to use the terms from the statistical literature when talking about something like this instead of making up vague ones of your own.