IIRC the p-value is the probability that this is a result from chance. So a p-value of .25 means it’s 25% likely to be by chance, and a p-value of .05 means it is 5% likely to happen by chance.
Any p-value less than .5 means that the explanation tested is better than chance; the p-value being statistically significant means even if you measure several things it’s STILL more likely not to be chance, instead of an outlier.
EDIT: the subcomments here explain things much better than I did, and I think better than I can, so I leave it to readers to look to them.
Any p-value less than .5 means that the explanation tested is better than chance;
A p-value less than .5 means that the actual experimental result or a more extreme one (what that means depends on one’s choice of the null hypothesis and a few other things) would happen with probability less than 0.5 if the null hypothesis is true. It does not follow that the explanation is better than (the explanation that the results were obtained by) chance.
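To make "the actual result or a more extreme one" concrete, here is a minimal sketch in Python. It assumes a made-up experiment (8 heads in 10 coin flips, with the null hypothesis that the coin is fair); the function names are illustrative only. It also shows that the answer changes with what one counts as "more extreme".

```python
from math import comb

def one_sided_p_value(heads: int, flips: int) -> float:
    """P(at least `heads` heads | fair coin): 'more extreme' means 'more heads'."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips

def two_sided_p_value(heads: int, flips: int) -> float:
    """P(a head count at least as far from flips/2 as the observed one | fair coin)."""
    distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if abs(k - flips / 2) >= distance
    )
    return extreme / 2**flips

# Hypothetical data: 8 heads in 10 flips, H0 = "the coin is fair".
p_one_sided = one_sided_p_value(8, 10)   # 56/1024
p_two_sided = two_sided_p_value(8, 10)   # 112/1024
print(f"one-sided: {p_one_sided:.4f}, two-sided: {p_two_sided:.4f}")
```

Both numbers are probabilities of data given the null hypothesis. Neither says anything yet about how probable the null hypothesis itself is.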
Note that:
The p-value depends on the null hypothesis H0 and the results but it does not depend on the tested explanation (in fact there is no explanation causally linked to the test except “the null hypothesis is true/false”).
The p-value is equal to P(result or more extreme | H0), which is neither equal to P(H0 | result) nor P(~H0 | result) (and of course not P(an explanation different from H0 | result)) nor related to any of them by a unique relation (even if we forget the “or more extreme” part). Further information, typically the prior P(H0) together with the probability of the result when H0 is false, is needed to calculate the posterior probability of H0 after observing the result.
The sentences “the obtained result is 27% likely due to chance” and “the result is 27% likely to happen by chance” sound similar, but the former is more likely to be understood as “having obtained this very result we conclude that there is 27% probability that no mechanism distinct from chance has caused it”, while the latter is likely to be understood as “assuming no mechanism distinct from chance is at work, this result is obtained with probability 27%”. Since humans often misunderstand analogous probabilistic statements, it’s wise to be very careful with formulations in such a context, especially when explaining the matter.
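The gap between P(result | H0) and P(H0 | result) can be illustrated with Bayes' theorem. The sketch below ignores the "or more extreme" part for simplicity, and every number in it (the priors and the probability of the result when H0 is false) is an invented assumption, not something taken from any study.

```python
def posterior_h0(prior_h0: float,
                 p_result_given_h0: float,
                 p_result_given_not_h0: float) -> float:
    """Bayes' theorem: P(H0 | result) from a prior and the two likelihoods."""
    joint_h0 = prior_h0 * p_result_given_h0
    joint_not_h0 = (1 - prior_h0) * p_result_given_not_h0
    return joint_h0 / (joint_h0 + joint_not_h0)

# P(result | H0) is 0.05 in every row; only the assumed inputs change.
P_RESULT_GIVEN_H0 = 0.05
scenarios = [
    # (prior P(H0), assumed P(result | not H0))
    (0.50, 0.50),
    (0.99, 0.50),
    (0.50, 0.05),
    (0.10, 0.90),
]
for prior, p_alt in scenarios:
    post = posterior_h0(prior, P_RESULT_GIVEN_H0, p_alt)
    print(f"prior P(H0)={prior:.2f}, P(result|~H0)={p_alt:.2f} -> P(H0|result)={post:.3f}")
```

With the same P(result | H0) = 0.05, the posterior probability of H0 ranges from under 1% to over 90% depending on the other inputs, which is why no unique relation ties the p-value to P(H0 | result).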
Thanks, this was very helpful. If you don’t mind its reuse, I will edit it into the LW wiki so it can be referred to in the future.
I don’t mind, of course.
Is this a typo? I think the ‘iff’ should be an ‘if’. The ‘only if’ implication is false.
Thanks, corrected.
(I wanted to stress that the validity of the null hypothesis is really an assumption one has to make here, and through a lazy mental shortcut “iff” appeared suitable as a stronger version of “if”. Later I realised that it is not only false, but rather “not even false”, given that “probability of A if B” is meant to represent p(A|B); I am not sure what “probability of A only if B” would represent. But I was too tired to edit the post.)
The p-value is the probability that a result like that (or a more extreme one) would have happened if only chance were at work. That this is not the same as the probability that the result is due to chance is easily seen from the fact that, for a given real effect, the p-value tends to fall as the sample size grows. Surely sample size has no influence on whether there’s a real effect to be measured; it only affects how likely we are to detect the effect.
There may be other reasons for thinking that chance is more or less likely; e.g. because there is an extremely plausible causal mechanism, or conversely because there are independent grounds to doubt any meaningful relationship could be present. If so, that can give you good reason for thinking the probability of the chance hypothesis is lower or higher than the p-value, possibly much lower or much higher. If a study linked prayer and earthquakes with a .001 p-value (and fraud were ruled out as an explanation), it would still surely be most reasonable to think that chance produced an unlikely result (as of course it sometimes does). The current analysis may include instances of the converse situation, where it seems very unlikely that there is no connection, so it may be more reasonable to think that a small, skewed sample has inflated the p-value, rather than thinking that only chance is at work. I suppose I tend to think it probably does include such cases; I can easily believe that some of the effects of ethical theory on behavior could be very small, small enough to require a very large sample to reliably detect, but zero effect seems a priori unlikely in many of the examples.
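The sample-size point can be checked with a small exact calculation. This is only a sketch under invented assumptions: a coin that comes up heads in 55% of the observed flips, tested against the null hypothesis of a fair coin, at three made-up sample sizes.

```python
from math import comb

def one_sided_p_value_fair_coin(heads: int, flips: int) -> float:
    """P(at least `heads` heads in `flips` flips of a fair coin), computed exactly."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    # Integer arithmetic avoids floating-point underflow for large `flips`.
    return favourable / 2**flips

# The observed proportion of heads is 55% in every row; only the sample size changes.
p_values = {}
for flips in (20, 100, 1000):
    heads = flips * 55 // 100
    p_values[flips] = one_sided_p_value_fair_coin(heads, flips)
    print(f"{heads} heads in {flips} flips: p = {p_values[flips]:.4f}")
```

The observed proportion is identical in all three rows, yet the p-value drops from well above .05 to well below .01 as the sample grows. Whether the coin is actually biased is the same question in each row, so the p-value cannot be the probability that the result is due to chance.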
Connected to Stuff That Makes Stuff Happen. A null hypothesis could be one where obesity, exercise and internet are not connected, or alternatively that exercise (or lack thereof) causes obesity, and internet is unrelated to both of these. Then, you can conduct an experiment and collect evidence for or against the null hypothesis. If p = P(data | null hypothesis is true) < 0.05, a winner is you.