P-value
From Wikipedia, the free encyclopedia
In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The fact that p-values are based on this assumption is crucial to their correct interpretation.
More technically, a p-value of an experiment is a random variable defined over the sample space of the experiment such that its distribution under the null hypothesis is uniform on the interval [0,1]. Many p-values can be defined for the same experiment.
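The uniform-distribution property can be checked by simulation. The sketch below is a hypothetical illustration rather than part of the article's example: it assumes normally distributed data with a known standard deviation and uses a two-sided z-test (the helper names `two_sided_z_p_value` and `simulate_null_p_values` are made up for this sketch). Because the z statistic is continuous, its p-value is exactly uniform when the null hypothesis holds, so roughly 5% of simulated p-values should fall at or below 0.05, roughly 25% at or below 0.25, and so on. For a discrete statistic, such as the number of heads in the coin example below, the property holds only approximately.

```python
import math
import random


def two_sided_z_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: population mean == 0, with known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_p_values(trials=20000, n=10, seed=1):
    """Simulate `trials` experiments in which H0 is true; return one p-value per experiment."""
    rng = random.Random(seed)
    return [
        two_sided_z_p_value([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]


p_values = simulate_null_p_values()
for cutoff in (0.05, 0.25, 0.5):
    share = sum(p <= cutoff for p in p_values) / len(p_values)
    # Under H0 the share should be close to the cutoff itself.
    print(f"fraction of p-values <= {cutoff}: {share:.3f}")
```

Changing the seed changes the individual fractions slightly, but each should stay close to its cutoff.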
Coin flipping example
For example, an experiment is performed to determine whether a coin flip is fair (50% chance of landing heads or tails) or unfairly biased, either toward heads (more than 50% chance of landing heads) or toward tails (less than 50% chance of landing heads). (A bent coin, for instance, may produce biased results.)
Since we consider both biased alternatives, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviations from the 50% rate can be ascribed to chance alone.
Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result is the chance of a fair coin landing on heads at least 14 times out of 20 flips plus the chance of a fair coin landing on tails at least 14 times out of 20 flips. In this case the test statistic T, the number of heads in 20 flips, has a binomial distribution (n = 20, probability of heads 1/2 under the null hypothesis). The probability that 20 flips of a fair coin result in 14 or more heads is 0.0577. By symmetry, the probability that 20 flips of the coin result in 14 or more tails (equivalently, 6 or fewer heads) is the same, 0.0577. Thus, the p-value for the coin turning up the same face 14 times out of 20 total flips is 0.0577 + 0.0577 ≈ 0.115 (0.1153 when computed without rounding the tails).
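The tail probabilities above can be computed exactly by counting: each of the 2^20 = 1,048,576 sequences of 20 fair flips is equally likely, so P(T ≥ 14) is the number of sequences with at least 14 heads divided by 2^20. This is a minimal Python sketch (the function names are illustrative, not from any statistics library); it relies on the symmetry of the fair-coin distribution to double one tail. Computed without intermediate rounding, the two-tailed p-value is 120,920/1,048,576 ≈ 0.1153, whereas adding the rounded tails 0.0577 + 0.0577 gives 0.1154.

```python
from math import comb


def upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 1/2), i.e. k or more heads from a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def two_tailed_p_value(n, heads):
    """Two-tailed p-value for a fair-coin null: both tails at least as extreme as observed."""
    extreme = max(heads, n - heads)  # 14 heads is as extreme as 14 tails (6 heads)
    # The null distribution is symmetric, so the two tails are equal; cap at 1
    # because the tails overlap when heads == n / 2.
    return min(1.0, 2 * upper_tail(n, extreme))


print(f"P(14 or more heads) = {upper_tail(20, 14):.4f}")          # 0.0577
print(f"two-tailed p-value  = {two_tailed_p_value(20, 14):.4f}")  # 0.1153
```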
Interpretation
Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level,[1] often represented by the Greek letter α (alpha). If the level is 0.05, then a result is deemed extraordinary when results at least as extreme would occur no more than 5% of the time, given that the null hypothesis is true.
In the above example we have:
- null hypothesis (H₀): the coin is fair;
- observation (O): 14 heads out of 20 flips; and
- probability of a result at least as extreme as O, given H₀ (the p-value): P(at least as extreme as O | H₀) = 2 × 0.0577 (two-tailed) ≈ 0.115, or about 11.5%.
The calculated p-value exceeds 0.05, so the observation is consistent with the null hypothesis (that the observed result of 14 heads out of 20 flips can be ascribed to chance alone), as it falls within the range of outcomes a fair coin would produce at least 95% of the time. In our example, we therefore fail to reject the null hypothesis at the 5% level. Although the coin did not fall evenly, the deviation from the expected outcome of 10 heads is small enough to be reported as "not statistically significant at the 5% level".
However, had a single extra head been obtained, the resulting p-value (two-tailed) would be 0.0414 (4.14%). This time the null hypothesis, that the observed result of 15 heads out of 20 flips can be ascribed to chance alone, is rejected. Such a finding would be described as being "statistically significant at the 5% level".
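The decision rule described in this section can be sketched as follows, assuming the exact binomial p-value from the coin example and α = 0.05. The helper names `two_tailed_p_value` and `decide` are hypothetical, chosen for this illustration only.

```python
from math import comb


def two_tailed_p_value(n, heads):
    """Two-tailed p-value for H0: fair coin, via the exact Binomial(n, 1/2) distribution."""
    extreme = max(heads, n - heads)
    tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # symmetric null: double one tail, cap at 1


def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise the test only 'fails to reject' H0."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for heads in (14, 15):
    p = two_tailed_p_value(20, heads)
    print(f"{heads}/20 heads: p = {p:.4f} -> {decide(p)}")
```

The two outcomes are deliberately worded asymmetrically: a p-value above α leads to "fail to reject", never to "accept".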
Critics of p-values point out that the criterion used to decide "statistical significance" is based on a somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep. It is necessary to use a reasonable null hypothesis to assess the result fairly, and the choice of null hypothesis itself entails assumptions.
Frequent misunderstandings
Comparing the p-value with a significance level yields one of two conclusions: either the null hypothesis is rejected, or it cannot be rejected at that significance level. The comparison alone (here, about 11.5% > 5%) does not justify accepting the null hypothesis; that requires other procedures, such as some "goodness of fit" tests. It would be a serious mistake to conclude that the null hypothesis must be accepted simply because the p-value is larger than the chosen significance level.
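One way to see why a non-significant result is weak evidence for the null hypothesis is to ask how often the test would detect a biased coin at all. As a hypothetical illustration (the alternative P(heads) = 0.7 and the helper names are assumptions of this sketch, not part of the article), the code below finds the 5%-level rejection region for 20 flips and computes the test's power, the probability of rejecting H₀ when the coin really is biased.

```python
from math import comb


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def two_tailed_p_value(n, heads):
    """Two-tailed p-value for H0: fair coin (symmetric null, so double the larger-count tail)."""
    extreme = max(heads, n - heads)
    tail = sum(binom_pmf(n, i, 0.5) for i in range(extreme, n + 1))
    return min(1.0, 2 * tail)


def power(n, true_p, alpha=0.05):
    """Chance that the level-alpha test rejects H0 when P(heads) is really true_p."""
    rejection_region = [k for k in range(n + 1) if two_tailed_p_value(n, k) <= alpha]
    return sum(binom_pmf(n, k, true_p) for k in rejection_region), rejection_region


pw, region = power(20, 0.7)
print("H0 is rejected for these head counts:", region)  # 0-5 and 15-20
print(f"power against a coin with P(heads) = 0.7: {pw:.3f}")
```

With only 20 flips, the power against P(heads) = 0.7 comes out at roughly 0.42, so a coin biased that strongly would fail to reach significance more often than not. A result such as 14 heads out of 20 is therefore compatible with a fair coin without being evidence that the coin is fair.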
The use of p-values is widespread; however, such use has come under heavy criticism due both to its inherent shortcomings and the potential for misinterpretation.
There are several common misunderstandings about p-values.[2][3]
1. The p-value is not the probability that the null hypothesis is true. This false belief is sometimes used to justify the "rule" of treating a result as significant whenever its p-value is very small (near zero).
   In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null hypothesis is very close to one. This is the Jeffreys–Lindley paradox.
2. The p-value is not the probability that a finding is "merely a fluke." (Again, this conclusion arises from the "rule" that small p-values indicate significant differences.)
   Because the calculation of a p-value assumes that the finding is the product of chance alone, it cannot also be used to gauge the probability that this assumption is true. What the p-value does measure is how often a result at least as extreme as the observed one would arise if chance alone were at work; a large p-value means only that the result is compatible with the null hypothesis, not that the null hypothesis probably explains it.
3. The p-value is not the probability of falsely rejecting the null hypothesis (the long-run rate of false rejections is set by the significance level α). This error is a version of the so-called prosecutor's fallacy.
4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
5. 1 − (p-value) is not the probability of the alternative hypothesis being true (see (1)).
6. The significance level of the test is not determined by the p-value.
   The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.
7. The p-value does not indicate the size or importance of the observed effect (compare with effect size).
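The Jeffreys–Lindley paradox from the first misunderstanding above can be reproduced numerically. This sketch rests on assumptions that are not in the article and are labeled here: a hypothetical experiment with 501,250 heads in 1,000,000 flips, equal prior probabilities (0.5 each) for "fair" and "not fair", a uniform prior on P(heads) under the alternative, and a normal approximation for the p-value. Under a uniform prior, the marginal probability of any particular number of heads k is exactly 1/(n + 1), which makes the Bayes factor easy to compute.

```python
import math


def log_binom_pmf(n, k, p):
    """log P(X = k) for X ~ Binomial(n, p), using lgamma so large n does not overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def lindley_example(n, k, prior_null=0.5):
    """Return (two-sided p-value, posterior P(H0 | data)) for k heads in n flips."""
    # Frequentist side: normal approximation to Binomial(n, 1/2).
    z = (k - n / 2) / math.sqrt(n / 4)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Bayesian side: H0 fixes P(heads) = 1/2; H1 puts a Uniform(0, 1) prior on P(heads),
    # under which the marginal probability of k heads is exactly 1 / (n + 1).
    log_bayes_factor_01 = log_binom_pmf(n, k, 0.5) + math.log(n + 1)
    bf01 = math.exp(log_bayes_factor_01)
    posterior_null = prior_null * bf01 / (prior_null * bf01 + (1 - prior_null))
    return p_value, posterior_null


p, post = lindley_example(n=1_000_000, k=501_250)
print(f"p-value ~ {p:.4f}; posterior P(H0 | data) ~ {post:.3f}")
```

Here the p-value is about 0.012, "significant at the 5% level", while the posterior probability of the null hypothesis is above 0.95. The observed rate of 50.125% heads also illustrates the last misunderstanding: with enough data, a tiny effect can produce a small p-value.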
Additional reading
- Dallal GE (2007) Historical background to the origins of p-values and the choice of 0.05 as the cutoff for significance
- Hubbard R, Armstrong JS (2005) Historical background on the widespread confusion of the p-value (PDF)
- Fisher's method for combining independent tests of significance using their p-values
- Dallal GE (2007) The Little Handbook of Statistical Practice (A good tutorial)
References
1. http://economics.about.com/od/termsbeginningwithp/g/pvaluedef.htm
2. Sterne JAC, Smith GD (2001). "Sifting the evidence — what's wrong with significance tests?". BMJ 322 (7280): 226–231. PMID 11159626. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=11159626.
3. Schervish MJ (1996). "P Values: What They Are and What They Are Not". The American Statistician 50 (3): 203–206. http://www.jstor.org/sici?sici=00031305(199608)50%3A3%3C203%3APVWTAA%3E2.0.CO%3B20.
External links
- Free online p-value calculators for various specific tests (chi-square, Fisher's F-test, etc.).
- Understanding P-values, including a Java applet that illustrates how the numerical values of p-values can give quite misleading impressions about the truth or falsity of the hypothesis under test.
