This 2016 statement, worth reading directly and in full, represents the formal recommendations of the American Statistical Association (ASA) regarding the use and interpretation of p-values. It is supported by a variety of additional commentary by ASA members.
The ASA defines the p-value as follows:
A p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
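As a rough illustration of this definition, the sketch below estimates a p-value with a permutation test. The "specified statistical model" is assumed here to be a null model in which group labels are interchangeable, and the "statistical summary" is the absolute difference in group means. The function name `permutation_p_value` and the sample data are hypothetical, chosen only for this example; the ASA statement does not prescribe any particular method.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Assumed model: under the null, group labels are exchangeable, so any
    reshuffling of the pooled data is as likely as the observed labeling.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # "Equal to or more extreme than its observed value"
        if shuffled_diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for a control and a treatment group.
control = [4.1, 5.0, 4.8, 5.3, 4.6, 5.1]
treatment = [5.9, 6.2, 5.4, 6.8, 6.0, 6.5]
p = permutation_p_value(control, treatment)
print(f"estimated p-value: {p:.4f}")
```

The result is the share of reshuffled datasets whose mean difference is at least as large as the observed one. It is a statement about the data under the assumed model, not about the probability that the model is true.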
The statement then provides six principles to guide p-value usage:
- P-values indicate how incompatible the data are with a specified statistical model. In the case of “null hypothesis” testing, the null hypothesis typically postulates that there is no difference between two tested groups (e.g., a control and a treatment group). The lower the p-value, the less compatible the data are with the null hypothesis, provided the assumptions used to calculate the p-value hold.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. The p-value merely tells us how compatible a particular explanation of the data (i.e., that there is no difference between groups) is with the data itself. It does not tell us how likely this explanation, or its converse, is to be true.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. The design of the study, the quality of the measurements, and other data surrounding the studied phenomenon should all inform how a p-value is interpreted. A single p-value cannot denote the truth or falsehood of a statement or hypothesis. The widespread use of “statistical significance” (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.
- Proper inference requires full reporting and transparency. Selectively reporting p-values, calculating p-values only for some hypotheses, or discussing only data where the calculated p-value was below a specific threshold renders the data distorted and uninterpretable. The p-value of a particular analysis can only be assessed with complete transparency regarding other analyses performed and their results.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. A very low p-value may reflect a small or unimportant effect measured very precisely or in a very large sample, while a large or important effect may generate a large p-value if the sample is small or measurement precision is low. The p-value tells us nothing about the size of an effect or its scientific, human, or economic significance.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. A p-value near 0.05, taken by itself, offers only weak evidence against the null hypothesis, while a relatively large p-value does not imply that the null is true. Other hypotheses may be equally or more consistent with the observed data. Any p-value should be the start, not the end, of an analytic approach.
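The fifth principle, that a p-value does not measure effect size, can be made concrete with a small calculation. The sketch below assumes a two-sided z-test for a difference in means between two equal-sized groups with a known common standard deviation. That simplification, the function name `two_sided_z_p_value`, and the input numbers are all illustrative assumptions, not taken from the ASA statement.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized
    groups, assuming a known common standard deviation (z-test)."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


# A tiny effect (0.02 standard deviations) measured in a very large sample.
tiny_effect_big_sample = two_sided_z_p_value(
    mean_diff=0.02, sd=1.0, n_per_group=100_000
)

# A large effect (0.8 standard deviations) measured in a very small sample.
large_effect_small_sample = two_sided_z_p_value(
    mean_diff=0.8, sd=1.0, n_per_group=5
)

print(f"tiny effect, n=100,000 per group: p = {tiny_effect_big_sample:.2g}")
print(f"large effect, n=5 per group:      p = {large_effect_small_sample:.2g}")
```

Under these assumptions the negligible effect falls far below the conventional 0.05 threshold while the large effect does not, so the p-value should be reported alongside an effect-size estimate rather than in place of one.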
After noting the increasing prominence of a variety of statistical approaches (including confidence intervals, Bayesian methods, and more complicated forms of modeling), the group concludes:
Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.