P-values

Be cautious of p-values.

Estimation of the size of an effect and the margin of error (confidence interval) for that estimate is much more informative and less likely to mislead people than a p-value.

Explanation

In a comparison or a systematic review of comparisons, the difference between the outcomes of interest in the comparison groups is the best estimate of how effective or safe a health action is. However, because of the play of chance, the true difference may be larger or smaller than this.

P-values are measures of the play of chance that are often reported with the results of studies. A p-value is the probability of observing a result at least as extreme as the one found simply by chance – that is, if there were really no difference between the comparison groups. The smaller the p-value, the harder it is to explain the observed difference by the play of chance alone. Typically, a p-value of less than 0.05 (p < 0.05) – the conventional “significance level” – is called “statistically significant”.
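
To make this concrete, here is a minimal sketch (in Python, with hypothetical counts: 15 of 100 people with the outcome of interest in one group, 25 of 100 in the other) of one simple way to obtain such a p-value – a permutation test, which simulates “no real difference” by repeatedly shuffling the group labels.

```python
import random

# Hypothetical data: 1 = outcome of interest occurred, 0 = it did not.
treated = [1] * 15 + [0] * 85   # 15/100 outcomes with the treatment
control = [1] * 25 + [0] * 75   # 25/100 outcomes without it

observed_diff = sum(treated) / 100 - sum(control) / 100

# If there were really no difference, the group labels would be
# interchangeable: shuffle them many times and count how often chance
# alone produces a difference at least as large as the one observed.
pooled = treated + control
random.seed(1)
extreme = 0
shuffles = 10_000
for _ in range(shuffles):
    random.shuffle(pooled)
    diff = sum(pooled[:100]) / 100 - sum(pooled[100:]) / 100
    if abs(diff) >= abs(observed_diff):
        extreme += 1

print(f"observed difference: {observed_diff:.2f}, p = {extreme / shuffles:.3f}")
```

With these counts the estimated p-value lands around 0.1 – above the conventional 0.05 cut-off – even though the observed 10-percentage-point difference looks substantial.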

P-values can be misleading because people may misinterpret them in several ways:

  • Statistical significance may be confused with importance. People often assume that a low p-value indicates an important effect, but a low p-value can accompany an effect that is too small to matter (see the sketch after this list).
  • People also may wrongly assume that a low p-value indicates the likelihood that the observed treatment effect is the true effect.
  • The cut-off for considering a result statistically significant (usually 0.05) is arbitrary.
  • P-values do not indicate anything about the risk of bias in studies.
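
The sketch below illustrates the first point with hypothetical numbers: the same trivially small difference (50.2% vs 50.0% of participants with the outcome) is “statistically significant” in a very large study but not in a smaller one, because statistical significance depends on sample size as well as effect size. It uses a standard two-proportion z-test (a normal approximation) for illustration.

```python
from math import sqrt, erf

def two_sided_p(x1, n1, x2, n2):
    """Two-proportion z-test (normal approximation), two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * P(Z > |z|)

# The same hypothetical 0.2-percentage-point difference (50.2% vs 50.0%)
# at two very different study sizes:
for n in (10_000, 1_000_000):  # participants per group
    p = two_sided_p(round(0.502 * n), n, round(0.500 * n), n)
    print(f"n = {n:>9,} per group: p = {p:.4f}")
```

Only the huge study crosses the p < 0.05 threshold, yet the effect – two fewer events per 1,000 people – is identical in both and may be far too small to matter.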

When researchers report the results of comparisons, they often also report a confidence interval (margin of error) for an effect estimate. The confidence interval is the range within which the true difference is likely to lie, after taking the play of chance into account. In research, a 95% confidence interval (95% CI) is often given. A 95% CI means that we can be 95% confident that the actual size of the effect is between the lower and upper limit specified by the confidence interval. This means there is a 5% chance that the true effect is outside of this range.
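
As a sketch, reusing the hypothetical counts from the earlier p-value example (15 of 100 outcomes in one group, 25 of 100 in the other), a 95% confidence interval for the difference in risks can be computed with the usual normal approximation: difference ± 1.96 × standard error.

```python
from math import sqrt

def risk_diff_ci(x1, n1, x2, n2, z=1.96):
    """Risk difference with a 95% CI (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Hypothetical counts: 15/100 outcomes with the treatment, 25/100 without.
diff, low, high = risk_diff_ci(15, 100, 25, 100)
print(f"risk difference: {diff:.2f}, 95% CI: {low:.2f} to {high:.2f}")
```

The best estimate is 10 percentage points fewer outcomes, but the interval stretches from a 21-point reduction to a 1-point increase, so the data are also compatible with no difference at all – information a p-value alone would not convey.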

The confidence interval tells us how precise an effect estimate is. A narrower confidence interval, with a small range between the lower and upper limit, means we can be more confident that the effect estimate is close to the true effect, while a wider confidence interval means we are less confident in the estimate. A confidence interval is more informative than a p-value because it helps focus attention on the size of an effect. Unfortunately, p-values are often reported instead of confidence intervals.
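
Continuing the hypothetical example, the sketch below shows how the interval narrows as the number of participants grows: the same 10-percentage-point difference is estimated far more precisely with 1,000 people per group than with 100.

```python
from math import sqrt

# Same hypothetical risks (15% vs 25%) at two study sizes: the interval
# narrows as the number of participants per group grows.
p1, p2 = 0.15, 0.25
for n in (100, 1_000):
    se = sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    low, high = (p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se
    print(f"n = {n:>5}: difference = {p1 - p2:.2f}, "
          f"95% CI: {low:.2f} to {high:.2f}")
```

With 100 people per group the interval crosses zero; with 1,000 it does not, so the larger study pins down both the size and the direction of the effect.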

Example

Researchers reviewed 51 articles published in four high-profile journals that reported “statistically significant tiny effects”. Even minimal bias in those studies could explain the observed “effects”. Yet more than half of the articles (28) did not express any concern about the size or uncertainty of the effect estimates. Despite the low p-values reported in these articles, the results often excluded effects large enough to be important. Interpreting small effects on the basis of p-values alone is likely to be misleading.

Remember: Whenever possible, consider confidence intervals when assessing the reliability of estimates of the effects of health actions. Do not be misled by p-values.
