If you’re involved in research, you’ve likely come across the notation ‘p < .05’ in journal articles or elsewhere, or you’re using this notation in your own writing to indicate a statistically significant result. It is used pretty much everywhere, almost as a mantra across research studies.
But do you know what p < .05 actually means? And are you interpreting it correctly?
Here are a few common interpretations of p < .05 (Spence & Stanley, 2018).
- “There is a low probability (less than 5%) that the result was due to chance.”
- “There is less than a 5% chance that the null hypothesis is true.”
- “There is a 95% chance of finding the same result in a replication.”
- “The odds that a result happened due to chance are small, specifically less than 5%.”
The trouble with these interpretations is that they’re all wrong.
What does the expression p < .05 really mean?
As most students hate reading about statistics, I’m taking the liberty of presenting the bottom line first.
The bottom line is that a statistically significant result means there probably is an effect: there probably is a difference between the means of your groups, or there probably is a relationship between your variables.
Now let’s look at some of the explanatory notes that may appear beneath this bottom line. Stay with me, please…
The notes explain that in the case of a significant result (p < .05), you would have no idea how big that probable effect is: how big the difference between the group means is, or how strong the relationship between your variables is.
All you can safely say from a significant result is that there is likely to be some nonzero effect, some nonzero difference, or some nonzero relationship.
So where to from here?
To judge how strong this probable nonzero effect is, or how big the mean difference or relationship is, the next step is to calculate the effect size. The effect size tells you the meaningfulness or practical significance of your statistically significant result.
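To make this concrete, here is a minimal sketch in Python of one common effect size, Cohen’s d, for the difference between two group means. The group scores and names are made up purely for illustration.

```python
import numpy as np

# Made-up scores for two groups (purely illustrative)
treatment = np.array([5.1, 6.3, 5.8, 7.0, 6.1, 5.5])
control = np.array([4.2, 5.0, 4.8, 5.3, 4.6, 5.1])

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled SD."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

By the usual rules of thumb, a d of about 0.2 counts as small, 0.5 as medium, and 0.8 as large, although what counts as practically meaningful always depends on your field.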
The concept of effect size is very important (and the topic of one of my future blogs).
Want to know more of the stats behind p and significance?
Here we go…
The main method of testing your hypothesis or expectation is called null hypothesis significance testing. In this method, we test the assumption, called the null hypothesis, that there is no treatment effect in your study, i.e., no difference between your groups or no relationship between your variables. Opposing the null hypothesis is your alternate hypothesis that there is an effect. So, the null hypothesis is H₀: there is no treatment effect. The alternate hypothesis is Hₐ: there is an effect. (These hypotheses are usually stated in terms of population parameters written with Greek symbols, e.g., H₀: μ₁ = μ₂ versus Hₐ: μ₁ ≠ μ₂ when comparing two group means.)
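If you want to see what this looks like in practice, here is a minimal Python sketch of a null hypothesis significance test comparing two group means with an independent-samples t-test; the scores are made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up scores for an illustrative two-group study
treatment = np.array([5.1, 6.3, 5.8, 7.0, 6.1, 5.5])
control = np.array([4.2, 5.0, 4.8, 5.3, 4.6, 5.1])

# H0: the population means are equal; Ha: they differ
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < .05:
    print("Significant: there probably is some nonzero difference.")
else:
    print("Not significant: keep assuming H0.")
```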
Now, think about your own research. If you are using quantitative methodology, you may have selected a sample, ideally a random sample, from a population, and calculated its mean, a mean difference, or relationship between variables.
Now consider the following theoretical scenario under the assumption that H₀ is true.
Hypothetically, you could continue drawing many, many random samples from your population of interest and so end up with a whole population of samples under the assumption that H₀ is true. You could then work out the proportion of samples drawn from this hypothetical distribution of samples under H₀ that would give a result like your observed sample result, or a more extreme one. If many of these hypothetical samples give a result like the one you found in your study sample, or a more extreme one, then you continue assuming that H₀ is true.
But say you find that very few, i.e., fewer than 5%, of the many hypothetical samples drawn under the null hypothesis give the same result as you found in your study sample or a more extreme one. That is, very few samples under H₀ are consistent with the result you found in your sample. Then your assumption that the null hypothesis is true becomes untenable. Because the proportion of hypothetical samples drawn under H₀ that are consistent with your finding is so small (p < .05), you reject H₀.
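A small simulation makes this “many, many samples under H₀” idea concrete. The sketch below assumes, purely for illustration, that under H₀ both groups’ scores come from the same normal population with a known SD; it draws many hypothetical pairs of samples and counts the proportion whose mean difference is at least as extreme as an illustrative observed difference. That proportion is, in essence, the p value.

```python
import numpy as np

rng = np.random.default_rng(42)

observed_diff = 1.0    # illustrative observed difference between group means
n_per_group = 20
n_samples = 100_000    # many, many hypothetical samples

# Under H0, both groups are drawn from the same population
# (assumed here to be normal with mean 0 and SD 2, for illustration).
group_a = rng.normal(0, 2, size=(n_samples, n_per_group)).mean(axis=1)
group_b = rng.normal(0, 2, size=(n_samples, n_per_group)).mean(axis=1)
diffs = group_a - group_b

# Proportion of hypothetical samples under H0 that give a result
# at least as extreme as the observed one
p = np.mean(np.abs(diffs) >= abs(observed_diff))
print(f"p = {p:.4f}")
if p < .05:
    print("Very few samples under H0 are this extreme: reject H0.")
else:
    print("Many samples under H0 are this extreme: keep assuming H0.")
```

In real analyses you don’t run this simulation yourself; the known sampling distribution of your test statistic (t, F, chi-square, and so on) does the job analytically. But the logic is exactly the proportion computed above.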
So, p is the index you compare with .05 to decide whether you have a significant result. The p value is simply the proportion of samples that would be drawn from a hypothetical distribution of samples under H₀, i.e., if there were no effect, that yield results like yours or more extreme ones.
To repeat: as H₀ says that there is no effect, the essence of rejecting H₀ is that you reject the claim that there is no effect. You reject that there is no difference between your groups. You reject that there is no relationship between your variables. So, based on the small proportion (less than 5%) of hypothetical samples under H₀ that are consistent with your finding, you conclude that there probably is an effect, there probably is a difference between your groups, or there probably is a relationship between your variables.
Then you move on to calculating and interpreting the effect size.
Hope you’ll join me further on our statistical journey. The scenery is most pleasant, really 😊