Normal Distribution

by

Adam J. McKee, Ph.D.

January, 2004

                Researchers have found that data often behave in a particular way when summarized in a frequency polygon. Most scores cluster around the center of the distribution, and the farther you move from the mean, the fewer scores you find. That is, most people have scores near the mean, and fewer and fewer scores appear toward either extreme. For example, on a typical test most people earn a grade of C. Fewer earn D and B grades, and fewer still earn F and A grades. This is where the academic idea of "curving grades" comes from. When many people do badly and the distribution is positively skewed, the professor adjusts all of the grades so that the majority of people fall in the middle range (a C score). The resulting frequency polygon ends up looking like a bell, which is why it is called a bell-shaped curve. Some very clever fellows were able to fit mathematical equations to this sort of curve and reproduce it. From this mathematical model, we get the notion of the normal distribution. If our research data fit this bell-shaped curve, we can say that they are normally distributed.
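
                For reference, the equation usually given for this bell-shaped curve is the normal probability density function. The symbols are not used in the text above and are introduced here only for illustration: x is a score, μ (mu) is the mean of the distribution, and σ (sigma) is its standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

                The curve is highest at x = μ and falls off symmetrically on either side, which matches the description of scores clustering near the mean and thinning out toward the extremes.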

                A valuable property of the normal distribution is that we know what proportion (or percent) of scores falls in any given area under the curve. We also know where a score falls under the normal curve in standard deviation units. The following figure depicts these areas under the curve:

 

                In theory at least, the curve extends forever in both directions. This is why the curve is drawn with the "tails" never touching the horizontal axis. We can, however, say that about 68% of scores will fall within 1 standard deviation above and below the mean, about 95% will fall within 2 standard deviations above and below the mean, and about 99.7% will fall within 3 standard deviations above and below the mean. The proportions associated with these areas under the curve are also the probabilities of a score falling within that particular area. Thus, we can see that there is a higher probability of a randomly selected score falling within 1 SD of the mean than outside that range. Put another way, there is a 68% chance that a randomly selected score will fall within 1 standard deviation above or below the mean. Since the curve is symmetrical, we can divide this probability by 2 and state a particular direction. Direction simply means which side of the curve we are talking about: right (positive) or left (negative). Therefore, we can say that there is a 34% chance that a randomly selected score will fall within one standard deviation above the mean. Most parametric statistical tests rely on these properties of the normal curve for tests of significance. That is why most of these tests require, as an assumption, that the data be normally distributed.
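
                The percentages quoted above can be checked with a few lines of code. What follows is a minimal sketch in Python, assuming a perfectly normal distribution. The function names proportion_within and z_score, and the example score, mean, and standard deviation, are hypothetical and chosen only for illustration.

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean.

    Uses the standard identity P(|Z| <= k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


def z_score(score, mean, sd):
    """Express a raw score in standard deviation units (hypothetical helper)."""
    return (score - mean) / sd


for k in (1, 2, 3):
    both_sides = proportion_within(k)  # area on both sides of the mean
    one_side = both_sides / 2          # symmetry: half the area lies on each side
    print(f"within {k} SD: {both_sides:.4f}   one side only: {one_side:.4f}")

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))
```

                The printed proportions round to roughly 68%, 95%, and 99.7%, and halving the first gives the 34% figure for one standard deviation above the mean. The last line expresses a raw score as 1.5 standard deviations above the mean.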

