Frequency Distributions
by Adam J. McKee, Ph.D.
January, 2004

A frequency distribution is simply a list of the values of a variable and the number of times each value appears in the data. Suppose we are interested in the number of hours of community service a local youth court has been assigning to young vandals. The data is as follows: 20, 20, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30, 30, 35, 35, 35, 35, 40, 40, 45, 45, 45. We now compile this data into a table like the following one:

Hours    ƒ
20       2
25       4
30       7
35       4
40       2
45       3
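The same tally can be produced programmatically. The following is a minimal Python sketch, assuming the hours are held in a plain list; the names `hours` and `frequency_distribution` are illustrative and do not come from the original article.

```python
from collections import Counter

# Community-service hours listed in the example (22 cases).
hours = [20, 20, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30, 30,
         35, 35, 35, 35, 40, 40, 45, 45, 45]


def frequency_distribution(values):
    """Return (value, frequency) pairs, sorted by value."""
    counts = Counter(values)  # maps each distinct value to its count
    return sorted(counts.items())


if __name__ == "__main__":
    print(f"{'Hours':>5}  {'f':>2}")
    for value, freq in frequency_distribution(hours):
        print(f"{value:>5}  {freq:>2}")
```

The frequencies always sum to the number of cases, which is a quick check that no score was dropped during the tally.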
This is a very simple concept, but the characteristics of this distribution are the basis for most statistical techniques. Large numbers of discrete values or continuous data are often classified into ranges in tables like the one above, where the frequency of each range is listed.

Class Intervals

Data can be grouped together so that several score values can be placed in each group. This makes the distribution more compact, and patterns in the data often become easier to detect. Such a group of scores is referred to as a class interval. The steps to constructing a frequency distribution table using class intervals are as follows:

1. Choose the number of class intervals to be used, usually between 10 and 20.
2. Calculate the range of the scores by subtracting the lowest score from the highest score.
3. Divide the range by the number of class intervals to determine the size of the class intervals.
4. Find the score limits by identifying the lowest and highest score value for each interval.
5. Calculate the real limits by identifying the point halfway between the upper score limit of a particular interval and the lower score limit of the next highest interval:

   real lower limit = lower score limit - 0.5 × unit
   real upper limit = upper score limit + 0.5 × unit

   Unit refers to the decimal place that you have decided to round to, usually the ones place. If the ones place is used, the unit is 1. If the tenths place is used, the unit is 0.1. If the hundredths place is used, the unit is 0.01.
6. Calculate the midpoint of each class interval by dividing the interval size in half and adding the result to the lower real limit of that interval.
7. Calculate the cumulative frequencies by successive addition, beginning with the observed frequency of the lowest interval.

Your table should look something like this:

Table 2 Cumulative Frequency Table for Grouped Data
Class Interval    ƒ    Upper Real Limit    cƒ
123-125           3    125.5               30
120-122           3    122.5               27
117-119           2    119.5               24
.                 .    .                   .
.                 .    .                   .
99-101            1    101.5                6
96-98             3    98.5                 5
93-95             2    95.5                 2

Note: Table and methodology adapted from Shavelson, 1996, p. 65.

Remember that frequency polygons can be used to describe this distribution of data.

Be careful not to confuse a sampling distribution with a frequency distribution. These are two very different things. Sampling distributions will be discussed further under hypothesis testing.

Two extremely important characteristics of a frequency distribution are its central tendency and its variability. Most of us have an intuitive understanding of central tendency. We generally do not have the same intuitive grasp of variability (dispersion). Variability is the foundation for all other statistical tests. In the final analysis, all studies are correlational in nature: we want to understand relationships between variables. Correlations are a measure of how two or more variables vary together, or co-vary. Variance is the foundation of this understanding.

References

Shavelson, R. J. (1996). Statistical Reasoning for the Behavioral Sciences (3rd ed.). Boston: Allyn and Bacon.
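Worked example: the class-interval steps listed earlier can be sketched in Python. This is an illustration under stated assumptions, not the procedure as given in Shavelson: it reuses the community-service hours from the first example, uses only three intervals (far fewer than the usual 10 to 20) to keep the output short, rounds the interval size up to a whole number, and assumes whole-number scores (unit = 1). The function names are hypothetical.

```python
import math

# Community-service hours from the first example in this article.
hours = [20, 20, 25, 25, 25, 25, 30, 30, 30, 30, 30, 30, 30,
         35, 35, 35, 35, 40, 40, 45, 45, 45]


def suggest_interval_size(scores, n_intervals):
    """Steps 2-3: range divided by the number of intervals.

    Rounding up (and never returning less than 1) is an assumption
    made here so that the intervals have whole-number score limits.
    """
    score_range = max(scores) - min(scores)
    return max(1, math.ceil(score_range / n_intervals))


def grouped_frequency_table(scores, size, unit=1):
    """Steps 4-7: build the table rows, lowest interval first.

    Each row is (lower, upper, f, real_lower, real_upper, midpoint, cf).
    Assumes whole-number scores when unit is 1.
    """
    rows = []
    cumulative = 0
    lower = min(scores)
    while lower <= max(scores):
        upper = lower + size - unit              # step 4: score limits
        f = sum(1 for s in scores if lower <= s <= upper)
        real_lower = lower - 0.5 * unit          # step 5: real limits
        real_upper = upper + 0.5 * unit
        midpoint = real_lower + size / 2         # step 6: midpoint
        cumulative += f                          # step 7: cumulative frequency
        rows.append((lower, upper, f, real_lower, real_upper,
                     midpoint, cumulative))
        lower += size
    return rows


if __name__ == "__main__":
    size = suggest_interval_size(hours, n_intervals=3)
    table = grouped_frequency_table(hours, size)
    print("Interval    f  Upper Real Limit  cf")
    # Print the highest interval first, as in Table 2.
    for lower, upper, f, _, real_upper, _, cf in reversed(table):
        print(f"{lower:>3}-{upper:<3}  {f:>3}  {real_upper:>16}  {cf:>2}")
```

The cumulative frequency of the highest interval should equal the total number of scores, which gives a simple check on the table.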