Analysis of Variance

by Adam J. McKee
Using
F. J. Gravetter and L. B. Wallnau's Essentials of Statistics for the Behavioral Sciences (4th Ed.).

 

What is Analysis of Variance?

  • In the previous sections, we have learned ways to test null hypotheses about differences between two means.
  • A closely related test is analysis of variance (ANOVA), which is sometimes called the F test.
  • The ANOVA is used to test the difference between two or more means.

What does the ANOVA involve?

  • First, we calculate a statistic called F, which is very similar to the z and t statistics we’ve done before.
  • As before, we calculate degrees of freedom and use them to find a critical value of F.
  • Then the calculated F statistic is compared to Fcritical.
  • When only two means are compared, the ANOVA gives the same probability as a t test on the same data; the value of F, however, will generally differ from the value of t.
  • This is because if we compare two means using both an F test and a t test, we find that F = t².
  • This simple relationship no longer applies once we start testing the differences among more than two means.
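The F = t² relationship for two groups can be checked numerically. The following is a minimal sketch that assumes two small sets of made-up scores (they do not come from the text) and computes both statistics by hand with the pooled-variance t formula.

```python
from statistics import mean

# Hypothetical scores for two independent groups (not from the text).
group_1 = [4, 5, 6, 6, 7, 8]
group_2 = [1, 2, 2, 2, 3, 4]

def ss(scores):
    """Sum of squared deviations of each score from its group mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)

n1, n2 = len(group_1), len(group_2)
m1, m2 = mean(group_1), mean(group_2)

# Independent-samples t with pooled variance.
pooled_var = (ss(group_1) + ss(group_2)) / (n1 + n2 - 2)
t = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

# One-way ANOVA F for the same two groups.
grand_mean = mean(group_1 + group_2)
ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
ms_between = ss_between / (2 - 1)  # df between = number of groups - 1
ms_within = pooled_var             # with two groups, MS within is the pooled variance
F = ms_between / ms_within

print(f"t = {t:.3f}, t squared = {t ** 2:.3f}, F = {F:.3f}")
```

With two groups the two printed values, t squared and F, should agree apart from rounding.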

Example

  • A new drug for treating headaches was tested on three groups selected at random.
  • The first group received 250 mg, the second received 100 mg, and the third received a placebo (control group).
  • The means of the three groups were:
  • X̄₁ = 1.78, X̄₂ = 3.98, X̄₃ = 12.88
  • If we used t tests, three separate tests would be needed: group 1 v. group 2, group 1 v. group 3, and group 2 v. group 3.
  • The ANOVA allows us to test all the differences as a set.
  • For an ANOVA, the null hypothesis is about the set of means.
  • By rejecting the null hypothesis, we are rejecting the notion that all of the differences among the means were created at random (sampling error); at least one difference is larger than chance alone would explain.
  • Notice that the test does not tell us which of the three differences are responsible for the rejection of the null hypothesis.
  • It could be that only one of the three differences was responsible for the significance.
  • The most common way to report the results of an ANOVA is with an ANOVA Table.
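The three pairwise comparisons in the headache example can be listed mechanically. This short sketch uses the three group means reported above; the dictionary labels are just illustrative names for the groups.

```python
from itertools import combinations

# Group means reported in the headache-drug example.
means = {"250 mg": 1.78, "100 mg": 3.98, "placebo": 12.88}

# Every distinct pair of groups is one comparison a t test would have to make.
pairs = list(combinations(means, 2))
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(f"{a} v. {b}: difference = {diff:.2f}")

print(f"{len(pairs)} pairwise comparisons for {len(means)} groups")
```

A significant F says only that this set of differences is unlikely under the null hypothesis; it does not single out any one of the pairs.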

The ANOVA Table

Computing SS With the Computational Formula

  • Previously we learned how to compute the sum of squares (SS) for our data using a heuristic (definitional) formula.
  • Heuristic means that the formula helps us understand what is going on.
  • The problem is that the heuristic formula requires more work. We can compute SS more easily with the computational formula, although that formula doesn’t reinforce the idea that we are looking at the spread of each score around the mean.
  • For ANOVA problems, we will use the computational formula because it is less work.

Step 1: Setting Up Data for an ANOVA

Step 2: Total Sum of Squares

Compute the total sum of squares (SST):

SST = ΣX² – (ΣXtotal)² / N

  • ΣX² is the sum of the squared scores: square every score, sum the squares within each group, then add the three sums in your table together
  • ΣXtotal is the sum of all scores (add the group sums from the table)
  • N is the total number of subjects in all groups
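Steps 1 and 2 can be sketched in a few lines. The scores below are hypothetical (they are not the author's data); they were chosen only so that there are three groups of six subjects, and the names `groups` and `table` are illustrative.

```python
# Hypothetical scores (not the author's data): three groups of six subjects.
groups = {
    "A": [4, 5, 6, 6, 7, 8],
    "B": [4, 5, 5, 6, 6, 7],
    "C": [1, 2, 2, 2, 3, 4],
}

# Step 1: one row per group holding n, the sum of X, and the sum of X squared.
table = {
    name: {"n": len(x), "sum_x": sum(x), "sum_x2": sum(v ** 2 for v in x)}
    for name, x in groups.items()
}
for name, row in table.items():
    print(name, row)

# Step 2: add the group sums together, then apply the computational formula.
N = sum(row["n"] for row in table.values())
sum_x_total = sum(row["sum_x"] for row in table.values())
sum_x2_total = sum(row["sum_x2"] for row in table.values())
ss_total = sum_x2_total - sum_x_total ** 2 / N

print(f"SST = {ss_total:.4f}")
```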

Step 3: Between Groups Sum of Squares

This is a measure of the variation between groups; it is needed in order to compute the value of F.

SSb = Σ[(ΣXgroup)² / n] – (ΣXtotal)² / N

  • This means that, for each group, you obtain the sum of X, square it, and divide by the number of scores in that group (n); then you add these values across the groups.
  • Then you take the sum of X for all of the scores, square it, divide that by the total number of scores (N), and subtract the result.
  • For example, if you have 3 groups, your equation will look like this:

SSb = (ΣX1)² / n1 + (ΣX2)² / n2 + (ΣX3)² / n3 – (ΣXtotal)² / N
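As a rough illustration of the between-groups computation, the sketch below applies the formula to hypothetical scores (not the author's data) for three groups of six subjects.

```python
# Hypothetical scores (not the author's data): three groups of six subjects.
groups = {
    "A": [4, 5, 6, 6, 7, 8],
    "B": [4, 5, 5, 6, 6, 7],
    "C": [1, 2, 2, 2, 3, 4],
}

N = sum(len(x) for x in groups.values())
sum_x_total = sum(sum(x) for x in groups.values())

# One term per group: (sum of X) squared, divided by that group's n.
group_terms = [sum(x) ** 2 / len(x) for x in groups.values()]

# The term shared by all groups: (sum of all X) squared, divided by N.
correction = sum_x_total ** 2 / N

ss_between = sum(group_terms) - correction
print(f"SS between = {ss_between:.4f}")
```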

Step 4: Within Groups Sum of Squares

This is an estimate of the amount of variation within groups. We obtain it by subtraction:

SSw = SST – SSb

This works because the total sum of squares is composed of the sum of squares between groups and the sum of squares within groups. Since we know SS total and SS between groups, all the rest of it is SS within.
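The subtraction shortcut can be checked against a direct computation of the within-groups sum of squares. This sketch again assumes hypothetical scores (not the author's data); the direct version simply applies the computational SS formula inside each group and adds the results.

```python
# Hypothetical scores (not the author's data): three groups of six subjects.
groups = {
    "A": [4, 5, 6, 6, 7, 8],
    "B": [4, 5, 5, 6, 6, 7],
    "C": [1, 2, 2, 2, 3, 4],
}

N = sum(len(x) for x in groups.values())
sum_x_total = sum(sum(x) for x in groups.values())
sum_x2_total = sum(v ** 2 for x in groups.values() for v in x)

correction = sum_x_total ** 2 / N
ss_total = sum_x2_total - correction
ss_between = sum(sum(x) ** 2 / len(x) for x in groups.values()) - correction

# SS within by subtraction, as described in the text.
ss_within = ss_total - ss_between

# SS within computed directly: the SS of each group, added together.
ss_within_direct = sum(
    sum(v ** 2 for v in x) - sum(x) ** 2 / len(x) for x in groups.values()
)

print(f"by subtraction: {ss_within:.4f}, directly: {ss_within_direct:.4f}")
```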

Step 5: Compute df for between groups

This is simply the number of groups minus 1.

For example, if I have three groups, then:

dfb = 3 – 1 = 2

Step 6: Compute df for the Total

The total degrees of freedom is equal to the total number of subjects minus 1.

For example, if my experiment involves three groups of six subjects for a total of 18 subjects, then my df total will be

18 – 1 = 17

Step 7: Compute df Within Groups

Like we did with SS, we get df by subtraction:

dfw = dft – dfb = 17 – 2 = 15

This is a good time to go back and replace the question marks in your ANOVA table with the degrees of freedom values.
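The three degrees-of-freedom values from Steps 5 through 7 follow from just two numbers, as this small sketch shows for the three groups of six subjects used in the example.

```python
k = 3            # number of groups
n_per_group = 6  # subjects in each group
N = k * n_per_group

df_between = k - 1                  # Step 5
df_total = N - 1                    # Step 6
df_within = df_total - df_between   # Step 7, by subtraction

print(df_between, df_within, df_total)
```

The within-groups value obtained by subtraction is the same as N minus the number of groups.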

Step 8: Compute the mean squares

Mean squares are computed by dividing each sum of squares by its degrees of freedom.

MSb = SSb / dfb

MSw = SSw / dfw

Go ahead and enter these values into your ANOVA Table.

Step 9: Compute F

It took a lot of work to get here, but the computation of F is simple once we have all of the other information:

F = MSb / MSw
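Putting Steps 2 through 9 together, a complete one-way ANOVA might be sketched as follows. The scores are hypothetical (not the author's data), and the printed layout is only one common way to arrange an ANOVA table.

```python
# Hypothetical scores (not the author's data): three groups of six subjects.
groups = {
    "A": [4, 5, 6, 6, 7, 8],
    "B": [4, 5, 5, 6, 6, 7],
    "C": [1, 2, 2, 2, 3, 4],
}

k = len(groups)
N = sum(len(x) for x in groups.values())
sum_x_total = sum(sum(x) for x in groups.values())
sum_x2_total = sum(v ** 2 for x in groups.values() for v in x)

# Steps 2-4: sums of squares from the computational formulas.
correction = sum_x_total ** 2 / N
ss_total = sum_x2_total - correction
ss_between = sum(sum(x) ** 2 / len(x) for x in groups.values()) - correction
ss_within = ss_total - ss_between

# Steps 5-7: degrees of freedom.
df_between, df_total = k - 1, N - 1
df_within = df_total - df_between

# Steps 8-9: mean squares and F.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

# Print the results in the layout of a typical ANOVA table.
print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ss_between:>10.3f}{df_between:>5}{ms_between:>10.3f}{F:>8.2f}")
print(f"{'Within':<10}{ss_within:>10.3f}{df_within:>5}{ms_within:>10.3f}")
print(f"{'Total':<10}{ss_total:>10.3f}{df_total:>5}")
```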

Establishing p

  • At this point, all we are missing is the value of p.
  • To obtain this, we must compare our observed (computed) value of F with the appropriate critical value in our F table.
  • Note that we have to use two degrees of freedom values to find the critical value: df between (for the numerator of F) and df within (for the denominator).

The Decision Rule

If the calculated value of F is greater than the critical value (table value), then we reject the null hypothesis; otherwise, we do not reject (fail to reject).

If we reject, then we state that p is less than the value of our alpha level (if we use a computer program such as SPSS, we can report the exact value of p).
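The decision rule is a single comparison. In the sketch below, the function name `anova_decision` and the observed F values are hypothetical; the critical value 3.68 is the tabled F for 2 and 15 degrees of freedom at an alpha level of .05.

```python
def anova_decision(f_observed, f_critical, alpha=0.05):
    """Apply the decision rule: reject H0 only if observed F exceeds critical F."""
    if f_observed > f_critical:
        return f"reject H0 (p < {alpha})"
    return f"fail to reject H0 (not significant at alpha = {alpha})"

# Critical value from an F table for df between = 2, df within = 15, alpha = .05.
F_CRITICAL = 3.68

# Hypothetical observed values of F, used only to show both outcomes.
print(anova_decision(17.08, F_CRITICAL))
print(anova_decision(2.10, F_CRITICAL))
```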

Example Table by Steps

What Does It Mean?

  • Finding that F is significant tells us that at least one group’s mean is significantly different from at least one of the others.
  • However, we do not know which one.
  • To find out, we’ll have to use a new procedure.

Post Hoc Tests

Because these tests are conducted after the ANOVA, they are called post hoc tests. They are designed for use when individual comparisons were not planned in advance based on theory, and they are also known as multiple comparison tests.

Although a t test can be used to test the difference between a single pair of means, using it for repeated comparisons on the same data makes the probabilities obtained with t inaccurate: the chance of at least one Type I error grows with each additional test.
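To get a feel for how quickly repeated testing inflates the error rate, the sketch below uses the familiar 1 – (1 – alpha)^c approximation. It assumes the c comparisons are independent, which pairwise comparisons on the same data are not, so the result is only a rough guide.

```python
def familywise_error(alpha, comparisons):
    """Approximate chance of at least one Type I error across several tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** comparisons

alpha = 0.05
for c in (1, 3, 6, 10):
    print(f"{c:>2} comparisons: about {familywise_error(alpha, c):.3f}")
```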

Tukey’s HSD Test

When a one-way ANOVA is statistically significant, one cannot be sure which specific differences are significant.

In a test with three groups, there are three possible significant differences between group means:

  • Group A v. Group B (6.000 – 5.500 = 0.500)
  • Group A v. Group C (6.000 – 2.333 = 3.667)
  • Group B v. Group C (5.500 – 2.333 = 3.167)

(Note that the numbers in parentheses are sample means supplied for the purpose of the following examples).

Calculating HSD

HSD = q × √(MSwithin / n)

q = the studentized range statistic (which we obtain from a table)

MSwithin = the mean square within groups from our ANOVA

n = the number of subjects in each group; Tukey’s HSD can only be used if we have the same number of subjects in each group

Using the q Table

Determine the degrees of freedom associated with the within groups mean square from your ANOVA.

Also determine the number of groups.

Read the value of q from the table at the point where the number of groups meets the degrees of freedom.
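Once q has been read from the table, HSD is a one-line computation. In this sketch the value of MSwithin is hypothetical, and q = 3.67 is the tabled studentized range value for 3 groups and 15 degrees of freedom at an alpha level of .05.

```python
from math import sqrt

def tukey_hsd(q, ms_within, n):
    """Honestly significant difference for groups of equal size n."""
    return q * sqrt(ms_within / n)

q = 3.67         # from a q table: 3 groups, df within = 15, alpha = .05
ms_within = 2.0  # hypothetical mean square within groups
n = 6            # subjects per group

hsd = tukey_hsd(q, ms_within, n)
print(f"HSD = {hsd:.3f}")
```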

Performing the Comparison

For each pair of means, compare the mean difference to HSD.

Here is the decision rule: If the observed difference between a pair of means is greater than HSD, reject the null hypothesis; otherwise, do not reject it.

Example: Suppose that we compute HSD to be 2.127 and that for comparison A v. B the observed difference is 0.500. Since the observed difference is not greater than HSD, we fail to reject the null.
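The same rule can be applied to all three comparisons at once. This sketch uses the sample means given earlier (6.000, 5.500, and 2.333) and the HSD of 2.127 from the example; the variable names are illustrative.

```python
from itertools import combinations

# Sample means and HSD taken from the example in the text.
means = {"A": 6.000, "B": 5.500, "C": 2.333}
hsd = 2.127

results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = diff > hsd  # True means reject the null for this pair
    decision = "reject" if results[(a, b)] else "fail to reject"
    print(f"Group {a} v. Group {b}: difference = {diff:.3f}, {decision} the null")
```

Under these values, only the comparisons involving Group C exceed HSD.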

