Statistical Methods: Hypothesis Testing
by Adam J. McKee
Using F. J. Gravetter and L. B. Wallnau's Essentials of Statistics for the Behavioral Sciences (4th Ed.).

Use of Samples
Usually it is impossible or impractical for a researcher to observe every individual in a population. Therefore, researchers usually collect data from a sample and then use the sample data to help answer questions about the population.

Hypothesis Testing
A statistical method that uses sample data to evaluate a hypothesis about a population parameter.

4 Steps in the Logic of Hypothesis Testing

Step 1
State a hypothesis about a population. The hypothesis usually concerns the value of the population mean μ. For example, we might hypothesize that the average IQ of registered voters is 110.

Step 2
Before we actually select a sample, we use the hypothesis to predict the characteristics that the sample should have. If we hypothesize a population mean of 100, we would expect the sample to have a mean around 100, though not exactly 100, because there will always be some error in a sample.

Step 3
Next, we obtain a random sample from the population.

Step 4
Finally, we compare the obtained sample data with the prediction that was made from the hypothesis. If the sample mean is consistent with the prediction, we will conclude that the hypothesis is reasonable. If there is a big difference between the data and the prediction, we will decide the hypothesis is wrong.

Some Clarifications
Note that the researcher begins with a known population. This is the set of individuals as they exist before treatment. The purpose of research is to determine the effect of the treatment on the individuals in the population. That is, the goal is to determine what happens to the population after the treatment is administered (whether the treatment had an effect on the population mean).

A hypothesis test is a formalized procedure that follows a standard series of operations. In this way, researchers have a standardized method for evaluating the results of their research studies.
Other researchers will understand exactly what was done and how conclusions were reached.

4 Steps of Hypothesis Testing by Example
Example: It has been demonstrated that stimulation of infant rats results in eyes opening sooner, more rapid brain maturation, faster growth, and higher body weights. This might prompt a researcher to ask whether the same is true of human infants. We know from national health statistics that the mean weight for 2-year-old children is μ = 26 pounds with a standard deviation of σ = 4 pounds. The researcher's plan is to obtain a sample of n = 16 newborns and give their parents detailed instructions for giving the infants increased stimulation. At age 2, each of the 16 infants will be weighed, and the mean weight for the sample will be computed. If the mean weight for the sample is substantially more (or less) than the general population mean, then the researcher can conclude that the increased handling does have an effect on development. On the other hand, if the mean weight for the sample is around 26 pounds, then the researcher must conclude that increased handling does not appear to have an effect. Using this example, we can illustrate the four steps of hypothesis testing.

Step 1: State the Hypothesis
As the name implies, the process of hypothesis testing begins by stating a hypothesis about the unknown population. Actually, we state two opposing hypotheses. Note that both are stated in terms of population parameters.

The Null Hypothesis
The first is the null hypothesis, which states that the treatment has no effect. In general, the null states that there is no change, no effect, no difference (nothing happened), hence the name null. The null is indicated by the symbol H0.

Hypotheses in Symbolic Notation
H0: μ = 26 pounds
In the context of an experiment, H0 predicts that the independent variable will have no effect on the dependent variable.
Alternative Hypothesis
The alternative hypothesis states that there is a change, a difference, or a relationship for the general population. In symbolic terms:
H1: μ ≠ 26 pounds

Directional Hypotheses
Note that the alternative hypothesis simply states that there will be some type of change. It does not specify whether the treatment will increase or decrease growth. In some circumstances, it is appropriate to specify the direction of the effect in H1. For example, H1: μ > 26 pounds. This results in a directional hypothesis; we'll stick to non-directional tests at this point.

Population Confusion
Note that both hypotheses refer to a population whose mean is unknown, namely the population of infants who receive extra handling early in life. Once the treatment has been administered, the sample is no longer a subset of the population it was drawn from.

Step 2: Set the Criteria for a Decision
The researcher will eventually use the data from the sample to evaluate the credibility of the null hypothesis. The data will either provide support for the null or tend to refute the null. In particular, if there is a big difference between the data and the hypothesis, we will conclude that the hypothesis was wrong.

Formalizing the Decision Making Process
To formalize the decision process, we will use the null to predict exactly what kind of sample should be obtained if the treatment has no effect. In particular, we will examine all the possible sample means that could be obtained if the null is true.

Our Example
The null states that the mean is still μ = 26 pounds, and the researcher is planning to obtain a sample of 16 children. Therefore, we will examine the distribution of sample means for n = 16, centered at a value of μ = 26 pounds.
The distribution is then divided into two sections:
1. Sample means that are likely to be obtained if H0 is true: sample means that are close to the null hypothesis.
2. Sample means that are unlikely to be obtained if H0 is true: sample means that are very different from the null hypothesis.
[See Figure 8.2 on p. 171: the distribution of sample means.]

Location of Very Low Probability Areas
Note that the high-probability samples are located in the center of the distribution and have sample means close to the value specified in the null. On the other hand, the low-probability samples are located in the extreme tails of the distribution. After we divide the distribution this way, we can compare our sample data with the values in the distribution. We determine whether our sample mean is consistent with the null hypothesis, or whether our sample mean is very different from the null, like the values in the extreme tails.

Alpha Level
To find the boundaries that separate the high-probability samples from the low-probability samples, we have to define what we mean by "low" and "high." This is accomplished by selecting a specific probability value, which is known as the level of significance or the alpha level for the hypothesis test.

What Alpha Is
The alpha (α) value is a small probability that is used to identify the low-probability samples. By convention, commonly used alpha levels are alpha = .05 (5%) and alpha = .01 (1%). An alpha of .05 separates the most likely 95% of sample means from the least likely 5%.

The Critical Region
The extremely unlikely values that are defined by the alpha level make up what is called the critical region. These extreme values in the tails of the distribution define outcomes that are inconsistent with the null hypothesis: they are very unlikely to occur if the null is true. Whenever the data from a study produce a sample mean that is located in the critical region, we will conclude that the data are inconsistent with the null hypothesis and we will reject the null.
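To make the split between likely and unlikely sample means concrete, the sketch below finds the critical-region cutoffs for the infant example. It is only an illustration: it assumes the distribution of sample means is normal with standard error σ/√n, and the function name critical_region and its parameters are hypothetical, not part of the original material.

```python
from statistics import NormalDist

def critical_region(mu0, sigma, n, alpha=0.05):
    """Two-tailed critical-region boundaries, in z units and in raw units."""
    # z value that leaves alpha/2 in the upper tail of the standard normal
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # standard error of the mean (assumed formula: sigma / sqrt(n))
    std_error = sigma / n ** 0.5
    lower = mu0 - z_crit * std_error
    upper = mu0 + z_crit * std_error
    return z_crit, lower, upper

# Infant example: H0 says mu = 26, with sigma = 4 and n = 16
z_crit, lower, upper = critical_region(mu0=26, sigma=4, n=16, alpha=0.05)
print(f"z boundaries: +/-{z_crit:.2f}")
print(f"reject H0 if the sample mean is below {lower:.2f} or above {upper:.2f} pounds")
```

Expressing the cutoffs in pounds as well as in z units shows how far from 26 a sample mean must fall before it counts as "unlikely" under the null.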
The Boundaries of the Critical Region
To determine the exact location of the boundaries that define the critical region, we will use the alpha-level probability and the unit normal table. In most cases, the distribution of sample means is normal, and the unit normal table will provide the precise z-score location for the critical region boundaries.

Using the z-score Table
Because the extreme 5% is split between the two tails of the distribution, there is exactly 2.5% (0.025) in each tail. In the table, you can look up a proportion of 0.0250 in column C (the tail) and find that the z-score boundary is z = 1.96. Thus, for any normal distribution, the extreme 5% is in the tails of the distribution beyond z = 1.96 and z = -1.96. These boundaries define the critical region. For alpha = .01, the boundaries are z = 2.58 and z = -2.58.

Hypothesis Testing Step 3: Get Data and Compute Sample Statistic
The next step is to obtain the sample data. At this time, the researcher selects a random sample (infants in our example). Note that the data are collected after the researcher has stated the hypotheses and established the criteria for a decision. This sequence of events helps ensure that a researcher makes an honest, objective evaluation of the data and does not tamper with the decision criteria after the experimental outcome is known.

Summarizing the Sample
Next, the raw data from the sample are summarized with the appropriate statistics. In this case we need the mean. Once we have the mean, we can compare the sample mean with the null hypothesis. This is the heart of hypothesis testing: comparing the data with the hypothesis.

Performing the Comparison
The comparison is accomplished by computing a z-score that describes exactly where the sample mean is located relative to the hypothesized population mean from H0.
Elements of the Formula
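In standard notation, assuming M for the sample mean and σ_M for the standard error (symbols that the text here does not fix), the z-score for a sample mean can be written as:

```latex
z = \frac{M - \mu}{\sigma_M}, \qquad \sigma_M = \frac{\sigma}{\sqrt{n}}
```

Here μ is the population mean stated in H0, σ is the population standard deviation, and n is the sample size.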
What is measured?
The top of the z-score formula measures how much difference there is between the data and the hypothesis. The bottom of the formula measures the standard distance that ought to exist between the sample mean and the population mean.

Making a Decision
Use the z-score value obtained to make a decision about the null hypothesis according to the established alpha level. There are two possible decisions, both of which are always stated in terms of the null hypothesis.

Rejecting the Null
One possible decision is to reject the null hypothesis. This decision is made whenever the sample data fall into the critical region. Falling into the critical region indicates that there is a big difference between the sample and the null hypothesis: the sample mean is in the extreme tail of the distribution of sample means.

What does rejection mean?
This kind of outcome is very unlikely to occur if the null hypothesis is true, so our conclusion is to reject H0. In this case, the data provide convincing evidence that the null is wrong, and we conclude that the treatment really did have an effect on the individuals in the sample.

An example
For the example we've been using, suppose the sample of n = 16 infants produced a sample mean of 30 pounds at age 2. With n = 16 and σ = 4, the standard error is σ/√n = 4/√16 = 1, so z = (30 - 26)/1 = 4.00.
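As an illustration, the decision for this sample (mean of 30 pounds, n = 16, σ = 4) can be sketched in Python. This is a minimal sketch, assuming the standard error is σ/√n and a two-tailed test at alpha = .05; the function name z_test and its parameters are hypothetical, not part of the original material.

```python
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a sample mean when the population sigma is known."""
    std_error = sigma / n ** 0.5                    # expected chance difference
    z = (sample_mean - mu0) / std_error             # observed difference in standard-error units
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # boundary of the critical region
    reject = abs(z) > z_crit                        # True when z falls in either tail
    return z, reject

z, reject = z_test(sample_mean=30, mu0=26, sigma=4, n=16)
print(f"z = {z:.2f}, reject H0: {reject}")
```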
Interpreting The Example
With an alpha level of .05, this z-score is far beyond the boundary of 1.96. Because the sample z-score is in the critical region, we reject the null hypothesis and conclude that the special handling did have an effect on infants' weights.

The Second Possibility
The second possibility occurs when the sample data are not in the critical region. In this case, the data are reasonably close to the null hypothesis (in the center of the distribution of sample means). The data do not provide good evidence that the null is wrong, so we fail to reject the null hypothesis.

Failing to Reject the Null
This conclusion means the treatment does not appear to have an effect. For example, suppose our sample produced a mean of 27 pounds, where z = 1. In this case we would fail to reject the null hypothesis. Since the evidence is not convincing, all you can conclude is that there is not enough evidence to reject the null.

Test Statistics
The z-score statistic is the first example of what is called a test statistic. The term test statistic simply indicates that the sample data are converted into a single, specific statistic that can be used to test the hypothesis.

The Test Statistic as a Ratio
The numerator measures the obtained difference between the sample data and the null hypothesis. The standard error in the denominator of the formula measures how much difference ought to exist between the sample mean and the population mean by chance.

Most Test Stats are Ratios
The z-score and most other test statistics form a ratio with the following form:
test statistic = (observed difference) / (difference due to chance)
This answers the question: Are the results of a research study (the observed differences) more than would be expected by chance alone?

Purpose of Test Statistics
In general, the purpose of a test statistic is to determine whether the result of the research study (the obtained difference in means) is more than would be expected by chance alone.
Whenever the test statistic has a value greater than 1, it means that the obtained result (numerator) is greater than chance (denominator).

More Than Chance Not Good Enough
Most researchers are not satisfied with results that are simply more than chance. Instead, conventional research standards require results that are substantially more than chance. Specifically, most hypothesis tests require that the test statistic ratio have a value of at least 2 or 3: the obtained difference must be 2 or 3 times bigger than chance before the researcher will reject the null hypothesis.

Uncertainty in Hypothesis Testing
Hypothesis testing is an inferential process, which means that it uses limited information as the basis for reaching general conclusions. A sample provides only limited or incomplete information about the whole population, yet hypothesis tests use the sample to make decisions about the population.

Always Possibility of Wrong Conclusions
There is always the possibility that an incorrect conclusion will be made. Although sample data are usually representative of the population, there is always a chance that the sample is misleading and will cause a researcher to make the wrong decision about the research results.

Type I Errors
It is possible that the data will lead you to reject the null hypothesis when in fact the treatment has no effect. Samples are not expected to be identical to the population, and some extreme samples can be very different from the populations they are supposed to represent. If a researcher selects one of these extreme samples, just by chance, then the data from the sample may give the appearance of a strong treatment effect, even though there is no real effect. In this case, the researcher is likely to conclude that the treatment has had an effect, when in fact it really did not. This is called a Type I Error.

Type I Error Defined
A Type I Error occurs when a researcher rejects a null hypothesis that is actually true.
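The definition can be illustrated with a small simulation. This is a rough sketch under stated assumptions: individual weights are taken to be normally distributed with the example's μ = 26 and σ = 4, the treatment has no effect (so the null is true), and the test is two-tailed. The function name type_one_error_rate and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=26, sigma=4, n=16, alpha=0.05, trials=10_000, seed=0):
    """Proportion of samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)                       # fixed seed for repeatable runs
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical boundary
    std_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the untreated population: the null really is true
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_error
        if abs(z) > z_crit:
            rejections += 1                         # a Type I Error
    return rejections / trials

rate = type_one_error_rate()
print(f"proportion of true nulls rejected: {rate:.3f}")
```

Under these assumptions, the proportion of rejections should come out close to the alpha level chosen for the test, since every rejection here is a sample that landed in the critical region by chance alone.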
A Type I Error means that the researcher concludes that a treatment does have an effect when in fact the treatment has no effect.

A Stupid Mistake?
A Type I Error is not a stupid mistake in the sense that a researcher is overlooking something that should be perfectly obvious. In general, a researcher never knows for certain whether a hypothesis is true or false. Instead, we must rely on the data to tell us whether the hypothesis appears to be true or false.

How Serious is a Type I Error?
In many cases, a Type I Error can be very serious. It is likely that the researcher will report or publish the research results. Others may try to build on these results, wasting time, effort, and money.

How Can We Reduce the Risk?
Fortunately, the hypothesis test is structured to control and minimize the risk of a Type I Error. Although extreme and misleading samples are possible, they are relatively unlikely. For an extreme sample to produce a Type I Error, the sample mean must be in the critical region. However, the critical region is structured so that it is very unlikely to obtain sample means in the critical region when the null is true. The exact probability of this is determined by alpha.

Alpha Level Defined
The alpha level for a hypothesis test is the probability that the test will lead to a Type I Error. The alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true.

Directional Hypothesis Tests (One-tailed Tests)
The procedures we've discussed so far were two-tailed tests; that is, there were critical regions at both ends of the distribution. Usually a researcher will begin a research project by stating a directional hypothesis: a prediction that the treatment will increase or decrease the dependent variable (DV).

A Directional Hypothesis Test
In a directional test, or a one-tailed test, the statistical hypotheses specify either an increase or a decrease in the population mean score.
That is, they make a statement about the direction of the treatment effect.

Hypotheses
For directional hypotheses, we use greater-than and less-than symbols to state the direction. For example, if the prediction is that a drug will decrease food consumption:
H1: μ < 10 (lower)
H0: μ ≥ 10 (no change or higher)

Changes in Our Testing Steps
In the first step, the directional prediction is incorporated into the statement of the hypotheses. In the second step of the process, the critical region is located entirely in one tail of the distribution.

Critical Values
Alpha = .05, z critical = 1.65
Alpha = .01, z critical = 2.33
Alpha = .001, z critical = 3.09

Making a Statistical Decision
Evaluate a one-tailed test the same way that you did a two-tailed test: if the calculated z is beyond z critical in the predicted direction, reject the null. For a predicted decrease, the critical region is in the left tail, so the null is rejected when z is below the negative critical value. In the interpretation, you must state the direction of the difference.

One v. Two Tailed Tests
The key difference is the size of the critical value: you are more likely to reject the null hypothesis with a one-tailed test because the critical value of z is smaller (1.65 rather than 1.96 at alpha = .05), provided the effect is in the predicted direction.
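The one-tailed and two-tailed critical values can be compared side by side with a short sketch. It assumes a normal distribution of sample means; the function name z_critical and its tails parameter are hypothetical, introduced only for illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tails=2):
    """Positive critical z for a one- or two-tailed test at the given alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # One-tailed: all of alpha sits in a single tail.
    # Two-tailed: alpha is split evenly, alpha/2 in each tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: one-tailed {z_critical(alpha, 1):.3f}, "
          f"two-tailed {z_critical(alpha, 2):.3f}")
```

The exact one-tailed value at alpha = .05 is about 1.645, which printed tables usually round to 1.65 or 1.64. At every alpha level the one-tailed value is smaller than the two-tailed value, which is why a one-tailed test rejects more readily when the effect is in the predicted direction.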