Correlation
by Adam J. McKee

What is Correlation?
The Direction of the Relationship
In a positive correlation, the two variables tend to change in the same direction: when the X variable increases, the Y variable also increases, and when the X variable decreases, the Y variable also decreases. In a negative correlation, the two variables tend to go in opposite directions: as the X variable increases, the Y variable decreases.

The Form of the Relationship
Degree of Relationship
The Pearson Correlation
The Pearson correlation measures the degree and direction of the linear relationship between two variables. It is identified by the letter r.

The Sum of Products of Deviations
Computing SP
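As a minimal sketch, assuming the standard textbook definitions, SP can be computed in two equivalent ways: the definitional formula SP = Σ(X − M_X)(Y − M_Y), which multiplies each pair of deviation scores and adds the products, and the computational formula SP = ΣXY − (ΣX)(ΣY)/n, which skips the deviation scores. The function names and the four-person data set below are hypothetical, chosen only for illustration.

```python
def sp_definitional(xs, ys):
    """SP = sum of (X - M_X)(Y - M_Y): multiply paired deviation scores, then add."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = sum(XY) - (sum X)(sum Y) / n: same value, no deviation scores needed."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sum_xy - sum(xs) * sum(ys) / n


# Hypothetical scores for four individuals.
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
print(sp_definitional(X, Y))   # 6.0
print(sp_computational(X, Y))  # 6.0
```

Both versions give the same answer for any data; the definitional form shows what SP means, while the computational form is usually quicker by hand.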
Preliminary Data Analysis
Calculation of Pearson’s r
Once you have SS for X, SS for Y, and SP, you can calculate r as follows:

$$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$$
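To make the calculation concrete, here is a minimal Python sketch of r = SP / √(SS_X · SS_Y), where SS is the sum of squared deviations from the mean. The helper names and the data are hypothetical. For the scores X = 1, 2, 4, 5 and Y = 3, 6, 4, 7, SS_X = 10, SS_Y = 10, and SP = 6, so r = 6 / √100 = 0.60.

```python
import math


def sum_of_squares(values):
    """SS = sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sum_of_products(xs, ys):
    """SP = sum of products of paired deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SS_X * SS_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two pairs of scores")
    ss_x = sum_of_squares(xs)
    ss_y = sum_of_squares(ys)
    if ss_x == 0 or ss_y == 0:
        # A variable with no spread has no linear relationship to measure.
        raise ValueError("r is undefined when a variable has no variability")
    return sum_of_products(xs, ys) / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 4, 5], [3, 6, 4, 7]))  # 0.6 (positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))        # -1.0 (perfect negative)
```

The sign of r comes entirely from SP, since the denominator is always positive; that is why r reports the direction of the relationship as well as its degree.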
Using and Interpreting r

1: Prediction
If two variables are known to be related in some systematic way, it is possible to use one of the variables to make accurate predictions about the other. For example, college admissions officials often use SAT or ACT scores to predict success in college.

2: Validity
In research, validity asks the question "Am I measuring what I think I am measuring?" For example, an intelligence test based on weight and shoe size would not be valid at all. One common technique for validating a test is to use correlation: if a new test is highly correlated with another test regarded as valid, the validity of the new test is supported.

3: Reliability
Closely related to validity, reliability asks how well (how precisely) I am measuring whatever I am measuring. The key to reliability is consistency of results: if I give you a test and you score 100, then I give it to you again and you score 57, the test is not reliable.

4: Theory Verification
Many social scientific theories make specific predictions about relationships between variables. For example, a theory may predict a relationship between brain size and learning ability.

Correlation Caveats
#1 Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation cannot be used as proof of a cause-and-effect relationship between two variables: what if a third variable, Z, caused both?
#2 The value of a correlation can be affected greatly by the range of scores represented in the data. Be careful interpreting data that do not include a full range of scores, for example, a study of reading ability that uses only senior English majors, whose scores are likely to cluster near the top of the scale.
#3 One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
#4 When judging how "good" a relationship is, it is tempting to focus on the numerical value.
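Be careful not to interpret a correlation as a proportion. A correlation of r = 0.80 does not mean the relationship is 80% perfect; the proportion interpretation belongs to r² (the coefficient of determination), which measures the proportion of variability in Y that is predictable from its relationship with X. For r = 0.80, r² = 0.64, or 64%.

Hypothesis Testing with r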
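One common way to test whether a sample correlation reflects a nonzero correlation in the population (H0: ρ = 0) converts r into a t statistic with n − 2 degrees of freedom: t = r√(n − 2) / √(1 − r²). The sketch below assumes that approach; the function name t_for_r is hypothetical, and the resulting t is meant to be compared against a critical value from a t table rather than turned into an exact p-value here.

```python
import math


def t_for_r(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample r based on n pairs."""
    if n < 3:
        raise ValueError("need at least three pairs of scores")
    if not -1.0 < r < 1.0:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


t, df = t_for_r(0.5, 27)
print(round(t, 3), df)  # 2.887 25
```

With df = 25, the two-tailed critical value of t at α = .05 is about 2.060, so a sample correlation of r = 0.50 from 27 pairs of scores would lead us to reject H0.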
Regression
When we talked about scatter plots, we saw that you can draw a line through the middle of the data points. This line serves several purposes. First, it makes the relationship between the variables easier to see. Second, it identifies the center, or central tendency, of the relationship; just like the mean, it provides a ‘snapshot’ of the relationship between the two variables. Finally, the line can be used for prediction, because it establishes a one-to-one relationship between X and Y: each value of X corresponds to exactly one predicted value of Y.

Drawing the line on a graph is useful, but how do we know we are in the middle of all those dots? We can be much more precise about the line if we remember from algebra that straight lines can be described by simple equations.

Linear Equations
In general, a linear relationship between variables X and Y can be expressed by the equation Y = bX + a, where b and a are fixed constants. For example, suppose a local tennis club charges $5 per hour plus an annual membership fee of $25. With this information, we can determine the total cost of playing tennis using a linear equation that describes the relationship between total cost (Y) and the number of hours played (X):

Y = 5X + 25

In the general linear equation, the value of b is called the slope. The slope determines how much the Y variable changes when X increases by 1 point. The value of a is called the Y-intercept because it determines the value of Y when X = 0. In the tennis example, each additional hour adds $5 to the total, and even 0 hours costs the $25 membership fee; 10 hours would cost 5(10) + 25 = $75.

Finding the Equation
The statistical technique for finding the best-fitting straight line for a set of data is called regression, and the resulting straight line is called the regression line.

The Least Squares Solution
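A minimal sketch of the standard least-squares solution, assuming the usual formulas: slope b = SP / SS_X and Y-intercept a = M_Y − b·M_X, which together give the line with the smallest possible sum of squared vertical distances between the data points and the line. As a check, points that lie exactly on the tennis-club line Y = 5X + 25 should give back b = 5 and a = 25. The function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b, a) for the regression line Y-hat = bX + a."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two pairs of scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("slope is undefined when X has no variability")
    b = sp / ss_x            # slope
    a = mean_y - b * mean_x  # Y-intercept: the line passes through (M_X, M_Y)
    return b, a


def predict(b, a, x):
    """Predicted Y for a given X on the line Y-hat = bX + a."""
    return b * x + a


hours = [1, 2, 3, 4]
cost = [5 * h + 25 for h in hours]   # 30, 35, 40, 45
b, a = least_squares_line(hours, cost)
print(b, a)               # 5.0 25.0
print(predict(b, a, 10))  # 75.0
```

With real, scattered data the points will not fall exactly on the line; least squares simply picks the one line that makes the total squared prediction error as small as possible.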
Uses of Regression