
Linear Correlation and Regression

Part 1: Association

In this module we will look at methods for studying the relationship between two variables. Many of the statistical ideas here are due to Sir Francis Galton, an English anthropologist who was interested in hereditary processes. Knowing that children bear some resemblance to their parents, Galton wanted to know how strong that resemblance was. While Galton inspired many statistical questions, the formal development of many of his ideas is due to his colleague Karl Pearson. Some of the data collected by Galton and Pearson are available, but the data sets are rather large for our purposes, so we will look instead at some test scores for a math class.

  1. Consider the test data given below. For each test, compute the average (also known as the mean) and the standard deviation of the scores. What do you notice?

    Student     Test 1 Score   Test 2 Score   Test 3 Score
    Liza             70             75             60
    Geoff            91             88             74
    Kirsten          88             91             81
    Jeremy           87             90             66
    Ko               45             41             50
    Clyde            81             80             95
    Serpil           77             74             86
    Shameeka         74             76             68
    Zoe Ann          73             74             83
    Yiannis          95             91             94
    Alfred           69             70             88
    Suzi             67             65             59
    Ashya            65             63             69
    Sreek            60             66             77
    Lionel           83             81             75

     

  2. Use your computer algebra system to draw the following scatter plots: Test 2 scores versus Test 1 scores, and Test 3 scores versus Test 2 scores.
  3. Click here for help creating a scatter plot with Maple.

  4. We say that two variables are positively associated when larger values of one variable tend to occur with larger values of the other variable. When larger values of one variable tend to occur with smaller values of the other variable, we say the variables are negatively associated. What do the scatter plots tell you about the association between a student's score on Test 1 and his or her score on Test 2? What about the association between a student's scores on Test 2 and Test 3?

  5. Consider the relationship between a student's test average and the number of times he or she misses class during the term. Would you expect a positive or negative association between these variables? Why?

  6. If a student scored an 82 on Test 1, what do you predict he would score on Test 2? If the student scored an 82 on Test 2, what would you predict for his score on Test 3? Which prediction do you expect to be more accurate? Why?

  7. In the scatter plot of Test 2 scores versus Test 1 scores, the points appear to be tightly clustered along a line: there is a strong linear association between scores on the first two tests. In the second scatter plot, the clustering is looser (and the linear association is weaker). How is this observation related to the previous question?
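
The computations in exercise 1, and the "tight" versus "loose" clustering described in exercise 7, can also be checked numerically. The sketch below is one possible way to do so, not part of the original module: it uses Python rather than Maple, assumes the sample standard deviation (divisor n - 1), and introduces a hypothetical helper, pearson_r, that computes the standard Pearson correlation coefficient (named for the Karl Pearson mentioned above) as a single-number summary of how closely the points follow a line.

```python
import math
import statistics

# Scores copied from the table in exercise 1, in row order (Liza ... Lionel).
test1 = [70, 91, 88, 87, 45, 81, 77, 74, 73, 95, 69, 67, 65, 60, 83]
test2 = [75, 88, 91, 90, 41, 80, 74, 76, 74, 91, 70, 65, 63, 66, 81]
test3 = [60, 74, 81, 66, 50, 95, 86, 68, 83, 94, 88, 59, 69, 77, 75]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Hypothetical helper written for this illustration only.
    """
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Exercise 1: mean and standard deviation of each test.
# Assumption: sample standard deviation (n - 1 divisor). Swap in
# statistics.pstdev for the population version (n divisor).
means = [statistics.mean(t) for t in (test1, test2, test3)]
sds = [statistics.stdev(t) for t in (test1, test2, test3)]

# Exercise 7: strength of the linear association in each scatter plot.
r12 = pearson_r(test1, test2)
r23 = pearson_r(test2, test3)

for i, (m, s) in enumerate(zip(means, sds), start=1):
    print(f"Test {i}: mean = {m}, sd = {s:.2f}")
print(f"r(Test 1, Test 2) = {r12:.2f}")
print(f"r(Test 2, Test 3) = {r23:.2f}")
```

A coefficient near 1 corresponds to points clustered tightly along an upward-sloping line, so comparing the two printed values gives a numerical counterpart to the visual comparison of the two scatter plots.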


modules at math.duke.edu Copyright CCP and the author(s), 1999