Week 3 Discussion: Analyzing Correlation

Liz1116

Scatterplots

Does smoking cause lung cancer? Does low unemployment lead to inflation? Does human use of

fossil fuel cause global warming? A major goal of statistical studies is to determine if there is a

relationship between different variables. Once we know there is a relationship between the variables,

we can try to determine if one variable causes the other. One of the first steps in this process is to

make a scatterplot. A scatterplot is a diagram that represents the relationship between two

quantitative variables. It is a plot of paired values (x, y) with the horizontal axis representing the first, x,

variable and the vertical axis representing the second or y variable. In choosing the x and y variables,

x is the explanatory variable and y is the response variable. The choice of the x and y variables is

important. Ask yourself which variable depends on the result of the other and that will be the response

or y variable. The pattern of the dots in a scatterplot is important in determining whether there is a

correlation or relationship between the two variables.

Interpreting scatterplots involves examining the overall pattern and looking for deviations from that pattern, or

outliers. The overall pattern can be explained using the direction, form, and strength of the

relationship. The direction would state whether the two variables have a positive or negative

relationship. In the case of a positive relationship, both variables move in the same direction. As one

variable increases, so does the other; and as one variable decreases, so does the other. A negative

direction would see the variables move in opposite directions. As one variable increases, the other

decreases. For a positive direction, the points would move in an upward direction while a negative

direction would see the points move downward. The form of the scatterplot describes whether the

points seem to cluster along a straight line, a parabola, or a cubic curve. We will only study

linear or straight-line relationships. The strength of the relationship is seen in how closely the points

are clustered together around a line. The measure of strength is the correlation coefficient, which

we will discuss next.
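
As a preview of how direction and strength can be put into a single number, the sketch below computes the standard Pearson correlation coefficient r in plain Python. It assumes this is the coefficient the course means; the function names pearson_r and direction, and the small data lists, are made up for illustration and do not come from the textbook data sets.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values (x, y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / sqrt(s_xx * s_yy)


def direction(r):
    """Describe the direction of a relationship from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Made-up illustration data: y_up rises with x, y_down falls with x
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(direction(r_up), direction(r_down))
```

The sign of r gives the direction, and the closer r is to 1 or -1, the more tightly the points cluster around a straight line.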

There are a number of ways to create a scatterplot. Using technology is preferred.

Stat Disk Example: We want to determine if the weight of a car is related to its city miles per gallon.

Solution: Open Stat Disk and choose Data Sets, 12th edition of the textbook, and the file Car

Measurements. The data will populate the

spreadsheet. Refer to page 4 in the Stat

Disk User’s Manual for directions on how

to open a data file. Once you have the

dataset displayed, click on Data,

Scatterplot, and choose column 3, weight

as the x or explanatory variable, and

column 8, city MPG, as the y or response

variable. Then click Evaluate. Refer to

page 13 of the Stat Disk User’s Manual for

help. The resulting scatterplot is

displayed to the left. Note that there is a

line drawn on the scatterplot. This is

called the regression line or line of best

fit, and we will discuss that later. Note

that the dots and line have a downward direction. As in algebra, this means that the slope of the line

is negative, meaning that as the x variable increases the y variable decreases. In this example, the city

MPG decreases as the weight of the car increases. The cluster of points does seem to form a

straight line, so the form is linear. The relationship seems to be strong because the points are clustered closely

around the line.

TI-84 Example: Scientists have examined data on sea surface temperature and coral growth per year at

locations in the Red Sea. Determine the explanatory and response variables and create a scatterplot using

the data in the table below.

Sea surface temperature   29.68   29.87   30.16   30.22   30.48   30.65   30.90
Coral growth               2.63    2.58    2.60    2.48    2.26    2.38    2.26

Solution: The coral growth would depend on the temperature so

growth would be the response variable and temperature would be the

explanatory variable. To do the scatterplot on the TI-84, select Stat, 1:

Edit and enter temperature in L1 and growth in L2. Then select STAT

PLOT and choose 1: Enter. Choose Plot 1 and verify the information

and turn it on. Click Zoom 9 to plot the data. The directions for this are

given in the D2L classroom TI Technology Manual. This relationship

has a negative direction, appears to be linear and the strength appears

to be strong because the points are clustered close to a line.
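
For readers without a TI-84, the same example can be roughly reproduced in Python. This is a sketch, not part of the course materials: it pairs the values from the table, computes the slope of the least-squares line to confirm the direction, and defines an optional plotting helper. The names least_squares_slope and plot_scatter are hypothetical, and the plotting helper assumes the third-party matplotlib library is installed.

```python
# Data from the table: temperature is explanatory (x), growth is response (y)
temperature = [29.68, 29.87, 30.16, 30.22, 30.48, 30.65, 30.90]
growth = [2.63, 2.58, 2.60, 2.48, 2.26, 2.38, 2.26]


def least_squares_slope(xs, ys):
    """Slope of the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


slope = least_squares_slope(temperature, growth)
# A negative slope matches the downward pattern described in the text
print("direction:", "negative" if slope < 0 else "positive or flat")


def plot_scatter(xs, ys, x_label, y_label):
    """Draw a scatterplot with x on the horizontal axis (needs matplotlib)."""
    import matplotlib.pyplot as plt  # assumption: matplotlib is available

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig
```

Calling plot_scatter(temperature, growth, "Sea surface temperature", "Coral growth") and then saving or showing the returned figure gives a picture comparable to the calculator's Zoom 9 display.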

Example: There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. The table on the next page gives data on yearly wine consumption (liters of alcohol from drinking wine, per person) and yearly deaths from heart disease (deaths per 100,000 people) in 19 developed nations.

(a) Make a scatterplot that shows how national wine consumption helps explain heart disease death rates. (b) Describe the form of the relationship. Is there a linear pattern? How strong is the relationship? Is the direction of the association positive or negative? Explain in simple language what this says about wine and heart disease. Do you think these data give good evidence that drinking wine causes a reduction in heart disease deaths? Why?

Solution: a) Enter the alcohol from wine into the first column and heart disease deaths in the second column in Stat Disk. Choose Data, Scatterplot, x as column 1 and y as column 2 and click Plot. The resulting scatterplot is shown to the left. b) The points tend to move downward, which means a negative relationship: as alcohol consumption from wine increases, the number of deaths from heart disease decreases. The relationship appears to be linear and strong because the points are fairly close to the line. Later we will discuss calculating the correlation coefficient as a measure of the strength of the relationship. This particular relationship has a correlation coefficient of -0.84, which qualifies as a strong, negative correlation.

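These data are consistent with the claim that drinking moderate amounts of wine reduces heart disease deaths, but a strong correlation alone does not show that wine causes the reduction. The data are observational national averages, so other differences among these countries, such as diet or lifestyle, could explain part of the pattern.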