db cj stats
© 2014 by Pearson Higher Education, Inc Upper Saddle River, New Jersey 07458 • All Rights Reserved
Fox/Levin/Forde, Elementary Statistics in Criminal Justice Research, 4e
Chapter 10: Correlation
*
*
CHAPTER OBJECTIVES
10.1 Differentiate between the strength and direction of a correlation
10.2 Identify a curvilinear correlation
10.3 Discuss the characteristics of correlation coefficients
10.4 Calculate and test the significance of Pearson’s correlation coefficient
10.5 Calculate the partial correlation coefficient
*
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
10.1 Differentiate between the strength and direction of a correlation
*
10.1
*
Correlation
Until now we’ve examined the presence or absence of a relationship between two or more variables.
What about the strength and direction of this relationship?
- We refer to this as the correlation between variables.
Strength of Correlation
- This can be visualized using a scatterplot.
X variable (IV) is located on the horizontal axis
Y variable (DV) is located on the vertical axis
Strength increases as the points more closely form an imaginary diagonal line across the center.
*
Direction of Correlation
- Correlations can be described as either positive or negative.
Positive – both variables move in the same direction
- Cases with a high score on the X variable tend to have a high score on the Y variable
Negative – the variables move in opposite directions
- Cases with a high score on the X variable tend to have a low score on the Y variable
- A positive or negative correlation represents a straight-line (linear) correlation
Points in the scatterplot tend to form a straight line through the center of the graph
*
*
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
10.2 Identify a curvilinear correlation
*
10.2
*
Curvilinear Correlation
A relationship between X and Y that begins as positive and becomes negative, or begins as negative and becomes positive
Fear of crime tends to have a curvilinear correlation with age
Fear of crime tends to decrease with age until people reach their thirties, after which fear tends to increase with age
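A curvilinear pattern like this can hide from a straight-line measure of correlation. The sketch below uses made-up scores (not data from the lecture) in which Y falls and then rises as X increases; the helper name pearson_r is an assumption for illustration.

```python
import math

def pearson_r(xs, ys):
    """Straight-line correlation: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores: Y decreases and then increases as X goes up (a U shape).
x_scores = [-3, -2, -1, 0, 1, 2, 3]
y_scores = [x ** 2 for x in x_scores]

r_curved = pearson_r(x_scores, y_scores)
print(f"r for the U-shaped data: {r_curved:.2f}")
```

In this constructed case the negative half and the positive half cancel, so the straight-line coefficient comes out at zero even though Y depends entirely on X.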
*
*
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
10.3 Discuss the characteristics of correlation coefficients
*
The Correlation Coefficient
Numerically expresses both the direction and strength of a relationship between two variables
- Ranges between -1.0 and +1.0
Direction
- The sign (either – or +) indicates the direction of the relationship
Strength
- Values close to zero indicate little or no correlation
- Values closer to -1 or +1 indicate stronger correlations
*
*
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
10.4 Calculate and test the significance of Pearson’s correlation coefficient (r)
*
10.4
*
Pearson’s r lets us determine the strength and direction of the relationship between X and Y; both variables must be measured at the interval level
Focuses on the product of the X and Y deviations from their respective means
Deviations Formula:
Computational Formula:
Pearson’s Correlation Coefficient (r)
*
Imagine we want to examine the relationship between the length of a murder trial in days and the length of time in hours that the jury deliberates.
*
*
The scatterplot seems to indicate a positive relationship with a few exceptions
Pearson’s r indicates precisely how strongly, and in which direction, the length of jury deliberation is related to the length of the trial
Pearson’s r focuses on the product of the x and y deviations from their respective means
The deviation (X − X̄) tells how much longer or shorter than average a particular trial is
The deviation (Y − Ȳ) tells how much longer or shorter than average a particular jury deliberation takes
With Pearson’s r we add the products to see if the positive products or the negative products are more abundant and sizeable
*
Calculating Pearson’s r requires us to compute the sum of products (SP) of the X and Y deviations across all the cases
The sum of the final column is positive, which indicates a positive relationship
*
BUT we have to divide SP by the square root of the product of the sums of squares of the two variables to get an r between -1 and +1
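The calculation just described can be sketched in a few lines. The trial lengths and deliberation times below are hypothetical values chosen for easy arithmetic, not the figures from the lecture's table, and the variable names are assumptions.

```python
import math

# Hypothetical data for illustration only (not the lecture's table).
trial_days = [1, 2, 3, 4, 5]           # X: length of trial in days
deliberation_hours = [2, 1, 4, 3, 5]   # Y: jury deliberation in hours

n = len(trial_days)
mean_x = sum(trial_days) / n
mean_y = sum(deliberation_hours) / n

# Deviations formula: r = SP / sqrt(SSx * SSy)
sp = sum((x - mean_x) * (y - mean_y)
         for x, y in zip(trial_days, deliberation_hours))
ss_x = sum((x - mean_x) ** 2 for x in trial_days)
ss_y = sum((y - mean_y) ** 2 for y in deliberation_hours)
r_deviations = sp / math.sqrt(ss_x * ss_y)

# Computational formula: works from raw sums instead of deviations.
sum_xy = sum(x * y for x, y in zip(trial_days, deliberation_hours))
sum_x2 = sum(x ** 2 for x in trial_days)
sum_y2 = sum(y ** 2 for y in deliberation_hours)
r_computational = (sum_xy - n * mean_x * mean_y) / math.sqrt(
    (sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2)
)

print(f"SP = {sp}, SSx = {ss_x}, SSy = {ss_y}")
print(f"r (deviations) = {r_deviations:.2f}")
print(f"r (computational) = {r_computational:.2f}")
```

Both routes give the same r for these scores; the computational version is usually easier by hand because it avoids subtracting the mean from every score.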
*
*
10.4
*
Testing the Significance of Pearson’s r
To test the significance of a measure of correlation, we set up a null hypothesis:
- No correlation exists in the population (ρ = 0).
- ρ is pronounced “rho”
- To test the significance of r, a t ratio with N – 2 degrees of freedom must be calculated.
A simplified method for testing the significance of r
- Compare the calculated r to a critical value found in Table H in Appendix C
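As a rough sketch of the t ratio approach, the code below plugs a hypothetical r and sample size into t = r·sqrt(N − 2) / sqrt(1 − r²). The function name, the r of .80, and the N of 10 are assumptions for illustration; the critical value is the standard two-tailed t table entry for 8 degrees of freedom at the .05 level.

```python
import math

def t_ratio_for_r(r, n):
    """t ratio for testing rho = 0, with N - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.80   # hypothetical sample correlation
n = 10     # hypothetical number of cases

t = t_ratio_for_r(r, n)
df = n - 2

# Standard t table value for df = 8, two-tailed alpha = .05.
critical_t = 2.306
reject_null = abs(t) > critical_t

print(f"t = {t:.2f} with df = {df}")
print("Reject the null hypothesis (rho = 0):", reject_null)
```

The simplified method listed above skips this step by comparing r itself to a tabled critical value, which leads to the same decision.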
*
Step by Step – Pearson’s r
*
*
*
*
*
Requirements for the Use of Pearson’s r Correlation Coefficient
10.4
A Straight-Line Relationship
Interval Data
Random Sampling
Normally Distributed Characteristics
*
The Importance of Scatterplots
For a dataset containing several variables, a computer can obtain a correlation matrix
Displays in compact form the interrelationships of several variables simultaneously
Below, the entry in the 2nd row, 4th column tells us the correlation between the offender’s and the spouse’s education
*
The matrix is symmetric: the portion above the diagonal is identical to the portion below it, so only a triangular half needs to be shown
Pitfall – correlations may gloss over some major violations of assumptions of Pearson’s r
The correlation matrix does not tell us if a linear relationship actually exists
IT IS IMPORTANT TO CHECK THE SCATTERPLOT
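A correlation matrix of the kind a statistics package prints can be sketched with a small helper. The three variables and their scores below are hypothetical, not the dataset shown in the lecture, and pearson_r is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the deviations formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical variables and scores for five cases.
data = {
    "offender_educ": [10, 12, 12, 14, 16],
    "spouse_educ": [11, 12, 13, 13, 16],
    "age": [45, 30, 38, 29, 33],
}

names = list(data)
# One row per variable, one column per variable.
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:>14}", "  ".join(f"{value:+.2f}" for value in row))
```

Each diagonal entry is a variable correlated with itself, and the entries mirror each other across the diagonal. Nothing in this output reveals whether any pair is actually linear, which is why the scatterplots still need to be checked.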
*
Learning Objectives
After this lecture, you should be able to complete the following Learning Outcomes
10.5 Calculate the partial correlation coefficient
*
It is also important to consider if a correlation holds up when controlling for additional variables
Again we can focus on scatterplots
A scatterplot displays all the information in the correlation coefficient
We can construct separate scatterplots for different subgroups of a sample to see if the correlation observed for the full sample holds when controlling for the subgroup or control variable
*
Imagine we examine the relationship between height and salary and find a correlation
Is there a 3rd variable that could explain that relationship? Perhaps sex?
*
Even if some correlation remains in the two gender-specific scatterplots, the relationship between height and salary is nowhere near as strong as it first appeared
The partial correlation coefficient is the correlation between two variables after removing (partialing out) the common effects of the 3rd variable
This too ranges from -1 to +1
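The subgroup idea can be illustrated with a deliberately extreme, made-up dataset in which height and salary are unrelated within each sex but strongly related in the pooled sample. All scores and names below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the deviations formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical heights (inches) and salaries (thousands of dollars).
heights_women = [62, 64, 66, 68]
salaries_women = [40, 42, 42, 40]
heights_men = [68, 70, 72, 74]
salaries_men = [50, 52, 52, 50]

# Full sample: both groups pooled together.
r_full = pearson_r(heights_women + heights_men,
                   salaries_women + salaries_men)

# Controlling for sex: one correlation per subgroup.
r_women = pearson_r(heights_women, salaries_women)
r_men = pearson_r(heights_men, salaries_men)

print(f"full sample r = {r_full:.2f}")
print(f"women only  r = {r_women:.2f}")
print(f"men only    r = {r_men:.2f}")
```

Here the pooled correlation comes entirely from the difference between the two groups, so it disappears once each group is examined separately. Real data would rarely be this clean.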
*
10.5
*
Partial Correlation
The correlation between two variables, X and Y, after removing the common effects of a third variable, Z
When testing the significance of a partial correlation, a slightly different t formula is used
*
A consultant for a police department finds a -.44 correlation between performance on a physical fitness test (X) and salary (Y)
This might suggest that the department pays a lower salary to those in top shape
Perhaps a 3rd variable (number of years on the force, Z) affects both X and Y
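A minimal sketch of partialing out Z is shown below. Only the -.44 correlation between fitness and salary comes from the example above; the correlations involving years on the force and the sample size are hypothetical values picked for illustration, as are the function names.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y after removing the common effects of Z."""
    return (r_xy - r_xz * r_yz) / (
        math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    )

def t_ratio_for_partial(r_xy_z, n):
    """t ratio for a partial correlation, with N - 3 degrees of freedom."""
    return r_xy_z * math.sqrt(n - 3) / math.sqrt(1 - r_xy_z ** 2)

r_xy = -0.44   # fitness and salary (from the example above)
r_xz = -0.70   # fitness and years on the force (hypothetical)
r_yz = 0.80    # salary and years on the force (hypothetical)
n = 50         # number of officers (hypothetical)

r_xy_z = partial_r(r_xy, r_xz, r_yz)
t = t_ratio_for_partial(r_xy_z, n)

print(f"partial r (XY.Z) = {r_xy_z:+.2f}")
print(f"t = {t:.2f} with df = {n - 3}")
```

With these assumed inputs the negative correlation turns positive once years on the force is held constant, which is the kind of reversal a control variable can produce.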
*
*
*
The partial correlation coefficient is very useful for finding spurious relationships
The correlation between the rate of rape (per 100,000) in 1982 and the circulation of Playboy (per 100,000) in 1979 for 49 U.S. states (Alaska is an outlier on rape and is excluded) is r = +.40.
Because of this substantial correlation, many observers have asked:
If Playboy has this kind of effect on sex crimes, imagine what harm may be caused by truly hardcore pornography?
*
*
CHAPTER SUMMARY
10.1 Correlation allows researchers to determine the strength and direction of the relationship between two or more variables.
10.2 In a curvilinear correlation the relationship between two variables starts out positive and turns negative, or vice-versa.
10.3 The correlation coefficient numerically expresses the direction and strength of a linear relationship between two variables.
10.4 Pearson’s correlation coefficient can be calculated for two interval-level variables.
10.5 The partial correlation coefficient can be used to examine the relationship between two variables, after removing the common effect of a third variable.
*
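Deviations formula for Pearson’s r:
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{SP}{\sqrt{SS_X \, SS_Y}} $$
Computational formula for Pearson’s r:
$$ r = \frac{\sum XY - N\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - N\bar{X}^2\right)\left(\sum Y^2 - N\bar{Y}^2\right)}} $$
t ratio for testing the significance of r (N − 2 degrees of freedom):
$$ t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} $$
Partial correlation coefficient:
$$ r_{XY.Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{1 - r_{XZ}^2}\,\sqrt{1 - r_{YZ}^2}} $$
t ratio for testing the significance of a partial correlation:
$$ t = \frac{r_{XY.Z}\sqrt{N - 3}}{\sqrt{1 - r_{XY.Z}^2}} $$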