psychology 302
Scatterplots and Correlation
Correlation
• Useful tool to assess relationships
• Must have two variables measured on one set of people
• Correlation only measures strength of linear association
Linear relationships are not perfect lines
• Variables have variability (duh)
• Relationships may be generally linear even if all points are not on the line
Magnitude of r
Not all relationships are linear
Properties of r
• X & Y must be quantitative
  ◦ Interval or ratio
• It doesn't matter which variable is predictor and which is response
  ◦ $r_{xy} = r_{yx}$
Properties of r
• Correlation has no units
  ◦ So r can be compared for different variables
• Value of r is always between -1 and +1
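The properties above can be checked numerically. The sketch below is only an illustration: the `hours` and `score` values are made up, and `pearson_r` is a hypothetical helper that uses the divide-by-N formulas. It shows that swapping X and Y, or changing the units of X, leaves r unchanged.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations around the means (divides by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data, invented for this illustration only.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 70]

r_xy = pearson_r(hours, score)   # X as predictor
r_yx = pearson_r(score, hours)   # roles swapped

# Same measurements expressed in minutes instead of hours.
minutes = [60 * h for h in hours]
r_minutes = pearson_r(minutes, score)

print(round(r_xy, 3), round(r_yx, 3), round(r_minutes, 3))
```

All three printed values should match, and each should fall between -1 and +1.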
Computing r
• Consider deviations around mean of X & Y
• $(X - \bar{X})$ and $(Y - \bar{Y})$
Cross-Product
• To consider X & Y together, multiply their deviations
• $(X - \bar{X})(Y - \bar{Y})$
• Sign will be positive or negative
• Sum of cross-products is an indication of overall relationship
• $\sum (X - \bar{X})(Y - \bar{Y})$
• Mostly positive, sum = positive
• Mostly negative, sum = negative
• About equal pos & neg, sum ≈ 0
• Sum of cross-products can get very big if N is large
• So let's adjust for N
$$\frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
• This is called the Covariance
Bill McDonald Speed Trap

Subject   X (Age)   Y (Speed)   $X - \bar{X}$   $Y - \bar{Y}$   $(X - \bar{X})(Y - \bar{Y})$
1         18        48          -10             3               -30
2         24        51          -4              6               -24
3         43        41          15              -4              -60
4         34        39          6               -6              -36
5         21        46          -7              1               -7
Sum:      140       225         0               0               -157
Mean:     28        45
SD:       9.23      4.42
Just for this sample, not estimating population
• Covariance = -157/5 = -31.4
• What does that mean?
  ◦ Is it big?
  ◦ Is it small?
  ◦ What are the units?
• We need to get this on a scale that makes sense!
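To see why the covariance is hard to judge on its own, here is a rough sketch using the speed-trap data. The `covariance` helper is a hypothetical name, and the conversion assumes the speeds are in mph, which the slides do not state. Re-expressing speed in km/h changes the covariance even though the drivers and the relationship are the same.

```python
# Speed-trap sample from the table.
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]

def covariance(xs, ys):
    """Mean cross-product of deviations (divides by N, not N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / n

cov_original = covariance(age, speed)

# Assumption for illustration: speeds are in mph; convert them to km/h.
MPH_TO_KMH = 1.609344
speed_kmh = [s * MPH_TO_KMH for s in speed]
cov_rescaled = covariance(age, speed_kmh)

print(round(cov_original, 1))   # -31.4
print(round(cov_rescaled, 1))   # -50.5, same drivers, different units
```

The number depends on the measurement units of both variables, so its size alone says little about the strength of the relationship.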
• Let's standardize it
  ◦ Divide by standard deviation of X and Y
$$\frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}$$
• When you standardize the covariance, what you get is r
• The units cancel out!
$$\frac{-157}{(5)(9.23)(4.42)} = -.77$$
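The same calculation can be sketched in Python as a check on the arithmetic. This uses `statistics.pstdev` from the standard library, which divides by N and so matches the descriptive (sample-only) use here; the variable names are my own.

```python
from statistics import pstdev

# Speed-trap sample from the table.
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]

n = len(age)
mean_age = sum(age) / n
mean_speed = sum(speed) / n

# Covariance: sum of cross-products divided by N.
covariance = sum(
    (x - mean_age) * (y - mean_speed) for x, y in zip(age, speed)
) / n

# Population standard deviations (divide by N), describing this sample only.
sd_age = pstdev(age)
sd_speed = pstdev(speed)

# Standardized covariance = correlation.
r = covariance / (sd_age * sd_speed)
print(round(r, 2))  # -0.77
```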
Cohen's guidelines for Corr
• These are for the behavioral sciences only
  ◦ ~.10 = small
  ◦ ~.30 = medium
  ◦ ~.50 = large
Interpret in words!
• Younger drivers tended to be driving faster when pulled over.
• Watch wording!
  ◦ Do interpret direction
  ◦ Don't use causal language
Being young doesn't CAUSE people to drive fast
r as a measure of effect size
• Variance in a variable can be partitioned
  ◦ Explained + unexplained = total
John Venn
$r^2$ is proportion of variance explained
• $r_{age,speed} = -.77$
• That means $(-.77)^2 = .59$ (59%) of variance in speed is explained by or predicted by age
  ◦ Not necessarily caused by age
• 1 - .59 = .41
• 41% of the variability in speed is NOT explained by age
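One way to make "explained" concrete is to predict speed from age with a best-fitting straight line and see how much variability is left over. The least-squares line is an assumption of this sketch, not something these slides cover, and the variable names are my own. Under that assumption, the explained share of the variability matches $r^2$.

```python
# Speed-trap sample from the table.
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]

n = len(age)
mean_age = sum(age) / n
mean_speed = sum(speed) / n

ss_age = sum((x - mean_age) ** 2 for x in age)
ss_speed = sum((y - mean_speed) ** 2 for y in speed)   # total variability
sum_cp = sum((x - mean_age) * (y - mean_speed) for x, y in zip(age, speed))

r = sum_cp / (ss_age * ss_speed) ** 0.5

# Assumed for illustration: least-squares line predicting speed from age.
slope = sum_cp / ss_age
intercept = mean_speed - slope * mean_age
predicted = [intercept + slope * x for x in age]

# Variability left over after prediction (unexplained).
ss_unexplained = sum((y - p) ** 2 for y, p in zip(speed, predicted))
proportion_explained = 1 - ss_unexplained / ss_speed

print(round(r ** 2, 2), round(proportion_explained, 2))  # 0.59 0.59
print(round(1 - r ** 2, 2))                              # 0.41
```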
Factors that affect size of r
• Biased samples
  ◦ Restricted range
    - Tends to make corr smaller
  ◦ Extreme groups
    - e.g., only use people really high or low on X
    - Tends to make corr larger
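Both effects can be illustrated with a small simulation. Everything here is hypothetical: the data are randomly generated, the seed and the cutoffs (0.5 and 1.5) are arbitrary choices, and `pearson_r` and `r_for_subset` are helper names introduced only for this sketch.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations around the means (divides by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

random.seed(302)  # arbitrary seed so the run is repeatable

# Simulated population-like sample: Y depends on X plus random noise.
N = 5000
x = [random.gauss(0, 1) for _ in range(N)]
y = [xi + random.gauss(0, 1) for xi in x]

def r_for_subset(keep):
    """Correlation using only the cases whose X value passes `keep`."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if keep(xi)]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)

r_full = pearson_r(x, y)
r_restricted = r_for_subset(lambda xi: abs(xi) < 0.5)   # middle of X only
r_extreme = r_for_subset(lambda xi: abs(xi) > 1.5)      # very low or very high X only

print(round(r_full, 2), round(r_restricted, 2), round(r_extreme, 2))
```

In this sketch the restricted-range correlation should come out smaller than the full-sample value, and the extreme-groups correlation larger.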
Design Issues
• N should be pretty big (about 30 or more)
  ◦ Small samples are too unstable
• Wide variability on X & Y
  ◦ No restriction of range
Factors that affect size of r
• Combined groups
  ◦ May mask relationship in each group
• r is very affected by outliers
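A small sketch of both points, using made-up numbers chosen to make the effects obvious (none of these values come from the slides, and `pearson_r` is a hypothetical helper). In the first part, each group has a perfect negative relationship, yet combining them gives a positive r. In the second, adding one extreme case turns an r of about zero into a very large one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations around the means (divides by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Combined groups (hypothetical data): within each group, Y falls as X rises.
group_a_x, group_a_y = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
group_b_x, group_b_y = [6, 7, 8, 9, 10], [10, 9, 8, 7, 6]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_combined = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(round(r_a, 2), round(r_b, 2), round(r_combined, 2))

# Outlier (hypothetical data): five cases with essentially no relationship...
base_x, base_y = [1, 2, 3, 4, 5], [2, 1, 3, 1, 2]
r_without = pearson_r(base_x, base_y)

# ...plus a single case that is extreme on both X and Y.
r_with = pearson_r(base_x + [20], base_y + [20])
print(round(r_without, 2), round(r_with, 2))
```

Looking at a scatterplot before computing r is the usual way to catch both situations.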