psychology 302

Hong666


Scatterplots and Correlation

Correlation

• Useful tool to assess relationships
• Must have two variables measured on one set of people

• Correlation measures only the strength of a linear association

Linear relationships are not perfect lines

• Variables have variability (duh)
• Relationships may be generally linear even if not all points fall on the line

Magnitude of r

Not all relationships are linear
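To see how a curved pattern can slip past r, here is a minimal Python sketch. The helper `pearson_r` is a hypothetical name written for illustration, not a library function, and the data are made up: Y = X² exactly, a perfect but U-shaped relationship, yet r comes out as 0.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviations around the means (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # sum of cross-products
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]      # perfect curved relationship: 4, 1, 0, 1, 4
print(pearson_r(x, y))       # 0.0
```

The positive and negative cross-products on the two arms of the U cancel, so r reports "no linear relationship" even though Y is completely determined by X.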


Properties of r

• X & Y must be quantitative
  ◦ Interval or ratio scale

• It doesn't matter which variable is the predictor and which is the response
  ◦ $r_{xy} = r_{yx}$


• Correlation has no units
  ◦ So r can be compared across different variables

• The value of r is always between -1 and +1
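The properties of r can be checked numerically. This is a sketch under stated assumptions: `pearson_r` is a hypothetical helper written here, the age and speed values are the speed-trap data from these slides, and converting age to months is an invented change of units. Swapping X and Y, or rescaling X, leaves r unchanged, and perfectly straight lines hit the bounds of +1 and -1.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviation cross-products (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

age = [18, 24, 43, 34, 21]       # X: age in years (speed-trap data)
speed = [48, 51, 41, 39, 46]     # Y: speed

r_xy = pearson_r(age, speed)
r_yx = pearson_r(speed, age)                 # swap predictor and response
age_months = [12 * a for a in age]           # hypothetical change of units for X
r_months = pearson_r(age_months, speed)

print(round(r_xy, 2), round(r_yx, 2), round(r_months, 2))         # -0.77 -0.77 -0.77
print(pearson_r([1, 2, 3], [2, 4, 6]), pearson_r([1, 2, 3], [6, 4, 2]))  # 1.0 -1.0
```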

Computing r

• Consider deviations around the means of X and Y
• $(X - \bar{X})$ and $(Y - \bar{Y})$

Cross-Product

• To consider X & Y together, multiply their deviations

• $(X - \bar{X})(Y - \bar{Y})$
• Sign will be positive (both scores on the same side of their means) or negative (opposite sides)

• The sum of cross-products indicates the overall relationship

• $\sum (X - \bar{X})(Y - \bar{Y})$

• Mostly positive, sum = positive
• Mostly negative, sum = negative
• About equal positive & negative, sum ≈ 0

• The sum of cross-products can get very big if N is large
• So let's adjust for N:

$\frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$

• This is called the Covariance
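The deviation, cross-product, and covariance steps can be traced in a few lines of Python. This sketch uses the speed-trap data from the table that follows and divides by N, matching the slides' choice to describe this sample only; variable names are illustrative.

```python
age = [18, 24, 43, 34, 21]      # X (speed-trap data)
speed = [48, 51, 41, 39, 46]    # Y
n = len(age)

mean_x = sum(age) / n           # 28.0
mean_y = sum(speed) / n         # 45.0

# Deviation of each score from its own mean
dev_x = [x - mean_x for x in age]
dev_y = [y - mean_y for y in speed]

# Cross-product for each person: positive when both deviations share a sign
cross = [dx * dy for dx, dy in zip(dev_x, dev_y)]

sum_cp = sum(cross)             # sum of cross-products
covariance = sum_cp / n         # divide by N: describes this sample only

print(cross)                    # [-30.0, -24.0, -60.0, -36.0, -7.0]
print(sum_cp, covariance)       # -157.0 -31.4
```

Every cross-product here is negative, so the sum (and therefore the covariance) is negative.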


Bill McDonald Speed Trap

Subject   X (Age)   Y (Speed)   $X - \bar{X}$   $Y - \bar{Y}$   $(X - \bar{X})(Y - \bar{Y})$
1         18        48          -10             3               -30
2         24        51          -4              6               -24
3         43        41          15              -4              -60
4         34        39          6               -6              -36
5         21        46          -7              1               -7
Sum       140       225         0               0               -157
Mean      28        45
SD        9.23      4.43

These values describe just this sample; we are not estimating the population, so the SDs and the covariance divide by N rather than N - 1.

• Covariance = -157/5 = -31.4

• What does that mean?
  ◦ Is it big?
  ◦ Is it small?
  ◦ What are the units?

• We need to get this on a scale that makes sense!

• Let's standardize it
  ◦ Divide by the standard deviations of X and Y

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N S_X S_Y}$

• When you standardize the covariance, what you get is r
• The units cancel out!

$r = \frac{-157}{(5)(9.23)(4.43)} = -.77$
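The standardizing step can be reproduced from the raw speed-trap scores. This is a minimal sketch: the SDs divide by N to match the sample-only covariance, and the variable names are illustrative.

```python
import math

age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]
n = len(age)
mean_x, mean_y = sum(age) / n, sum(speed) / n

# Sample-descriptive SDs: divide by N, matching the covariance
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in age) / n)
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in speed) / n)

covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, speed)) / n
r = covariance / (sd_x * sd_y)   # years and speed units cancel, leaving a pure number

print(round(sd_x, 2), round(sd_y, 2), round(r, 2))   # 9.23 4.43 -0.77
```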

Cohen's guidelines for correlations

• These are for the behavioral sciences only
  ◦ They apply to the size of r, ignoring its sign (the sign shows direction, not strength)
  ◦ ~.10 = small
  ◦ ~.30 = medium
  ◦ ~.50 = large
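Applying the benchmarks amounts to a simple lookup on the absolute value of r. In this sketch the function name `cohen_label` is hypothetical, and treating .10, .30, and .50 as hard lower cut-offs is a simplifying assumption, since the guidelines themselves are approximate ("~").

```python
def cohen_label(r):
    """Rough verbal size label for a correlation using Cohen's benchmarks.

    Assumption: .10/.30/.50 are treated as lower cut-offs on |r|.
    The sign is ignored because it describes direction, not size.
    """
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"   # hypothetical label; no benchmark is given here

print(cohen_label(-0.77))   # large
```

For the speed-trap data, r = -.77 is a large negative correlation: the label describes its size and the sign describes its direction.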

Interpret in words!

• Younger drivers tended to be driving faster when pulled over.

• Watch wording!
  ◦ Do interpret direction
  ◦ Don't use causal language

Being young doesn't CAUSE people to drive fast


r as a measure of effect size

• Variance in a variable can be partitioned
  ◦ Explained + unexplained = total

[Figure: John Venn]

$r^2$ is the proportion of variance explained

• $r_{\text{age,speed}} = -.77$
• That means $(-.77)^2 = .59$ (59%) of the variance in speed is explained, or predicted, by age
  ◦ Not necessarily caused by age
• $1 - .59 = .41$
• 41% of the variability in speed is NOT explained by age
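The explained/unexplained split can be made concrete with a least-squares line predicting speed from age. The regression line is brought in here only as an illustration (it is not part of these slides); the sketch shows that the share of total variation captured by the line equals $r^2$, and the remainder equals $1 - r^2$.

```python
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]
n = len(age)
mx, my = sum(age) / n, sum(speed) / n

sp = sum((x - mx) * (y - my) for x, y in zip(age, speed))   # sum of cross-products: -157
ss_x = sum((x - mx) ** 2 for x in age)                      # 426
ss_total = sum((y - my) ** 2 for y in speed)                 # total variation in speed: 98

# Least-squares line predicting speed from age (illustration only)
slope = sp / ss_x
predicted = [my + slope * (x - mx) for x in age]

ss_explained = sum((p - my) ** 2 for p in predicted)                  # captured by the line
ss_unexplained = sum((y - p) ** 2 for y, p in zip(speed, predicted))  # left over

r = sp / (ss_x * ss_total) ** 0.5
print(round(r ** 2, 2), round(ss_explained / ss_total, 2))   # 0.59 0.59
print(round(ss_unexplained / ss_total, 2))                   # 0.41
```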

Factors that affect size of r

• Biased samples
  ◦ Restricted range: tends to make the correlation smaller
  ◦ Extreme groups (e.g., only using people really high or low on X): tends to make the correlation larger
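Both biased-sample effects show up in a quick simulation. The population is hypothetical (standard-normal X with Y = 0.5·X + noise, so the true r is about .50), and the cut-offs of |X| < 0.5 for a restricted range and |X| > 1.5 for extreme groups are arbitrary choices for illustration.

```python
import random

def pearson_r(x, y):
    """Pearson r from deviation cross-products (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

def r_for(selected):
    """Correlation within a subset of (x, y) pairs."""
    x, y = zip(*selected)
    return pearson_r(x, y)

random.seed(302)
# Hypothetical population: Y = 0.5*X + noise, so the true r is about .50
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [0.5 * x + random.gauss(0, 0.75 ** 0.5) for x in xs]
pairs = list(zip(xs, ys))

full = r_for(pairs)
restricted = r_for([p for p in pairs if abs(p[0]) < 0.5])   # only middling X values
extreme = r_for([p for p in pairs if abs(p[0]) > 1.5])      # only very low or very high X

print(f"full {full:.2f}, restricted {restricted:.2f}, extreme groups {extreme:.2f}")
```

The same underlying relationship looks weaker when X barely varies and stronger when the middle of the X range is thrown away.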

Design Issues

• N should be very big (≥ 30)
  ◦ Small samples are too unstable

• Wide variability on X & Y
  ◦ No restriction of range
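Why small samples are unstable can be seen by drawing many samples from one population and watching how much r moves around. This sketch assumes the same hypothetical population as above (true r about .50); the sample sizes and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson r from deviation cross-products (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

def simulate_rs(n, reps=2000):
    """Draw `reps` samples of size n from a hypothetical population with r of about .50."""
    rs = []
    for _ in range(reps):
        xs = [random.gauss(0, 1) for _ in range(n)]
        ys = [0.5 * x + random.gauss(0, 0.75 ** 0.5) for x in xs]
        rs.append(pearson_r(xs, ys))
    return rs

random.seed(302)
spread = {}
for n in (5, 30, 100):
    rs = simulate_rs(n)
    spread[n] = statistics.stdev(rs)   # how much r varies from sample to sample
    print(f"N = {n:3d}: r ranged from {min(rs):.2f} to {max(rs):.2f} (SD {spread[n]:.2f})")
```

With N = 5, sample values of r scatter widely around the population value (some samples even come out negative); the scatter shrinks steadily as N grows.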

Factors that affect size of r

• Combined groups
  ◦ May mask the relationship within each group

• r is strongly affected by outliers
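Both problems can be demonstrated with tiny data sets. The first part reuses the five speed-trap drivers and then adds ONE hypothetical driver (age 70, speed 80); the second part uses two invented groups of five people, each with a positive relationship, that are then pooled. None of the added values come from the slides.

```python
def pearson_r(x, y):
    """Pearson r from deviation cross-products (illustrative helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / (ss_x * ss_y) ** 0.5

# Outlier: the speed-trap data, then the same data plus one hypothetical driver
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]
r_original = pearson_r(age, speed)
r_outlier = pearson_r(age + [70], speed + [80])   # hypothetical 70-year-old at 80

# Combined groups: two hypothetical groups, each with a positive relationship
group_a_x, group_a_y = [1, 2, 3, 4, 5], [12, 11, 14, 13, 15]
group_b_x, group_b_y = [11, 12, 13, 14, 15], [2, 1, 4, 3, 5]
r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_combined = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(round(r_original, 2), round(r_outlier, 2))           # -0.77 0.73
print(round(r_a, 2), round(r_b, 2), round(r_combined, 2))  # 0.8 0.8 -0.87
```

A single extreme point flips r from -.77 to about +.73, and pooling two groups that are each positively related produces a strong negative r overall. Plotting the data first (a scatterplot) catches both problems.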