
WEEK 7 – HW 7 FALL 2017

CORRELATION/REGRESSION PROBLEMS BASED ON LANE CHAPTER 14 AND ILLOWSKY CHAPTER 12

ALWAYS REMEMBER THAT EVEN A PERFECT CORRELATION DOES NOT PROVE AN ACTUAL RELATIONSHIP BETWEEN TWO VARIABLES. OTHER SOLID (FACTUAL) EVIDENCE MUST SUPPORT SUCH A RELATIONSHIP AS WELL. COMMON SENSE MAY BE INCLUDED IF UNBIASED.

#1 Write the equation for a regression line and explain what each term refers to.

#2 What does the standard error of the estimate measure? What is its formula?

#3 For the X,Y data below: (a) calculate r and determine whether it is significant; (b) test whether the slope of the regression line differs significantly from zero; (c) calculate the 95% confidence interval for the slope.

X     4     3     5    11    10    14
Y     6     7    12    17     9    21
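One way to check a hand calculation for #3 is a short Python sketch like the one below. It uses only the standard library, so the critical t value is typed in from a t table rather than computed: the constant T_CRIT = 2.776 (two-tailed, α = 0.05, df = n – 2 = 4) is an assumption taken from a standard table, and all variable names are illustrative only. The standard error of the estimate here divides by n – 2; check your textbook's convention.

```python
import math

# Data from problem #3
x = [4, 3, 5, 11, 10, 14]
y = [6, 7, 12, 17, 9, 21]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
b = sxy / sxx                    # slope of the least squares line
a = mean_y - b * mean_x          # intercept

df = n - 2
t_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)   # t statistic for testing r

# Standard error of the estimate (n - 2 version), then of the slope
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_est = math.sqrt(sse / df)
se_b = s_est / math.sqrt(sxx)
t_b = b / se_b                   # t statistic for testing slope = 0

T_CRIT = 2.776  # ASSUMPTION: t table value, two-tailed, alpha = 0.05, df = 4
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b

print(f"r = {r:.4f}, t for r = {t_r:.3f}, significant: {abs(t_r) > T_CRIT}")
print(f"slope = {b:.4f}, SE = {se_b:.4f}, t for slope = {t_b:.3f}")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

In simple linear regression the t statistic for r and the t statistic for the slope come out the same, which is a handy check on parts (a) and (b).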

#4 The various inferential statistics of linear regression are based on what assumptions?

#5 (a) True/false: If the slope of the regression line is significant, then the correlation is also significant.

(b) True/false: The slope of X versus Y is larger for Population 1 than for Population 2; hence the correlation is larger in Population 1 than in Population 2. Explain.

(c) True/false: 40% of the variance is explained if the correlation is 0.8.

(d) True/false: If the actual Y score was 31, but the predicted score was 28, then the error of prediction is 3.

#6 The data in the table show home values (in $ millions) and the taxes (in $ millions) on those homes.

HOME VALUE ($M)   0.6    0.75   1.0    1.5    2.0    2.5    3.0
TAXES ($M)        0.03   0.09   0.20   0.44   0.69   1.04   1.35

(a) Which is the INDEPENDENT variable (X) and which is the DEPENDENT variable (Y)?

(b) DRAW A SCATTER PLOT OF THESE TWO VARIABLES (X VS Y). DOES IT SHOW A RELATIONSHIP VISUALLY?

(c) Calculate the regression line (least squares line) and fill in the “a” and “b” in the equation: Y-hat = a + b*X

(d) Calculate the CORRELATION COEFFICIENT and DETERMINE if it is SIGNIFICANT.

(e) Using your regression equation, what does it estimate the taxes to be on a $1 million home and on a $2.5 million home? What would be the predicted taxes on a home with NO VALUE (e.g., severe water damage)? How do these predicted taxes compare with the actual ones in the table?

(f) Interpret the SLOPE of this least squares line.
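For #6 (c) and (e), the least squares “a” and “b” and the predicted taxes can be sketched in plain Python as follows. The helper names least_squares and predict are made up for this illustration (they are not from any library), and the output is meant as a check on your own work, not a substitute for it.

```python
# Data from problem #6, both in $ millions
home_value = [0.6, 0.75, 1.0, 1.5, 2.0, 2.5, 3.0]    # X
taxes = [0.03, 0.09, 0.20, 0.44, 0.69, 1.04, 1.35]   # Y


def least_squares(x, y):
    """Return (a, b) for the least squares line Y-hat = a + b*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def predict(a, b, x_value):
    """Predicted Y (taxes) for a given X (home value)."""
    return a + b * x_value


a, b = least_squares(home_value, taxes)
print(f"Y-hat = {a:.4f} + {b:.4f} * X")

# Part (e): $1 million home, $2.5 million home, and a home with no value
for value in (1.0, 2.5, 0.0):
    print(f"home value {value:.2f} -> predicted taxes {predict(a, b, value):.3f}")
```

Note that the prediction at X = 0 is simply the intercept “a”, so look at its sign when you answer the NO VALUE part of (e).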

THIS NEXT STATISTICAL TOOL ALLOWS US TO COMPARE TWO SETS OF DATA TO SEE IF THEY ARE SIMILAR OR NOT. WE ARE ACTUALLY COMPARING THE STANDARD DEVIATIONS OF THESE TWO DATA SETS. IT COULD INVOLVE COMPARING SAMPLES FROM TWO DIFFERENT POPULATIONS OR COMPARING WHAT IS OBSERVED IN A SINGLE POPULATION VERSUS WHAT IS EXPECTED TO BE OBSERVED IN THAT POPULATION. SOUND USEFUL?

THE DOWNSIDE IS THAT WE USE YET ANOTHER DISTRIBUTION: THE CHI-SQUARE (X2 OR X^2) DISTRIBUTION, WHICH ALSO HAS ITS OWN TABLE (AVAILABLE IN COURSE RESOURCES). BUT, AS ALWAYS, WE WILL BE COMPARING A X2 TEST STATISTIC TO ITS CRITICAL VALUE FOR THE DEGREES OF FREEDOM (LIKE THE t-TEST) AND THE LEVEL OF SIGNIFICANCE (α, TYPICALLY 5% OR 0.05).

#7 HS Sophomore students organized a pep-rally raffle and 36 students from all years participated. The winners included: 6 freshmen, 14 sophomores, 9 juniors and 7 seniors. The school population is actually 30% freshmen, 25% sophomores, 25% juniors, and 20% seniors. As a freshman you feel the sophomores cheated somehow. Prove it (one way or the other).

This is another set-up-the-table problem, where we simply compare two sets of data to see whether they are statistically similar (=) or not (as in a hypothesis test).

CLASS            OBSERVED (O)   EXPECTED (E)   (O – E)^2 / E
Freshmen
Sophomores
Juniors
Seniors
test statistic                  X2 = SUM: >>

The “SUM” in this Table is the X2 TEST STATISTIC that you compare to the critical value for the level of significance you want (α = 1% or 5% or 10%, for example). ASSUME 5% HERE. YOU ALSO NEED TO FIGURE OUT THE DEGREES OF FREEDOM HERE (df = N – 1, WHERE N IS THE NUMBER OF CATEGORIES, NOT THE NUMBER OF STUDENTS).

(a) FIND THE CRITICAL VALUE (FOR THIS α AND df) AND COMPARE IT TO THE TEST STATISTIC TO SEE IF YOU REJECT OR FAIL TO REJECT Ho.

(b) USE THE SOFTWARE TO SIMPLY GET THE PROBABILITY FOR THE X2 TEST STATISTIC VALUE (the “SUM”) YOU CALCULATED IN THE TABLE ABOVE. BOTTOM LINE: DID THE SOPHOMORES CHEAT?
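The table set-up for #7 can be sketched in Python as shown below, as a check on your hand-filled table. Two things here are assumptions rather than part of the assignment: the critical value CHI2_CRIT = 7.815 is typed in from a standard X2 table (df = 3, α = 0.05), and the helper chi2_sf_df3 is a made-up name for a closed-form right-tail probability that is valid only when df = 3. For any other df, use the course software or an on-line calculator instead.

```python
import math

# Observed raffle winners and the school's class proportions (problem #7)
observed = {"Freshmen": 6, "Sophomores": 14, "Juniors": 9, "Seniors": 7}
school_share = {"Freshmen": 0.30, "Sophomores": 0.25,
                "Juniors": 0.25, "Seniors": 0.20}

total = sum(observed.values())

# Expected count for each class = total winners * that class's share
expected = {k: total * share for k, share in school_share.items()}

# X2 test statistic = sum of (O - E)^2 / E over the classes
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1   # number of categories minus 1

CHI2_CRIT = 7.815  # ASSUMPTION: X2 table value for df = 3, alpha = 0.05


def chi2_sf_df3(x):
    """Right-tail probability P(X2 > x). Closed form valid for df = 3 ONLY."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_value = chi2_sf_df3(chi2)
reject_h0 = chi2 > CHI2_CRIT

for k in observed:
    print(f"{k:<11} O = {observed[k]:>2}  E = {expected[k]:.1f}")
print(f"X2 = {chi2:.4f}, df = {df}, p = {p_value:.4f}, reject Ho: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with α should always lead to the same decision.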

#8 For whatever reason a restaurant wants to know if men order the same varieties of breakfast items as women. Here are their data:

          French Toast    Pancakes        Waffles         Omelets         TOTALS
Men       O: 47  E:       O: 35  E:       O: 28  E:       O: 53  E:
Women     O: 65  E:       O: 59  E:       O: 55  E:       O: 60  E:
TOTALS

DO THEY INDEED HAVE DIFFERENT PREFERENCES? Assume the level of significance (α) is 5% (0.05). WHAT ARE YOUR Ho AND Ha? (THE X2 TEST IS A RIGHT-TAILED TEST. THE X2 CURVE IS DEFINITELY POSITIVELY SKEWED BUT GETS MORE “NORMAL” LOOKING AS THE df INCREASE. CHECK IT OUT.)

The data Table above has the additional “E” spaces necessary for calculating the X2 test statistic, which you will then compare to the critical X2 value from a Table (in Course Resources or on-line). Also, get the probability for that test statistic from an on-line X2 probability calculator.

This problem is a TEST FOR HOMOGENEITY (similar to the TEST FOR INDEPENDENCE). CHECK ILLOWSKY SECTIONS 11.3 AND 11.4 FOR X2 TEST STATISTIC CALCULATION DETAILS (SEE EXAMPLE PROBLEMS, PARTICULARLY IN 11.3). IT LOOKS COMPLICATED, BUT IT ISN’T. THE KEY IS GETTING THE “TOTALS” AND USING THEM APPROPRIATELY TO GET THE “E” VALUES FOR EACH BOX. DON’T FORGET THE DEGREES OF FREEDOM (df) (A SLIGHTLY DIFFERENT CALCULATION IS INVOLVED).

ONE REQUIREMENT IS THAT THE “EXPECTED” VALUE FOR EVERY BOX MUST BE AT LEAST 5 (WHICH IS THE CASE FOR THE ABOVE PROBLEM).

YOU HAVE THE “O”BSERVED VALUES AND ONCE YOU CALCULATE THE “E”XPECTED VALUES JUST SET UP THE X2 CALCULATION TABLE:

BKFST CHOICE     O      E      (O – E)^2 / E
M-FT
W-FT
you do the rest (need more rows; the last one is the total of (O – E)^2 / E, which is the X2 test value)
Σ = X2 (TEST STATISTIC)
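The “TOTALS” step for #8 can be sketched in Python as follows. It assumes the usual rule for this kind of table, E = (row total × column total) / grand total for each box and df = (rows – 1) × (columns – 1). Check both against the Illowsky examples before relying on them. The variable names are illustrative only.

```python
# Observed counts from problem #8, in the column order of the data table
items = ["French Toast", "Pancakes", "Waffles", "Omelets"]
observed = {
    "Men":   [47, 35, 28, 53],
    "Women": [65, 59, 55, 60],
}

# The "TOTALS" row and column of the table
row_totals = {g: sum(counts) for g, counts in observed.items()}
col_totals = [sum(observed[g][j] for g in observed) for j in range(len(items))]
grand_total = sum(row_totals.values())

# Expected count for each box = (row total * column total) / grand total
expected = {
    g: [row_totals[g] * col_totals[j] / grand_total for j in range(len(items))]
    for g in observed
}

# X2 test statistic = sum of (O - E)^2 / E over all eight boxes
chi2 = sum(
    (observed[g][j] - expected[g][j]) ** 2 / expected[g][j]
    for g in observed
    for j in range(len(items))
)
df = (len(observed) - 1) * (len(items) - 1)

for g in observed:
    for j, item in enumerate(items):
        print(f"{g:<6} {item:<13} O = {observed[g][j]:>2}  E = {expected[g][j]:.2f}")
print(f"X2 = {chi2:.4f}, df = {df}")
```

The printed X2 value is then compared with the table critical value for this df and α = 0.05, exactly as in #7.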

THAT’S ENOUGH FOR THIS WEEK. GET READY FOR A BUSY WEEK, NEXT WEEK. THE FINAL EXAM WILL APPEAR AT 12:01 AM FRIDAY MORNING AND YOU HAVE UNTIL 11:59 PM THE FOLLOWING SUNDAY TO SUBMIT IT.