© Charles T. Diebold, Ph.D., 7/11/13, 10/03/13. All Rights Reserved. Page 1 of 11

Multiple Linear Regression Tutorial:

RSCH-8250 Advanced Quantitative Reasoning

Charles T. Diebold, Ph.D.

July 11, 2013 (revised October 3, 2013)

How to cite this document:

Diebold, C. T. (2013, October 3). Multiple linear regression tutorial: RSCH-8250 advanced quantitative

reasoning. Available from [email protected].

Contents:

Assignment and Tutorial Introduction
Section 1: SPSS Specification of the Assignment
Section 2: Annotated Example SPSS Output, Write Up Guide, and Sample APA Tables
    Descriptive Statistics
    Bivariate Correlation Matrix
    Regression Method
    Model Summary
    ANOVA F Test of the Omnibus Regression
    Coefficients Output
    Results Write Up Guide

Assignment and Tutorial Introduction

This tutorial is intended to assist RSCH8250 students in completing the Week 6 application assignment. I

recommend that you use this tutorial as your first line of instruction; then, if you have time, study the

textbook chapter and other resources noted in the classroom.

3rd edition of Field textbook:

Chapter 7 in the Field textbook, Smart Alex's Task #1 on p. 262, using the Supermodel.sav SPSS dataset.

4th edition of Field textbook:

Chapter 8 in the Field textbook, Smart Alex's Task #4 on p. 355, using the Supermodel.sav SPSS dataset.

The objective of the exercise is to conduct and interpret a standard multiple regression, including assessment of

multicollinearity.

The tutorial contains two sections. Section 1 provides step-by-step graphical user interface (GUI) screenshots for

specifying the assignment in SPSS. If you follow the steps you will produce correct SPSS output. Section 2

presents and interprets output for a different set of variables, and includes a results write up guide, and sample

APA style tables (the variables and data in Section 2 are “made up” and do not reflect real research).

Section 1: SPSS Specification of the Assignment

The assignment asks you to regress the per day salary of models (SALARY) on model’s age (AGE), number of

years having worked as a model (YEARS), and a rating of the model’s attractiveness (BEAUTY). The

capitalized words are the respective variable names in the Supermodel.sav SPSS dataset.

Open the dataset. The Variable View screenshot is shown below. There are four variables in the dataset,

corresponding to the four variables described in the previous paragraph.

Go to AnalyzeDescriptive StatisticsDescriptives.

The Descriptives dialogue box appears (below left). Select all four variables and move into the Variable(s) box

(below right).

Click the OK button, which will produce output with the minimum, maximum, mean, and standard deviation

values for each variable.

Go to AnalyzeRegressionLinear as shown below.

The Linear Regression dialogue box appears (below left). We want to predict salary, so it is the dependent

variable; click on it to highlight, then click the arrow next to the Dependent box, which will move salary into

the box. The other three variables are being used to predict salary, so they are independent variables; select and

move each one into the Independent(s) box. When done, it should look like the screen below right.

Below the Independent(s) box is the word “Method” and a dropdown box with different ways to specify the

entry of the predictors. For this assignment, leave it as “Enter”, which will force all predictors into the analysis

at the same time, regardless of whether they are statistically significant. The “Enter” method represents what is

referred to as a standard regression.

In the Linear Regression dialogue box (see previous

screenshot), there is a column of buttons along the upper

right. Click the “Statistics” button; a new dialogue box will

appear. Click the boxes so that checkmarks appear for each

of the elements as shown at left.

For the purposes of this assignment, there is no need to

examine the dialogue boxes for Plots, Save, Options, or

Bootstrap. Even though Field discusses some regression

diagnostics, these are (except for multicollinearity) beyond

the level of this course.

So, once you have specified the statistics at left, click the

Continue button, which will return you to the Linear

Regression dialogue, in which clicking the OK button will

run the analysis and produce adequate output for the

assignment. Example output is shown and interpreted in the

next section.

Section 2: Annotated Example SPSS Output, Write Up Guide, and Sample APA Tables

The example output shown below uses variables different from the Week 6 assignment. The purpose is to

explain key elements of the output, point out what to focus on, and demonstrate how to interpret and report the

results in APA statistical style.

The criterion (aka dependent variable, what we are trying to predict) is overall grade point average (GPA) of 9th

grade students. The predictors (aka independent variables) are intelligence quotient (IQ), grade earned in an

English course (ENGG), and a measure of attention deficit (ADDSC).

Descriptive Statistics

As shown in the descriptive statistics output (from the DESCRIPTIVES procedure in SPSS), data had been

collected on 216 individuals. The minimum, maximum, mean, and standard deviation of each variable are

provided. Reporting on the operationalization of each variable and the observed values in the sample gives the

reader insight into the variable being analyzed. For example: “Attention deficit was measured on a scale of 0 to

100 with higher scores indicating more pronounced attention deficit symptomatology. In the sample of 9th grade

students, attention deficit scores ranged from 24.67 to 76.67 with a mean of 52.85 (SD = 10.45).” The

regression procedure will also produce a descriptive statistics table, but it does not include the minimum and

maximum values.

Descriptive Statistics

                        N  Minimum  Maximum      Mean  Std. Deviation
GPA                   216      .25     4.00    2.4386          .84507
IQ                    216    55.00   137.00  102.3542        12.55762
ENGG                  216      .00     4.00    2.4954          .90988
ADDSC                 216    24.67    76.67   52.8480        10.45221
Valid N (listwise)    216

Bivariate Correlation Matrix

From the regression output, the correlations table indicates the bivariate correlations and one-tailed p values of

each pair of variables. There was a statistically significant inverse relationship between GPA and attention

deficit score, r(214) = -.542, p < .001 one-tailed, indicating that as attention deficit increased, GPA tended to

decrease. You can similarly report the other two bivariate correlations with GPA, but keep in mind that these

are just descriptive because the focus is on the multiple regression results. As an FYI, the 214 in the parentheses in

the example above is the df value. For correlations, the df value is N – 2. The table below indicates that N = 216,

so df = 216 – 2 = 214.

Correlations

                                GPA    ENGG   ADDSC      IQ
Pearson Correlation  GPA      1.000    .746   -.542    .446
                     ENGG      .746   1.000   -.445    .283
                     ADDSC    -.542   -.445   1.000   -.629
                     IQ        .446    .283   -.629   1.000
Sig. (1-tailed)      GPA         .     .000    .000    .000
                     ENGG      .000      .     .000    .000
                     ADDSC     .000    .000      .     .000
                     IQ        .000    .000    .000      .
N                    GPA        216     216     216     216
                     ENGG       216     216     216     216
                     ADDSC      216     216     216     216
                     IQ         216     216     216     216
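For readers who want to check the df arithmetic outside SPSS, here is a minimal Python sketch. The function names are my own, and the t formula for a correlation is a standard textbook result that this tutorial does not otherwise use, so treat it as an optional extra rather than part of the assignment.

```python
import math


def correlation_df(n):
    """Degrees of freedom for a Pearson correlation: N - 2."""
    return n - 2


def correlation_t(r, n):
    """t statistic for testing whether r differs from 0 (standard formula, not from the tutorial)."""
    df = correlation_df(n)
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


# Values taken from the Correlations table (GPA with ADDSC).
n = 216
r_gpa_addsc = -0.542

df = correlation_df(n)
t_value = correlation_t(r_gpa_addsc, n)
print(f"r({df}) = {r_gpa_addsc:.3f}, t = {t_value:.2f}")
```

The df value printed in the parentheses is the same 214 used in the reporting example above.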

Regression Method

The output below simply informs us that all three variables were entered simultaneously, which is what had to

happen because we had specified the “Enter” method. In the results write up you just need to identify the

method used. Such as: “The purpose of the standard regression analysis was to examine the combined and

relative effects of 9th grade students’ IQ, English grade, and attention deficit score in predicting overall GPA.”

The term standard regression means that all predictors were entered simultaneously. Two other common

methods are statistical regression (aka stepwise regression) in which variables enter according to level of

significance, and sequential regression (aka hierarchical regression) in which the analyst decides and specifies

the order of entry of each variable.

Variables Entered/Removed (a)

Model  Variables Entered    Variables Removed  Method
1      IQ, ENGG, ADDSC (b)  .                  Enter

a. Dependent Variable: GPA
b. All requested variables entered.

Model Summary

For a standard regression the information highlighted in yellow in the Model Summary output is relevant (the other information is somewhat

redundant to what we will pull out of the ANOVA output that follows). In this example, the three predictors combined explained 62.9% of the

variance in overall GPA. R² is the sample result, and adjusted R² is an estimate for the population; the sample result is typically reported.

Model Summary

                                                                          Change Statistics
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .793 (a)  .629      .624               .51832                      .629             119.836   3    212  .000

a. Predictors: (Constant), IQ, ENGG, ADDSC

ANOVA F Test of the Omnibus Regression

The ANOVA output provides the test of statistical significance of the regression. In this example, the combined effect of student’s IQ, English

grade, and attention deficit score statistically significantly predicted overall GPA, F(3, 212) = 119.84, p < .001, R² = .63. The output shows

.000 in the Sig column, but probability cannot be zero, so in such cases, report as p < .001 (which is APA style); do not report p = .000. To be

clear, ignore Dr. Morrow’s reporting of p = .000 in her videos and, instead, follow APA style.

FYI for the inquisitive:

The regression sum of squares of 96.585 is the explained variance in GPA. The residual sum of squares of 56.955 is the variance in GPA that

was not explained by the three predictors. The sum of these is the total sum of squares. If you divide the regression sum of squares by the total

sum of squares you get the proportion of variance explained, which is R² = 96.585 ÷ 153.540 = .629. R, which in this example is .793, is the

correlation between predicted GPA and actual GPA. That is, if you saved the predicted GPA scores from the regression analysis and then did a

correlation between those predicted scores and the original GPA scores that we were predicting, the correlation would be .793. For multiple

regression, an R value of .14 is considered a small effect, .36 a medium effect, and .51 a large effect; these correspond to the R² values of .02,

.13, and .26, respectively.

ANOVA (a)

Model             Sum of Squares    df  Mean Square         F  Sig.
1  Regression             96.585     3       32.195   119.836  .000 (b)
   Residual               56.955   212         .269
   Total                 153.540   215

a. Dependent Variable: GPA
b. Predictors: (Constant), IQ, ENGG, ADDSC
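The relationships among the ANOVA and Model Summary values can be verified by hand. The short Python sketch below does so using only the sums of squares, N, and the number of predictors from the output; the variable names are my own, and the adjusted R² formula is the usual textbook one, which I am assuming is what SPSS applies here because it reproduces the .624 shown.

```python
import math

# Values copied from the ANOVA output above.
ss_regression = 96.585
ss_residual = 56.955
n = 216          # sample size
k = 3            # number of predictors

ss_total = ss_regression + ss_residual

# Proportion of variance explained: regression SS divided by total SS.
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)

# Adjusted R-squared (usual textbook formula; an assumption about SPSS internals).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# F is the regression mean square divided by the residual mean square.
df_regression = k
df_residual = n - k - 1
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R = {multiple_r:.3f}, R2 = {r_squared:.3f}, adjusted R2 = {adjusted_r_squared:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_value:.2f}")
```

Small differences in the last decimal place are possible because the output values are themselves rounded.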

Coefficients Output

The Coefficients output details the effects of each predictor while holding the other predictors constant (or, said another way, while controlling

for the other predictors). That is, standard multiple regression is not about the isolated effect of a predictor on the criterion, but the effect of a

predictor on the criterion while simultaneously considering the effect of the other predictors and the correlations among the predictors.

Correlations. To illustrate this point, look at the zero-order and part correlations columns. The zero-order is the simple correlation between a

predictor and the criterion. For example, the simple correlation between English grade and overall GPA is shown as .746, which is the same as

was shown in the previous correlations output. The part correlation (aka semipartial correlation, which is the more common term in the

literature, and is the term I will use hereafter) indexes the unique relationship between the predictor and criterion that none of the other

predictors explains. That is, when the predictors are correlated, some of the variance in the criterion is explained by more than one predictor;

the semipartial correlation filters out any shared explanation by predictors, leaving only each predictor’s unique contribution.

If predictors are correlated, the semipartial will, absent suppression effects, be smaller than the simple zero-order correlation (if predictors are uncorrelated, a rare

event, the zero-order and semipartial will be equal). In this example, the semipartial correlation between English grade and overall GPA is .563,

much less than the simple correlation of .746. The semipartial squared (sr²), a commonly reported effect size, is the proportion of variance in

the criterion uniquely accounted for by the predictor; so, English grade uniquely accounted for 31.7% of the variance in overall GPA.

The interpretation of the partial correlation is not as straightforward as the semipartial correlation. In addition to the unique variance accounted

for, the partial correlation attributes to each predictor its relative proportion of explained variance in the criterion that is shared with other

predictors. When predictors are correlated, the partial correlation will always be higher than the semipartial correlation.

Relative Importance of Predictors. If interested in rank ordering the relative importance of each predictor, this is best done using sr²

or the absolute value of the semipartial correlations (Tabachnick & Fidell, 2007). In this example, order of variable importance is English grade

(sr² = .317), IQ (sr² = .017), then attention deficit score (sr² = .013).

Coefficients (a)

              Unstandardized       Standardized                  95.0% Confidence           Correlations                 Collinearity
              Coefficients         Coefficients                  Interval for B                                          Statistics
Model               B  Std. Error          Beta        t   Sig.   Lower Bound  Upper Bound  Zero-order  Partial   Part   Tolerance    VIF
1  (Constant)    .477        .580                   .823   .412         -.666        1.620
   ENGG          .584        .043          .628   13.451   .000          .498         .669        .746     .679   .563        .802  1.247
   ADDSC        -.013        .005         -.156   -2.704   .007         -.022        -.003       -.542    -.183  -.113        .527  1.897
   IQ            .011        .004          .170    3.160   .002          .004         .019        .446     .212   .132        .605  1.654

a. Dependent Variable: GPA
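The sr² values and the importance ranking discussed above come straight from the Part column. A minimal Python sketch of that computation follows; the dictionary and variable names are my own and are only for illustration.

```python
# Part (semipartial) correlations copied from the Coefficients output.
part_correlations = {"ENGG": 0.563, "ADDSC": -0.113, "IQ": 0.132}

# Squaring each semipartial gives the proportion of criterion variance
# uniquely accounted for by that predictor.
sr_squared = {name: sr ** 2 for name, sr in part_correlations.items()}

# Rank predictors from most to least unique variance explained.
ranked = sorted(sr_squared, key=sr_squared.get, reverse=True)

for name in ranked:
    print(f"{name}: sr2 = {sr_squared[name]:.3f}")
```

Because squaring removes the sign, ranking by sr² and ranking by the absolute value of the semipartial correlation give the same order.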

Unstandardized and Standardized Coefficients. The unstandardized coefficient, B, indexes the amount of raw score change in the criterion

for a 1-unit raw score change in the predictor. For example, while holding the other predictors constant, if English grade increases from 2.0 to

3.0 (or 2.4 to 3.4, or 1.7 to 2.7; in other words, any 1-unit increase), overall GPA is predicted to increase by .584 points (with 95% CI from .498

to .669). Similarly, if IQ increases 10 points, overall GPA is expected to increase .110 points (i.e., the B weight is .011 for a 1 point increase, so

for a 10 point increase it would be 10 x .011 = .110). From these two examples, you should be able to determine how to report the relationship

between attention deficit score and overall GPA.

The generic unstandardized regression equation is: Y′ = B₀ + B₁X₁ + B₂X₂ + … + BₖXₖ

Y′ is the predicted value of the criterion (aka dependent variable), B₀ is the constant, and the numbered Bs and Xs represent each predictor.

Contextualized to this example, the unstandardized equation is:

Overall GPA′ = .477 + .584(English grade) - .013(attention deficit score) + .011(IQ)

The equation can be used to predict overall GPA for specific values of each predictor. For example, if a student had an English grade of 2.7, an

attention deficit score of 65, and an IQ of 105, predicted overall GPA would be: .477 + .584(2.7) - .013(65) + .011(105) = 2.36.
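The same prediction can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the equation above; the function and argument names are hypothetical, and the coefficients are the rounded B values from the output.

```python
# Constant and B weights copied from the Coefficients output.
INTERCEPT = 0.477
COEFFICIENTS = {
    "english_grade": 0.584,
    "attention_deficit": -0.013,
    "iq": 0.011,
}


def predict_gpa(english_grade, attention_deficit, iq):
    """Predicted overall GPA from the unstandardized regression equation."""
    values = {
        "english_grade": english_grade,
        "attention_deficit": attention_deficit,
        "iq": iq,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


predicted = predict_gpa(english_grade=2.7, attention_deficit=65, iq=105)
print(f"Predicted overall GPA = {predicted:.2f}")

# Holding the other predictors constant, a 10-point IQ increase changes the
# prediction by 10 times the IQ B weight.
change = predict_gpa(2.7, 65, 115) - predict_gpa(2.7, 65, 105)
print(f"Change for a 10-point IQ increase = {change:.3f}")
```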

The standardized coefficient, β (pronounced beta), indexes the standard unit change in the criterion for a 1-standard deviation change in the

predictor. For a 1 standard deviation increase in attention deficit score, overall GPA is predicted to decrease by .156 standard deviations.
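To see how B and β relate, the sketch below rescales each B weight by the ratio of the predictor's standard deviation to the criterion's standard deviation. That rescaling is a standard identity rather than something stated in this tutorial, and the names are my own. Because the B values in the output are rounded to three decimals, the results only approximate the Beta column.

```python
# Standard deviations from the Descriptive Statistics output.
SD = {"GPA": 0.84507, "ENGG": 0.90988, "ADDSC": 10.45221, "IQ": 12.55762}

# Rounded B weights and reported Beta values from the Coefficients output.
B = {"ENGG": 0.584, "ADDSC": -0.013, "IQ": 0.011}
BETA_REPORTED = {"ENGG": 0.628, "ADDSC": -0.156, "IQ": 0.170}


def standardized_coefficient(b, sd_predictor, sd_criterion):
    """Convert an unstandardized B weight to a standardized beta weight."""
    return b * sd_predictor / sd_criterion


beta_approx = {
    name: standardized_coefficient(b, SD[name], SD["GPA"]) for name, b in B.items()
}

for name, beta in beta_approx.items():
    print(f"{name}: computed beta = {beta:.3f}, reported Beta = {BETA_REPORTED[name]:.3f}")
```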

Predictor Significance Tests. A t test is used to determine the statistical significance of each predictor. Technically, the t test determines if the

B coefficient is different from 0. The t value is equal to the B coefficient divided by its standard error (SE). For English grade, t = .584 ÷ .043

= 13.58, which is within rounding error of the t value shown in the output (for the computation to be accurate, the B and SE values need to be

known to several more decimal places than shown in the output). The t value is evaluated at the error degrees of freedom (df) value, which is N

– k - 1, where k is the number of predictor variables. For IQ one might report: “While holding the effects of the other predictors constant, the

effect of IQ on predicting overall GPA was statistically significant, t(212) = 3.16, p = .002, sr² = .017, uniquely accounting for 1.7% of the

variance in overall GPA. For each 1-point increase in IQ, overall GPA was expected to increase .011 points (95% CI from .004 to .019).”

Similar statements should be made for the other predictors.
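The t and df arithmetic described above can be sketched in Python as follows. The function names are hypothetical. The example also shows why the rounding caveat matters: with B and SE rounded to three decimals, the hand-computed t for English grade is close to the output, while for IQ it is noticeably off.

```python
def t_statistic(b, se):
    """t value for a predictor: the B coefficient divided by its standard error."""
    return b / se


def error_df(n, k):
    """Error degrees of freedom: N - k - 1, where k is the number of predictors."""
    return n - k - 1


df = error_df(n=216, k=3)

# Rounded B and SE values from the Coefficients output.
t_engg = t_statistic(0.584, 0.043)   # output shows 13.451
t_iq = t_statistic(0.011, 0.004)     # output shows 3.160

print(f"df = {df}")
print(f"English grade: t({df}) = {t_engg:.2f} from rounded values")
print(f"IQ: t({df}) = {t_iq:.2f} from rounded values")
```

For reporting, use the t values from the SPSS output rather than values recomputed from rounded coefficients.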

Collinearity. Multicollinearity exists if one predictor is highly predicted by the set of other predictors, which can be the case when it is highly

correlated with just one of the other predictors in the set. In the last two columns of the Coefficients output (shown above) the tolerance

and variance inflation factor (VIF) values can be examined to assess multicollinearity. Tolerance values less than .1 and VIF values greater

than 10 are often cited as cutoffs. Cohen, Cohen, West, and Aiken (2003) demonstrated that these cutoffs are not particularly useful in

identifying potential multicollinearity issues because reaching them requires bivariate correlations in excess of .90. The Help information within the SPSS software

recommends tolerance of .5 and VIF of 2 as cutoffs for further examination of any multicollinearity effects. Correlations between predictors of

.70 or higher can cause issues. Though there are other possible causes unrelated to multicollinearity, a common manifestation of

multicollinearity is different signs (i.e., + or -) for a predictor’s zero-order correlation and its β weight. In such cases, if multicollinearity is the

culprit, the regression analysis is invalid.
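The collinearity checks described above can be sketched in Python. VIF is the reciprocal of tolerance, so either column carries the same information. The helper names are my own, and the default cutoffs (.5 and 2) are the further-examination values this tutorial attributes to the SPSS Help, not fixed rules.

```python
# Values copied from the Coefficients output.
predictors = {
    "ENGG": {"tolerance": 0.802, "vif": 1.247, "zero_order": 0.746, "beta": 0.628},
    "ADDSC": {"tolerance": 0.527, "vif": 1.897, "zero_order": -0.542, "beta": -0.156},
    "IQ": {"tolerance": 0.605, "vif": 1.654, "zero_order": 0.446, "beta": 0.170},
}


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1 / tolerance


def needs_closer_look(tolerance, vif, tolerance_cutoff=0.5, vif_cutoff=2.0):
    """Flag a predictor whose tolerance is below, or VIF above, the chosen cutoffs."""
    return tolerance < tolerance_cutoff or vif > vif_cutoff


def signs_differ(zero_order, beta):
    """True when the zero-order correlation and beta weight have opposite signs."""
    return zero_order * beta < 0


flagged = [name for name, p in predictors.items()
           if needs_closer_look(p["tolerance"], p["vif"])]
sign_flips = [name for name, p in predictors.items()
              if signs_differ(p["zero_order"], p["beta"])]

for name, p in predictors.items():
    print(f"{name}: 1 / tolerance = {vif_from_tolerance(p['tolerance']):.3f}, reported VIF = {p['vif']:.3f}")
print("Flagged for further examination:", flagged)
print("Sign differs between zero-order r and beta:", sign_flips)
```

In this example no predictor crosses either cutoff and no signs differ, although attention deficit score (tolerance = .527, VIF = 1.897) comes close.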

Results Write Up Guide

Begin the write up by describing the context of the research and the variables. If known, state how each variable was operationalized, for

example: “Overall GPA was measured on the traditional 4-point scale from 0 (F) to 4 (A)”, or “Satisfaction was measured on a 5-point Likert-type

scale from 1 (not at all satisfied) to 5 (extremely satisfied).” Please pay attention to APA style for reporting scale anchors (see p. 91 and p.

105 in the 6th edition of the APA Manual).

Report descriptive statistics such as minimum, maximum, mean, and standard deviation for each metric variable. For nominal variables, report

percentage for each level of the variable, for example: “Of the total sample (N = 150) there were 40 (26.7%) males and 110 (73.3%) females.”

Keep in mind that a sentence that includes information in parentheticals must still be a sentence (and make sense) if the parentheticals are

removed. For example, the one above without parentheticals is still a sentence and makes sense: “Of the total sample there were 40 males and

110 females.”

State the purpose of the analysis or provide the guiding research question(s). If you use research questions, do not craft them such that they can

be answered with a yes or no. Instead, craft them so that they will have a quantitative answer. For example: “What is the strength and direction

of relationship between X and Y?” or “What is the difference in group means on X between males and females?”

Present null and alternative hypothesis sets applicable to the analysis. For regression there would be a hypothesis set for the overall result (i.e.,

the combined effect of the predictors) and a hypothesis set for each predictor while “controlling for” or “holding constant” the effects of the

other predictors.

State assumptions or other considerations for the analysis, and report the actual statistical result for relevant tests. For this course, the only

regression consideration that needs to be presented and discussed is for multicollinearity. Even if violated, you must still report and interpret

the remaining results.

Report and interpret the overall regression results. Report and interpret the results of each predictor. Be sure to include the actual statistical

results in text—examples were provided within the annotated output section of this tutorial. Don’t forget to interpret the results (e.g., as IQ

increased, overall GPA was predicted to increase; based on semipartial correlations, variable x was the most important predictor of y; etc.).

Draw conclusions about rejecting or failing to reject each null. If needed, summarize the results, without statistics, in a concluding sentence or

paragraph.

Provide APA style tables appropriate to the analysis. Do not use SPSS output; it is not in APA style. Example APA tables for a multiple

regression are shown below using the results from the example output in this tutorial. Although one would typically not duplicate information

in text and tables, it is important to demonstrate competence in both ways of reporting the results; so, you cannot just provide tables, you must

also report the relevant statistical results within the textual write up.

Example APA Tables for Standard Regression

Table 1

Means, Standard Deviations, and Intercorrelations for Overall GPA and Predictor Variables IQ, English Grade, and Attention Deficit Score

Variable                         M     SD       1       2       3       4
1. Overall GPA                2.44   0.85             .45     .75    -.54
2. IQ                       102.35  12.56  < .001             .28    -.63
3. English grade              2.50   0.91  < .001  < .001            -.45
4. Attention deficit score   52.85  10.45  < .001  < .001  < .001

Note. Upper diagonal contains correlation coefficients. Lower diagonal contains p values.

Table 2

Standard Regression Summary for IQ, English Grade, and Attention Deficit Scores Predicting Overall GPA

Variable                       B  95% CI                  β     sr       p
Constant                   0.477  [-0.666, 1.620]
IQ                         0.011  [0.004, 0.019]       .170   .132    .002
English grade              0.584  [0.498, 0.669]       .628   .563  < .001
Attention deficit score   -0.013  [-0.022, -0.003]    -.156  -.113    .007

Note. CI = confidence interval for B; sr = semipartial correlation (aka, part correlation).