MultipleRegression1.pdf

Multiple Regression

Sometimes you may want to explain variability in a continuous DV using several different continuous IVs. Multiple regression allows us to build an equation predicting the value of the DV from the values of two or more IVs. The parameters of this equation can be used to relate the variability in our DV to the variability in specific IVs.

Sometimes people use the term multivariate regression to refer to multiple regression, but most statisticians do not use "multiple" and "multivariate" as synonyms. Instead, they use the term "multiple" to describe analyses that examine the effect of two or more IVs on a single DV, while they reserve the term "multivariate" to describe analyses that examine the effect of any number of IVs on two or more DVs.

The general form of the multiple regression model is

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$.

The elements in this equation are the same as those found in simple linear regression, except that we now have k different slope parameters, which are multiplied by the values of the k IVs to get our predicted value. We can again use least squares estimation to determine the estimates of these parameters that best fit our observed data. Once we obtain these estimates we can either use our equation for prediction, or we can test whether our parameters are significantly different from zero to determine whether each of our IVs makes a significant contribution to our model.

Care must be taken when making inferences based on the coefficients obtained in multiple regression. The way that you interpret a multiple regression coefficient is somewhat different from the way that you interpret coefficients obtained using simple linear regression. Specifically, the value of a multiple regression coefficient represents the ability of the part of the corresponding IV that is unrelated to the other IVs to predict the part of the DV that is unrelated to the other IVs.
It therefore represents the unique ability of the IV to account for variability in the DV.

One implication of the way coefficients are determined is that your parameter estimates become very difficult to interpret if there are large correlations among your IVs. The effect of these relationships on multiple regression coefficients is called multicollinearity. Multicollinearity changes the values of your coefficients and greatly increases their variance. It can cause you to find that none of your coefficients are significantly different from zero, even when the overall model does a good job predicting the value of the DV.

The typical effect of multicollinearity is to reduce the size of your parameter estimates. Since the value of a coefficient is based on the unique ability of an IV to account for variability in the DV, if there is a portion of variability that is accounted for by multiple IVs, all of their coefficients will be reduced.

Under certain circumstances multicollinearity can also create a suppression effect. If you have one IV that has a high correlation with another IV but a low correlation with the DV, you can find that the multiple regression coefficient for the second IV from a model including both variables is larger than (or even opposite in direction to) the coefficient from a model that doesn't include the first IV. This happens when the part of the second IV that is independent of the first IV has a different relationship with the DV than does the part that is related to the first IV. It is called a suppression effect because the relationship that appears in multiple regression is suppressed when you just look at the variable by itself.

To perform a multiple regression in SPSS:

• Choose Analyze → Regression → Linear.
• Move the DV to the Dependent box.
• Move all of the IVs to the Independent(s) box.
• Click the Continue button.
• Click the OK button.
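
For readers who want to see what the least squares step computes, the following is a minimal sketch in plain Python rather than SPSS. It fits a model with two IVs by solving the normal equations, then checks the interpretation given earlier: the coefficient for an IV equals the slope relating the part of that IV that is unrelated to the other IV to the part of the DV that is unrelated to the other IV. The data values and the helper names (solve, ols, residuals) are made up for illustration and are not part of SPSS.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(ivs, y):
    """Return least squares estimates [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + [iv[i] for iv in ivs] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[c] for r in rows) for c in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)


def residuals(ivs, y):
    """Part of y that cannot be linearly predicted from the given IVs."""
    b = ols(ivs, y)
    return [yi - b[0] - sum(bj * iv[i] for bj, iv in zip(b[1:], ivs))
            for i, yi in enumerate(y)]


# Made-up illustrative data: two correlated IVs and one DV.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 4.9, 6.2, 8.8, 9.5, 12.1, 13.0, 15.4]

b0, b1, b2 = ols([x1, x2], y)

# Remove from both x1 and y whatever x2 can predict, then relate the leftovers.
x1_unique = residuals([x2], x1)
y_unique = residuals([x2], y)
slope_unique = (sum(u * v for u, v in zip(x1_unique, y_unique))
                / sum(u * u for u in x1_unique))

print("b1 from the full model:    ", round(b1, 6))
print("slope between unique parts:", round(slope_unique, 6))
```

The two printed values should agree up to rounding error, which is the sense in which a multiple regression coefficient reflects only the unique contribution of its IV.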

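The suppression effect described earlier can be reproduced with a small constructed data set. The sketch below is an illustration under stated assumptions, not output from a real study: the variables signal, nuisance, and noise are invented, the first IV is built to track the second IV closely while being essentially uncorrelated with the DV, and the two-IV coefficients come from the standard closed-form solution of the normal equations for exactly two IVs.

```python
def centered_sum(a, b):
    """Sum of cross-products of a and b after centering each at its mean."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


# Constructed values (assumptions for illustration only).
signal = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
nuisance = [2.0, 2.0, 2.0, 2.0, -2.0, -2.0, -2.0, -2.0]
noise = [0.2, -0.1, 0.1, -0.2, -0.1, 0.2, -0.2, 0.1]

iv1 = nuisance                                    # first IV: tracks iv2, not the DV
iv2 = [s + n for s, n in zip(signal, nuisance)]   # second IV: signal plus nuisance
dv = [s + e for s, e in zip(signal, noise)]       # DV: signal plus a little noise

s11 = centered_sum(iv1, iv1)
s22 = centered_sum(iv2, iv2)
s12 = centered_sum(iv1, iv2)
s1y = centered_sum(iv1, dv)
s2y = centered_sum(iv2, dv)

# Correlations of the first IV with the second IV and with the DV.
r_iv1_iv2 = s12 / (s11 * s22) ** 0.5
r_iv1_dv = s1y / (s11 * centered_sum(dv, dv)) ** 0.5

# Slope for iv2 when it is the only IV in the model.
b2_alone = s2y / s22

# Slopes when both IVs are in the model (closed form for exactly two IVs).
det = s11 * s22 - s12 ** 2
b1_both = (s22 * s1y - s12 * s2y) / det
b2_both = (s11 * s2y - s12 * s1y) / det

print("r(iv1, iv2):", round(r_iv1_iv2, 3), " r(iv1, dv):", round(r_iv1_dv, 3))
print("iv2 slope alone:       ", round(b2_alone, 3))
print("iv2 slope with iv1 too:", round(b2_both, 3))
print("iv1 slope with iv2 too:", round(b1_both, 3))
```

In this construction the coefficient for the second IV is much larger once the first IV is in the model, because the first IV absorbs the part of the second IV that has nothing to do with the DV.
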
The SPSS output from a multiple regression analysis contains the following sections.

• Variables Entered/Removed. This section is only used in model building and contains no useful information in standard multiple regression.

• Model Summary. The value listed below R is the multiple correlation between your IVs and your DV. The value listed below R square is the proportion of variance in your DV that can be accounted for by your IVs. The value in the Adjusted R Square column is a measure of model fit, adjusting for the number of IVs in the model. The value listed below Std. Error of the Estimate is the standard deviation of the residuals.

• ANOVA. This section provides an F test for your statistical model. If this F is significant, it indicates that the model as a whole (that is, all IVs combined) predicts significantly more variability in the DV compared to a null model that only has an intercept parameter. Notice that this test is affected by the number of IVs in the model being tested.

• Coefficients. This section contains a table where each row corresponds to a single coefficient in your model. The row labeled Constant refers to the intercept, while the coefficients for each of your IVs appear in the row beginning with the name of the IV. Inside the table, the column labeled B contains the estimates of the parameters and the column labeled Std. Error contains the standard error of those estimates. The column labeled Beta contains the standardized regression coefficient. The column labeled t contains the value of the t-statistic testing whether the value of each parameter is equal to zero. The p-value of this test is found in the column labeled Sig. A significant t-test indicates that the IV is able to account for a significant amount of variability in the DV, independent of the other IVs in your regression model.
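
To make the relationships among these output sections concrete, the following plain-Python sketch recomputes the main quantities from the Model Summary, ANOVA, and Coefficients tables for a small made-up data set. It is a sketch using the usual textbook formulas, not SPSS's own code; the function names and data are hypothetical. The Sig. values are left out because they require F and t distribution functions that the Python standard library does not provide, and the Beta column is omitted for brevity.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def regression_summary(ivs, y):
    """Recompute key entries of the Model Summary, ANOVA, and Coefficients tables."""
    n, k = len(y), len(ivs)
    rows = [[1.0] + [iv[i] for iv in ivs] for i in range(n)]
    p = k + 1  # intercept plus k slopes
    xtx = [[sum(r[j] * r[c] for r in rows) for c in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    b = solve(xtx, xty)

    fitted = [sum(bj * xj for bj, xj in zip(b, r)) for r in rows]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df_resid = n - k - 1
    ms_resid = ss_resid / df_resid
    r_square = 1 - ss_resid / ss_total

    # Diagonal of the inverse of X'X, obtained one column at a time.
    inv_diag = [solve(xtx, [float(i == j) for i in range(p)])[j] for j in range(p)]
    std_err = [math.sqrt(ms_resid * d) for d in inv_diag]
    return {
        "R": math.sqrt(r_square),
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / df_resid,
        "Std. Error of the Estimate": math.sqrt(ms_resid),
        "F": ((ss_total - ss_resid) / k) / ms_resid,
        "B": b,                       # first entry is the constant (intercept)
        "Std. Error": std_err,
        "t": [bj / sj for bj, sj in zip(b, std_err)],
    }


# Made-up illustrative data: two IVs and one DV.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 4.9, 6.2, 8.8, 9.5, 12.1, 13.0, 15.4]

summary = regression_summary([x1, x2], y)
for name, value in summary.items():
    print(name, value)
```

One useful check on any such computation: with a single IV, the ANOVA F equals the square of that IV's t statistic, so the overall test and the coefficient test coincide in simple linear regression.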

Multiple regression with interactions

In addition to determining the independent effect of each IV on the DV, multiple regression can also be used to detect interactions between your IVs. An interaction measures the extent to which the relationship between an IV and a DV depends on the level of other IVs in the model. For example, if you have an interaction between two IVs (called a two-way interaction) then you expect that the relationship between the first IV and the DV will be different across different levels of the second IV. Interactions are symmetric, so if you have an interaction such that the effect of IV1 on the DV depends on the level of IV2, then it is also true that the effect of IV2 on the DV depends on the level of IV1. It therefore does not matter whether you say that you have an interaction between IV1 and IV2 or an interaction between IV2 and IV1.

You can also have interactions between more than two IVs. For example, you can have a three-way interaction between IV1, IV2, and IV3. This would mean that the two-way interaction between IV1 and IV2 depends on the level of IV3. Just like two-way interactions, three-way interactions are also