stat (multiple regression)
Name:
T.A. name/Class time:
MW Lecturer:
Lab 11: Multiple Regression
Open the lab dataset in SPSS, either from Blackboard Learn or from the copy you saved previously.
-You may find pages 19-21 of the SPSS Instruction Manual useful for this lab.
-Use three decimal places throughout the lab.
This dataset contains the data for 120 subjects. The following are the variables of interest:
- BMIChange = change in body mass index from the start of the study to the end of the study
- PhysicalActivity = daily amount of exercise, in minutes
- DailyCalories = number of calories consumed daily
- AvgSleepHours = amount of sleep on a school night, in hours
- Age = age of the subject at the start of the study, in years
We are interested in determining whether the subjects’ BMIChange (amount of change in BMI throughout the four years of college) can be predicted from the other variables above. To do this, we will use the data set to determine the best multiple regression model, check all regression assumptions, and report the results. Use a 5% significance level for this entire problem set.
1. (3 points) In SPSS, create a correlation table and find the correlation of BMIChange with each of the other four variables. Organize the variables (name, r) in the chart below from the strongest to the weakest correlation with BMIChange. Put a star (*) above each variable that has a significant correlation with BMIChange. You do not need to attach the correlation table as part of your outputs.
| Variable | | | | |
| Correlation (r) | | | | |
| | strongest | | | weakest |
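Note: If you want to double-check a correlation outside SPSS, the Python sketch below computes Pearson's r and the t statistic for testing whether the correlation is zero. It is only an illustration: the function names and the five data values are made up and are not part of the lab dataset. With 120 subjects the test has 118 degrees of freedom, so a correlation is significant at the 5% level when |t| exceeds roughly 1.98; SPSS reports the equivalent two-tailed p-value directly.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Made-up values for illustration only; they are NOT taken from the lab dataset.
predictor = [1, 2, 3, 4, 5]
bmi_change = [2, 1, 4, 3, 5]

r = pearson_r(predictor, bmi_change)
print(round(r, 3), round(t_statistic(r, len(predictor)), 3))
```

Ranking the variables from strongest to weakest uses the absolute value of r, since a strong negative correlation is still a strong correlation.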
Note: Although there are multiple ways to choose the best multiple regression model, we will start with the full model and delete variables one at a time. Make sure to print out your ANOVA tables, model summary and coefficients tables for each model in questions 2-3.
2. (4 points) Perform a regression of BMIChange using the 4 explanatory variables.
a) Complete the table below regarding the regression line:
| R² | Model standard error | ANOVA table F-test statistic | P-value for F-test |
| | | | |
b) Complete the table below regarding the coefficients information:
| Explanatory variables in model | Is the variable significant? Answer yes or no. | List the corresponding p-value. |
| PhysicalActivity | | |
| DailyCalories | | |
| AvgSleepHours | | |
| Age | | |
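Note: The quantities in the regression-line table come from the sums of squares in the ANOVA table. The Python sketch below shows how R², the model standard error, and the F statistic relate to each other. It is a minimal illustration, not the SPSS procedure: the function names and the six data rows are made up, and it solves the normal equations directly, which is simple but less numerically robust than what statistical software does. The p-value is omitted because it requires the F distribution, which SPSS reports for you.

```python
import math

def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_summary(rows, y):
    """Least-squares fit of y on the predictors in `rows` (one list per subject)."""
    n, k = len(rows), len(rows[0])
    design = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    p = k + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    mse = sse / (n - k - 1)                                 # mean square error
    return {
        "coefficients": coefs,            # [b0, b1, ..., bk]
        "r2": 1 - sse / sst,
        "std_error": math.sqrt(mse),      # model standard error
        "f": ((sst - sse) / k) / mse,     # ANOVA F statistic
    }

# Made-up data for illustration only; NOT taken from the lab dataset.
rows = [[0, 1], [1, 0], [2, 1], [3, 0], [4, 1], [5, 0]]
y = [2.1, 5.1, 5.8, 8.8, 10.1, 13.1]
summary = ols_summary(rows, y)
print({name: value for name, value in summary.items() if name != "coefficients"})
```

Here n is the number of subjects and k is the number of explanatory variables, so the full model in this lab would have n = 120 and k = 4.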
3. (3 points) Now drop the least significant variable and re-run the regression.
a) Complete the tables below:
| R² | Model standard error | ANOVA table F-test statistic | P-value for F-test |
| | | | |

| Explanatory variables in model | Is the variable significant? Answer yes or no. | List the corresponding p-value. |
| | | |
| | | |
| | | |
b) Was dropping that variable a good change? Explain why or why not.
4. (2 points) Based on your answer to question 3b, state the regression equation for the “better” model. Do not forget the proper notation.
5. (2 points) Using the regression equation from question 4, predict the BMIChange for the 22-year-old male (Subject_ID = 59) with PhysicalActivity of 44 minutes, DailyCalories of 3100 calories, and AvgSleepHours of 5.9 hours. Show your work.
6. (2 points) What is the residual for the prediction in the above problem? Show your work.
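Note: Questions 5 and 6 come down to two steps: substitute the subject's values into the regression equation to get the predicted value, then subtract that prediction from the observed value (residual = observed - predicted). The Python sketch below illustrates the arithmetic. The function names are made up, and every coefficient and the observed BMIChange are placeholders set to zero; they are not results from this lab and must be replaced with the values from your own SPSS output and the dataset.

```python
def predict(intercept, slopes, values):
    """Evaluate y-hat = b0 + b1*x1 + ... for the predictors listed in `slopes`."""
    return intercept + sum(b * values[name] for name, b in slopes.items())

def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted

# Predictor values for Subject_ID = 59, as given in question 5.
subject_59 = {
    "PhysicalActivity": 44,
    "DailyCalories": 3100,
    "AvgSleepHours": 5.9,
    "Age": 22,
}

# PLACEHOLDER coefficients, set to zero only so the code runs. Replace them with
# the values from your own SPSS coefficients table, and list only the variables
# that remain in your model.
intercept = 0.0
slopes = {"PhysicalActivity": 0.0, "DailyCalories": 0.0, "AvgSleepHours": 0.0}

# PLACEHOLDER: look up the subject's actual BMIChange in the dataset.
observed_bmi_change = 0.0

y_hat = predict(intercept, slopes, subject_59)
print(round(y_hat, 3), round(residual(observed_bmi_change, y_hat), 3))
```

Only the variables named in `slopes` enter the prediction, so a value for a variable that was dropped from the model is simply ignored.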
7. (3 points) Even if the model used in question 4 was a “better” model, does the model still have some variables that are not significant? If so, which ones?
Note: Remember that an analysis is not complete unless all assumptions are checked. Although not performed in this lab, be sure to always check the Normal probability plot and residual plots for each explanatory variable in order to determine whether the analysis is valid.