Regression Analysis PowerPoint
QUANTIFYING AN ASSOCIATION TO PREDICT FUTURE EVENTS
Chapter 12
Regression Analysis
Regression Analysis
- A statistical technique to quantify associations and begin the process of predicting future events
Linear Regression
Used to identify a relationship between a single independent variable (x axis) and a single dependent variable (y axis) measured at the interval or ratio level.
If the relationship is linear when these variables are graphed, the slope of the line tells you how much the predicted value of the dependent variable changes for a one-unit change in the independent variable.
Linear Regression
[Chart: Fetal weight at various levels of daily cigarette consumption (x axis: number of cigarettes; y axis: fetal weight)]
| Number of cigarettes | Fetal weight |
| 1 | 8.5 |
| 5 | 7.9 |
| 10 | 7 |
| 3 | 8.2 |
| 12 | 6.6 |
| 6 | 7.1 |
| 4 | 7.8 |
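As a rough illustration, the least-squares line for the data in the table above can be computed by hand. This is a minimal sketch in plain Python; the function name fit_line and the variable names are assumptions made for this example and are not part of the original slides.

```python
# Data from the table above: cigarettes per day (x) and fetal weight (y).
cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(cigarettes, fetal_weight)
print(f"intercept a = {intercept:.2f}")  # about 8.58
print(f"slope b = {slope:.2f}")          # about -0.17
```

Under this sketch, the negative slope means each additional cigarette per day is associated with a predicted fetal weight about 0.17 units lower.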
Residual
- The difference between where a data point actually falls and where the linear regression line predicts it will fall (observed value minus predicted value)
- Also called prediction error
- Smaller residuals (closer to zero) mean a better fit between the prediction line and the data
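A short sketch of how residuals could be computed for the cigarette and fetal weight data, assuming a least-squares line fitted to those seven points. The variable names are illustrative only.

```python
# Cigarettes per day (x) and fetal weight (y) from the slide data.
cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]

# Fit the least-squares line y = intercept + slope * x.
n = len(cigarettes)
mean_x = sum(cigarettes) / n
mean_y = sum(fetal_weight) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(cigarettes, fetal_weight))
    / sum((x - mean_x) ** 2 for x in cigarettes)
)
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value the line predicts.
predicted = [intercept + slope * x for x in cigarettes]
residuals = [y - p for y, p in zip(fetal_weight, predicted)]

for x, y, p, r in zip(cigarettes, fetal_weight, predicted, residuals):
    print(f"x={x:>2}  observed={y:.1f}  predicted={p:.2f}  residual={r:+.2f}")
```

With a least-squares fit, the positive and negative residuals cancel, so their sum is zero apart from rounding.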
Multiple Regression
Used when you wish to examine the relationship between multiple independent variables (X1, X2, …) and an outcome of interest (Y) measured at the interval or ratio level
Yi = a + b1X1 + b2X2 + e
a = constant (intercept), the predicted value of Y when every X = 0
b1, b2 = beta values (regression coefficients)
e = error term
X1 = number of cigarettes smoked each day
X2 = diabetic status (yes/no)
Beta Values in Multiple Regression
- Beta values are the rate of change in the outcome variable (Y) for every one-unit increase in an independent variable, holding the other independent variables constant
- When a beta value is positive, the outcome variable increases when that independent variable increases
- When a beta value is negative, an increase in the level of the independent variable decreases the level of the outcome variable
Looking at Computer Output
R Square (R²)
R Square tells you the percent of the variance in the dependent (outcome) variable that is explained by your regression model.
If R Square is 0.74, the variables you have included in your regression model explain 74% of the variance in the outcome you are studying. R Square never decreases, and almost always increases, when additional independent variables are included in the model, even if the added independent variables are not significant.
The adjusted R Square corrects for the number of independent variables, so it is a more conservative estimate than R Square and a better option when a large number of independent variables are included.
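A minimal sketch of how R Square and adjusted R Square could be computed, using the cigarette and fetal weight data from the linear regression example. It assumes the common adjustment formula 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables; statistical packages report both values directly.

```python
# Cigarettes per day (x) and fetal weight (y) from the slide data.
x = [1, 5, 10, 3, 12, 6, 4]
y = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]
n = len(x)
k = 1  # number of independent variables in the model

# Least-squares line y = intercept + slope * x.
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# R Square = 1 - (unexplained variation / total variation).
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_total = sum((yi - mean_y) ** 2 for yi in y)
r_square = 1 - ss_residual / ss_total

# Adjusted R Square penalizes the model for each added predictor.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R Square          = {r_square:.2f}")           # about 0.90
print(f"Adjusted R Square = {adjusted_r_square:.2f}")  # about 0.88
```

Here the adjusted value is lower than R Square, as expected for a more conservative estimate.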
Standard Error of the Estimate
- Tells you the typical amount of error in the predicted outcome, in the units of the outcome variable, when using your specific regression model.
- You would like to minimize this to make your predictions as accurate as possible, so the closer the standard error of the estimate is to zero, the better.
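One way to see where this number comes from is to compute it for the cigarette and fetal weight data. This sketch assumes the usual definition, the square root of the sum of squared residuals divided by (n - k - 1), with n observations and k independent variables.

```python
import math

# Cigarettes per day (x) and fetal weight (y) from the slide data.
x = [1, 5, 10, 3, 12, 6, 4]
y = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]
n = len(x)
k = 1  # number of independent variables in the model

# Least-squares line y = intercept + slope * x.
mean_x = sum(x) / n
mean_y = sum(y) / n
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
intercept = mean_y - slope * mean_x

# Sum of squared residuals (observed minus predicted).
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of the estimate, in the same units as y.
standard_error = math.sqrt(ss_residual / (n - k - 1))
print(f"standard error of the estimate = {standard_error:.2f}")  # about 0.24
```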
Determining Significance
It is a little trickier than previous tests but still follows the same underlying principles.
Look at your R Square and find the corresponding p value. Is it significant (p < alpha)?
Then look at each of the independent variables for significance as well. You can have a significant R Square even when the model includes an independent variable that doesn't add anything. Is the R Square change significant for each added independent variable? If not, it means that once you controlled for the other independent variables, the nonsignificant variable no longer added to the ability to predict the outcome.
The Beta value for each independent variable
Look at the column in SPSS for unstandardized coefficients (labeled B) and you will see the beta coefficients for the independent variables in the regression equation.
A one-unit change in variable X1 produces a change in the outcome variable equal to the beta for X1. For example, suppose the beta for independent variable X1 is 2.3. That means for every one-unit increase in X1, you would predict a 2.3-unit increase in your outcome variable when controlling for your other independent variables.
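The 2.3 example can be checked with a small sketch of the regression equation. Only the beta of 2.3 comes from the example above; the constant (10.0), the second beta (-1.5), and the input values are hypothetical numbers chosen purely for illustration.

```python
# Hypothetical regression equation: Y = a + b1*X1 + b2*X2
a = 10.0   # constant (hypothetical)
b1 = 2.3   # beta for X1, from the example above
b2 = -1.5  # beta for X2 (hypothetical)


def predict(x1, x2):
    """Predicted outcome for given values of the two independent variables."""
    return a + b1 * x1 + b2 * x2


# Increase X1 by one unit while holding X2 constant at 1.
before = predict(3, 1)
after = predict(4, 1)
change = after - before
print(f"predicted change in Y = {change:.1f}")  # 2.3
```

Whatever value X2 is held at, the predicted change for a one-unit increase in X1 equals the beta for X1.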
Logistic Regression
Examines the relationship between multiple independent variables and a dependent variable that is binary (nominal or ordinal with only two categories), such as yes/no or alive/dead.
Generates an odds ratio (OR), which is frequently helpful for explaining results to the public. (The odds of an outcome are the probability of the outcome occurring divided by the probability of it not occurring; the OR is the ratio of two such odds, for example the odds for smokers divided by the odds for nonsmokers.) We will discuss the OR further in the next chapter.
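A brief sketch of the arithmetic behind odds and an odds ratio. The odds of an outcome are its probability divided by the probability of it not occurring, and an odds ratio compares the odds in two groups. The two probabilities below are hypothetical values chosen for illustration, not results from any study.

```python
def odds(probability):
    """Odds = probability of the outcome / probability of no outcome."""
    return probability / (1 - probability)


# Hypothetical probabilities of the outcome in two groups.
p_exposed = 0.20    # e.g. outcome probability among the exposed group
p_unexposed = 0.10  # e.g. outcome probability among the unexposed group

odds_exposed = odds(p_exposed)      # 0.20 / 0.80 = 0.25
odds_unexposed = odds(p_unexposed)  # 0.10 / 0.90, about 0.11

# Odds ratio: odds in one group divided by odds in the other.
odds_ratio = odds_exposed / odds_unexposed
print(f"odds ratio = {odds_ratio:.2f}")  # 2.25
```

An OR above 1 means the outcome has higher odds in the first group; an OR of 1 means the odds are the same in both.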
Tests that Control for the Impact of More than One Independent Variable on a Single Dependent Variable
Multiple and logistic regression allow the researcher to examine the effect of multiple independent variables on a single dependent variable. For example, if the researcher believes that maternal age and smoking both affect infant birth weight, the relationship between maternal age and infant birth weight can be examined while controlling for the effect of smoking on infant birth weight.
| Dependent Variable | Test | Example |
| Yes/No | Logistic Regression | Among adolescents who attempt suicide, what is the relationship between alcohol consumption, age, gender, and risk of death? (Independent variables: alcohol consumption, age, gender; dependent variable: death (yes/no)) |
| Continuous Variable | Multiple Regression | How do parents' education level, income level, and school district rank affect fourth-grade reading scores among impoverished children? (Independent variables: parents' education level, income level, and school district rank; dependent variable: reading score at the interval/ratio level) |
Assignment
- Complete the Chapter 12 exercises.
- Complete the research application exercise.