Regression Analysis Power Point


QUANTIFYING AN ASSOCIATION TO PREDICT FUTURE EVENTS

Chapter 12
Regression Analysis

Regression Analysis

  • A statistical technique to quantify associations and begin the process of predicting future events

Linear Regression

Used to identify a relationship between a single independent variable (x axis) and a single dependent variable (y axis) measured at the interval or ratio level.

If the relationship is linear when these variables are graphed, the slope of the line tells you how much the predicted value of the dependent variable changes for a one-unit change in the independent variable.

Linear Regression

[Chart: Fetal weight at various levels of daily cigarette consumption. x axis: number of cigarettes; y axis: fetal weight]

Number of cigarettes per day    Fetal weight
1                               8.5
5                               7.9
10                              7.0
3                               8.2
12                              6.6
6                               7.1
4                               7.8
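
The slope described above can be computed directly from the seven data points. The following is a minimal sketch using ordinary least squares in plain Python; the function name `fit_line` and the 8-cigarette prediction are illustrative assumptions, not part of the original slides.

```python
# Data from the slide: cigarettes per day (x) and fetal weight (y).
cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(cigarettes, fetal_weight)
print(f"intercept a = {intercept:.2f}, slope b = {slope:.2f}")

# Predicted fetal weight for a hypothetical 8 cigarettes per day.
predicted_at_8 = intercept + slope * 8
print(f"predicted fetal weight at 8 cigarettes/day = {predicted_at_8:.2f}")
```

With these data the slope comes out at roughly -0.17, so each additional cigarette per day is associated with a predicted drop of about 0.17 units of fetal weight.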

Residual

  • The difference between where the data actually fall and where the linear regression line predicts they will fall
  • Also called prediction error
  • Smaller residuals mean a better fit between the prediction line and the data
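
As a rough illustration of residuals, the sketch below fits a least-squares line to the cigarette and fetal weight data from the earlier slide and subtracts each predicted value from the observed one. The variable names are assumptions made for this example.

```python
# Data from the earlier slide: cigarettes per day (x) and fetal weight (y).
cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]

n = len(cigarettes)
mean_x = sum(cigarettes) / n
mean_y = sum(fetal_weight) / n

# Least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(cigarettes, fetal_weight)) / sum(
                (x - mean_x) ** 2 for x in cigarettes)
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the line.
predicted = [intercept + slope * x for x in cigarettes]
residuals = [y - p for y, p in zip(fetal_weight, predicted)]

for x, y, p, r in zip(cigarettes, fetal_weight, predicted, residuals):
    print(f"x={x:2d}  observed={y:.1f}  predicted={p:.2f}  residual={r:+.2f}")

# The sum of squared residuals summarizes the overall lack of fit.
sum_squared_residuals = sum(r ** 2 for r in residuals)
print(f"sum of squared residuals = {sum_squared_residuals:.3f}")
```

For a least-squares line the residuals sum to zero (up to rounding), which is why the squared residuals are used to judge fit.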


Multiple Regression

Used when you wish to examine the relationship between multiple independent variables (X1, X2, ...) and an outcome of interest (Y) measured at the interval or ratio level.

Yi = a + b1X1 + b2X2 + e

a = constant (the predicted value of Y when every X equals 0)

b = beta value (b1 for X1, b2 for X2)

e = error term

X1 = number of cigarettes smoked each day

X2 = diabetic status (yes/no)
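
To make the equation concrete, here is a small sketch that plugs values into it. The coefficient values `A`, `B1`, and `B2` are hypothetical numbers chosen only for illustration (the slides do not report fitted values), and diabetic status is coded as 1 for yes and 0 for no, which is a common convention assumed here.

```python
# Hypothetical coefficients, chosen only for illustration (not from the slides).
A = 8.6     # constant: predicted Y when X1 = 0 and X2 = 0
B1 = -0.17  # beta for X1 (cigarettes smoked each day)
B2 = -0.4   # beta for X2 (diabetic status, coded 1 = yes, 0 = no)


def predict_y(cigarettes_per_day, diabetic):
    """Predicted outcome from Y = a + b1*X1 + b2*X2 (error term omitted)."""
    x2 = 1 if diabetic else 0
    return A + B1 * cigarettes_per_day + B2 * x2


# A hypothetical person who smokes 10 cigarettes a day and is diabetic.
print(f"{predict_y(10, True):.2f}")
# The same smoking level without diabetes.
print(f"{predict_y(10, False):.2f}")
```

The error term e does not appear in the prediction: it represents the part of each observed Y that the model does not explain.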

Beta Values in Multiple Regression

  • Beta values are the rate of change in the outcome variable (Y) for every one-unit increase in the independent variable, holding the other independent variables constant
  • When the beta value is positive, the outcome variable increases when this independent variable is increased
  • When the beta value is negative, an increase in the level of the independent variable decreases the level of the outcome variable

Looking at Computer Output

R²

R Square tells you the percent of the variance in the dependent (outcome) variable that is explained by your regression model.

If R Square is 0.74, the variables you have included in your regression model explain 74% of the variance in the outcome you are studying. R Square never decreases, and typically increases, when additional independent variables are included in the model, even if the added independent variables are not significant.

The adjusted R square is a more conservative estimate of the R square and is a better option when a large number of independent variables are included.
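
A minimal sketch of how R Square and adjusted R Square can be computed, using the cigarette and fetal weight data from the earlier slide with a single predictor. The function names are assumptions for this example; the adjusted formula used is the standard 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables.

```python
cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n, k):
    """Penalize R Square for the number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Fit the least-squares line so there are predictions to evaluate.
n = len(cigarettes)
mean_x = sum(cigarettes) / n
mean_y = sum(fetal_weight) / n
slope = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(cigarettes, fetal_weight)) / sum(
                (x - mean_x) ** 2 for x in cigarettes)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in cigarettes]

r2 = r_squared(fetal_weight, predicted)
adj_r2 = adjusted_r_squared(r2, n=n, k=1)
print(f"R Square = {r2:.2f}, adjusted R Square = {adj_r2:.2f}")
```

For these data R Square is about 0.90 and adjusted R Square about 0.88; the gap between the two widens as more independent variables are added relative to the sample size.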

Standard Error of the Estimate

  • Tells you the typical amount of error there will be in the predicted outcome using your specific regression model.
  • You would like to minimize this to make your prediction as accurate as possible, so the closer the standard error of the estimate is to zero, the better.
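
One common way to compute the standard error of the estimate is the square root of the sum of squared residuals divided by n - k - 1. The sketch below applies that formula to the cigarette and fetal weight data from the earlier slide; the helper name `standard_error_of_estimate` is an assumption for this example.

```python
import math

cigarettes = [1, 5, 10, 3, 12, 6, 4]
fetal_weight = [8.5, 7.9, 7.0, 8.2, 6.6, 7.1, 7.8]


def standard_error_of_estimate(actual, predicted, k):
    """sqrt(SSE / (n - k - 1)), where k is the number of predictors."""
    n = len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return math.sqrt(sse / (n - k - 1))


# Least-squares line for the single predictor.
n = len(cigarettes)
mean_x = sum(cigarettes) / n
mean_y = sum(fetal_weight) / n
slope = sum((x - mean_x) * (y - mean_y)
            for x, y in zip(cigarettes, fetal_weight)) / sum(
                (x - mean_x) ** 2 for x in cigarettes)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in cigarettes]

see = standard_error_of_estimate(fetal_weight, predicted, k=1)
print(f"standard error of the estimate = {see:.2f}")
```

The result is in the same units as the outcome variable, so here predictions are typically off by about a quarter of a unit of fetal weight.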

Determining Significance

It is a little trickier than previous tests but still has the same underlying principles.

Look at your R Square and find the corresponding p value: is it significant (p < alpha)?

Then look at each of the independent variables for significance as well. You can have a significant R Square with an independent variable included that doesn't add anything to your model. Is the R Square change significant for each added independent variable? If not, it means that once you controlled for the other independent variables, the nonsignificant variable no longer added to the ability to predict the outcome.
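
The p value attached to R Square normally comes from an overall F test. As a hedged sketch, the F statistic can be computed from R Square, the sample size n, and the number of independent variables k. The R Square of 0.74 echoes the earlier example, while n = 30 and k = 3 are hypothetical values chosen for illustration. The p value itself would still be read from the software output or an F table.

```python
def f_statistic(r2, n, k):
    """Overall F statistic for a regression with k predictors and n cases."""
    explained_per_predictor = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical study: R Square = 0.74, 30 participants, 3 independent variables.
f_value = f_statistic(r2=0.74, n=30, k=3)
print(f"F({3}, {30 - 3 - 1}) = {f_value:.2f}")
```

A larger F corresponds to a smaller p value; the degrees of freedom are k for the model and n - k - 1 for the residual.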

The Beta value for each independent variable

Look at the column in SPSS for unstandardized coefficients and you will see the beta coefficients for the independent variables in the regression equation.

A one-unit change in an independent variable produces a change in the outcome variable equal to that variable's beta. For example, suppose the beta for independent variable X1 is 2.3. That means that for every one-unit increase in X1, you would predict a 2.3-unit increase in your outcome variable when controlling for your other independent variables.
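
The interpretation of a beta can be checked with a little arithmetic. In this sketch the beta of 2.3 for X1 comes from the example above, while the constant, the beta for X2, and the input values are hypothetical numbers used only to illustrate the idea.

```python
def predict(x1, x2, a=10.0, b1=2.3, b2=-1.5):
    """Predicted outcome; a and b2 are hypothetical, b1 = 2.3 as in the text."""
    return a + b1 * x1 + b2 * x2


# Raise X1 by one unit while holding X2 constant at 7.
before = predict(x1=4, x2=7)
after = predict(x1=5, x2=7)
change = after - before
print(f"before = {before:.1f}, after = {after:.1f}, change = {change:.1f}")
```

Whatever value X2 is held at, the change in the predicted outcome equals the beta for X1.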

Logistic Regression

Examines the relationship between multiple independent variables and a dependent variable that is binary (nominal or ordinal with only two categories), such as yes/no or alive/dead.

Generates an odds ratio (OR), which is frequently helpful for explaining results to the public. (The odds of an outcome are the probability of the outcome occurring divided by the probability of it not occurring; the OR is the odds in one group divided by the odds in another.) We will discuss the OR further in the next chapter.
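
A short sketch of the arithmetic behind an odds ratio: the odds of an outcome are p / (1 - p), and the odds ratio divides the odds in one group by the odds in a comparison group. The probabilities and the coefficient below are hypothetical values for illustration. The last lines use the standard result that exponentiating a logistic regression coefficient gives the odds ratio for a one-unit increase in that variable.

```python
import math


def odds(probability):
    """Odds = probability of the outcome / probability of no outcome."""
    return probability / (1 - probability)


# Hypothetical probabilities of the outcome in two groups.
p_exposed = 0.4
p_unexposed = 0.2

odds_ratio = odds(p_exposed) / odds(p_unexposed)
print(f"odds (exposed) = {odds(p_exposed):.2f}")
print(f"odds (unexposed) = {odds(p_unexposed):.2f}")
print(f"odds ratio = {odds_ratio:.2f}")

# Hypothetical logistic regression coefficient for one independent variable.
coefficient = 0.98
print(f"OR from coefficient = {math.exp(coefficient):.2f}")
```

An OR above 1 means the outcome is more likely in the first group, below 1 less likely, and exactly 1 means no difference in odds.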

Tests that Control for the Impact of More than One Independent Variable on a Single Dependent Variable

Multiple and logistic regression allow the researcher to examine the effect of multiple independent variables on a single dependent variable. For example, if the researcher believes that maternal age and smoking both affect infant birth weight, the relationship between maternal age and infant birth weight can be examined while controlling for the effect of smoking on infant birth weight.

Dependent variable: Yes/No
Test: Logistic regression
Example: Among adolescents who attempt suicide, what is the relationship between alcohol consumption, age, gender, and risk of death? (Independent variables: alcohol consumption, age, and gender; dependent variable: death (yes/no))

Dependent variable: Continuous variable
Test: Multiple regression
Example: How do parents' education level, income level, and school district rank affect fourth grade reading scores among impoverished children? (Independent variables: parents' education level, income level, and school district rank; dependent variable: reading score at the interval/ratio level)

Assignment

  • Complete the Chapter 12 exercises.
  • Complete the research application exercise.