Lab 3 – SPSS [ANOVA & REGRESSION]
The Road So Far
Lab 1: Data Cleaning and Univariate Analysis
Lab 2: Bivariate Analysis: Correlation and t-test
Lab 3 OBJECTIVE:
1. Familiarizing yourself with Analysis of Variance (ANOVA)
   o When can you use it
   o How to run it
   o How to interpret results
   o Limitations
2. Re-familiarizing yourself with Regression Analysis
   o Linear regression with no controls
   o Linear regression with controls
   o Linear regression with multiple independent variables
   o How to run regression in SPSS
   o How to interpret results
REVIEWING T-TEST
EXAMPLE 1 – GENDER AND DEPRESSION
From marking LAB 2, I identified a few recurring problems with the “mechanics” and interpretation of t-test results:
Levene’s Test for Equality of Variances doesn’t tell you about the significance of the t-test for Equality of Means. It tells you whether you can reject the null hypothesis of equal variances, and therefore which row of the t-test table to read (“Equal variances assumed” or “Equal variances not assumed”).
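The two-step reading of the t-test table can be sketched in a few lines of Python. This is only an illustration of the decision logic: the function names, the 0.05 cut-off, and the Sig. values below are assumptions made up for the example, not output from the lab dataset.

```python
def ttest_row_to_read(levene_sig: float, alpha: float = 0.05) -> str:
    """Step 1: use Levene's Sig. to pick the row of the t-test table."""
    if levene_sig < alpha:
        # Levene's test is significant: equality of variances is rejected.
        return "Equal variances not assumed"
    return "Equal variances assumed"


def ttest_is_significant(ttest_sig: float, alpha: float = 0.05) -> bool:
    """Step 2: judge the t-test from the Sig. (2-tailed) of the chosen row."""
    return ttest_sig < alpha


# Hypothetical Sig. values, for illustration only.
row = ttest_row_to_read(levene_sig=0.30)
print(row)                                    # which row to read
print(ttest_is_significant(ttest_sig=0.02))   # verdict on the means
```

The point of keeping two functions is that Levene's Sig. only ever answers the first question; the verdict on the means always comes from the t-test's own Sig. value.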
EXERCISE 1 – Number of Publications and PhD Discipline
Since Lab 2 produced no statistically significant t-test, the interpretation of the Mean Difference in the result table might not be 100% clear for all of you.
Consider the following results:
o (1) Is Levene’s test significant, (2) is the t-test significant, and (3) what does 3.6121 mean SUBSTANTIVELY?
REVISION
o Relationship between two continuous variables: Pearson’s Correlation
o Relationship between one categorical variable and one continuous variable: T-Test
o Relationship between two categorical variables: Chi-Square
One-way ANOVA (Analysis of Variance [amongst the means])
RELATIONSHIP BETWEEN CATEGORICAL & CONTINUOUS VARIABLES
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups (although you tend to only see it used when there is a minimum of three, rather than two groups).
For three or more groups, a statistically significant ANOVA tells you that at least one mean comparison between the groups is statistically significant. (Hint: compare the lowest- vs. the highest-mean group.)
From StackOverFlow:
“In the typical ANOVA we have a categorical variable with different groups, and we attempt to determine whether the measurement of a continuous variable differs between groups.”
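To make "analysis of variance amongst the means" concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA, written with the Python standard library only. The salary values are invented for illustration and the function name is an assumption; SPSS additionally converts F into the Sig. value using the F distribution, which this sketch does not do.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of cases

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the cases around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical salaries in $K, for illustration only.
ba = [40, 42, 44]
bsc = [50, 52, 54]
bcom = [60, 62, 64]

f_stat = one_way_anova_f([ba, bsc, bcom])
print(f_stat)
```

A large F means the group means are far apart relative to the spread inside each group, which is what a small Sig. value in the ANOVA table reflects.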
Exercise 2 – One-way ANOVA – DEGREE_TYPE_LAB_3 and SALARY
One-way ANOVA can tell us if there is a statistically significant difference in “mean” salary between UBC BA, BSc, and BCom degree holders who have a LinkedIn account.
Q.2.1: What are the independent variable and the dependent variable? (4 pts)
The mechanics of running a one-way ANOVA:
Analyze → Compare Means → One-Way ANOVA
o Factor = Independent Categorical Variable
Click on “Options” and Check “Descriptive” and “Means plot”.
Click “Continue” and then “OK” to run the one-way ANOVA. Let’s look at the results, and always remember that the stats here are not too scary: you are looking at salary measures for three different groups.
The first thing to do is to look at the Graph “Means Plots”:
Q.2.2: By looking at both the graph and the descriptive statistics table, what can you say about the salary of UBC graduates? (4 pts)
Now that you have a basic understanding of the descriptive results, let’s look at the ANOVA table:
There is always a lot of information, but the important thing to pay attention to is:
Always remind yourself of the goal of the statistical test you are doing before looking at the results. Here we have a one-way ANOVA. We want to know if there is a statistically significant difference between the salaries of BA, BSc, and BCom holders. Put differently, how likely would a difference this large be if chance alone were at work?
o Null hypothesis: no difference
o Alternative hypothesis: the difference is not due to chance and is related to the categorical variable
With that in mind, you look at the p-value, that is, the significance statistic (Sig.).
Q.2.3: What is the result of the one-way ANOVA test? What does it tell us about the relationship between salary and bachelor’s degree types? (4 pts)
Important Point (and a frustrating thing with one-way ANOVA)
Is the difference in salary statistically significant between BA and BSc, or between BSc and BCom? One-way ANOVA doesn’t tell us that! What one-way ANOVA is telling you is this: “there is at least ONE mean comparison between the groups that is statistically significant”.
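With three groups there are three possible mean comparisons, and the single ANOVA Sig. value does not say which one is responsible. The sketch below simply enumerates the pairs and their mean differences; the salary values and the function name are assumptions for illustration, and it does not test significance (that takes a pairwise or post hoc comparison).

```python
from itertools import combinations
from statistics import mean

# Hypothetical salaries in $K, for illustration only.
salaries = {
    "BA": [40, 42, 44],
    "BSc": [50, 52, 54],
    "BCom": [51, 53, 55],
}


def pairwise_mean_differences(groups):
    """Return {(group_a, group_b): mean_a - mean_b} for every pair of groups."""
    return {
        (a, b): mean(groups[a]) - mean(groups[b])
        for a, b in combinations(groups, 2)
    }


diffs = pairwise_mean_differences(salaries)
for (a, b), diff in diffs.items():
    print(f"{a} - {b}: {diff:+.1f}")
```

In this made-up data the BSc vs. BCom gap is tiny while both gaps involving BA are large, so a significant overall ANOVA would be driven by BA, yet the ANOVA table alone would not reveal that.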
EXERCISE 3 – Two-way ANOVA – DEGREE_TYPE_LAB_3, GENDER and SALARY
Q.3.1: What are the two independent variables and the one dependent variable? (6 pts)
Keep in mind that with two-way ANOVA, we are still looking at differences in means.
Dependent Variable: Continuous Variable
o Salary (annual income in $K)
Two Categorical Variables
o Gender (Female and Male)
o Degree_Type_Lab_3 (BA, BSC, and BCOM)
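A two-way design compares the mean of the continuous variable in every combination ("cell") of the two categorical variables. The following sketch computes those cell means with the Python standard library; the rows are invented for illustration and the function name is an assumption, not part of the lab dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (gender, degree type, salary in $K).
rows = [
    ("Female", "BA", 40), ("Female", "BA", 44),
    ("Male", "BA", 46), ("Male", "BA", 50),
    ("Female", "BSc", 55), ("Female", "BSc", 57),
    ("Male", "BSc", 60), ("Male", "BSc", 64),
]


def cell_means(rows):
    """Return the mean salary for each (gender, degree type) cell."""
    cells = defaultdict(list)
    for gender, degree, salary in rows:
        cells[(gender, degree)].append(salary)
    return {cell: mean(values) for cell, values in cells.items()}


means = cell_means(rows)
for (gender, degree), value in sorted(means.items()):
    print(f"{gender:6} {degree:4} {value}")
```

These cell means are the same kind of numbers that a descriptive statistics table and a profile plot summarize for a two-way ANOVA.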
In SPSS, you need the General Linear Model:
o Analyze → General Linear Model → Univariate
The mechanics of running a two-way ANOVA:
STEP 1
STEP 2 (Click on Options)
STEP 3 (Click on Plots)
TA, TA, how do I know what goes where in STEP 3?
Separate Lines: categorical variable with the FEWEST attributes
Horizontal Axis: categorical variable with the MOST attributes
Click on Add and Continue
Click on OK
Let’s slowly check out the OUTPUT
A two-way ANOVA OUTPUT file can be a bit intimidating, so let’s look at it together:
1. Tip from the Pro: Scroll to Plots at the bottom of the Output first
Q.3.2: A one-way ANOVA couldn’t tell us which mean salary difference amongst the 3 degree types WAS significant. From the plot above, can you start to evaluate which salary difference between degree types is probably NOT significant? Which is it? Why is that? (6 pts)
Ok, still a lot of results in the OUTPUT, how do we look at all that stuff?
Plan of Attack I:
1. Scroll back up to the Descriptive Table
   o Try to make the connection with what you just saw in the GRAPH.
2. Scroll down to Pairwise Comparisons for Gender of Graduate
   o Remember when I said that ANOVA is like a t-test for a categorical variable with more than two attributes… what do you see?
Descriptive Table
Q.3.3: By looking at both the graph and the descriptive statistics table, what can you say about the salary of UBC graduates based on degree type and gender? What stands out to you? What seems important to mention? (6 pts)
Pairwise Comparisons for Gender of Graduate
Again, remind yourself of what you are doing here: COMPARING MEAN DIFFERENCES. Does the Pairwise Comparisons Table for Gender strangely remind you of a significance test from LAB 2? Which one?
Pairwise Comparisons for Degree_Type
Q.3.4: From looking at the two Pairwise Comparisons Tables, what can you tell about the differences in salary based on Gender and Degree Type? What stands out for you? What will probably be significant? (6 pts)
Plan of Attack II:
We looked at:
o Plot
o Descriptive Statistics Table
o Pairwise Comparisons Tables
Helpful?
The next step is to look at the two-way ANOVA statistical results: Tests of Between-Subjects Effects
o Gender is significant
o Degree_Type is significant
o Degree_Type and Gender together (their interaction) is significant
o R-Squared: 79.3%
Q.3.5: What do the Tests of Between-Subjects Effects tell you? What is statistically significant? (10 pts)
FROM ANOVA TO REGRESSION
The best way to think of the differences between ANOVA and REGRESSION is to look at the kind of variables used for each:
ANOVA: tests the difference between group means for the variable of choice
o 1 dependent continuous variable
o 1 or multiple independent categorical variable(s)
o In statistical language, in comparing means, ANOVA asks how much the differences between these groups (e.g. race, gender, job type) explain variation in salary
REGRESSION: same as ANOVA with dummy variables
o 1 dependent continuous variable (but can also be categorical) *key difference
o 1 or multiple independent and control variable(s)
   Can be categorical or continuous *key difference
o Can discriminate between the effect of each attribute of a categorical variable
   Is it the difference between BA and BSc that is driving the significance? *key difference
EXERCISE 4 – LINEAR REGRESSION– SALARY
Q4.1: Ask yourself, if you think gender, race, and GPA can help predict salary, could you use a two-way ANOVA? Why? (4 pts)
BUILDING YOUR REGRESSION MODEL
Building a model means asking:
o What do you think can help predict the variation in our graduates’ annual salary on the job market?
Dependent Variable: Salary (continuous variable)
Control Variable(s): Gender, Race
Independent Variables: Degree Type, # of Languages, # of Skills
* List the dependent variable you want to predict in your regression model (2 pts)
* List 2 control variables you want to include in your regression model (4 pts)
* List 3 independent variables you want to include in your regression model (6 pts)
STEP 1: Dealing with Independent Categorical Variables
BA = 1, BSc = 2, BCom = 3
* In linear regression, the arbitrary numerical values assigned to the degree type variable’s attributes (1, 2, 3) would be interpreted numerically. As we know, apart from distinguishing each attribute from one another, the numerical values don’t mean anything in themselves.
* In linear regression, all independent categorical variables need to be recoded as dummy variables to allow the model to differentiate between the effect of each degree type on annual salary.
* Remember that differentiating between each degree type is something ANOVA was not able to do. This is key to remember.
* Recoding into dummy variables is what allows us to look at the specific effect of each attribute on annual income.
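The recode from one numeric code into 0/1 dummy variables can be sketched as follows. The codes (1, 2, 3) and the dummy names follow this lab; the function name and the example cases are assumptions for illustration, and in the lab itself the recode is already done in SPSS.

```python
# Codes used for the degree type variable in this lab.
DEGREE_LABELS = {1: "BA", 2: "BSc", 3: "BCom"}


def to_dummies(degree_code):
    """Recode one degree type code into three 0/1 dummy variables."""
    if degree_code not in DEGREE_LABELS:
        raise ValueError(f"unknown degree code: {degree_code}")
    return {
        "BA_DUMMY": int(degree_code == 1),
        "BSC_DUMMY": int(degree_code == 2),
        "BCOM_DUMMY": int(degree_code == 3),
    }


# Hypothetical cases: a BSc holder and a BCom holder.
print(to_dummies(2))
print(to_dummies(3))
```

Each case gets a 1 on exactly one dummy, so the numbers no longer imply that a BCom is "three times" a BA; they only flag which group the case belongs to.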
In Variable View, look at the recode of Degree Type into three dummy variables:
o BA_DUMMY
o BSC_DUMMY
o BCOM_DUMMY
Q4.2: Why is Degree Type the only independent variable recoded into multiple dummy variables? (4 pts)
Let’s Build Our Model:
Analyze → Regression → Linear
Let’s look at the effect of Degree Type on the variation of annual salary:
o For an independent categorical variable, you cannot include all 3 dummy variables.
o One needs to be the baseline [when you see the results, it will become clear what that means]. Let’s leave BCom out of the model
o Add BA_DUMMY & BSC_DUMMY as independent variables
o Click OK and run the model
LINEAR REGRESSION OUTPUT I
Coefficients TABLE
What do you see? Two things should be noteworthy:
1. degree-type-BA is statistically significant
2. degree-type-BSc is statistically significant
Remark I
In plain language, whether or not a graduate holds a BA helps explain the variation in annual salary. More precisely, holding a BA has a negative effect on annual salary.
Remark II
Remember when we left BCom out of the regression model earlier? BCom is the referent. What does that mean? Holding a BA has a negative effect on annual salary relative to holding a BCom: compared to BCom holders, BA holders earn on average $17,454.18 less annually. Here you can see what a REGRESSION MODEL can do that a two-way ANOVA couldn’t: measure the statistical significance of the difference on a continuous variable (Salary) between two attributes (BA vs. BCOM) of a categorical variable.
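What "referent" means can be checked on a toy example. The sketch below fits a least-squares regression of salary on BA_DUMMY and BSC_DUMMY (BCom left out) using only the Python standard library. The six salaries and the helper function names are assumptions for illustration; it reproduces the coefficients only, not the standard errors or Sig. values that SPSS reports.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: degree type and salary in $K.
degrees = ["BA", "BA", "BSc", "BSc", "BCom", "BCom"]
salary = [40.0, 44.0, 50.0, 54.0, 60.0, 64.0]

# Columns: constant, BA_DUMMY, BSC_DUMMY. BCom is the omitted baseline.
x = [[1.0, float(d == "BA"), float(d == "BSc")] for d in degrees]
constant, b_ba, b_bsc = ols(x, salary)

print(constant)  # mean salary of the baseline group (BCom)
print(b_ba)      # BA mean minus BCom mean
print(b_bsc)     # BSc mean minus BCom mean
```

With only degree-type dummies in the model, the constant is the baseline group's mean and each coefficient is that group's mean minus the baseline mean. Adding a third dummy for BCom alongside the constant would make the three dummies sum to the constant column, so the system would have no unique solution, which is why one category has to be left out.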
Q4.3: Use the same interpretative logic contained in Remark I & II for degree-type-BSC. (6 pts)
LINEAR REGRESSION OUTPUT II
Model Summary Table
What does the Adjusted R Square tell us?
53.8% of the variation in annual salary amongst our 130 cases two years after graduation can be explained by one variable: Degree-Type.
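Adjusted R Square is R Square penalized for the number of predictors in the model. A minimal sketch of the standard formula is below; the 130 cases and the 2 predictors (the BA and BSc dummies) come from this lab, while the R Square value of 0.50 is a made-up number for illustration, not the value in the output.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical R Square of 0.50 with 130 cases and 2 dummy predictors.
adj = adjusted_r_squared(0.50, n_cases=130, n_predictors=2)
print(round(adj, 4))
```

Because of the penalty, Adjusted R Square is always a little lower than R Square, and the gap grows as more predictors are added relative to the number of cases.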
CONTROL VARIABLE(S)
Since the beginning of the semester, Dr. Bartolic and I have been bugging you with control variable(s). We shall now see why.
We discussed the limitations of ANOVA earlier. That said, the two-way ANOVA revealed that gender was significant. It is thus important to add GENDER and see if it changes anything about the explanatory power of degree-type on salary. In statistical language, we want to see if a portion of the variation that was attributed to degree-type in our previous model could, in fact, be due to gender.
Let’s add gender:
Analyze → Regression → Linear
o Add GENDER to the Independent(s) block
o Click OK and run the regression
REGRESSION MODEL WITH GENDER
Q4.4: Looking at the Coefficients and Model Summary Tables, what can you conclude about the effect of gender on salary? Also, did introducing gender to the regression model change the overall effect of degree-type? (10 pts)
EXERCISE 5 – LINEAR REGRESSION– SALARY (more)
Keep working with Salary as a dependent variable. Look at the list of variables and find two new independent variables (not degree-type) you think might have an effect on salary.
Q5.1: What are the two independent variables you chose? (4 pts)
Q5.2: Why do you think they might influence salary? (6 pts)
Q5.3: Build and run the regression model. (10 pts)
Q5.4: Interpret the results. (10 pts)
Pick a first control variable (not gender) and add it to the model.
Q5.5: Run the regression model with control variable. (10 pts)
Q5.6: Interpret the results. (10 pts)
Add Gender and Degree-Type to your final model
Q5.7: Run the regression model. (10 pts)
Q5.8: Interpret the results. (10 pts)