
Estimating Models Using Dummy Variables

Name

Institution Name

Course Name

Professor’s Name

Date


This week's discussion examines the relationship between respondents' sex and their labor force status. According to the preliminary frequency distributions, these variables exhibited no outliers. Of the 2,538 respondents, 2,536 had valid data on both variables (two male respondents were missing labor force status). Because sex was entered as 0/1 dummy variables, their means are proportions: 44.91% of the valid cases were male (M = .4491, SD = .4975) and 55.09% were female (M = .5509, SD = .4975). Labor force status had a mean of 3.00 (SD = 2.355). A linear regression with respondent's sex (the MaleRespondents dummy) as the predictor of labor force status was statistically significant, F(1, 2534) = 73.585, p < .001, with a small effect size (R² = .028). The negative coefficient for MaleRespondents (B = -.795) indicates that male respondents' mean labor force status code was .795 lower than that of female respondents.

Respondent's Sex:

Case Processing Summary (LABOR FORCE STATUS by RESPONDENT'S SEX)

Respondent's Sex   Valid N   Valid %   Missing N   Missing %   Total N   Total %
MALE               1139      99.8%     2           0.2%        1141      100.0%
FEMALE             1397      100.0%    0           0.0%        1397      100.0%

Labor Force Statistics:

Descriptive Statistics

Variable             Mean    Std. Deviation   N
LABOR FORCE STATUS   3.00    2.355            2536
FemaleRespondent     .5509   .49750           2536
MaleRespondents      .4491   .49750           2536

Correlations:

                                           LABOR FORCE STATUS   FemaleRespondent   MaleRespondents
Pearson Correlation   LABOR FORCE STATUS   1.000                .168               -.168
                      FemaleRespondent     .168                 1.000              -1.000
                      MaleRespondents      -.168                -1.000             1.000
Sig. (1-tailed)       LABOR FORCE STATUS   .                    .000               .000
                      FemaleRespondent     .000                 .                  .000
                      MaleRespondents      .000                 .000               .
N                     LABOR FORCE STATUS   2536                 2536               2536
                      FemaleRespondent     2536                 2536               2536
                      MaleRespondents      2536                 2536               2536

ANOVA(a)

Model           Sum of Squares   df     Mean Square   F        Sig.
1  Regression   396.625          1      396.625       73.585   .000(b)
   Residual     13658.356        2534   5.390
   Total        14054.981        2535

a. Dependent Variable: LABOR FORCE STATUS
b. Predictors: (Constant), MaleRespondents
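As a check on the ANOVA figures, R² is the regression sum of squares divided by the total sum of squares, and F is the regression mean square divided by the residual mean square. The short Python sketch below uses only the values printed in the ANOVA table; the function name `anova_summary` is illustrative. Note that SPSS prints Sig. as .000 whenever p is below .001, so the result is reported as p < .001.

```python
def anova_summary(ss_reg: float, ss_res: float, df_reg: int, df_res: int) -> dict[str, float]:
    """Recompute R-squared and F from the sums of squares in an ANOVA table."""
    ss_total = ss_reg + ss_res           # total variation in the dependent variable
    ms_reg = ss_reg / df_reg             # regression mean square
    ms_res = ss_res / df_res             # residual mean square (estimate of error variance)
    return {
        "r_squared": ss_reg / ss_total,  # proportion of variance explained
        "ms_res": ms_res,
        "f": ms_reg / ms_res,            # F statistic on (df_reg, df_res) degrees of freedom
    }


# Values copied from the ANOVA table (labor force status regressed on MaleRespondents)
summary = anova_summary(ss_reg=396.625, ss_res=13658.356, df_reg=1, df_res=2534)
print(f"R^2 = {summary['r_squared']:.3f}, MS residual = {summary['ms_res']:.3f}, F = {summary['f']:.3f}")
```

The printed values should agree with the table (R² ≈ .028, residual mean square ≈ 5.390, F ≈ 73.585).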

Coefficients(a)

Model                B       Std. Error   Beta    t        Sig.   Zero-order   Partial   Part    Tolerance   VIF
1  (Constant)        3.354   .062                 54.002   .000
   MaleRespondents   -.795   .093         -.168   -8.578   .000   -.168        -.168     -.168   1.000       1.000

Column groups: B and Std. Error = Unstandardized Coefficients; Beta = Standardized Coefficients; Zero-order, Partial, and Part = Correlations; Tolerance and VIF = Collinearity Statistics.
a. Dependent Variable: LABOR FORCE STATUS
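With a single 0/1 dummy as the only predictor, the constant is the mean of the group coded 0 (here, female respondents) and B is the difference between the two group means. The Python sketch below, with an illustrative helper name, recovers both group means from the coefficients table, checks that their weighted average (using the proportions from Descriptive Statistics) returns the overall mean of about 3.00, and recomputes t as B divided by its standard error.

```python
def group_means_from_dummy(constant: float, b_dummy: float) -> tuple[float, float]:
    """Return (reference-group mean, dummy-group mean) implied by a one-dummy regression."""
    reference_mean = constant            # dummy = 0 (female respondents in this model)
    dummy_mean = constant + b_dummy      # dummy = 1 (male respondents in this model)
    return reference_mean, dummy_mean


# Coefficients copied from the coefficients table
female_mean, male_mean = group_means_from_dummy(constant=3.354, b_dummy=-0.795)

# Means of the 0/1 variables in Descriptive Statistics are the group proportions
p_female, p_male = 0.5509, 0.4491
overall_mean = p_female * female_mean + p_male * male_mean  # weighted mean, close to 3.00

# t is B divided by its standard error; small differences from -8.578 come from rounding
t_male = -0.795 / 0.093

print(f"female mean = {female_mean:.3f}, male mean = {male_mean:.3f}")
print(f"overall mean = {overall_mean:.2f}, t = {t_male:.2f}")
```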

Excluded Variables(a)

Model                 Beta In   t   Sig.   Partial Correlation   Tolerance   VIF   Minimum Tolerance
1  FemaleRespondent   .(b)      .   .      .                     .000        .     .000

Column group: Tolerance, VIF, and Minimum Tolerance = Collinearity Statistics.
a. Dependent Variable: LABOR FORCE STATUS
b. Predictors in the Model: (Constant), MaleRespondents

Collinearity Diagnostics(a)

Model   Dimension   Eigenvalue   Condition Index   (Constant)   MaleRespondents
1       1           1.670        1.000             .16          .16
        2           .330         2.250             .84          .84

Column group: (Constant) and MaleRespondents = Variance Proportions.
a. Dependent Variable: LABOR FORCE STATUS
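Each condition index in the Collinearity Diagnostics table is the square root of the largest eigenvalue divided by that dimension's eigenvalue. A commonly cited rule of thumb treats indices above roughly 30 as a sign of serious collinearity, so 2.250 is far from that range. A minimal Python sketch (the function name is illustrative):

```python
from math import sqrt


def condition_indices(eigenvalues: list[float]) -> list[float]:
    """Condition index for each dimension: sqrt(largest eigenvalue / this eigenvalue)."""
    largest = max(eigenvalues)
    return [sqrt(largest / ev) for ev in eigenvalues]


# Eigenvalues copied from the Collinearity Diagnostics table
indices = condition_indices([1.670, 0.330])
print([round(ci, 3) for ci in indices])  # [1.0, 2.25]
```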

RE: Discussion - Week 10


A dummy variable is a dichotomy with two categories, not including missing values (Wagner, 2020). Multiple regression extends bivariate regression by examining the effect of two or more independent variables on the dependent variable (Frankfort-Nachmias et al., 2020). Dummy variables are useful because they allow nominal variables to be included in multiple regression models, which otherwise require interval- or ratio-level variables.
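To make the coding concrete, a nominal variable with k categories is represented by k - 1 dummy variables, and the omitted category becomes the reference group that every coefficient is compared against. The Python sketch below is a minimal illustration under that convention; the function `dummy_code` and the housing responses are hypothetical, not taken from the GSS file.

```python
def dummy_code(values: list[str], reference: str) -> dict[str, list[int]]:
    """Create k - 1 dummy (0/1) variables for a nominal variable with k categories.

    The reference category gets no column; its cases are coded 0 on every dummy.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found")
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical housing-status responses (illustrative only)
housing = ["own", "rent", "other", "own", "rent", "own"]
print(dummy_code(housing, reference="other"))
# {'own': [1, 0, 0, 1, 0, 1], 'rent': [0, 1, 0, 0, 1, 0]}
```

Entering all k dummies alongside the constant would make them perfectly collinear, which is why SPSS excluded FemaleRespondent in the earlier labor force model.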

General Social Survey Dataset’s Mean of Age

In the General Social Survey dataset, the mean age of respondents is 49.01, which suggests a relatively older sample, and the median (the "middle score") is 49.00. Because the mean and median are nearly equal, the distribution appears roughly symmetric; however, the lowest of the multiple modes is 53, and the skewness is positive, indicating that the age distribution is skewed slightly to the right. Thus, the results may not be generalizable across all age groups. I chose to display the data on the respondents' highest degree and age.
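The comparison of mean, median, mode, and skewness can be sketched in Python with the standard `statistics` module. The ages below are hypothetical, not the GSS data. When several modes exist, SPSS reports the smallest, which `min(multimode(...))` mirrors. The skewness function implements the adjusted Fisher-Pearson coefficient, which we assume matches the sample skewness SPSS reports; a positive value indicates a longer right tail.

```python
from statistics import mean, median, multimode, stdev


def sample_skewness(xs: list[float]) -> float:
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three values")
    m, s = mean(xs), stdev(xs)  # stdev uses the n - 1 denominator
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


# Hypothetical ages (not the GSS data)
ages = [23, 31, 38, 45, 49, 49, 53, 53, 60, 72, 88]
print("mean:", mean(ages))                   # 51
print("median:", median(ages))               # 49
print("lowest mode:", min(multimode(ages)))  # 49 and 53 tie; the smaller is reported
print("skewness:", round(sample_skewness(ages), 3))
```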

Research Question

Using the General Social Survey (GSS) dataset, I constructed a research question that can be answered with a regression model. The dependent variable is the respondent's highest year of school completed, and the two independent variables are owning or buying a home and paying rent. The highest year of school completed may influence individuals' ability to purchase a home or to rent one. Using the GSS data, the following research question and hypotheses were developed:

· Research question: Are people who completed more years of school more likely to own or be buying a home rather than paying rent?

· The null hypothesis (H0): There is no relationship between the highest year of school completed and housing status (owning or buying a home, or paying rent).

· The alternative hypothesis (Ha): There is a relationship between the highest year of school completed and housing status (owning or buying a home, or paying rent).

Interpretation of the Coefficients of the Model

Table 1 shows the coefficients for owning or buying a home and for paying rent as predictors of the highest year of school completed. Both coefficients are measured relative to the omitted reference category of the housing variable. The unstandardized coefficient for owning or buying is 1.960 (Table 1), meaning that respondents who own or are buying a home completed, on average, 1.960 more years of school than the reference group, whereas the coefficient for paying rent is .839 (Table 1). The predictor for owning or buying is statistically significant (p = .001; Table 1), which is well below the .05 threshold, so we can reject the null hypothesis of no relationship between the highest year of school completed and owning or buying a home (Walden University, 2016). The predictor for paying rent, however, is not statistically significant (p = .168; Table 1), so renters do not differ significantly from the reference group in years of school completed. A Variance Inflation Factor (VIF) above 10 indicates serious multicollinearity, meaning that the independent variables (here, own/buying and pays rent) are highly correlated with each other (Walden University, 2016). Because the VIF is 16.106 (tolerance = .062; Table 1), the assumption of no multicollinearity has not been met. When the correlated predictors are dummy variables for the same categorical variable, a high VIF typically arises because the omitted reference category contains few cases, so a possible remedy is to choose a larger reference category (for example, renters). This is a separate issue from the assumption of linearity, whose violation would imply that the model fails to capture the systematic pattern of the relationship between the dependent and independent variables (Nonlinearity, 1991).
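The significance test for each dummy coefficient is t = B / SE, compared against a t distribution with N - 3 = 1,666 residual degrees of freedom (N = 1669 from Table 2, minus the constant and two dummies). With that many degrees of freedom the t distribution is very close to the standard normal, so the Python sketch below approximates the two-tailed p-value with `math.erfc`; this normal approximation is a simplification, not the exact computation SPSS performs, and the function name is illustrative.

```python
from math import erfc, sqrt


def coefficient_test(b: float, se: float) -> tuple[float, float]:
    """Return t = B / SE and an approximate two-tailed p-value for H0: coefficient = 0.

    Uses the standard normal as the reference distribution, which is close to the
    t distribution when the residual degrees of freedom are large (about 1,666 here).
    """
    t = b / se
    p = erfc(abs(t) / sqrt(2))  # equals 2 * (1 - Phi(|t|)) for the standard normal CDF Phi
    return t, p


# B and Std. Error copied from Table 1
for label, b, se in [("own or is buying", 1.960, 0.603), ("pays rent", 0.839, 0.608)]:
    t, p = coefficient_test(b, se)
    verdict = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: t = {t:.3f}, p = {p:.3f} -> {verdict} at .05")
```

The approximate p-values land close to the .001 and .168 shown in Table 1.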

Table 1

Coefficients

Coefficients(a)

Model                        B        Std. Error   Beta   t        Sig.    Tolerance   VIF
1  (Constant)                12.240   .596                20.554   <.001
   dwelown=OWN OR IS BUYING  1.960    .603         .314   3.252    .001    .062        16.106
   dwelown=PAYS RENT         .839     .608         .133   1.381    .168    .062        16.106

Column groups: B and Std. Error = Unstandardized Coefficients; Beta = Standardized Coefficients; Tolerance and VIF = Collinearity Statistics.
a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED

Analysis of the Model Summary

The Durbin-Watson statistic ranges from 0 to 4 and provides information about the independence of errors (Walden University, 2016); a value near 2 indicates no autocorrelation among residuals. The model summary reports a Durbin-Watson value of 1.622, which falls within the acceptable range and suggests no serious correlation between residuals (Walden University, 2016). Durbin-Watson values below 1.0 or above 3.0 are considered cause for concern because they indicate serious autocorrelation, in which the residuals are correlated with a lagged version of themselves (Walden University, 2016). The model summary reports an R square of .035, indicating that housing status explains about 3.5% of the variance in the highest year of school completed. This is a small effect with limited practical meaning: a larger effect size indicates that a research finding has practical significance, while a negligible effect size indicates limited practical applications (Warner, 2012).
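The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the residual sum of squares, so it depends on the order of the cases. R² can also be recovered from the residuals statistics: in an ordinary least squares model with a constant, the variance of the observed values equals the variance of the predicted values plus the variance of the residuals. The Python sketch below uses hypothetical residual sequences for the Durbin-Watson part and the Std. Deviation column of Table 2 for the R² part; the function names are illustrative.

```python
def durbin_watson(residuals: list[float]) -> float:
    """Sum of squared differences between successive residuals divided by the residual sum of squares."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def r_squared_from_sds(sd_predicted: float, sd_residual: float) -> float:
    """R^2 = Var(predicted) / Var(observed), using Var(observed) = Var(predicted) + Var(residual)."""
    var_pred = sd_predicted ** 2
    return var_pred / (var_pred + sd_residual ** 2)


# Hypothetical residual sequences (illustrative only)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs, negative autocorrelation
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0: runs of the same sign, positive autocorrelation

# Standard deviations of Predicted Value and Residual copied from Table 2
print(round(r_squared_from_sds(0.569, 2.976), 3))
```

With the Table 2 values, the R² computation gives approximately .035.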

Diagnostics for the Regression Model

The residuals statistics in Table 2 include Cook's Distance values; values greater than or equal to 1.0 are considered problematic, and further diagnostics should be performed to check for undue influence (Walden University, 2016). Influential cases matter because their presence may signal that the regression model fails to capture important characteristics of the data (Outlying and Influential Data, 1991). For example, outliers on one or more of the variables might exert undue influence on the model and have a significant impact on the results (Walden University, 2016). Table 2 shows Cook's Distance values ranging from a minimum of .000 to a maximum of .037 (Table 2), well below 1.0, indicating that no case exerts undue influence in this model (Walden University, 2016). Because a residual is the observed value minus the predicted value, the negative minimum residual of -14.200 (Table 2) means that the predicted value was too high for that case, and the positive maximum residual of 6.921 (Table 2) means that the predicted value was too low. Overall, the aim of a regression model is to explain variability in the dependent variable by means of one or more independent or control variables.
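Cook's Distance combines a case's residual with its leverage: D_i = (e_i² / (p · MSE)) · h_i / (1 - h_i)², where e_i is the residual (observed minus predicted), h_i the leverage, p the number of estimated parameters, and MSE the residual mean square. The Python sketch below applies that formula to a one-predictor regression on hypothetical data, with one case deliberately placed at an extreme x value far below the trend; SPSS computes the same quantity for the full multi-predictor model, which this simplified sketch does not attempt.

```python
from statistics import mean


def cooks_distance_simple(x: list[float], y: list[float]) -> list[float]:
    """Cook's distance for each case in a one-predictor OLS regression (p = 2 parameters)."""
    n, p = len(x), 2
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]  # observed - predicted
    mse = sum(e ** 2 for e in residuals) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # hat values for simple regression
    return [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2) for e, h in zip(residuals, leverage)]


# Hypothetical data: the last case has an extreme x value and falls far below the trend
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 10.0]
distances = cooks_distance_simple(x, y)
flagged = [i for i, d in enumerate(distances) if d >= 1.0]  # cases at or above the 1.0 cutoff
print([round(d, 3) for d in distances], flagged)
```

Only the last case crosses the 1.0 cutoff here, whereas in Table 2 the largest Cook's Distance is .037.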

Table 2

Residuals Statistics

Residuals Statistics(a)

Statistic                           Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value                     12.24     14.20     13.76   .569             1669
Std. Predicted Value                -2.672    .770      .000    1.000            1669
Standard Error of Predicted Value   .093      .596      .110    .061             1669
Adjusted Predicted Value            12.04     14.21     13.76   .569             1669
Residual                            -14.200   6.921     .000    2.976            1669
Std. Residual                       -4.769    2.324     .000    .999             1669
Stud. Residual                      -4.771    2.326     .000    1.000            1669
Deleted Residual                    -14.214   6.933     .000    2.981            1669
Stud. Deleted Residual              -4.803    2.329     .000    1.001            1669
Mahal. Distance                     .612      65.721    1.999   7.879            1669
Cook's Distance                     .000      .037      .001    .002             1669
Centered Leverage Value             .000      .039      .001    .005             1669

a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED

Conclusion

A regression model is used to test hypotheses and to analyze the association between a dependent variable and two or more independent variables; on its own, it does not establish causality. Visual displays of data can be invaluable because they make statistical findings easier to understand. After all, research is useful when it organizes, summarizes, and communicates information (Frankfort-Nachmias et al., 2020). Therefore, the biggest takeaway from the analysis of respondents' highest year of school completed is the importance of checking whether the model meets all of its assumptions and, where it does not, identifying possible remedies for any violations.

References

Frankfort-Nachmias, C., Leon-Guerrero, A., & Davis, G. (2020). Social statistics for a diverse society (9th ed.). Sage Publications.

Nonlinearity. (1991). In J. Fox (Ed.), Regression diagnostics (pp. 54-62). SAGE Publications, Inc.

Outlying and influential data. (1991). In J. Fox (Ed.), Regression diagnostics (pp. 22-41). SAGE Publications, Inc.

Wagner, III, W. E. (2020). Using IBM® SPSS® statistics for research methods and social science statistics (7th ed.). Sage Publications.

Walden University, LLC. (Producer). (2016). Regression diagnostics and model evaluation [Video file]. Author.

Warner, R. M. (2012). Applied statistics from bivariate through multivariate techniques (2nd ed.). Sage Publications.
