Estimating Models Using Dummy Variables
Name
Institution Name
Course Name
Professor’s Name
Date
This week's discussion examines the relationship between respondents' sex and their labor force status. Preliminary frequency distributions for these variables showed no outliers. A total of 2,536 respondents were included in the analysis. Labor force status had M = 3.00, SD = 2.355. The male dummy variable had M = .4491, SD = .4975, and the female dummy variable had M = .5509, SD = .4975, meaning that about 45% of respondents were male and about 55% were female. Respondent's sex, coded as a dummy variable for male respondents, was entered into a linear regression to predict labor force status. The model was statistically significant, F(1, 2534) = 73.585, p < .001, with a small effect size, R² = .028.
Respondents Sex:
Case Processing Summary

| Variable | RESPONDENTS SEX | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|---|
| LABOR FORCE STATUS | MALE | 1139 | 99.8% | 2 | 0.2% | 1141 | 100.0% |
| | FEMALE | 1397 | 100.0% | 0 | 0.0% | 1397 | 100.0% |
Labor Force Statistics:
Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| LABOR FORCE STATUS | 3.00 | 2.355 | 2536 |
| FemaleRespondent | .5509 | .49750 | 2536 |
| MaleRespondents | .4491 | .49750 | 2536 |
Correlations:

| Statistic | Variable | LABOR FORCE STATUS | FemaleRespondent | MaleRespondents |
|---|---|---|---|---|
| Pearson Correlation | LABOR FORCE STATUS | 1.000 | .168 | -.168 |
| | FemaleRespondent | .168 | 1.000 | -1.000 |
| | MaleRespondents | -.168 | -1.000 | 1.000 |
| Sig. (1-tailed) | LABOR FORCE STATUS | . | .000 | .000 |
| | FemaleRespondent | .000 | . | .000 |
| | MaleRespondents | .000 | .000 | . |
| N | LABOR FORCE STATUS | 2536 | 2536 | 2536 |
| | FemaleRespondent | 2536 | 2536 | 2536 |
| | MaleRespondents | 2536 | 2536 | 2536 |
ANOVAa

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 396.625 | 1 | 396.625 | 73.585 | .000b |
| | Residual | 13658.356 | 2534 | 5.390 | | |
| | Total | 14054.981 | 2535 | | | |

a. Dependent Variable: LABOR FORCE STATUS
b. Predictors: (Constant), MaleRespondents
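As a quick check on the ANOVA table above, F and R² can be recomputed from the sums of squares and degrees of freedom. The sketch below uses only the values reported in the table; the variable names are illustrative.

```python
# Values copied from the ANOVA table above.
ss_regression = 396.625
ss_residual = 13658.356
df_regression = 1
df_residual = 2534

# Total sum of squares is the sum of the two components.
ss_total = ss_regression + ss_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the mean squares; R squared is the explained share.
f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total

print(f"F({df_regression}, {df_residual}) = {f_statistic:.3f}")
print(f"R squared = {r_squared:.3f}")
```

Both results agree with the reported F = 73.585 and R² = .028. The same R² also follows from squaring the Pearson correlation of .168 between labor force status and the dummy variable.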
Coefficientsa

| Model | Predictor | B | Std. Error | Beta | t | Sig. | Zero-order | Partial | Part | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 3.354 | .062 | | 54.002 | .000 | | | | | |
| | MaleRespondents | -.795 | .093 | -.168 | -8.578 | .000 | -.168 | -.168 | -.168 | 1.000 | 1.000 |

B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: LABOR FORCE STATUS
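With a single dummy predictor, the constant is the predicted value for the group coded 0 (female respondents) and the coefficient is the difference for the group coded 1 (male respondents). The sketch below illustrates this with the coefficients reported above; the function name is hypothetical, and the group proportions come from the descriptive statistics.

```python
# Coefficients copied from the coefficients table above.
intercept = 3.354   # (Constant): predicted value when MaleRespondents = 0
b_male = -0.795     # unstandardized coefficient for MaleRespondents


def predict_labor_force_status(male: int) -> float:
    """Predicted labor force status score for male = 1 or male = 0."""
    return intercept + b_male * male


female_prediction = predict_labor_force_status(0)
male_prediction = predict_labor_force_status(1)

# Weighting the two predictions by the group proportions should
# approximately recover the overall mean of the dependent variable.
share_female = 0.5509
share_male = 0.4491
overall_mean = share_female * female_prediction + share_male * male_prediction

print(f"female: {female_prediction:.3f}")
print(f"male:   {male_prediction:.3f}")
print(f"overall mean: {overall_mean:.2f}")
```

The weighted value is close to the reported overall mean of 3.00, which is what a regression on one dummy variable should produce.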
Excluded Variablesa

| Model | Variable | Beta In | t | Sig. | Partial Correlation | Tolerance | VIF | Minimum Tolerance |
|---|---|---|---|---|---|---|---|---|
| 1 | FemaleRespondent | .b | . | . | . | .000 | . | .000 |

a. Dependent Variable: LABOR FORCE STATUS
b. Predictors in the Model: (Constant), MaleRespondents
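The female dummy is excluded because it is an exact linear function of the male dummy (female = 1 - male), which matches the correlation of -1.000 and the tolerance of .000 reported above. The sketch below shows this on a small hypothetical sample; the data and function name are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical sample: 1 = male, 0 = female.
male = [1, 0, 0, 1, 0, 1, 0, 0]
# The second dummy is fully determined by the first.
female = [1 - m for m in male]

r = pearson_r(male, female)

# With one other predictor, tolerance is 1 minus the squared correlation.
tolerance = 1 - r ** 2

print(f"r = {r:.3f}, tolerance = {tolerance:.3f}")
```

Because the two dummies carry the same information, only one of them can enter the model alongside the constant.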
Collinearity Diagnosticsa

| Model | Dimension | Eigenvalue | Condition Index | Variance Proportion: (Constant) | Variance Proportion: MaleRespondents |
|---|---|---|---|---|---|
| 1 | 1 | 1.670 | 1.000 | .16 | .16 |
| | 2 | .330 | 2.250 | .84 | .84 |

a. Dependent Variable: LABOR FORCE STATUS
RE: Discussion - Week 10
A dummy variable is a dichotomy with two categories, not including missing values (Wagner, 2020). Multiple regression extends bivariate regression by examining the effect of two or more independent variables on the dependent variable (Frankfort-Nachmias et al., 2020). Dummy variables make it possible to include nominal variables in analyses, such as multiple regression, that require interval or ratio variables.
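To illustrate the idea, a nominal variable with k categories can be recoded into k - 1 dummy variables, with the omitted category serving as the reference. The sketch below is a minimal example; the function name, category labels, and sample values are hypothetical and are not taken from the GSS codebook.

```python
def dummy_code(values, reference):
    """Recode a nominal variable into 0/1 dummies, omitting the reference."""
    categories = sorted(set(values) - {reference})
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical housing responses for four respondents.
housing = ["own", "rent", "other", "own"]
rows = dummy_code(housing, reference="other")

for value, row in zip(housing, rows):
    print(value, row)
```

A respondent in the reference category has a 0 on every dummy, so the regression constant represents that group.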
General Social Survey Dataset’s Mean of Age
In the General Social Survey dataset, the mean age of respondents is 49.01, which suggests a relatively older sample. The median, or "middle score," is 49.00; because it is nearly identical to the mean, the distribution appears close to symmetric. However, the lowest mode is 53 and the skewness is positive, so the data are skewed slightly to the right. Thus, the results may not be generalizable across all age groups. I chose to display the data on respondents' highest degree and age.
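The relationship between the mean, the median, and the direction of skew can be sketched with a small example. The ages below are hypothetical, and the function uses the simple moment-based skewness formula, which differs slightly from the sample-adjusted version that statistical packages usually report.

```python
from statistics import mean, median


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    second_moment = sum((v - m) ** 2 for v in values) / n
    third_moment = sum((v - m) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical ages with a longer tail toward older respondents.
ages = [20, 25, 30, 35, 40, 45, 50, 60, 75, 89]

print(f"mean = {mean(ages):.1f}, median = {median(ages):.1f}")
print(f"skewness is positive: {skewness(ages) > 0}")
```

When the right tail is longer, the mean is pulled above the median and the skewness is positive.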
Research Question
Using the General Social Survey (GSS) dataset, I constructed a research question and tested it with a regression model. The dependent variable is the respondent's highest year of school completed, and the two independent variables are dummy variables for owning or buying a home and for paying rent. Years of schooling may be related to an individual's ability to purchase a home rather than rent one. Using the GSS data, the following research question and hypotheses were developed:
· Research question: Do people with more years of school completed tend to own or be buying a home rather than paying rent?
· The null hypothesis (H0): There is no relationship between the highest year of school completed and owning or buying a home and paying rent.
· The alternative hypothesis (Ha): There is a relationship between the highest year of school completed and owning or buying a home and paying rent.
Interpretation of the Coefficients of the Model
Table 1 shows the coefficients for owning or buying a home and for paying rent as predictors of the highest year of school completed. The unstandardized coefficient for own or is buying is 1.960 (Table 1), whereas the coefficient for pays rent is smaller, at .839 (Table 1); each is the difference in years of school completed relative to the omitted reference category of the housing variable. The significance level for own or is buying is .001 (Table 1), which is well below the .05 threshold, so for this predictor we can reject the null hypothesis that there is no relationship between the highest year of school completed and housing status (Walden University, 2016). The significance level for pays rent is .168 (Table 1), which is above .05, so paying rent is not a significant predictor in this model. A Variance Inflation Factor (VIF) above 10 indicates serious multicollinearity, meaning that the independent variables, own/buying and pays rent, are highly correlated with each other (Walden University, 2016). Because the VIF is 16.106 (Table 1), we can assume that the assumption of no multicollinearity has not been met. This is distinct from the assumption of linearity, whose violation implies that the model fails to capture the systematic pattern of the relationship between the dependent and independent variables (Nonlinearity, 1991). I do not see an obvious remedy for this violation within the model as specified, so the coefficients should be interpreted with caution.
Table 1
Coefficientsa

| Model | Predictor | B | Std. Error | Beta | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 12.240 | .596 | | 20.554 | <.001 | | |
| | dwelown=OWN OR IS BUYING | 1.960 | .603 | .314 | 3.252 | .001 | .062 | 16.106 |
| | dwelown=PAYS RENT | .839 | .608 | .133 | 1.381 | .168 | .062 | 16.106 |

B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.
a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
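The coefficients in Table 1 can be turned into predicted years of school for each housing group, and the VIF can be related to the tolerance through VIF = 1 / tolerance. The sketch below uses only the values in Table 1; the function and variable names are hypothetical.

```python
# Coefficients and tolerance copied from Table 1.
intercept = 12.240   # predicted years for the omitted reference category
b_own = 1.960        # dwelown=OWN OR IS BUYING
b_rent = 0.839       # dwelown=PAYS RENT
tolerance = 0.062    # same for both dummies in Table 1


def predicted_years(own: int, rent: int) -> float:
    """Predicted highest year of school for a given pair of dummy values."""
    return intercept + b_own * own + b_rent * rent


reference_years = predicted_years(0, 0)
owner_years = predicted_years(1, 0)
renter_years = predicted_years(0, 1)

# VIF is the reciprocal of tolerance. The result differs slightly from the
# reported 16.106 because the tolerance is rounded to three decimals.
vif_from_tolerance = 1 / tolerance

print(f"reference: {reference_years:.3f}")
print(f"owner:     {owner_years:.3f}")
print(f"renter:    {renter_years:.3f}")
print(f"VIF from rounded tolerance: {vif_from_tolerance:.2f}")
```

The reference and owner predictions correspond to the minimum (12.24) and maximum (14.20) predicted values reported in Table 2.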
Analysis of the Model Summary
The Durbin-Watson statistic ranges from 0 to 4.0 and provides information about the independence of errors (Walden University, 2016). The model summary reports a Durbin-Watson value of 1.622, which is reasonably close to 2.0 and suggests no serious correlation between residuals (Walden University, 2016). Durbin-Watson values below 1.0 or above 3.0 are considered problematic because they indicate serious serial correlation, in which the residuals are correlated with lagged versions of themselves over various time intervals (Walden University, 2016). The model summary reports an R square of .035, a negligible effect; a larger effect size means that a research finding has practical significance, while a negligible effect size indicates limited practical application (Warner, 2012).
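For readers who want to see how the Durbin-Watson statistic behaves, it is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below applies that formula to small hypothetical residual series; it is not the model's actual residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical series: neighbors with similar values (positive correlation).
smooth = [2.0, 1.0, -1.0, -2.0]
# Hypothetical series: neighbors alternate in sign (negative correlation).
alternating = [1.0, -1.0, 1.0, -1.0]

print(f"smooth:      {durbin_watson(smooth):.2f}")
print(f"alternating: {durbin_watson(alternating):.2f}")
```

Values well below 2 point to positive serial correlation and values well above 2 point to negative serial correlation, which is why a value near 2 is the reassuring case.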
Diagnostics for the Regression Model
The residuals statistics in Table 2 include Cook's Distance; values greater than or equal to 1.0 are considered problematic and call for further diagnostics for undue influence (Walden University, 2016). Influential cases matter because their presence may signal that the regression model fails to capture important characteristics of the data (Outlying and Influential Data, 1991). For example, specific outliers on one or more of the variables might exert undue influence on the model and have a significant impact on the results (Walden University, 2016). Cook's Distance ranges from a minimum of .000 to a maximum of .037 (Table 2), well below 1.0, indicating no undue influence in this model (Walden University, 2016). The minimum residual of -14.200 (Table 2) is negative, meaning that the predicted value was too high for that case, and the maximum residual of 6.921 (Table 2) is positive, meaning that the predicted value was too low. The aim of a regression model is to explain variability in the dependent variable by means of one or more independent or control variables.
Table 2
Residuals Statisticsa

| Statistic | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 12.24 | 14.20 | 13.76 | .569 | 1669 |
| Std. Predicted Value | -2.672 | .770 | .000 | 1.000 | 1669 |
| Standard Error of Predicted Value | .093 | .596 | .110 | .061 | 1669 |
| Adjusted Predicted Value | 12.04 | 14.21 | 13.76 | .569 | 1669 |
| Residual | -14.200 | 6.921 | .000 | 2.976 | 1669 |
| Std. Residual | -4.769 | 2.324 | .000 | .999 | 1669 |
| Stud. Residual | -4.771 | 2.326 | .000 | 1.000 | 1669 |
| Deleted Residual | -14.214 | 6.933 | .000 | 2.981 | 1669 |
| Stud. Deleted Residual | -4.803 | 2.329 | .000 | 1.001 | 1669 |
| Mahal. Distance | .612 | 65.721 | 1.999 | 7.879 | 1669 |
| Cook's Distance | .000 | .037 | .001 | .002 | 1669 |
| Centered Leverage Value | .000 | .039 | .001 | .005 | 1669 |

a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
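Two of the checks discussed for Table 2 can be sketched numerically. The first applies the Cook's Distance cutoff of 1.0. The second is a rough cross-check of the effect size, assuming an ordinary least squares model with an intercept, in which the variance of the observed values equals the variance of the predicted values plus the variance of the residuals. All names below are illustrative.

```python
# Values copied from Table 2.
sd_predicted = 0.569   # Std. Deviation of Predicted Value
sd_residual = 2.976    # Std. Deviation of Residual
cooks_max = 0.037      # Maximum Cook's Distance

# Cutoff check: 1.0 or above would call for further diagnostics.
has_undue_influence = cooks_max >= 1.0

# Share of total variance carried by the predicted values.
var_predicted = sd_predicted ** 2
var_residual = sd_residual ** 2
r_squared_approx = var_predicted / (var_predicted + var_residual)


def residual(observed: float, predicted: float) -> float:
    """Residual = observed minus predicted; negative means overprediction."""
    return observed - predicted


print(f"undue influence flagged: {has_undue_influence}")
print(f"approximate R squared: {r_squared_approx:.3f}")
print(f"example residual: {residual(12.0, 14.2):.1f}")
```

The approximate R squared is small, in line with the negligible effect size described for this model.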
Conclusion
A regression model is used to test hypotheses and to analyze the association between a dependent variable and one or more independent variables. Visual displays of data can be invaluable because they make statistical findings easier to understand. After all, research is useful when it organizes, summarizes, and communicates information (Frankfort-Nachmias et al., 2020). Therefore, the biggest takeaway from the analysis of respondents' highest year of school completed is whether the model meets all the assumptions and whether remedies can be found for any violations.
References
Frankfort-Nachmias, C., Leon-Guerrero, A., & Davis, G. (2020). Social statistics for a diverse society (9th ed.). Sage Publications.
Nonlinearity. (1991). In J. Fox (Ed.), Regression diagnostics (pp. 54-62). Sage Publications.
Outlying and Influential Data. (1991). In J. Fox (Ed.), Regression diagnostics (pp. 22-41). Sage Publications.
Wagner, III, W. E. (2020). Using IBM® SPSS® statistics for research methods and social science statistics (7th ed.). Sage Publications.
Walden University, LLC. (Producer). (2016). Regression diagnostics and model evaluation [Video file]. Author.
Warner, R. M. (2012). Applied statistics from bivariate through multivariate techniques (2nd ed.). Sage Publications.