Estimating Models Using Dummy Variables

Name

Institution Name

Course Name

Professor’s Name

Date

The topic for this week's discussion is the relationship between respondents' sex and their labor force status. According to the preliminary frequency distributions, these variables exhibited no outliers. A total of 2,536 respondents were included in the analysis. Sex was coded as two dummy variables, MaleRespondents (M = .4491, SD = .4975) and FemaleRespondent (M = .5509, SD = .4975). Because a dummy variable takes only the values 0 and 1, each mean is the proportion of the sample in that category rather than the center of a normal distribution. Labor force status had M = 3.00, SD = 2.355. The male dummy variable was used in a linear regression to predict labor force status. The model was statistically significant, F(1, 2534) = 73.585, p < .001, with a small effect, R² = .028.

Respondents' Sex:

Case Processing Summary

	RESPONDENTS SEX	Valid N	Valid Percent	Missing N	Missing Percent	Total N	Total Percent
LABOR FORCE STATUS	MALE	1139	99.8%	2	0.2%	1141	100.0%
	FEMALE	1397	100.0%	0	0.0%	1397	100.0%

Labor Force Statistics:

Descriptive Statistics

	Mean	Std. Deviation	N
LABOR FORCE STATUS	3.00	2.355	2536
FemaleRespondent	.5509	.49750	2536
MaleRespondents	.4491	.49750	2536
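
The means and standard deviations of the two dummy variables follow directly from the group counts, because the mean of a 0/1 variable is the proportion of cases coded 1. The sketch below is a minimal check in Python, assuming the 1,139 male and 1,397 female valid cases from the Case Processing Summary; the function name `dummy_mean_sd` is illustrative and not part of the SPSS output.

```python
import math

def dummy_mean_sd(n_ones, n_total):
    """Mean and sample SD (n - 1 denominator) of a 0/1 dummy variable."""
    p = n_ones / n_total                              # mean = proportion coded 1
    variance = p * (1 - p) * n_total / (n_total - 1)  # sample variance of 0/1 data
    return p, math.sqrt(variance)

n_male, n_female = 1139, 1397      # valid cases from the summary table
n = n_male + n_female              # 2536 cases in the regression

male_mean, male_sd = dummy_mean_sd(n_male, n)
female_mean, female_sd = dummy_mean_sd(n_female, n)

print(f"Male:   M = {male_mean:.4f}, SD = {male_sd:.4f}")
print(f"Female: M = {female_mean:.4f}, SD = {female_sd:.4f}")
```

The two standard deviations are identical because p(1 - p) is the same whether p is the male or the female proportion.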

Correlations:

		LABOR FORCE STATUS	FemaleRespondent	MaleRespondents
Pearson Correlation	LABOR FORCE STATUS	1.000	.168	-.168
	FemaleRespondent	.168	1.000	-1.000
	MaleRespondents	-.168	-1.000	1.000
Sig. (1-tailed)	LABOR FORCE STATUS	.	.000	.000
	FemaleRespondent	.000	.	.000
	MaleRespondents	.000	.000	.
N	LABOR FORCE STATUS	2536	2536	2536
	FemaleRespondent	2536	2536	2536
	MaleRespondents	2536	2536	2536

ANOVAa

Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	396.625	1	396.625	73.585	.000b
	Residual	13658.356	2534	5.390
	Total	14054.981	2535

a. Dependent Variable: LABOR FORCE STATUS

b. Predictors: (Constant), MaleRespondents
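
The R square and F statistic reported for this model can be recovered from the sums of squares in the ANOVA table. The following is a small sketch of that arithmetic, using only the values printed in the table; the variable names are my own.

```python
# Values copied from the ANOVA table.
ss_regression = 396.625
ss_residual = 13658.356
df_regression = 1
df_residual = 2534

ss_total = ss_regression + ss_residual       # should match the Total row
r_squared = ss_regression / ss_total         # share of variance explained
ms_residual = ss_residual / df_residual      # mean square for the residual
f_statistic = (ss_regression / df_regression) / ms_residual

print(f"R squared = {r_squared:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_statistic:.2f}")
```

Small differences in the last decimal place are expected because the table values are already rounded.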

Coefficientsa

Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.	Zero-order	Partial	Part	Tolerance	VIF
1	(Constant)	3.354	.062		54.002	.000
	MaleRespondents	-.795	.093	-.168	-8.578	.000	-.168	-.168	-.168	1.000	1.000

a. Dependent Variable: LABOR FORCE STATUS
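
With a single dummy predictor, the constant is the predicted score for the reference group (females, coded 0) and the coefficient is the difference between the two groups. The sketch below works through that reading of the coefficients, assuming the values printed in the table and the standard deviations from the Descriptive Statistics; the function name `predicted_status` is hypothetical.

```python
constant = 3.354     # B for (Constant): prediction when MaleRespondents = 0
b_male = -0.795      # B for MaleRespondents
se_male = 0.093      # Std. Error for MaleRespondents
sd_male = 0.49750    # SD of the dummy, from Descriptive Statistics
sd_status = 2.355    # SD of LABOR FORCE STATUS

def predicted_status(male):
    """Predicted labor force status score for male = 0 or 1."""
    return constant + b_male * male

female_prediction = predicted_status(0)
male_prediction = predicted_status(1)

# Standardized coefficient: B rescaled by the two standard deviations.
standardized_beta = b_male * sd_male / sd_status
# t statistic: coefficient divided by its standard error.
t_statistic = b_male / se_male

print(f"Predicted score, female: {female_prediction:.3f}")
print(f"Predicted score, male:   {male_prediction:.3f}")
print(f"Beta = {standardized_beta:.3f}, t = {t_statistic:.2f}")
```

The t value computed from the rounded table entries differs slightly from the printed one because SPSS works with unrounded figures.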

Excluded Variablesa

Model		Beta In	Sig.	Partial Correlation	Tolerance	VIF	Minimum Tolerance
1	FemaleRespondent	.000

a. Dependent Variable: LABOR FORCE STATUS

b. Predictors in the Model: (Constant), MaleRespondents
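
FemaleRespondent is excluded because it carries no information beyond MaleRespondents: every case coded 1 on one dummy is coded 0 on the other. The sketch below illustrates this with a short hypothetical list of cases (not the GSS data) and a hand-written Pearson correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

male = [1, 0, 0, 1, 0, 1, 0, 0]       # hypothetical cases, 1 = male
female = [1 - m for m in male]         # the complementary dummy

r = pearson_r(male, female)
tolerance = 1 - r ** 2                 # variance in one dummy not explained by the other

print(f"r = {r:.3f}, tolerance = {tolerance:.3f}")
```

A correlation of -1 leaves a tolerance of zero, which matches the -1.000 correlation between the two dummies in the Correlations table and explains why only one of them can enter a model that includes a constant.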

Collinearity Diagnosticsa

Model	Dimension	Eigenvalue	Condition Index	Variance Proportions (Constant)	Variance Proportions MaleRespondents
1	1	1.670	1.000	.16	.16
	2	.330	2.250	.84	.84

a. Dependent Variable: LABOR FORCE STATUS
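
The condition index in this table is the square root of the largest eigenvalue divided by the eigenvalue of each dimension. A brief sketch of that calculation, using the two eigenvalues printed in the table:

```python
import math

eigenvalues = [1.670, 0.330]    # dimensions 1 and 2 from the table
largest = max(eigenvalues)

# Condition index for each dimension: sqrt(largest eigenvalue / eigenvalue).
condition_indices = [math.sqrt(largest / ev) for ev in eigenvalues]

for dimension, index in enumerate(condition_indices, start=1):
    print(f"Dimension {dimension}: condition index = {index:.3f}")
```

The results agree with the printed values of 1.000 and 2.250 up to rounding of the eigenvalues.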

RE: Discussion - Week 10

A dummy variable is a dichotomy: a variable with exactly two categories, not including missing values (Wagner, 2020). Multiple regression extends bivariate regression by examining the effect of two or more independent variables on the dependent variable (Frankfort-Nachmias et al., 2020). Dummy variables make it possible to include nominal variables in multiple regression models, which otherwise require interval or ratio variables.
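
To make the idea concrete, the following is a minimal sketch of dummy coding in Python. The helper `make_dummies` and the sample values are hypothetical; the sketch assumes one category is held out as the reference group and that missing values stay missing rather than being coded 0.

```python
def make_dummies(values, reference):
    """Return one 0/1 column per category, leaving out the reference category.

    Missing values (None) are kept as None so they are not mistaken
    for membership in the reference group.
    """
    categories = sorted({v for v in values if v is not None and v != reference})
    return {
        category: [None if v is None else int(v == category) for v in values]
        for category in categories
    }

sex = ["MALE", "FEMALE", "FEMALE", None, "MALE"]   # hypothetical responses
dummies = make_dummies(sex, reference="FEMALE")
print(dummies)
```

A variable with k categories yields k - 1 dummy columns, so a two-category variable such as sex needs only one.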

General Social Survey Dataset’s Mean of Age

In the General Social Survey dataset, the mean age of respondents is 49.01 and the median (the "middle score") is 49.00. A mean and median this close together would suggest a roughly symmetric distribution; however, the lowest mode is 53 and the skewness is positive, so the data are skewed slightly to the right. The mean also suggests that the sample leans toward older respondents, so the results may not be generalizable across all age groups. I chose to display the data surrounding the respondents' highest degree and age.

Research Question

Using the General Social Survey (GSS) dataset, I constructed a research question and used a regression model to answer it. The dependent variable is the respondents' highest year of school completed, and the two independent variables are dummy variables for owning or buying a home and for paying rent. The highest year of school completed can influence an individual's ability to either purchase a home or rent one. Using the GSS data, the following research question and hypotheses were developed:

· Research question: Do people who have completed more years of school own or buy a home rather than pay rent?

· The null hypothesis (H0): There is no relationship between the highest year of school completed and owning or buying a home and paying rent.

· The alternative hypothesis (Ha): There is a relationship between the highest year of school completed and owning or buying a home and paying rent.

Interpretation of the Coefficients of the Model

Table 1 shows the coefficients of owning or buying a home and paying rent for the highest year of school completed. The unstandardized coefficient for own or is buying is 1.960 (Table 1), and the coefficient for pays rent is .839 (Table 1). Each coefficient is the difference in mean years of school between that group and the reference category (respondents in neither group), whose mean is the constant, 12.240. The significance level for own or is buying is .001 (Table 1), well below the .05 threshold, whereas the significance level for pays rent is .168 (Table 1), which is above it. Thus, owning or buying a home is a significant predictor of respondents' highest year of school completed, but paying rent is not, and the null hypothesis of no relationship is rejected only for the own or is buying predictor (Walden University, 2016). A Variance Inflation Factor (VIF) above 10 indicates serious multicollinearity, meaning the independent variables own or is buying and pays rent are highly correlated with each other (Walden University, 2016). Because the VIF is 16.106 (Table 1), the assumption of no multicollinearity has not been met. This is a different problem from nonlinearity: violating the assumption of linearity implies that the model fails to capture the systematic pattern of relationship between the dependent and independent variables (Nonlinearity, 1991), whereas multicollinearity makes the individual coefficients unstable. I do not see a simple remedy for this violation while both dummy variables remain in the model. Consequently, the coefficients in this model should be interpreted with caution.

Table 1

Coefficients

Coefficientsa
Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.	Tolerance	VIF
1	(Constant)	12.240	.596		20.554	<.001
	dwelown=OWN OR IS BUYING	1.960	.603	.314	3.252	.001	.062	16.106
	dwelown=PAYS RENT	.839	.608	.133	1.381	.168	.062	16.106
a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
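
The t, Sig., and VIF columns of Table 1 can be checked against one another. The sketch below is a rough illustration, assuming a .05 significance threshold and using only the rounded values printed in the table; the dictionary layout is my own.

```python
ALPHA = 0.05   # significance threshold used in the discussion

# (B, Std. Error, Sig., Tolerance) copied from Table 1
predictors = {
    "dwelown=OWN OR IS BUYING": (1.960, 0.603, 0.001, 0.062),
    "dwelown=PAYS RENT": (0.839, 0.608, 0.168, 0.062),
}

summary = {}
for name, (b, se, sig, tolerance) in predictors.items():
    summary[name] = {
        "t": b / se,                 # coefficient divided by its standard error
        "vif": 1 / tolerance,        # VIF is the reciprocal of tolerance
        "significant": sig < ALPHA,  # reject H0 for this predictor?
    }

for name, row in summary.items():
    print(f"{name}: t = {row['t']:.3f}, VIF = {row['vif']:.2f}, "
          f"significant = {row['significant']}")
```

The VIF computed from the rounded tolerance of .062 comes out slightly above the printed 16.106, which SPSS derives from the unrounded tolerance.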

Analysis of the Model Summary

The Durbin-Watson statistic ranges from 0 to 4.0 and provides information about the independence of errors (Walden University, 2016). The model summary reports a Durbin-Watson value of 1.622, which is close enough to 2.0 to indicate no serious correlation between residuals (Walden University, 2016). Durbin-Watson values below 1.0 or above 3.0 are considered dangerous because they signal serious autocorrelation, in which the residuals are correlated with lagged versions of themselves over various time intervals (Walden University, 2016). The model summary reports an R square of .035 (consistent with the predicted-value and residual standard deviations in Table 2), a negligible effect. A higher effect size means that a research finding has practical significance, while a negligible effect size indicates limited practical applications (Warner, 2012).
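
For readers who want to see how the Durbin-Watson statistic is built, here is a minimal sketch of the standard formula: the sum of squared differences between successive residuals divided by the sum of squared residuals. The residual series are hypothetical, and the 1.0 to 3.0 acceptable range is the rule of thumb cited above rather than a formal test.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered list of residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def in_acceptable_range(dw, low=1.0, high=3.0):
    """Rule-of-thumb check: values outside 1.0 to 3.0 are a concern."""
    return low <= dw <= high

alternating = [1, -1, 1, -1]          # hypothetical: negative autocorrelation
clustered = [1, 1, 1, -1, -1, -1]     # hypothetical: positive autocorrelation

print(durbin_watson(alternating))     # pushed toward 4
print(durbin_watson(clustered))       # pushed toward 0
print(in_acceptable_range(1.622))     # the value reported for this model
```

Residuals that flip sign at every step push the statistic toward 4, while long runs of same-signed residuals push it toward 0.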

Diagnostics for the Regression Model

The residuals statistics in Table 2 include Cook's Distance. Values greater than or equal to 1.0 are considered problematic, and further diagnostics should be performed to check for undue influence (Walden University, 2016). Outlying and influential cases can unduly influence the results of the analysis, and their presence may signal that the regression model fails to capture important characteristics of the data (Outlying and Influential Data, 1991). For example, specific outliers on one or more of the variables might exert undue influence on the model and have a significant impact on the results (Walden University, 2016). Table 2 shows Cook's Distance values ranging from a minimum of .000 to a maximum of .037, well below 1.0, so no case has undue influence in this model (Walden University, 2016). The minimum residual of -14.200 (Table 2) means that the predicted value for that case was too high, and the maximum residual of 6.921 (Table 2) means that the predicted value for that case was too low. The aim of a regression model is to explain variability in the dependent variable by means of one or more independent or control variables.

Table 2

Residuals Statistics

Residuals Statisticsa
	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	12.24	14.20	13.76	.569	1669
Std. Predicted Value	-2.672	.770	.000	1.000	1669
Standard Error of Predicted Value	.093	.596	.110	.061	1669
Adjusted Predicted Value	12.04	14.21	13.76	.569	1669
Residual	-14.200	6.921	.000	2.976	1669
Std. Residual	-4.769	2.324	.000	.999	1669
Stud. Residual	-4.771	2.326	.000	1.000	1669
Deleted Residual	-14.214	6.933	.000	2.981	1669
Stud. Deleted Residual	-4.803	2.329	.000	1.001	1669
Mahal. Distance	.612	65.721	1.999	7.879	1669
Cook's Distance	.000	.037	.001	.002	1669
Centered Leverage Value	.000	.039	.001	.005	1669
a. Dependent Variable: HIGHEST YEAR OF SCHOOL COMPLETED
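
Several entries in Table 2 can be cross-checked with simple arithmetic. The sketch below assumes an ordinary least squares model with an intercept and two predictors, so that the variance of the predicted values plus the variance of the residuals equals the total variance; all inputs are copied from the table, and the variable names are my own.

```python
import math

n_cases = 1669
n_predictors = 2                 # the two dwelown dummy variables
sd_predicted = 0.569             # Std. Deviation of Predicted Value
sd_residual = 2.976              # Std. Deviation of Residual
min_residual, max_residual = -14.200, 6.921
max_cooks_distance = 0.037

# Share of variance explained, implied by the two standard deviations.
var_predicted = sd_predicted ** 2
var_residual = sd_residual ** 2
r_squared = var_predicted / (var_predicted + var_residual)

# Standard error of the estimate: residual SD adjusted for degrees of freedom.
std_error_estimate = sd_residual * math.sqrt(
    (n_cases - 1) / (n_cases - n_predictors - 1)
)
std_min = min_residual / std_error_estimate
std_max = max_residual / std_error_estimate

# Cook's Distance rule of thumb from the discussion: 1.0 or more is a concern.
influential = max_cooks_distance >= 1.0

print(f"R squared = {r_squared:.3f}")
print(f"Std. residuals: {std_min:.3f} to {std_max:.3f}")
print(f"Any influential case: {influential}")
```

The standardized residuals reproduce the Std. Residual row to within rounding, and the largest Cook's Distance is far below the 1.0 cutoff.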

Conclusion

The regression model is used to test simple hypotheses and to analyze the association between a dependent variable and two or more independent variables. Visual displays of data can be invaluable because they make the statistical findings easier to understand. After all, research is useful when it organizes, summarizes, and communicates information (Frankfort-Nachmias et al., 2020). Therefore, the biggest takeaway from the analysis of respondents' highest year of school completed is whether the model meets all the assumptions and, if it does not, whether remedies can be found for any violations.

References

Frankfort-Nachmias, C., Leon-Guerrero, A., & Davis, G. (2020). Social statistics for a diverse society (9th ed.). Sage Publications.

Nonlinearity. (1991). In J. Fox (Ed.), Regression Diagnostics. (pp. 54-62). SAGE Publications, Inc.

Outlying and Influential Data. (1991). In J. Fox (Ed.), Regression Diagnostics. (pp. 22-41). SAGE Publications, Inc.

Wagner, III, W. E. (2020). Using IBM® SPSS® statistics for research methods and social science statistics (7th ed.). Sage Publications.

Walden University, LLC. (Producer). (2016). Regression diagnostics and model evaluation [Video file]. Author.

Warner, R. M. (2012). Applied statistics from bivariate through multivariate techniques (2nd ed.). Sage Publications.
