Running Head: LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS
LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS
NAME:
INSTRUCTOR:
DATE:
Part I
Logistic Regression
This analysis examined whether the age and gender of guests in a nursing home predicted death in 2015. Data were collected on each guest's gender, age, and whether or not the guest died. Death is the dependent variable, while age and gender are the independent variables.
Because the dependent variable “died” was categorical with two levels, logistic regression was a suitable method of prediction for this study (Austin & Merlo, 2017). The assumption of a dichotomous dependent variable was met, since “died” took two values: 0) No and 1) Yes. The assumption of one or more predictor variables was also met. Age was quantitative and recorded the age of each guest. Gender was categorical with two levels: 0) Female and 1) Male.
Analysis
The collected data were analyzed to examine the relationship between the predictor variables and the binary dependent variable. A sample of 284 guests was used for the analysis. The general form of the logistic regression model is:
ln(p / (1 - p)) = b0 + b1*Gender + b2*Age
where p is the probability that a guest died, b0 is the constant, and b1 and b2 are the coefficients for gender and age.
Table 1 shows the total number of participants and the valid sample that was utilized in this study.
Table 1: Logistic regression summary
The analysis showed that there were 144 successes and 140 failures. The overall model was statistically significant, χ2(2) = 82.46, p < 0.001 (Warner, 2020). This implies that gender and age together predict death better than a model with no predictors, so the individual coefficients can be interpreted.
Gender and age were both statistically significant predictors of death. For gender, b = 1.96, OR = 7.08, p < 0.05. For age, b = 0.196, OR = 1.22, p < 0.05. According to the odds ratios, the odds of dying were 7.08 times higher for males than for females of the same age, and each additional year of age multiplied the odds of dying by 1.22, holding gender constant (Norton et al., 2018).
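The link between the coefficients and the odds ratios reported above can be checked directly, since an odds ratio is the exponential of the logistic coefficient. The sketch below uses only the two coefficients given in the text; the variable and function names are illustrative. Because the reported b values are rounded, exp(1.96) comes out near 7.10 rather than exactly 7.08.

```python
import math

# Logistic regression coefficients (log-odds) as reported in the text.
coefficients = {"gender": 1.96, "age": 0.196}


def odds_ratio(b: float) -> float:
    """Convert a logistic regression coefficient into an odds ratio."""
    return math.exp(b)


odds_ratios = {name: odds_ratio(b) for name, b in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name}: b = {coefficients[name]}, OR = {value:.2f}")
```

Note that a ten-year age difference corresponds to exp(10 * 0.196) = exp(1.96), which is the same multiplicative change in odds as the difference between males and females.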
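The logistic regression equation for predicting the death of a guest from age and gender is:
ln(p / (1 - p)) = b0 + 1.96*Gender + 0.196*Age
where p is the predicted probability of death, b0 is the model constant, Gender is coded 0 for females and 1 for males, and Age is in years.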
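To show how such an equation turns into a predicted probability, the following is a minimal sketch that applies the inverse logit to the two reported coefficients. The model constant is not given in the text, so the function takes it as an argument, and the value -17.0 used in the example is a made-up placeholder for illustration only, not an estimate from the study.

```python
import math

B_GENDER = 1.96  # reported coefficient for gender (0 = female, 1 = male)
B_AGE = 0.196    # reported coefficient for age in years


def predicted_probability(intercept: float, gender: int, age: float) -> float:
    """Predicted probability of death from the logit model.

    The intercept must be supplied by the caller because the text
    does not report the model constant.
    """
    logit = intercept + B_GENDER * gender + B_AGE * age
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical intercept, chosen only so the example produces readable values.
HYPOTHETICAL_INTERCEPT = -17.0

p_female = predicted_probability(HYPOTHETICAL_INTERCEPT, gender=0, age=80)
p_male = predicted_probability(HYPOTHETICAL_INTERCEPT, gender=1, age=80)
print(f"female, age 80: {p_female:.3f}")
print(f"male, age 80:   {p_male:.3f}")
```

Whatever constant is used, the ratio of the male odds to the female odds at the same age stays equal to exp(1.96), which is why the odds ratio can be interpreted without knowing the constant.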
Part II
Discriminant Analysis
A firm developed two tests to determine whether employees would perform well in a given position. A sample of 43 employees was examined. The main aim was to classify employees as either successful or unsuccessful on the basis of their scores on the two tests.
Discriminant analysis suits this case since mutually exclusive grouping was required and the dependent variable was categorical with two groups: 0) Unsuccessful and 1) Successful (Bowerman et al., 2019). The two independent variables in this study (Test 1 and Test 2) were quantitative and recorded the scores of the employees on the two tests.
Analysis
Discriminant analysis was carried out in SPSS to classify the employees as successful or unsuccessful based on the two tests. Descriptive statistics are shown in Table 2.
Table 2: Descriptive statistics
Group Statistics

| Group | Variable | Mean | Std. Deviation | Valid N (listwise), Unweighted | Valid N (listwise), Weighted |
|---|---|---|---|---|---|
| Unsuccessful | Test1 | 84.7500 | 4.24109 | 20 | 20.000 |
| | Test2 | 79.1000 | 4.38778 | 20 | 20.000 |
| Successful | Test1 | 92.4348 | 3.47492 | 23 | 23.000 |
| | Test2 | 84.7826 | 6.23740 | 23 | 23.000 |
| Total | Test1 | 88.8605 | 5.43175 | 43 | 43.000 |
| | Test2 | 82.1395 | 6.10847 | 43 | 43.000 |
In the unsuccessful group, the mean score was 84.75 on Test 1 and 79.10 on Test 2. In the successful group, the means on Test 1 and Test 2 were 92.43 and 84.78 respectively, so successful employees scored higher on average on both tests.
Table 3 shows the importance of the independent variables in the discriminant function used to group the employees.
Table 3: Test of equality of group means
Tests of Equality of Group Means

| Variable | Wilks' Lambda | F | df1 | df2 | Sig. |
|---|---|---|---|---|---|
| Test1 | .490 | 42.644 | 1 | 41 | .000 |
| Test2 | .780 | 11.593 | 1 | 41 | .001 |
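According to the analysis, the group means differed significantly on both tests: Test 1, F(1, 41) = 42.64, p < 0.001, and Test 2, F(1, 41) = 11.59, p = 0.001. Both test scores therefore contribute to the discriminant function, and the smaller Wilks' Lambda for Test 1 indicates that it separates the groups better than Test 2.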
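For a single predictor, Wilks' Lambda and the F statistic carry the same information: F = ((1 - Lambda) / Lambda) * (df2 / df1). The short sketch below recomputes F from the Lambda values in Table 3 as a consistency check. The function name is illustrative, and small differences from the reported F values are expected because Lambda is rounded to three decimals.

```python
def f_from_wilks(wilks_lambda: float, df1: int, df2: int) -> float:
    """Recover the univariate F statistic from Wilks' Lambda."""
    return ((1.0 - wilks_lambda) / wilks_lambda) * (df2 / df1)


# (Wilks' Lambda, reported F) for each test, taken from Table 3.
reported = {"Test1": (0.490, 42.644), "Test2": (0.780, 11.593)}

recomputed = {
    name: f_from_wilks(lam, df1=1, df2=41) for name, (lam, _) in reported.items()
}

for name, (lam, f_reported) in reported.items():
    print(f"{name}: reported F = {f_reported}, recomputed F = {recomputed[name]:.3f}")
```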
Table 4 shows the correlation matrix of the predictor variables.
Table 4: Correlation matrix
Pooled Within-Groups Matrices

| | | Test1 | Test2 |
|---|---|---|---|
| Correlation | Test1 | 1.000 | .187 |
| | Test2 | .187 | 1.000 |
According to the analysis, the pooled within-groups correlation between the scores on Test 1 and Test 2 was r = 0.19. This is a weak positive relationship, implying that the independent variables are not highly correlated and that multicollinearity is not a concern.
The assumption of homogeneity of the covariance matrices was examined using Box's M test, and the results are shown in Table 5 (Ul Hassan et al., 2017).
Table 5: Homogeneity of covariance matrix
Test Results

| Statistic | | Value |
|---|---|---|
| Box's M | | 5.014 |
| F | Approx. | 1.582 |
| | df1 | 3 |
| | df2 | 936960.353 |
| | Sig. | .191 |

Tests null hypothesis of equal population covariance matrices.
Box's M test was not significant, Box's M = 5.01, F(3, 936960.35) = 1.58, p = 0.191. The covariance matrices of the two groups therefore did not differ significantly, the assumption was not violated, and the analysis could continue.
According to Table 6, one discriminant function was found, as expected for a dependent variable with two groups.
Table 6: Canonical discriminant function
Eigenvalues

| Function | Eigenvalue | % of Variance | Cumulative % | Canonical Correlation |
|---|---|---|---|---|
| 1 | 1.161 (a) | 100.0 | 100.0 | .733 |

a. First 1 canonical discriminant functions were used in the analysis.
The canonical correlation of .733 is strong and positive, implying a strong association between the discriminant function and the dependent variable (Uurtio et al., 2017).
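The canonical correlation follows from the eigenvalue of the discriminant function through r = sqrt(eigenvalue / (1 + eigenvalue)). As a minimal check, assuming only the eigenvalue of 1.161 from Table 6, the sketch below recovers the reported correlation and its square, which is the share of variance in the discriminant scores explained by group membership.

```python
import math

eigenvalue = 1.161  # eigenvalue of the single discriminant function (Table 6)

# Canonical correlation implied by the eigenvalue.
canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Wilks' Lambda for the function is the unexplained share, 1 / (1 + eigenvalue).
wilks_lambda = 1.0 / (1.0 + eigenvalue)

print(f"canonical correlation = {canonical_r:.3f}")
print(f"squared canonical correlation = {canonical_r ** 2:.3f}")
print(f"Wilks' Lambda = {wilks_lambda:.3f}")
```

The squared canonical correlation is about 0.54, so roughly 54% of the variance in the discriminant scores is accounted for by the difference between the two groups.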
Table 7 shows the coefficients of the independent variables.
Table 7: Standardized canonical discriminant function coefficients
Standardized Canonical Discriminant Function Coefficients

| Variable | Function 1 |
|---|---|
| Test1 | .885 |
| Test2 | .328 |
According to the analysis, Test 1 (.885) discriminated between the groups better than Test 2 (.328). This implies that Test 1 is the more important predictor of whether an employee will be successful or unsuccessful in the position.
Table 8 shows the unstandardized canonical coefficients of the model.
Table 8: Unstandardized canonical coefficients
Canonical Discriminant Function Coefficients

| Variable | Function 1 |
|---|---|
| Test1 | .230 |
| Test2 | .060 |
| (Constant) | -25.380 |

Unstandardized coefficients
Using the unstandardized coefficients, the discriminant equation becomes:
D = -25.38 + 0.23*Test 1 + 0.06*Test 2
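The discriminant equation can be applied to any pair of test scores. The sketch below evaluates it at the group means from Table 2, which gives approximate group centroids because the function is linear. The function and variable names are illustrative, and the centroids are approximate since the coefficients are rounded to two or three decimals.

```python
def discriminant_score(test1: float, test2: float) -> float:
    """Discriminant score D from the unstandardized coefficients."""
    return -25.38 + 0.23 * test1 + 0.06 * test2


# Group means on Test 1 and Test 2, taken from Table 2.
group_means = {
    "Unsuccessful": (84.7500, 79.1000),
    "Successful": (92.4348, 84.7826),
}

# The score at a group's mean vector is that group's mean score (its centroid).
centroids = {
    group: discriminant_score(*means) for group, means in group_means.items()
}

for group, d in centroids.items():
    print(f"{group}: D at group means = {d:.3f}")
```

The unsuccessful group has a negative centroid and the successful group a positive one, so higher scores on D point toward the successful group.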
Table 9 shows how the employees were classified by the discriminant function.
Table 9: Classification
Classification Results (a, c)

| | | Group | Predicted: Unsuccessful | Predicted: Successful | Total |
|---|---|---|---|---|---|
| Original | Count | Unsuccessful | 16 | 4 | 20 |
| | | Successful | 5 | 18 | 23 |
| | % | Unsuccessful | 80.0 | 20.0 | 100.0 |
| | | Successful | 21.7 | 78.3 | 100.0 |
| Cross-validated (b) | Count | Unsuccessful | 16 | 4 | 20 |
| | | Successful | 5 | 18 | 23 |
| | % | Unsuccessful | 80.0 | 20.0 | 100.0 |
| | | Successful | 21.7 | 78.3 | 100.0 |

a. 79.1% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis. In cross validation, each case is classified by the functions derived from all cases other than that case.
c. 79.1% of cross-validated grouped cases correctly classified.
The analysis showed that 80.0% of the unsuccessful employees were correctly classified as unsuccessful, while 20.0% of them were misclassified as successful. Of the successful employees, 78.3% were correctly classified as successful, while 21.7% were misclassified as unsuccessful. Overall, 79.1% of the cases were correctly classified, and the cross-validated results were identical.
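The percentages in Table 9 follow directly from the classification counts. As a minimal sketch using only those counts (the dictionary layout is illustrative), the row percentages and the overall hit rate can be recomputed as follows:

```python
# Classification counts from Table 9: confusion[actual][predicted].
confusion = {
    "Unsuccessful": {"Unsuccessful": 16, "Successful": 4},
    "Successful": {"Unsuccessful": 5, "Successful": 18},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[group][group] for group in confusion)
accuracy = correct / total

# Percentage of each actual group assigned to each predicted group.
row_pct = {
    actual: {pred: 100.0 * n / sum(row.values()) for pred, n in row.items()}
    for actual, row in confusion.items()
}

print(f"overall correctly classified: {100.0 * accuracy:.1f}%")
for actual, row in row_pct.items():
    for pred, pct in row.items():
        print(f"actual {actual} -> predicted {pred}: {pct:.1f}%")
```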
References
Austin, P. C., & Merlo, J. (2017). Intermediate and advanced topics in multilevel logistic regression analysis. Statistics in Medicine, 36(20), 3257-3277.
Warner, R. M. (2012). Applied statistics: From bivariate through multivariate techniques. Sage Publications.
Norton, E. C., Dowd, B. E., & Maciejewski, M. L. (2018). Odds ratios—current best practice and use. JAMA, 320(1), 84-85.
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R. M., Moniger, K. B., & Schur, P. J. (2019). Business statistics and analytics in practice (9th ed.). McGraw-Hill.
Ul Hassan, E., Zainuddin, Z., & Nordin, S. (2017). A review of financial distress prediction models: Logistic regression and multivariate discriminant analysis. Indian-Pacific Journal of Accounting and Finance, 1(3), 13-23.
Uurtio, V., Monteiro, J. M., Kandola, J., Shawe-Taylor, J., Fernandez-Reyes, D., & Rousu, J. (2017). A tutorial on canonical correlation methods. ACM Computing Surveys (CSUR), 50(6), 1-33.