Learning Outcome PPT for the Course Assignment


Introduction

Prediction is the process of determining the magnitude of the effect of predictors on a response variable. It helps determine the future value of the outcome variable using the predictors, or factors, included in the study. Multiple linear regression is a statistical test used to assess the relationship between the response variable and more than one predictor variable (Keith, 2019). In addition, Cacoullos (2014) argued that discriminant analysis, which is used to test the equality of group centroids, is associated with multivariate analysis of variance because it uses Wilks' lambda, as applied in GLM multivariate analysis.

In this regard, multiple linear regression and discriminant analysis are the most appropriate statistical techniques: the former predicts the outcome of a dependent variable at different values of the predictor variables and assesses the contribution of each predictor to the outcome, while the latter assesses the association between group membership and the independent variables (Keith, 2019). The researcher can also assess how much of the variation in the outcome is explained by the independent variables included in the model. Multiple linear regression predicts the values of the response variable using more than one predictor variable. Discriminant analysis was used to assess the association between two tests that were developed in a firm to examine employee performance. The study used a sample of 43 employees who were grouped as either successful or unsuccessful and took the two tests to assess their performance in a given position. Before conducting either multiple linear regression or discriminant analysis, the variables should be checked against the necessary assumptions. For multiple linear regression, the dependent variable must be continuous and approximately normally distributed, while the predictor variables can be either continuous or categorical. For discriminant analysis, the dependent variable must be divided into two or more groups. For both statistical tests, the predictor variables should not be correlated with each other; that is, there should be no multicollinearity (Alin, 2010). This can be assessed using the variance inflation factor, which should be less than ten, or through the correlation coefficients between the predictor variables. The current study assessed the relationship between the cost of constructing an LWR plant and the three predictor variables S, N, and CT, and assessed the association between the two tests used to examine employee performance.

Assumption of Regression Analysis

Multicollinearity

Correlation analysis is used in determining the strength and direction of association between two variables (Puth et al., 2014). The Pearson correlation coefficient is used for testing the strength and direction of association between two variables with a continuous level of measurement. However, when the variables have an ordinal level of measurement, we use the Spearman rank correlation coefficient (Puth et al., 2014). The Pearson correlation coefficient ranges from -1 to 1, with -1 or 1 indicating perfect correlation and zero indicating no correlation.
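As an illustration of the statistic reported in Table 1 below, the Pearson coefficient can be computed directly from its definition. This is a minimal Python sketch of the calculation, not part of the original analysis (which used statistical software):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance term (numerator) and the two standard-deviation terms.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linear data gives r = 1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```

Values near zero, such as the r = .193 in Table 1, indicate only a weak linear association.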

Table 1: Correlation Analysis

                            S        N
S    Pearson Correlation    1        .193
     Sig. (2-tailed)                 .289
     N                      32       32
N    Pearson Correlation    .193     1
     Sig. (2-tailed)        .289
     N                      32       32

The correlation analysis in Table 1 above indicates that the correlation between the two predictor variables (S and N) is weak, positive, and not significant at the 0.05 level of significance (r = 0.193, p = 0.289). This suggests that multicollinearity does not exist and that the multicollinearity assumption is not violated. Furthermore, based on the analysis, the variance inflation factor (VIF) is less than 10 (Daoud, 2017), again suggesting that the multicollinearity assumption is not violated.
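With only two predictors, the VIF follows directly from their correlation, since the auxiliary R-squared of one predictor regressed on the other equals the squared correlation between them. A minimal sketch of that relationship:

```python
def vif_two_predictors(r):
    """Variance inflation factor in a two-predictor model:
    VIF = 1 / (1 - R_j^2), where R_j^2 = r^2 for two predictors."""
    return 1.0 / (1.0 - r ** 2)

# Using the r = .193 between S and N from Table 1.
print(round(vif_two_predictors(0.193), 3))  # → 1.039, matching Table 4
```

The value 1.039 is far below the common cutoff of 10, consistent with the conclusion above.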

Normality test

Table 2: Tests of Normality

         Kolmogorov-Smirnov(a)          Shapiro-Wilk
         Statistic   df    Sig.         Statistic   df    Sig.
ln_C     .104        32    .200*        .967        32    .414

*. This is a lower bound of the true significance.

a. Lilliefors Significance Correction

The test of normality of the dependent variable, ln(C), revealed that the normality assumption is not violated after transforming the variable C using the natural logarithm, at the 0.05 level of significance (Shapiro-Wilk = 0.967, p = 0.414).
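The rationale for the log transform is that cost data are typically right-skewed, and taking the natural log pulls the long right tail in. A small Python sketch with hypothetical cost values (not the study data) illustrates the effect on skewness:

```python
import math

def skewness(data):
    """Sample skewness: third central moment over the cube of the standard deviation."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed costs; the log transform symmetrizes them.
costs = [math.exp(v) for v in [-2, -1, 0, 1, 2]]
print(skewness(costs))                          # strongly positive (right-skewed)
print(skewness([math.log(c) for c in costs]))   # near zero (symmetric)
```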

Results and discussion

Regression Analysis

a. Use residual analysis and R2 to check your model. 

Table 3: Model Summary

                              Adjusted    Std. Error of    R Square
Model   R       R Square     R Square    the Estimate     Change      F Change   df1   df2   Sig. F Change
1       .482a   .232         .179        .34240           .232        4.385      2     29    .022
2       .483b   .234         .151        .34814           .001        .052       1     28    .822

a. Predictors: (Constant), N, S

b. Predictors: (Constant), N, S, CT

The R-Squared of 0.232 indicates that the model can explain about 23.2% of the variation in ln(C), while the remaining 76.8% of the variation is explained by variables not included in the model. In addition, the residual analysis indicated large residuals. The low R-Squared and large residuals indicate that the model does not fit the data well (Brown, 2009).
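The link between residuals and R-Squared can be made explicit: R-Squared is one minus the ratio of the residual sum of squares to the total sum of squares, so large residuals directly depress R-Squared. A minimal sketch with toy numbers (not the study data):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]
# Predictions close to the data give R-squared near 1;
# constant predictions at the mean give R-squared = 0.
print(r_squared(actual, [1.1, 1.9, 3.2, 3.8]))  # → 0.98
print(r_squared(actual, [2.5] * 4))             # → 0.0
```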

b. State which variables are important in predicting the cost of constructing an LWR plant?

Table 4: Regression Coefficients

                    Unstandardized         Standardized                        Collinearity Statistics
Model               B        Std. Error    Beta       t         Sig.          Tolerance    VIF
1   (Constant)      5.300    .277                     19.161    .000
    S               .001     .000          .406       2.447     .021          .963         1.039
    N               .012     .010          .193       1.164     .254          .963         1.039
2   (Constant)      5.294    .283                     18.718    .000
    S               .001     .000          .403       2.385     .024          .958         1.044
    N               .011     .010          .189       1.110     .276          .950         1.053
    CT              .028     .125          .038       .227      .822          .978         1.022

a. Dependent Variable: ln_C

The regression analysis displayed in Table 4 above indicates that S is a significant contributing factor in predicting ln(C) at the 0.05 level of significance (p = 0.021). However, the predictor variable N does not significantly predict ln(C) at the 0.05 level of significance (p = 0.254). The analysis also shows no significant difference in ln(C) between the two levels of the cooling tower (p = 0.822), suggesting that the dummy variable CT does not have a significant effect in predicting ln(C). Therefore, the researcher retained the predictor S for predicting the cost of constructing an LWR plant and removed N and CT from the model.

c. State a prediction equation that can be used to predict ln(C).

After dropping N and CT from the model, since they do not have a significant effect in predicting ln(C), the prediction equation, based on the unstandardized Model 1 coefficients for the retained predictor S in Table 4, is given by:

ln(C) = 5.300 + 0.001S
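A minimal Python sketch of this prediction, assuming the Model 1 intercept (5.300) and S coefficient (0.001) from Table 4 carry over after dropping N and CT (a refitted S-only model could differ slightly):

```python
import math

def predict_ln_cost(s):
    """Predicted ln(C) for plant size s, using the Table 4 Model 1 coefficients."""
    return 5.300 + 0.001 * s

def predict_cost(s):
    """Back-transform the prediction to the original cost scale."""
    return math.exp(predict_ln_cost(s))

# Hypothetical plant size of 1000 for illustration.
print(predict_ln_cost(1000))  # → 6.3
```

Note that predictions on the original scale require exponentiating, since the model was fitted to ln(C).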

d. Does adding CT improve R2? If so, by what amount?

Based on the analysis displayed in Table 3 above, there is no significant improvement in R-Squared after adding CT (p = 0.822). Adding CT to the model changes R-Squared by only 0.001 (from 0.232 to 0.234; the apparent discrepancy reflects rounding), which is not significantly different from zero.
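The significance of an R-Squared change is assessed with a partial F test. The sketch below shows the formula; note that the rounded R-Squared values in Table 3 will not reproduce the reported F Change of .052 exactly:

```python
def f_change(r2_reduced, r2_full, num_added, df_resid_full):
    """Partial F statistic for testing the R-squared change when
    num_added predictors are added to a regression model."""
    return ((r2_full - r2_reduced) / num_added) / ((1 - r2_full) / df_resid_full)

# Rounded values from Table 3 (models 1 and 2); the result differs from
# the reported F Change of .052 only because R Square is rounded.
print(round(f_change(0.232, 0.234, 1, 28), 3))  # → 0.073
```

Either way, the statistic is far below any conventional critical value, consistent with the non-significant p = 0.822.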

Correlational Analysis

a. Evaluate the correlation between the two scores and state if there seems to be any association between the two.

Table 5: Pooled Within-Groups Matrices

                         Test1     Test2
Correlation    Test1     1.000     .187
               Test2     .187      1.000

The correlation analysis shown in Table 5 above indicates a weak positive correlation between the two tests (r = 0.187). This suggests that the two test scores are essentially uncorrelated.

b. Find the probability of upgrading for each division of the sample by the Bayes’ theorem.

Given that: P(T1) = 43/86; P(T2) = 43/86

P(T1 | Up) = 23/46; P(T2 | Up) = 23/46

P(Up | T1) = P(T1 | Up) × P(Up) / P(T1)
           = (23/46 × 46/86) / (43/86)
           = 23/43

P(Up | T2) = P(T2 | Up) × P(Up) / P(T2)
           = (23/46 × 46/86) / (43/86)
           = 23/43

c. Find the probability of upgrading for each division of the sample by the naïve version of the Bayes’ theorem.

P(Up | T1) = P(T1 | Up) × P(Up) / P(T1)
           = (23/46 × 46/86) / (43/86)
           = 23/43

P(Up | T2) = P(T2 | Up) × P(Up) / P(T2)
           = (23/46 × 46/86) / (43/86)
           = 23/43

d. Compare your results in parts b and c and explain the difference or indifference based on observed probabilities

Since there is only one predictor in each sample division, the naïve version of Bayes' theorem yields the same probabilities as Bayes' theorem itself, so the observed probabilities show no difference. The naïve version simply applies Bayes' theorem under the assumption that the predictor features are independent of one another (Webb, 2010); with a single predictor, this independence assumption changes nothing.
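The calculation above can be reproduced exactly with rational arithmetic. A minimal sketch using the probabilities from the worked example:

```python
from fractions import Fraction

def bayes_posterior(p_evidence_given_h, p_h, p_evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_evidence_given_h * p_h / p_evidence

# Values from the worked example above.
p_t1 = Fraction(43, 86)           # P(T1)
p_up = Fraction(46, 86)           # P(Up)
p_t1_given_up = Fraction(23, 46)  # P(T1 | Up)

print(bayes_posterior(p_t1_given_up, p_up, p_t1))  # → 23/43
```

Because T2 has identical counts, the same call with the T2 probabilities returns the same 23/43, mirroring the indifference noted above.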

Conclusion and Recommendations

The analysis revealed that the model with the three predictors of the cost of constructing an LWR plant does not fit the data well, suggesting that most of the variation in the outcome variable is explained by variables not included in the model. Further analysis indicated that the predictor S had a significant effect in predicting the cost of constructing an LWR plant (C), whereas N and CT did not. Therefore, the researcher should drop the N and CT predictors from the model and use only S in predicting the cost of constructing the LWR plant. The analysis also indicated that the two tests were not associated with each other, but the first test had greater discriminating power than the second. This suggests that the first test is the better choice for predicting whether employees will be successful or unsuccessful in the position.

Nevertheless, the study did not control for alternative explanations that could affect the validity of the findings. Further study is needed that includes variables that fit the data well to help predict the cost of constructing an LWR plant. In addition, a study that controls for all possible confounders is required to improve prediction of the outcome variable and to identify the best test for predicting employee performance.

References

Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 370-374.

Brown, J. D. (2009). The coefficient of determination.

Cacoullos, T. (Ed.). (2014). Discriminant analysis and applications. Academic Press.

Daoud, J. I. (2017, December). Multicollinearity and regression analysis. In Journal of Physics: Conference Series (Vol. 949, No. 1, p. 012009). IOP Publishing.

Keith, T. Z. (2019). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling. Routledge.

Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective use of Pearson's product–moment correlation coefficient. Animal Behaviour, 93, 183-189.

Webb, G. I. (2010). Naïve Bayes. Encyclopedia of Machine Learning, 15, 713-714.