Learning Outcome PPT for the Course Assignment

CLA 2 Presentation

BUS 606 Advanced Statistical Concepts And Business Analytics

Agenda

Introduction

Multiple linear regression is an appropriate statistical technique for predicting a dependent variable from the values of several predictor variables (Keith, 2019).

The study assessed the relationship between the cost of constructing an LWR Plant and the three predictor variables S, N, and CT.

We also assessed the association between the two tests used to examine employee performance.

Assumptions of Regression Analysis

Multicollinearity

Multicollinearity is the condition where the predictor variables are highly correlated (Alin, 2010).
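
One common way to quantify multicollinearity is the variance inflation factor (VIF), which is the reciprocal of the tolerance reported in the SPSS coefficients output at the end of this deck. The sketch below recomputes VIF from the reported tolerances for the model with S, N, and CT. The cutoff of 10 is a widely used rule of thumb and an assumption here, not a value taken from the slides.

```python
def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor: the reciprocal of tolerance (1 - R_j^2)."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must be in (0, 1]")
    return 1.0 / tolerance


# Tolerances as reported (rounded to three decimals) for the model with S, N, CT.
reported_tolerance = {"S": 0.958, "N": 0.950, "CT": 0.978}
vif = {name: vif_from_tolerance(tol) for name, tol in reported_tolerance.items()}

# Rule-of-thumb cutoff; an assumption for illustration, not from the slides.
VIF_CUTOFF = 10.0
flagged = [name for name, value in vif.items() if value > VIF_CUTOFF]

for name, value in vif.items():
    print(f"{name}: VIF = {value:.3f}")
print("Predictors above the cutoff:", flagged)
```

All three values sit close to 1, which is consistent with predictors that are nearly uncorrelated with one another.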

Correlation Analysis

Assumptions of Regression Analysis Cont’

Normality test

The normality assumption is not violated once the outcome variable C is transformed with the natural log, ln(C) (Shapiro-Wilk = 0.967, p = 0.414).

Results and Discussion – Regression Analysis

Use Residual Analysis and R² to Check Your Model

The R² of 0.232 indicates that the model explains about 23.2% of the variance in ln(C).

The low R² indicates that the model does not fit the data well (Brown, 2009).
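
As a quick consistency check, the adjusted R² and the overall F statistic can be recomputed from the reported R² alone. The sketch below assumes n = 32 observations (the df shown in the normality output) and k = 2 predictors (S and N) for the first model. Because the input R² is rounded to three decimals, the results match the SPSS output only approximately.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic testing whether all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values from the model summary: first model, predictors S and N.
n_obs = 32        # assumption: taken from df = 32 in the normality output
n_pred = 2
r2_model1 = 0.232

adj = adjusted_r2(r2_model1, n_obs, n_pred)
f_stat = overall_f(r2_model1, n_obs, n_pred)
print(f"Adjusted R^2 = {adj:.3f}, F({n_pred}, {n_obs - n_pred - 1}) = {f_stat:.2f}")
```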

Results and Discussion Cont’

State which variables are important in predicting the cost of constructing an LWR plant.

S is a significant predictor of ln(C) (p = 0.021), but N and CT have no significant effect on ln(C) (p > 0.05).

Results and Discussion Cont’

State a prediction equation that can be used to predict ln(C).

After dropping N and CT from the model, since they do not have a significant effect in predicting ln(C), the prediction equation is given by:

Does adding CT improve R²? If so, by what amount?

Adding CT to the model raises R² by only 0.001 (from 0.232 to 0.234 after rounding), a change that is not significantly different from zero (F change = 0.052, p = 0.822).
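
The F change statistic compares the R² of the larger model with that of the smaller one. The sketch below shows the standard formula. Because only one predictor (CT) is added, F change should also equal the square of CT's t statistic in the coefficients output, which gives a simple cross-check. The function name `f_change` is illustrative, not part of the SPSS output.

```python
def f_change(r2_reduced: float, r2_full: float, df1: int, df2: int) -> float:
    """F statistic for the R^2 gained by adding df1 predictors.

    df2 is the residual degrees of freedom of the larger model.
    """
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)


# With R^2 values rounded to three decimals the formula gives only a rough
# figure, so it will not reproduce the reported F change of .052 exactly.
rough_f = f_change(0.232, 0.234, df1=1, df2=28)

# Cross-check for a single added predictor: F change equals t squared.
t_ct = 0.227                 # t statistic for CT in the coefficients output
f_from_t = t_ct ** 2
print(f"F from rounded R^2: {rough_f:.3f}; F from t^2: {f_from_t:.3f}")
```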

Results and Discussion Cont’ - Correlational Analysis

Evaluate the correlation between the two scores and state if there seems to be any association between the two.

There was a weak positive correlation between the two tests (r = 0.187). This suggests little to no linear association between the two test scores.
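
To make the size of this correlation concrete, the sketch below shows how Pearson's r is computed and how much variance an r of 0.187 implies the two tests share. The score lists are hypothetical placeholders, since the raw test scores are not included in the slides. Only `reported_r` comes from the analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores, for illustration only (not the study data).
test1 = [70, 82, 65, 90, 78]
test2 = [75, 68, 80, 72, 85]
r_demo = pearson_r(test1, test2)

# Reported correlation from the slides; r squared is the shared variance.
reported_r = 0.187
shared_variance = reported_r ** 2
print(f"Shared variance: {shared_variance:.1%}")
```

An r of 0.187 corresponds to roughly 3.5% shared variance, which is why the association is described as weak.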

Results and Discussion Cont’

Find the probability of upgrading for each division of the sample using Bayes’ theorem.

P(Up | T1) = P(T1 | Up) × P(Up) ÷ P(T1)

= (23/46 × 46/86) ÷ (43/86)

= 23/43 ≈ 0.535

P(Up | T2) = P(T2 | Up) × P(Up) ÷ P(T2)

= (23/46 × 46/86) ÷ (43/86)

= 23/43 ≈ 0.535
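
The arithmetic above can be checked with exact fractions. This is a minimal sketch using the counts shown in the calculation (46 upgrades out of 86, 43 cases per division, 23 upgrades in each division). The function name `bayes_posterior` is illustrative.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence


p_up = Fraction(46, 86)           # P(Up)
p_t1_given_up = Fraction(23, 46)  # P(T1 | Up)
p_t2_given_up = Fraction(23, 46)  # P(T2 | Up)
p_t1 = Fraction(43, 86)           # P(T1)
p_t2 = Fraction(43, 86)           # P(T2)

p_up_given_t1 = bayes_posterior(p_t1_given_up, p_up, p_t1)
p_up_given_t2 = bayes_posterior(p_t2_given_up, p_up, p_t2)
print(p_up_given_t1, float(p_up_given_t1))
print(p_up_given_t2, float(p_up_given_t2))
```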

Results and Discussion Cont’

Find the probability of upgrading for each division of the sample using the naïve version of Bayes’ theorem.

P(Up | T1) = P(T1 | Up) × P(Up) ÷ P(T1)

= (23/46 × 46/86) ÷ (43/86)

= 23/43 ≈ 0.535

P(Up | T2) = P(T2 | Up) × P(Up) ÷ P(T2)

= (23/46 × 46/86) ÷ (43/86)

= 23/43 ≈ 0.535

Results and Discussion Cont’

Compare your results in parts b and c and explain the difference or indifference based on observed probabilities

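The naïve version and the full Bayes’ theorem give the same probabilities (23/43 for each division).

This is because there is only one predictor in each sample division.

Naïve Bayes applies Bayes’ theorem under the assumption that the predictor features are independent of one another given the class (Webb, 2010). With a single predictor, that assumption has nothing to simplify, so the two calculations coincide.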
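
The point can be illustrated with a small naïve Bayes sketch: the posterior is the prior times the product of per-feature likelihoods, normalised over the classes. With one feature the product has a single term, so the result equals the plain Bayes calculation. The "not upgraded" quantities below are not stated in the slides. They are derived here from the reported values by the law of total probability, and the function name is illustrative.

```python
from fractions import Fraction
from math import prod


def naive_bayes_posterior(priors, likelihoods, target):
    """Posterior for `target`, treating features as independent given the class.

    priors:      {class: P(class)}
    likelihoods: {class: [P(feature_i | class), ...]}
    """
    joint = {
        c: priors[c] * prod(likelihoods[c], start=Fraction(1)) for c in priors
    }
    return joint[target] / sum(joint.values())


# Reported values for division T1.
p_up = Fraction(46, 86)
p_t1_given_up = Fraction(23, 46)
p_t1 = Fraction(43, 86)

# Derived (not in the slides): complement class via total probability.
p_not_up = 1 - p_up
p_t1_given_not_up = (p_t1 - p_t1_given_up * p_up) / p_not_up

posterior = naive_bayes_posterior(
    priors={"Up": p_up, "NotUp": p_not_up},
    likelihoods={"Up": [p_t1_given_up], "NotUp": [p_t1_given_not_up]},
    target="Up",
)
print(posterior)  # same value as the direct Bayes calculation for T1
```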

Conclusion and Recommendations – LWR Plant

Most of the variation in the outcome variable (about 77%) is not explained by the model and may be due to variables not included in it.

Further analysis indicated that the predictor S had a significant effect in predicting the cost of constructing an LWR plant (C).

N and CT did not have a significant effect in predicting C.

N and CT should therefore be dropped from the model.

Conclusion and Recommendations – Employee Performance

The analysis also indicated that the two tests were not meaningfully related to each other, but that the first test discriminated better than the second.

This suggests that the first test is the better one to use in predicting whether employees will be successful or unsuccessful in the position.

References

Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 370-374.

Brown, J. D. (2009). The coefficient of determination.

Daoud, J. I. (2017, December). Multicollinearity and regression analysis. In Journal of Physics: Conference Series (Vol. 949, No. 1, p. 012009). IOP Publishing.

Keith, T. Z. (2019). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling. Routledge.

Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective use of Pearson's product–moment correlation coefficient. Animal Behaviour, 93, 183-189.

Webb, G. I. (2010). Naïve Bayes. Encyclopedia of machine learning, 15, 713-714.

Tests of Normality

        Kolmogorov-Smirnov(a)        Shapiro-Wilk
        Statistic   df   Sig.        Statistic   df   Sig.
ln_C    .104        32   .200*       .967        32   .414

*. This is a lower bound of the true significance.

a. Lilliefors Significance Correction

Model Summary

                                        Std. Error of   Change Statistics
Model   R         R Square   Adjusted   the Estimate    R Square   F Change   df1   df2   Sig. F
                             R Square                   Change                            Change
1       .482(a)   .232       .179       .34240          .232       4.385      2     29    .022
2       .483(b)   .234       .151       .34814          .001       .052       1     28    .822

a. Predictors: (Constant), N, S

b. Predictors: (Constant), N, S, CT

Coefficients(a)

                     Unstandardized         Standardized
                     Coefficients           Coefficients                     Collinearity Statistics
Model                B        Std. Error    Beta           t        Sig.     Tolerance   VIF
1   (Constant)       5.300    .277                         19.161   .000
    S                .001     .000          .406           2.447    .021     .963        1.039
    N                .012     .010          .193           1.164    .254     .963        1.039
2   (Constant)       5.294    .283                         18.718   .000
    S                .001     .000          .403           2.385    .024     .958        1.044
    N                .011     .010          .189           1.110    .276     .950        1.053
    CT               .028     .125          .038           .227     .822     .978        1.022

a. Dependent Variable: ln_C