business analytics
1. Perform a logit and probit analysis of the variables that affect whether a customer takes out a loan. Consider only main effects. Which variables are significant? How do the significant variables influence the likelihood of taking out a loan? Copy screen snapshots of your analysis in R to your report. (20%)
Answer –
The attributes not marked in red were selected as the meaningful ones, that is, the ones that affect whether someone takes out a personal loan. Personal Loan is the response variable.
Legend –
Variables in bold are significant.
Color notation: one color marks intuitive effects, the other marks non-intuitive effects.

| Personal Loan | Meaning |
|---|---|
| 0 | no loan |
| 1 | loan |
Logit:

Coefficients:

| | Estimate | Std. Error | z value | Pr(>\|z\|) | |
|---|---|---|---|---|---|
| (Intercept) | -9.4198094 | 0.45714157 | -20.606 | < 2e-16 | *** |
| Age | 0.00790014 | 0.00568947 | 1.389 | 0.16497 | |
| CCAvg | 0.06116208 | 0.03310581 | 1.847 | 0.06468 | . |
| CDAccount | 3.25884502 | 0.26617751 | 12.243 | < 2e-16 | *** |
| CreditCard | -0.9895626 | 0.18303085 | -5.407 | 6.43E-08 | *** |
| Family | 0.84227277 | 0.06983048 | 12.062 | < 2e-16 | *** |
| Income | 0.04261172 | 0.00201485 | 21.149 | < 2e-16 | *** |
| Mortgage | 0.00006727 | 0.00048133 | 0.14 | 0.88885 | |
| SecuritiesAccount | -0.8324365 | 0.2534961 | -3.284 | 0.00102 | ** |
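How the variables influence the likelihood of taking out a loan: the variables in green are significant and their effects are intuitive. At the 5% level, CDAccount, CreditCard, Family, Income and SecuritiesAccount are significant. The positive coefficients (CDAccount, Family, Income) raise the likelihood of a loan, and the negative ones (CreditCard, SecuritiesAccount) lower it.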
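One common way to read the size of a logit coefficient is as an odds ratio, exp(coefficient). The sketch below recomputes these from the estimates reported in the logit table. It is illustrative only: the report's analysis was done in R, and Python with the standard library is an assumption made here for a self-contained example; the dictionary name `logit_coef` is hypothetical.

```python
import math

# Logit estimates copied from the coefficient table in this report
# (only the terms that are significant at the 5% level).
logit_coef = {
    "CDAccount": 3.25884502,
    "CreditCard": -0.9895626,
    "Family": 0.84227277,
    "Income": 0.04261172,
    "SecuritiesAccount": -0.8324365,
}

# exp(coefficient) is the multiplicative change in the odds of taking a loan
# for a one-unit increase in the variable, with the other variables held fixed.
odds_ratios = {name: math.exp(b) for name, b in logit_coef.items()}

for name, ratio in odds_ratios.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name:18s} odds ratio {ratio:8.3f} ({direction} the odds)")
```

An odds ratio above 1 means the variable increases the odds of a loan, below 1 means it decreases them. Note that Income is measured in $1000, so its ratio is per extra $1000 of income.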
Probit:

Coefficients:

| | Estimate | Std. Error | z value | Pr(>\|z\|) | |
|---|---|---|---|---|---|
| (Intercept) | -4.9152874 | 0.22786528 | -21.571 | < 2e-16 | *** |
| Age | 0.00349745 | 0.00299645 | 1.167 | 0.24313 | |
| CCAvg | 0.04330281 | 0.0181783 | 2.382 | 0.017214 | * |
| CDAccount | 1.70007706 | 0.13661644 | 12.444 | < 2e-16 | *** |
| CreditCard | -0.482478 | 0.09313191 | -5.181 | 2.21E-07 | *** |
| Family | 0.40058921 | 0.03538077 | 11.322 | < 2e-16 | *** |
| Income | 0.02248318 | 0.00103494 | 21.724 | < 2e-16 | *** |
| Mortgage | 0.00007086 | 0.00026259 | 0.27 | 0.787284 | |
| SecuritiesAccount | -0.4296079 | 0.13052363 | -3.291 | 0.000997 | *** |
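The z value and Pr(>|z|) columns follow from the Estimate and Std. Error columns: z is their ratio, and the p-value is the two-sided tail probability of a standard normal. As a hedged cross-check, the sketch below recomputes both for a few probit rows. Python is used only for illustration (the report's analysis was done in R), and the helper name `wald_test` is hypothetical.

```python
from statistics import NormalDist

# (estimate, standard error) pairs copied from the probit table in this report.
probit_rows = {
    "Age": (0.00349745, 0.00299645),
    "CCAvg": (0.04330281, 0.0181783),
    "Mortgage": (0.00007086, 0.00026259),
    "SecuritiesAccount": (-0.4296079, 0.13052363),
}

def wald_test(estimate, std_error):
    """Return the z value and the two-sided p-value of a Wald test."""
    z = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

for name, (est, se) in probit_rows.items():
    z, p = wald_test(est, se)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:18s} z = {z:7.3f}  p = {p:.6f}  ({verdict} at the 5% level)")
```

The 5% cut-off in the sketch is a conventional choice, not something the assignment specifies.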
R Commander screenshot:
2. Add moderating effects (interactions of variables). Which interactions make sense conceptually? Which interactions are statistically significant? How do you interpret the coefficients on these variables? Copy screen snapshots of your analysis in R to your report. (20%)
Coefficients:

| | Estimate | Std. Error | z value | Pr(>\|z\|) | |
|---|---|---|---|---|---|
| (Intercept) | -3.6146848 | 0.4796968 | -7.535 | 4.87E-14 | *** |
| Family | -1.4895358 | 0.2119153 | -7.029 | 2.08E-12 | *** |
| Income | 0.0001998 | 0.0038434 | 0.052 | 0.959 | |
| Family:Income | 0.0202891 | 0.0018521 | 10.955 | < 2e-16 | *** |
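The coefficient on the moderating effect (Family:Income) is positive and significant, which means it has an accelerating impact: the effect of income on the likelihood of a loan grows with family size, and the effect of family size grows with income.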
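With an interaction term, the effect of one variable on the log-odds depends on the level of the other. The sketch below illustrates this using the three estimates from the interaction model. It is a minimal example in Python (an assumption; the report's analysis was done in R), and the function names `income_slope` and `family_slope` are hypothetical.

```python
# Logit estimates copied from the interaction model in this report.
B_FAMILY = -1.4895358
B_INCOME = 0.0001998
B_FAMILY_INCOME = 0.0202891

def income_slope(family):
    """Change in log-odds per extra $1000 of income, for a given family size."""
    return B_INCOME + B_FAMILY_INCOME * family

def family_slope(income):
    """Change in log-odds per extra family member, for a given income (in $1000)."""
    return B_FAMILY + B_FAMILY_INCOME * income

for family in (1, 2, 3, 4):
    print(f"family = {family}: income slope = {income_slope(family):.4f}")

# Income (in $1000) at which an extra family member stops lowering the log-odds.
break_even_income = -B_FAMILY / B_FAMILY_INCOME
print(f"family slope turns positive above income = {break_even_income:.1f}")
```

The negative main effect of Family therefore only describes customers with an income of zero; at higher incomes the positive interaction term outweighs it.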
| | definition | input | output |
|---|---|---|---|
| (Intercept) | always 1 | 1 | -3.6146848 |
| Family | number of family members | 4 | -5.9581432 |
| Income | income in 1000 $ | 200 | 0.03996 |
| Family:Income | moderating effect | 800 | 16.23128 |
| sum | | | 6.698412 |
| exp(sum) | | | 811.116749 |
| probability | exp(sum)/(1+exp(sum)) | | 100% |
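The spreadsheet prediction for the interaction model can be reproduced step by step: multiply each coefficient by its input, sum the products, and pass the sum through the logistic function. The following is a minimal Python sketch of that arithmetic (Python is an assumption here, the report used R and a spreadsheet; `predict_logit` is a hypothetical helper name).

```python
import math

# Logit estimates copied from the interaction model in this report.
coef = {
    "(Intercept)": -3.6146848,
    "Family": -1.4895358,
    "Income": 0.0001998,
    "Family:Income": 0.0202891,
}

def predict_logit(family, income):
    """Return (sum, exp(sum), probability) for the given inputs."""
    inputs = {
        "(Intercept)": 1,
        "Family": family,
        "Income": income,                  # income in $1000
        "Family:Income": family * income,  # the moderating effect
    }
    linear = sum(coef[name] * inputs[name] for name in coef)
    odds = math.exp(linear)
    return linear, odds, odds / (1.0 + odds)

linear, odds, prob = predict_logit(family=4, income=200)
print(f"sum = {linear:.6f}, exp(sum) = {odds:.3f}, probability = {prob:.1%}")
```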
3. Create a final regression model with the variables that you feel are important (both main effects and interaction terms). Create a spreadsheet prediction of the model. Which variables have the greatest influence on the customers’ loan behavior (combined main effects and interaction effects)? Perform a sensitivity analysis as seen earlier in the semester. Copy screen snapshots of your analysis in R to your report. (20%)
Logit
Coefficients:

| | Estimate | Std. Error | z value | Pr(>\|z\|) | |
|---|---|---|---|---|---|
| (Intercept) | -9.18953 | 0.3619 | -25.393 | < 2e-16 | *** |
| CDAccount | 2.8646 | 0.22651 | 12.647 | < 2e-16 | *** |
| CreditCard | -0.88961 | 0.17821 | -4.992 | 5.98E-07 | *** |
| Family | 0.84717 | 0.06939 | 12.21 | < 2e-16 | *** |
| Income | 0.04461 | 0.00179 | 24.92 | < 2e-16 | *** |

| | definition | input | output |
|---|---|---|---|
| (Intercept) | always 1 | 1 | -9.18953 |
| CDAccount | no idea | 1 | 2.8646 |
| CreditCard | number of cards | 1 | -0.88961 |
| Family | number of family members | 4 | 3.38868 |
| Income | income in 1000 $ | 200 | 8.922 |
| sum | | | 5.09614 |
| exp(sum) | | | 163.390003 |
| probability | exp(sum)/(1+exp(sum)) | | 99% |

Sensitivity analysis: predicted probability of a loan by income (rows, in 1000 $) and number of family members (columns), with CDAccount = 1 and CreditCard = 1 as in the prediction above.

| income \ family members | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 10 | 0% | 1% | 1% | 3% |
| 20 | 0% | 1% | 2% | 5% |
| 30 | 1% | 2% | 3% | 8% |
| 40 | 1% | 2% | 5% | 11% |
| 50 | 2% | 4% | 8% | 17% |
| 60 | 2% | 6% | 12% | 24% |
| 70 | 4% | 8% | 18% | 33% |
| 80 | 6% | 12% | 25% | 44% |
| 90 | 9% | 18% | 34% | 55% |
| 100 | 13% | 26% | 45% | 65% |
| 110 | 19% | 35% | 56% | 75% |
| 120 | 27% | 46% | 66% | 82% |
| 130 | 36% | 57% | 76% | 88% |
| 140 | 47% | 67% | 83% | 92% |
| 150 | 58% | 76% | 88% | 95% |
| 160 | 68% | 83% | 92% | 96% |
| 170 | 77% | 89% | 95% | 98% |
| 180 | 84% | 92% | 97% | 99% |
| 190 | 89% | 95% | 98% | 99% |
| 200 | 93% | 97% | 99% | 99% |
| 210 | 95% | 98% | 99% | 100% |
| 220 | 97% | 99% | 99% | 100% |
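The logit sensitivity grid can be regenerated from the final logit coefficients by evaluating the model over a range of incomes and family sizes. The sketch below assumes, as the reported probabilities imply, that CDAccount and CreditCard are both held at 1. It is written in Python for illustration only (the report used R and a spreadsheet), and `loan_probability` is a hypothetical helper name.

```python
import math

# Final logit estimates copied from the coefficient table in this report.
COEF = {
    "(Intercept)": -9.18953,
    "CDAccount": 2.8646,
    "CreditCard": -0.88961,
    "Family": 0.84717,
    "Income": 0.04461,
}

def loan_probability(family, income, cd_account=1, credit_card=1):
    """Logit probability of a loan; income is in $1000."""
    linear = (COEF["(Intercept)"]
              + COEF["CDAccount"] * cd_account
              + COEF["CreditCard"] * credit_card
              + COEF["Family"] * family
              + COEF["Income"] * income)
    return 1.0 / (1.0 + math.exp(-linear))

incomes = range(10, 230, 10)   # 10, 20, ..., 220 (in $1000)
families = (1, 2, 3, 4)

# One row per income level, one column per family size.
grid = {inc: [loan_probability(fam, inc) for fam in families] for inc in incomes}

print("income " + " ".join(f"{fam:>5d}" for fam in families))
for inc, row in grid.items():
    print(f"{inc:>6d} " + " ".join(f"{p:>5.0%}" for p in row))
```

Changing the `cd_account` or `credit_card` defaults gives the corresponding grid for customers without a CD account or credit card.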
Probit
Coefficients:

| | Estimate | Std. Error | z value | Pr(>\|z\|) | |
|---|---|---|---|---|---|
| (Intercept) | -4.8230308 | 0.176111 | -27.386 | < 2e-16 | *** |
| CDAccount | 1.5112701 | 0.1178771 | 12.821 | < 2e-16 | *** |
| CreditCard | -0.434615 | 0.0908606 | -4.783 | 0.00000172 | *** |
| Family | 0.4034737 | 0.0351834 | 11.468 | < 2e-16 | *** |
| Income | 0.0238448 | 0.0009083 | 26.251 | < 2e-16 | *** |

| | definition | input | output |
|---|---|---|---|
| (Intercept) | always 1 | 1 | -4.8230308 |
| CDAccount | no idea | 1 | 1.5112701 |
| CreditCard | number of cards | 1 | -0.434615 |
| Family | number of family members | 4 | 1.6138948 |
| Income | income in 1000 $ | 200 | 4.76896 |
| sum | | | 2.6364791 |
| probability | Φ(sum), the standard normal CDF | | 100% |
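For the probit model the sum is not passed through exp(sum)/(1+exp(sum)); the probability is the standard normal cumulative distribution function evaluated at the sum. The following Python sketch reproduces the probit prediction under that definition (Python is an assumption for illustration; `probit_probability` is a hypothetical helper name).

```python
from statistics import NormalDist

# Final probit estimates copied from the coefficient table in this report.
PROBIT = {
    "(Intercept)": -4.8230308,
    "CDAccount": 1.5112701,
    "CreditCard": -0.434615,
    "Family": 0.4034737,
    "Income": 0.0238448,
}

def probit_probability(family, income, cd_account=1, credit_card=1):
    """Return (sum, probability); income is in $1000."""
    index = (PROBIT["(Intercept)"]
             + PROBIT["CDAccount"] * cd_account
             + PROBIT["CreditCard"] * credit_card
             + PROBIT["Family"] * family
             + PROBIT["Income"] * income)
    # Probit link: probability = standard normal CDF of the index.
    return index, NormalDist().cdf(index)

index, prob = probit_probability(family=4, income=200)
print(f"sum = {index:.7f}, probability = {prob:.1%}")
```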

Sensitivity analysis: predicted probability of a loan by income (rows, in 1000 $) and number of family members (columns), with CDAccount = 1 and CreditCard = 1 as in the prediction above.

| income \ family members | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 10 | 0% | 0% | 1% | 3% |
| 20 | 0% | 1% | 2% | 5% |
| 30 | 0% | 1% | 3% | 8% |
| 40 | 1% | 2% | 6% | 12% |
| 50 | 2% | 4% | 9% | 17% |
| 60 | 3% | 7% | 13% | 24% |
| 70 | 5% | 10% | 19% | 32% |
| 80 | 8% | 15% | 26% | 41% |
| 90 | 12% | 21% | 35% | 51% |
| 100 | 17% | 29% | 44% | 60% |
| 110 | 24% | 38% | 53% | 69% |
| 120 | 32% | 47% | 63% | 77% |
| 130 | 40% | 56% | 71% | 83% |
| 140 | 50% | 65% | 79% | 89% |
| 150 | 59% | 74% | 85% | 93% |
| 160 | 68% | 81% | 90% | 95% |
| 170 | 76% | 87% | 94% | 97% |
| 180 | 83% | 91% | 96% | 98% |
| 190 | 88% | 94% | 98% | 99% |
| 200 | 92% | 97% | 99% | 100% |
| 210 | 95% | 98% | 99% | 100% |
| 220 | 97% | 99% | 100% | 100% |
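Since the logit and probit sensitivity analyses cover the same incomes and family sizes, it can be useful to see how far the two models disagree. The sketch below evaluates both final models on that grid and reports the cell with the largest gap. It is an illustrative Python example, not part of the original R and spreadsheet analysis; all function names are hypothetical, and CDAccount = 1 and CreditCard = 1 are assumed throughout.

```python
import math
from statistics import NormalDist

# (intercept, CDAccount, CreditCard, Family, Income) estimates copied from
# the final logit and probit coefficient tables in this report.
LOGIT = (-9.18953, 2.8646, -0.88961, 0.84717, 0.04461)
PROBIT = (-4.8230308, 1.5112701, -0.434615, 0.4034737, 0.0238448)

def index(coef, family, income, cd_account=1, credit_card=1):
    """Linear predictor (the 'sum' row of the spreadsheet); income in $1000."""
    b0, b_cd, b_cc, b_fam, b_inc = coef
    return b0 + b_cd * cd_account + b_cc * credit_card + b_fam * family + b_inc * income

def logit_prob(family, income):
    return 1.0 / (1.0 + math.exp(-index(LOGIT, family, income)))

def probit_prob(family, income):
    return NormalDist().cdf(index(PROBIT, family, income))

# Absolute difference between the two models for every cell of the grid.
gaps = {
    (income, family): abs(logit_prob(family, income) - probit_prob(family, income))
    for income in range(10, 230, 10)
    for family in (1, 2, 3, 4)
}

worst_income, worst_family = max(gaps, key=gaps.get)
print(f"largest gap: {gaps[(worst_income, worst_family)]:.1%} "
      f"at income = {worst_income}, family = {worst_family}")
```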
4. Perform a neural network analysis of the variables found to be significant in the logit and probit analysis above. Copy screen snapshots of your final neural network model in R to your report. (20%)
5. Create a prediction model of the neural network. Using the prediction model, perform a sensitivity analysis for the neural network model similar to the logit and probit sensitivity analysis. (20%)