Basic Econometrics Part 1 Question 4,5,6 ONLY

Basic Econometrics

Research Report Group Assignment

This is a group assignment where you can work alone or with up to three other students (a maximum group size of four). All group members will receive the same marks for the assignment. You must submit an electronic copy of your assignment in Canvas in pdf, doc or docx format. Hard copies will not be accepted. Show your tables and calculations as well as answering the questions in full sentences. Please make sure your tables of results are neatly formatted, not just copied and pasted from STATA, and that you write your answers in clear sentences. You should write no more than 1000 words (not including tables/calculations) in total for this assignment. The numbers of words, tables, graphs and calculations given in parentheses after each question are a guide.

PART 1

This assignment uses data from the BUPA health insurance call centre. Each observation contains data from one call to the call centre. The variables describe several characteristics of the call (e.g. the length of the call, the amount of silence in the call), characteristics of the customer (e.g. state of residence, family type, number of adults and children), and measures of performance (e.g. net promoter score, sentiment score of the customer). In this assignment we are interested in predicting the net promoter score and the length of the call.

Please use the dataset CallCentre.dta and associated information file CC_DEFINITIONS_.XLSX to answer these questions. Use the software program STATA 15 available through RMIT MyDesktop for all data analysis.

1. Calculate descriptive statistics using the ‘summarize’ command for the variables net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted and present the results in a table. Comment on what we learn about these variables from the descriptives. Graph a scatter plot of net_promoter_score against agent_crosstalk_weighted and describe the relationship between these two variables.

(4.5 marks) (100 words, 1 table, 1 graph)

Measured in seconds, total_silence is the duration of silence on the call. According to the table, the average silence per call is about 44 seconds, ranging from 0 s to 518.5 s, and its standard deviation (72.2 s) exceeds its mean, indicating considerable variability. The means of total_silence_weighted and agent_crosstalk_weighted are about 0.10 and 0.02 respectively. For total_silence_weighted the standard deviation (0.13) is larger than the mean, so it also varies substantially; for agent_crosstalk_weighted the standard deviation (0.014) is somewhat smaller than the mean. On the other hand, agent_to_cust_index averages 2.06, which suggests that agents talk about twice as long as customers, and its standard deviation (1.50) is about 0.56 below the mean.
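The comparison of each standard deviation with its mean can be checked with a small calculation. The following is a minimal sketch in Python (not part of the STATA workflow the assignment requires); the figures are copied from the summarize output reported in this document, and the names stats and coefficient_of_variation are illustrative only.

```python
# Means and standard deviations copied from the `summarize` output.
stats = {
    "net_promoter_score": (8.567095, 1.970395),
    "total_silence": (43.89775, 72.24571),
    "total_silence_weighted": (0.0985188, 0.1252939),
    "agent_to_cust_index": (2.061445, 1.50401),
    "agent_crosstalk_weighted": (0.0195041, 0.0140015),
}


def coefficient_of_variation(mean, sd):
    """Return sd / mean, a unit-free measure of relative spread."""
    return sd / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in stats.items()}

for name, value in cv.items():
    # A ratio above 1 means the standard deviation exceeds the mean.
    flag = "sd > mean" if value > 1 else "sd < mean"
    print(f"{name:26s} sd/mean = {value:5.2f}  ({flag})")
```

A ratio above 1 flags the variables whose spread is large relative to their average, which here are the two silence measures.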

net_promoter_score and agent_crosstalk_weighted appear to be positively correlated, although there are some outliers with a high net promoter score and low agent crosstalk, or a low net promoter score and high agent crosstalk.

2. Estimate a multiple linear regression with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory (independent) variables. Predict the change in net_promoter_score associated with a 0.1 increase in total_silence_weighted and a 0.01 increase in agent_crosstalk_weighted. Assuming this is the correct model specification, are we sure that total_silence_weighted has a negative effect? [Hint: consider the t-statistic and p-value]

(6 marks) (50 words, 1 table, 2 calculations)

If total_silence_weighted increases by 0.1, the model predicts that the net promoter score will decrease by 0.1 × 0.0582 = 0.0058. If agent_crosstalk_weighted increases by 0.01, the model predicts that the net promoter score will increase by 0.01 × 7.556 = 0.0756.
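The two predicted changes are simply the estimated coefficient multiplied by the change in the regressor, holding the other regressors fixed. A minimal Python sketch of that arithmetic, using the coefficients from the three-regressor output reported in this document (the names coef and predicted_change are illustrative):

```python
# Coefficients from the regression with three explanatory variables.
coef = {
    "total_silence_weighted": -0.058195,
    "agent_crosstalk_weighted": 7.556478,
}


def predicted_change(variable, delta):
    """Predicted change in net_promoter_score when `variable` changes by
    `delta` and the other regressors are held fixed."""
    return coef[variable] * delta


change_silence = predicted_change("total_silence_weighted", 0.1)
change_crosstalk = predicted_change("agent_crosstalk_weighted", 0.01)

print(f"0.1 increase in total_silence_weighted:    {change_silence:+.4f}")
print(f"0.01 increase in agent_crosstalk_weighted: {change_crosstalk:+.4f}")
```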

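To determine whether total_silence_weighted has a negative effect, we test the null hypothesis that its coefficient is zero against the alternative that it is less than zero. The t-statistic is -0.16 and the two-sided p-value is 0.876. Because the p-value is far above any conventional significance level, we cannot reject the null hypothesis, so we cannot be sure that total_silence_weighted has a negative effect.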
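The t-statistic is the coefficient divided by its standard error. The sketch below reproduces it in Python and approximates the p-values with the standard normal distribution; that approximation is an assumption made here for simplicity (STATA uses the t distribution, which is almost identical with 1,940 degrees of freedom). The helper normal_cdf is illustrative and not part of any STATA output.

```python
import math

coef = -0.058195      # coefficient on total_silence_weighted
std_err = 0.3722264   # its standard error


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


t_stat = coef / std_err

# Two-sided test of H0: coefficient = 0 (the p-value STATA reports).
p_two_sided = 2 * (1 - normal_cdf(abs(t_stat)))

# One-sided test against H1: coefficient < 0 (lower-tail probability).
p_one_sided = normal_cdf(t_stat)

print(f"t = {t_stat:.2f}")
print(f"two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")
```

Even the one-sided p-value is far above 0.10, so the data give no reliable evidence that the effect is negative.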

3. Add dummy variables to the regression to control for all of the potential effects of State and Package. Make sure the base category is customers with the “HOSPITAL AND EXTRAS” package in NSW. Carefully interpret the estimated coefficient on the package1 dummy variable you have included. Why is this NOT a very important result?

[Hint: Use the variable labels to include and interpret the correct variables, consider the descriptive statistics of the dummy variables to interpret their importance]

(4.5 marks) (50 words, 1 table)

According to the regression, holding other things equal, the net_promoter_score of customers with package 1 (Ambulance only) is about 0.69 points higher than that of customers with the base package 3 (Hospital and Extras).

The coefficient is statistically significant at the 10% level (p = 0.053) but not at the 5% level.
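The significance statement can be checked from the reported coefficient and standard error on package1. This is a minimal Python sketch that assumes a normal approximation with critical value 1.96, so the interval differs slightly from the one STATA computes with the t distribution; the names used are illustrative.

```python
import math

coef = 0.6856614     # coefficient on package1
std_err = 0.3538092  # its standard error


def normal_cdf(z):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


t_stat = coef / std_err
p_two_sided = 2 * (1 - normal_cdf(abs(t_stat)))

# Approximate 95% confidence interval using the normal critical value.
z_crit = 1.96
ci_lower = coef - z_crit * std_err
ci_upper = coef + z_crit * std_err

print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}")
print(f"approximate 95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print("significant at 10%:", p_two_sided < 0.10)
print("significant at 5%: ", p_two_sided < 0.05)
```

The interval includes zero, which is consistent with the coefficient being significant at the 10% level only.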

Regression output for Question 3 (dependent variable: net_promoter_score; omitted categories: state1 and package3)

Source          SS            df      MS             Number of obs  =  1,944
                                                     F(11, 1932)    =  1.29
Model           55.1756166    11      5.01596514     Prob > F       =  0.2216
Residual        7491.99671    1,932   3.87784509     R-squared      =  0.0073
                                                     Adj R-squared  =  0.0017
Total           7547.17233    1,943   3.88428838     Root MSE       =  1.9692

net_promoter_score              Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
total_silence_weighted      -.0812762   .3733226   -0.22   0.828   -.8134338   .6508814
agent_to_cust_index         -.0080275   .0303648   -0.26   0.792   -.0675788   .0515237
agent_crosstalk_weighted     7.627287   3.396154    2.25   0.025    .9667757    14.2878
state2                      -.0197515   .1263756   -0.16   0.876   -.2675985   .2280955
state3                      -.1145238   .1469216   -0.78   0.436   -.4026652   .1736177
state4                      -.3105358   .3121037   -0.99   0.320   -.9226312   .3015596
state5                      -.1239567   .1237824   -1.00   0.317   -.3667177   .1188044
state6                       .0365856   .1741556    0.21   0.834   -.3049671   .3781383
package1                     .6856614   .3538092    1.94   0.053   -.0082266   1.379549
package2                     .2375923    .170722    1.39   0.164   -.0972264    .572411
package4                     .0424604   .1215829    0.35   0.727   -.1959871    .280908
_cons                        8.460002   .1436005   58.91   0.000    8.178373    8.74163

Descriptive statistics for Question 1

Variable                      Obs       Mean    Std. Dev.    Min      Max
net_promoter_score          1,945   8.567095    1.970395       0       10
total_silence               1,945   43.89775    72.24571       0    518.5
total_silence_weighted      1,945   .0985188    .1252939       0     .665
agent_to_cust_index         1,945   2.061445     1.50401    .142   14.674
agent_crosstalk_weighted    1,944   .0195041    .0140015       0     .092

[Figure: scatter plot of net_promoter_score (vertical axis, 0 to 10) against agent_crosstalk_weighted (horizontal axis, 0 to 0.1)]

Regression output for Question 2 (dependent variable: net_promoter_score)

Source          SS            df      MS             Number of obs  =  1,944
                                                     F(3, 1940)     =  2.07
Model           24.0485783    3       8.01619278     Prob > F       =  0.1026
Residual        7523.12375    1,940   3.87789884     R-squared      =  0.0032
                                                     Adj R-squared  =  0.0016
Total           7547.17233    1,943   3.88428838     Root MSE       =  1.9692

net_promoter_score              Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
total_silence_weighted       -.058195   .3722264   -0.16   0.876   -.7882008   .6718109
agent_to_cust_index         -.0089455    .030321   -0.30   0.768   -.0684107   .0505197
agent_crosstalk_weighted     7.556478   3.389054    2.23   0.026    .9099085   14.20305
_cons                        8.444175   .1226135   68.87   0.000    8.203707   8.684643
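The fit statistics in the two regression outputs follow directly from the sums of squares and degrees of freedom. The sketch below, written in Python as an illustration only, recomputes R-squared, the overall F-statistic and the Root MSE from the reported Model and Residual rows; the function name fit_summary is hypothetical.

```python
import math


def fit_summary(model_ss, residual_ss, model_df, residual_df):
    """Return (R-squared, overall F-statistic, Root MSE) from an ANOVA table."""
    total_ss = model_ss + residual_ss
    r_squared = model_ss / total_ss
    residual_ms = residual_ss / residual_df
    f_stat = (model_ss / model_df) / residual_ms
    root_mse = math.sqrt(residual_ms)
    return r_squared, f_stat, root_mse


# Regression with three explanatory variables.
r2_q2, f_q2, rmse_q2 = fit_summary(24.0485783, 7523.12375, 3, 1940)

# Regression with the State and Package dummies added.
r2_q3, f_q3, rmse_q3 = fit_summary(55.1756166, 7491.99671, 11, 1932)

print(f"3 regressors:  R2 = {r2_q2:.4f}, F = {f_q2:.2f}, Root MSE = {rmse_q2:.4f}")
print(f"with dummies:  R2 = {r2_q3:.4f}, F = {f_q3:.2f}, Root MSE = {rmse_q3:.4f}")
```

Both models explain less than 1% of the variation in net_promoter_score, which is worth keeping in mind when interpreting any individual coefficient.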