
Statistics Week 4: Solution

1)

a) Variables that can be used in Pearson’s correlation table:

· Salary

· Compa

· Midpoint

· Performance Rating

· Service

· Age

· Raise

· Degree

Pearson’s correlation coefficients

              Salary       Compa    Midpoint         Age      Rating     Service       Raise      Degree
Salary    1.00000000  0.61647174  0.98897178   0.5435797   0.1513070  0.45170496 -0.04142104  0.18938976
Compa     0.61647174  1.00000000  0.50065775   0.1952180  -0.1012709  0.18207462 -0.04273128  0.13937660
Midpoint  0.98897178  0.50065775  1.00000000   0.5671107   0.1917508  0.47114670 -0.02891341  0.17674821
Age       0.54357969  0.19521802  0.56711066   1.0000000   0.1392384  0.56513321 -0.18042685 -0.12732102
Rating    0.15130696 -0.10127086  0.19175077   0.1392384   1.0000000  0.22570076  0.67365976  0.06194690
Service   0.45170496  0.18207462  0.47114670   0.5651332   0.2257008  1.00000000  0.10278690  0.02826732
Raise    -0.04142104 -0.04273128 -0.02891341  -0.1804269   0.6736598  0.10278690  1.00000000  0.04425923
Degree    0.18938976  0.13937660  0.17674821  -0.1273210   0.0619469  0.02826732  0.04425923  1.00000000

Variables significantly related to:

a) Compa

Salary

Midpoint

b) Salary

Compa

Midpoint

Age

Service years

NB: Here we treat a correlation as significant when |r| > 0.28, which is approximately the two-tailed critical value at alpha = 0.05 for a sample of 50.
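The 0.28 cut-off is consistent with the usual t-test for a correlation coefficient. The sketch below is a minimal check in Python, assuming n = 50 employees (the sample size implied by the degrees of freedom in the ANOVA tables of this solution) and a tabulated two-tailed critical t value of 2.0106 for 48 degrees of freedom. The function names are illustrative and not part of the original analysis.

```python
import math

N = 50           # assumption: sample size implied by the ANOVA degrees of freedom
T_CRIT = 2.0106  # assumption: tabulated two-tailed t value, alpha = 0.05, df = 48


def t_statistic(r, n=N):
    """t statistic for testing H0: rho = 0 given a sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit=T_CRIT, n=N):
    """Smallest |r| that is significant for the given critical t value."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


def is_significant(r, n=N, t_crit=T_CRIT):
    return abs(t_statistic(r, n)) > t_crit


# Correlations of each variable with Salary, copied from the table above.
salary_row = {
    "Compa": 0.61647174,
    "Midpoint": 0.98897178,
    "Age": 0.5435797,
    "Rating": 0.1513070,
    "Service": 0.45170496,
    "Raise": -0.04142104,
    "Degree": 0.18938976,
}

significant = [name for name, r in salary_row.items() if is_significant(r)]

print(round(critical_r(), 3))
print(significant)
```

Run on the Salary row, the check flags Compa, Midpoint, Age and Service, which matches the list given for Salary.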

c)

The main surprise is that Age shows a negative relationship with Raise (r = -0.18). One would expect that as someone works for more years and gains more experience, raises would increase rather than decrease.

A second surprise is that Salary shows a negative correlation with Raise (r = -0.04). One would expect Salary to increase with Raise, not the reverse, as is the case with this data. Both correlations are weak, however, and fall below the 0.28 threshold used above.

e)

These relationships cannot help us answer our question on equality because each correlation expresses the relationship between only two variables at a time. In addition, it assumes that, at any time, only one variable affects our key variable of concern; that is, it assumes absolute independence of these factors.

2)

Coefficients:

(Intercept)  Midpoint       Age    Rating   Service     Raise    Degree
    1.47306   1.19136  -0.02261  -0.09302  -0.02542   0.51829   0.48499

Therefore, if y = Salary, x1 = Midpoint, x2 = Age, x3 = Rating, x4 = Service, x5 = Raise, x6 = Degree, then:

y = 1.47306 + 1.19136x1 - 0.02261x2 - 0.09302x3 - 0.02542x4 + 0.51829x5 + 0.48499x6

Analysis of Variance Table

Response: Salary

           Df   Sum Sq  Mean Sq    F value   Pr(>F)
mid         1  17669.7  17669.7  2139.8992  < 2e-16 ***
age         1      7.9      7.9     0.9627  0.33201
rating      1     26.5     26.5     3.2077  0.08033 .
service     1      0.1      0.1     0.0125  0.91166
raise       1      4.1      4.1     0.4931  0.48635
degree      1      2.6      2.6     0.3151  0.57748
Residuals  43    355.1      8.3

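The ANOVA table tests H0: βi = 0 for each variable as it is added to the model in sequence. For Midpoint (x1), p-value < 2e-16, so at alpha = 0.05 we reject H0: β1 = 0. For all the other variables, x2, …, x6, p-value > 0.05, hence we fail to reject the null hypothesis that their coefficients are 0.

Hence, Midpoint is the only significant predictor of Salary in this model; Age, Rating, Service, Raise and Degree add no significant explanatory power once Midpoint is included.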
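The overall fit of the model can be derived from the sums of squares in the ANOVA table. The Python sketch below is one way to do that arithmetic. It uses the rounded values printed in the table, so the results are approximate, and the overall F statistic it computes is derived here rather than taken from the original output.

```python
# Sequential sums of squares for the Salary model, copied from the ANOVA table.
sum_sq = {
    "mid": 17669.7,
    "age": 7.9,
    "rating": 26.5,
    "service": 0.1,
    "raise": 4.1,
    "degree": 2.6,
}
resid_ss, resid_df = 355.1, 43

model_ss = sum(sum_sq.values())          # variation explained by all six predictors
total_ss = model_ss + resid_ss           # total variation in Salary
resid_ms = resid_ss / resid_df           # residual mean square

r_squared = model_ss / total_ss          # share of variation explained
overall_f = (model_ss / len(sum_sq)) / resid_ms  # F on (6, 43) degrees of freedom

# Share of the explained variation that is attributable to Midpoint alone.
midpoint_share = sum_sq["mid"] / model_ss

print(f"R-squared      = {r_squared:.3f}")
print(f"Overall F      = {overall_f:.1f}")
print(f"Midpoint share = {midpoint_share:.4f}")
```

With these figures the model explains roughly 98% of the variation in Salary, almost all of it through Midpoint.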

3) Using Compa instead of Salary.

Therefore if y= Compa, x1=midpoint, x2=Age, x3= Rating, X4= Service, X5= Raise, X6= Degree:

Coefficients:

(Intercept)         x1          x2          x3          x4         x5         x6
  1.0696151  0.0028372  -0.0004348  -0.0022836  -0.0002492  0.0172612  0.0061451

Analysis of Variance Table

Response: y = Compa

           Df    Sum Sq   Mean Sq  F value     Pr(>F)
x1          1  0.072491  0.072491  15.8338  0.0002613 ***
x2          1  0.003355  0.003355   0.7328  0.3967286
x3          1  0.011233  0.011233   2.4535  0.1245942
x4          1  0.000043  0.000043   0.0093  0.9234868
x5          1  0.004799  0.004799   1.0481  0.3116667
x6          1  0.000418  0.000418   0.0912  0.7640658
Residuals  43  0.196865  0.004578

H0: βi = 0 against H1: βi ≠ 0, for i = 1, …, 6

At alpha = 0.05, we reject H0: β1 = 0 for Midpoint (p = 0.0002613), but for all other coefficients we fail to reject H0: β2, …, β6 = 0.

Hence, Midpoint is the only significant predictor of Compa in this model.

4)

If one considers only the simple regression line between Gender and Compa, one may be tempted to say that there is a small relationship between Gender and Compa, or even that female employees are paid slightly more than their male counterparts.

Coefficients:

(Intercept)  Gender (x)
    1.05624     0.01248

That is, y = 1.05624 + 0.01248x, where y = Compa.

If we consider only Gender and Compa, the ANOVA indicates that there is no significant relationship between these two variables.

Analysis of Variance Table

Response: Compa

           Df    Sum Sq    Mean Sq  F value  Pr(>F)
Gender      1  0.001947  0.0019469   0.3253  0.5711
Residuals  48  0.287256  0.0059845

Testing H0: β = 0 against H1: β ≠ 0, we fail to reject the null hypothesis (p = 0.5711 > 0.05). This shows that there is little or no relationship between Gender and pay.

Also, the correlation coefficient between Gender and Compa is only 0.08204811, so there seems to be at most a minimal relationship between Gender and pay.

             Compa      Gender
Compa   1.00000000  0.08204811
Gender  0.08204811  1.00000000
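For a regression with a single predictor, the ANOVA F value and the correlation coefficient carry the same information, since F = r^2 (n - 2) / (1 - r^2). The short Python check below applies this identity to the Gender and Compa figures, assuming n = 50 (48 residual degrees of freedom plus 2). The variable names are illustrative.

```python
r = 0.08204811   # correlation between Gender and Compa, from the table above
n = 50           # assumption: 48 residual degrees of freedom + 2

r_squared = r * r                                # share of Compa variation explained by Gender
f_from_r = r_squared * (n - 2) / (1 - r_squared) # F statistic implied by the correlation

# The same share, computed from the ANOVA sums of squares for Gender and Residuals.
gender_ss, resid_ss = 0.001947, 0.287256
r_squared_from_anova = gender_ss / (gender_ss + resid_ss)

print(f"r^2 from correlation = {r_squared:.5f}")
print(f"r^2 from ANOVA       = {r_squared_from_anova:.5f}")
print(f"F implied by r       = {f_from_r:.4f}")
```

Both routes give an r^2 below 1%, and the implied F value agrees with the 0.3253 reported in the ANOVA table, so the correlation and the regression tell the same story.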

The best variable for analysis is Compa, because it gives a lower F value and a lower variance than when Salary is used.

The most interesting finding from the analysis of the data is that there is no significant difference in pay between the genders.

5)

The single-factor tests and analyses could not provide a complete answer to our salary equality question because they produced results with high variability. This means that salary is a function of more than one variable and needs several variables to explain it.

An outcome such as one’s ability to perform a task effectively may be linked to a number of factors, such as reward, job environment and skills, and can therefore be explained only by multiple regression and not by simpler one-variable models.

Another example is a student’s ability to succeed later in life. This depends not only on the grades obtained at school but also on many other factors. Such outcomes cannot be explained by simpler one-variable models.