Woud
Statistics Week 4: Solution
1)
a) Variables that can be used in Pearson’s correlation table:
· Salary
· Compa
· Midpoint
· Performance Rating
· Service
· Age
· Raise
· Degree
Pearson’s correlation coefficients
               Salary       Compa    Midpoint         Age      Rating     Service
Salary     1.00000000  0.61647174  0.98897178   0.5435797   0.1513070  0.45170496
Compa      0.61647174  1.00000000  0.50065775   0.1952180  -0.1012709  0.18207462
Midpoint   0.98897178  0.50065775  1.00000000   0.5671107   0.1917508  0.47114670
Age        0.54357969  0.19521802  0.56711066   1.0000000   0.1392384  0.56513321
Rating     0.15130696 -0.10127086  0.19175077   0.1392384   1.0000000  0.22570076
Service    0.45170496  0.18207462  0.47114670   0.5651332   0.2257008  1.00000000
Raise     -0.04142104 -0.04273128 -0.02891341  -0.1804269   0.6736598  0.10278690
Degree     0.18938976  0.13937660  0.17674821  -0.1273210   0.0619469  0.02826732

                Raise      Degree
Salary    -0.04142104  0.18938976
Compa     -0.04273128  0.13937660
Midpoint  -0.02891341  0.17674821
Age       -0.18042685 -0.12732102
Rating     0.67365976  0.06194690
Service    0.10278690  0.02826732
Raise      1.00000000  0.04425923
Degree     0.04425923  1.00000000
Variables significantly related to:
a) Compa
Salary
Midpoint
b) Salary
Compa
Midpoint
Age
Service years
NB: Here we treat a variable as significantly related when |correlation| > 0.28.
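The 0.28 cut-off can be checked against the usual t-test for a correlation coefficient. The sketch below assumes a sample size of 50 (implied by the residual degrees of freedom reported in the ANOVA tables) and a two-tailed 5% critical t of about 2.0106 for 48 degrees of freedom, taken from a standard t-table; both are assumptions, not values stated in the assignment.

```python
import math

N = 50           # assumed sample size (residual df 48 + 2 in the Gender model)
T_CRIT = 2.0106  # assumed two-tailed 5% critical t for 48 df, from a t-table


def t_from_r(r, n=N):
    """t statistic for testing H0: correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def r_critical(t_crit=T_CRIT, n=N):
    """Smallest |r| that is significant at the chosen critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# Correlations with Salary, copied from the table above.
salary_correlations = {
    "Compa": 0.61647174,
    "Midpoint": 0.98897178,
    "Age": 0.54357969,
    "Rating": 0.15130696,
    "Service": 0.45170496,
    "Raise": -0.04142104,
    "Degree": 0.18938976,
}

# Keep the variables whose t statistic exceeds the critical value.
significant = [
    name for name, r in salary_correlations.items()
    if abs(t_from_r(r)) > T_CRIT
]

print(round(r_critical(), 4))
print(significant)
```

Under these assumptions the critical correlation comes out just below 0.28, so the cut-off used here corresponds roughly to a 5% significance level.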
c)
Two results are surprising. The first is that Age shows a negative relationship with Raise (r = -0.18). One would expect that as a person works for more years and gains experience, the raise should grow rather than shrink.
The second is that Salary shows a negative correlation with Raise (r = -0.04). One would expect salary to increase with raise, not the reverse as is the case with this data. Both correlations are weak, however, and fall below the 0.28 cut-off used above.
e)
These relationships cannot help us to answer our question on pay equality because each correlation expresses the relationship between only two variables at a time. In effect, it assumes that only one variable affects our key variable of concern at any time, and that the factors are completely independent of one another.
2)
Coefficients:
(Intercept)    Midpoint         Age      Rating     Service       Raise      Degree
    1.47306     1.19136    -0.02261    -0.09302    -0.02542     0.51829     0.48499
Therefore, if y = Salary, x1 = Midpoint, x2 = Age, x3 = Rating, x4 = Service, x5 = Raise, x6 = Degree, then:
y = 1.47306 + 1.19136x1 - 0.02261x2 - 0.09302x3 - 0.02542x4 + 0.51829x5 + 0.48499x6
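As a rough illustration of how the fitted coefficients are used, the sketch below plugs one set of predictor values into the regression equation. The function name `predict_salary` and the example employee are hypothetical and chosen only for illustration; only the coefficients come from the output above.

```python
# Coefficients copied from the Salary regression output.
COEFFICIENTS = {
    "Intercept": 1.47306,
    "Midpoint": 1.19136,
    "Age": -0.02261,
    "Rating": -0.09302,
    "Service": -0.02542,
    "Raise": 0.51829,
    "Degree": 0.48499,
}


def predict_salary(midpoint, age, rating, service, raise_value, degree):
    """Evaluate the fitted equation for one employee (hypothetical helper)."""
    inputs = {
        "Midpoint": midpoint,
        "Age": age,
        "Rating": rating,
        "Service": service,
        "Raise": raise_value,
        "Degree": degree,
    }
    return COEFFICIENTS["Intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in inputs.items()
    )


# Hypothetical employee; these values are not taken from the data set.
example = predict_salary(
    midpoint=40, age=35, rating=80, service=5, raise_value=4.0, degree=1
)
print(round(example, 2))
```

Because the Midpoint coefficient is close to 1 and the other coefficients are small, the prediction stays close to the Midpoint value itself.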
Analysis of Variance Table
Response: Salary
           Df   Sum Sq  Mean Sq    F value   Pr(>F)
mid         1  17669.7  17669.7  2139.8992  < 2e-16 ***
age         1      7.9      7.9     0.9627  0.33201
rating      1     26.5     26.5     3.2077  0.08033 .
service     1      0.1      0.1     0.0125  0.91166
raise       1      4.1      4.1     0.4931  0.48635
degree      1      2.6      2.6     0.3151  0.57748
Residuals  43    355.1      8.3
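For Midpoint (x1), p-value < 2e-16 < 0.05, so at alpha = 0.05 we reject the null hypothesis H0: β1 = 0. For the remaining variables x2, ..., x6, p-value > 0.05, hence we fail to reject the null hypothesis that their coefficients are 0.
Hence Midpoint is the only significant predictor of Salary in this model; Age, Rating, Service, Raise and Degree add no significant explanatory power once Midpoint is included.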
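One way to gauge the model as a whole is to pool the rows of the ANOVA table into an overall F statistic and an R-squared value. The sketch below does this arithmetic using only the sums of squares and degrees of freedom printed in the table; the variable names are illustrative.

```python
# (term, df, sum of squares) rows copied from the Salary ANOVA table.
terms = [
    ("mid", 1, 17669.7),
    ("age", 1, 7.9),
    ("rating", 1, 26.5),
    ("service", 1, 0.1),
    ("raise", 1, 4.1),
    ("degree", 1, 2.6),
]
resid_df, resid_ss = 43, 355.1

model_df = sum(df for _, df, _ in terms)
model_ss = sum(ss for _, _, ss in terms)

# Mean squared error and overall F for all six predictors together.
mse = resid_ss / resid_df
overall_f = (model_ss / model_df) / mse

# Share of total variation explained, and Midpoint's share of that.
r_squared = model_ss / (model_ss + resid_ss)
midpoint_share = terms[0][2] / model_ss

print(round(overall_f, 1), round(r_squared, 4), round(midpoint_share, 4))
```

With these inputs the six predictors jointly account for roughly 98% of the variation in Salary, and nearly all of the explained sum of squares comes from the Midpoint row.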
3) Using Compa instead of Salary.
Therefore, if y = Compa, x1 = Midpoint, x2 = Age, x3 = Rating, x4 = Service, x5 = Raise, x6 = Degree:
Coefficients:
(Intercept)          x1          x2          x3          x4          x5          x6
  1.0696151   0.0028372  -0.0004348  -0.0022836  -0.0002492   0.0172612   0.0061451
Analysis of Variance Table
Response: y = Compa
           Df    Sum Sq   Mean Sq  F value     Pr(>F)
x1          1  0.072491  0.072491  15.8338  0.0002613 ***
x2          1  0.003355  0.003355   0.7328  0.3967286
x3          1  0.011233  0.011233   2.4535  0.1245942
x4          1  0.000043  0.000043   0.0093  0.9234868
x5          1  0.004799  0.004799   1.0481  0.3116667
x6          1  0.000418  0.000418   0.0912  0.7640658
Residuals  43  0.196865  0.004578
H0: βi = 0 against H1: βi ≠ 0, for i = 1, ..., 6
At alpha = 0.05, we reject H0: β1 = 0 (Midpoint, p-value = 0.0002613), but for all other coefficients we fail to reject H0: β2, ..., β6 = 0.
Hence Midpoint is the only significant predictor of Compa in this model.
4)
If one considers only the simple regression line between Gender and Compa, one may be tempted to say that there is a slight relationship between Gender and Compa, or even that female employees are paid a little more than their male counterparts.
Coefficients:
(Intercept)  Gender (x)
    1.05624     0.01248
y = 1.05624 + 0.01248x, where y = Compa
If we consider only Gender and Compa, however, the ANOVA indicates that there is no significant relationship between these two variables.
Analysis of Variance Table
Response: Compa
           Df    Sum Sq    Mean Sq  F value  Pr(>F)
Gender      1  0.001947  0.0019469   0.3253  0.5711
Residuals  48  0.287256  0.0059845
Given the null hypothesis H0: β = 0 against H1: β ≠ 0, the p-value of 0.5711 exceeds 0.05, so we fail to reject the null hypothesis. This shows that there is little or no relationship between Gender and pay.
Also, the correlation coefficient between Gender and Compa is 0.08204811, so there seems to be at most a minimal relationship between Gender and pay.
             Compa      Gender
Compa   1.00000000  0.08204811
Gender  0.08204811  1.00000000
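In a regression with a single predictor, the correlation and the ANOVA carry the same information, so the two results above can be cross-checked. The sketch below assumes a sample size of 50 (residual degrees of freedom 48 plus the two estimated parameters) and uses only figures printed above.

```python
r = 0.08204811   # correlation between Gender and Compa
n = 50           # assumed sample size (residual df 48 + 2)

# Sums of squares copied from the Gender ANOVA table.
ss_gender = 0.001947
ss_resid = 0.287256

# R-squared two ways: squared correlation vs. share of the sum of squares.
r_squared = r ** 2
ss_share = ss_gender / (ss_gender + ss_resid)

# F statistic implied by the correlation alone.
f_from_r = r_squared * (n - 2) / (1 - r_squared)

print(round(r_squared, 5), round(ss_share, 5), round(f_from_r, 4))
```

Both routes give an R-squared of under 1%, and the F value recovered from the correlation agrees with the one in the ANOVA table, so the two checks are consistent with each other.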
The best variable for the analysis is Compa, because it gives a lower F value and a lower variance than Salary does.
The most interesting finding from the analysis is that the data show no significant difference in pay between genders.
5)
The single-factor tests and analyses could not provide a complete answer to our salary equality question because they produced results with high variability. This means that salary is a function of more than one variable and needs several variables to explain it.
An outcome such as one’s ability to perform a task effectively may be linked to a number of factors, such as reward, job environment and skills, among others, and can therefore be explained only by multiple regression and not by simpler one-variable models.
Another example is a student’s ability to succeed later in life. This depends not only on the grades obtained at school but on many other factors as well. Such outcomes cannot be explained by simpler one-variable models.