Course Project



Contents

Research Questions & Hypotheses
Introduction & Methodology
    Research Design
    Sample
    Business Understanding
Results
    Descriptive Statistics
    Normality
    Correlation
    Multiple Linear Regression Model-1
    Multiple Linear Regression Model-2
    Multiple Linear Regression Model-3
Discussion
Limitations
Recommendations


Research Questions & Hypotheses

The purpose of this research study is to identify the determinants of Ph.D. completion among
candidates, including GPA and other selector metrics. To fulfill this purpose, the following
research questions have been identified:

1. What are the determinants of Ph.D. completion among successful candidates?

2. Is GPA a proper factor to determine Ph.D. completion among candidates?

Based on these research questions, the following hypotheses have been devised:

H1 (Hypothesis 1): GPA and GRE scores determine Ph.D. completion among candidates.

H2 (Hypothesis 2): Letters of Recommendation and Student Motivation determine Ph.D.
completion among candidates.

H3 (Hypothesis 3): Age and Gender determine Ph.D. completion among candidates.

H4 (Hypothesis 4): Emotional Stability, Financial Resources, Hostility, Social Abilities, and the
mean rating of the Selectors' Impression of Applicant determine Ph.D. completion among candidates.

Introduction & Methodology

Research Design

The methodology of this research study follows a quantitative research design grounded in a
critical realist approach. Because the study examines a real-world sample, a realist approach was
adopted to explain the determinants of Ph.D. completion among candidates. The study is deductive,
since it draws generalizations from the results of the detailed analysis presented later in this
report. As noted, the research strategy is quantitative and descriptive, so that the determinants
or predictors of Ph.D. completion among candidates can be discussed in detail. A cross-sectional
design was chosen over a longitudinal one to avoid discrepancies such as differences in grading
plans.

For data collection, a survey based on a structured questionnaire was distributed to the sample,
and the students completed it over time. The survey covered 18 variables, including gender, age,
GPA, GRE scores, letters of recommendation, motivation, stability, financial funding, marital
status, social skills, hostility, and impression. Ph.D. completion is the dependent variable,
whereas GPA, GRE Scores, Motivation, Stability, Financial Resources, Hostility, Impression,
Letter of Recommendation, and Social Abilities are the independent variables. The rating
variables were measured on a 9-point hedonic scale ranging from extremely low to extremely high;
GPA and GRE scores were recorded on their own scales (see Table 5).

Sample

Since the study concerns Ph.D. completion, 100 Ph.D. candidates from higher-education
institutions were selected for the sample. Each candidate was asked about the completion status
of their degree. The sample was chosen by convenience sampling, which allowed the researcher to
select candidates who were within easy reach and available, reducing time and cost.


Business Understanding

For data analysis, a multiple linear regression model was fitted using SPSS to avoid human error
in the analysis of the results. As mentioned above, Ph.D. completion is the dependent variable,
while GPA, GRE Scores, Gender, Age at Entry, Motivation, Stability, Financial Resources,
Hostility, Impression, Letter of Recommendation, and Social Abilities are the independent
variables.

Results

Descriptive Statistics

A total of 100 Ph.D. candidates took part in this research study. Of these, 64 (64%) were
female and 36 (36%) were male, as displayed in Table 1.

Table 1. Gender

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Female          64      64.0            64.0                 64.0
        Male            36      36.0            36.0                100.0
        Total          100     100.0           100.0

Similarly, most of the candidates (60%) were single, whereas the remaining 40% were married, as
displayed in Table 2.


Table 2. Marital Status

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Married          40      40.0            40.0                 40.0
        Single           60      60.0            60.0                100.0
        Total           100     100.0           100.0

As a Ph.D. is a higher degree, the age brackets start at 20 years. Of the sample, 76% were aged
20 to 30 at entry, followed by 20% aged 31 to 40, and only 4% aged 41 or above. Table 3 displays
the distribution of the sample by age category.

Table 3. Age at Entry

                       Frequency   Percent   Valid Percent   Cumulative Percent
Valid   20-30                 76      76.0            76.0                 76.0
        31-40                 20      20.0            20.0                 96.0
        41 and Above           4       4.0             4.0                100.0
        Total                100     100.0           100.0
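The Percent and Cumulative Percent columns of the frequency tables follow directly from the counts. As a minimal sketch (the function name `frequency_table` is illustrative and not part of the SPSS output), the Table 3 figures can be reproduced as follows:

```python
def frequency_table(counts):
    """Return (category, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.items():
        running += freq
        rows.append((category, freq, 100.0 * freq / total, 100.0 * running / total))
    return rows


# Counts taken from Table 3 (Age at Entry).
age_counts = {"20-30": 76, "31-40": 20, "41 and Above": 4}

for category, freq, pct, cum_pct in frequency_table(age_counts):
    print(f"{category:<14}{freq:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```

The same function applies unchanged to the counts in Tables 1, 2, and 4.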

Table 4. Ph.D. completion status

                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Completed           50      50.0            50.0                 50.0
        Incomplete          50      50.0            50.0                100.0
        Total              100     100.0           100.0


Turning to the main grouping of interest, the sample is divided into Ph.D. candidates who have
completed their degree and those who have not. The two groups are of equal size so that each is
equally represented: 50% of the sample has completed the degree and 50% has not, as displayed in
Table 4.

Normality

Table 5. Descriptive Statistics

Variable                            N    Minimum  Maximum  Mean      Std. Deviation  Skewness  Std. Error  Kurtosis  Std. Error
Letter of Recommendation-1          100  4        9        6.94      1.324           -.101     .241        -.844     .478
Letter of Recommendation-2          100  4        9        7.00      1.303           -.168     .241        -.768     .478
Letter of Recommendation-3          100  4        9        7.06      1.324           -.219     .241        -.783     .478
Student Motivation                  100  6        9        7.82      .957            -.334     .241        -.848     .478
Emotional Stability                 100  4        9        6.38      1.668           .121      .241        -1.211    .478
Financial Resources                 100  3        9        5.78      1.685           .044      .241        -.754     .478
Interpersonal Skills                100  4        9        6.58      1.365           -.123     .241        -.639     .478
Hostility                           100  1        5        2.60      1.044           .217      .241        -.422     .478
Selectors Impression of Applicant   100  5        9        7.08      1.203           -.227     .241        -.970     .478
College GPA                         100  2.75     3.97     3.5130    .26591          -.567     .241        .278      .478
Major GPA                           100  3.20     4.00     3.7778    .19812          -.811     .241        .222      .478
GRE Specialty                       100  520      790      652.20    70.504          -.041     .241        -.889     .478
GRE Quantitative                    100  550      787      688.49    63.906          -.343     .241        -.899     .478
GRE Verbal                          100  470      780      631.80    71.596          -.051     .241        -.652     .478
Valid N (listwise)                  100


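Since the skewness values all lie within ±1 and the kurtosis values lie within ±1 for every
variable except Emotional Stability (kurtosis -1.211), the data are treated as approximately
normal, as displayed above in Table 5.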
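The ±1 rule of thumb can be checked mechanically against the skewness and kurtosis statistics in Table 5. The sketch below is only an illustration of that rule; the names `TABLE_5` and `outside_range` are assumptions introduced here, and the values are copied from the table.

```python
# (skewness, kurtosis) statistics copied from Table 5.
TABLE_5 = {
    "Letter of Recommendation-1": (-0.101, -0.844),
    "Letter of Recommendation-2": (-0.168, -0.768),
    "Letter of Recommendation-3": (-0.219, -0.783),
    "Student Motivation": (-0.334, -0.848),
    "Emotional Stability": (0.121, -1.211),
    "Financial Resources": (0.044, -0.754),
    "Interpersonal Skills": (-0.123, -0.639),
    "Hostility": (0.217, -0.422),
    "Selectors Impression of Applicant": (-0.227, -0.970),
    "College GPA": (-0.567, 0.278),
    "Major GPA": (-0.811, 0.222),
    "GRE Specialty": (-0.041, -0.889),
    "GRE Quantitative": (-0.343, -0.899),
    "GRE Verbal": (-0.051, -0.652),
}


def outside_range(stats, limit=1.0):
    """Return the variables whose skewness or kurtosis lies outside +/- limit."""
    return [
        name
        for name, (skewness, kurtosis) in stats.items()
        if abs(skewness) > limit or abs(kurtosis) > limit
    ]


print(outside_range(TABLE_5))
```

Any variable the function returns deserves a closer look, for example with a histogram, before it is treated as normally distributed.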

Correlation

Table 6 shows that Ph.D. completion correlates significantly with the GRE Quantitative score
(-0.608), Letter of Recommendation-1 (-0.592), Letter of Recommendation-3 (-0.683), and Student
Motivation (-0.567). These variables are highly correlated with Ph.D. completion and show a
strong negative relationship. College GPA, Letter of Recommendation-2, and Age at Entry are
moderately correlated with Ph.D. completion and show a moderate negative relationship, with
correlation coefficients of -0.433, -0.494, and -0.377 respectively. These correlations suggest
that these factors help determine whether a candidate completes the Ph.D.; however, no
significant relationship is reported with the Selectors' Impression of Applicant.

The correlation table also shows that several independent variables are highly correlated with
each other, with strong positive relationships. Such high correlation between independent
variables leads to a multicollinearity issue. College GPA and Major GPA correlate at 0.901, and
GRE Specialty and GRE Verbal at 0.984. GRE Quantitative correlates with Letter of
Recommendation-1 (0.699), Letter of Recommendation-3 (0.646), and Student Motivation (0.591).
Student Motivation also correlates with Letter of Recommendation-1 (0.501) and Letter of
Recommendation-2 (0.567), and Letter of Recommendation-1 correlates with Letter of
Recommendation-3 (0.520).
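Screening for multicollinearity amounts to flagging predictor pairs whose absolute correlation exceeds a chosen cut-off. The sketch below uses the coefficients quoted in this section; the 0.8 default cut-off is an assumed rule of thumb rather than a value taken from this study, and the names are illustrative.

```python
# Correlations between independent variables, as quoted from Table 6.
REPORTED_PAIRS = [
    ("College GPA", "Major GPA", 0.901),
    ("GRE Specialty", "GRE Verbal", 0.984),
    ("GRE Quantitative", "Letter of Recommendation-1", 0.699),
    ("GRE Quantitative", "Letter of Recommendation-3", 0.646),
    ("GRE Quantitative", "Student Motivation", 0.591),
    ("Letter of Recommendation-1", "Student Motivation", 0.501),
    ("Letter of Recommendation-2", "Student Motivation", 0.567),
    ("Letter of Recommendation-1", "Letter of Recommendation-3", 0.520),
]


def collinear_pairs(pairs, threshold=0.8):
    """Return the (variable, variable) pairs with |r| at or above the threshold."""
    return [(a, b) for a, b, r in pairs if abs(r) >= threshold]


for a, b in collinear_pairs(REPORTED_PAIRS):
    print(f"{a} / {b}")
```

Lowering the threshold to 0.5 flags every pair listed in this section, which corresponds to the broader reading of "highly correlated" used in the text.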


Table 6. Correlation

Multiple Linear Regression Model-1

From the Table 7 model summary, R = 0.850, which is the correlation between the values predicted
by the overall model and the dependent variable, Ph.D. completion. R square = 0.722, also known
as the coefficient of determination, indicates that 72.2% of the variance in Ph.D. completion can
be predicted from the independent variables. Both values are high, which suggests that the model
may be a good fit.


From the ANOVA (Analysis of Variance) table, the significance value of the overall model is
p < 0.001, which is less than alpha = 0.05, so the model is statistically significant.

Table 7. Model 1


With 100 observations, the degrees of freedom are df = number of observations - 1 = 99. At the
95% confidence level (alpha = 0.05, two-tailed) with df = 99, the critical t-value is ±1.984.
Strictly, the t-test for a regression coefficient uses the residual degrees of freedom
(observations minus the number of estimated coefficients), but at this sample size the critical
value is nearly the same. The t-values and p-values of the individual independent variables tell
us which of them contribute to the model: a variable is statistically significant when its
calculated t-value falls outside the interval between the critical t-values and its p-value is
below 0.05. This indicates which independent variables should be retained in the model.
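The retention rule described above can be written as a small helper. This is a sketch of the decision rule only; the function name `is_significant` is an assumption, the critical value and alpha are the ones quoted in the text, and the example t- and p-values are those reported later in the report for Model-2 and Model-3.

```python
T_CRITICAL = 1.984  # two-tailed critical t-value quoted in the text for df = 99
ALPHA = 0.05        # significance level used throughout the report


def is_significant(t_value, p_value, t_critical=T_CRITICAL, alpha=ALPHA):
    """A predictor is retained when |t| exceeds the critical value and p < alpha."""
    return abs(t_value) > t_critical and p_value < alpha


# Values as reported in the report's coefficient tables.
print(is_significant(-2.609, 0.011))  # Age at Entry (Model-3)
print(is_significant(-3.291, 0.001))  # Student Motivation (Model-3)
print(is_significant(-1.732, 0.86))   # Financial Resources (Model-2)
```

Requiring both conditions mirrors the text; in practice the two tests agree whenever the p-value is computed from the same t-value and degrees of freedom.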

Applying these criteria, the independent variables College GPA, Major GPA, GRE Specialty, GRE
Verbal, GRE Quantitative, Letter of Recommendation-1, Letter of Recommendation-2, Emotional
Stability, Marital Status, Interpersonal Skills, Hostility, and Selectors' Impression of
Applicant have p-values greater than 0.05 and calculated t-values inside the interval of ±1.98,
the critical t-value. These variables are therefore not statistically significant and may overfit
the model. They need to be dropped and the regression rerun without the variables that fail the
statistical significance criterion.


From the histogram and the normal P-P plot, we can infer that the residuals (errors) are not
normally distributed: the residuals do not fall along the straight line in the normal P-P plot,
and their distribution in the histogram does not look normal either. This violates the
assumption of normally distributed errors.

Homoscedasticity, Independent Errors, Linearity, and Normally Distributed Errors Assumptions:

The regression plot shown above indicates the linearity of the multiple regression, with the
standardized predicted values on the x-axis and the standardized residuals on the y-axis, both
centered around zero (0). The dots form a pattern rather than being randomly scattered, which
shows that successive residuals are correlated and that the errors are not normally distributed,
so the linearity, independent errors, and normally distributed errors assumptions are violated.
Because the dots are not randomly scattered, the variance of the residuals may not be constant,
which would violate the homoscedasticity assumption.
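The independent errors assumption is judged visually in this report. A common numerical complement, which the original analysis does not report, is the Durbin-Watson statistic: values near 2 suggest no first-order correlation between successive residuals, while values well below 2 suggest positive correlation. The sketch below assumes the residuals are available as a list in observation order; the example residuals are hypothetical, not taken from the study.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residuals that drift slowly from positive to negative.
example_residuals = [0.5, 0.4, 0.3, -0.2, -0.4, -0.6]
print(round(durbin_watson(example_residuals), 2))
```

A slowly drifting sequence like the one above gives a value far below 2, the numerical counterpart of the patterned scatter described in the text.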


Multiple Linear Regression Model-2

The following linear regression model is obtained by dropping from regression Model-1 the
independent variables that were not significant according to their p-values and t-values.

Table 8 Correlations

From the above correlation table, we can infer that the independent variables are only very
weakly related to each other, indicating no multicollinearity issue. Letter of Recommendation-3
and Student Motivation are highly correlated with Ph.D. completion and show a strong negative
relationship.


Linear Regression:

From the Table 8 model summary, the correlation coefficient R = 0.786 and the coefficient of
determination R square = 0.618. Both are lower than in regression Model-1, yet the model is
still a good fit, as 61.8% of the variance in Ph.D. completion can be predicted from the
independent variables. The significance value is unchanged, so the overall model remains
statistically significant, rejecting the null hypothesis with p < 0.001 in the ANOVA table.

Table 8 Model 2.


The critical t-value remains the same as obtained earlier (i.e., ±1.98), since df = 99. The same
comparison of each independent variable's calculated t-value against the critical t-value,
together with its p-value, is applied to check which independent variables need to be dropped.

On that basis, the Financial Resources independent variable needs to be dropped. Its t-value of
-1.732 falls inside the interval of the critical t-value, and its p-value of 0.86 is greater
than 0.05, indicating that the variable is not statistically significant.


From the histogram and the normal P-P plot of the residuals, we can see that the residuals are
normally distributed in the histogram and lie along the straight line in the P-P plot, which is
an improvement after dropping the variables. Hence, we may assume that the normally distributed
errors assumption holds.

The scatter plot of the standardized predicted values against the residuals is an improvement
over the Model-1 plot. It still needs to be improved, however: the residuals are not randomly
scattered, which violates the homoscedasticity, linearity, and independent errors assumptions.

Multiple Linear Regression Model-3

We now consider a third regression model, obtained by dropping the Financial Resources
independent variable because it was not statistically significant.

Correlation

Table 9 Correlations

From the above correlation table, we can infer that Letter of Recommendation-3 and Student
Motivation are highly correlated with Ph.D. completion and show a strong negative relationship,
and that there is no multicollinearity issue.


Linear Regression Model-3:

From the Table 9 model summary, the correlation coefficient R = 0.778 and the coefficient of
determination R square = 0.606, so the model is a good fit: 60.6% of the variance in Ph.D.
completion can be predicted from the independent variables. The significance value is unchanged,
so the overall model remains statistically significant, rejecting the null hypothesis with
p < 0.001 in the ANOVA table. The F-value also compares well with those of Model-1 and Model-2.
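The overall F-value in the ANOVA table is tied to R square by F = (R² / k) / ((1 - R²) / (n - k - 1)), where n is the number of observations and k the number of predictors. The sketch below applies this identity to Model-3, assuming n = 100 and k = 4 (the four predictors listed in the Model-3 coefficients). Because R square is rounded to three decimals, the result only approximates the value SPSS would print.

```python
def f_statistic(r_squared, n_obs, n_predictors):
    """Overall regression F-value implied by R square, n observations, k predictors."""
    df_model = n_predictors
    df_residual = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_residual)


# Model-3: R square = 0.606 with an assumed n = 100 and k = 4.
f_model_3 = f_statistic(0.606, n_obs=100, n_predictors=4)
print(round(f_model_3, 2))
```

The identity also shows why dropping non-significant predictors can raise F even as R square falls slightly: fewer model degrees of freedom share the explained variance.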

Table 9 Model-3


From the coefficients table:

Gender: with a coefficient of 0.203, t-value = 2.923, and p = 0.04 < 0.05, this is a
statistically significant independent variable.

Letter of Recommendation-3: with a coefficient of -0.194, t-value = -6.926, and
p = 0.001 < 0.05, this is a statistically significant independent variable.

Student Motivation: with a coefficient of -0.130, t-value = -3.291, and p = 0.001 < 0.05, this
is a statistically significant independent variable.

Age at Entry: with a coefficient of -0.169, t-value = -2.609, and p = 0.011 < 0.05, this is a
statistically significant independent variable.

Since all four independent variables are statistically significant, we can now write the
regression equation to represent which factors influence Ph.D. completion.

PhD = 3.827 + 0.203*Gender - 0.194*Letter of Recommendation-3 - 0.130*Student Motivation -
0.169*Age at Entry

From the regression equation, holding the other variables constant, we can interpret that:

Gender: a one-unit increase in the Gender code is associated with an increase of 0.203 in the
predicted Ph.D. completion value.

Letter of Recommendation-3: a one-point increase in the rating is associated with a decrease of
0.194 in the predicted Ph.D. completion value.

Student Motivation: a one-point increase in the rating is associated with a decrease of 0.130 in
the predicted Ph.D. completion value.

Age at Entry: a one-category increase in age at entry is associated with a decrease of 0.169 in
the predicted Ph.D. completion value.
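To illustrate how the Model-3 equation produces a predicted value, the sketch below uses the intercept and the coefficients listed in the coefficients table (0.203 for Gender, -0.194 for Letter of Recommendation-3, -0.130 for Student Motivation, -0.169 for Age at Entry). The report does not state the numeric codes used for Gender, Age at Entry, or Ph.D. completion, so the example inputs are hypothetical and the output should not be read as a result of the study.

```python
INTERCEPT = 3.827
COEFFICIENTS = {
    "gender": 0.203,
    "letter_of_recommendation_3": -0.194,
    "student_motivation": -0.130,
    "age_at_entry": -0.169,
}


def predict_phd(values):
    """Predicted Ph.D. completion value from the Model-3 regression equation."""
    return INTERCEPT + sum(
        coefficient * values[name] for name, coefficient in COEFFICIENTS.items()
    )


# Hypothetical candidate; the category codes are assumed for illustration only.
example = {
    "gender": 1,
    "letter_of_recommendation_3": 7,
    "student_motivation": 8,
    "age_at_entry": 1,
}
print(round(predict_phd(example), 3))
```

Raising `student_motivation` by one point in this example lowers the prediction by 0.130, which is the per-unit interpretation of the coefficient.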


Compared to the Model-2 residual graphs, the Model-3 residual plots are far better. From the
histogram and the normal P-P plot of the residuals, we can see that the residuals are normally
distributed in the histogram and lie along the straight line in the P-P plot, which is an
improvement after dropping the variables. Hence, we may assume that the normally distributed
errors assumption holds.

The scatter plot of the standardized predicted values against the residuals is an improvement
over the Model-2 plot. It still needs to be improved, however, because a few dots remain
clustered close to each other. Since the residuals are not randomly scattered, the
homoscedasticity, linearity, and independent errors assumptions are violated.


Discussion

Based on the results of the regression analysis, Student Motivation, Letter of Recommendation-3,
Gender, and Age at Entry can serve as determinants or predictors of Ph.D. completion among
candidates. Letter of Recommendation-3 and Student Motivation are strongly correlated with Ph.D.
completion, showing strong negative relationships with correlation coefficients of -0.683 and
-0.567 respectively. Student Motivation can be regarded as a particularly strong predictor
because it is an individual's own willpower that makes them capable of reaching greater heights
and staying on the path towards the Ph.D. degree. Letters of recommendation, in turn, provide
information about the caliber and capabilities of the individual and are therefore a strong
predictor of whether the candidate can complete the Ph.D. degree in time. Age at Entry and
Gender are moderately correlated with Ph.D. completion: age shows a negative correlation of
-0.377, and gender a positive correlation of 0.250. All four independent variables (Student
Motivation, Letter of Recommendation-3, Gender, and Age at Entry) are individually statistically
significant. Of the four hypotheses derived from the research questions, H2 and H3 are supported
by the regression analysis of Model-3.

Limitations

In terms of limitations, only a small sample was considered to identify whether GPA, GRE scores,
and the interpersonal abilities of an individual are significant predictors of Ph.D. completion
among candidates. Such a small sample cannot be truly representative of an entire country and is
at best comparable to a small locality. Moreover, only 100 candidates were selected within the
locality, which is too few to be truly representative of a large country.


Recommendations

The sample can be further separated into two groups, i.e., candidates who completed the Ph.D.
and candidates who did not. An independent-samples t-test on the two groups could then be used
to examine determinants by testing for statistically significant differences between them.
Moreover, additional predictors could be examined in future work.
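As a sketch of the recommended comparison, the function below computes the independent-samples t statistic with a pooled variance, which assumes equal variances in the two groups. The function name and the two lists of motivation ratings are hypothetical and serve only to show the calculation; they are not data from this study.

```python
import math


def pooled_t_test(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sums of squared deviations from each group's own mean.
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_variance = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical motivation ratings for the two groups.
completed = [8, 9, 7, 8]
not_completed = [6, 7, 5, 6]

t_value, df = pooled_t_test(completed, not_completed)
print(round(t_value, 3), df)
```

The resulting t-value would be compared against the critical value for the returned degrees of freedom, in the same way as the coefficient t-values in the regression models.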