STATA Assignment
Economists generally agree that an individual’s earnings are positively related to the
knowledge and skills obtained through education and experience. Numerous studies show that,
on average, people with more years of education do earn higher wages. Although some
economists believe that education increases a worker’s productivity and thus differences in
wages to a large degree “reflect the ‘skill premium’ commanded by relatively higher-educated,
better-trained workers” (Henderson 1), others argue that differences in workers’ individual
characteristics, such as talent, personal skills and motivation, are the driving forces behind
wage inequality. Supporters of the screening (signaling) theory argue that education allows
employers “to screen individuals by innate ability” (Hoff and Stiglitz 405). This claim rests
on two arguments: 1) an individual’s innate abilities are known to him/her but unobserved by
potential employers; and 2) education is less costly for individuals with higher abilities (in
terms of the time and effort required for studying) and, consequently, people with high abilities tend
to obtain more education to “signal” their innate talents to employers. Because education is
correlated with unobserved ability, and innate ability is difficult to measure, it is hard to
distinguish between the two effects. As some econometricians believe, this leads to an
overstatement of the returns to education, the so-called “ability bias”.
The estimation of returns on education has important policy implications. If education
raises a person’s earnings regardless of his/her abilities, then policies that improve the quality of
education and provide financial assistance for those willing and able to study should be
implemented. If, however, different abilities are the underlying reasons for earnings differentials,
as the screening theory claims, then such policies would not be as beneficial.
The goal of my paper is to estimate the effect that education has on wages and to
determine how that effect changes when ability is controlled for.
For my regression analysis I used the data for 935 men surveyed in 1980. The data set
contains information on monthly earnings, education, experience, some demographic
characteristics, job location and average weekly hours.
After omitting observations with missing values, I had 663 valid observations for male
workers aged 28-38. Among them, about 7.5% completed fewer than 12 years of schooling (do not
have a high school diploma) and earn average monthly wages of $760.20; about 40.70% are high
school graduates with average monthly wages of $889.04; about 21.27% have associate degrees
or some certificates and earn average monthly wages of $1027.49; 18.25% are college graduates
with average monthly wages of $1104.13; and about 12.22% have postgraduate degrees or
certificates and earn average monthly wages of $1220.15. About 90% of the men are married, 8.14%
are black and about 28% live in rural areas. The average wage in the sample is $988.48 per
month.
Since there are no data available for workers’ occupations and wages vary significantly
among different professions, I used the natural logarithm of monthly wages (lwage in the
data set) as the dependent variable and chose a log-linear functional form for my model. The
log-linear functional form allows the effects of the explanatory variables on wages to be
interpreted in percentage terms and captures some curvature in the relationships between wages
and the explanatory variables.
The key explanatory variables in my base model are years of schooling and years of
experience (educ and exper). Average schooling in the sample is approximately 13.68 years and
average experience is 11.40 years. There are, however, other factors
(besides education and experience) that affect wages. I believe that skills related to a current
occupation will have a positive effect on earnings, so the variable tenure (measured in years) is
also included in the model. Some workers might suffer from racial discrimination in the labor
market. To control for this possibility, I include a racial dummy variable, black. In part due to
increased motivation to seek higher-paid jobs and relief from household responsibilities,
married workers often have higher wages. Thus a dummy variable married is included in the
model. Because of lower population density, a lower cost of living and the prevalence of low-skilled jobs,
workers in rural and southern areas of the country tend to earn less. To account for this effect, I
include the dummy variables urban and south. In addition, family background affects an
individual’s professional and academic aspirations as well as personal characteristics that
might be valued by employers. Therefore I include the variables birth order (brthord), number of
siblings (sibs) and parental education (Peduc). Firstborn children often babysit younger
siblings and tend to develop a sense of responsibility and independence in decision making.
Children in large families learn to carry out specific responsibilities as well as to work as team
members. These personal characteristics might be valued in the labor market. Since I am not
interested in whether a father’s or a mother’s education has a greater effect on earnings, but believe
that the presence of a parent with a higher level of education is likely to encourage a child to
obtain more education, I derive the variable parental education by taking the maximum years
of schooling of the two parents (=MAX(Q2:R2)).
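The spreadsheet formula above can be mirrored in code. The following is a minimal Python sketch, assuming hypothetical field names meduc and feduc for the mother's and father's years of schooling (the paper does not name these columns) and made-up values rather than observations from the sample:

```python
def parental_education(meduc, feduc):
    """Return the larger of the two parents' years of schooling (Peduc)."""
    return max(meduc, feduc)

# Illustrative records only; these are NOT observations from the sample.
# The keys "meduc" and "feduc" are assumed names for the two source columns.
records = [
    {"meduc": 12, "feduc": 16},
    {"meduc": 14, "feduc": 10},
    {"meduc": 12, "feduc": 12},
]

peduc = [parental_education(r["meduc"], r["feduc"]) for r in records]
print(peduc)  # [16, 14, 12]
```

Taking the maximum keeps one regressor instead of two and sidesteps the question of which parent matters more, at the cost of discarding the other parent's schooling.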
The last two variables are IQ and KWW. While the IQ score is widely used as a measure
of general intelligence, the lesser-known KWW (Knowledge of the World of Work) score is used
as a measure of an individual’s knowledge of various occupations and related duties (Regan,
Oaxaca and Burghardt 42). Although it is questionable whether IQ and KWW scores reflect
abilities that are rewarded in the labor market, the descriptive statistics show that sampled
workers with IQ scores above 100 (the average) earn, on average, higher wages and attain more
education than workers with average and below-average IQ scores ($1081.27 and 14.63 years vs.
$866.91 and 12.42 years). In order to detect a possible bias, I do not include these two variables
in my first regression, but add them later as proxy variables for ability. Definitions and
descriptive statistics of the variables are presented in Table 1. The pairwise correlations
reported in Table 2 do not indicate severe multicollinearity: the largest, between sibs and
brthord, is 0.58.
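As a rough illustration of this screening step, the sketch below scans a handful of the larger correlations copied from Table 2 and reports the strongest one. The 0.8 cutoff is a common rule of thumb and an assumption here, not something the paper states, and the variance inflation factor shown is the two-regressor special case only.

```python
# Selected pairwise correlations copied from Table 2.
correlations = {
    ("sibs", "brthord"): 0.5783,
    ("educ", "IQ"): 0.5435,
    ("educ", "exper"): -0.4508,
    ("educ", "Peduc"): 0.4217,
    ("IQ", "KWW"): 0.4103,
}

THRESHOLD = 0.8  # rule-of-thumb cutoff; an assumption, not taken from the paper

# Find the pair with the largest absolute correlation.
pair, r = max(correlations.items(), key=lambda item: abs(item[1]))

# With only two regressors, the variance inflation factor is 1 / (1 - r^2).
two_variable_vif = 1.0 / (1.0 - r ** 2)

print(pair, r, abs(r) < THRESHOLD)
print(round(two_variable_vif, 2))
```

Pairwise correlations can miss collinearity that involves three or more regressors, so a full check would compute a variance inflation factor from an auxiliary regression for each variable.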
I do not include the variables hours (average weekly hours) and age (worker’s age) in my
regression analysis. Although it might be argued that some individuals work longer hours and
thus earn more, the relationship between hours worked and wages depends on a worker’s pay basis.
While hourly-paid workers may increase their wages by working more hours, salaried workers
often may work more or fewer hours with very little or no effect on their earnings. Unfortunately,
there is no information on workers’ pay bases or occupations (from which I could probably
infer whether a worker is paid hourly or per week). Thus, to avoid a misrepresentation, I do not
include the variable hours in my model. Though a worker’s age might increase the likelihood of
an unlawful layoff (for older workers), it is unlikely to have a direct effect on his/her wages. Age
might be used as a proxy for experience. In this regression analysis, however, such a proxy is
unnecessary, because the variable years of experience is available in the data set. Therefore, the
variable age would not provide any additional information.
My base regression model for estimating the rate of return on education takes the
following form:
$$\ln(Wage_i)=\beta_0+\beta_1 Educ_i+\beta_2 Exper_i+\beta_3 Tenure_i+\beta_4 Married_i+\beta_5 Black_i+\beta_6 South_i+\beta_7 Urban_i+\beta_8 Sibs_i+\beta_9 Brthord_i+\beta_{10} Peduc_i+\varepsilon_i \tag{1}$$
Table 3 shows the results of estimating this equation. All the variables, except birth order
and the number of siblings, are significant at the 5 percent level. Although it is tempting to
conclude that these two variables have no effect on earnings, I refrain from such a conclusion until all
the variables are included in the regression. All estimated coefficients take the expected signs.
Holding other variables constant, the estimated rate of return to each additional year of
education is 6.12%.
The results of the regression show that the model explains 24.89% of the variance in the log of
monthly wages in the sample (adjusted R2 = 0.24892).
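The adjusted R2 can be recomputed from the other figures in Table 3. The sketch below uses the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1); the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Inputs from Table 3: R Square, Observations, and the 10 regressors of equation (1).
adj_r2 = adjusted_r_squared(0.260269487, 663, 10)
print(round(adj_r2, 4))  # 0.2489
```

Because the adjustment penalizes each added regressor, it is the adjusted figure, not the raw R2, that is compared across the models that follow.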
My next step is to add the first proxy for ability, IQ, to the model:
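$$\ln(Wage_i)=\beta_0+\beta_1 Educ_i+\beta_2 Exper_i+\beta_3 Tenure_i+\beta_4 Married_i+\beta_5 Black_i+\beta_6 South_i+\beta_7 Urban_i+\beta_8 Sibs_i+\beta_9 Brthord_i+\beta_{10} Peduc_i+\beta_{11} IQ_i+\varepsilon_i \tag{2}$$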
The results of this regression are presented in Table 4. The estimate for IQ is positive and
statistically significant (p-value 0.0032). At this stage, ability appears to affect wages.
The increased value of the adjusted R2 (25.78%) and the decreased value of the sum of squared
residuals (82.10) indicate that the addition of the variable IQ improved the fit of the model.
Although most coefficients did not change much, the rate of return to education dropped
from 6.12% to 5.05%, confirming that the return to education is influenced by the worker’s IQ.
Another, rather surprising, result is a large change in the coefficient for race: the estimate
for the dummy variable black shrank in magnitude from -15.65% to -11.74%.
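The drop in the schooling coefficient can be tied to a standard OLS identity: the educ coefficient from the regression without IQ equals the educ coefficient from the regression with IQ plus the IQ coefficient times delta, where delta is the coefficient on educ in an auxiliary regression of IQ on all regressors of equation (1). The paper does not report that auxiliary regression, so the sketch below only backs out the delta implied by Tables 3 and 4; the variable names are illustrative.

```python
# Coefficients on educ without and with IQ (Tables 3 and 4).
b_educ_without_iq = 0.061161295
b_educ_with_iq = 0.050451094
b_iq = 0.003533193  # coefficient on IQ in Table 4

# Omitted-variable identity: b_without = b_with + b_iq * delta
implied_delta = (b_educ_without_iq - b_educ_with_iq) / b_iq

# Share of the original estimated return that disappears once IQ is added.
relative_drop = (b_educ_without_iq - b_educ_with_iq) / b_educ_without_iq

print(round(implied_delta, 2))        # IQ points per extra year of schooling
print(round(100 * relative_drop, 1))  # percent fall in the estimated return
```

Under this reading, an extra year of schooling is associated with roughly three additional IQ points among otherwise similar workers, and about 17.5% of the initially estimated return is attributable to IQ.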
Moreover, the addition of IQ affected the statistical significance of other variables. The
variables south and parental education became insignificant at the 5% level (the t-statistics are
-1.69 and 1.95 respectively).
To see which of the variables representing ability has a greater impact on returns on
education, I replace the variable IQ with KWW:
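$$\ln(Wage_i)=\beta_0+\beta_1 Educ_i+\beta_2 Exper_i+\beta_3 Tenure_i+\beta_4 Married_i+\beta_5 Black_i+\beta_6 South_i+\beta_7 Urban_i+\beta_8 Sibs_i+\beta_9 Brthord_i+\beta_{10} Peduc_i+\beta_{12} KWW_i+\varepsilon_i \tag{3}$$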
The results of this regression are reported in Table 5. The impact of the variable KWW
on the regression as a whole seems to be almost identical to the impact of the variable IQ. The
values of the adjusted R2 and the sum of squared residuals hardly changed (25.54% and 82.37
respectively). The variable KWW seems to have a somewhat smaller impact on the rate of
return on education: the coefficient is slightly higher with KWW in the regression (5.32%) than
with IQ (5.05%), but the difference is small. Nevertheless, unlike the variable IQ, the variable
KWW affected the returns on experience and tenure. These coefficients changed slightly, perhaps due
to the nature of the test: workers with more years of experience in the labor market would be
expected to score higher on it.
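Finally, I ran the regression with both proxies for ability:
$$\ln(Wage_i)=\beta_0+\beta_1 Educ_i+\beta_2 Exper_i+\beta_3 Tenure_i+\beta_4 Married_i+\beta_5 Black_i+\beta_6 South_i+\beta_7 Urban_i+\beta_8 Sibs_i+\beta_9 Brthord_i+\beta_{10} Peduc_i+\beta_{11} IQ_i+\beta_{12} KWW_i+\varepsilon_i \tag{4}$$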
The results of this regression can be found in Table 6. The fit of the regression
improved: the adjusted R2 increased to 26.15% and the sum of squared residuals decreased to
81.56. Both proxies for ability are statistically significant at the 5% level. Although the return on
years of schooling decreased even more and is now 4.53%, indicating that the previously
estimated returns to education were biased upward, the effect of education on earnings remains
significant. Before proceeding further, I conduct the global F-test to check the overall adequacy
of the model for predicting wages.
H0: β1=β2=β3=β4=β5=β6=β7=β8=β9=β10=β11=β12=0
HA: At least one of the coefficients is not equal to 0.
r=12 d.f=650 α=0.05
Test Statistics:
F=20.54
Rejection region:
F> 1.7671
Since the calculated F value is greater than the critical F-value, I reject the null
hypothesis at the 5% level of significance. The test indicates that the model is adequate for
explaining variability of wages.
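The global F statistic can be reproduced from the ANOVA block of Table 6. A minimal sketch, with an illustrative function name and the critical value taken from the text rather than computed:

```python
def overall_f(ss_regression, ss_residual, n_regressors, df_residual):
    """Overall F = (regression SS / k) / (residual SS / (n - k - 1))."""
    return (ss_regression / n_regressors) / (ss_residual / df_residual)

# Sums of squares and degrees of freedom from the ANOVA block of Table 6.
f_stat = overall_f(30.92426029, 81.55891372, 12, 650)
critical_value = 1.7671  # 5% critical value quoted in the text

print(round(f_stat, 2), f_stat > critical_value)  # 20.54 True
```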
Now that the usefulness of the model has been verified, I perform the t-test to determine
whether education has a positive effect on wages:
H0: β1=0
HA: β1>0
d.f.=650 α=0.05 t0.05=1.647202
Test statistic:
$$t=\frac{0.045330337-0}{0.00874542}=5.183323294$$
Rejection region: t≥1.647202
Since the calculated t value falls in the rejection region, I reject the null hypothesis at the
5 percent level of significance. The test confirms that education has a positive statistically
significant effect on wages.
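The same one-sided test can be written out in a few lines. The coefficient and standard error come from Table 6, the critical value is the one quoted in the text, and the function name is illustrative:

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

# educ coefficient and standard error from Table 6; H0: beta_1 = 0.
t_stat = t_statistic(0.045330337, 0.0, 0.00874542)
critical_value = 1.647202  # one-sided 5% critical value quoted in the text

reject_null = t_stat >= critical_value
print(round(t_stat, 4), reject_null)  # 5.1833 True
```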
According to the results of this regression, the estimated increase in wages with an
additional year of experience is 1.48%, and an additional year with the current employer raises
a worker’s earnings by approximately 0.8%. The regression results also confirm the existence of a
“marriage premium”: married workers earn approximately 19.76% more. Male workers in urban
areas earn 19.26% more than male workers in rural areas with the same characteristics, i.e.,
education, experience, tenure, marital status, race and family characteristics.
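The percentages above read each coefficient times 100 as a percentage change, which is an approximation in a log-linear model. For larger coefficients, such as those on the dummy variables, the exact implied change is 100·(exp(β) - 1). The sketch below applies that conversion to three Table 6 coefficients; the function name is illustrative.

```python
import math

def exact_percent_change(coefficient):
    """Exact percentage change in wage implied by a log-wage coefficient."""
    return 100.0 * (math.exp(coefficient) - 1.0)

# Coefficients from Table 6.
table6 = {"married": 0.197616539, "urban": 0.192554928, "educ": 0.045330337}

for name, beta in table6.items():
    approx = round(100 * beta, 2)                  # the reading used in the text
    exact = round(exact_percent_change(beta), 2)   # exact log-linear conversion
    print(name, approx, exact)
```

The gap is negligible for educ but amounts to about two percentage points for married and urban, so the dummy-variable premiums quoted above are slightly understated.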
To determine whether a worker’s ability affects his/her wages, I perform an F-test for the joint
significance of the variables IQ and KWW:
H0: β11=β12=0
HA: At least one of the coefficients is not equal to 0.
r=2 d.f.=650 α=0.01
Test statistic:
$$F=\frac{(83.207236-81.558914)/2}{81.558914/650}=6.57$$
Rejection region:
F>4.638
Since the calculated F value is greater than the critical F-value, I reject the null
hypothesis at the 1% level of significance. The test indicates that the variables IQ and KWW
improve the usefulness of the model in predicting wages. The test seems to support the argument
of the screening theory that workers’ abilities affect their earnings: workers with an IQ score of 110
(high average), for instance, earn on average 3.07% more than otherwise similar workers with an
IQ score of 100 (10 IQ points at roughly 0.307% per point).
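The joint test compares the sum of squared residuals of the restricted model (Table 3, without IQ and KWW) with that of the unrestricted model (Table 6). A minimal sketch of that comparison, with an illustrative function name and the critical value taken from the text:

```python
def partial_f(ssr_restricted, ssr_unrestricted, n_restrictions, df_unrestricted):
    """F statistic for testing n_restrictions exclusion restrictions."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator

# Residual sums of squares from Table 3 (restricted) and Table 6 (unrestricted).
f_ability = partial_f(83.207236, 81.558914, 2, 650)
critical_value = 4.638  # 1% critical value quoted in the text

print(round(f_ability, 2), f_ability > critical_value)  # 6.57 True
```

The same function applies to any set of exclusion restrictions, provided the restricted model is nested in the unrestricted one and both are estimated on the same 663 observations.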
The presence of the variables IQ and KWW affected not only the returns on education,
experience and tenure but other variables as well. The estimated difference in wages due to racial
discrimination is now 10.41% (5.24 percentage points below the estimate from the initial regression). In addition,
parental education appears to have a statistically insignificant effect on wages when controlled for
ability.
In order to decide whether a worker’s family background has any impact on his/her
earnings, I use the F-test. The results of estimating the restricted model are shown in Table 7.
$$\ln(Wage_i)=\beta_0+\beta_1 Educ_i+\beta_2 Exper_i+\beta_3 Tenure_i+\beta_4 Married_i+\beta_5 Black_i+\beta_6 South_i+\beta_7 Urban_i+\beta_{11} IQ_i+\beta_{12} KWW_i+\varepsilon_i \tag{5}$$
H0: β8=β9=β10=0
HA: At least one of the coefficients is not equal to 0.
r=3 d.f.=650 α=0.05
Test statistic:
$$F=\frac{(82.365031-81.558914)/3}{81.558914/650}=2.1415$$
Rejection region:
F>2.619
Since the computed value of F does not fall in the rejection region, I fail to reject the null
hypothesis at the 5% level of significance. The test result is somewhat unexpected and contradicts
my hypothesis that an individual’s family background affects his/her wages: family
characteristics seem to have little or no effect on wages. Though the sum of squared
residuals slightly increased and the value of the adjusted R2 decreased to 25.77%, the exclusion of
the variables parental education, birth order and number of siblings had practically no effect on
the estimates for the remaining parameters. Thus these variables, perhaps, should be dropped from
the model.
Conclusion
An individual’s wage is determined by numerous factors. Among them, the most
fundamental are knowledge, skills, ability and luck. Some of these factors are easier to measure
than others. Education is traditionally viewed as the main source of the knowledge and skills that
are valued in the labor market. The estimation of returns on education, however, is not an easy
task. Although through my regression analysis I found evidence of an upward ability bias, the
marginal return on education is still quite high and statistically significant. After controlling for
ability, the estimated return on an additional year of education for male workers with the same
abilities, experience and other characteristics is 4.53%.
One of the main issues of concern for policymakers is the economic well-being of the
population, which is measured with poverty rates and average wages. As my regression analysis
shows, education increases workers’ wages even after controlling for their abilities. Thus, policies directed
at improving education and providing financial assistance to those who need it to obtain more
education will benefit society as a whole. For an individual, on the other hand, education
remains the first step on the way to success in the labor market.
Table 1: Descriptive Statistics
| Variable | Description | Mean | Median | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|---|
| wage | Monthly earnings in dollars | 988.48 | 937 | 406.5115119 | 115 | 3078 |
| lwage | Natural log of monthly earnings | 6.81 | 6.842683 | 0.412206453 | 4.744932 | 8.032035 |
| educ | Years of education | 13.68 | 13 | 2.231405597 | 9 | 18 |
| exper | Years of experience | 11.40 | 11 | 4.258397196 | 1 | 22 |
| tenure | Years with current employer | 7.22 | 7 | 5.055690459 | 0 | 22 |
| married | Takes 1 if married | 0.90 | 1 | 0.299621776 | 0 | 1 |
| black | Takes 1 if black | 0.08 | 0 | 0.273728342 | 0 | 1 |
| south | Takes 1 if living in the South | 0.32 | 0 | 0.467890576 | 0 | 1 |
| urban | Takes 1 if living in an SMSA | 0.72 | 1 | 0.449603727 | 0 | 1 |
| sibs | Number of siblings | 2.85 | 2 | 2.240895542 | 0 | 14 |
| brthord | Birth order | 2.18 | 2 | 1.48761173 | 1 | 10 |
| Peduc | Maximum years of education completed by one of the parents | 11.56 | 12 | 2.727012032 | 2 | 18 |
| IQ | IQ score | 102.48 | 104 | 14.68611745 | 54 | 145 |
| KWW | Knowledge of the World of Work score | 36.19 | 37 | 7.529187963 | 13 | 56 |
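Table 2: Correlations
| | educ | exper | tenure | married | black | south | urban | sibs | brthord | Peduc | IQ | KWW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| educ | 1 | | | | | | | | | | | |
| exper | -0.4508 | 1 | | | | | | | | | | |
| tenure | -0.0301 | 0.2901 | 1 | | | | | | | | | |
| married | -0.0567 | 0.0985 | 0.0701 | 1 | | | | | | | | |
| black | -0.1156 | 0.0215 | -0.0543 | -0.0483 | 1 | | | | | | | |
| south | -0.0573 | -0.0340 | -0.0884 | 0.0033 | 0.1836 | 1 | | | | | | |
| urban | 0.1002 | -0.0380 | -0.0310 | -0.0394 | 0.0755 | -0.1146 | 1 | | | | | |
| sibs | -0.1972 | 0.0131 | -0.0493 | 0.0019 | 0.2692 | 0.0474 | -0.0444 | 1 | | | | |
| brthord | -0.1762 | 0.0437 | -0.0168 | -0.0144 | 0.1276 | 0.1322 | -0.0246 | 0.5783 | 1 | | | |
| Peduc | 0.4217 | -0.2170 | 0.0088 | -0.0384 | -0.1952 | -0.1334 | 0.0898 | -0.2228 | -0.2575 | 1 | | |
| IQ | 0.5435 | -0.2316 | 0.0185 | -0.0124 | -0.3108 | -0.1627 | 0.0555 | -0.2473 | -0.2033 | 0.3504 | 1 | |
| KWW | 0.4063 | 0.0216 | 0.1700 | 0.0575 | -0.2371 | -0.0792 | 0.1103 | -0.2493 | -0.1516 | 0.2808 | 0.4103 | 1 |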
Table 3: Results of regression #1

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.510166137 |
| R Square | 0.260269487 |
| Adjusted R Square | 0.248923927 |
| Standard Error | 0.35723726 |
| Observations | 663 |

ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 10 | 29.27593798 | 2.927593798 | 22.94020625 | 4.79527E-37 |
| Residual | 652 | 83.20723603 | 0.12761846 | | |
| Total | 662 | 112.483174 | | | |

Coefficients
| Variable | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.305937968 | 0.143916722 | 36.868113 | 1.2407E-161 | 5.023341919 | 5.588534017 |
| educ | 0.061161295 | 0.007640862 | 8.004501913 | 5.5094E-15 | 0.046157636 | 0.076164955 |
| exper | 0.01632547 | 0.003870203 | 4.218245784 | 2.81075E-05 | 0.008725906 | 0.023925033 |
| tenure | 0.008790444 | 0.002906445 | 3.024466057 | 0.002588633 | 0.003083325 | 0.014497563 |
| married | 0.204897947 | 0.046738891 | 4.383885495 | 1.35876E-05 | 0.113121079 | 0.296674815 |
| black | -0.156467081 | 0.054655272 | -2.862799413 | 0.004333985 | -0.26378862 | -0.049145543 |
| south | -0.061210506 | 0.030981192 | -1.975731136 | 0.048606446 | -0.122045427 | -0.000375584 |
| urban | 0.19929221 | 0.031555264 | 6.315656536 | 4.97902E-10 | 0.137330036 | 0.261254384 |
| sibs | 0.00585853 | 0.007937403 | 0.738091561 | 0.460724382 | -0.009727419 | 0.021444479 |
| brthord | -0.018111359 | 0.011720817 | -1.545230061 | 0.122775836 | -0.041126451 | 0.004903733 |
| Peduc | 0.0129322 | 0.005837182 | 2.215486819 | 0.027071388 | 0.001470262 | 0.024394139 |
Table 4: Results of regression #2

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.519721068 |
| R Square | 0.270109989 |
| Adjusted R Square | 0.257776978 |
| Standard Error | 0.355125614 |
| Observations | 663 |

ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 11 | 30.38282887 | 2.762075352 | 21.9013824 | 3.56955E-38 |
| Residual | 651 | 82.10034513 | 0.126114201 | | |
| Total | 662 | 112.483174 | | | |

Coefficients
| Variable | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.097930215 | 0.159366123 | 31.98879478 | 1.1636E-135 | 4.78499687 | 5.410863559 |
| educ | 0.050451094 | 0.008412132 | 5.997420456 | 3.32466E-09 | 0.033932925 | 0.066969264 |
| exper | 0.016399211 | 0.003847407 | 4.262406123 | 2.32151E-05 | 0.008844394 | 0.023954029 |
| tenure | 0.008673671 | 0.002889534 | 3.001754685 | 0.002787056 | 0.002999746 | 0.014347596 |
| married | 0.203702378 | 0.046464368 | 4.384055738 | 1.35804E-05 | 0.112464355 | 0.294940401 |
| black | -0.117446161 | 0.055905904 | -2.100782779 | 0.036044009 | -0.227223704 | -0.007668617 |
| south | -0.052395498 | 0.030941458 | -1.693375191 | 0.090862462 | -0.113152538 | 0.008361543 |
| urban | 0.198682525 | 0.031369414 | 6.333638315 | 4.46579E-10 | 0.137085144 | 0.260279906 |
| sibs | 0.007296729 | 0.007905404 | 0.923005192 | 0.35634646 | -0.008226423 | 0.02281988 |
| brthord | -0.017156326 | 0.011655993 | -1.471888773 | 0.141534159 | -0.040044182 | 0.005731531 |
| Peduc | 0.011354992 | 0.005827049 | 1.948669286 | 0.051764164 | -8.7076E-05 | 0.022797059 |
| IQ | 0.003533193 | 0.001192606 | 2.962582209 | 0.003161971 | 0.001191377 | 0.005875008 |
Table 5: Results of regression #3

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.517445512 |
| R Square | 0.267749858 |
| Adjusted R Square | 0.255376968 |
| Standard Error | 0.355699307 |
| Observations | 663 |

ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 11 | 30.11735391 | 2.737941265 | 21.64004148 | 9.81808E-38 |
| Residual | 651 | 82.36582009 | 0.126521997 | | |
| Total | 662 | 112.483174 | | | |

Coefficients
| Variable | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.264671914 | 0.14418783 | 36.51259539 | 1.0958E-159 | 4.981542859 | 5.54780097 |
| educ | 0.053164506 | 0.008215655 | 6.471121685 | 1.91383E-10 | 0.03703214 | 0.069296871 |
| exper | 0.014316784 | 0.003931474 | 3.641581213 | 0.000292505 | 0.00659689 | 0.022036677 |
| tenure | 0.007870067 | 0.002915857 | 2.699058502 | 0.007134062 | 0.002144454 | 0.013595681 |
| married | 0.197249146 | 0.046632095 | 4.229901052 | 2.67325E-05 | 0.105681773 | 0.28881652 |
| black | -0.133834257 | 0.055123124 | -2.427914968 | 0.015455911 | -0.242074724 | -0.025593791 |
| south | -0.062306665 | 0.030850742 | -2.019616374 | 0.043832421 | -0.122885574 | -0.001727755 |
| urban | 0.191686486 | 0.031557534 | 6.074190925 | 2.11939E-09 | 0.129719712 | 0.253653261 |
| sibs | 0.008039811 | 0.007948365 | 1.01150493 | 0.312150619 | -0.007567701 | 0.023647322 |
| brthord | -0.018872788 | 0.011674092 | -1.616638666 | 0.106440745 | -0.041796183 | 0.004050607 |
| Peduc | 0.011455328 | 0.005840199 | 1.961461823 | 0.050250964 | -1.25616E-05 | 0.022923218 |
| KWW | 0.005625086 | 0.002181257 | 2.578827824 | 0.010132009 | 0.001341942 | 0.00990823 |
Table 6: Results of regression #4

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.524331415 |
| R Square | 0.274923432 |
| Adjusted R Square | 0.261537403 |
| Standard Error | 0.354224861 |
| Observations | 663 |

ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 12 | 30.92426029 | 2.57702169 | 20.53808741 | 2.18154E-38 |
| Residual | 650 | 81.55891372 | 0.125475252 | | |
| Total | 662 | 112.483174 | | | |

Coefficients
| Variable | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.091557447 | 0.158991503 | 32.02408533 | 8.8686E-136 | 4.779358266 | 5.403756627 |
| educ | 0.045330337 | 0.00874542 | 5.183323294 | 2.91344E-07 | 0.02815764 | 0.062503033 |
| exper | 0.014750082 | 0.003918904 | 3.763827872 | 0.000182473 | 0.007054835 | 0.022445328 |
| tenure | 0.007937809 | 0.002903893 | 2.733506245 | 0.006436931 | 0.002235662 | 0.013639955 |
| married | 0.197616539 | 0.046439021 | 4.255398466 | 2.39389E-05 | 0.106427865 | 0.288805213 |
| black | -0.104096466 | 0.056133197 | -1.854454616 | 0.06412684 | -0.214320836 | 0.006127903 |
| south | -0.054447415 | 0.030878781 | -1.763263119 | 0.078325918 | -0.115081663 | 0.006186832 |
| urban | 0.192554928 | 0.031428587 | 6.126744617 | 1.55416E-09 | 0.130841069 | 0.254268788 |
| sibs | 0.008888236 | 0.007922485 | 1.121899962 | 0.262319243 | -0.006668529 | 0.024445001 |
| brthord | -0.017903168 | 0.011631986 | -1.53913249 | 0.124258598 | -0.04074399 | 0.004937654 |
| Peduc | 0.010356658 | 0.005832105 | 1.775801035 | 0.076233483 | -0.00109539 | 0.021808707 |
| IQ | 0.003069347 | 0.001210357 | 2.535902398 | 0.011448859 | 0.000692664 | 0.00544603 |
| KWW | 0.004591081 | 0.002210153 | 2.077268743 | 0.038168875 | 0.000251177 | 0.008930986 |
Table 7: Results of regression #5

Regression Statistics
| Statistic | Value |
|---|---|
| Multiple R | 0.517452292 |
| R Square | 0.267756875 |
| Adjusted R Square | 0.257664703 |
| Standard Error | 0.355152472 |
| Observations | 663 |

ANOVA
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 9 | 30.11814313 | 3.346460348 | 26.53114536 | 3.65762E-39 |
| Residual | 653 | 82.36503088 | 0.126133279 | | |
| Total | 662 | 112.483174 | | | |

Coefficients
| Variable | Coefficient | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.121754614 | 0.14682191 | 34.88413005 | 2.5557E-151 | 4.833454581 | 5.410054647 |
| educ | 0.04918157 | 0.008539515 | 5.759293 | 1.30099E-08 | 0.032413347 | 0.065949792 |
| exper | 0.014162514 | 0.003914038 | 3.618388941 | 0.000319357 | 0.006476893 | 0.021848134 |
| tenure | 0.007918069 | 0.002910755 | 2.720280295 | 0.006696273 | 0.0022025 | 0.013633638 |
| married | 0.19780494 | 0.046512412 | 4.252734475 | 2.42027E-05 | 0.106472999 | 0.289136881 |
| black | -0.104401155 | 0.054735828 | -1.907364123 | 0.0569119 | -0.211880626 | 0.003078315 |
| south | -0.065509978 | 0.030604427 | -2.140539248 | 0.032680947 | -0.125604942 | -0.005415014 |
| urban | 0.193411776 | 0.031438729 | 6.152022738 | 1.33386E-09 | 0.131678574 | 0.255144978 |
| KWW | 0.004809015 | 0.002195837 | 2.190059909 | 0.028872862 | 0.000497261 | 0.00912077 |
| IQ | 0.003312779 | 0.001205005 | 2.749183135 | 0.006139632 | 0.000946627 | 0.005678931 |
Bibliography
1. Henderson, Nell. “Greenspan Says Worker’s Lack of Skills Lowers Wages.” Washington Post, 22 July 2004. <http://www.washingtonpost.com/wp-dyn/articles/A4121-2004Jul21.html>
2. Hoff, Karla, and Joseph E. Stiglitz. “Modern Economic Theory and Development.” 2000. <http://www2.gsb.columbia.edu/faculty/jstiglitz/>
3. Regan, Tracy L., Ronald L. Oaxaca, and Galen Burghardt. “A Human Capital Model of the Effects of Abilities and Family Background on Optimal Schooling Levels.” 2003, p. 42. <http://www.iza.org/en/webcontent/events/transatlantic/papers_2003/regan_oaxaca_burghardt.pdf>