empirical analysis
ECON 203 Project [SAMPLE]
The effect of family income on labor supply of employed youth in Argentina
1. Introduction
In basic economic theory, individuals maximize a utility function that depends on consumption of goods and leisure, subject to a budget constraint. Family income can be an important component of that budget constraint: higher family income increases the individual's consumption possibilities and reduces the need to work in order to maintain a similar level of consumption. Because the individual also values leisure, they would be better off working less. In this study I examine the main determinants of the labor supply of working youths in Argentina, measured by hours worked. In particular, I test whether the effect of family income matches the predictions of traditional economic theory, the main one being that higher family income decreases the number of hours worked by the individual. To evaluate the determinants of hours of labor supplied, I use individual-level data for 2005 from the Permanent Survey of Households of the National Institute of Statistics (INDEC) of Argentina. I restrict the sample to youths between the ages of 18 and 25 who work a positive number of hours, in order to identify how family income affects the intensive margin of labor supply (hours worked). Using multiple regression models, I examine the main determinants of youth labor supply and also how family income affects different populations.
I show that my final model appears to satisfy the main underlying assumptions of the regression model. The data have no serious outliers, and the independent variables are not highly correlated with one another. In addition, the histograms of standardized residuals and the scatter plots of residuals against predicted values of the dependent variable suggest that the error term is approximately normally distributed and that its variance is roughly constant. Therefore, transforming the dependent variable does not appear necessary.
Results suggest that family income plays a central role in determining the number of hours worked by youths in Argentina. An increase in family income of 1,000 pesos (around 16% of mean family income) is associated with a decrease of 9 hours of work per week (around 22% of average weekly hours worked). This evidence supports the theoretical prediction about the effect of family income on labor supply. I also find that age, gender, years of education and experience are statistically significant determinants of labor supply (at the 10% level or better). On the other hand, family size, number of children and marital status do not appear to be statistically related to hours worked. Finally, I test whether males respond differently from females to changes in family income by including an interaction between a male dummy and family income. I find that males respond less to changes in family income than females.
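The magnitudes quoted above are simple arithmetic on the rounded family-income coefficient reported in Table 4 (-0.009 hours per week per peso of monthly family income) and the sample means in Table 1. A minimal Python check, in which the helper name is hypothetical, could look like this:

```python
# Rounded estimates taken from Tables 1 and 4 of this study.
FAMINC_COEF = -0.009      # change in weekly hours per peso of monthly family income (Table 4)
MEAN_HOURS = 41.587       # mean weekly hours worked (Table 1)
MEAN_FAMINC = 6184.595    # mean monthly family income in pesos (Table 1)


def effect_of_income_change(delta_pesos, coef=FAMINC_COEF):
    """Predicted change in weekly hours for a change in family income, all else constant."""
    return coef * delta_pesos


change = effect_of_income_change(1000)
print(round(change, 1))                      # -9.0 hours per week
print(round(100 * change / MEAN_HOURS, 1))   # -21.6 (% of mean weekly hours)
print(round(100 * 1000 / MEAN_FAMINC, 1))    # 16.2 (% of mean family income)
```

Because the coefficient is reported to only three decimals, the 9-hour figure inherits that rounding. In a linear specification the same effect applies to any 1,000-peso increase, whatever the starting income level.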
2. Data
In this study I use data for Buenos Aires, Argentina, for the year 2005, from the Permanent Survey of Households compiled by the National Institute of Statistics (INDEC) and processed by CEDLAS of the University of La Plata. Since the main objective of this study is to analyze the effect of family income on labor supply, the dependent variable is total hours worked in a week (hrswrk) and the main independent variable is monthly family income (faminc). Family income excludes income earned by the individual and is measured in Argentine pesos. The other independent variables are: age (age), a dummy variable equal to 1 if the individual is male (male), years of education (educ), number of children (nchild), a dummy variable equal to 1 if the individual is married (married), number of family members (nfamily), and a measure of potential experience (exp).
Table 1 presents descriptive statistics for the sample, which comprises 685 working individuals, around 66% of whom are male. On average, they work about 41.6 hours a week, with a maximum of 98 hours and a minimum of 1 (individuals who do not work were excluded). Average monthly family income is about 6,200 pesos. Figure 1 shows scatter plots of hours worked against each of the independent variables, which I use to look for severe outliers; none are apparent. In addition, the plot of hours worked against family income suggests that the relationship between them may be curvilinear.
Before performing the regression analysis, I examined the correlations among all variables to identify possible serious multicollinearity between independent variables. The correlation matrix is presented in Table 2. No two independent variables have a correlation higher than 0.8 in absolute value (the largest is -0.75, between educ and exp), so serious multicollinearity should not be a concern in the regression analysis.
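The 0.8 rule of thumb above can be applied mechanically. The Python sketch below is an illustration rather than the procedure used to produce Table 2: it hard-codes the Table 2 correlations among the independent variables (leaving out hrswrk), and the function names are hypothetical.

```python
# Pairwise correlations among the independent variables, copied from Table 2.
# hrswrk is left out on purpose: only correlations between regressors matter here.
CORRELATIONS = {
    ("age", "faminc"): -0.0366,
    ("nfamily", "faminc"): 0.1265, ("nfamily", "age"): -0.1782,
    ("male", "faminc"): -0.1451, ("male", "age"): -0.039, ("male", "nfamily"): 0.0171,
    ("nchild", "faminc"): -0.0591, ("nchild", "age"): 0.2269,
    ("nchild", "nfamily"): -0.1368, ("nchild", "male"): 0.0319,
    ("married", "faminc"): -0.0728, ("married", "age"): 0.2376,
    ("married", "nfamily"): -0.1385, ("married", "male"): 0.0166,
    ("married", "nchild"): 0.5936,
    ("educ", "faminc"): 0.0375, ("educ", "age"): 0.1085, ("educ", "nfamily"): -0.2388,
    ("educ", "male"): -0.1767, ("educ", "nchild"): -0.1638, ("educ", "married"): -0.1406,
    ("exp", "faminc"): -0.0557, ("exp", "age"): 0.5797, ("exp", "nfamily"): 0.0767,
    ("exp", "male"): 0.1187, ("exp", "nchild"): 0.2862, ("exp", "married"): 0.2745,
    ("exp", "educ"): -0.7466,
}


def flag_high_correlations(corr, threshold=0.8):
    """Return (pair, r) for every pair of regressors with |r| above the threshold."""
    return sorted((pair, r) for pair, r in corr.items() if abs(r) > threshold)


def strongest_pair(corr):
    """Return the (pair, r) with the largest absolute correlation."""
    return max(corr.items(), key=lambda item: abs(item[1]))


print(flag_high_correlations(CORRELATIONS))  # []
print(strongest_pair(CORRELATIONS))          # (('exp', 'educ'), -0.7466)
```

A pairwise screen like this cannot detect near-linear dependence that involves three or more regressors at once; variance inflation factors are a common complement.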
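3. Regression Analysis
In order to examine the determinants of labor supply, I estimate the following linear regression as the initial model:
$$
\text{hrswrk}_i = \alpha + \beta_1 \text{faminc}_i + \beta_2 \text{faminc}_i^2 + \beta_3 \text{nfamily}_i + \beta_4 \text{male}_i + \beta_5 \text{nchild}_i + \beta_6 \text{married}_i + \beta_7 \text{educ}_i + \beta_8 \text{exp}_i + \beta_9 \text{age}_i + \varepsilon_i
$$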
As discussed earlier, family income (faminc) is likely an important determinant of labor supply since it enters the budget constraint of the individual's utility maximization problem. I include family income squared because the scatter plot in Figure 1 suggested that the relationship between family income and hours worked may be curvilinear. Other independent variables are potentially important determinants of labor supply and are commonly used in labor economics. For example, older individuals may receive less support from their parents and may be forced to work. Being married or having children may also influence the decision to work. At the same time, individuals with more education may receive more attractive job offers, increasing the opportunity cost of leisure.

The results for the initial model are presented in Table 3. In terms of overall goodness of fit, the R-squared indicates that the initial model explains 62% of the variation in hours worked. In addition, the independent variables are jointly significant: the null hypothesis of the F-test for overall significance is rejected at the 5% level (F = 123.84, p < 0.001). Even though the model performs well as a whole, several individual slope coefficients are not statistically different from zero even at the 10% level: number of family members, number of children and marital status. Additionally, the quadratic term for family income has a p-value of 0.225, so the data provide no evidence of a curvilinear relationship between family income and hours worked.

In the previous section I already ruled out serious multicollinearity between the independent variables. Another concern is that the error term may have non-constant variance (heteroskedasticity) or may not be normally distributed; in either case the usual t- and F-based inference could be unreliable. Figure 2 presents the histogram of standardized residuals to visually inspect the error term for non-normality; the distribution appears approximately normal. To check for heteroskedasticity, I plot the residuals against predicted hours worked in Figure 3. The variance appears roughly constant across predicted values of hours worked.

To obtain a final model on which to base inference, I drop the variables that are not significant and re-estimate the model. Regression results are presented in Table 4.
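The visual checks in Figures 2 and 3 can be complemented with simple numeric summaries. The Python sketch below is a simplified illustration under stated assumptions, not the procedure used in this study: it scales residuals by the root mean squared error (ignoring leverage), reports the share beyond plus or minus 2, and compares the residual variance in the upper and lower halves of the fitted values. The function names are hypothetical, and the inputs would be the residuals and fitted values from the estimated regression.

```python
from statistics import pvariance


def scaled_residuals(residuals, n_params):
    """Divide OLS residuals by the root mean squared error.

    Simplification: leverage is ignored, so these are not the leverage-adjusted
    standardized residuals that some regression packages report.
    n_params counts all estimated coefficients, including the constant.
    """
    n = len(residuals)
    root_mse = (sum(e * e for e in residuals) / (n - n_params)) ** 0.5
    return [e / root_mse for e in residuals]


def share_beyond(z_scores, cutoff=2.0):
    """Share of scaled residuals with |z| > cutoff (roughly 5% under normality for cutoff 2)."""
    return sum(abs(z) > cutoff for z in z_scores) / len(z_scores)


def spread_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by that in the lower half.

    A ratio far from 1 hints at non-constant variance. This is an informal
    numeric companion to a residual-versus-fitted plot, not a formal test.
    """
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[-half:]]
    return pvariance(high) / pvariance(low)
```

Applied to a regression's residuals, a share near 5% and a spread ratio near 1 would be consistent with what the histogram and scatter plot show visually.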
Dropping the insignificant variables leaves the R-squared almost unchanged (0.6228 in Table 3 versus 0.622 in Table 4), while the adjusted R-squared increases slightly (from 0.6178 to 0.6192). This suggests that the excluded variables contributed almost nothing to explaining the variation in the dependent variable, and that the penalty for including them in the adjusted R-squared outweighed their contribution. In addition, I performed a partial F-test to confirm that the excluded variables do not significantly contribute to explaining the variation in hours worked. The partial F-test statistic is 0.0313, with a p-value of 0.9925. Therefore I cannot reject the null hypothesis that the coefficients of the three variables (number of family members, number of children and marital status) are all equal to zero, i.e. that they are irrelevant in explaining hours worked. All remaining independent variables are statistically different from zero at the 10% level. The histogram and scatter plot in Figures 4 and 5 indicate that the reduced model is also consistent with normality of the error term and constant variance. Since the assumptions appear to be satisfied, the reduced model is my final model for inference. It is defined as:
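$$
\text{hrswrk}_i = \alpha + \beta_1 \text{faminc}_i + \beta_2 \text{male}_i + \beta_3 \text{educ}_i + \beta_4 \text{exp}_i + \beta_5 \text{age}_i + \varepsilon_i
$$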
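The goodness-of-fit statistics discussed in this section follow directly from the ANOVA blocks of Tables 3 and 4. A minimal Python sketch (hypothetical helper names, not the software used to estimate the models) recomputes them. `partial_f` is a generic version of the partial F statistic: it takes the residual sums of squares of the restricted and unrestricted models, the number of restrictions q, and the residual degrees of freedom of the unrestricted model.

```python
def fit_statistics(ss_model, ss_resid, k, n):
    """Return (R-squared, adjusted R-squared, overall F) from ANOVA sums of squares.

    k is the number of slope coefficients (constant excluded), n the sample size.
    """
    df_resid = n - k - 1
    r2 = ss_model / (ss_model + ss_resid)
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (ss_model / k) / (ss_resid / df_resid)
    return r2, adj_r2, f_stat


def partial_f(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F statistic for H0: the q coefficients excluded from the restricted model are zero."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_unrestricted)


# ANOVA blocks of Table 3 (initial, 9 regressors) and Table 4 (reduced, 5 regressors).
for name, ss_model, ss_resid, k in (
    ("initial", 125857.77, 76219.1037, 9),
    ("reduced", 125680.337, 76396.5367, 5),
):
    r2, adj_r2, f_stat = fit_statistics(ss_model, ss_resid, k, n=685)
    print(f"{name}: R2={r2:.4f} adjR2={adj_r2:.4f} F={f_stat:.2f}")
# initial: R2=0.6228 adjR2=0.6178 F=123.84
# reduced: R2=0.6219 adjR2=0.6192 F=223.41
```

The adjusted R-squared penalizes each additional regressor through the factor (n - 1)/(n - k - 1), which is why it can rise when variables are dropped even though the R-squared itself falls slightly.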
4. Empirical Results
The coefficients in Table 4 suggest that if family income were to increase by 1,000 pesos, an individual in the sample would on average reduce labor supply by 9 hours per week, all else constant. Given that the average number of hours worked is about 41, this represents a 22% decrease in weekly hours worked (9/41 = 0.219), a sizeable effect of family income on labor supply. Older individuals tend to work more hours: increasing age by one year is associated with an increase of 8 hours of work per week (significant at the 10% level). The coefficient on the male dummy suggests that, all else constant, males in the sample work on average 2 hours more per week than females. Somewhat surprisingly, both years of education and years of potential experience are negatively related to labor supply, with coefficients similar in magnitude (but opposite in sign) to that of age.

Since family income has sizeable effects on labor supply, an interesting question is whether it affects different populations differently; for example, the relationship could differ between males and females. To test whether family income affects males differently, I estimate the reduced model including an interaction term between family income and the male dummy. The results are presented in Table 5. For the interaction term, I can reject the null hypothesis that its coefficient equals zero at the 5% significance level (p = 0.014). The positive interaction coefficient suggests that females reduce their labor supply more than males when family income increases.

5. Summary and Discussion
This study explored the determinants of youth labor supply for workers in Argentina. In particular, I analyzed the role of family income and tested whether the theoretical predictions hold empirically. I found that family income is one of the most important factors in explaining the variation in hours worked.
Holding all else constant, an increase of 1,000 pesos in family income is associated with a decrease of 9 hours of work per week (around 22% of the mean). Other factors I found to be important determinants of hours worked are age, education, experience and gender. One important limitation of this study is that it considers only the intensive margin of labor supply; that is, it focuses only on individuals who work. Family income could have different effects on the extensive margin: the decision of whether to work at all. Since I do not include individuals with zero hours of work, I cannot extrapolate my results to this population, and future research should address this issue carefully. A second limitation is that I use a highly selected sample from Buenos Aires, Argentina, which may not be representative of other populations. Youths living in the United States or in European countries may respond differently to changes in family income, since labor market conditions differ considerably across countries.
Figure 1: Scatter plots of hours worked (hrswrk) against each independent variable (panels: faminc, age, nfamily, male, nchild, married, educ, exp)
Figure 2: Histogram of standardized residuals (stdresid), initial model
Figure 3: Residuals vs. fitted values of hours worked, initial model
Figure 4: Histogram of standardized residuals (stdresid), reduced model
Figure 5: Residuals vs. fitted values of hours worked, reduced model
Table 1: Descriptive Statistics

Variable   Obs.   Mean       Std. Dev.   Min        Max
hrswrk     685    41.587     17.188      1          98.1
faminc     685    6184.595   1547.212    1831.163   9643.931
age        685    22.261     2.189       18         25
nfamily    685    4.632      2.350       1          15
male       685    0.663      0.473       0          1
nchild     685    0.257      0.615       0          3
married    685    0.309      0.463       0          1
educ       685    10.645     2.692       3          17
exp        685    5.625      3.264       0          16

Units: hrswrk in hours per week; faminc in Argentine pesos per month.
Notes: Own calculations based on Permanent Survey of Households for Argentina in 2005. Sample of youths of Buenos Aires that work.
Table 2: Correlations

         hrswrk   faminc   age      nfamily  male     nchild   married  educ     exp
hrswrk   1
faminc   -0.785   1
age      0.0429   -0.0366  1
nfamily  -0.1014  0.1265   -0.1782  1
male     0.1647   -0.1451  -0.039   0.0171   1
nchild   0.0434   -0.0591  0.2269   -0.1368  0.0319   1
married  0.056    -0.0728  0.2376   -0.1385  0.0166   0.5936   1
educ     -0.0053  0.0375   0.1085   -0.2388  -0.1767  -0.1638  -0.1406  1
exp      0.0322   -0.0557  0.5797   0.0767   0.1187   0.2862   0.2745   -0.7466  1

Notes: Own calculations based on Permanent Survey of Households for Argentina in 2005. Sample of youths of Buenos Aires that work.
Table 3: Initial model regression results

Variable     Coefficient  Std. Err.  t-stat  P-value  [95% Conf. Interval]
faminc            -0.011      0.002    -6.5    0.000    -0.014    -0.007
faminc2            0.000      0.000    1.22    0.225     0.000     0.000
age                8.102      4.441    1.82    0.069    -0.617    16.821
nfamily            0.052      0.184    0.28    0.778    -0.310     0.414
male               2.165      0.883    2.45    0.014     0.432     3.898
nchild            -0.138      0.837   -0.17    0.869    -1.782     1.505
married            0.173      1.109    0.16    0.876    -2.005     2.351
educ              -7.722      4.424   -1.75    0.081   -16.407     0.964
exp               -8.008      4.451    -1.8    0.072   -16.748     0.733
Constant          45.516     27.509    1.65    0.098    -8.498    99.529

Observations: 685
R-squared: 0.6228   Adj. R-squared: 0.6178   F-stat: 123.84   P-value (F-stat): 0.000

Source              SS    df             MS
Model        125857.77     9     13984.1967
Residual    76219.1037   675     112.917191
Notes: Own calculations based on Permanent Survey of Households for Argentina in 2005. Sample of youths of Buenos Aires that work. Ordinary least squares estimates presented.
Table 4: Reduced model regression results

Variable     Coefficient  Std. Err.  t-stat  P-value  [95% Conf. Interval]
faminc            -0.009      0.000  -32.59    0.000    -0.009    -0.008
age                8.185      4.431    1.85    0.065    -0.515    16.885
male               2.109      0.880     2.4    0.017     0.381     3.837
educ              -7.844      4.414   -1.78    0.076   -16.511     0.822
exp               -8.108      4.441   -1.83    0.068   -16.829     0.613
Constant          40.544     27.004     1.5    0.134   -12.478    93.565

Observations: 685
R-squared: 0.622   Adj. R-squared: 0.6192   F-stat: 223.41   P-value (F-stat): 0.000

Source              SS    df             MS
Model       125680.337     5     25136.0675
Residual    76396.5367   679     112.513309
Notes: Own calculations based on Permanent Survey of Households for Argentina in 2005. Sample of youths of Buenos Aires that work. Ordinary least squares estimates presented.
Table 5: Reduced model regression results with interaction (maleXfaminc = male × faminc)

Variable     Coefficient  Std. Err.  t-stat  P-value  [95% Conf. Interval]
faminc            -0.010      0.000  -20.37    0.000    -0.011    -0.009
male              -6.781      3.718   -1.82    0.069   -14.081     0.519
maleXfaminc        0.001      0.001    2.46    0.014     0.000     0.003
age                8.539      4.417    1.93    0.054    -0.133    17.211
educ              -8.170      4.400   -1.86    0.064   -16.808     0.469
exp               -8.443      4.427   -1.91    0.057   -17.136     0.249
Constant          44.258     26.946    1.64    0.101    -8.651    97.166

Observations: 685
R-squared: 0.6253   Adj. R-squared: 0.622   F-stat: 188.57   P-value (F-stat): 0.000

Source              SS    df             MS
Model       126356.579     6     21059.4299
Residual    75720.2948   678     111.681851
Notes: Own calculations based on Permanent Survey of Households for Argentina in 2005. Sample of youths of Buenos Aires that work. Ordinary least squares estimates presented.
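With the interaction included, the slope of hours worked with respect to family income is the faminc coefficient for females and the sum of the faminc and maleXfaminc coefficients for males; the male dummy itself only shifts the intercept. A minimal Python sketch using the rounded Table 5 coefficients (the helper name is hypothetical) makes the two group-specific slopes explicit:

```python
# Rounded coefficients from Table 5 (reduced model with the male x faminc interaction).
B_FAMINC = -0.010       # slope on faminc for the omitted group (females)
B_INTERACTION = 0.001   # extra slope for males (maleXfaminc)


def income_slope(male):
    """Weekly-hours slope per peso of family income; male is 0 (female) or 1 (male)."""
    return B_FAMINC + B_INTERACTION * male


for label, male in (("female", 0), ("male", 1)):
    # Scale the per-peso slope to a 1,000-peso increase in monthly family income.
    print(label, round(1000 * income_slope(male), 1))
# female -10.0
# male -9.0
```

Because the coefficients are reported to only three decimals, the implied gap of about 1 hour per week per 1,000 pesos between females and males is approximate.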