
ECO 120 Problem Set 3

Professor Jonathan Robinson

1. This question continues our study of the farming program and requires Stata. In this problem, we are going to calculate the rate of return to the inputs on the blue plot.

(a) The variables kg_tdfert_unmarked, kg_plantfert_unmarked, and packets_hyb_unmarked give you the kilograms of TD fertilizer, kilograms of planting fertilizer, and packets of hybrid seeds that were used on the unmarked plot. What are the average values of these measures for the unmarked plot? Similarly, kg_tdfert_blue, kg_plantfert_blue, and packets_hyb_blue give the same measures on the blue plot. What are the average values for the blue plot?
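As a starting point, the averages can be read off Stata's "summarize" output. This is only a minimal sketch using the variable names given above; the "Mean" column of the output holds the averages.

```stata
* Averages of the three input measures on the unmarked plot
summarize kg_tdfert_unmarked kg_plantfert_unmarked packets_hyb_unmarked

* Averages of the same measures on the blue plot
summarize kg_tdfert_blue kg_plantfert_blue packets_hyb_blue
```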

(b) Construct the difference for each variable between the blue and unmarked plot and test whether the difference is equal to 0.
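One way to set this up, shown for TD fertilizer only, is sketched below. The name diff_tdfert is a hypothetical choice, not a variable in the dataset; the same two lines can be repeated for planting fertilizer and hybrid seeds.

```stata
* Difference in TD fertilizer use between the blue and unmarked plots
* (diff_tdfert is a hypothetical variable name)
generate diff_tdfert = kg_tdfert_blue - kg_tdfert_unmarked

* One-sample t-test of the null hypothesis that the mean difference is 0
ttest diff_tdfert == 0
```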

(c) To figure out the costs of these inputs, we need the prices of each. It turns out that TD fertilizer costs 65 Ksh per kg, planting fertilizer costs 70 Ksh per kg, and hybrid seeds cost 230 Ksh per packet. What is the total amount spent on inputs on the blue plot? The unmarked plot? Is the difference statistically significant?
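A possible sketch of the cost calculation follows, using the prices quoted above. The names cost_blue and cost_unmarked are hypothetical, and the paired t-test is just one reasonable way to check whether the two costs differ.

```stata
* Input cost per farmer on each plot, in Ksh
* (cost_blue and cost_unmarked are hypothetical variable names)
generate cost_blue = 65*kg_tdfert_blue + 70*kg_plantfert_blue + 230*packets_hyb_blue
generate cost_unmarked = 65*kg_tdfert_unmarked + 70*kg_plantfert_unmarked + 230*packets_hyb_unmarked

* Average spending on each plot
summarize cost_blue cost_unmarked

* Paired t-test of the null hypothesis that mean costs are equal across plots
ttest cost_blue == cost_unmarked
```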

(d) To calculate the rate of return properly, we will need to account for possible differences in labor time spent on the 2 plots. We will do this in the next problem set, but for now, let’s just generate the rate of return as follows:

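$$\text{rate of return} = \frac{(\text{revenue}_{\text{blue}} - \text{revenue}_{\text{unmarked}}) - (\text{input costs}_{\text{blue}} - \text{input costs}_{\text{unmarked}})}{\text{input costs}_{\text{blue}} - \text{input costs}_{\text{unmarked}}}$$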

In words, the rate of return is the value of the extra maize on the blue plot minus the cost of the extra inputs, divided by the cost of the extra inputs. What are the mean, median, minimum, and maximum of this variable?
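The formula can be translated into Stata roughly as below. This sketch assumes you have already created per-plot input cost variables, here called cost_blue and cost_unmarked (hypothetical names); extra_cost and ror are also hypothetical names. revenue_blue and revenue_unmarked are the revenue variables used in Question 2.

```stata
* Cost of the extra inputs used on the blue plot
* (assumes cost_blue and cost_unmarked already exist; names are hypothetical)
generate extra_cost = cost_blue - cost_unmarked

* Rate of return: (extra revenue - extra cost) / extra cost
* Stata sets ror to missing for farmers whose extra_cost is 0
generate ror = ((revenue_blue - revenue_unmarked) - extra_cost) / extra_cost

* The detail option reports the mean, the median (50th percentile),
* and the smallest and largest values
summarize ror, detail
```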

2. This question involves using some of the graphic options that Stata has. Stata has a command which creates a histogram of results. We are interested in doing this for the revenues on the two plots. Let’s start with the revenue on the blue plot. Type the following command into Stata:

• histogram revenue_blue, saving(histogram_revenue_blue, replace) xtitle("revenue (Ksh)") ytitle("Percentage of farmers") title("Distribution of revenue on blue plot")

Let me explain what this command does. "histogram revenue_blue" is the main part of the command, which tells Stata to make the graph. After the comma, "saving(histogram_revenue_blue, replace)" tells Stata to save this graph in a file called histogram_revenue_blue.gph (you will see that this is created in your directory). "xtitle" labels the x-axis. "ytitle" labels the y-axis, and "title" labels the whole graph.

(a) Generate histograms for revenues on the blue plot and the unmarked plot.

(b) There is a way to "smooth out" the histogram, by using something called a kernel density. The command for this is "kdensity." So if you type ‘kdensity revenue_blue, xtitle("revenue (Ksh)") ytitle("Percentage of farmers") title("Distribution of revenue on blue plot")’, you will generate a kernel density for the blue revenue. Do this for the blue and the unmarked plot.


(c) It’d be nice to graph the two densities together on the same graph. Luckily Stata can do this for us. In particular, type

• twoway (kdensity revenue_blue) (kdensity revenue_unmarked, clpattern("-")), saving(kdensity_both, replace) xtitle("revenue (Ksh)") ytitle("Percentage of farmers") title("Distribution of revenue on both plots") legend(order(1 "Blue plot" 2 "Unmarked plot"))

Stata will print the 2 graphs on the same chart. The only other new thing here is that the "legend" option tells Stata how to label the legend at the bottom, and the "clpattern" option tells Stata to draw the distribution for the unmarked plot with a dashed line (so that you can read it when you print it out in black and white). Implement this command and comment on what you find.

(d) Now let’s look at the returns to the inputs that you calculated in Question 1. One problem with the returns is that they include many very large and very small values. One way of dealing with this is to throw out the outliers. Let’s throw out the top and bottom 5% of returns. What is the 5th percentile of returns? What is the 95th? Graph the kernel density of returns, including only those returns between the 5th and 95th percentile of the distribution. For the rest of the problem, just focus on returns between the 5th and 95th percentile.
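One possible way to do the trimming is sketched below. It assumes the rate of return from Question 1 is stored in a variable called ror (a hypothetical name), and it keeps the cutoffs in two scalars, p5 and p95, which are also hypothetical names.

```stata
* Compute the 5th and 95th percentiles of the return
* (ror is a hypothetical name for the Question 1 rate of return)
_pctile ror, percentiles(5 95)
scalar p5 = r(r1)
scalar p95 = r(r2)
display "5th percentile: " p5 "   95th percentile: " p95

* Kernel density using only returns between the two cutoffs
kdensity ror if inrange(ror, p5, p95), xtitle("rate of return") title("Distribution of returns, 5th to 95th percentile")
```

The same cutoffs are also available after "summarize ror, detail" as r(p5) and r(p95), if you prefer that route.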

(e) Interpret what you’ve found in this problem - why might farmers not be excited about using inputs?

3. This question asks you to look at the returns to training in your dataset.

(a) On the same graph, plot the return to inputs for the trained group and the untrained group separately. Please restrict these plots to between the 5th and 95th percentile of returns.
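A rough sketch of such a graph is below. It assumes the return is stored in ror and that training status is a 0/1 indicator called trained; both names are hypothetical, so substitute whatever your dataset actually uses. The /// line continuations work in a do-file; if you type the command interactively, put it all on one line.

```stata
* Percentile cutoffs for the return (ror is a hypothetical variable name)
_pctile ror, percentiles(5 95)
scalar p5 = r(r1)
scalar p95 = r(r2)

* Overlay the two densities; trained is a hypothetical 0/1 indicator
twoway (kdensity ror if trained == 1 & inrange(ror, p5, p95)) ///
       (kdensity ror if trained == 0 & inrange(ror, p5, p95), clpattern("-")), ///
       xtitle("rate of return") ytitle("Density") ///
       title("Distribution of returns by training status") ///
       legend(order(1 "Trained" 2 "Untrained"))
```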

(b) Implement regressions to test whether:

i. The revenue on the blue plot is different between the trained and untrained groups. Interpret what this means.

ii. The revenue on the unmarked plot is different between the trained and untrained groups. Interpret what this means.

iii. The return to inputs is different between the trained and untrained groups. Interpret what this means.
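The three regressions might look roughly like this, again assuming a 0/1 training indicator called trained and a return variable called ror (both hypothetical names). With a 0/1 regressor, the coefficient on trained is the difference in means between the two groups, and its t-statistic tests whether that difference is 0.

```stata
* i. Blue-plot revenue by training status (trained is a hypothetical 0/1 indicator)
regress revenue_blue trained

* ii. Unmarked-plot revenue by training status
regress revenue_unmarked trained

* iii. Return to inputs by training status (ror is a hypothetical variable name)
regress ror trained
```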
