
Assignment for ECO321H1F 2018. Due October 1.

Background information

This topic and these data pertain to the required readings: Chapter 1 of “Why Nations Fail” by Acemoglu and Robinson, and Acemoglu, Johnson and Robinson, “The Colonial Origins of Comparative Development: An Empirical Investigation,” American Economic Review 91, no. 5 (2001): 1369–1401. The dataset combines the data used in the Colonial Origins paper, data from a second paper (Daron Acemoglu, Simon Johnson and James A. Robinson, “Reversal of Fortune: Geography and Institutions in the Making of the Modern World Income Distribution,” Quarterly Journal of Economics 117, no. 4 (November 2002): 1231–1294), and information on slave exports from Nathan Nunn, “The Long-Term Effects of Africa's Slave Trades,” Quarterly Journal of Economics 123, no. 1 (2008): 139–176. The observations cover 163 countries, with variables measured in different time periods. Most variables are fairly recent (1995), while some are historical, dating from 1500 or 1900. The variables are listed alphabetically in the spreadsheet, and a legend describing each variable name appears on the second sheet of the Excel workbook. I have highlighted the historical variables, which include the institutions proxy variables (constraint on the executive; democracy variables), settler mortality, population density, urbanization and the yellow-fever epidemics variable. Whether a country was a French or English colony (and ex-colony) may also be relevant. Some variables reflect geography and climate and are likely fairly time-invariant, at least relative to the differences across countries: humidity, temperature, soil classifications, being landlocked, the amount of territory within 100 km of the coast, resources (zinc, silver, iron, oil, gold) and latitude. There is also a set of continent variables: dummy variables for Africa, Asia, Europe and the Americas, and a continent variable identifying each continent.
Current economic variables include GDP per capita measures, urbanization, life expectancy at birth and the infant mortality rate, as well as other measures such as malaria exposure and religion. The objective of this assignment is to prepare you for the data work expected in the essay assignments. As described in the general instructions for the essays:

“The best papers are coherent – the graphs chosen relate to the literature described and attempt to uncover patterns in the data. For instance, imagine you choose to examine the relationship between birthweight and income. The literature you read pointed out that other factors like gestation affects birthweight. It also noted gestation was correlated with income. Hence the overall relationship might be biased by the gestation-income relationship – the fact that average gestation varied by income. To investigate, you might show the overall birthweight-income relationship in a graph, but then also illustrate the birthweight-gestation relationship, the gestation-income relationship and finally the birthweight-income relationship for different gestation categories. In other words, really explore that relationship and find instructive ways to take into account the other factors that might influence the relationship you are examining.”
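The bias described in this passage can be made concrete with a small linear sketch. This is an illustration only: it assumes the relationships are linear and that the error terms u and v are uncorrelated with X, and the symbols β and δ are introduced here rather than taken from the assignment, which works with grouped means and graphs rather than regressions. Let Y be the outcome (birthweight), X the variable of interest (income) and Z the other factor (gestation).

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X + \beta_2 Z + u, \qquad Z = \delta_0 + \delta_1 X + v \\
\Rightarrow\quad Y &= (\beta_0 + \beta_2\delta_0) + (\beta_1 + \beta_2\delta_1)\,X + (\beta_2 v + u)
\end{aligned}
```

Under these assumptions, the overall Y-X relationship reflects \(\beta_1 + \beta_2\delta_1\) rather than \(\beta_1\). The extra term \(\beta_2\delta_1\) vanishes only if Z has no effect on Y (\(\beta_2 = 0\)) or Z is unrelated to X (\(\delta_1 = 0\)), which is why step 1 asks for a Z that both determines Y and is correlated with X.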

To do:

1. Choose three variables: an X and a Y (the main relationship of your investigation) and a Z. Choose Z such that you can argue that Z is also a determinant of Y, i.e., Y = f(X, Z), and that Z is correlated with X; hence, to investigate the X-Y relationship carefully, you need to control for Z. Choose non-binary variables for X and Y. Identify (state) the variables.

2. Graph the relationship between X and Y as a scatterplot.

3. Construct a table that reports the mean of Y for intervals of X [you can use a pivot table].

4. Graph Y against X, with mean Y on the Y axis and intervals of X (grouped X) on the X axis [using the information from #3 above].

5. Construct a table that reports the count (number of observations) of Y for the same intervals of X.

6. Graph #5 [histogram].

7. Now introduce Z and show that Y = f(Z). If Z is a binary (0,1) variable, construct a table reporting mean Y and the number of observations (count) for Z = 0 and Z = 1. If Z is non-binary, produce a table reporting mean Y and count for grouped Z (e.g., low Z and high Z, essentially transforming Z into a binary variable) [use a pivot table and group the row variable]. (Note: a scatterplot may also be useful with a non-binary Z.)

8. Show that Z = f(X): construct a table reporting mean X and the number of observations (count) for Z = 0 and Z = 1. To better illustrate the variation in the data, also produce a table reporting the count (number of observations) in each of the same intervals of X used above, separately for Z = 0 and Z = 1. Show this table in a graph.

9. Now show Y = f(X), controlling for Z. Produce a (pivot) table of mean Y for intervals of X by Z (for Z = 1 and Z = 0). Graph the information in the table, including the grand total (the overall relationship), mean Y for Z = 0 and mean Y for Z = 1 over the intervals of X. In other words, show Y = f(X | Z = 0), Y = f(X | Z = 1) and Y = f(X).
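Steps 3 through 9 are meant to be done with Excel pivot tables. For anyone who wants to cross-check their pivot tables in code, here is a minimal stdlib Python sketch of the same bookkeeping. It is not a required or official method, and it rests on assumptions: X is split into equal-width intervals (the assignment leaves the choice of intervals to you), a non-binary Z is binarized at a cutoff you choose (the median here), and every function name is hypothetical. The numbers in the example are made up for illustration and are not from the course dataset.

```python
from statistics import mean, median


def interval_edges(values, n_bins):
    """Edges of n_bins equal-width intervals spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(x, edges):
    """Index i of the interval [edges[i], edges[i+1]) holding x; the maximum goes in the last one."""
    for i in range(len(edges) - 1):
        if x < edges[i + 1]:
            return i
    return len(edges) - 2


def binarize(values, cutoff):
    """Step 7 helper (hypothetical): low Z -> 0, high Z (>= cutoff) -> 1."""
    return [1 if v >= cutoff else 0 for v in values]


def mean_and_count_by_group(values, groups):
    """Steps 7 and 8: (mean, count) of `values` for each group label, e.g. Z = 0 and Z = 1."""
    buckets = {}
    for v, g in zip(values, groups):
        buckets.setdefault(g, []).append(v)
    return {g: (mean(vs), len(vs)) for g, vs in sorted(buckets.items())}


def mean_y_by_interval(xs, ys, edges):
    """Steps 3 and 5: mean Y and number of observations in each interval of X."""
    buckets = {}
    for x, y in zip(xs, ys):
        buckets.setdefault(bin_index(x, edges), []).append(y)
    rows = []
    for i in range(len(edges) - 1):
        ys_in_bin = buckets.get(i, [])
        rows.append({
            "interval": (edges[i], edges[i + 1]),
            # An empty interval has no mean, so report None rather than 0.
            "mean_y": mean(ys_in_bin) if ys_in_bin else None,
            "count": len(ys_in_bin),
        })
    return rows


def mean_y_by_interval_and_z(xs, ys, zs, edges):
    """Step 9 (and the step 8 counts): the overall table plus one table each for Z = 0 and Z = 1."""
    tables = {"all": mean_y_by_interval(xs, ys, edges)}
    for z_value in (0, 1):
        keep = [i for i, z in enumerate(zs) if z == z_value]
        tables[z_value] = mean_y_by_interval(
            [xs[i] for i in keep], [ys[i] for i in keep], edges
        )
    return tables


# Example usage with hypothetical toy values (NOT from the course dataset).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 9.0, 7.0, 6.0, 4.0, 3.0]
z_raw = [0.2, 0.1, 0.4, 0.9, 0.7, 0.8]   # a non-binary Z
zs = binarize(z_raw, median(z_raw))      # low Z = 0, high Z = 1

edges = interval_edges(xs, 3)            # the same intervals are reused for every table
for row in mean_y_by_interval(xs, ys, edges):
    print(row["interval"], row["mean_y"], row["count"])  # mean_y 9.5, 6.5, 3.5; count 2 each
print(mean_and_count_by_group(ys, zs))   # step 7: mean Y and count by Z
print(mean_and_count_by_group(xs, zs))   # step 8: mean X and count by Z
for label, rows in mean_y_by_interval_and_z(xs, ys, zs, edges).items():
    print(label, [r["mean_y"] for r in rows])
```

The interval edges are computed once from all observations and then reused for the Z = 0 and Z = 1 subsets, so the three lines in the step 9 graph are plotted over identical intervals. Quantile-based intervals (equal counts per interval) would be an equally reasonable choice when X is very skewed.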

Graphs and tables should be labelled properly. Adjust the axes where the information in a graph is unclear. Make sensible choices of graph type: line graphs if the X variable is continuous, bar graphs if it is categorical (e.g., summer, winter, fall, spring).