Econometrics homework
Economic Measurement
Class 1 for Econometrics 1
Vincent Geloso
Focus on the « econ » part
Economics is based on a series of a priori assumptions and axioms
Theories have to be logically consistent
However, theories can each be internally consistent, apply to the same phenomenon, and still be mutually exclusive
E.g. signaling versus asymmetric information
Other theories are not mutually exclusive and apply to differing degrees in explaining a given applied situation
E.g. the American revolution and tax incidence
Econometrics is about sorting theories and their relevance to the applied situations
Question tells you the measurements you need
As such, the first step in any econometric effort is to ask a clear applied question:
Does class size affect schooling outcomes?
Is the effect of education on wages greater than the effect of an extra year of experience?
Did Quebec separatist governments affect Quebec’s economic growth? (Somers and Vaillancourt 2014; Geloso and Grier 2018)
Were French farmers less efficient farmers than English farmers? (Geloso et al. 2017)
Did the stock market crash in 1929 because of news about the Smoot-Hawley tariffs? (Beaudreaux 2015)
The question will tell you the variables you need
Class size and test scores
Data on schooling achievements (e.g. which university, what degree, what field) and wages
Support for separatism and GDP per capita
Sample of farm output data and cultural markers
Stock market data and news events
Once the question is set, two things should happen in your mind
You write out the function you are looking for, so as to convert the question into an economic form (see section 1.5 on page 13)
You picture the shape of the data!
[Figure: Skimmed milk (i.e. water), Canada. Monthly series of 45 observations, October 2015 to June 2019, with values between 4.92 and 5.25.]
Shape of data
Cross sections (different cases at a single point in time)
E.g. French and English Canadian farmers in Quebec in 1831
Time series (a single case over time)
E.g. the New York stock market daily data in 1929 and an index of good/bad news regarding trade tariffs
Pooled cross-sections (different cases over time)
E.g. taking the individuals in different annual surveys (such as the census) and putting the surveys together
Panel or longitudinal (same cases over time)
E.g. take taxpayers in 1989 and track their income to 2019, or track the French and English farms of 1831 across the different censuses until 1871
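The four shapes above can be told apart by asking two things: how many periods are observed, and whether the same cases come back. The sketch below is only a rule of thumb under that assumption; the function name `data_shape` and the case identifiers are hypothetical and not taken from any dataset in the course.

```python
from collections import Counter

def data_shape(records):
    """Classify a list of (case_id, period) pairs into one of the four shapes."""
    cases = {case for case, _ in records}
    periods = {period for _, period in records}
    if len(periods) == 1:
        return "cross section"        # different cases, one point in time
    if len(cases) == 1:
        return "time series"          # one case, many points in time
    # Several cases and several periods: is any case observed more than once?
    periods_per_case = Counter(case for case, _ in set(records))
    if max(periods_per_case.values()) > 1:
        return "panel"                # same cases followed over time
    return "pooled cross sections"    # new cases drawn in each period

# Hypothetical identifiers, for illustration only.
farms_1831 = [("farm_A", 1831), ("farm_B", 1831)]
nyse_1929 = [("NYSE", "1929-10-28"), ("NYSE", "1929-10-29")]
surveys = [("person_1", 1990), ("person_2", 2000)]
taxpayers = [("tp_1", 1989), ("tp_1", 2019), ("tp_2", 1989), ("tp_2", 2019)]

for name, data in [("farms", farms_1831), ("stock market", nyse_1929),
                   ("surveys", surveys), ("taxpayers", taxpayers)]:
    print(name, "->", data_shape(data))
```

Real panels are often unbalanced (some cases drop out), which is why the sketch only asks whether at least one case is observed in more than one period.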
Can you identify the types of data?
Irish emigration in the 19th century
Irish parishes and poor relief in the late 19th century
Types of variables
Nominal/categorical
A dataset of earnings where the state of origin (e.g. Texas, California) is included
It will be included as a dummy variable (i.e. Texas or not Texas). This is useful in many settings, such as panels where you cannot control for certain things (in Econometrics II, I will discuss this in great detail with the fixed effects model)
Ordinal
When you can make an ordinal but not cardinal ranking of things!
I can say that an unskilled worker will earn less than a skilled worker, but I cannot evaluate the actual distance in skills between the two. Here, we also use dummy variables
Interval/ratio measurement
A cardinal measure that can set the actual distance between different observations of the same variable
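To make the three types concrete, here is a minimal sketch of how a nominal variable (state) and an ordinal variable (skill) can be turned into 0/1 dummies, while a ratio variable (earnings) is kept as a number. The workers and their earnings are made up for illustration, and the helper `dummy` is a hypothetical name.

```python
# Hypothetical workers; the values are invented for illustration only.
workers = [
    {"state": "Texas", "skill": "unskilled", "earnings": 30000.0},
    {"state": "California", "skill": "skilled", "earnings": 52000.0},
    {"state": "Texas", "skill": "skilled", "earnings": 48000.0},
]

def dummy(value, category):
    """Return 1 if the observation belongs to the category, 0 otherwise."""
    return 1 if value == category else 0

rows = [
    {
        "texas": dummy(w["state"], "Texas"),      # nominal: Texas or not Texas
        "skilled": dummy(w["skill"], "skilled"),  # ordinal: only the ranking is used
        "earnings": w["earnings"],                # ratio: distances are meaningful
    }
    for w in workers
]

for row in rows:
    print(row)
```

Differences in `earnings` can be computed and compared across workers, whereas the `skilled` dummy only records which side of the ranking a worker falls on.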
Take the same datasets as earlier and identify the types of variables
Irish emigration in the 19th century
Irish parishes and poor relief in the late 19th century
Birth rates in Irish parishes in the late 19th century
Birth and death rates in Quebec, 1688 to 1858
A quick note of caution (not on the exam)
Whatever results you get are only as good as the data you used to answer the question! Sometimes, people forget this and get false results.
As we will see later in this class, we use data to make inferences. Some of these inferences may be fallacious!
Example: Simpson’s Paradox and the ecological inference fallacy
Example: The situation of the French language in Quebec (Arsenault Morin and Geloso 2018)
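A small numerical sketch may help to see how Simpson's paradox arises. The counts below are entirely hypothetical (they come from no study cited here) and are chosen so that small classes do better within each school while large classes look better once the schools are pooled.

```python
# Hypothetical counts: (students who passed, students enrolled).
counts = {
    ("small class", "school 1"): (81, 87),
    ("large class", "school 1"): (234, 270),
    ("small class", "school 2"): (192, 263),
    ("large class", "school 2"): (55, 80),
}

def pass_rate(class_type, school=None):
    """Pass rate for a class type, within one school or pooled over both."""
    passed = total = 0
    for (ctype, sch), (p, n) in counts.items():
        if ctype == class_type and (school is None or sch == school):
            passed += p
            total += n
    return passed / total

for school in ("school 1", "school 2", None):
    label = school or "pooled"
    small = pass_rate("small class", school)
    large = pass_rate("large class", school)
    print(f"{label}: small = {small:.3f}, large = {large:.3f}")
```

The reversal happens because, in this made-up example, most small classes are in the school with the lower pass rates, so pooling mixes the class-size effect with the school effect.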
Assembling the data
As we will see later (weeks 4 and 5), there are populations and samples. Populations are generally very large and it is daunting to survey the whole population.
Censuses can do that, but they are costly and there are still problems that make them deviate from the true population features (i.e. people say shit on censuses).
You must also consider the right population (if you want to assess wage effect of going to Harvard University, what is the right population?)
The features of a population are known as parameters
You must assemble a sample then!
Ideally, you want random samples (see page 11 of textbook)
Statistics (or estimators) computed in the sample are the counterparts of the population parameters: the sample is used to infer statements about the parameters (which are generally unknown)
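The distinction between a parameter and a statistic can be sketched with a simulation. Everything here is an assumption made for illustration: the population is simulated (normally distributed incomes with a mean of 50,000 and a standard deviation of 12,000), and the sample size of 500 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simulated population of 100,000 incomes (assumed, not real data).
population = [rng.gauss(50_000, 12_000) for _ in range(100_000)]
mu = statistics.fmean(population)      # parameter: usually unknown in practice

# Simple random sample without replacement.
sample = rng.sample(population, 500)
x_bar = statistics.fmean(sample)       # statistic: our estimate of mu

print(f"population mean (parameter): {mu:,.0f}")
print(f"sample mean (statistic):     {x_bar:,.0f}")
```

In applied work only `x_bar` is observed; the simulation lets us peek at `mu` to see that a random sample lands close to it.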
The way to write the data
One thing that you will frequently see in econometrics is the use of the log form (written ln when the natural logarithm is used)
The log of a number = the power to which a given base must be raised to equal that number
The log of 100 to base 10 is 2 because 10^2 = 100
The natural log uses the base e: as n goes to infinity, (1 + 1/n)^n tends to e = 2.71828...
Try it in Excel (use 400 rows, one for each value of n, and compute the expression above in each row; the result will tend to 2.71828)
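The same experiment can be run outside a spreadsheet. The sketch below assumes the expression intended on this slide is the standard limit (1 + 1/n)^n, which is one common way to define e; the function name `compound` is hypothetical.

```python
import math

def compound(n):
    """Value of (1 + 1/n)^n, which approaches e as n grows."""
    return (1 + 1 / n) ** n

for n in (1, 10, 100, 400, 1_000_000):
    print(f"n = {n:>9}: {compound(n):.6f}")

print(f"math.e      : {math.e:.6f}")
```

With n = 400 the value is still a little below e, which is why the slide says it "tends to" 2.71828 rather than equals it.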
The great virtue of logs for our purposes is that effects can be expressed in proportional terms (remember the use of the log-linear form in micro)
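One way to see the "proportional terms" point: the difference between two natural logs is approximately the proportional change between the two numbers, as long as the change is small. The values below are arbitrary and only meant as a sketch.

```python
import math

def proportional_change(old, new):
    """Exact proportional change, e.g. 0.05 for a 5% increase."""
    return (new - old) / old

def log_difference(old, new):
    """Difference in natural logs, an approximation of the proportional change."""
    return math.log(new) - math.log(old)

for old, new in [(100.0, 105.0), (100.0, 110.0), (100.0, 200.0)]:
    print(f"{old:.0f} -> {new:.0f}: "
          f"exact = {proportional_change(old, new):.4f}, "
          f"log difference = {log_difference(old, new):.4f}")
```

For a 5% increase the log difference is about 0.0488, which is close; for a doubling it is about 0.693 against an exact change of 1, so the approximation should only be trusted for small changes.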
Index numbers
One easy way to convert large quantities of information is to use index numbers, a little like you did in macro classes with GDP
The downside with index numbers is that they can « drift » (see Gerschenkron 1947 for an example with Soviet GDP, 1913-1945). See appendix B in textbook (will be on exam)
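One reading of the « drift » problem is that measured growth depends on which year's prices are used as weights. The sketch below uses a made-up two-good economy (all prices and quantities are hypothetical) and compares an output index weighted with early prices to one weighted with late prices; the names `p_early`, `q_late`, and `value` are illustrative only.

```python
# Hypothetical two-good economy; every number here is invented.
p_early = {"grain": 1.0, "steel": 10.0}   # early-year prices
q_early = {"grain": 100.0, "steel": 10.0} # early-year quantities
p_late = {"grain": 2.0, "steel": 5.0}     # late-year prices
q_late = {"grain": 110.0, "steel": 50.0}  # late-year quantities

def value(prices, quantities):
    """Total value of a basket of quantities at the given prices."""
    return sum(prices[good] * quantities[good] for good in prices)

# Output index (early year = 100) using early-year prices as weights.
early_weighted = 100 * value(p_early, q_late) / value(p_early, q_early)

# Output index (early year = 100) using late-year prices as weights.
late_weighted = 100 * value(p_late, q_late) / value(p_late, q_early)

print(f"early-price weights: {early_weighted:.0f}")
print(f"late-price weights:  {late_weighted:.0f}")
```

With these numbers the early-weighted index reaches 305 while the late-weighted index reaches 188: the good whose output grew fastest (steel) was expensive early on, so early prices give it a heavy weight.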
Trends and fluctuations
One important form of data is time series
We do not discuss this in great detail in Econometrics I; we will do so in Econometrics II
But there are things to know, largely because economics uses time series very often
Trends and fluctuations are the main issue
You can decompose them! One way is to use the regressions we will see later with a « time trend » variable to control for the trend.
The other way is to use moving averages (which give you the trend); you can then divide the actual values by the moving average to get the detrended movements (i.e. how things move around the trend)
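Both approaches can be sketched in a few lines. The series is hypothetical, the window of 3 periods is an arbitrary choice, and the time trend is fitted with the textbook least-squares formula for a single regressor; treat this as an illustration rather than the procedure used later in the course.

```python
def centered_moving_average(series, window=3):
    """Centered moving average; None where the window does not fit."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i < half or i >= len(series) - half:
            out.append(None)
        else:
            out.append(sum(series[i - half:i + half + 1]) / window)
    return out

def linear_trend(series):
    """Least-squares intercept and slope of the series on t = 0, 1, 2, ..."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = sum(series) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
             / sum((t - t_bar) ** 2 for t in range(n)))
    return y_bar - slope * t_bar, slope

# Hypothetical series with an upward trend and some fluctuations.
series = [100.0, 104.0, 101.0, 107.0, 110.0, 106.0, 113.0]

trend_ma = centered_moving_average(series, window=3)
# Actual divided by moving average: movements around the trend.
detrended = [y / m if m is not None else None for y, m in zip(series, trend_ma)]

intercept, slope = linear_trend(series)
# Residuals from the time-trend line: the other way to remove the trend.
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]

print("ratio to moving average:", [None if d is None else round(d, 3) for d in detrended])
print("time-trend slope:", round(slope, 3))
print("residuals:", [round(r, 2) for r in residuals])
```

A ratio above 1 (or a positive residual) marks a period above trend. A longer window gives a smoother trend but loses more observations at both ends of the series.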