Econometrics homework
Theoretical Probability Distributions (and sampling first)
Class 3 for Econometrics 1
Vincent Geloso
Sampling
What we saw in class 2 is what we call « descriptive statistics », which is useful but is not the same as « inferential » statistics.
What do you mean by « inferential »? Statistics from which we make inferences (e.g. I use survey data to make inferences about the effect of marriage on women’s wages).
In other words: Drawing conclusions and/or making decisions concerning a population based only on sample data
Population vs. Sample
Population
Sample
Why Sample?
Less time consuming than a census
Less costly to administer than a census
It is possible to obtain statistical results of a sufficiently high precision based on samples.
The “proper” sample (like that guy)
Is your sample representative of the population?
This often amounts to asking how you sampled (were certain units more likely to be selected than others causing a “selection bias” or was the sample “random”).
E.g. Prison records from the 19th century on heights – do you think there is a selection bias? Is this sample representative?
E.g. The example of tax records mentioned in textbook (p.118)
E.g. Deaths reported to enumerators in the census in 19th century Canada
The proper sample
Look at this table (and forget all but the first three rows and the last one; we will get to the others later). The goal of “inferential” statistics is to make an inference about a population value (such as the mean μ) using a sample value (such as the sample mean x̄).
Three tools are needed to do so, one of which is not actually available but which (see in a few slides) we can approximate thanks to the CENTRAL LIMIT THEOREM
The proper sample
Population distribution: Everyone…the whole lot of them … truthfully!
This is not available!
However, we can a) take the distribution of a sample and b) the distribution of all sample means (known as the sampling distribution of sample means)!
The first is straightforward, the second (b) requires explanation: imagine that there are 10,000 people whose income we want to measure in a population and that we drew a large number of samples of 1,000 people.
Take the mean of each sample and you can plot them on a graph.
Central limit theorem
The central limit theorem is quite useful (and we will see it in a second) because of the concept of the normal distribution (see last pages of chapter 2 that were not included in the previous lecture)
The Normal Distribution
A normal distribution is the « bell » curve you often see
For a large enough population, it will appear very smooth.
In samples, it does not always look that smooth (see slides on graphical presentation in class 2), but that is not an issue worth concerning ourselves with
(The smooth bell curve is, by the way, a theoretical construct that relies on two irrational numbers, e and π; it is not necessary to discuss them here)
You can construct it with two types of random variables
Continuous (any value – e.g. 1.1, 1.11484, 1.231, 2.1354)
Discrete (finite number of values – e.g. whole numbers such as « children » -- you cannot have 1.11 children)
The Normal Distribution
The random variable is then expressed in relation with another (e.g. children and mother’s income). Here, children would be on the X-axis (horizontal) and income of mothers on the Y axis. It could also simply be the number of persons at each point on the horizontal (such as figure 2.6 in textbook)
There are some violations of the normal distribution that exist out there (and they pose important econometric problems in terms of deriving estimators from a sample)
E.g. heights of military conscripts (heights are often used to proxy living standards) who had « minimum » height requirements.
However, these violations (which I only cover in econometrics II) are not that frequent and so we can make important use of the theoretical features of the normal distribution.
The Normal Distribution
The features of importance that we will exploit in econometrics are that
90% of all observations with a normal distribution will be within 1.645 standard deviations (see class 2) of the mean, leaving 5% in each tail
At 1.96 standard deviations, we jump to 95%
At 2.58 standard deviations, we jump to 99%
This suggests that we can standardize the distribution by asking what is the distance of any value of X expressed in units of standard deviation (this gives us something like the familiar Z-score)
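A minimal sketch of this standardization in Python, using only the standard library. The function names (`z_score`, `normal_cdf`, `coverage`) are illustrative and do not come from the textbook; the coverage shares are computed from the standard normal cumulative distribution rather than read from a table.

```python
import math

def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Area under the standard normal curve from negative infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coverage(k):
    """Share of observations within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

# With a mean of 100 and a standard deviation of 50, how far is X = 200 from the mean?
print(z_score(200, 100, 50))

# Share of a normal distribution within 1.645, 1.96 and 2.58 standard deviations.
for k in (1.645, 1.96, 2.58):
    print(k, round(coverage(k), 3))
```

The three printed shares should land close to the 90%, 95% and 99% figures listed above.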
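Comparing X and Z units
[Figure: the same normal curve on two horizontal scales. On the X scale (μ = 100, σ = 50) the values 100 and 200 are marked; on the Z scale (μ = 0, σ = 1) they correspond to 0 and 2.0.]
Note that the distribution is the same, only the scale has changed. We can express the problem in original units (X) or in standardized units (Z)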
Finding Normal Probabilities
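[Figure: normal curve f(x) with the points a and b marked, shown on the X scale (centred on µ) and on the Z scale (centred on 0).]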
You can convert to a Z-score to get a sense of the distribution
The standard normal distribution will often come with a table of Z-values (which are the same for everyone, as the normal distribution is a theoretical construct that we argue can approximate a given distribution)
For a given Z-value a, the table shows F(a) (the area under the curve from negative infinity to a)
Example: P(Z < 2.00) = .9772
[Figure: standard normal curve showing the area .9772 to the left of Z = 2.00.]
General Procedure for Finding Probabilities
To find P(a < X < b) when X is distributed normally:
Draw the normal curve for the problem in terms of X
Translate X-values to Z-values
Use the Cumulative Normal Table
Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)
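[Figure: the normal curve on the X scale (μ = 8, σ = 5) with 8 and 8.6 marked, and the same curve on the Z scale (μ = 0, σ = 1) with 0 and 0.12 marked.]
Z = (8.6 - 8.0) / 5.0 = 0.12, so P(X < 8.6) = P(Z < 0.12)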
Finding Normal Probabilities
Solution: Finding P(Z < 0.12)
Standardized Normal Probability Table (Portion)
| z | F(z) |
| .10 | .5398 |
| .11 | .5438 |
| .12 | .5478 |
| .13 | .5517 |
F(0.12) = 0.5478
P(X < 8.6) = P(Z < 0.12) = 0.5478
Upper Tail Probabilities
Suppose X is normal with mean 8.0 and standard deviation 5.0.
Now Find P(X > 8.6)
[Figure: normal curve on the X scale with 8.0 and 8.6 marked, and on the Z scale with 0 and 0.12 marked. The area to the left of 0.12 is 0.5478, so the area to the right is 1.0 - 0.5478 = 0.4522.]
P(X > 8.6) = P(Z > 0.12) = 1.0 - P(Z ≤ 0.12) = 1.0 - 0.5478 = 0.4522
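As a rough check on the table lookup, the same two probabilities can be computed directly. This is only a sketch: `normal_cdf` is a hypothetical helper name, and it uses the error function from Python's standard library in place of the printed table.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # translate the X-value into a Z-value
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 8.0, 5.0
lower_tail = normal_cdf(8.6, mu, sigma)  # P(X < 8.6) = P(Z < 0.12)
upper_tail = 1.0 - lower_tail            # P(X > 8.6) = 1 - P(Z <= 0.12)

print(round(lower_tail, 4), round(upper_tail, 4))
```

Rounded to four decimals, the two values should agree with the 0.5478 and 0.4522 obtained from the table.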
Probability distributions
Let’s glance at tables 2.6 and 2.7 in the textbook
Ways to see violations of normal distribution
A normal probability plot for data from a normal distribution will be approximately linear:
[Figure: The Normal Probability Plot. Three panels plot Percent (0 to 100) against Data: an approximately linear plot for normal data, and curved plots labelled Left-Skewed and Right-Skewed.]
Nonlinear plots indicate a deviation from normality
Central limit theorem
Which brings us back to the Central Limit Theorem
If the population has a normal distribution, the means of a large number of samples taken from that population (the sampling distribution of the sample means) will also be normally distributed
Even if the population is not normally distributed, the sampling distribution of x̄ will be approximately normally distributed, provided the samples are large enough (see next slide)
The mean of the sampling distribution of sample means (what a mouthful eh?) will be equal to μ (the great unknown because we do not know the population distribution)
On average, the mean of each sample will be equal to the mean of the sampling distribution
It follows logically that the known sample mean x̄ of any sample will, on average, be equal to μ
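The claims above can be illustrated with a small simulation. This is a sketch under loud assumptions: the population is simulated (a right-skewed, exponential "income" distribution for 10,000 people, not real data), and the sample size of 100 and the 2,000 samples are arbitrary choices made for speed.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed "income" population of 10,000 people (simulated, not real data).
population = [random.expovariate(1 / 30000) for _ in range(10_000)]
mu = statistics.fmean(population)      # population mean (known here only because we simulated it)
sigma = statistics.pstdev(population)  # population standard deviation

n = 100             # size of each sample (assumption)
num_samples = 2000  # number of samples drawn (assumption)

# Draw many samples and keep the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

print("population mean:     ", round(mu))
print("mean of sample means:", round(statistics.fmean(sample_means)))
print("SD of sample means:  ", round(statistics.stdev(sample_means)))
print("sigma / sqrt(n):     ", round(sigma / n ** 0.5))
```

Even though the simulated population is far from normal, the mean of the sample means should sit close to the population mean, and a histogram of `sample_means` should look roughly bell-shaped.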
Central limit theorem
The standard deviation of the sampling distribution of sample means is known as the « standard error » (SE). (See next slide on the standard error vs standard deviation. Thing to remember: the standard error tells you how spread out the sample means are around their mean.)
If you move back a few slides, you saw that we could standardize a normal distribution to say how far/close something was from the mean. Here, the SE tells us the shape of the curve (flat or quite concentrated) so that we can gauge the estimation error of any given sample mean x̄
This is where sample size is crucial
As the sample size increases, the different sample means cluster more tightly around the true population mean μ, and the smaller the error that can be attributed to a given sample mean x̄ (see previous slide for a partial glimpse at the importance of this).
We proceed as for the mean: we take the sample standard deviation (s) and divide it by the square root of the sample size, SE = s / √n (see image on the right and textbook p. 124)
Difference between standard deviation and standard error
Standard Deviation
First step of the standard error
Difference between standard deviation and standard error
Second step of the standard error
Last step of the standard error
Example
Standard deviation = 4.505
n=214
4.505/√214 = 0.308
Mean=15.636 +/- 0.308
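The arithmetic of this example can be reproduced in a few lines. A minimal sketch: `standard_error` is an illustrative name, and the alternative sample sizes in the loop are hypothetical values chosen only to show how the standard error shrinks as n grows.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: standard deviation divided by the square root of n."""
    return s / math.sqrt(n)

se = standard_error(4.505, 214)
print(round(se, 3))

# Same standard deviation, hypothetical sample sizes: the SE falls as n rises.
for n in (50, 214, 1000):
    print(n, round(standard_error(4.505, n), 3))
```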
PDF – not the electronic format *wink*
Imagine a discrete random variable (i.e. whole numbers)
How can we picture, for that random variable, all the values it can take? More precisely, what is the probability for each possible value?
The variable (X) takes i potential values (x₁, x₂, …, xᵢ). Each potential value has a certain probability of occurring that cannot be negative or exceed one (0 ≤ P(xᵢ) ≤ 1), and the sum of all probabilities must be equal to 1.
Translation: The chance of any random event occurring must be somewhere from impossible (zero) to absolutely certain (1) and the sum of all chances must equal 1 (100%).
Coin toss
Toss three (3) coins at the same time (notice: the result of each is independent of the others), what are the possible outcomes?
There are eight (8)
| 1st coin | 2nd coin | 3rd coin |
| T | T | T |
| T | T | H |
| T | H | T |
| H | T | T |
| T | H | H |
| H | H | T |
| H | T | H |
| H | H | H |
PDF – not the electronic format *wink*
0 heads / 1 in 8 or 12.5%
1 head / 3 in 8 or 37.5%
2 heads / 3 in 8 or 37.5%
3 heads / 1 in 8 or 12.5%
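The eight outcomes and their probabilities can be enumerated rather than listed by hand. A short sketch in Python, assuming fair coins so that each of the eight outcomes is equally likely; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Every possible result of tossing three coins: ('H','H','H'), ('H','H','T'), ...
outcomes = list(product("HT", repeat=3))

# Count how many outcomes contain 0, 1, 2 or 3 heads.
heads = Counter(outcome.count("H") for outcome in outcomes)

for k in sorted(heads):
    probability = heads[k] / len(outcomes)
    print(f"{k} heads: {heads[k]} in {len(outcomes)} or {probability:.1%}")

# The probabilities of all possible values must sum to 1.
total = sum(count / len(outcomes) for count in heads.values())
print(total)
```

Changing `repeat=3` to a larger number of coins gives a distribution with more bars and a smoother bell-like outline.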
Coin toss
Draw this and you’ll see that it looks like a normal curve!
Increase the sample size and it will get smoother and smoother (see below)
Z-score for the distribution of sample means, degrees of freedom and t-stat
The Z-score we computed above can be understood through the probability distribution for all possible samples from a given population.
Each of the successive values in the distribution of sample means has a probability of happening that is contingent on the standard error and the mean (of sample means). Here, we take the same equation but replace the standard deviation by the standard error of sample means (see slide 27): Z = (x̄ - μ) / SE
Comment (Vincent Geloso): Notice one thing in the equation. I used μ (the population average). This is because we are speaking about the population of samples that we use. I will do an illustration on the board.
Z-score for the distribution of sample means, degrees of freedom and t-stat
The problem is that we cannot exactly use Z-score in that case because we have a degrees of freedom problem. What are degrees of freedom?
Imagine that you had to pick a sample. The first values you pick are freely taken. However, the last value is assigned to you (it is imposed). That is a lost degree of freedom. Since we use the sample mean in the formula above, we lose a degree of freedom in making statements about the distribution. We had n degrees of freedom and we lost one, leaving n - 1 (see panel 5.3 in textbook p.133 for full explanation). Thus we now get the t-stat (whose distribution is known as Student’s t distribution)
T-stat
Z-score for the distribution of sample means, degrees of freedom and t-stat
The t-stat has, like the Z-score, a theoretical distribution (see page 135 in textbook) that we can use to get the values that we need. In fact, as long as we don’t know the population’s standard deviation, the t-stat is superior to the Z-score. It is especially superior if the sample is relatively small.
T-stat table
Confidence interval
Thanks to elements of the normal distribution, we can link with some of what we covered in class 2.
For example, if we know the sample mean x̄, how well does it speak to the population mean μ?
While on average the different means of a population of equivalent samples may imitate μ (by virtue of the central limit theorem), each individual sample mean can be somewhat off from the true value
We can create an interval and that interval links the first moment (mean) with second moment (standard deviation)
Confidence interval
This interval will be particularly important once we get to hypothesis testing (just after we deal with regressions – chapters 6 and 7 in textbook).
But think of what you often hear when opinion polls come out:
“19 times out of 20, our poll of Canadians, which places the Conservatives at 31% and the Liberals at 34%, will be within 3 percentage points of the true values.” The pollsters are saying that this is their confidence interval!
It means that they expect that, 95% of the time (19 out of 20), the interval around their value (which we can replace by x̄ for our purposes) will include the true population value!
This is x̄ ± (t × SE)
Confidence interval: example
Take this sample of 214 parishes in England and take the birth rate variable (BRTHRATE)
The standard error (SE) for this sample is 0.308 births per 100 families and the average is 15.636 births per 100 families.
The t-stat (not shown in the table above or in the textbook, but which you can get by entering 214-1 degrees of freedom and the threshold for decision you want with it) = 1.971
15.636 ± (1.971 × 0.308) = 15.636 ± 0.607. There is a 95% probability that a range from 15.029 to 16.243 births per 100 families will include the population mean (which we can't know)
This is akin to saying that our sample of 214 parishes would, 19 times out of 20, be right (however, it's hard to know until we get to hypothesis testing whether a single sample is the one out of 20; see page 139).
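The interval in this example can be recomputed as follows. A minimal sketch: `confidence_interval` is a hypothetical helper name, and the t-value of 1.971 is taken as given from the slide rather than computed, since Python's standard library has no Student's t distribution.

```python
def confidence_interval(mean, se, t_value):
    """Return (lower, upper) bounds: mean plus or minus t_value times the standard error."""
    margin = t_value * se
    return mean - margin, mean + margin

# Values from the example: 214 parishes, BRTHRATE variable.
low, high = confidence_interval(mean=15.636, se=0.308, t_value=1.971)
print(round(low, 3), round(high, 3))
```

For the practice question, the same function can be reused with the mean, standard error and t-value of the smaller county sample; a smaller n raises both the standard error and the t-value, so the interval should come out wider.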
Practice question
Take this sample of 214 parishes in England, but use only county #2 (in column A)
Redo the example above for the n in that county only and explain your results relative to those we obtained above. Why are they different?
[Figure: histogram titled "Intellect of US Presidents, 1789-2016"; horizontal axis: Intellect (-2 to 3); vertical axis: Frequency (0 to 15).]