Econometrics homework

Class2.pptx

The Presentation of Data

Class 2 for Econometrics 1

Vincent Geloso

Class organization

On Monday, we go through the slides, dividing the class into two parts:

Graphical presentation

Numerical presentation

On Wednesday, I will complete the lecture and introduce you to Stata

NOTE: The homework I will assign may involve the use of Stata

Graphical presentation

Normally (and this is a dirty trick in econometrics), you can see where things are going for any « applied » topic with a few basic graphical depictions.

Graphics from textbook

An illustration from economic history

It was the heat of the « moment »!

First moment: central tendency (mean)

Second moment: dispersion (variance)

Third moment: skewness (shape of the curve)

Fourth moment: kurtosis (the tails, or the sharpness of the peak, of the distribution; kurtosis has its etymology in the Greek for « arching »)
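To make the four moments concrete, here is a minimal Python sketch (Python is our choice here; the course itself uses Stata). It computes the moments for the sample data used later in these slides. It assumes the plain moment definitions, which divide by n. Software that applies small-sample corrections will report slightly different skewness and kurtosis values.

```python
from statistics import fmean


def four_moments(xs):
    """Mean, variance, skewness and kurtosis using moment (divide-by-n) definitions."""
    n = len(xs)
    mean = fmean(xs)                       # first moment: central tendency
    devs = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in devs) / n     # second central moment: dispersion
    m3 = sum(d ** 3 for d in devs) / n     # third central moment
    m4 = sum(d ** 4 for d in devs) / n     # fourth central moment
    skewness = m3 / m2 ** 1.5              # standardized third moment (0 for a symmetric distribution)
    kurtosis = m4 / m2 ** 2                # standardized fourth moment (3 for a normal curve)
    return mean, m2, skewness, kurtosis


data = [10, 12, 14, 15, 17, 18, 18, 24]
mean, variance, skewness, kurtosis = four_moments(data)
print(f"mean={mean}, variance={variance}, skewness={skewness:.3f}, kurtosis={kurtosis:.3f}")
```

Each higher moment reuses the mean computed first. This is the same pattern the skewness slides point out later.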

Measures of Central Tendency

Measures of central tendency tell you where the « centre » of the data lies (duh!); the mean is the first moment

But think about it! « Central » refers to a single thing (there can be only one centre to any unit or, in our case, any distribution).

A measure of central tendency tells you what value the distribution moves around!

Measures of Central Tendency

Overview of central tendency:

Mean: arithmetic average

Median: midpoint of ranked values

Mode: most frequently observed value (if one exists)

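Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency (but it is affected by extreme values!)

For a population of N values: \( \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \)

For a sample of size n: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \)

where N is the population size, n is the sample size, and the \( x_i \) are the population values or the observed (sample) values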

Weighted mean

\( \bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \)

where \( w_i \) is the weight applied to each observation \( x_i \)

Very useful when you want measures that are unaffected by changes in other variables (such as the composition of a population)

For example: Cuba has an increasing death rate but also an aging population. If you used the death rates in each age group as they evolve over time but held the weight of each group fixed at, say, its 1995 value, would you get a different evolution?

This relates a little to when we mentioned Simpson’s paradox (see this great clip on why these kinds of adjustments may matter a lot!)
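The Cuba question can be sketched as a weighted mean in Python. All rates and population shares below are hypothetical numbers chosen for illustration. They are not actual Cuban data. The sketch shows how holding the age weights fixed (a direct age standardization) can reverse the apparent trend.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# HYPOTHETICAL age-specific death rates (per 1,000) for young, middle-aged and old groups
rates_1995 = [2.0, 5.0, 40.0]
rates_later = [1.5, 4.0, 35.0]    # every age group's death rate falls
# HYPOTHETICAL population shares: the population ages between the two dates
shares_1995 = [0.30, 0.55, 0.15]
shares_later = [0.20, 0.50, 0.30]

crude_1995 = weighted_mean(rates_1995, shares_1995)           # about 9.35
crude_later = weighted_mean(rates_later, shares_later)        # about 12.8: the crude rate rises
standardized_later = weighted_mean(rates_later, shares_1995)  # about 7.9: with 1995 weights it falls
print(crude_1995, crude_later, standardized_later)
```

The crude death rate rises only because the weights shift toward the oldest group. With the 1995 weights held fixed, mortality falls.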

Geometric Mean

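Geometric mean

Used to measure the rate of change of a variable over time

\( \bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} \)

Geometric mean rate of return

Measures the average per-period return of an investment over time

\( \bar{r}_g = [(1 + x_1) \times (1 + x_2) \times \cdots \times (1 + x_n)]^{1/n} - 1 \)

where \( x_i \) is the rate of return in time period i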

Example

An investment of $100,000 rose to $150,000 at the end of year one and increased to $180,000 at the end of year two: a 50% increase, then a 20% increase.

What is the mean percentage return over time?

Example

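Use the 1-year returns to compute the arithmetic mean and the geometric mean:

Arithmetic mean rate of return: \( \bar{x} = \frac{50\% + 20\%}{2} = 35\% \)

Misleading result: compounding 35% for two years turns $100,000 into $182,250, not $180,000

Geometric mean rate of return: \( \bar{r}_g = [(1 + 0.50) \times (1 + 0.20)]^{1/2} - 1 = \sqrt{1.80} - 1 \approx 34.16\% \)

Accurate result: compounding 34.16% for two years gives back the actual $180,000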
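A short Python sketch of the same example, using the two yearly returns from the slide (50% and 20%). The function names are ours and are only illustrative.

```python
import math


def arithmetic_mean_return(returns):
    """Simple average of the per-period rates of return."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """[(1 + x_1)(1 + x_2)...(1 + x_n)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


returns = [0.50, 0.20]   # $100,000 -> $150,000 -> $180,000
arith = arithmetic_mean_return(returns)
geo = geometric_mean_return(returns)
print(f"arithmetic: {arith:.2%}, geometric: {geo:.2%}")
# Compound each average over two years, starting from $100,000
print(f"{100_000 * (1 + arith) ** 2:,.0f} vs {100_000 * (1 + geo) ** 2:,.0f}")
```

Compounding the arithmetic average overshoots the final $180,000. The geometric average reproduces it, which is why it is the right summary for growth rates.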

Median

In an ordered list, the median is the “middle” number (50% above, 50% below)

Not affected by extreme values

[Figure: two data sets plotted on number lines from 0 to 10; both have median = 3]

Finding the Median

The location of the median: position \( \frac{n + 1}{2} \) in the ordered data

If the number of values is odd, the median is the middle number

If the number of values is even, the median is the average of the two middle numbers

Note that \( \frac{n + 1}{2} \) is not the value of the median, only the position of the median in the ranked data
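A minimal Python sketch of the position rule, assuming the data fit in a list. When the position falls between two ranks (even n), it averages the two neighbouring values.

```python
import math


def median(values):
    """Median found through its (n + 1) / 2 position in the ranked data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2                       # 1-based position, e.g. 3 for n = 5, 4.5 for n = 8
    lower = ordered[math.floor(position) - 1]    # value at or just below the position
    upper = ordered[math.ceil(position) - 1]     # value at or just above the position
    return (lower + upper) / 2                   # same value twice when n is odd


print(median([4, 1, 3, 5, 2]))                   # odd n: the middle number
print(median([10, 12, 14, 15, 17, 18, 18, 24]))  # even n: average of 15 and 17
```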

Mean and median

You can already learn something economically valuable from just these two values: the gap between them gives you an idea of how big an effect outliers have.

E.g. the ratio of the mean to the median, tracked over time, is sometimes used to study income inequality
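To see why the ratio reacts to outliers, here is a small Python sketch. The incomes are hypothetical, chosen only to show one very high earner pulling the mean away from the median.

```python
from statistics import mean, median


def mean_median_ratio(values):
    """Mean divided by median: 1 when they coincide, above 1 when high values pull the mean up."""
    return mean(values) / median(values)


incomes = [20, 25, 30, 35, 40]      # HYPOTHETICAL incomes, in thousands of dollars
with_top = incomes + [400]          # the same people plus one very high earner
print(mean_median_ratio(incomes))   # mean and median coincide
print(mean_median_ratio(with_top))  # the mean jumps, the median barely moves
```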

Mode

A measure of central tendency

Value that occurs most often

Not affected by extreme values

Used for either numerical or categorical data

There may be no mode

There may be several modes

[Figure: a data set on a number line from 0 to 14 with mode = 9, and a data set on a number line from 0 to 6 with no mode]
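A Python sketch covering the three cases on the slide: one mode, no mode, and several modes, for numerical or categorical data. One assumption: "no mode" is taken to mean every value appears exactly once. The standard library's `statistics.multimode` would instead return every value in that case, which is why the helper below is written by hand.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; an empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                     # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 3, 9, 9, 9, 12]))             # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))           # no mode
print(modes(["red", "blue", "blue", "red"]))  # categorical data, two modes
```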

Mean, median and mode

The three together should give you a rough (although not precise) idea of what your data distribution looks like

For example: if you have two modes, the distribution is bimodal (it has two peaks)

Shape of a Distribution

The mean and median can also give you an idea of the shape of the distribution!

Symmetric or skewed

Left-skewed: Mean < Median

Symmetric: Mean = Median

Right-skewed: Median < Mean

Measures of Dispersion

The problem with « first moments » is that, like « first impressions », they can mislead you!

Dispersion measures

[Figure: two distributions with the same center but different variation]

Measures of variation: range, interquartile range, variance, standard deviation, coefficient of variation

Measures of variation give information on the spread or variability of the data values.

The range

The range is probably the measure you will never end up using in great detail: it is often simply reported in economics papers for the sake of due diligence (as min/max)

It is the difference between the maximum and the minimum, \( X_{\max} - X_{\min} \), which has the obvious downside of ignoring everything in between

Percentiles and Quartiles

Percentiles and Quartiles indicate the position of a value relative to the entire set of data

Generally used to describe large data sets

Example: An IQ score at the 90th percentile means that 10% of the population has a higher IQ score and 90% have a lower IQ score.

Pth percentile = value located in the \( \frac{P}{100}(n + 1) \)th ordered position

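With percentiles, you can compute the interquartile range

Interquartile range

IQR = Q3 - Q1 (the 75th percentile minus the 25th percentile): the spread of the middle 50% of the data, which is not affected by extreme values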
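The (P/100)(n + 1) position rule, and the IQR built from it, can be sketched in Python. Two choices here are assumptions, since the slide gives only the position: when the position falls between two ranks, we interpolate linearly; positions below 1 or above n are clamped to the first or last value.

```python
import math


def percentile(values, p):
    """Value at the (P/100)(n + 1)th ordered position, interpolating between neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    position = (p / 100) * (n + 1)          # 1-based ordered position
    position = min(max(position, 1), n)     # assumption: clamp to the first/last value
    whole = math.floor(position)
    fraction = position - whole
    if whole == n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)   # assumption: linear interpolation


def interquartile_range(values):
    """IQR = Q3 - Q1 = 75th percentile minus 25th percentile."""
    return percentile(values, 75) - percentile(values, 25)


data = [10, 12, 14, 15, 17, 18, 18, 24]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
print(interquartile_range(data))
```

Statistical packages disagree on percentile conventions. NumPy's `percentile`, for instance, exposes several through its `method` argument, so small differences from software output are expected.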

Practical illustration

Income at P25, P50 (median) and P75, Delaware 1926-1938

Year   Q1 (P25)   Median (P50)   Q3 (P75)
1926    372.08      512.33       1520.11
1927    313.48      441.83       1308.59
1928    357.39      696.79       1551.48
1929    378.66      744.75       1660.83
1930    386.70      720.02       1377.74
1931    248.94      352.77        925.92
1932    288.13      396.60        967.53
1933    289.17      399.19        917.62
1934    285.32      399.71        876.02
1935    235.54      334.63        715.34
1936    378.65      662.92       1352.93
1937    402.04      879.74       1562.87
1938    390.39      681.50       1456.16

(Values rounded to two decimals.)
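A Python sketch that computes the interquartile range for each year from the Delaware Q1 and Q3 figures reported for 1926-1938, rounded to two decimals. Nothing here goes beyond the numbers on the slide.

```python
# Delaware Q1 (P25) and Q3 (P75) income, 1926-1938, rounded to two decimals
years = list(range(1926, 1939))
q1 = [372.08, 313.48, 357.39, 378.66, 386.70, 248.94, 288.13,
      289.17, 285.32, 235.54, 378.65, 402.04, 390.39]
q3 = [1520.11, 1308.59, 1551.48, 1660.83, 1377.74, 925.92, 967.53,
      917.62, 876.02, 715.34, 1352.93, 1562.87, 1456.16]

# Interquartile range (Q3 - Q1) for each year
iqr_by_year = {year: hi - lo for year, lo, hi in zip(years, q1, q3)}
narrowest = min(iqr_by_year, key=iqr_by_year.get)
widest = max(iqr_by_year, key=iqr_by_year.get)

for year, spread in iqr_by_year.items():
    print(year, round(spread, 2))
print("narrowest:", narrowest, "widest:", widest)
```

With these figures the IQR is widest in 1929 and narrowest in 1935. That pattern is worth comparing with the Depression-era movements in the median itself.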

Interquartile range

Very useful in discussions of income inequality, especially the « Great Compression » of the 1940s-1950s (incomes below the 90th percentile grew and moved closer together)

Measures of Dispersion

The coefficient of variation is quite useful even without any econometrics, by the way. It can serve to assess market integration and convergence to the law of one price over time in economic history.

Population Variance

Average of squared deviations of values from the mean

Population variance: \( \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \)

Where

\( \mu \) = population mean

N = population size

\( x_i \) = ith value of the variable x

Sample Variance

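Average (approximately) of squared deviations of values from the mean: we divide by n - 1 rather than n, which makes \( s^2 \) an unbiased estimator of the population variance

Sample variance: \( s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \)

Where

\( \bar{x} \) = arithmetic mean

n = sample size

\( x_i \) = ith value of the variable x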
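The only difference between the two variance formulas is the divisor. This Python sketch uses a small hypothetical data set to show the effect.

```python
def population_variance(xs):
    """sigma^2: squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


values = [2, 4, 4, 4, 5, 5, 7, 9]    # HYPOTHETICAL data, mean = 5
print(population_variance(values))   # sum of squared deviations 32, divided by 8
print(sample_variance(values))       # the same 32, divided by 7
```

Python's `statistics` module follows the same two conventions in `pvariance` and `variance`. As n grows, the gap between the two divisors shrinks.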

Population Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Population standard deviation: \( \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \)

Sample Standard Deviation

Most commonly used measure of variation

Shows variation about the mean

Has the same units as the original data

Sample standard deviation: \( s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \)

Comparing Standard Deviations

[Figure: three data sets plotted on number lines from 11 to 21]

Data A: s = 3.338 (compare to the two cases below)

Data B: s = 0.926 (values are concentrated near the mean)

Data C: s = 4.570 (values are dispersed far from the mean)

Mean = 15.5 for each data set

Measuring dispersion

[Figure: a distribution with a small standard deviation compared with one with a large standard deviation]

Calculation Example: Sample Standard Deviation

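Sample data (\( x_i \)): 10 12 14 15 17 18 18 24

n = 8, mean \( \bar{x} = 16 \)

\( s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095 \)

A measure of the “average” scatter around the mean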
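The same calculation, step by step in Python on the slide's sample data. The last line cross-checks the hand computation against the standard library.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                               # 128 / 8 = 16.0
squared_devs = [(x - x_bar) ** 2 for x in data]     # 36, 16, 4, 1, 1, 4, 4, 64
ss = sum(squared_devs)                              # sum of squared deviations: 130.0
s = math.sqrt(ss / (n - 1))                         # sqrt(130 / 7)

print(f"s = {s:.4f}")
# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))
```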

Coefficient of Variation

Measures relative variation

Always expressed as a percentage (%)

Shows variation relative to mean

Can be used to compare two or more sets of data measured in different units

Population coefficient of variation: \( CV = \frac{\sigma}{\mu} \times 100\% \)

Sample coefficient of variation: \( CV = \frac{s}{\bar{x}} \times 100\% \)

Comparing Coefficient of Variation

Stock A:

Average price last year = $50

Standard deviation = $5

Coefficient of variation = $5 / $50 × 100% = 10%

Stock B:

Average price last year = $100

Standard deviation = $5

Coefficient of variation = $5 / $100 × 100% = 5%

Both stocks have the same standard deviation, but stock B is less variable relative to its price
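A one-function Python sketch of the stock comparison, using the slide's means and standard deviations.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) x 100, in percent."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # stock A: 10.0 (%)
cv_b = coefficient_of_variation(5, 100)   # stock B: 5.0 (%)
print(f"Stock A: {cv_a}%, Stock B: {cv_b}%")
```

Because the CV is unit-free, the same function works for prices in different currencies or for series measured in entirely different units.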

The coefficient of variation

The coefficient of variation is also super useful when it comes time to measure « market integration ».

Remember that in micro (and macro), you were shown the law of one price (prices net of transport costs should converge and, if there are no costs to transport, equalize).

How do you measure convergence?

The coefficient of variation of prices across all locations is one way: compute it period by period, and if markets are integrating, it should fall over time!
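A Python sketch of the convergence test. The prices are hypothetical quotes for one good in four towns at two dates, invented for illustration. We also assume the sample standard deviation is the right dispersion measure across towns.

```python
from statistics import mean, stdev


def cv_percent(prices):
    """Sample coefficient of variation of prices across locations, in percent."""
    return 100 * stdev(prices) / mean(prices)


# HYPOTHETICAL prices of the same good in four towns, early and late in a period
prices = {
    "early": [10.0, 14.0, 8.0, 12.0],
    "late": [10.5, 11.5, 10.0, 11.0],
}
for period, quotes in prices.items():
    print(period, round(cv_percent(quotes), 1))
# A falling CV across towns is consistent with convergence toward the law of one price
```

In real work you would compute this for every period in your sample and look at the trend, keeping in mind that transport costs put a floor under how far price gaps can close.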

Skewness

Notice one important thing we did when dealing with measures of dispersion: their computation included measures of central tendency.

At each « moment », we use the previous moments to compute the new one.

In the present case, we want to know about skewness

Skewness

As we will see in the next unit, a lot of econometrics relies on the « normal distribution » (i.e., the bell curve), which is symmetric, so we need to know how far our data depart from symmetry.

The two best measures we will use in this class are the Pearson skewness coefficient and a variant of it
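The slides do not give the formula here. As an assumption, this Python sketch uses Pearson's median skewness coefficient, \( Sk = \frac{3(\bar{x} - \text{median})}{s} \), one common form of the Pearson measure; the "variant" mentioned above is not reproduced. The data sets are hypothetical.

```python
from statistics import mean, median, stdev


def pearson_median_skewness(xs):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (mean(xs) - median(xs)) / stdev(xs)


right_skewed = [1, 2, 2, 3, 10]   # HYPOTHETICAL: one large value pulls the mean up
left_skewed = [0, 7, 8, 8, 9]     # HYPOTHETICAL: one small value pulls the mean down

for name, xs in [("right", right_skewed), ("left", left_skewed)]:
    print(name, round(pearson_median_skewness(xs), 3))
```

The sign matches the mean/median rule from the shape-of-distribution slide. A positive value means Median < Mean (right-skewed), and a negative value means Mean < Median (left-skewed).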

Practice exercises

Do exercises 1 and 2 on page 66 (download data here)

Do them by hand!