Chapter 11

Running Your Own Regression Project

Copyright © 2011 Pearson Addison-Wesley. All rights reserved.

Slides by Niels-Hugo Blunch, Washington and Lee University


Choosing Your Topic

  • There are at least three keys to choosing a topic:

1. Try to pick a field that you find interesting and/or that you know something about

2. Make sure that data are readily available with a reasonable sample (we suggest at least 25 observations)

3. Make sure that there is some substance to your topic

– Avoid topics that are purely descriptive or virtually tautological in nature

– Instead, look for topics that address an inherently interesting economic or behavioral question or choice


Choosing Your Topic (cont.)

  • Places to look:
  • your textbooks and notes from previous economics classes
  • economics journals
  • For example, Table 11.1 contains a list of the journals cited so far in this textbook (in order of the frequency of citation)


Table 11.1a
Sources of Potential Topic Ideas


Table 11.1b
Sources of Potential Topic Ideas


Collecting Your Data

  • Before any quantitative analysis can be done, the data must be:
  • collected
  • organized
  • entered into a computer
  • Usually, this is a time-consuming and frustrating task because of:
  • the difficulty of finding data
  • the existence of definitional differences between theoretical variables and their empirical counterparts
  • the high probability of data entry errors or data transmission errors
  • But time spent thinking about and collecting the data is well spent, since a researcher who knows the data sources and definitions is much less likely to make mistakes using or interpreting regressions run on that data
  • We will now discuss three data collection issues in a bit more detail


What Data to Look For

  • Checking for data availability means deciding what specific variables you want to study:
  • dependent variable
  • all relevant independent variables
  • At least five issues to consider here:

1. Time periods:

  • If the dependent variable is measured annually, the explanatory variables should also be measured annually and not, say, monthly

2. Measuring quantity:

  • If the market for and/or the quality of a good has changed over time, it makes little sense to measure quantity in physical units
  • Example: TVs have changed so much over time that it makes more sense to use quantity in terms of monetary equivalent: more comparable across time


What Data to Look For (cont.)

3. Nominal or real terms?

  • Depends on theory – essentially: do we want to “clean” for inflation?
  • TVs, again: probably use real terms

4. Appropriate variable definitions depend on whether data are cross-sectional or time-series

  • TVs, again: national advertising would be a good candidate for an explanatory variable in a time-series model, while advertising in or near each state (or city) would make sense in a cross-sectional model

5. Be careful when reading (and creating!) descriptions of data:

  • Where did the data originate?
  • Are prices and/or income measured in nominal or real terms?
  • Are prices retail or wholesale?
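Issues 1 and 3 above become mechanical once the data are in hand. The Python sketch below uses made-up numbers and is not data from the text. It sums monthly flows into annual totals and averages a monthly price index into annual values. It then deflates the nominal series into real terms by multiplying by the index base (assumed here to be 100) and dividing by the index. The researcher still has to decide, on theoretical grounds, whether a variable should be summed or averaged and which price index to use.

```python
# Illustrative sketch only: the monthly figures and price index below are
# hypothetical numbers, not data from the text.

def monthly_to_annual(monthly, how):
    """Aggregate monthly values into annual values (12 months per year)."""
    if len(monthly) % 12 != 0:
        raise ValueError("need complete years of monthly data")
    years = [monthly[i:i + 12] for i in range(0, len(monthly), 12)]
    if how == "sum":    # flows, e.g. sales or spending
        return [sum(y) for y in years]
    if how == "mean":   # rates and indexes, e.g. a price index
        return [sum(y) / 12 for y in years]
    raise ValueError("how must be 'sum' or 'mean'")

def to_real(nominal, price_index, base=100.0):
    """Deflate nominal values into base-period prices: real = nominal * base / index."""
    return [n * base / p for n, p in zip(nominal, price_index)]

monthly_sales = [10.0] * 12 + [11.0] * 12   # nominal sales, two years
monthly_cpi = [100.0] * 12 + [110.0] * 12   # price index rises 10% in year 2

annual_sales = monthly_to_annual(monthly_sales, "sum")
annual_cpi = monthly_to_annual(monthly_cpi, "mean")
real_sales = to_real(annual_sales, annual_cpi)
print(annual_sales)  # [120.0, 132.0]
print(real_sales)    # [120.0, 120.0]
```

In this made-up example all of the 10% nominal growth is inflation, so real sales do not change.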


Where to Look for Economic Data

  • Although some researchers generate their own data through surveys or other techniques (see Section 11.3), the vast majority of regressions are run on publicly available data
  • Good sources here include:

1. Government publications:

  • Statistical Abstract of the U.S.
  • the annual Economic Report of the President
  • the Handbook of Labor Statistics
  • Historical Statistics of the U.S. (published in 1975)
  • Census Catalog and Guide


Where to Look for Economic Data (cont.)

2. International data sources:

  • U.N. Statistical Yearbook
  • U.N. Yearbook of National Account Statistics

3. Internet resources:

  • “Resources for Economists on the Internet”
  • Economagic
  • WebEC
  • EconLit (www.econlit.org)
  • “Dialog”
  • Links to these sites and other good sources of data are on the text’s Web site: www.pearsonhighered.com/studenmund


Missing Data

  • Suppose the data aren’t there?
  • What happens if you choose the perfect variable and look in all the right sources and can’t find the data?
  • The answer to this question depends on how much data are missing:

1. Only a few observations are missing:

  • in a cross-section study:
  • Can usually afford to drop these observations from the sample
  • in a time-series study:
  • May interpolate the missing value (e.g., by taking the mean of the adjacent values)
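Here is a minimal Python sketch of the two remedies above. It assumes missing values are coded as None and that each time-series gap is a single isolated period. Longer gaps and gaps at either end of the sample are left untouched, because averaging two neighbors is not defined there. All values and labels are hypothetical.

```python
# Sketch of the two remedies for a few missing observations.
# None marks a missing value; all numbers are hypothetical.

def drop_missing(rows):
    """Cross-section: drop every observation that has a missing value."""
    return [row for row in rows if None not in row]

def interpolate_isolated_gaps(series):
    """Time series: replace an isolated interior gap with the mean of its two neighbors."""
    filled = list(series)
    for t in range(1, len(series) - 1):
        if series[t] is None and series[t - 1] is not None and series[t + 1] is not None:
            filled[t] = (series[t - 1] + series[t + 1]) / 2
    return filled

# Cross-section: (state label, unemployment rate, median age); one value is missing.
states = [("S1", 5.1, 39.2), ("S2", None, 35.0), ("S3", 4.8, 38.3)]
print(drop_missing(states))            # the S2 row is dropped

# Time series: one missing year in the middle of the sample.
gdp = [100.0, 104.0, None, 110.0]
print(interpolate_isolated_gaps(gdp))  # [100.0, 104.0, 107.0, 110.0]
```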


Missing Data (cont.)

2. No data at all available (for a theoretically relevant variable!):

  • From Chapter 6, we know that this is likely to cause omitted variables bias
  • A possible solution here is to use a proxy variable
  • For example, the value of net investment is a variable that is not measured directly in a number of countries
  • Instead, one might use the value of gross investment as a proxy, on the assumption that the value of gross investment is directly proportional to the value of net investment
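The following Python simulation shows what the proportionality assumption implies. The data are hypothetical and the factor 2.5 is an arbitrary assumption. A made-up dependent variable is regressed first on net investment and then on a proxy equal to exactly 2.5 times net investment. If the proxy really is proportional, the slope is simply rescaled by 1/2.5, while the t-score is unchanged. In real data the proxy is only approximately proportional, so some measurement error remains.

```python
import random

def simple_ols(x, y):
    """Slope and its t-score from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)    # estimated error variance
    se = (s2 / sxx) ** 0.5
    return slope, slope / se

random.seed(1)
net_inv = [random.uniform(10, 50) for _ in range(30)]                # hypothetical
output = [1.0 + 0.05 * v + random.gauss(0, 0.5) for v in net_inv]   # hypothetical
gross_inv = [2.5 * v for v in net_inv]   # assumed exactly proportional proxy

b_net, t_net = simple_ols(net_inv, output)
b_proxy, t_proxy = simple_ols(gross_inv, output)
print(b_net / b_proxy)   # about 2.5: the slope is rescaled by 1/2.5
print(t_net, t_proxy)    # equal apart from rounding
```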


Advanced Data Sources

  • So far, all the data sets have been:

1. cross-sectional or time-series in nature

2. collected by observing the world around us, instead of being created

  • It turns out, however, that:

1. time-series and cross-sectional data can be pooled to form panel data

2. data can be generated through surveys

  • We will now briefly introduce these more advanced data sources and explain why it probably doesn't make sense to use these data sources on your first regression project:


Surveys

  • Surveys are everywhere in our society and are used for many different purposes; examples include:
  • marketing firms using surveys to learn more about products and competition
  • political candidates using surveys to fine-tune their campaign advertising or strategies
  • governments using surveys for all sorts of purposes, including keeping track of their citizens with instruments like the U.S. Census


Surveys (cont.)

  • While running your own survey might be tempting as a way of obtaining data for your own project, running a survey is not as easy as it might seem. Surveys:
  • must be carefully thought through; it’s virtually impossible to go back to the respondents and add another question later
  • must be worded precisely (and pretested) to avoid confusing the respondent or "leading" the respondent to a particular answer
  • must have samples that are random and avoid the selection, survivor, and nonresponse biases explained in Section 17.2
  • As a result, we don't encourage beginning researchers to run their own surveys...


Panel Data

  • Again, panel data are formed when cross-sectional and time-series data sets are pooled to create a single data set
  • Two main reasons for using panel data:
  • To increase the sample size
  • To provide an insight into an analytical question that can't be obtained by using time-series or cross-sectional data alone


Panel Data (cont.)

  • Example: suppose we’re interested in the relationship between budget deficits and interest rates but have only 10 years of annual data to study
  • But ten observations is too small a sample for a reasonable regression!
  • However, if we can find time-series data on the same economic variables (interest rates and budget deficits) for the same ten years for six different countries, we’ll end up with a sample of 10 × 6 = 60 observations, which is more than enough
  • The result is a pooled cross-section time-series data set: a panel data set!
  • Panel data estimation methods are treated in Chapter 16
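The pooling in this example amounts to stacking the six countries’ time series into one long data set with one row per country-year. Below is a minimal Python sketch with placeholder values; the country labels, years, and numbers are all hypothetical. It only builds the data set. It does not implement the panel estimation methods of Chapter 16, which would also account for differences among countries.

```python
import random

random.seed(0)
countries = ["A", "B", "C", "D", "E", "F"]   # six hypothetical countries
years = range(2000, 2010)                    # ten hypothetical years

# Hypothetical time-series data for each country: year -> (deficit, interest rate).
by_country = {
    c: {yr: (random.uniform(-2, 6), random.uniform(1, 8)) for yr in years}
    for c in countries
}

# Pool (stack) the six time series into one panel: one row per country-year.
panel = [
    (c, yr, deficit, rate)
    for c in countries
    for yr, (deficit, rate) in sorted(by_country[c].items())
]
print(len(panel))  # 6 countries x 10 years = 60 observations
```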


Practical Advice for Your Project

  • We now move to a discussion of practical advice about actually doing applied econometric work
  • This discussion is structured in three parts:

1. The 10 Commandments of Applied Econometrics (by Peter Kennedy)

2. What to check if you get an unexpected sign

3. A collection of a dozen practical tips, brought together from other sections of this text that are worth reiterating specifically in the context of actually doing applied econometric work


The 10 Commandments of Applied Econometrics

1. Use common sense and economic theory:

a. Example: match per capita variables with per capita variables, use real exchange rates to explain real imports or exports, etc.

2. Ask the right questions:

a. Ask plenty of (perhaps seemingly silly) questions to ensure that you fully understand the goal of the research

3. Know the context:

a. Become familiar with the history, institutions, operating constraints, measurement peculiarities, cultural customs, etc., underlying the subject under study

4. Inspect the data:

a. This includes calculating summary statistics, graphs, and data cleaning (including checking filters)

b. The objective is to get to know the data well

The 10 Commandments of Applied Econometrics (cont.)

5. Keep it sensibly simple:

a. Begin with a simple model and complicate it only if it fails

b. This applies both to the specification (variables, functional form, etc.) and to the estimation method

6. Look long and hard at your results:

a. Check that the results make sense, including signs and magnitudes

b. Apply the “laugh test”

7. Understand the costs and benefits of data mining:

a. “Bad” data mining: deliberately searching for a specification that “works” (i.e., “torturing” the data)

b. “Good” data mining: experimenting with the data to discover empirical regularities that can inform economic theory and be tested on a second data set


The 10 Commandments of Applied Econometrics (cont.)

8. Be prepared to compromise:

a. The Classical Assumptions are only rarely satisfied

b. Applied econometricians are therefore forced to compromise and adopt suboptimal solutions, the characteristics and consequences of which are not always known

c. Applied econometrics is necessarily ad hoc: we develop our analysis, including responses to potential problems, as we go along…

9. Do not confuse statistical significance with meaningful magnitude:

a. If the sample size is large enough, virtually any (two-sided) null hypothesis can be rejected, because a large sample makes the SEs very small

b. Substantive significance (i.e., “how large?”) is also important, not just statistical significance


The 10 Commandments of Applied Econometrics (cont.)

10. Report a sensitivity analysis:

a. Dimensions to examine:

i. the sample period

ii. the functional form

iii. the set of explanatory variables

iv. the choice of proxies

b. If results are not robust across the examined dimensions, then this casts doubt on the conclusions of the research
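A sensitivity analysis can be as simple as re-estimating the same coefficient under several specifications and reporting all of the estimates side by side. The Python sketch below uses simulated data in which x3 is irrelevant by construction. It uses a small hand-written OLS solver in place of a statistics package. It varies the set of explanatory variables (iii) and the sample period (i) and prints the coefficient on x1 each time. The data are simulated so that x1 is unrelated to the other regressors, so the estimates should be similar. In a real project, large swings across such runs would cast doubt on the conclusions, as point b says.

```python
import random

def ols(y, X):
    """OLS coefficients from the normal equations (X'X)b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]                   # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Simulated (hypothetical) data: y depends on x1 and x2; x3 is irrelevant by construction.
random.seed(42)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 1.5 * a - 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Each specification: (observations used, regressors besides the constant).
specs = {
    "x1 only, full sample": (range(n), [x1]),
    "x1 + x2, full sample": (range(n), [x1, x2]),
    "x1 + x2 + x3, full sample": (range(n), [x1, x2, x3]),
    "x1 + x2, first half": (range(n // 2), [x1, x2]),
}
b1 = {}
for name, (obs, cols) in specs.items():
    X = [[1.0] + [c[i] for c in cols] for i in obs]
    b1[name] = ols([y[i] for i in obs], X)[1]   # index 0 is the intercept
for name, est in b1.items():
    print(f"{name:<28} coefficient on x1 = {est:.3f}")
```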


What to Check If You Get an Unexpected Sign

1. Recheck the expected sign

Were dummy variables computed “upside down,” for example?

2. Check your data for input errors and/or outliers

3. Check for an omitted variable

The most frequent source of significant unexpected signs

4. Check for an irrelevant variable

A frequent source of insignificant unexpected signs

5. Check for multicollinearity

Multicollinearity increases the variances and standard errors of the estimated coefficients, increasing the chance that a coefficient could have an unexpected sign

What to Check If You Get an Unexpected Sign (cont.)

6. Check for sample selection bias

An unexpected sign can sometimes be due to the fact that the observations included in the data were not obtained randomly

7. Check your sample size

The smaller the sample, the larger the variances (and SEs) of the estimated coefficients, and so the greater the chance of an unexpected sign

8. Check your theory

If nothing else is apparently wrong, only two possibilities remain: the theory is wrong or the data are bad
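The following Python simulation illustrates check 3 using a made-up data-generating process. The true coefficient on x1 is +1.0, the omitted variable x2 has a coefficient of +3.0, and x2 is negatively correlated with x1. Dropping x2 biases the coefficient on x1 by (coefficient of x2) × (slope from regressing x2 on x1). Here that is roughly 3.0 × (-0.8) = -2.4, which is large enough to flip the estimated sign. The final check confirms that this decomposition holds exactly within the sample.

```python
import random

def cross_dev(u, v):
    """Sum of cross-products of deviations from the means (a sample covariance times n)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Hypothetical data-generating process: true coefficients are +1.0 on x1 and +3.0 on x2,
# and x2 is negatively correlated with x1.
random.seed(7)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [-0.8 * a + random.gauss(0, 0.5) for a in x1]
y = [1.0 + 1.0 * a + 3.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)

# Correct specification: regress y on x1 and x2
det = s11 * s22 - s12 ** 2
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

# Misspecified: x2 omitted, regress y on x1 alone
b1_short = s1y / s11

aux_slope = s12 / s11   # slope from regressing x2 on x1
print(f"with x2:    b1 = {b1_long:.2f}")    # close to +1.0
print(f"without x2: b1 = {b1_short:.2f}")   # negative: an unexpected sign
# In-sample bias decomposition: short coefficient = long coefficient + b2 * auxiliary slope
print(abs(b1_short - (b1_long + b2_long * aux_slope)) < 1e-9)
```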


A Dozen Practical Tips Worth Reiterating

1. Don’t attempt to maximize R̄², the adjusted R² (Chapter 2)

2. Always review the literature and hypothesize the signs of your coefficients before estimating a model (Chapter 3)

3. Inspect and clean your data before estimating a model. Know that outliers should not be automatically omitted; instead, they should be investigated to make sure that they belong in the sample (Chapter 3)

4. Know the Classical Assumptions cold! (Chapter 4)

5. In general, use a one-sided t-test unless the expected sign of the coefficient actually is in doubt (Chapter 5)

A Dozen Practical Tips Worth Reiterating (cont.)

6. Don’t automatically discard a variable with an insignificant t-score. In general, be willing to live with a variable with a t-score lower than the critical value in order to decrease the chance of omitting a relevant variable (Chapter 6)

7. Know how to analyze the size and direction of the bias caused by an omitted variable (Chapter 6)

8. Understand all the different functional form options and their common uses, and remember to choose your functional form primarily on the basis of theory, not fit (Chapter 7)

A Dozen Practical Tips Worth Reiterating (cont.)

9. Multicollinearity doesn’t create bias: the estimated variances are large, but the estimated coefficients themselves are unbiased. So the most-used remedy for multicollinearity is to do nothing (Chapter 8)

10. If you get a significant Durbin–Watson, Park, or White test, remember to consider the possibility that a specification error might be causing impure serial correlation or heteroskedasticity. Don’t change your estimation technique from OLS to GLS or use adjusted standard errors until you have the best possible specification. (Chapters 9 and 10)

A Dozen Practical Tips Worth Reiterating (cont.)

11. Adjusted standard errors like Newey–West standard errors or HC (heteroskedasticity-corrected) standard errors use the OLS coefficient estimates. It’s the standard errors of the estimated coefficients that change, not the estimated coefficients themselves. (Chapters 9 and 10)

12. Finally, if in doubt, rely on common sense and economic theory, not on statistical tests
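Tip 11 can be checked directly. The Python sketch below uses simulated data whose error spread grows with x. It computes the OLS slope once and then two standard errors for it. The first is the conventional SE, which assumes a constant error variance. The second is White’s HC0 heteroskedasticity-consistent SE. HC0 is one common variant, chosen here for illustration; regression packages also offer small-sample-adjusted versions. Newey–West standard errors, which address serial correlation, are not shown.

```python
import random

def ols_with_ses(x, y):
    """Simple OLS slope with its conventional SE and White (HC0) robust SE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Conventional SE assumes a constant error variance
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conv = (s2 / sxx) ** 0.5
    # HC0 SE lets each observation have its own error variance; the slope is untouched
    se_hc0 = sum((xi - mx) ** 2 * e ** 2 for xi, e in zip(x, resid)) ** 0.5 / sxx
    return slope, se_conv, se_hc0

random.seed(3)
x = [random.uniform(1, 10) for _ in range(100)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]   # error spread grows with x
slope, se_conv, se_hc0 = ols_with_ses(x, y)
print(f"slope = {slope:.3f}, conventional SE = {se_conv:.4f}, HC0 SE = {se_hc0:.4f}")
```

Both standard errors describe the same estimated slope, so switching between them changes t-scores and confidence intervals but never the coefficient itself.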


The Ethical Econometrician

  • We think that there are two reasonable goals for econometricians when estimating models:

1. Run as few different specifications as possible while still attempting to avoid the major econometric problems

  • The only exception is sensitivity analysis, described in Section 6.4

2. Report honestly the number and type of different specifications estimated so that readers of the research can evaluate how much weight to give to your results


Writing Your Research Report

  • Most good research reports have a number of elements in common:
  • A brief introduction that defines the dependent variable and states the goals of the research
  • A short review of relevant previous literature and research
  • An explanation of the specification of the equation (model):
  • Independent variables
  • functional forms
  • expected signs of (or other hypotheses about) the slope coefficients
  • A description of the data:
  • generated variables
  • data sources
  • data irregularities (if any)


Writing Your Research Report (cont.)

  • A presentation of each estimated specification, using our standard documentation format
  • If you estimate more than one specification, be sure to explain which one is best (and why!)
  • A careful analysis of the regression results:
  • discussion of any econometric problems encountered
  • complete documentation of all:
  • equations estimated
  • tests run
  • A short summary/conclusion that includes any policy recommendations or suggestions for further research
  • A bibliography
  • An appendix that includes all data, all regression runs, and all relevant computer output


Table 11.2a
Regression User’s Checklist


Table 11.2b
Regression User’s Checklist


Table 11.2c
Regression User’s Checklist


Table 11.2d
Regression User’s Checklist


Table 11.3a
Regression User’s Guide


Table 11.3b
Regression User’s Guide


Table 11.3c
Regression User’s Guide


Key Terms from Chapter 11

  • Choosing a research topic
  • Data collection
  • Missing data
  • Surveys
  • Panel data
  • The 10 Commandments of Applied Econometrics
  • What to Check If You Get An Unexpected Sign
  • A Dozen Practical Tips Worth Reiterating
  • The Ethical Econometrician
  • Writing your research report
  • A Regression User’s Checklist
  • A Regression User’s Guide
