Chapter5.pptx

Chapter 5 classical linear regression model assumptions and diagnostic tests

5.1 introduction

Five assumptions relating to the classical linear regression model (CLRM) were made.

Also saw that:

the estimator technique, ordinary least squares (OLS), had a number of desirable properties, and,

hypothesis tests regarding the coefficient estimates could validly be conducted.

Five assumptions:

(1) E(ut) = 0

(2) var(ut) = σ2 < ∞

(3) cov(ui, uj) = 0 for i ≠ j

(4) cov(ut, xt) = 0

(5) ut ∼ N(0, σ2)


These assumptions will now be studied further:

How can violations of the assumptions be detected?

What are the most likely causes of the violations in practice?

What are the consequences for the model if an assumption is violated but this fact is ignored and the researcher proceeds regardless?


The answer to the last of these questions is that, in general, the model could encounter any combination of three problems:

the coefficient estimates (the β̂s) are wrong

the associated standard errors are wrong

the distributions that were assumed for the test statistics are inappropriate.

A pragmatic approach to ‘solving’ problems associated with the use of models where one or more of the assumptions is not supported by the data will then be adopted.

Such solutions usually operate such that:

the assumptions are no longer violated, or

the problems are side-stepped, so that alternative techniques are used which are still valid.


5.2 statistical distributions for diagnostic tests

We look at various regression diagnostic (misspecification) tests that are based on the calculation of a test statistic.

These tests can be constructed in several ways, and the precise approach to constructing the test statistic will determine the distribution that the test statistic is assumed to follow.

Two particular approaches are in common use and their results are given by the statistical packages: the LM test and the Wald test. (Further details on these are given in chapter 8.)

For now, we need to know that LM test statistics in the context of the diagnostic tests presented here follow a χ2 (chi-squared) distribution with m degrees of freedom (equal to the number of restrictions placed on the model).

The Wald version of the test follows an F-distribution with (m, T – k) degrees of freedom.

Asymptotically, these two tests are equivalent, although the results will differ somewhat in small samples.

5.3 assumption 1: E(ut) = 0

The first assumption required is that the average value of the errors is zero.

If a constant term is included in the regression equation, this assumption will never be violated.

But what if financial theory suggests that, for a particular application, there should be no intercept so that the regression line is forced through the origin?

If the regression did not include an intercept, and the average value of the errors was non-zero, this could lead to several undesirable consequences.

First, R2, defined as ESS/TSS, can be negative, implying that the sample average, ȳ, ‘explains’ more of the variation in y than the explanatory variables.

Second, and more fundamentally, a regression with no intercept parameter could lead to potentially severe biases in the slope coefficient estimates.

To see this, consider figure 5.1.

Figure 5.1 Effect of no intercept on a regression line

The solid line shows the regression line estimated including a constant term; the dotted line shows the effect of suppressing (i.e. setting to zero) the constant term. The estimated line in this case is forced through the origin, so that the estimate of the slope coefficient is biased.

Additionally, R2 and R̄2 are usually meaningless in such a context, because the mean value of the dependent variable, ȳ, will not be equal to the mean of the fitted values from the model, i.e. the mean of ŷ, if there is no constant in the regression.
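The effect of suppressing the intercept can be illustrated numerically. The sketch below is only an illustration on made-up data (a true intercept of 10 and a true slope of -0.2, both hypothetical); it computes R2 as 1 - RSS/TSS, which is an assumption about how a package would report it when no constant is present.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and fitted values for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

def r_squared(y, fitted):
    """R^2 computed as 1 - RSS/TSS, with TSS measured around the sample mean."""
    rss = np.sum((y - fitted) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

# Hypothetical data: large intercept, small negative slope, deterministic "noise".
x = np.arange(1.0, 21.0)
y = 10.0 - 0.2 * x + 0.1 * np.sin(x)

X_const = np.column_stack([np.ones_like(x), x])   # with a constant term
X_origin = x.reshape(-1, 1)                       # line forced through the origin

beta_c, fit_c = ols_fit(X_const, y)
beta_o, fit_o = ols_fit(X_origin, y)

r2_const = r_squared(y, fit_c)
r2_origin = r_squared(y, fit_o)

print("with intercept:  slope =", beta_c[1], " R2 =", r2_const)
print("through origin:  slope =", beta_o[0], " R2 =", r2_origin)
print("mean of y:", y.mean(), " mean of fitted (no constant):", fit_o.mean())
```

In this example the slope estimate changes sign once the line is forced through the origin, and the mean of the fitted values no longer matches the mean of y, which is why R2 computed this way can fall below zero.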


5.4 assumption 2: var(ut) = σ2 < ∞

It has been assumed thus far that the variance of the errors is constant, σ2. This is known as the assumption of homoscedasticity.

If errors do not have a constant variance, they are said to be heteroscedastic.

Illustration of heteroscedasticity: Suppose that a regression had been estimated and the residuals, ût, have been calculated and then plotted against one of the explanatory variables, x2t, as shown in figure 5.2.

Figure 5.2 Graphical illustration of heteroscedasticity

It is clear that the errors in figure 5.2 are heteroscedastic – that is, although their mean value is roughly constant, their variance is increasing systematically with x2t.

5.4.1 DETECTION OF HETEROSCEDASTICITY

How can one tell whether the errors are heteroscedastic or not?

One option is a graphical method as above, but the cause or the form of the heteroscedasticity is rarely known, so that a plot is likely to reveal nothing.

For example, if the variance of the errors was an increasing function of x3t, and the researcher had plotted the residuals against x2t, he would be unlikely to see any pattern, and would thus wrongly conclude that the errors had constant variance.

It is also possible that the variance of the errors changes over time rather than systematically with one of the explanatory variables; this phenomenon is known as ‘ARCH’ (described in chapter 9).

A number of formal statistical tests for heteroscedasticity are available; the simplest is the Goldfeld-Quandt (1965) test.

Split the total sample of length T into two sub-samples of length T1 and T2; estimate the regression separately on each sub-sample and calculate the two residual variances; then test whether the two variances are different. (See p. 182-183 for the details of the test).
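The sub-sample comparison can be sketched in a few lines. This is a minimal illustration rather than the textbook's exact procedure: it assumes the statistic is the ratio of the larger to the smaller residual variance (compared in practice with an F critical value), it splits the sample at the midpoint, and the simulated data and the function names are hypothetical.

```python
import numpy as np

def residual_variance(X, y):
    """OLS residual variance s^2 = RSS / (T - k) for one sub-sample."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    T, k = X.shape
    return resid @ resid / (T - k)

def goldfeld_quandt(X, y, split):
    """Ratio of the larger to the smaller sub-sample residual variance."""
    s1 = residual_variance(X[:split], y[:split])
    s2 = residual_variance(X[split:], y[split:])
    return max(s1, s2) / min(s1, s2)

# Hypothetical data, ordered by x so that the split separates low and high x.
rng = np.random.default_rng(0)
T = 200
x = np.linspace(1.0, 10.0, T)
X = np.column_stack([np.ones(T), x])
u = rng.normal(0.0, 1.0, T)

y_hom = 1.0 + 0.5 * x + u        # constant error variance
y_het = 1.0 + 0.5 * x + x * u    # error standard deviation proportional to x

gq_hom = goldfeld_quandt(X, y_hom, T // 2)
gq_het = goldfeld_quandt(X, y_het, T // 2)
print("GQ ratio, homoscedastic errors:  ", gq_hom)
print("GQ ratio, heteroscedastic errors:", gq_het)
```

A ratio close to one is consistent with equal variances in the two sub-samples; a ratio well above one points towards heteroscedasticity.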

Another popular test is White’s (1980) general test for heteroscedasticity, which makes few assumptions about the likely form of the heteroscedasticity.

The test is carried out as in box 4.2.

1. Assume that the regression model estimated is of the standard linear form, e.g.

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t \qquad (5.2)$$

To test var(ut) = σ2, estimate the model above, obtaining the residuals, ût.

2. Then run the auxiliary regression

$$\hat{u}_t^2 = \alpha_1 + \alpha_2 x_{2t} + \alpha_3 x_{3t} + \alpha_4 x_{2t}^2 + \alpha_5 x_{3t}^2 + \alpha_6 x_{2t} x_{3t} + v_t \qquad (5.3)$$

where vt is a normally distributed disturbance term independent of ut.

This regression is of the squared residuals on a constant, the original explanatory variables, the squares of the explanatory variables and their cross-products.

To see why the squared residuals are the quantity of interest, recall that for a variable ut, the variance can be written

var(ut) = E[(ut – E(ut))2] (5.4)


Under the assumption that E(ut) = 0, the second part of the RHS of this expression disappears:

var(ut) = E[u2t] (5.5)

Once again, it is not possible to know the squares of the population disturbance, u2t, so their sample counterparts, the squared residuals, are used instead.


The auxiliary regression takes this form to investigate whether the variance of the residuals (embodied in û2t) varies systematically with any known variables relevant to the model.

Relevant variables will include the original explanatory variables, their squared values, and their cross-products.

This regression should include a constant term, even if the original regression did not. This is a result of the fact that û2t will always have a non-zero mean, even if ût has a zero mean.

3. Given the auxiliary regression, as stated above, the test can be conducted using two different approaches.

First, it is possible to use the F-test framework described in chapter 4. This would involve estimating (5.3) as the unrestricted regression and then running a restricted regression of û2t on a constant only; the RSS from each specification would then be used as inputs to the standard F-test formula.

An alternative approach can be adopted that does not require the estimation of a second (restricted) regression. This is known as a Lagrange Multiplier (LM) test, which centres around the value of R2 for the auxiliary regression.

If one or more coefficients in (5.3) is statistically significant, the value of R2 for that equation will be relatively high, while if none of the variables is significant, R2 will be relatively low.

The LM test would thus operate by obtaining R2 from the auxiliary regression and multiplying it by the number of observations, T. It can be shown that

TR2 ∼ χ2(m)

where m is the number of regressors in the auxiliary regression (excluding the constant term), equivalent to the number of restrictions that would have to be placed under the F-test approach.

4. The test is one of the joint null hypothesis that α2 = 0, and α3 = 0, and α4 = 0, and α5 = 0, and α6 = 0. For the LM test, if the χ2-test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the errors are homoscedastic.
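The four steps can be sketched as follows for the LM version of the test with two explanatory variables. This is an illustrative implementation on simulated data, not output from any particular package; the function names and the data-generating process are hypothetical, and the critical value 11.07 is the 5% χ2(5) value quoted in these notes.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def white_lm_statistic(x2, x3, y):
    """T * R^2 from regressing squared residuals on levels, squares and the cross-product."""
    T = len(y)
    X = np.column_stack([np.ones(T), x2, x3])                     # step 1: original model
    u_hat_sq = ols_residuals(X, y) ** 2
    Z = np.column_stack([np.ones(T), x2, x3, x2**2, x3**2, x2 * x3])  # step 2: auxiliary regressors
    v_hat = ols_residuals(Z, u_hat_sq)
    tss = np.sum((u_hat_sq - u_hat_sq.mean()) ** 2)
    r2 = 1.0 - (v_hat @ v_hat) / tss
    return T * r2                                                 # step 3: LM statistic

CHI2_5PCT_5DF = 11.07  # 5% critical value for chi-squared with m = 5 restrictions

# Hypothetical simulated data.
rng = np.random.default_rng(1)
T = 500
x2 = rng.uniform(1.0, 5.0, T)
x3 = rng.uniform(1.0, 5.0, T)
u = rng.normal(0.0, 1.0, T)

y_het = 1.0 + 2.0 * x2 - x3 + x2 * u   # error standard deviation proportional to x2
y_hom = 1.0 + 2.0 * x2 - x3 + u        # constant error variance

lm_het = white_lm_statistic(x2, x3, y_het)
lm_hom = white_lm_statistic(x2, x3, y_hom)

# Step 4: compare with the critical value.
print("heteroscedastic case:", lm_het, "reject H0:", lm_het > CHI2_5PCT_5DF)
print("homoscedastic case:  ", lm_hom, "reject H0:", lm_hom > CHI2_5PCT_5DF)
```

With more explanatory variables the auxiliary regression grows quickly, since every square and every pairwise cross-product is included, and m rises accordingly.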


Example 5.1

Suppose that the model (5.2) above has been estimated using 120 observations, and that R2 from the auxiliary regression (5.3) is 0.234. The test statistic will be given by TR2 = 120 × 0.234 = 28.08, which will follow a χ2(5) under the null hypothesis.

The 5% critical value from the χ2 table is 11.07. The test statistic is therefore greater than the critical value and hence the null hypothesis is rejected.

Conclusion: there is significant evidence of heteroscedasticity, so it would not be plausible to assume that the variance of the errors is constant in this case.

5.4.2 consequences of using ols in the presence of heteroscedasticity

What happens if the errors are heteroscedastic, but this fact is ignored and the researcher proceeds with estimation and inference?

OLS estimators will still give unbiased (and therefore also consistent) coefficient estimates, but they are no longer BLUE – that is, they no longer have the minimum variance among the class of unbiased estimators.

The reason: the error variance, σ2, plays no part in the proof that the OLS estimator is consistent and unbiased, but σ2 does appear in the formulae for the coefficient variances.

If the errors are heteroscedastic, the formulae presented for the coefficient standard errors no longer hold. [For a very accessible algebraic treatment of the consequences of heteroscedasticity, see Hill, Griffiths and Judge (1997, pp. 217-18)].

So, if OLS is still used in the presence of heteroscedasticity, the standard errors could be wrong and hence any inferences made could be misleading.

In general, the OLS standard errors will be too large for the intercept when the errors are heteroscedastic.

The effect of heteroscedasticity on the slope standard errors will depend on its form.


For example, if the variance of the errors is positively related to the square of an explanatory variable (which is often the case in practice), the OLS standard error for the slope will be too low.

On the other hand, the OLS slope standard errors will be too big when the variance of the errors is inversely related to an explanatory variable.


5.4.3 dealing with heteroscedasticity

If the cause of the heteroscedasticity is known, then an alternative estimation method which takes this into account can be used.

One possibility is called generalised least squares (GLS).

For example, suppose that the error variance was related to zt by the expression

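$$\operatorname{var}(u_t) = \sigma^2 z_t^2 \qquad (5.6)$$

All that would be required to remove the heteroscedasticity would be to divide the regression equation (5.2) through by zt

$$\frac{y_t}{z_t} = \beta_1 \frac{1}{z_t} + \beta_2 \frac{x_{2t}}{z_t} + \beta_3 \frac{x_{3t}}{z_t} + v_t \qquad (5.7)$$

where vt = ut/zt is an error term.

Now, if the error variance is as given in (5.6),

$$\operatorname{var}(v_t) = \operatorname{var}\!\left(\frac{u_t}{z_t}\right) = \frac{\operatorname{var}(u_t)}{z_t^2} = \frac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$$

for known z.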

Therefore, the disturbances from (5.7) will be homoscedastic.

Note that this latter regression does not include a constant since β1 is multiplied by (1/zt).

GLS can be viewed as OLS applied to transformed data that satisfy the OLS assumptions.

GLS is also known as weighted least squares (WLS), since under GLS a weighted sum of the squared residuals is minimised, whereas under OLS it is an unweighted sum.
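The transformation can be tried out on simulated data. The sketch below assumes, purely for illustration, that zt is observed and that σ = 1; the data-generating process, the helper `variance_ratio`, and the parameter values are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def variance_ratio(resid, z):
    """Residual variance for above-median z divided by that for below-median z."""
    high = z > np.median(z)
    return resid[high].var() / resid[~high].var()

# Hypothetical data with var(u_t) = sigma^2 * z_t^2 and sigma = 1.
rng = np.random.default_rng(2)
T = 400
z = rng.uniform(1.0, 5.0, T)
x2 = rng.normal(0.0, 1.0, T)
x3 = rng.normal(0.0, 1.0, T)
u = z * rng.normal(0.0, 1.0, T)
y = 1.0 + 2.0 * x2 - 1.0 * x3 + u

# OLS on the original data: residual spread grows with z.
X = np.column_stack([np.ones(T), x2, x3])
beta_ols, u_hat = ols(X, y)

# GLS: divide every term by z_t. There is no ordinary constant column;
# the column 1/z_t carries beta_1.
X_star = np.column_stack([1.0 / z, x2 / z, x3 / z])
y_star = y / z
beta_gls, v_hat = ols(X_star, y_star)

print("OLS estimates:", beta_ols, " variance ratio:", variance_ratio(u_hat, z))
print("GLS estimates:", beta_gls, " variance ratio:", variance_ratio(v_hat, z))
```

The variance ratio is a rough diagnostic chosen for this example only: a value near one for the transformed residuals is what homoscedastic disturbances would produce.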


Researchers are typically unsure of the exact cause of the heteroscedasticity; hence this technique is usually infeasible in practice.

Two other possible ‘solutions’ for heteroscedasticity are shown in box 5.2.

Box 5.2 ‘Solutions’ for heteroscedasticity

1. Transforming the variables into logs or reducing by some other measure of ‘size’. This has the effect of re-scaling the data to ‘pull in’ extreme observations.

The regression would then be conducted upon the natural logarithms or the transformed data.

Taking logarithms also has the effect of making a previously multiplicative model, such as the exponential regression model discussed previously (with a multiplicative error term), into an additive one.

However, logarithms of a variable cannot be taken in situations where the variable can take on zero or negative values, for the log will not be defined in such cases.

2. Using heteroscedasticity-consistent standard error estimates. Most standard econometrics software packages have an option that allows the user to employ standard error estimates that have been modified to account for the heteroscedasticity following White (1980).

5.4.4 testing for heteroskedasticity using eviews

Homework 6:

Replicate the test shown in this section (pages 186-188), present output from EViews, and interpret results on presence of heteroskedasticity (three tests are discussed in this section). Submit.

5.5 assumption 3: cov(ui, uj) = 0 for i ≠ j

Assumption 3 on the CLRM’s disturbance terms is that the covariance between the error terms over time (or cross-sectionally, for that type of data) is zero, i.e. the errors are uncorrelated with one another.

If the errors are not uncorrelated with one another, they are ‘autocorrelated’ or ‘serially correlated’. A test of this assumption is required.

Again, the population disturbances cannot be observed, so tests for autocorrelation are conducted on the residuals, û.

5.5.1 the concept of a lagged value

Before proceeding to formal tests for autocorrelation, the concept of the lagged value of a variable needs to be defined.

The lagged value of a variable (which may be yt, xt or ut) is simply the value that the variable took during a previous period.

For example, the value of yt lagged one period, written yt-1, can be constructed by shifting all of the observations forward one period in a spreadsheet, as illustrated in table 5.1.

Table 5.1 Constructing a series of lagged values and first differences

t          yt      yt-1    ∆yt
1989M09    0.8     -       -
1989M10    1.3     0.8     1.3 - 0.8 = 0.5
1989M11    -0.9    1.3     -0.9 - 1.3 = -2.2
1989M12    0.2     -0.9    0.2 - (-0.9) = 1.1
1990M01    -1.7    0.2     -1.7 - 0.2 = -1.9
1990M02    2.3     -1.7    2.3 - (-1.7) = 4.0
1990M03    0.1     2.3     0.1 - 2.3 = -2.2
1990M04    0.0     0.1     0.0 - 0.1 = -0.1
...        ...     ...     ...

So, the value in the 1989M10 row and the yt-1 column shows the value that yt took in the previous period, 1989M09, which was 0.8.

The last column in table 5.1 shows another quantity relating to y, namely the ‘first difference’.

The first difference of y, also known as the change in y, and denoted ∆yt, is calculated as the difference between the values of y in this period and in the previous period.

This is calculated as:

∆ yt = yt - yt-1 (5.8)

Note that when one-period lags or first differences of a variable are constructed, the first observation is lost.

Thus a regression of ∆yt using the above data would begin with the October 1989 data point.

It is also possible to produce two-period lags, three-period lags, and so on.

These would be accomplished in the obvious way.
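The construction of lags and first differences can be reproduced directly from the eight observations shown in the table. The helper `lag` below is a hypothetical name introduced for this sketch; missing values at the start of a lagged series are represented by NaN.

```python
import numpy as np

dates = ["1989M09", "1989M10", "1989M11", "1989M12",
         "1990M01", "1990M02", "1990M03", "1990M04"]
y = np.array([0.8, 1.3, -0.9, 0.2, -1.7, 2.3, 0.1, 0.0])

def lag(series, periods=1):
    """Shift observations forward by `periods`, padding the start with NaN."""
    out = np.full(series.shape, np.nan)
    out[periods:] = series[:-periods]
    return out

y_lag1 = lag(y, 1)      # y_{t-1}: the first observation is lost
y_lag2 = lag(y, 2)      # y_{t-2}: the first two observations are lost
dy = y - y_lag1         # first difference, as in (5.8)

for d, level, lagged, diff in zip(dates, y, y_lag1, dy):
    print(d, level, lagged, np.round(diff, 1))
```

The NaN entries make explicit which observations are lost, so a regression using ∆yt or yt-1 would simply drop the rows in which they appear.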

5.5.2 graphical tests for autocorrelation

In order to test for autocorrelation, it is necessary to investigate whether any relationships exist between the current value of û, ût, and any of its previous values, ût-1, ût-2, ...

The first step is to consider possible relationships between the current residual and the immediately previous one, ût-1, via a graphical exploration.

Thus ût is plotted against ût-1, and ût is plotted over time.

Some stereotypical patterns that may be found in the residuals are discussed below.

Figure 5.3 Plot of ût against ût-1, showing positive autocorrelation

Figure 5.4 Plot of ût over time, showing positive autocorrelation

Figures 5.3 and 5.4 show positive autocorrelation in the residuals, which is indicated by a cyclical residual plot over time. With positive autocorrelation, on average, if the residual at time t – 1 is positive, the residual at time t is likely to be positive as well; similarly, if the residual at t – 1 is negative, the residual at t is also likely to be negative.

Figure 5.3 shows that most of the dots representing observations are in the first and third quadrants.

Figure 5.4 shows that a positively autocorrelated series of residuals will not cross the time-axis very frequently.

Figure 5.5 Plot of ût against ût-1, showing negative autocorrelation

Figure 5.6 Plot of ût over time, showing negative autocorrelation

Figures 5.5 and 5.6 show negative autocorrelation, which is indicated by an alternating pattern in the residuals. With negative autocorrelation, on average, if the residual at t – 1 is positive, the residual at time t is likely to be negative; similarly, if the residual at t – 1 is negative, the residual at t is likely to be positive.

Figure 5.5 shows that most of the dots are in the second and fourth quadrants.

Figure 5.6 shows that a negatively autocorrelated series of residuals will cross the time-axis more frequently than if they were distributed randomly.

Figure 5.7 Plot of ût against ût-1, showing no autocorrelation

Figure 5.8 Plot of ût over time, showing no autocorrelation

Finally, figures 5.7 and 5.8 show no pattern in the residuals at all, which is the desirable property.

In the plot of ût against ût-1 (figure 5.7), the points are randomly spread across all four quadrants, and the time series plot of the residuals (figure 5.8) does not cross the x-axis either too frequently or too little.

5.5.3 detecting autocorrelation: the durbin-watson test

A first step in testing whether the residual series from an estimated model are autocorrelated would be to plot the residuals as above, looking for any patterns.

Graphical methods may be difficult to interpret in practice, however, and hence a formal statistical test should also be applied.

The simplest test is due to Durbin and Watson (1951).

Durbin-Watson (DW) is a test for first order autocorrelation, i.e. it tests only for a relationship between an error and its immediately previous value.

One way to motivate the test and to interpret the test statistic would be in the context of a regression of the time t error on its previous value

$$u_t = \rho u_{t-1} + v_t \qquad (5.9)$$

where vt ∼ N(0, σv2).

The DW test statistic has as its null and alternative hypotheses:

H0: ρ = 0 and H1: ρ ≠ 0

Thus, under the null hypothesis, the errors at time t - 1 and t are independent of one another, and if this null were rejected it would be concluded that there was evidence of a relationship between successive residuals.

In fact, it is not necessary to run the regression given by (5.9) since the test statistic can be calculated using quantities that are already available after the first regression has been run:

$$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2} \qquad (5.10)$$

The denominator of the test statistic is simply (the number of observations − 1) × the variance of the residuals.

This arises since, if the average of the residuals is zero,

$$\operatorname{var}(\hat{u}_t) = E(\hat{u}_t^2) = \frac{1}{T-1} \sum_{t=2}^{T} \hat{u}_t^2$$

so that

$$\sum_{t=2}^{T} \hat{u}_t^2 = \operatorname{var}(\hat{u}_t) \times (T - 1)$$

The numerator ‘compares’ the values of the error at times t – 1 and t.

If there is positive autocorrelation in the errors, this difference in the numerator will be relatively small; if there is negative autocorrelation, with the sign of the error changing very frequently, the numerator will be relatively large.

No autocorrelation would result in a value for the numerator between small and large.

It is also possible to express the DW statistic as an approximate function of the estimated value of ρ

$$DW \approx 2(1 - \hat{\rho}) \qquad (5.11)$$

where ρ̂ is the estimated correlation coefficient that would have been obtained from an estimation of (5.9).

To see why, expand the sum of squared differences between successive residuals into the sum of squares of ût, plus the sum of squares of ût-1, minus twice the sum of the cross-products ûtût-1. The two sums of squares differ only in their first and last terms, so in a large sample they are approximately equal.

Thus, it is possible to write

$$DW \approx 2\left(1 - \frac{\sum_{t=2}^{T} \hat{u}_t \hat{u}_{t-1}}{\sum_{t=2}^{T} \hat{u}_t^2}\right) \qquad (5.13)$$

so that the DW test statistic is approximately equal to 2(1 − ρ̂).

Since ρ̂ is a correlation, it implies that

−1 ≤ ρ̂ ≤ 1.

That is, ρ̂ is bounded to lie between −1 and +1.

Substituting in these limits for ρ̂ to calculate DW from (5.11) would give the corresponding limits for DW as 0 ≤ DW ≤ 4.

Consider now the implication of DW taking one of three important values (0, 2, and 4):

● ρ̂ = 0, DW = 2: there is no autocorrelation in the residuals. The null hypothesis would not be rejected if DW is near 2, since there is little evidence of autocorrelation.

● ρ̂ = 1, DW = 0: this corresponds to the case where there is perfect positive autocorrelation in the residuals.

● ρ̂ = −1, DW = 4: this corresponds to the case where there is perfect negative autocorrelation in the residuals.
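These three cases, and the approximation DW ≈ 2(1 − ρ̂), can be checked numerically. The sketch below is only an illustration: it computes DW as the sum of squared successive differences divided by the sum of squared residuals from the second observation onwards, estimates ρ̂ from a no-constant regression of the series on its own lag, and uses simulated first order autoregressive series in place of real regression residuals. All function names are hypothetical.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares from t = 2."""
    diff = np.diff(resid)
    return np.sum(diff ** 2) / np.sum(resid[1:] ** 2)

def rho_hat(resid):
    """Slope from regressing u_t on u_{t-1} without a constant."""
    return np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)

def simulate_ar1(rho, T, seed):
    """Simulate u_t = rho * u_{t-1} + v_t with standard normal v_t."""
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, 1.0, T)
    u = np.zeros(T)
    u[0] = v[0]
    for t in range(1, T):
        u[t] = rho * u[t - 1] + v[t]
    return u

results = {}
for rho in (0.8, 0.0, -0.8):
    u = simulate_ar1(rho, 2000, seed=3)
    results[rho] = (durbin_watson(u), 2.0 * (1.0 - rho_hat(u)))
    print("rho =", rho, " DW =", results[rho][0], " 2(1 - rho_hat) =", results[rho][1])

# Limiting cases: identical successive values and perfectly alternating values.
dw_perfect_positive = durbin_watson(np.ones(10))
dw_perfect_negative = durbin_watson(np.array([1.0, -1.0] * 5))
print(dw_perfect_positive, dw_perfect_negative)
```

The two limiting series are artificial (the constant series does not even have a zero mean), but they show where the bounds of 0 and 4 come from.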


The DW test does not follow a standard statistical distribution such as a t, F, or χ2.

DW has 2 critical values: an upper critical value (dU) and a lower critical value (dL); there is also an intermediate region where the null hypothesis of no autocorrelation can neither be rejected nor not rejected.

The rejection, non-rejection, and inconclusive regions are shown on the following number line:

0 to dL: reject H0, positive autocorrelation
dL to dU: inconclusive
dU to 4 − dU: do not reject H0, no evidence of autocorrelation
4 − dU to 4 − dL: inconclusive
4 − dL to 4: reject H0, negative autocorrelation

So, the null hypothesis is rejected and the existence of positive autocorrelation presumed if DW is less than the lower critical value;

the null hypothesis is rejected and the existence of negative autocorrelation presumed if DW is greater than 4 minus the lower critical value;

the null hypothesis is not rejected and no significant residual autocorrelation is presumed if DW is between the upper and 4 minus the upper limits.

Example 5.2

A researcher wishes to test for first order serial correlation in the residuals from a linear regression. The DW test statistic value is 0.86. There are 80 quarterly observations in the regression, and the regression is of the form

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + u_t \qquad (5.15)$$

The relevant critical values for the test (see table A2.6 in the appendix of statistical distributions at the end of the text) are

dL = 1.42, dU = 1.57, so 4 − dU = 2.43 and 4 − dL = 2.58.

The test statistic is clearly lower than the lower critical value and hence the null hypothesis of no autocorrelation is rejected; the conclusion is that the residuals from the model appear to be positively autocorrelated.
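The decision rule applied in this example can be written as a small function. This is a sketch only: the function name and the wording of the returned labels are hypothetical, and values falling exactly on a critical value are treated here as inconclusive, which is an assumption.

```python
def dw_decision(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic given the lower and upper critical values."""
    if not 0.0 <= dw <= 4.0:
        raise ValueError("DW must lie between 0 and 4")
    if dw < d_lower:
        return "reject H0: positive autocorrelation"
    if dw > 4.0 - d_lower:
        return "reject H0: negative autocorrelation"
    if d_upper < dw < 4.0 - d_upper:
        return "do not reject H0"
    return "inconclusive"

# Example 5.2: DW = 0.86 with dL = 1.42 and dU = 1.57.
print(dw_decision(0.86, 1.42, 1.57))
```

The critical values themselves still have to be looked up for the relevant sample size and number of regressors.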

5.5.5 another test for autocorrelation: the breusch-godfrey test

DW is a test only of whether consecutive errors are related to one another.

Moreover, the DW test cannot validly be applied unless a certain set of conditions is fulfilled.

There will also be many forms of residual autocorrelation that DW cannot detect.

For example, if corr(ût, ût-1) = 0, but corr(ût, ût-2) ≠ 0, DW as defined above will not find any autocorrelation.

One possible solution would be to replace ût-1 in (5.10) with ût-2.

However, pairwise examination of the correlations (ût, ût-1), (ût, ût-2), (ût, ût-3), ... would be tedious in practice and is not coded in econometrics software packages, which have been programmed to construct DW using only a one-period lag.

Also, the approximation in (5.11) will deteriorate as the difference between the two time indices increases. Consequently, the critical values should also be modified somewhat in these cases.

It is desirable to examine a joint test for autocorrelation that will allow examination of the relationship between ût and several of its lagged values at the same time.

The Breusch-Godfrey test is a more general test for autocorrelation up to the rth order.

The model for the errors under this test is

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \rho_3 u_{t-3} + \dots + \rho_r u_{t-r} + v_t, \quad v_t \sim N(0, \sigma_v^2) \qquad (5.16)$$

The null and alternative hypotheses are:

H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0

H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0

So, under the null hypothesis, the current error is not related to any of its r previous values. The test is carried out as in box 5.4.

Box 5.4 Conducting a Breusch-Godfrey test

5.17
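A minimal sketch of how such a test is commonly carried out is given below. It assumes the usual form of the procedure, which is not spelled out in these notes: estimate the original regression, regress the residuals on the original regressors plus r lags of the residuals, and compare (T − r)R2 from that auxiliary regression with a χ2(r) critical value. The function names and simulated data are hypothetical; 9.49 is the 5% critical value of χ2(4).

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey(X, y, r):
    """(T - r) * R^2 from regressing residuals on X and r of their own lags.

    X is assumed to contain a constant column.
    """
    u_hat = ols_residuals(X, y)
    T = len(u_hat)
    # Column j holds the residual lagged j periods, aligned with u_hat[r:].
    lags = np.column_stack([u_hat[r - j:T - j] for j in range(1, r + 1)])
    Z = np.column_stack([X[r:], lags])
    dep = u_hat[r:]
    v_hat = ols_residuals(Z, dep)
    tss = np.sum((dep - dep.mean()) ** 2)
    r2 = 1.0 - (v_hat @ v_hat) / tss
    return (T - r) * r2

CHI2_5PCT_4DF = 9.49  # 5% critical value for chi-squared with 4 degrees of freedom

# Hypothetical data: one regressor, errors that are either AR(1) or independent.
rng = np.random.default_rng(4)
T = 300
x = rng.normal(0.0, 1.0, T)
X = np.column_stack([np.ones(T), x])
v = rng.normal(0.0, 1.0, T)
u_ar = np.zeros(T)
u_ar[0] = v[0]
for t in range(1, T):
    u_ar[t] = 0.6 * u_ar[t - 1] + v[t]

bg_auto = breusch_godfrey(X, 1.0 + 0.5 * x + u_ar, r=4)
bg_iid = breusch_godfrey(X, 1.0 + 0.5 * x + v, r=4)
print("autocorrelated errors:", bg_auto, "reject H0:", bg_auto > CHI2_5PCT_4DF)
print("independent errors:   ", bg_iid, "reject H0:", bg_iid > CHI2_5PCT_4DF)
```

Setting r = 4 here mirrors the suggestion for quarterly data; other values of r only change the number of lag columns and the degrees of freedom of the critical value.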

A potential difficulty with the Breusch-Godfrey test is in determining an appropriate value of r, the number of lags of the residuals, to use in computing the test. There is no obvious answer to this, so it is typical to experiment with a range of values, and also to use the frequency of the data to decide.

If the data is monthly or quarterly, set r equal to 12 or 4, respectively. The argument is that errors at any given time would be expected to be related only to those errors in the previous year.

If the model is statistically adequate, no evidence of autocorrelation should be found in the residuals whatever value of r is chosen.

5.5.6 consequences of ignoring autocorrelation if it is present

The consequences of ignoring autocorrelation when it is present are similar to those of ignoring heteroskedasticity.

The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even at large sample sizes, so that the standard error estimates could be wrong. There is thus the possibility that the wrong inferences could be made about whether a variable is or is not an important determinant of variations in y.

In the case of positive serial correlation in the residuals, the OLS standard error estimates will be biased downwards relative to the true standard errors. That is, OLS will understate their true variability, which leads to an increase in the probability of type I error, that is, a tendency to reject the null hypothesis sometimes when it is correct.

Furthermore, R2 is likely to be inflated relative to its ‘correct’ value if autocorrelation is present but ignored, since residual autocorrelation will lead to an underestimate of the true error variance (for positive autocorrelation).

5.5.7 dealing with autocorrelation

If the form of the autocorrelation is known, it would be possible to use a GLS procedure.

One approach, which was once fairly popular, is known as the Cochrane-Orcutt procedure (see box 5.5). It works by assuming a particular form for the structure of the autocorrelation (usually a first order autoregressive process; chapter 6 gives a general description of these models).

The model would thus be specified as follows:

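$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t, \quad u_t = \rho u_{t-1} + v_t \qquad (5.18)$$

Note that a constant is not required in the specification for the errors since E(ut) = 0.

If this model holds at time t, it is assumed to also hold for time t-1, so that the model in (5.18) is lagged one period

$$y_{t-1} = \beta_1 + \beta_2 x_{2,t-1} + \beta_3 x_{3,t-1} + u_{t-1} \qquad (5.19)$$

Multiplying (5.19) by ρ

$$\rho y_{t-1} = \rho\beta_1 + \rho\beta_2 x_{2,t-1} + \rho\beta_3 x_{3,t-1} + \rho u_{t-1} \qquad (5.20)$$

Subtracting (5.20) from (5.18) would give

$$y_t - \rho y_{t-1} = \beta_1 - \rho\beta_1 + \beta_2 x_{2t} - \rho\beta_2 x_{2,t-1} + \beta_3 x_{3t} - \rho\beta_3 x_{3,t-1} + u_t - \rho u_{t-1} \qquad (5.21)$$

Factorising, and noting that vt = ut − ρut-1

$$(y_t - \rho y_{t-1}) = (1 - \rho)\beta_1 + \beta_2 (x_{2t} - \rho x_{2,t-1}) + \beta_3 (x_{3t} - \rho x_{3,t-1}) + v_t \qquad (5.22)$$

Setting y*t = yt − ρyt-1, β1* = (1 − ρ)β1, x*2t = x2t − ρx2,t-1 and x*3t = x3t − ρx3,t-1, the model can be written

$$y_t^* = \beta_1^* + \beta_2 x_{2t}^* + \beta_3 x_{3t}^* + v_t \qquad (5.23)$$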

Since the final specification (5.23) contains an error term that is free from autocorrelation, OLS can be directly applied to it.

This procedure is effectively an application of GLS.

Of course, the construction of y*t etc. requires ρ to be known; this will never be the case in practice, so that ρ has to be estimated before (5.23) can be used.

A simple method would be to use the ρ obtained from rearranging the equation for the DW statistic given in (5.11).

However, this is only an approximation, as the related algebra showed, and the approximation may be poor in the context of small samples.

The Cochrane-Orcutt procedure is an alternative, which operates as in box 5.5.

Box 5.5 The Cochrane-Orcutt Procedure

(1) Assume that the general model is of the form (5.18) above. Estimate the equation in (5.18) using OLS, ignoring the residual autocorrelation.

(2) Obtain the residuals, and run the regression

$$\hat{u}_t = \rho \hat{u}_{t-1} + v_t \qquad (5.27)$$

(3) Obtain ρ̂ and construct y*t etc. using this estimate of ρ.

(4) Run the GLS regression (5.23).

This could be the end of the process.

However, Cochrane and Orcutt (1949) argue that better estimates can be obtained by going through steps 2-4 again.

That is, given the new coefficient estimates, construct the residuals again and regress them on their previous values to obtain a new estimate, ρ̂. This would then be used to construct new values of the transformed variables, and a new (5.23) is estimated.

This procedure would be repeated until the change in ρ̂ between one iteration and the next is less than some fixed amount (e.g., 0.01).

In practice, a small number of iterations (no more than 5) will usually suffice.
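The iteration in box 5.5 can be sketched as follows. This is an illustrative implementation under stated assumptions, not a reference one: the first column of X is taken to be the constant, the transformed constant column is kept as (1 − ρ̂) so that its coefficient estimates β1 directly, the first observation is dropped, and the simulated data, function names and default tolerance of 0.01 are choices made for this example.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficient estimates."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def cochrane_orcutt(X, y, tol=0.01, max_iter=20):
    """Iterate steps 2-4 until rho changes by less than tol.

    Returns (beta, rho, number of iterations used).
    """
    beta = ols(X, y)                                   # step 1: OLS ignoring autocorrelation
    rho = 0.0
    for iteration in range(1, max_iter + 1):
        u_hat = y - X @ beta                           # residuals of the original model
        rho_new = (u_hat[1:] @ u_hat[:-1]) / (u_hat[:-1] @ u_hat[:-1])  # step 2
        y_star = y[1:] - rho_new * y[:-1]              # step 3: quasi-differenced data
        X_star = X[1:] - rho_new * X[:-1]              # constant column becomes (1 - rho)
        beta = ols(X_star, y_star)                     # step 4: GLS regression
        if abs(rho_new - rho) < tol:
            return beta, rho_new, iteration
        rho = rho_new
    return beta, rho, max_iter

# Hypothetical data: two regressors and AR(1) errors with rho = 0.7.
rng = np.random.default_rng(5)
T = 1000
x2 = rng.normal(0.0, 1.0, T)
x3 = rng.normal(0.0, 1.0, T)
v = rng.normal(0.0, 1.0, T)
u = np.zeros(T)
u[0] = v[0]
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + v[t]
y = 1.0 + 2.0 * x2 - 1.0 * x3 + u
X = np.column_stack([np.ones(T), x2, x3])

beta_co, rho_co, n_iter = cochrane_orcutt(X, y)
print("beta:", beta_co, " rho:", rho_co, " iterations:", n_iter)
```

Because the constant column is transformed along with the other regressors, no separate back-transformation from β1* to β1 is needed in this version.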


5.5.9 why might lags be required in a regression?

Lagged values of the explanatory variables or of the dependent variable (or both) may capture important dynamic structure in the dependent variable that might be caused by a number of factors.

Two possibilities that are relevant in finance are as follows:

Inertia of the dependent variable

Often a change in the value of one of the explanatory variables will not affect the dependent variable immediately during one time period, but rather with a lag over several time periods.

For example, the effect of a change in microstructure or government policy may take a few months or longer to work through since agents may be initially unsure of what the implications for asset pricing are, and so on.

More generally, many variables in economics and finance will change only slowly.

Delays in response may also arise as a result of technological or institutional factors. For example, the speed of technology will limit how quickly investors’ buy or sell orders can be executed.


Similarly, many investors have savings plans or other financial products where they are ‘locked in’ and therefore unable to act for a fixed period.

It is also worth noting that dynamic structure is likely to be stronger and more prevalent the higher is the frequency of observation of the data.


Overreactions

Financial markets may overreact to news. If a firm makes a profit warning, implying that its profits are likely to be down when formally reported later in the year, the markets might perceive this as implying that the value of the firm is less than was previously thought, and hence the price of its shares will fall.

If there is an overreaction, the price will initially fall below that which is appropriate for the firm given this bad news, before subsequently bouncing back up to a new level.

Moving from a purely static model to one which allows for lagged effects is likely to reduce, and possibly remove, serial correlation which was present in the static model’s residuals.


Omission of relevant variables which are themselves autocorrelated

If a variable which is an important determinant of movements in y has not been included in the model, and is itself autocorrelated, this will induce the residuals from the estimated model to be serially correlated.

In many instances, a move from a static model to a dynamic one will result in a removal of residual autocorrelation.

5.5.11 problems with adding lagged regressors to ‘cure’ autocorrelation

The use of lagged variables in a regression model does, however, bring with it additional problems:

● Inclusion of lagged values of the dependent variable violates the assumption that the explanatory variables are non-stochastic (assumption 4 of the CLRM), since by definition the value of y is determined partly by a random error term, and so its lagged values cannot be non-stochastic.

● What does an equation with a large number of lags actually mean? A model with many lags may have solved a statistical problem (autocorrelated residuals) at the expense of creating an interpretational one (the empirical model containing many lags or differenced terms is difficult to interpret and may not test the original financial theory that motivated the use of regression analysis in the first place).

5.5.12 autocorrelation and dynamic models in eviews

Homework 7:

Replicate Section 5.5.12 (pages 206-208) using EViews. Interpret results. Submit.

5.5.13 autocorrelation in cross-sectional data

Autocorrelation may occur in the context of a time series regression.

It is also plausible that autocorrelation could be present in certain types of cross-sectional data.

For example, if the cross-sectional data comprise the profitability of banks in different regions of the US, autocorrelation may arise in a spatial sense, if there is a regional dimension to bank profitability that is not captured by the model. Thus the residuals from banks of the same region or in neighbouring regions may be correlated. Testing for autocorrelation in this case would be rather more complex than in the time series context.

5.6 assumption 4: the Xt are non-stochastic

The OLS estimator is consistent and unbiased in the presence of stochastic regressors, provided that the regressors are not correlated with the error term of the estimated equation. (Proof is in text)

However, if one or more of the explanatory variables is contemporaneously correlated with the disturbance term, the OLS estimator will not even be consistent. This results from the estimator assigning explanatory power to the variables where in reality it is arising from the correlation between the error term and yt.

5.7 assumption 5: the disturbances are normally distributed

The normality assumption (ut ~ N(0, σ2)) is required in order to conduct single or joint hypothesis tests about the model parameters.

5.7.1 testing for departure from normality

The Bera-Jarque test (hereafter BJ) is one of the most commonly applied tests for normality.

It uses the property of a normally distributed random variable that the entire distribution is characterised by the first two moments: the mean and the variance.

The standardized third and fourth moments of a distribution are known as its skewness and kurtosis.

Skewness measures the extent to which a distribution is not symmetric about its mean value.

Kurtosis measures how fat the tails of the distribution are.

A normal distribution is not skewed and is defined to have a coefficient of kurtosis of 3.

It is possible to define a coefficient of excess kurtosis, equal to the kurtosis minus 3; a normal distribution will thus have coefficient of excess kurtosis of zero.


A normal distribution is symmetric and is said to be mesokurtic.

Figures 5.11 and 5.12 give some illustrations of what a series having specific departures from normality may look like.

A normal distribution is symmetric about its mean; a skewed distribution will not be; will have one tail longer than the other.

Figure 5.11 A normal versus a skewed distribution

Leptokurtic distribution (LD): one which has fatter tails and is more peaked at the mean than a normally distributed random variable with the same mean and variance.

More likely to characterize financial (and economic) time series, and also the residuals from a financial time series model.

A platykurtic distribution will be less peaked at the mean, will have thinner tails, and will have more of the distribution in the shoulders than a normal.

In figure 5.12, the LD is shown by the bold line, with the normal shown by the faint line.


Figure 5.12

Bera and Jarque (1981) formalise these ideas by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.

Denoting the errors by u and their variance by σ2, it can be proved that the coefficients of skewness and kurtosis can be expressed respectively as

$$b_1 = \frac{E[u^3]}{(\sigma^2)^{3/2}} \quad \text{and} \quad b_2 = \frac{E[u^4]}{(\sigma^2)^2} \qquad (5.48)$$

The kurtosis of the normal distribution is 3, so its excess kurtosis (b2 - 3) is zero.

The Bera-Jarque test statistic is given by

$$W = T\left[\frac{b_1^2}{6} + \frac{(b_2 - 3)^2}{24}\right] \qquad (5.49)$$

where T is the sample size. The test statistic asymptotically follows a χ2(2) under the null hypothesis that the distribution of the series is symmetric and mesokurtic.


b1 and b2 can be estimated using the residuals from the OLS regression, û.

The null hypothesis is of normality, and this would be rejected if the residuals from the model were either significantly skewed or leptokurtic/platykurtic (or both).
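As a rough illustration of the calculation, the sketch below computes the sample skewness, the sample kurtosis and the resulting statistic T[b1²/6 + (b2 - 3)²/24] from a list of residuals. The function name `bera_jarque` and both data sets are hypothetical; the 5% critical value of a χ2(2), about 5.99, is the standard tabulated figure.

```python
def bera_jarque(residuals):
    """Return (skewness b1, kurtosis b2, test statistic W) for a list of residuals."""
    T = len(residuals)
    mean = sum(residuals) / T
    centred = [u - mean for u in residuals]
    var = sum(c ** 2 for c in centred) / T
    b1 = (sum(c ** 3 for c in centred) / T) / var ** 1.5   # standardised third moment
    b2 = (sum(c ** 4 for c in centred) / T) / var ** 2     # standardised fourth moment
    W = T * (b1 ** 2 / 6 + (b2 - 3) ** 2 / 24)
    return b1, b2, W

# A small symmetric sample: skewness is zero and the statistic is small
b1, b2, W = bera_jarque([-2.0, -1.0, 0.0, 1.0, 2.0])
print(b1, b2, round(W, 4))

# Thirty well-behaved residuals plus one very large negative one
_, _, W_outlier = bera_jarque([1.0, -1.0] * 15 + [-10.0])
print(W_outlier > 5.99)   # compare with the 5% critical value of a chi-squared(2)
```

In the second sample, the single extreme residual is enough to push the statistic far above the critical value, which is the situation taken up in the discussion of outliers.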


5.7.3 what should be done if evidence of non-normality is found?

Not obvious what should be done!

It is possible to employ an estimation method (e.g. a nonparametric method) that does not assume normality, but such a method may be difficult to implement, and one may be less sure of its properties.

It is thus desirable to stick with OLS if possible, since its behaviour in a variety of circumstances has been well researched.

For sample sizes that are sufficiently large, violation of the normality assumption is virtually inconsequential: appealing to a central limit theorem, the test statistics will asymptotically follow the appropriate distributions even in the absence of error normality.

In economic or financial modelling, quite often one or two very extreme residuals cause a rejection of the normality assumption. Such observations would appear in the tails of the distribution, and would therefore lead u4, which enters into the definition of kurtosis, to be very large.


Such observations are known as outliers. In such cases, one way to improve the chances of error normality is to use dummy variables or some other method to effectively remove those observations.

Suppose that a monthly model of asset returns from 1980-1990 had been estimated, and the residuals plotted, and that a particularly large outlier has been observed for October 1987, shown in figure 5.12.

A new variable, D87M10t, can be defined as

D87M10t = 1 during October 1987 and zero otherwise.

Figure 5.12 Regression residuals from stock return data, with an outlier for October 1987

The observations for the dummy variable would appear as in box 5.7.

Box 5.7 Observations for the dummy variable

Time       Value of dummy variable D87M10t
1986M12    0
1987M01    0
...        ...
1987M09    0
1987M10    1
1987M11    0
...        ...

Use the dummy variable just like any other variable in the regression model, e.g.

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 D87M10_t + u_t \qquad (5.50)$$

This type of dummy variable that takes the value one for only a single observation has an effect exactly equivalent to knocking out that observation from the sample altogether, by forcing the residual for that observation to zero. (The estimated coefficient on the dummy variable will be equal to the residual that the dummied observation would have taken if the dummy variable had not been included.)
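The ‘knocking out’ effect can be checked numerically. The sketch below uses simulated data with one artificially large negative observation (all numbers are hypothetical) and compares a regression that includes a single-observation dummy with a regression that simply drops that observation.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 60
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 1.0 + 0.5 * x2 - 0.3 * x3 + rng.normal(scale=0.1, size=T)
outlier = 25
y[outlier] -= 5.0                      # hypothetical crash-month observation

dummy = np.zeros(T)
dummy[outlier] = 1.0                   # one for the outlying observation only
X_full = np.column_stack([np.ones(T), x2, x3])
X_dummy = np.column_stack([X_full, dummy])

# Regression including the dummy variable
b_dummy = np.linalg.lstsq(X_dummy, y, rcond=None)[0]
resid_dummy = y - X_dummy @ b_dummy

# Regression with the outlying observation removed from the sample
keep = np.arange(T) != outlier
b_dropped = np.linalg.lstsq(X_full[keep], y[keep], rcond=None)[0]

# Gap between the outlying observation and the fit from the remaining data
gap = y[outlier] - X_full[outlier] @ b_dropped

print(resid_dummy[outlier])            # numerically zero
print(b_dummy[:3], b_dropped)          # same intercept and slopes
print(b_dummy[3], gap)                 # dummy coefficient matches the gap
```

In this sketch the residual for the dummied observation is zero up to rounding, and the remaining coefficients coincide with those from the reduced sample.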


Many econometricians argue that using dummy variables to remove outliers will reduce standard errors, reduce the RSS, and so increase R2, thus improving only the apparent fit of the model.

Another consideration: the removal of observations is also hard to reconcile with the notion in statistics that each data point represents a useful piece of information.

The other side of this argument: outliers that are ‘a long way away’ from the rest of the data and do not fit in with the general pattern of the rest of the data can have a serious effect on coefficient estimates, since by definition, OLS will receive a big penalty, in the form of an increased RSS, for points that are a long way away from the fitted line.

Consequently, OLS will try extra hard to minimise the distances of points that would otherwise have been a long way away from the line.

Graphically, the possible effect of an outlier on OLS estimation is shown in figure 5.13, in which one point is a long way away from the rest. If this point is included, the fitted line will be the dotted one, which has a slight positive slope.

If this observation were removed, the full line would be the one fitted. Clearly, the slope is now large and negative.

OLS would not select this line if the outlier is included since the observation is a long way from the others and hence when the residual (the distance from the point to the fitted line) is squared, it would lead to a big increase in the RSS.

Figure 5.13 Possible effect of an outlier on OLS estimation

Outliers could be detected by plotting y against x only in the context of a bivariate regression.

Where there are more explanatory variables, outliers are easily identified by plotting the residuals over time, as in figure 5.12.

So, a trade-off potentially exists between the need to remove outlying observations that could have an undue impact on the OLS estimates and cause residual non-normality on the one hand, and the notion that each data point represents a useful piece of information on the other.

Removing observations at will could artificially improve the fit of the model.

A sensible way: introduce dummy variables only if there is both a statistical need to do so and a theoretical justification for their inclusion.

Use dummy variables to remove observations or extreme events that are considered highly unlikely to be repeated, and the information content of which is deemed of no relevance for the data as a whole: market crashes, financial panics, government crises, and so on.

Non-normality in financial data could also arise from certain types of heteroscedasticity, known as ARCH (chapter 9). In this case, the non-normality is intrinsic to all of the data and therefore outlier removal would not make the residuals of such a model normal.


Another important use of dummy variables is in the modelling of seasonality in financial data, and accounting for so-called ‘calendar anomalies’, such as day-of-the-week effects and week-end effects. These are discussed in chapter 10.


5.8 multicollinearity

An implicit assumption that is made when using the OLS estimation method is that the explanatory variables are not correlated with one another.

If there is no relationship between the explanatory variables, they would be said to be orthogonal to one another.

If the explanatory variables were orthogonal to one another, adding or removing a variable from a regression equation would not cause the values of the coefficients on the other variables to change.

In practice, the correlation between explanatory variables will be non-zero. If small in degree, it will not cause too much loss of precision.

However, a problem occurs when the explanatory variables are very highly correlated with each other, and this problem is known as multicollinearity.

Two classes of multicollinearity: perfect multicollinearity and near multicollinearity.

Perfect multicollinearity: there is an exact relationship between two or more variables, so it is not possible to estimate all of the coefficients in the model. It will usually be observed only when the same explanatory variable is inadvertently used twice in a regression.

Example: Suppose that two variables were employed in a regression function such that the value of one variable was always twice that of the other (e.g. suppose x3=2x2). If both x3 and x2 were used as explanatory variables in the same regression, then the model parameters cannot be estimated. Since the two variables are perfectly related to one another, together they contain only enough information to estimate one parameter, not two.
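The x3 = 2x2 example can be illustrated by looking at the rank of the regressor matrix. In the sketch below (simulated, hypothetical data), the matrix has three columns but only two of them carry independent information, so the three coefficients cannot be separately determined.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50
x2 = rng.normal(size=T)
x3 = 2.0 * x2                          # exact linear relationship, as in the example
X = np.column_stack([np.ones(T), x2, x3])

rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)                # three columns, but rank two
```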


Near multicollinearity is much more likely to occur in practice, and would arise when there was a non-negligible, but not perfect, relationship between two or more of the explanatory variables.

Note that a high correlation between the dependent variable and one of the independent variables is not multicollinearity.

5.8.1 measuring near multicollinearity

Testing for multicollinearity is surprisingly difficult, and hence all that is presented in the text is a simple method to investigate the presence or otherwise of the most easily detected forms of near multicollinearity.

This method simply involves looking at the matrix of correlations between the individual variables.

Suppose that a regression equation has three explanatory variables (plus a constant term), and that the pair-wise correlations between these explanatory variables are:

Clearly, if multicollinearity was suspected, the most likely culprit would be a high correlation between x2 and x4. Of course, if the relationship involves three or more variables that are collinear, e.g. x2 + x3 ≈ x4, then multicollinearity would be very difficult to detect.
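A correlation matrix of this kind is straightforward to compute. The sketch below uses simulated series, so the numbers are purely illustrative and are not those of the example above. The first matrix shows a pair-wise problem between x2 and x4; the second shows why a relationship such as x2 + x3 ≈ x4 is harder to spot, since no single pair-wise correlation is especially close to one.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 200
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)

# Case 1 (hypothetical): x4 tracks x2 closely
x4 = x2 + 0.3 * rng.normal(size=T)
corr = np.corrcoef([x2, x3, x4])       # rows are treated as variables
print(np.round(corr, 2))

# Case 2 (hypothetical): x4 is close to the sum of x2 and x3
x4_sum = x2 + x3 + 0.1 * rng.normal(size=T)
corr_sum = np.corrcoef([x2, x3, x4_sum])
print(np.round(corr_sum, 2))
```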

5.8.2 problems if near multicollinearity is present but ignored

First, R2 will be high but the individual coefficients will have high standard errors, so that the regression ‘looks good’ but the individual variables are not significant. This arises in the context of very closely related explanatory variables as a result of the difficulty in observing the individual contribution of each variable to the overall fit of the regression.

Second, the regression becomes very sensitive to small changes in the specification, so that adding or removing an explanatory variable leads to large changes in the coefficient values or significances of the other variables.

Finally, near multicollinearity will make confidence intervals for the parameters very wide, and significance tests might therefore give inappropriate conclusions, making it difficult to draw sharp inferences.

A number of alternative estimation techniques have been proposed that are valid in the presence of multicollinearity -- for example, ridge regression, or principal components.

Principal components analysis is discussed briefly in an appendix to chapter 4.

Many researchers do not use these techniques:

can be complex;

their properties are less well understood than those of the OLS estimator;

many econometricians would argue that multicollinearity is more a problem with the data than with the model or estimation method.

5.8.3 solutions to the problem of multicollinearity

Other, more ad hoc methods for dealing with the possible existence of near multicollinearity include:

Ignore it, if the model is otherwise adequate, i.e. statistically and in terms of each coefficient being of a plausible magnitude and having an appropriate sign.

Sometimes, the existence of multicollinearity does not reduce the t-ratios on variables that would have been significant without the multicollinearity sufficiently to make them insignificant.

The presence of near multicollinearity does not affect the BLUE properties of the OLS estimator: it will still be consistent, unbiased and efficient since the presence of near multicollinearity does not violate any of the CLRM assumptions 1-4.

However, in the presence of near multicollinearity, it will be hard to obtain small standard errors.

This will not matter if the aim of the model-building exercise is to produce forecasts from the estimated model, since the forecasts will be unaffected by the presence of near multicollinearity so long as this relationship between the explanatory variables continues to hold over the forecasted sample.

Drop one of the collinear variables, so that the problem disappears.

This may be unacceptable to the researcher if there were strong a priori theoretical reasons for including both variables in the model.

Also, if the removed variable was relevant in the data generating process for y, an omitted variable bias would result (see section 5.10).

Transform the highly correlated variables into a ratio and include only the ratio and not the individual variables in the regression.

This may be unacceptable if financial theory suggests that changes in the dependent variable should occur following changes in the individual explanatory variables, and not a ratio of them.

Collect more data. Near multicollinearity is more a problem with the data than with the model: there is insufficient information in the sample to obtain estimates for all of the coefficients. This is why near multicollinearity leads coefficient estimates to have wide standard errors, which is exactly what would happen if the sample size were small. An increase in the sample size will usually lead to an increase in the accuracy of coefficient estimation and a reduction in the coefficient standard errors, enabling the model to better dissect the effects of the various explanatory variables on the explained variable.

More data could be collected by taking a longer run of data, or by switching to a higher frequency of sampling.

It may be infeasible to increase the sample size if all available data is being utilised already.

A further method of increasing the available quantity of data would be to use a pooled sample. This would involve the use of data with both cross-sectional and time series dimensions (see Gujarati, 1995, pp. 340-1).

5.8.4 multicollinearity in eviews

Homework 8:

Replicate section 5.8.4 (page 220) and interpret the results. Submit.

5.12 parameter stability tests

So far, regressions of a form such as

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t \qquad (5.61)$$

have been estimated.

These regressions assume that the parameters (β1, β2 and β3) are constant for the entire sample, both for the data period used to estimate the model, and for any subsequent period used in the construction of forecasts.

This assumption can be tested using parameter stability tests. The idea is to split the data into sub-periods and then to estimate up to three models, for each of the sub-parts and for all the data, and then to ‘compare’ the RSS of each of the models.

Two types of tests are considered: Chow (analysis of variance) test and predictive failure test.

5.12.1 the chow test

The steps involved are shown in box 5.7.

Box 5.7 Conducting a Chow test

(1) Split the data into two sub-periods. Estimate the regression over the whole period and then for the two sub-periods separately (3 regressions). Obtain the RSS for each regression.

(2) The restricted regression is now the regression for the whole period, while the ‘unrestricted regression’ comes in two parts: one for each of the sub-samples. It is thus possible to form an F-test, which is based on the difference between the RSSs. The statistic is

$$\text{test statistic} = \frac{RSS - (RSS_1 + RSS_2)}{RSS_1 + RSS_2} \times \frac{T - 2k}{k} \qquad (5.62)$$

where:
RSS = residual sum of squares for the whole sample
RSS1 = residual sum of squares for sub-sample 1
RSS2 = residual sum of squares for sub-sample 2
T = number of observations
2k = number of regressors in the ‘unrestricted’ regression (since it comes in two parts)
k = number of regressors in (each) ‘unrestricted’ regression.

The unrestricted regression is the one where the restriction has not been imposed on the model. Since the restriction is that the coefficients are equal across the sub-samples, the restricted regression will be the single regression for the whole sample. Thus, the test is one of how much bigger the RSS for the whole sample is than the sum of the residual sums of squares for the two sub-samples (RSS1+RSS2).

If the coefficients do not change much between the samples, the RSS will not rise much upon imposing the restriction. Thus the test statistic in (5.62) can be considered a straightforward application of the standard F-test formula discussed in chapter 4. The restricted residual sum of squares in (5.62) is RSS; the unrestricted residual sum of squares is (RSS1+RSS2).

The number of restrictions is equal to the number of coefficients that are estimated for each of the regressions, i.e. k. The number of regressors in the unrestricted regression (including the constants) is 2k, since the unrestricted regression comes in two parts, each with k regressors.

(3) Perform the test. If the value of the test statistic is greater than the critical value from the F-distribution, which is an F(k, T-2k), then reject the null hypothesis that the parameters are stable over time.
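The three steps can be sketched in Python as follows. The helper names `rss` and `chow_test` are hypothetical and the data are simulated: one series has stable coefficients, while in the other the slope shifts after observation 82. The critical value of 3.06 is the 5% value of an F(2,140) quoted in example 5.4.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def chow_test(X, y, split):
    """Chow statistic for a break after the first `split` observations."""
    T, k = X.shape
    rss_whole = rss(X, y)                                   # restricted regression
    unrestricted = rss(X[:split], y[:split]) + rss(X[split:], y[split:])
    return (rss_whole - unrestricted) / unrestricted * (T - 2 * k) / k

# Simulated data (hypothetical)
rng = np.random.default_rng(11)
T, split = 144, 82
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y_stable = 0.4 + 1.4 * x + rng.normal(scale=0.5, size=T)
y_break = y_stable.copy()
y_break[split:] += 0.8 * x[split:]     # slope rises in the second sub-sample

F_stable = chow_test(X, y_stable, split)
F_break = chow_test(X, y_break, split)
print(F_stable, F_break)               # compare each with F(2,140) = 3.06 at 5%
```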

5.12.1 the chow test

It is also possible to use a dummy variable approach to calculating both Chow and predictive failure tests. In the case of the Chow test, the unrestricted regression would contain dummy variables for the intercept and for all of the slope coefficients (see chapter 10).

For example, suppose that the regression is of the form

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + u_t \qquad (5.63)$$

If the split of the total of T observations is made so that the sub-samples contain T1 and T2 observations (where T1+T2=T), the unrestricted regression would be given by

$$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 D_t + \beta_5 D_t x_{2t} + \beta_6 D_t x_{3t} + v_t \qquad (5.64)$$

where Dt = 1 for t ∈ T1 and zero otherwise.

In other words, Dt takes the value one for observations in the first sub-sample and zero for observations in the second sub-sample.


The Chow test viewed in this way would then be a standard F-test of the joint restriction H0: β4 = 0 and β5 = 0 and β6 = 0, with (5.64) and (5.63) being the unrestricted and restricted regressions, respectively.
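The equivalence between the dummy variable formulation and the split-sample formulation can be verified numerically. The sketch below uses simulated data (all values hypothetical) with three regressors including the constant, builds the dummy and its interactions, and computes the F-statistic both ways.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(3)
T, T1 = 120, 70
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 0.5 + 1.0 * x2 - 0.5 * x3 + rng.normal(scale=0.5, size=T)
y[T1:] += 0.8 * x2[T1:]                # hypothetical shift in the slope on x2

D = (np.arange(T) < T1).astype(float)  # one in the first sub-sample, zero afterwards
X_r = np.column_stack([np.ones(T), x2, x3])          # restricted regression
X_u = np.column_stack([X_r, D, D * x2, D * x3])      # unrestricted regression

k = X_r.shape[1]
m = X_u.shape[1] - k                   # number of restrictions (three here)
F_dummy = (rss(X_r, y) - rss(X_u, y)) / rss(X_u, y) * (T - X_u.shape[1]) / m

# Split-sample version: RSS1 + RSS2 plays the role of the unrestricted RSS
rss_split = rss(X_r[:T1], y[:T1]) + rss(X_r[T1:], y[T1:])
F_split = (rss(X_r, y) - rss_split) / rss_split * (T - 2 * k) / k

print(F_dummy, F_split)                # the two values agree up to rounding
```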

Example 5.4

Suppose that it is now January 1993. Consider the following regression for the standard CAPM β for the returns on a stock

$$r_{gt} = \alpha + \beta r_{Mt} + u_t \qquad (5.65)$$

where rgt and rMt are excess returns on Glaxo shares and on a market portfolio, respectively.


Suppose you are interested in estimating beta using monthly data from 1981 to 1992, to aid a stock selection decision.

Another researcher expresses concern that the October 1987 stock market crash fundamentally altered the risk-return relationship.


Test this conjecture using a Chow test. The model for each sub period is

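1981M1-1987M10

$$\hat{r}_{gt} = 0.24 + 1.2\, r_{Mt} \qquad T = 82 \qquad RSS_1 = 0.03555 \qquad (5.66)$$

1987M11-1992M12

$$\hat{r}_{gt} = 0.68 + 1.53\, r_{Mt} \qquad T = 62 \qquad RSS_2 = 0.00336 \qquad (5.67)$$

1981M1-1992M12

$$\hat{r}_{gt} = 0.39 + 1.37\, r_{Mt} \qquad T = 144 \qquad RSS = 0.0434 \qquad (5.68)$$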

The null hypothesis is

H0: α1 = α2 and β1 = β2

where the subscripts 1 and 2 denote the parameters for the first and second sub-samples. The test statistic will be given by

$$\text{test statistic} = \frac{0.0434 - (0.03555 + 0.00336)}{0.03555 + 0.00336} \times \frac{144 - 4}{2} \qquad (5.69)$$

The test statistic should be compared with the 5% critical value, F(2,140) = 3.06.

H0 is rejected at the 5% level and hence it is concluded that the restriction that the coefficients are the same in the two periods cannot be employed.
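The arithmetic behind this conclusion can be reproduced directly from the RSS values printed in (5.66)-(5.68). The short check below assumes those printed values and k = 2 (a constant and one slope); on these inputs the statistic works out at roughly 8.08, well above the critical value of 3.06.

```python
# Inputs as printed in the example
rss_whole, rss_1, rss_2 = 0.0434, 0.03555, 0.00336
T, k = 144, 2

unrestricted = rss_1 + rss_2
F = (rss_whole - unrestricted) / unrestricted * (T - 2 * k) / k

critical_5pct = 3.06                   # F(2,140) at the 5% level, as quoted in the text
reject = F > critical_5pct
print(round(F, 2), reject)
```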

The appropriate modelling response would probably be to employ only the second part of the data in estimating the CAPM beta relevant for investment decisions made in early 1993.

A problem with the Chow test is that it is necessary to have enough data to do the regression on both sub-samples, i.e. T1 ≫ k, T2 ≫ k.

This may not hold in the situation where the total number of observations available is small.

More likely is the situation where the researcher would like to examine the effect of splitting the sample at some point very close to the start or very close to the end of the sample.

5.12.2 the predictive failure test

An alternative formulation of a test for the stability of the model is the predictive failure test, which requires estimation for the full sample and one of the sub-samples only.

The predictive failure test works by estimating the regression over a ‘long’ sub-period (i.e. most of the data) and then using those coefficient estimates for predicting values of y for the other period.

These predictions for y are then implicitly compared with the actual values.

Although it can be expressed in several different ways, the null hypothesis for this test is that the prediction errors for all of the forecasted observations are zero.

To calculate the test:

● Run the regression for the whole period (the restricted regression) and obtain the RSS.

● Run the regression for the ‘large’ sub-period and obtain the RSS (called RSS1). Note that in this book, the number of observations for the long estimation sub-period will be denoted by T1 (even though it may come second).

The test statistic is given by

$$\text{test statistic} = \frac{RSS - RSS_1}{RSS_1} \times \frac{T_1 - k}{T_2} \qquad (5.70)$$

where T2 = number of observations that the model is attempting to ‘predict’. The test statistic will follow an F(T2, T1 - k).

5.12.3 backwards versus forwards predictive failure tests

There are two types of predictive failure test: the forward test and the backward test.

Forward predictive failure tests are where the last few observations are kept back for forecast testing. For example, suppose that observations for 1980Q1-2004Q4 are available. A forward predictive failure test could involve estimating the model over 1980Q1-2003Q4 and forecasting 2004Q1-2004Q4.

A backward predictive failure test attempts to ‘back-cast’ the first few observations, e.g. if data for 1980Q1-2004Q4 are available, estimate the model using 1981Q1-2004Q4 and back-cast 1980Q1-1980Q4.

Both types of test offer further evidence on the stability of the regression relationship over the whole sample period.

Example 5.5

Suppose that the researcher decided to determine the stability of the estimated model for stock returns over the whole sample in example 5.4 by using a predictive failure test of the last two years of observations. The following models would be estimated:

1981M1-1992M12 (whole sample)

$$\hat{r}_{gt} = 0.39 + 1.37\, r_{Mt} \qquad T = 144 \qquad RSS = 0.0434 \qquad (5.72)$$

1981M1-1990M12 (‘long sub-sample’)

$$\hat{r}_{gt} = 0.32 + 1.31\, r_{Mt} \qquad T = 120 \qquad RSS_1 = 0.0420 \qquad (5.73)$$

Can this regression adequately ‘forecast’ the values for the last two years? The test statistic would be given by

$$\text{test statistic} = \frac{0.0434 - 0.0420}{0.0420} \times \frac{120 - 2}{24} = 0.164 \qquad (5.74)$$

Compare the test statistic with an F(24,118) = 1.66 at the 5% level. The null hypothesis that the model can adequately predict the last few observations would not be rejected. Conclude that the model did not suffer from predictive failure during the 1991M1-1992M12 period.
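The calculation in this example is easy to reproduce. The sketch below wraps the predictive failure statistic in a small function (the name `predictive_failure_stat` is hypothetical) and applies it to the RSS values, sample sizes and critical value quoted in example 5.5.

```python
def predictive_failure_stat(rss_whole, rss_long, T1, T2, k):
    """(RSS - RSS1) / RSS1 * (T1 - k) / T2, to be compared with an F(T2, T1 - k)."""
    return (rss_whole - rss_long) / rss_long * (T1 - k) / T2

# Values from example 5.5: 120 months used for estimation, 24 months 'predicted'
stat = predictive_failure_stat(0.0434, 0.0420, T1=120, T2=24, k=2)

critical_5pct = 1.66                   # F(24,118) at the 5% level, as quoted in the text
fails = stat > critical_5pct
print(round(stat, 3), fails)
```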


5.12.4 Appropriate sub-parts

Some or all of these methods can be used to find where the overall sample split occurs.

Plot the dependent variable over time and see if there are any obvious structural changes in the series, as shown in figure 5.14. Conduct a Chow test with the sample split at this observation.

Split the data based on any known important historical events, e.g. a stock market crash, a government change, etc.

Use all but the last few observations and do a forward predictive failure test on those.

Use all but the first few observations and do a backward predictive failure test on those.

Figure 5.14 Plot of a variable showing suggestion for break date

If a model is good, it will survive a Chow or predictive failure test with any break date.

If the model fails the Chow or predictive failure test, two approaches are available.

Either respecify the model by including additional variables, or conduct separate estimations for each of the sub-samples.

5.12.5 the QLR test

If the break date in a time series is not known in advance, or only a range is known, a modified version of the Chow test, the Quandt likelihood ratio (QLR) test, can be employed (see pp. 231-232 of the text).
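One common way of implementing this idea is to compute the Chow statistic at every candidate break date in the middle of the sample and take the largest value. The sketch below does this on simulated data. The function name `qlr` and the 15% trimming at each end are assumptions made for illustration and are not taken from the text. Because the statistic is a maximum over many F-statistics, it does not follow the usual F-distribution, so the F critical values used elsewhere in this chapter should not be applied to it.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def qlr(X, y, trim=0.15):
    """Largest Chow statistic over candidate break dates, and where it occurs."""
    T, k = X.shape
    start = max(int(np.ceil(trim * T)), k + 1)       # assumed trimming of each end
    stop = min(int(np.floor((1 - trim) * T)), T - k - 1)
    rss_whole = rss(X, y)
    best_F, best_split = -np.inf, None
    for split in range(start, stop + 1):
        unrestricted = rss(X[:split], y[:split]) + rss(X[split:], y[split:])
        F = (rss_whole - unrestricted) / unrestricted * (T - 2 * k) / k
        if F > best_F:
            best_F, best_split = F, split
    return best_F, best_split

# Simulated data (hypothetical): the slope changes after observation 120
rng = np.random.default_rng(5)
T, true_break = 200, 120
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 0.5 + 1.0 * x + rng.normal(scale=0.5, size=T)
y[true_break:] += 1.5 * x[true_break:]

best_F, best_split = qlr(X, y)
print(best_F, best_split)
```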

5.12.7 Stability tests in Eviews

Homework 9:

Replicate section 5.12.7 (pages 233-235) in EViews, and interpret the results. Submit.

5.15 determinants of sovereign credit ratings

5.15.1 background

Sovereign credit ratings are an assessment of the riskiness of debt issued by governments.

They embody an estimate of the probability that the borrower will default on its obligation.

Two famous US ratings agencies, Moody’s and Standard and Poor’s, provide ratings for many governments. Gradings are split into two broad categories: investment grade and speculative grade.

Investment grade issuers have good or adequate payment capacity, while speculative grade issuers either have a high degree of uncertainty about whether they will make their payments, or are already in default.

The highest grade offered by the agencies, for the highest quality of payment capacity, is ‘triple A’, which Moody’s denotes ‘Aaa’ and Standard and Poor’s denotes ‘AAA’.

The lowest grade issued to a sovereign in the Cantor and Packer sample was B3 (Moody’s) or B- (Standard and Poor’s).

Thus the number of grades of debt quality from the highest to the lowest given to governments in their sample is 16.


The central aim of Cantor and Packer’s paper is an attempt to explain and model how the agencies arrived at their ratings. Although ratings themselves are publicly available, the models or methods used to arrive at them are shrouded in secrecy.

The agencies also provide virtually no explanation as to what the relative weights of the factors that make up the ratings are.

Thus, a model of the determinants of sovereign credit ratings could be useful in assessing whether the ratings agencies appear to have acted rationally.

Such a model could also be employed to try to predict the rating that would be awarded to a sovereign that has not previously been rated and when a re-rating is likely to occur.


The paper continues, among other things, to consider whether ratings add to publicly available information, and whether it is possible to determine what factors affect how the sovereign yields react to rating announcements.


5.15.2 data

Cantor and Packer (1996) obtain a sample of government debt ratings for 49 countries as of September 1995 that range between the above gradings.

The ratings are quantified, so that the highest credit quality (Aaa/AAA) in the sample is given a score of 16, while the lowest rated sovereign in the sample is given a score of 1 (B3/B-).

This score forms the dependent variable. The factors that are used to explain the variability in the ratings scores are macroeconomic variables.

All of these variables embody factors that are likely to influence a government’s ability and willingness to service its debt costs.

Ideally, the model would also include proxies for socio-political factors, but these are difficult to measure objectively and so are not included. The included variables (with their units of measurement) are:

● Per capita income (in 1994 thousand US dollars). Cantor and Packer argue that per capita income determines the tax base, which in turn influences the government’s ability to raise revenue.


● GDP growth (annual 1991-94 average, %). The rate of GDP growth is argued to measure how much easier it will become to service debt costs in future.

● Inflation (annual 1992-94 average, %). Cantor and Packer argue that high inflation suggests that inflationary money financing will be used to service debt when the government is unwilling or unable to raise the required revenue through the tax system.

● Fiscal balance (average annual government budget surplus as a proportion of GDP 1992-94, %). Again, a large fiscal deficit shows that the government has a relatively weak capacity to raise additional revenue and to service debt costs.


● External balance (average annual current account surplus as a proportion of GDP 1992-94, %). Cantor and Packer argue that a persistent current account deficit leads to increasing foreign indebtedness, which may be unsustainable in the long run.

● External debt (foreign currency debt as a proportion of exports in 1994, %). The reasoning is as for external balance (which is the change in external debt over time).


● Dummy for economic development (=1 for a country classified by the IMF as developed, 0 otherwise). Cantor and Packer argue that credit rating agencies perceive developing countries as relatively more risky beyond that suggested by the values of the other factors listed above.

● Dummy for default history (=1 if a country has defaulted, 0 otherwise). It is argued that countries that have previously defaulted experience a large fall in their credit rating.


The income and inflation variables are transformed to their logarithms.

The model is linear and estimated using OLS.

Strictly, OLS is not an appropriate technique when the dependent variable can take on only one of a limited set of values (in this case, 1, 2, 3, ..., 16).

In such applications, a technique such as ordered probit (not covered in this text) would usually be more appropriate.

Cantor and Packer argue that any approach other than OLS is infeasible given the relatively small sample size (49) and the large number (16) of ratings categories.
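The specification described above can be sketched as follows. This is a minimal illustration under stated assumptions: every data value is simulated (the Cantor and Packer data are not reproduced here), the "true" coefficients are invented solely to generate a score, and all variable names are hypothetical. Only the structure follows the text: eight explanatory variables plus an intercept, income and inflation in logs, and estimation by OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 49  # same sample size as in the text; the data themselves are simulated

# Simulated explanatory variables (all values are made up for illustration).
income = rng.uniform(0.5, 25.0, n)            # per capita income, thousand USD
gdp_growth = rng.normal(3.0, 2.0, n)          # %
inflation = rng.uniform(1.0, 50.0, n)         # %
fiscal_balance = rng.normal(-3.0, 2.0, n)     # % of GDP
external_balance = rng.normal(-2.0, 3.0, n)   # % of GDP
external_debt = rng.uniform(20.0, 300.0, n)   # % of exports
developed = (income > 12.0).astype(float)     # development dummy
defaulted = (rng.uniform(size=n) < 0.3).astype(float)  # default dummy

# Design matrix: intercept first, income and inflation enter in logs.
names = ["intercept", "log income", "GDP growth", "log inflation",
         "fiscal balance", "external balance", "external debt",
         "development dummy", "default dummy"]
X = np.column_stack([
    np.ones(n), np.log(income), gdp_growth, np.log(inflation),
    fiscal_balance, external_balance, external_debt, developed, defaulted,
])

# Hypothetical coefficients, used only to generate an integer score in 1..16.
beta_true = np.array([7.0, 1.5, 0.15, -0.6, 0.05, 0.0, -0.01, 2.5, -2.0])
latent = X @ beta_true + rng.normal(scale=1.0, size=n)
rating = np.clip(np.rint(latent), 1, 16)

# OLS estimates, fitted values and residuals.
beta_hat, *_ = np.linalg.lstsq(X, rating, rcond=None)
fitted = X @ beta_hat
residuals = rating - fitted

for name, b in zip(names, beta_hat):
    print(f"{name:<18s} {b: .3f}")
```

Because the dependent variable is clipped and rounded to integers, the sketch also shows why a linear model can only approximate such a score.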


The results from regressing the rating value on the variables listed above are presented in table 5.2.

Four regressions are conducted, each with identical independent variables but a different dependent variable.

Regressions are conducted for the rating score given by each agency separately, with results presented in columns (4) and (5) of table 5.2.

Occasionally, the ratings agencies give different scores to a country. In the case of Italy, for example, Moody’s gives a rating of ‘A1’, which would generate a score of 12 on the 16-point scale. Standard and Poor’s (S&P), on the other hand, gives a rating of ‘AA’, which would score 14 on the 16-point scale, two gradings higher.
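The scoring of the letter grades can be illustrated with a short sketch. The two lists give the agencies' standard long-term rating scales from triple A down to B3/B-, the 16 grades covered by the sample. The function name `score` and the sign convention for the difference (Moody's minus S&P) are assumptions made for illustration.

```python
# Rating scales ordered from the highest grade (score 16) to the lowest (score 1).
MOODYS = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3",
          "Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3", "B1", "B2", "B3"]
SP = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
      "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-", "B+", "B", "B-"]

def score(rating, scale):
    """Map a letter rating to its numerical score (16 = best, 1 = worst)."""
    return len(scale) - scale.index(rating)

# The Italy example from the text.
moodys_italy = score("A1", MOODYS)
sp_italy = score("AA", SP)
average_italy = (moodys_italy + sp_italy) / 2
difference_italy = moodys_italy - sp_italy   # assumed convention: Moody's minus S&P

print(moodys_italy, sp_italy, average_italy, difference_italy)  # 12 14 13.0 -2
```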


Thus regressions with the average score across the two agencies, and with the difference between the two scores, as dependent variables are also conducted, and presented in columns (3) and (6), respectively, of table 5.2.

The models are difficult to interpret in terms of their statistical adequacy, since virtually no diagnostic tests have been undertaken.

The values of the adjusted R2, at over 90% for each of the three rating regressions, are high for cross-sectional regressions, indicating that the model seems able to capture almost all of the variability of the ratings about their mean values across the sample.
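As a reminder of how the adjusted R2 penalises the number of regressors, the standard definition is written out below, with T the number of observations and k the number of estimated parameters. Taking T = 49 and k = 9 (eight explanatory variables plus an intercept, which is an assumption about how the regressions were specified), an adjusted R2 of 0.924 would correspond to an unadjusted R2 of roughly 0.937.

```latex
\bar{R}^2 = 1 - \frac{T-1}{T-k}\,(1 - R^2)
\quad\Longrightarrow\quad
R^2 = 1 - \frac{T-k}{T-1}\,(1 - \bar{R}^2)
    = 1 - \frac{40}{48}\,(1 - 0.924) \approx 0.937
```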


There does not appear to be any attempt at reparameterisation presented in the paper, so it is assumed that the authors reached this set of models after some searching.

The residuals have an interesting interpretation as the difference between the actual and fitted ratings. The actual ratings will be integers from 1 to 16, although the fitted values from the regression and therefore the residual can take on any real value.

Cantor and Packer argue that the model is working well as no residual is bigger than 3, so that no fitted rating is more than 3 categories out from the actual rating, and only 4 countries have residuals bigger than 2 categories.

5.15.3 interpreting the models

Furthermore, 70% of the countries have ratings predicted exactly (i.e. the residuals are less than 0.5 in absolute value).
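The residual-based checks described above amount to a few lines of arithmetic. In the sketch below the actual scores and fitted values are invented for illustration, so the resulting numbers do not correspond to those in the paper. A rating counts as "predicted exactly" when the fitted value rounds to the actual score, which is the same as the residual being less than 0.5 in absolute value.

```python
import numpy as np

# Hypothetical actual scores and fitted values (made up for illustration).
actual = np.array([16, 14, 12, 9, 7, 5, 3, 1])
fitted = np.array([15.7, 14.4, 10.9, 9.2, 7.6, 4.8, 5.3, 1.3])

residuals = actual - fitted
predicted = np.clip(np.rint(fitted), 1, 16)  # nearest rating category

share_exact = np.mean(np.abs(residuals) < 0.5)   # share predicted exactly
n_over_2 = int(np.sum(np.abs(residuals) > 2))    # more than 2 categories out
max_abs = float(np.max(np.abs(residuals)))       # largest miss

print(f"share predicted exactly: {share_exact:.3f}")
print(f"residuals larger than 2 categories: {n_over_2}")
print(f"largest absolute residual: {max_abs:.1f}")
```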

It is of interest to investigate whether the coefficients have their expected signs and sizes. The expected signs for the regression results of columns (3)-(5) are displayed in column (2) of table 5.2.

As can be seen, all of the coefficients have their expected signs, although the fiscal balance and external balance variables are not significant or are only very marginally significant in all three cases.


The coefficients can be interpreted as the average change in the rating score that would result from a unit change in the variable.

So, for example, a rise in per capita income of $1,000 will on average increase the rating by 1.0 unit according to Moody’s and 1.5 units according to Standard and Poor’s.

The development dummy suggests that, on average, a developed country will have a rating 3 notches higher than an otherwise identical developing country.

And, everything else equal, a country that has defaulted in the past will have a rating 2 notches lower than one that has always met its obligations.


By and large, the ratings agencies appear to place similar weights on each of the variables, as evidenced by the similar coefficients and significances across columns (4) and (5) of table 5.2.

This is formally tested in column (6) of the table, where the dependent variable is the difference between Moody’s and Standard and Poor’s ratings.

Only 3 variables are statistically significantly differently weighted by the two agencies. Standard and Poor’s places higher weights on income and default history, while Moody’s places more emphasis on external debt.
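The reason a regression on the difference between the two scores acts as a test of equal weights is that OLS is linear in the dependent variable. With identical regressors, each coefficient in the difference regression equals the difference between the corresponding coefficients in the two separate regressions, so its t-ratio tests whether the two agencies weight that variable equally. The sketch below checks this on simulated data. The regressors, weights and names are invented, and the difference is taken as Moody's minus S&P, which is an assumption about the convention used.

```python
import numpy as np

def ols(y, X):
    """OLS coefficient estimates from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 49
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # simulated regressors

# Two simulated rating scores built from the same regressors (made-up weights).
moodys = X @ np.array([8.0, 1.0, -0.6, 2.9]) + rng.normal(size=n)
sp = X @ np.array([8.0, 1.5, -0.6, 2.6]) + rng.normal(size=n)

b_moodys = ols(moodys, X)
b_sp = ols(sp, X)
b_diff = ols(moodys - sp, X)  # regression with the difference as dependent variable

print(np.round(b_moodys - b_sp, 4))
print(np.round(b_diff, 4))  # same numbers as the line above
```

The per capita income entries reported in table 5.2 fit this pattern: 1.027 - 1.458 = -0.431, which is the coefficient shown in the difference column.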


Table 5.2 Determinants and impacts of sovereign credit ratings

5.15.4 the relationship between ratings and yields

Cantor and Packer try to determine whether ratings have any additional information useful for modelling the cross-sectional variability of sovereign yield spreads over and above that contained in publicly available macroeconomic data.

The dependent variable is the log of the yield spread:

ln(Yield on the sovereign bond - Yield on a US Treasury bond)

One may argue that such a measure of the spread is imprecise; the true credit spread should be defined by the entire credit quality curve rather than by just two points on it.

However, leaving this issue aside, the results are presented in table 5.3.

Three regressions are presented in table 5.3, denoted by specifications (1), (2) and (3).

The first of these is a regression of the ln(spread) on only a constant and the average rating (column (3)), and this shows that ratings have a highly significant inverse impact on the spread.

Specification (2) is a regression of the ln(spread) on the macroeconomic variables used in the previous analysis. The expected signs are given in column (2).

As can be seen, all coefficients have their expected signs, although now only the coefficients belonging to the external debt and the two dummy variables are statistically significant.


Table 5.3 Do ratings add to public information?

Specification (3) is a regression on both the average rating and the macroeconomic variables.

When the rating is included with the macroeconomic factors, none of the latter is any longer significant - only the rating coefficient is statistically significantly different from zero.

This message is also portrayed by the adjusted R2 values, which are highest for the regression containing only the rating, and slightly lower for the regression containing the macroeconomic variables and the rating.

One may also observe that, under specification (3), the coefficients on the per capita income, GDP growth and inflation variables now have the wrong sign.


This is, in fact, never really an issue, for if a coefficient is not statistically significant, it is indistinguishable from zero in the context of hypothesis testing, and therefore it does not matter whether it is actually insignificant and positive or insignificant and negative.

Only coefficients that are both of the wrong sign and statistically significant imply that there is a problem with the regression.

We can conclude from this part of the paper that there is no more incremental information in the publicly available macroeconomic variables that is useful for predicting the yield spread than that embodied in the rating.

The information contained in the ratings encompasses that contained in the macroeconomic variables.
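The conclusion above rests on the individual t-ratios reported in the paper. One complementary way to examine the same question, not used in the source, is a joint F-test of the restriction that all of the macroeconomic coefficients are zero once the rating is included. The sketch below applies the usual restricted versus unrestricted residual sum of squares formula to simulated data. The three stand-in macro variables, the coefficients and all names are invented for illustration.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(3)
n = 49

# Simulated data: three stand-in macro variables drive the rating, and the
# log spread depends on the rating alone (all values are invented).
macro = rng.normal(size=(n, 3))
rating = 8.0 + macro @ np.array([2.0, -1.5, 1.0]) + rng.normal(scale=0.5, size=n)
log_spread = 2.0 - 0.2 * rating + rng.normal(scale=0.3, size=n)

const = np.ones(n)
X_restricted = np.column_stack([const, rating])           # rating only
X_unrestricted = np.column_stack([const, rating, macro])  # rating plus macro

rrss = rss(log_spread, X_restricted)
urss = rss(log_spread, X_unrestricted)
m = macro.shape[1]            # number of restrictions
k = X_unrestricted.shape[1]   # parameters in the unrestricted model

f_stat = ((rrss - urss) / m) / (urss / (n - k))
print(f"F({m}, {n - k}) = {f_stat:.3f}")
```

A small F-statistic relative to the F(m, T - k) critical value would be consistent with the rating encompassing the macroeconomic variables.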


Cantor and Packer also consider whether it is possible to build a model to predict how the market will react to ratings announcements, in terms of the resulting change in the yield spread.

The dependent variable for this set of regressions is now the change in the log of the relative spread, i.e. log[(sovereign yield – treasury yield)/treasury yield], over a 2-day period at the time of the announcement.
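The construction of this dependent variable can be sketched with a small numerical example. The yields below are hypothetical and the function name is an assumption. The point is only to show the log relative spread on each of the two days and the change between them.

```python
import math

def log_relative_spread(sovereign_yield, treasury_yield):
    """log[(sovereign yield - treasury yield) / treasury yield]."""
    return math.log((sovereign_yield - treasury_yield) / treasury_yield)

# Hypothetical yields (in %) at the start and end of the 2-day window.
before = log_relative_spread(9.0, 6.0)   # relative spread of 0.5
after = log_relative_spread(9.6, 6.0)    # relative spread of 0.6
change = after - before                  # equals log(0.6 / 0.5)

print(f"change in log relative spread: {change:.4f}")
```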

The sample employed for estimation comprises every announcement of a ratings change that occurred between 1987 and 1994.

79 such announcements were made, spread over 18 countries.

5.15.5 what determines how the market reacts to ratings announcements?

Of these, 39 were actual ratings changes by one or more of the agencies, and 40 were listings of a sovereign as likely to experience a regrading in the near future. Moody’s calls this a ‘watchlist’, while Standard and Poor’s terms it their ‘outlook’ list.

The explanatory variables are mainly dummy variables for:

whether the announcement was positive – i.e. an upgrade

whether there was an actual ratings change or just listing for probable regrading

whether the bond was speculative grade or investment grade

whether there had been another rating announcement in the previous 60 days

the ratings gap between the announcing and the other agency.


The following cardinal variable was also employed:

the change in the spread over the previous 60 days.

The results are presented in table 5.4, but only the final specification (numbered 5 in Cantor and Packer’s exhibit 11) containing all of the variables described above is included.

As can be seen from table 5.4, the model appears to do a relatively poor job of explaining how the market will react to ratings announcements.

The adjusted R2 is only 12%, and this is the highest of the 5 specifications tested.

Only 2 variables are significant and 1 marginally significant of the 7 employed in the model.


Table 5.4 What determines reactions to ratings announcements?

Dependent variable: log relative spread

Independent variable        Coefficient    (t-ratio)
Intercept                   -0.02          (-1.4)
Positive announcements       0.01          (0.34)
Rating changes              -0.01          (-0.37)
Moody’s announcements        0.02

Note: * and ** denote significance at the 10% and 5% levels, respectively.

Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.


Conclusion: yield changes are significantly higher following a ratings announcement for speculative grade than for investment grade bonds, and ratings changes have a bigger impact on yield spreads if there is agreement between the ratings agencies at the time the announcement is made.

Further, yields change significantly more if there has been a previous announcement in the past 60 days than if not.


On the other hand, neither whether the announcement is an upgrade or downgrade, nor whether it is an actual ratings change or a name on the watchlist, nor whether the announcement is made by Moody’s or Standard and Poor’s, nor the amount by which the relative spread has already changed over the past 60 days, has any significant impact on how the market reacts to ratings announcements.


5.15.6 conclusions

To summarise, six factors appear to play a big role in determining sovereign credit ratings – income, GDP growth, inflation, external debt, industrialised or not, and default history.

The ratings provide more information on yields than all of the macroeconomic factors put together.

One cannot determine with any degree of confidence what factors determine how the markets will react to ratings announcements.

[Figure: residuals û_t plotted against x_2t]

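$$\frac{y_t}{z_t} = \beta_1 \frac{1}{z_t} + \beta_2 \frac{x_{2t}}{z_t} + \beta_3 \frac{x_{3t}}{z_t} + v_t$$

$$\operatorname{var}(v_t) = \operatorname{var}\left(\frac{u_t}{z_t}\right) = \frac{\operatorname{var}(u_t)}{z_t^2} = \frac{\sigma^2 z_t^2}{z_t^2} = \sigma^2$$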

[Figures: residuals û_t plotted against û_{t-1}, and residuals û_t plotted over time]

$$DW = \frac{\sum_{t=2}^{T} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=2}^{T} \hat{u}_t^2}$$

Corr    x2     x3     x4
x2      -      0.2    0.8
x3      0.2    -      0.3
x4      0.8    0.3    -

Table 5.2 (dependent variable shown at the head of columns (3) to (6))

                        (2)            (3)                   (4)                   (5)                   (6)
Explanatory variable    Expected sign  Average rating        Moody’s rating        S&P rating            Moody’s / S&P difference
Intercept               ?              1.442 (0.663)         3.408 (1.379)         -0.524 (-0.223)       3.932** (2.521)
Per capita income       +              1.242*** (5.302)      1.027*** (4.041)      1.458*** (6.048)      -0.431*** (-2.688)
GDP growth              +              0.151 (1.935)         0.130 (1.545)         0.171** (2.132)       -0.040 (0.756)
Inflation               -              -0.611*** (-2.839)    -0.630*** (-2.701)    -0.591*** (2.671)     -0.039 (-0.265)
Fiscal balance          +              0.073 (1.324)         0.049 (0.818)         0.097* (1.71)         -0.048 (-1.274)
External balance        +              0.003 (0.314)         0.006 (0.535)         0.001 (0.046)         0.006 (0.779)
External debt           -              -0.013*** (-5.088)    -0.015*** (-5.365)    -0.011*** (-4.236)    -0.004*** (-2.133)
Development dummy       +              2.776*** (4.25)       2.957*** (4.175)      2.595*** (3.861)      0.362 (0.81)
Default dummy           -              -2.042*** (-3.175)    -1.63** (-2.097)      -2.622*** (-3.962)    1.159*** (2.632)
Adjusted R2                            0.924                 0.905                 0.926                 0.836

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.

Table 5.3. Dependent variable: log (yield spread)

Variable                Expected sign  (1)                   (2)                   (3)
Intercept               ?              2.105*** (16.148)     0.466 (0.345)         0.074 (0.071)
Average rating          -              -0.221*** (-19.175)                         -0.218*** (-4.276)
Per capita income       -                                    -0.144 (-0.927)       0.226 (1.523)
GDP growth              -                                    -0.004 (-0.142)       0.029 (1.227)
Inflation               +                                    0.108 (1.393)         -0.004 (-0.068)
Fiscal balance          -                                    -0.037 (-1.557)       -0.02 (-1.045)
External balance        -                                    -0.038 (-1.29)        -0.023 (-1.008)
External debt           +                                    0.003*** (2.651)      0.000 (0.095)
Development dummy       -                                    -0.723*** (-2.059)    -0.38 (-1.341)
Default dummy           +                                    0.612*** (2.577)      0.085 (0.385)
Adjusted R2                            0.919                 0.857                 0.914

Notes: t-ratios in parentheses; *, **, and *** indicate significance at the 10%, 5% and 1% levels respectively. Source: Cantor and Packer (1996). Reprinted with permission from Institutional Investor.