Urgent Econometrics test
Regression with Time-Series Data: Nonstationary Variables
The aim is to describe how to estimate regression models involving nonstationary variables
- The first step is to examine the time-series concepts of stationarity (and nonstationarity) and how we distinguish between them.
- Cointegration is another important related concept that has a bearing on our choice of a regression model
The change in a variable is an important concept
- The change in a variable yt, also known as its first difference, is given by Δyt = yt – yt-1.
- Δyt is the change in the value of the variable y from period t - 1 to period t
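As a small illustration of the first difference, the sketch below computes Δyt for a made-up series. The function name `first_difference` and the numbers are illustrative assumptions, not part of the source.

```python
def first_difference(y):
    """Return the changes y[t] - y[t-1]; the result has one fewer observation than y."""
    return [curr - prev for prev, curr in zip(y, y[1:])]


levels = [100.0, 102.0, 101.0, 105.0]   # hypothetical observations y_0..y_3
changes = first_difference(levels)
print(changes)  # [2.0, -1.0, 4.0]
```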
FIGURE 12.1 U.S. economic time series
Formally, a time series yt is stationary if its mean and variance are constant over time, and if the covariance between two values from the series depends only on the length of time separating the two values, and not on the actual times at which the variables are observed
That is, the time series yt is stationary if for all values, and every time period, it is true that:
Table 12.1 Sample Means of Time Series Shown in Figure 12.1
Nonstationary series with nonconstant means are often described as not having the property of mean reversion
- Stationary series have the property of mean reversion
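One informal way to look for mean reversion, in the spirit of the subsample means in Table 12.1, is to compare the means of different parts of a series. The sketch below is only a rough diagnostic under that assumption; the helper `subsample_means` and both example series are made up for illustration.

```python
def subsample_means(y, n_parts=2):
    """Split y into n_parts equal blocks (dropping any remainder) and return each block's mean."""
    size = len(y) // n_parts
    return [sum(y[i * size:(i + 1) * size]) / size for i in range(n_parts)]


mean_reverting = [3.0, 5.0, 3.0, 5.0, 3.0, 5.0, 3.0, 5.0]  # keeps returning to 4
trending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]        # never settles at one level

print(subsample_means(mean_reverting))  # [4.0, 4.0]
print(subsample_means(trending))        # [2.5, 6.5]
```

Similar subsample means are consistent with a constant mean; means that differ sharply suggest the series may be nonstationary, which is what the formal tests later in the section check.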
The econometric model generating a time-series variable yt is called a stochastic or random process
- A sample of observed yt values is called a particular realization of the stochastic process
- It is one of many possible paths that the stochastic process could have taken
The autoregressive model of order one, the AR(1) model, is a useful univariate time series model for explaining the difference between stationary and nonstationary series:
(12.2a)
- The errors vt are independent, with zero mean and constant variance σ_v², and may be normally distributed
- The errors are sometimes known as “shocks” or “innovations”
FIGURE 12.2 Time-series models
The constant mean of this series is zero, which can be shown with some algebra known as recursive substitution
Consider the value of y at time t = 1, then its value at time t = 2 and so on
These values are:
The mean of yt is:
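The recursive substitution can be checked numerically. The sketch below iterates yt = ρyt-1 + vt and compares the last value with the substituted form vt + ρvt-1 + ... + ρ^t y0. The shock values, ρ = 0.5 and y0 = 4 are arbitrary choices for illustration.

```python
def ar1_path(rho, y0, shocks):
    """Iterate y_t = rho * y_{t-1} + v_t and return [y_1, ..., y_T]."""
    path, prev = [], y0
    for v in shocks:
        prev = rho * prev + v
        path.append(prev)
    return path


def ar1_closed_form(rho, y0, shocks):
    """y_T = v_T + rho*v_{T-1} + ... + rho^(T-1)*v_1 + rho^T * y0."""
    T = len(shocks)
    return sum(rho ** (T - s) * v for s, v in enumerate(shocks, start=1)) + rho ** T * y0


shocks = [1.0, -0.5, 0.25, 2.0]   # made-up values of v_1..v_4
rho, y0 = 0.5, 4.0
print(ar1_path(rho, y0, shocks))         # [3.0, 1.0, 0.75, 2.375]
print(ar1_closed_form(rho, y0, shocks))  # 2.375
```

Because |ρ| < 1, the weight ρ^t on y0 shrinks toward zero, and each shock has zero mean, which is why the mean of the series is zero.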
Real-world data rarely have a zero mean
- We can introduce a nonzero mean μ as:
Or
With α = 1 and ρ = 0.7:
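As a rough numerical check of the mean μ = α/(1 - ρ), the sketch below simulates a long AR(1) path with α = 1 and ρ = 0.7 and compares its sample mean with 3.33. The unit error variance, the sample size and the seed are assumptions made for the illustration.

```python
import random


def simulate_ar1(alpha, rho, n, sigma=1.0, seed=0):
    """Simulate y_t = alpha + rho * y_{t-1} + v_t with normal shocks v_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    y = alpha / (1.0 - rho)          # start the path at the theoretical mean
    path = []
    for _ in range(n):
        y = alpha + rho * y + rng.gauss(0.0, sigma)
        path.append(y)
    return path


alpha, rho = 1.0, 0.7
mu = alpha / (1.0 - rho)
path = simulate_ar1(alpha, rho, n=50_000)
sample_mean = sum(path) / len(path)

print(round(mu, 2))            # 3.33
print(round(sample_mean, 2))   # should be close to 3.33
```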
An extension to Eq. 12.2a is to consider an AR(1) model fluctuating around a linear trend: (μ + δt)
- Let the “de-trended” series (yt - μ - δt) behave like an autoregressive model:
Or:
Consider the special case of ρ = 1:
This model is known as the random walk model
- These time series are called random walks because they appear to wander slowly upward or downward with no real pattern
- The values of sample means calculated from subsamples of observations depend on the sample period
- This is a characteristic of nonstationary series
We can understand the “wandering” by recursive substitution:
The term Σ vs (the sum of the shocks v1 + v2 + ... + vt) is often called the stochastic trend
- This term arises because a stochastic component vt is added for each time t, and because it causes the time series to trend in unpredictable directions
Recognizing that the vt are independent, taking the expectation and the variance of yt yields, for a fixed initial value y0:
- The random walk has a mean equal to its initial value and a variance that increases over time, eventually becoming infinite
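The result var(yt) = tσ_v² can be illustrated by simulating many random walks and measuring the spread of their values at two dates. This is a sketch under assumed settings (5000 paths, y0 = 0, σ_v = 1, horizons 10 and 50); none of these numbers come from the source.

```python
import random


def random_walk_values(n_paths, horizons, sigma=1.0, seed=0):
    """Simulate n_paths random walks starting at y0 = 0 and record y_t at each requested t."""
    rng = random.Random(seed)
    last = max(horizons)
    out = {t: [] for t in horizons}
    for _ in range(n_paths):
        y = 0.0
        for t in range(1, last + 1):
            y += rng.gauss(0.0, sigma)   # y_t = y_{t-1} + v_t
            if t in out:
                out[t].append(y)
    return out


def variance(values):
    """Sample variance across simulated paths."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


draws = random_walk_values(n_paths=5000, horizons=(10, 50))
var10, var50 = variance(draws[10]), variance(draws[50])
print(var10, var50)   # expected to be near 10 and 50 when sigma = 1
```

The variance across paths grows roughly in proportion to t, while the average across paths stays near the initial value y0 = 0.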
Another nonstationary model is obtained by adding a constant term:
- This model is known as the random walk with drift
A better understanding is obtained by applying recursive substitution:
The term tα is a deterministic trend component
- It is called a deterministic trend because a fixed value α is added for each time t
- The variable y wanders up and down as well as increases by a fixed amount at each time t
The mean and variance of yt are:
We can extend the random walk model even further by adding a time trend:
The addition of a time-trend variable t strengthens the trend behavior:
where we used:
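The recursive substitution for the random walk with drift and a time trend can be verified numerically. The sketch below iterates yt = α + δt + yt-1 + vt and compares the result with tα + (t(t+1)/2)δ + y0 + Σvs; setting δ = 0 gives the random walk with drift. All parameter and shock values are made up.

```python
def walk_with_drift_and_trend(alpha, delta, y0, shocks):
    """Iterate y_t = alpha + delta*t + y_{t-1} + v_t for t = 1..T and return [y_1, ..., y_T]."""
    path, prev = [], y0
    for t, v in enumerate(shocks, start=1):
        prev = alpha + delta * t + prev + v
        path.append(prev)
    return path


def closed_form(alpha, delta, y0, shocks):
    """y_T = T*alpha + (T*(T+1)/2)*delta + y0 + sum of the shocks."""
    T = len(shocks)
    return T * alpha + (T * (T + 1) / 2) * delta + y0 + sum(shocks)


shocks = [1.0, -2.0, 0.5, 1.5]    # made-up values of v_1..v_4

# Random walk with drift and trend
print(walk_with_drift_and_trend(0.5, 0.25, 2.0, shocks))  # [3.75, 2.75, 4.5, 7.5]
print(closed_form(0.5, 0.25, 2.0, shocks))                # 7.5

# Random walk with drift only (delta = 0)
print(walk_with_drift_and_trend(0.5, 0.0, 2.0, shocks))   # [3.5, 2.0, 3.0, 5.0]
print(closed_form(0.5, 0.0, 2.0, shocks))                 # 5.0
```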
It is important to know whether a time series is stationary or nonstationary before embarking on a regression analysis, because regressions that use nonstationary series can produce apparently significant results from unrelated data
- Such regressions are said to be spurious
Spurious Regressions
Consider two independent random walks:
- These series were generated independently and, in truth, have no relation to one another
- Yet when plotted we see a positive relationship between them
FIGURE 12.3 Time series and scatter plot of two random walk variables
A simple regression of series one (rw1) on series two (rw2) yields:
- These results are completely meaningless, or spurious
- The apparent significance of the relationship is false
When nonstationary time series are used in a regression model, the results may spuriously indicate a significant relationship when there is none
- In these cases the least squares estimator and least squares predictor do not have their usual properties, and t-statistics are not reliable
- Since many macroeconomic time series are nonstationary, it is particularly important to take care when estimating regressions with macroeconomic variables
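The spurious regression problem can be reproduced with a small Monte Carlo experiment: generate pairs of independent random walks, regress one on the other, and count how often the usual t-test (|t| > 1.96) declares the slope significant. This is a sketch under assumed settings (100 observations, 500 replications, standard normal shocks), not the experiment behind Figure 12.3.

```python
import math
import random


def random_walk(n, rng):
    """Return a random walk of length n with standard normal shocks and y0 = 0."""
    y, path = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def slope_t_stat(y, x):
    """t-statistic for b2 in the least squares fit y = b1 + b2*x + e."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return b2 / math.sqrt(sse / (n - 2) / sxx)


def false_rejection_rate(n_obs=100, n_reps=500, seed=0):
    """Share of independent random-walk pairs whose slope looks 'significant' at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        rw1, rw2 = random_walk(n_obs, rng), random_walk(n_obs, rng)
        hits += abs(slope_t_stat(rw1, rw2)) > 1.96
    return hits / n_reps


rate = false_rejection_rate()
print(rate)   # far above the nominal 0.05 if the t-test is unreliable here
```

If the t-statistic had its usual distribution, about 5% of these unrelated pairs would appear significant; a much larger share illustrates why t-statistics are not reliable with nonstationary data.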
There are many tests for determining whether a series is stationary or nonstationary
- The most popular is the Dickey–Fuller test
The AR(1) process yt = ρyt-1 + vt is stationary when |ρ| < 1
- But, when ρ = 1, it becomes the nonstationary random walk process
- We want to test whether ρ is equal to one or significantly less than one
- Tests for this purpose are known as unit root tests for stationarity
Consider again the AR(1) model:
- We can test for nonstationarity by testing the null hypothesis that ρ = 1 against the alternative that |ρ| < 1
- Or simply ρ < 1
Unit Root Tests for Stationarity
A more convenient form is:
(12.5a)
The hypotheses are:
The second Dickey–Fuller test includes a constant term in the test equation:
(12.5b)
- The null and alternative hypotheses are the same as before
The third Dickey–Fuller test includes a constant and a trend in the test equation:
(12.5c)
- The null and alternative hypotheses are H0: γ = 0 and H1:γ < 0
To test the hypothesis in all three cases, we simply estimate the test equation by least squares and examine the t-statistic for the hypothesis that γ = 0
- Unfortunately this t-statistic no longer has the t-distribution
- Instead, we use the statistic often called a τ (tau) statistic
Table 12.2 Critical Values for the Dickey–Fuller Test
To carry out a one-tail test of significance, if τc is the critical value obtained from Table 12.2, we reject the null hypothesis of nonstationarity if τ ≤ τc
- If τ > τc then we do not reject the null hypothesis that the series is nonstationary
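The decision rule can be sketched for the test equation with a constant, Δyt = α + γyt-1 + vt. The code estimates the equation by least squares, forms τ as the t-ratio on γ, and compares it with -2.86, the 5% critical value the text quotes for the interest-rate example. The simulated series, sample size and seed are assumptions for illustration.

```python
import math
import random


def dickey_fuller_tau(y):
    """tau for gamma in  dy_t = alpha + gamma*y_{t-1} + v_t  (test equation with a constant)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(dy)
    xbar, dbar = sum(lagged) / n, sum(dy) / n
    sxx = sum((x - xbar) ** 2 for x in lagged)
    gamma = sum((x - xbar) * (d - dbar) for x, d in zip(lagged, dy)) / sxx
    alpha = dbar - gamma * xbar
    sse = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    return gamma / math.sqrt(sse / (n - 2) / sxx)


def simulate_ar1(rho, n, seed):
    """Simulate y_t = rho*y_{t-1} + v_t with standard normal shocks; rho = 1 is a random walk."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        y = rho * y + rng.gauss(0.0, 1.0)
        path.append(y)
    return path


TAU_CRIT_5PCT = -2.86   # 5% critical value for the equation with a constant

for label, rho in [("AR(1) with rho = 0.5", 0.5), ("random walk", 1.0)]:
    tau = dickey_fuller_tau(simulate_ar1(rho, n=500, seed=1))
    decision = "reject" if tau <= TAU_CRIT_5PCT else "do not reject"
    print(f"{label}: tau = {tau:.3f} -> {decision} the null of nonstationarity")
```

Note that τ is compared with the Dickey–Fuller critical value rather than the usual t critical value, because under the null the statistic does not follow a t-distribution.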
An important extension of the Dickey–Fuller test allows for the possibility that the error term is autocorrelated
- Consider the model:
(12.6)
where
The unit root tests based on Eq. 12.6 and its variants (intercept excluded or trend included) are referred to as augmented Dickey–Fuller tests
- When γ = 0, in addition to saying that the series is nonstationary, we also say the series has a unit root
- In practice, we always use the augmented Dickey–Fuller test
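A minimal sketch of the augmented test equation (12.6) follows. It regresses Δyt on a constant, yt-1 and m lagged differences, and returns the t-ratio on γ. The small least squares helper is written out so the example needs no external libraries; the helper names, the choice m = 1 and the simulated data are assumptions, and in applied work an established econometrics package would normally be used instead.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Least squares for y = X b + e; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * xi for bi, xi in zip(b, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return b, [math.sqrt(s2 * inv[i][i]) for i in range(k)]


def adf_tau(y, m=1):
    """tau for gamma in  dy_t = alpha + gamma*y_{t-1} + sum_{s=1..m} a_s*dy_{t-s} + v_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # dy[i] is the change ending at y[i+1]
    X, z = [], []
    for i in range(m, len(dy)):
        X.append([1.0, y[i]] + [dy[i - s] for s in range(1, m + 1)])
        z.append(dy[i])
    b, se = ols(X, z)
    return b[1] / se[1]


rng = random.Random(2)
y, series = 0.0, []
for _ in range(500):
    y = 0.5 * y + rng.gauss(0.0, 1.0)   # a stationary AR(1), chosen for illustration
    series.append(y)

tau = adf_tau(series, m=1)
print(round(tau, 3), "reject" if tau <= -2.86 else "do not reject")
```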
Table 12.3 AR processes and the Dickey-Fuller Tests
The Dickey-Fuller testing procedure:
- First plot the time series of the variable and select a suitable Dickey-Fuller test based on a visual inspection of the plot
- If the series appears to be wandering or fluctuating around a sample average of zero, use test equation (12.5a)
- If the series appears to be wandering or fluctuating around a sample average which is nonzero, use test equation (12.5b)
- If the series appears to be wandering or fluctuating around a linear trend, use test equation (12.5c)
- Second, proceed with one of the unit root tests described in Sections 12.3.1 to 12.3.3
As an example, consider the two interest rate series:
The federal funds rate (Ft)
The three-year bond rate (Bt)
Following procedures described in Sections 9.3 and 9.4, we find that the inclusion of one lagged difference term is sufficient to eliminate autocorrelation in the residuals in both cases
The results from estimating the resulting equations are:
The 5% critical value for tau (τc) is -2.86
Since -2.505 > -2.86 for Ft (and -2.703 > -2.86 for Bt), we do not reject the null hypothesis that the series are nonstationary
Recall that if yt follows a random walk, then γ = 0 and the first difference of yt becomes:
- Series like yt, which can be made stationary by taking the first difference, are said to be integrated of order one, and denoted as I(1)
- Stationary series are said to be integrated of order zero, I(0)
- In general, the order of integration of a series is the minimum number of times it must be differenced to make it stationary
The results of the Dickey–Fuller test for a random walk applied to the first differences are:
Based on the large negative value of the tau statistic (-5.487 < -1.94), we reject the null hypothesis that ΔFt is nonstationary and accept the alternative that it is stationary
- We similarly conclude that ΔBt is stationary (-7.662 < -1.94)
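The reasoning for the two interest rate series can be written as a short decision rule using the τ values and critical values reported above. The function names are illustrative assumptions; the numbers are those quoted in the text.

```python
def is_stationary(tau, critical_value):
    """Dickey-Fuller rule: reject the unit-root null (conclude stationary) when tau <= critical value."""
    return tau <= critical_value


def order_of_integration(tau_level, crit_level, tau_diff, crit_diff):
    """Return 0 or 1 from tests on the level and the first difference; None if neither is stationary."""
    if is_stationary(tau_level, crit_level):
        return 0
    if is_stationary(tau_diff, crit_diff):
        return 1
    return None   # tests on higher differences would be needed


# Federal funds rate: level tau = -2.505 vs -2.86, difference tau = -5.487 vs -1.94
print(order_of_integration(-2.505, -2.86, -5.487, -1.94))  # 1
# Three-year bond rate: level tau = -2.703 vs -2.86, difference tau = -7.662 vs -1.94
print(order_of_integration(-2.703, -2.86, -7.662, -1.94))  # 1
```

Both series are nonstationary in levels and stationary in first differences, so both are I(1).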
As a general rule, to avoid the problem of spurious regression, nonstationary time-series variables should not be used in regression models
- There is an exception to this rule
There is an important case when et = yt - β1 - β2xt is a stationary I(0) process
- In this case yt and xt are said to be cointegrated
- Cointegration implies that yt and xt share similar stochastic trends, and, since the difference et is stationary, they never diverge too far from each other
Cointegration
Table 12.4 Critical Values for the Cointegration Test
There are three sets of critical values
- Which set we use depends on whether the residuals are derived from:
Consider the estimated model:
- The unit root test for stationarity in the estimated residuals is:
The null and alternative hypotheses in the test for cointegration are:
Similar to the one-tail unit root tests, we reject the null hypothesis of no cointegration if τ ≤ τc, and we do not reject the null hypothesis that the series are not cointegrated if τ > τc
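The residual-based test can be sketched in two steps: estimate yt = β1 + β2xt by least squares, then apply a Dickey–Fuller regression without a constant to the residuals. The simulated pair below is cointegrated by construction. The critical value -3.37 is assumed here to be the 5% value for a regression with an intercept; check it against Table 12.4 before relying on it.

```python
import math
import random


def ols_line(y, x):
    """Least squares intercept and slope for y = b1 + b2*x + e."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b2 * xbar, b2


def residual_tau(e):
    """tau for gamma in  de_t = gamma*e_{t-1} + v_t  (no constant in the test equation)."""
    lagged = e[:-1]
    de = [b - a for a, b in zip(e, e[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, de)) / sxx
    sse = sum((d - gamma * x) ** 2 for x, d in zip(lagged, de))
    return gamma / math.sqrt(sse / (len(de) - 1) / sxx)


def cointegration_tau(y, x):
    """Step 1: levels regression. Step 2: unit root test on its residuals."""
    b1, b2 = ols_line(y, x)
    residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return residual_tau(residuals)


rng = random.Random(3)
x, level = [], 0.0
for _ in range(300):
    level += rng.gauss(0.0, 1.0)          # x is a random walk, so it is I(1)
    x.append(level)
y = [1.0 + 0.9 * xi + rng.gauss(0.0, 1.0) for xi in x]   # y shares x's stochastic trend

TAU_CRIT_5PCT = -3.37   # assumed 5% critical value (intercept, no trend); see Table 12.4
tau = cointegration_tau(y, x)
print(round(tau, 3), "cointegrated" if tau <= TAU_CRIT_5PCT else "no evidence of cointegration")
```

The critical values differ from those of the ordinary Dickey–Fuller test because the residuals are estimated rather than observed.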
Consider a general model that contains lags of y and x
- Namely, the autoregressive distributed lag (ARDL) model, except the variables are nonstationary:
If y and x are cointegrated, it means that there is a long-run relationship between them
- To derive this exact relationship, we set yt = yt-1 = y, xt = xt-1 = x and vt = 0
- Imposing these conditions on the ARDL model, we obtain:
- This can be rewritten in the form:
Add the term -yt-1 to both sides of the equation:
Add and subtract δ0xt-1 on the right-hand side (that is, add the term -δ0xt-1 + δ0xt-1):
Manipulating this we get:
Or:
- This is called an error correction equation
- This is a very popular model because:
- It allows for an underlying or fundamental link between variables (the long-run relationship)
- It allows for short-run adjustments (i.e. changes) between variables, including adjustments to achieve the cointegrating relationship
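The algebra linking the ARDL model to the error correction equation can be checked numerically. With α = 1 - θ1, β1 = δ/(1 - θ1) and β2 = (δ0 + δ1)/(1 - θ1), one step of the ARDL model and one step of the error correction form should give the same yt. The parameter values and data points below are made up for the check.

```python
def ecm_parameters(delta, theta1, delta0, delta1):
    """Map ARDL parameters to the error correction parameters (alpha, beta1, beta2)."""
    alpha = 1.0 - theta1
    beta1 = delta / (1.0 - theta1)
    beta2 = (delta0 + delta1) / (1.0 - theta1)
    return alpha, beta1, beta2


def ardl_step(y_prev, x_now, x_prev, v, delta, theta1, delta0, delta1):
    """y_t = delta + theta1*y_{t-1} + delta0*x_t + delta1*x_{t-1} + v_t."""
    return delta + theta1 * y_prev + delta0 * x_now + delta1 * x_prev + v


def ecm_step(y_prev, x_now, x_prev, v, delta, theta1, delta0, delta1):
    """y_t = y_{t-1} + dy_t, with dy_t = -alpha*(y_{t-1} - beta1 - beta2*x_{t-1}) + delta0*dx_t + v_t."""
    alpha, beta1, beta2 = ecm_parameters(delta, theta1, delta0, delta1)
    dy = -alpha * (y_prev - beta1 - beta2 * x_prev) + delta0 * (x_now - x_prev) + v
    return y_prev + dy


params = dict(delta=0.5, theta1=0.8, delta0=0.3, delta1=0.1)   # hypothetical values
state = dict(y_prev=3.0, x_now=1.2, x_prev=1.0, v=0.05)        # hypothetical data

print(ecm_parameters(**params))      # approximately (0.2, 2.5, 2.0)
print(ardl_step(**state, **params))  # approximately 3.41
print(ecm_step(**state, **params))   # approximately 3.41
```

Here β1 + β2x is the long-run relationship, and α measures how much of last period's deviation from it is corrected in the current period.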
For the bond and federal funds rates example, we have:
The estimated residuals are
The result from applying the ADF test for stationarity is:
- Comparing the calculated value (-3.929) with the critical value, we reject the null hypothesis and conclude that (B, F) are cointegrated
How we convert nonstationary series to stationary series, and the kind of model we estimate, depend on whether the variables are difference stationary or trend stationary
- In the former case, we convert the nonstationary series to its stationary counterpart by taking first differences
- In the latter case, we convert the nonstationary series to its stationary counterpart by de-trending
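De-trending can be sketched as a least squares regression of yt on a constant and t, keeping the residuals as the de-trended series. The simulated trend-stationary series (intercept 1, slope 0.5, standard normal errors) and the helper name `detrend` are assumptions for illustration.

```python
import random


def detrend(y):
    """Regress y_t on a constant and t = 1..T; return (intercept, slope, de-trended residuals)."""
    n = len(y)
    t = list(range(1, n + 1))
    tbar, ybar = sum(t) / n, sum(y) / n
    slope = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
             / sum((ti - tbar) ** 2 for ti in t))
    intercept = ybar - slope * tbar
    residuals = [yi - intercept - slope * ti for ti, yi in zip(t, y)]
    return intercept, slope, residuals


rng = random.Random(0)
y = [1.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(1, 201)]   # y_t = alpha + delta*t + v_t

intercept, slope, y_star = detrend(y)
print(round(slope, 3))              # should be close to 0.5
print(abs(sum(y_star)) < 1e-6)      # True: least squares residuals sum to zero
```

For a difference-stationary series the analogous step is simply to take first differences, yt - yt-1, instead of removing a fitted trend.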
Consider the random walk model:
- This can be rendered stationary by taking the first difference:
- The variable yt is said to be a first difference stationary series
A suitable regression involving only stationary variables is:
- Now consider a series yt that behaves like a random walk with drift:
with first difference:
- The variable yt is also said to be a first difference stationary series, even though its first difference is stationary around a nonzero constant α rather than around zero
Suppose that y and x are I(1) and not cointegrated
- An example of a suitable regression equation is:
Consider a model with a constant term, a trend term, and a stationary error term:
- The variable yt is said to be trend stationary because it can be made stationary by removing the effect of the deterministic (constant and trend) components:
- If y and x are two trend-stationary variables, a possible autoregressive distributed lag model is:
As an alternative to using the de-trended data for estimation, a constant term and a trend term can be included directly in the equation:
where:
If variables are stationary, or I(1) and cointegrated, we can estimate a regression relationship between the levels of those variables without fear of encountering a spurious regression
If the variables are I(1) and not cointegrated, we need to estimate a relationship in first differences, with or without the constant term
If they are trend stationary, we can either de-trend the series first and then perform regression analysis with the stationary (de-trended) variables or, alternatively, estimate a regression relationship that includes a trend variable
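The three cases above can be summarized as a small lookup, which mirrors the flow of Figure 12.4 as far as the text describes it. The function name and the string labels are illustrative assumptions.

```python
def choose_model(series_type, cointegrated=False):
    """Map the properties of y and x to the estimation strategy summarized in the text.

    series_type: "stationary", "I(1)", or "trend stationary" (labels chosen for this sketch).
    """
    if series_type == "stationary":
        return "regress levels on levels"
    if series_type == "I(1)":
        if cointegrated:
            return "regress levels on levels"
        return "regress first differences on first differences"
    if series_type == "trend stationary":
        return "de-trend first, or include a trend term in the levels regression"
    raise ValueError(f"unknown series type: {series_type!r}")


print(choose_model("I(1)", cointegrated=True))   # regress levels on levels
print(choose_model("I(1)", cointegrated=False))  # regress first differences on first differences
print(choose_model("trend stationary"))
```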
FIGURE 12.4 Regression with time-series data: nonstationary variables
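Equations, in the order the slides above introduce them (t and tau statistics are shown in parentheses under the coefficients they belong to)

Stationarity conditions:
$$E(y_t) = \mu \quad \text{(constant mean)}$$
$$\operatorname{var}(y_t) = \sigma^2 \quad \text{(constant variance)}$$
$$\operatorname{cov}(y_t, y_{t+s}) = \operatorname{cov}(y_t, y_{t-s}) = \gamma_s \quad \text{(covariance depends on } s \text{, not } t \text{)}$$

AR(1) model, with error variance $\sigma_v^2$:
$$y_t = \rho y_{t-1} + v_t, \quad |\rho| < 1 \tag{12.2a}$$

Recursive substitution for the AR(1) model:
$$\begin{aligned}
y_1 &= \rho y_0 + v_1 \\
y_2 &= \rho y_1 + v_2 = \rho(\rho y_0 + v_1) + v_2 = \rho^2 y_0 + \rho v_1 + v_2 \\
&\;\;\vdots \\
y_t &= v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \cdots + \rho^t y_0
\end{aligned}$$

Mean of the AR(1) series:
$$E(y_t) = E(v_t + \rho v_{t-1} + \rho^2 v_{t-2} + \cdots) = 0$$

AR(1) model with a nonzero mean:
$$(y_t - \mu) = \rho (y_{t-1} - \mu) + v_t$$
$$y_t = \alpha + \rho y_{t-1} + v_t, \quad |\rho| < 1, \quad \alpha = \mu(1 - \rho)$$

With α = 1 and ρ = 0.7:
$$E(y_t) = \mu = \alpha / (1 - \rho) = 1 / (1 - 0.7) = 3.33$$

AR(1) model around a linear trend:
$$(y_t - \mu - \delta t) = \rho\,(y_{t-1} - \mu - \delta (t-1)) + v_t, \quad |\rho| < 1$$
$$y_t = \alpha + \rho y_{t-1} + \lambda t + v_t, \quad \alpha = \mu(1 - \rho) + \rho\delta, \quad \lambda = \delta(1 - \rho)$$

Random walk (ρ = 1):
$$y_t = y_{t-1} + v_t$$

Recursive substitution for the random walk:
$$\begin{aligned}
y_1 &= y_0 + v_1 \\
y_2 &= y_1 + v_2 = (y_0 + v_1) + v_2 = y_0 + \sum_{s=1}^{2} v_s \\
&\;\;\vdots \\
y_t &= y_{t-1} + v_t = y_0 + \sum_{s=1}^{t} v_s
\end{aligned}$$

Stochastic trend:
$$\sum_{s=1}^{t} v_s$$

Mean and variance of the random walk:
$$E(y_t) = y_0 + E(v_1 + v_2 + \cdots + v_t) = y_0$$
$$\operatorname{var}(y_t) = \operatorname{var}(v_1 + v_2 + \cdots + v_t) = t\sigma_v^2$$

Random walk with drift:
$$y_t = \alpha + y_{t-1} + v_t$$

Recursive substitution for the random walk with drift:
$$\begin{aligned}
y_1 &= \alpha + y_0 + v_1 \\
y_2 &= \alpha + y_1 + v_2 = \alpha + (\alpha + y_0 + v_1) + v_2 = 2\alpha + y_0 + \sum_{s=1}^{2} v_s \\
&\;\;\vdots \\
y_t &= \alpha + y_{t-1} + v_t = t\alpha + y_0 + \sum_{s=1}^{t} v_s
\end{aligned}$$

Mean and variance of the random walk with drift:
$$E(y_t) = t\alpha + y_0 + E(v_1 + v_2 + v_3 + \cdots + v_t) = t\alpha + y_0$$
$$\operatorname{var}(y_t) = \operatorname{var}(v_1 + v_2 + v_3 + \cdots + v_t) = t\sigma_v^2$$

Random walk with drift and a time trend:
$$y_t = \alpha + \delta t + y_{t-1} + v_t$$

Recursive substitution for the random walk with drift and a time trend:
$$\begin{aligned}
y_1 &= \alpha + \delta + y_0 + v_1 \\
y_2 &= \alpha + 2\delta + y_1 + v_2 = \alpha + 2\delta + (\alpha + \delta + y_0 + v_1) + v_2 = 2\alpha + 3\delta + y_0 + \sum_{s=1}^{2} v_s \\
&\;\;\vdots \\
y_t &= \alpha + \delta t + y_{t-1} + v_t = t\alpha + \left(\frac{t(t+1)}{2}\right)\delta + y_0 + \sum_{s=1}^{t} v_s
\end{aligned}$$

where we used:
$$1 + 2 + 3 + \cdots + t = \frac{t(t+1)}{2}$$

Two independent random walks:
$$rw_1: \; y_t = y_{t-1} + v_{1t}$$
$$rw_2: \; x_t = x_{t-1} + v_{2t}$$

Regression of rw1 on rw2:
$$\widehat{rw_{1t}} = 17.818 + \underset{(40.837)}{0.842}\; rw_{2t}, \quad R^2 = 0.70$$

AR(1) model used for the unit root test:
$$y_t = \rho y_{t-1} + v_t$$

Dickey–Fuller test equation without a constant:
$$\begin{aligned}
y_t - y_{t-1} &= \rho y_{t-1} - y_{t-1} + v_t \\
\Delta y_t &= (\rho - 1) y_{t-1} + v_t \\
&= \gamma y_{t-1} + v_t
\end{aligned} \tag{12.5a}$$

Hypotheses:
$$H_0: \rho = 1 \iff H_0: \gamma = 0$$
$$H_1: \rho < 1 \iff H_1: \gamma < 0$$

Dickey–Fuller test equation with a constant:
$$\Delta y_t = \alpha + \gamma y_{t-1} + v_t \tag{12.5b}$$

Dickey–Fuller test equation with a constant and a trend:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \lambda t + v_t \tag{12.5c}$$

Augmented Dickey–Fuller test equation:
$$\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{s=1}^{m} a_s \Delta y_{t-s} + v_t \tag{12.6}$$

where:
$$\Delta y_{t-1} = (y_{t-1} - y_{t-2}), \quad \Delta y_{t-2} = (y_{t-2} - y_{t-3}), \; \ldots$$

Test results for the federal funds rate and the three-year bond rate (tau in parentheses):
$$\widehat{\Delta F_t} = 0.173 - \underset{(-2.505)}{0.045}\, F_{t-1} + 0.561\, \Delta F_{t-1}$$
$$\widehat{\Delta B_t} = 0.237 - \underset{(-2.703)}{0.056}\, B_{t-1} + 0.237\, \Delta B_{t-1}$$

First difference of a random walk:
$$\Delta y_t = y_t - y_{t-1} = v_t$$

Test results for the first differences (tau in parentheses):
$$\widehat{\Delta(\Delta F)_t} = -\underset{(-5.487)}{0.447}\, (\Delta F)_{t-1}$$
$$\widehat{\Delta(\Delta B)_t} = -\underset{(-7.662)}{0.701}\, (\Delta B)_{t-1}$$

Residuals used in the cointegration test:
$$\text{Equation 1:} \quad \hat{e}_t = y_t - b x_t$$
$$\text{Equation 2:} \quad \hat{e}_t = y_t - b_2 x_t - b_1$$
$$\text{Equation 3:} \quad \hat{e}_t = y_t - b_2 x_t - b_1 - \hat{\delta} t$$

Estimated model (t in parentheses):
$$\hat{B}_t = \underset{(6.548)}{1.140} + \underset{(29.421)}{0.914}\, F_t, \quad R^2 = 0.881$$

Unit root test on the estimated residuals (tau in parentheses):
$$\widehat{\Delta \hat{e}_t} = -\underset{(-4.196)}{0.225}\, \hat{e}_{t-1} + 0.254\, \Delta \hat{e}_{t-1}$$

Hypotheses in the test for cointegration:
$$H_0: \text{the series are not cointegrated} \iff \text{residuals are nonstationary}$$
$$H_1: \text{the series are cointegrated} \iff \text{residuals are stationary}$$

ARDL model with nonstationary variables:
$$y_t = \delta + \theta_1 y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + v_t$$

Long-run relationship:
$$y(1 - \theta_1) = \delta + (\delta_0 + \delta_1) x$$
$$y = \beta_1 + \beta_2 x, \quad \beta_1 = \frac{\delta}{1 - \theta_1}, \quad \beta_2 = \frac{\delta_0 + \delta_1}{1 - \theta_1}$$

Derivation of the error correction equation:
$$y_t - y_{t-1} = \delta + (\theta_1 - 1) y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + v_t$$
$$\Delta y_t = \delta + (\theta_1 - 1) y_{t-1} + \delta_0 (x_t - x_{t-1}) + (\delta_0 + \delta_1) x_{t-1} + v_t$$
$$\Delta y_t = (\theta_1 - 1)\left( \frac{\delta}{\theta_1 - 1} + y_{t-1} + \frac{\delta_0 + \delta_1}{\theta_1 - 1}\, x_{t-1} \right) + \delta_0 \Delta x_t + v_t$$
$$\Delta y_t = -\alpha\,(y_{t-1} - \beta_1 - \beta_2 x_{t-1}) + \delta_0 \Delta x_t + v_t, \quad \alpha = 1 - \theta_1$$

Error correction model for the bond and federal funds rates (t in parentheses):
$$\widehat{\Delta B_t} = -\underset{(2.857)}{0.142}\,(B_{t-1} - 1.429 - 0.777 F_{t-1}) + \underset{(9.387)}{0.842}\, \Delta F_t - \underset{(3.855)}{0.327}\, \Delta F_{t-1}$$

Estimated residuals:
$$\hat{e}_{t-1} = (B_{t-1} - 1.429 - 0.777 F_{t-1})$$

ADF test on these residuals (t in parentheses):
$$\widehat{\Delta \hat{e}_t} = -\underset{(-3.929)}{0.169}\, \hat{e}_{t-1} + 0.180\, \Delta \hat{e}_{t-1}$$

First difference stationary series, random walk:
$$y_t = y_{t-1} + v_t$$
$$\Delta y_t = y_t - y_{t-1} = v_t$$

Suitable regression involving only stationary variables:
$$\Delta y_t = \theta \Delta y_{t-1} + \beta_0 \Delta x_t + \beta_1 \Delta x_{t-1} + e_t$$

First difference stationary series, random walk with drift:
$$y_t = \alpha + y_{t-1} + v_t$$
$$\Delta y_t = \alpha + v_t$$

Suitable regression when y and x are I(1) and not cointegrated:
$$\Delta y_t = \alpha + \theta \Delta y_{t-1} + \beta_0 \Delta x_t + \beta_1 \Delta x_{t-1} + e_t$$

Trend stationary series:
$$y_t = \alpha + \delta t + v_t$$
$$y_t - \alpha - \delta t = v_t$$

ARDL model in de-trended variables, with $y_t^* = y_t - \alpha_1 - \delta_1 t$ and $x_t^* = x_t - \alpha_2 - \delta_2 t$:
$$y_t^* = \theta_1 y_{t-1}^* + \beta_0 x_t^* + \beta_1 x_{t-1}^* + e_t$$

Equivalent equation with a constant and a trend included directly:
$$y_t = \alpha + \delta t + \theta_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + e_t$$

where:
$$\alpha = \alpha_1 (1 - \theta_1) - \alpha_2 (\beta_0 + \beta_1) + \theta_1 \delta_1 + \beta_1 \delta_2$$
$$\delta = \delta_1 (1 - \theta_1) - \delta_2 (\beta_0 + \beta_1)$$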