Time Series Analysis Sample Write-up for project

Analysis of Annual Sheep Population

The “annual_sheep_population.csv” file contains the annual sheep population (in 1000s) from 1867 through 1939.

Based on the plot of the data and the graphs of the autocorrelation function (ACF) and partial autocorrelation function (PACF) (see below), the data appear to have an overall downward trend, which suggests that the series is not stationary. The trend is not consistent enough to justify a regression model on time.
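The sample autocorrelations read off the ACF graph can be computed directly. The following is a minimal Python sketch, not the code used for this write-up (the figure labels suggest the original analysis was done in R). The function names `sample_acf` and `significance_bound` and the toy series are illustrative assumptions; the sheep data are not reproduced here.

```python
import math


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0 .. r_max_lag of a numeric sequence."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound (+/- 1.96 / sqrt(n)) drawn on ACF and PACF plots."""
    return 1.96 / math.sqrt(n)


# Toy trending series (hypothetical values, NOT the sheep data).
toy = [1, 2, 3, 4, 5]
acf_values = sample_acf(toy, 2)

# With 73 observations, spikes beyond this bound are flagged as significant.
bound_73 = significance_bound(73)
```

A spike counts as "significant" in the discussion that follows when its absolute value exceeds the bound, which for 73 observations is roughly 0.23.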

The ACF dies off in a somewhat linear fashion. The PACF has 3 significant lags. It is possible that the series is stationary and that an AR(3) model would be appropriate. However, the behavior of the autocorrelation function is cause for concern. We have a fairly small data set (73 observations), so the changes in autocorrelation values will occur fairly quickly. From our initial plot we also identified the likelihood that we have a unit root (that is, the series needs differencing). We can test for this using the Augmented Dickey-Fuller unit root test.

The p-value for the test is p=####. The null hypothesis of the test is that there is a unit root. With a large p-value, we fail to reject the null hypothesis and conclude that differencing is needed.
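For reference, one common form of the Augmented Dickey-Fuller regression is sketched below. The write-up does not state which deterministic terms (constant, trend) or how many lagged differences $k$ were used, so those choices are assumptions here; $y_t$ denotes the sheep series.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root)}
\quad \text{vs.} \quad
H_1:\ \gamma < 0 \ \text{(no unit root)}
```

Failing to reject $H_0$ is consistent with a unit root, which is why a large p-value leads to differencing.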

The differenced data, the ACF of the differenced data and PACF of the differenced data are shown below:
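First differencing replaces each observation with its change from the previous year. A minimal Python sketch follows; the helper name `difference` and the toy values are illustrative assumptions, not the sheep data.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and shorter than the series")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy levels (hypothetical values, NOT the sheep data).
toy_levels = [10, 12, 15, 19]
toy_diff = difference(toy_levels)

# One observation is lost per difference: 73 annual values give 72 changes.
n_after = len(difference(list(range(73))))
```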

This series looks much more stable. The ACF is insignificant after lag 4 and the PACF is insignificant after lag 3, so an MA(4) model and an AR(3) model on the differenced data both appear reasonable. Because it takes 3 to 4 lags for both functions to die off, we might even consider a mixed ARMA(1,1) model.
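The partial autocorrelations used to pick the AR order can be obtained from the autocorrelations with the Durbin-Levinson recursion. The sketch below is an illustrative Python version under that assumption (the name `pacf_from_acf` is hypothetical, and the software behind the original plots may compute the PACF differently). The toy input is the theoretical ACF of an AR(1) process with coefficient 0.5, whose PACF should cut off after lag 1.

```python
def pacf_from_acf(acf):
    """Durbin-Levinson recursion.

    acf: autocorrelations [r_0, r_1, ..., r_K] with r_0 == 1.
    Returns the partial autocorrelations [phi_11, phi_22, ..., phi_KK].
    """
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}
    for k in range(1, len(acf)):
        if k == 1:
            phi_kk = acf[1]
            cur = [phi_kk]
        else:
            num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients, then append phi_kk.
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5 (toy input).
ar1_acf = [1.0, 0.5, 0.25, 0.125]
ar1_pacf = pacf_from_acf(ar1_acf)
```

The cut-off pattern is what the identification step relies on: a PACF that is insignificant after lag p points to AR(p), and an ACF that is insignificant after lag q points to MA(q).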

With the above in mind, we will construct and compare the following models: ARIMA(0,1,4), ARIMA(3,1,0) and ARIMA(1,1,1).
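Written out, the three candidate models take the following form. This is a sketch under two assumptions that the write-up does not state: no constant (drift) term is included, and MA terms enter with a plus sign. Here $y_t$ is the sheep series, $w_t$ its first difference, and $\varepsilon_t$ white noise.

```latex
w_t = y_t - y_{t-1}

\text{ARIMA}(0,1,4):\quad
w_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
      + \theta_3 \varepsilon_{t-3} + \theta_4 \varepsilon_{t-4}

\text{ARIMA}(3,1,0):\quad
w_t = \phi_1 w_{t-1} + \phi_2 w_{t-2} + \phi_3 w_{t-3} + \varepsilon_t

\text{ARIMA}(1,1,1):\quad
w_t = \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
```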

[Figure: time plot of sheep_ts (x-axis: Time), with the ACF and Partial ACF of Series sheep_ts (x-axis: Lag)]

[Figure: time plot of sheep_diff (x-axis: Time), with the ACF and Partial ACF of Series sheep_diff (x-axis: Lag)]
Model 1: Moving Average 4 on differenced data i.e. ARIMA(0,1,4).

The summary of the model is:

INPUT SUMMARY HERE

A review of the residual plot and of the ACF and PACF of the residuals (see below) shows no discernible pattern in the residuals and no significant spikes in the ACF or PACF, so it looks like we have explained as much variation as possible with this model.

A test for white noise was performed. The null hypothesis is that the residuals are white noise. The p-value is ###. With the large p-value, we would be unable to reject the null hypothesis, so the assumption of white noise is reasonable.
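The write-up does not name the white noise test. A common choice is the Ljung-Box test, and the Python sketch below assumes that choice. The names `ljung_box` and `chi2_sf`, the `fitted_params` argument and the toy residuals are illustrative assumptions; the p-value comes from a plain series expansion of the chi-square distribution that is adequate for illustration, not for extreme tail values.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with df > 0."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    # Series for the regularized lower incomplete gamma function P(a, z).
    while abs(term) > 1e-16 * abs(total) and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def ljung_box(residuals, lags, fitted_params=0):
    """Return (Q, df, p_value) for the Ljung-Box test on model residuals.

    fitted_params is the number of AR plus MA coefficients estimated,
    which is subtracted from the degrees of freedom.
    """
    n = len(residuals)
    df = lags - fitted_params
    if df <= 0 or lags >= n:
        raise ValueError("need fitted_params < lags < len(residuals)")
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        q += r_k * r_k / (n - k)
    q *= n * (n + 2)
    return q, df, chi2_sf(q, df)


# Toy residuals that alternate in sign (hypothetical, clearly NOT white noise).
q_stat, dof, p_value = ljung_box([1, -1, 1, -1, 1, -1, 1, -1], lags=1)
```

For the toy residuals the p-value is small, so white noise would be rejected; the large p-value reported for the fitted model leads to the opposite conclusion.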

A test for normality of the residuals was also performed. The null hypothesis is that the residuals follow a normal distribution. The p-value is ###. With the large p-value, we would be unable to reject the null hypothesis so the assumption of normality is reasonable.
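The normality test used is not named in the write-up. One option is the Jarque-Bera test, which compares the sample skewness and kurtosis with the values 0 and 3 expected under normality; the sketch below assumes that test (a Shapiro-Wilk test would be an equally plausible choice). The function name `jarque_bera` and the toy values are illustrative assumptions.

```python
import math


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a numeric sequence."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under H0, JB is asymptotically chi-square with 2 degrees of freedom,
    # whose upper tail probability is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# Toy symmetric sample (hypothetical values, NOT the model residuals).
jb_stat, jb_p = jarque_bera([-2, -1, 0, 1, 2])
```

The chi-square approximation is asymptotic, so with only about 70 residuals the p-value is approximate, which is one reason to pair the test with a visual check.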

We followed this test with a visual check of normality by plotting the Q-Q plot and the histogram of the residuals, producing the following graphs:

Based on these results, we can be comfortable with the assumption that the residuals are normal white noise.

Model 2: Autoregressive 3 on differenced data i.e. ARIMA(3,1,0).

The summary of the model is:

INPUT SUMMARY HERE

[Figure: time plot of sheep__arima014$residuals (x-axis: Time), with the ACF and Partial ACF of Series sheep__arima014$residuals (x-axis: Lag)]

[Figure: Normal Q-Q Plot (Theoretical Quantiles vs. Sample Quantiles) and Histogram of sheep__arima014$residuals (y-axis: Frequency)]

Notice that there is no discernible pattern in the residuals (the only significant spikes in the ACF and PACF are at late lags and are only marginally significant), so it looks like we have explained as much variation as possible with this model.

We followed this with a visual check of normality by plotting the Q-Q plot and the histogram of the residuals, producing the following graphs:

Based on these results, we can be comfortable with the assumption that the residuals are normal white noise. The Q-Q plot and histogram for this model appear to have a more defined bell shape than the corresponding graphs for Model 1.

Model 3: Mixed ARMA(1,1) model on the differenced data i.e. ARIMA(1,1,1)

An ARIMA(1,1,1) model was fit and it yielded an AIC value of 830.75. This value is considerably higher than the AIC values of the other two models. As such, I have removed it from consideration.
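For reference, the AIC used in this comparison is defined as follows, where $\hat{L}$ is the maximized likelihood of the fitted model and $k$ is the number of estimated parameters. Exactly how $k$ is counted (for example, whether the innovation variance is included) depends on the software and is an assumption here.

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2k
```

Lower values are preferred: the $2k$ term penalizes extra parameters, so a model with a higher AIC fits worse relative to its complexity.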

[Figure: time plot of sheep_arima310$residuals (x-axis: Time), with the ACF and Partial ACF of Series sheep_arima310$residuals (x-axis: Lag)]

[Figure: Normal Q-Q Plot (Theoretical Quantiles vs. Sample Quantiles) and Histogram of sheep_arima310$residuals (y-axis: Frequency)]

Visual diagnostics:

For the two models that appear to be good candidates, I will visually inspect how well each model fits the data. The following graphs were produced:

[Figures: Actual vs. ARIMA(0,1,4) predicted; Actual vs. ARIMA(3,1,0) predicted]

Both have a similar fit and closely mimic the pattern of the original data set.

Predicted values:

A graph of the predicted values shows similar ranges for the two models, with those from ARIMA(3,1,0) being slightly higher. However, the confidence intervals provide very similar results.

[Figures: Predicted ARIMA(0,1,4); Predicted ARIMA(3,1,0)]
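To show how point forecasts come out of an ARIMA(p,1,0) model, the Python sketch below forecasts the differences with the AR recursion and then adds them back onto the last observed level. It assumes no constant term and sets future shocks to zero. The function name `forecast_arima_p10`, the coefficients and the toy data are all hypothetical; the fitted coefficients are not given in this write-up, and confidence intervals are not computed.

```python
def forecast_arima_p10(levels, phi, steps):
    """Point forecasts from an ARIMA(p,1,0) model without a constant.

    levels: observed series (undifferenced).
    phi:    AR coefficients (phi_1, ..., phi_p) for the differenced series.
    steps:  number of periods to forecast ahead.
    """
    p = len(phi)
    if len(levels) < p + 1:
        raise ValueError("need at least p + 1 observations")
    diffs = [levels[t] - levels[t - 1] for t in range(1, len(levels))]
    history = diffs[-p:]  # most recent p differences, oldest first
    level = levels[-1]
    forecasts = []
    for _ in range(steps):
        # AR recursion on the differences; future shocks are set to zero.
        w_hat = sum(phi[j] * history[-1 - j] for j in range(p))
        level += w_hat  # integrate the forecast difference back to a level
        forecasts.append(level)
        history.append(w_hat)
    return forecasts


# Toy data and HYPOTHETICAL coefficients (not estimates from the sheep data).
toy_levels = [10, 12, 15, 19]
toy_forecasts = forecast_arima_p10(toy_levels, phi=(0.5, 0.0, 0.0), steps=3)
```

Because each forecast difference feeds the next one, forecast changes shrink toward zero for a stationary AR part and the predicted level flattens out.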

Conclusion: Because the models are so close, my first recommendation is to keep track of both and see which one develops into the better predictor. The models will need to be updated after each new data point is added. My recommendation for a single model is the ARIMA(3,1,0) model. There is not a significant difference in the AIC values, so that statistic does not give a clear winner. Additionally, the residuals from both models appear to be white noise, and those of the ARIMA(3,1,0) model show more of a tendency to be normal. Lastly, the ARIMA(3,1,0) model is slightly simpler (one order lower than the ARIMA(0,1,4) model).
