Final Project need help from A+Grade Helper

profileneedhelphw82
finalprojectinstructions.pdf

Time Series Analysis Final Project

This project/exam must be done entirely on your own. There may not be any collaboration from anyone. If you are having difficulty getting results from R, contact me or refer to the Walk-through and in-class examples. When turning in your final project, you must state that all work is your own and that you have neither given nor received help from anyone on this project. Analysis of Time Series Data (20 points per data set – 100 points total) There are five datasets.

1. Annual Swedish Fertility rates from 1750 through 1849 (columns are year, rate) 2. Annual water flows from the Colorado River from 1911 through 1972 (columns are year, flow) 3. Weekly market share data of Crest Toothpaste for 276 weeks starting in January of 1958 through

1963 (columns are year, week, price) 4. Monthly CO2 levels at Mauna Loa from January, 1965 through December 1980 (columns are

year, mon, co2) 5. Quarterly Production of Clay Bricks, first quarter 1956 through third quarter 1994 (columns are

(year, qtr, bricks) For ease of importing, each data set has two or three columns as indicated above. The last column which will contain the data you wish to analyze. You may use the “head” command to see the first six rows of information. Your objective is to determine the model you think best fits the data and to justify your answers (i.e. provide an explanation why you took each action and what you learned about the data set). You may use ARIMA models, Holt-Winters or Regression models. If you use Regression, the independent variable needs to have an obvious relationship to the data. The data sets may require differencing and may incorporate seasonal effects. If you feel comfortable, you may use fractional differencing or ARCH models as an option in your models (if appropriate). You should be able to come up with reasonable models without using either fractional differencing or ARCH. During your analysis of each data set, you should list at least three possible models that you are going to test and the justification for each of them. You should then compare the results by looking at the AIC values or r-squared, the behavior of the residuals and any tests that your performed on the residuals and, if possible, compare the fit of models to the actual data sets by plotting them. From this information you should state which model you think is best for each data set and why. For all datasets and analysis, you should include the following information. 1. A graph of the original data. 2. Graphs of any autocorrelation functions you create and what these graphs imply (invertibility,

possible moving average orders when appropriate). 3. Graphs of partial autocorrelations you create and what these graphs imply (stationarity, possible

autoregressive orders when appropriate). 4. List the possible models from this information and any other tests you think may be appropriate. 5. Estimate the parameters for each model you thought might be possible from steps 2-4. Include the fit

statistics (Variance, AIC and/or r-squared, Correlation of parameters, Autocorrelation graph of the residuals) and whether or not parameters are significant. You do not necessarily need to include graphs for every model you fit, if it was obvious to you that it was not a good fit. You may just want to include a paragraph stating what other models you fit and that they were discarded from consideration and why they were discarded. For example, Say I fit AR(1), AR(2), MA(1), MA(2),

MA(3), ARMA(1,1) and ARMA(2,1). From looking at the output, I realize that AR(2), MA(2) and ARMA(1,1) are models I want to investigate further. I would include all diagnostic information on those models but then have a paragraph where I state I also tried AR(1), MA(1), MA(3) and ARMA(2,1) but that these were excluded from further consideration after reviewing the models.

6. Based on your analysis (including the fit statistics and the plots of actual vs forecasted values), indicate which model you feel best represents the data and explain how you based your conclusion. If no model is significantly “better” fitting than the others, provide a reasonable justification for why you chose a specific model or how you would proceed with the information.

7. DO NOT INCLUDE CODE. When someone is reading your report, they are not going to care what you entered to get the output, they will only be interested in the output and your conclusions.

The project should be in report form. I am not expecting you to write a novel, but it should have some structure. It should not just be pages of graphs with comments. It should have a logical flow. Each problem should be its own section of the report. You should include a description of the data that you are analyzing (i.e the description on the first page) and not just “number 1, number 2, etc.”). You should discuss your assumptions and your approach to analyzing each problem. Don’t just perform the steps, tell me why you think that step is necessary and what information you hope to gain. You should expect to spend anywhere from one to three hours per data set. I do not expect this project to consume your life but I want to see that you have gained enough familiarity with the concepts to put together a coherent approach to analysis. Here is the rubric for how each problem will be graded 4 points for your assumptions and having a coherent approach to the analysis. (was it organized in such a way that the analysis is readable and understandable and sufficient that someone could repeat your process. Additionally, were the assumptions reasonable and appropriate). 4 points for including appropriate graphs and test results (was there sufficient documentation to support your approach. 4 points for interpretation of those graphs and results (were your interpretations reasonable and did you justify them). 4 points for testing at least 3 appropriate models for each data set and summarizing the models (were there at least three models tested. Were the models tested appropriate for the data set i.e. from the first three components, could someone reading the report conclude these models might be a reflection of the behavior of the data. 4 points for your conclusion of which model you would recommend (summary of your final descision and justification for your selection of model)..