Stat Regression Project
STAT 378 - 01/02 Spring 2018 Group Project Guidelines
Introduction
The project consists of a multiple linear regression analysis on a dataset of your choosing. In a group of three, you will find a dataset, develop questions of interest, run a complete multiple linear regression analysis, and summarize your results in a written report.
Groups
You will be assigned to a group of approximately three students based on subject area of interest. You will likely not be in a group with your friends. I believe this is to your benefit, as it allows you to meet new people with common interests and simulates the collaborative setting you will encounter after college. To help me place you in an appropriate group, please fill out the survey and return it to me.
Deadline for survey: Friday, March 9th at 10:00 am. This portion of the project is worth 5 points.
Finding a Data Set
Once you have been assigned to a group, you must find a dataset related to your topic of interest. You may select a dataset from a textbook (except Dielman), the Internet, or another class, or use data that you have collected on your own. I will provide you with some ideas of different places to look for data for your specific topic. Your dataset should have at least three potential explanatory variables and at least twenty observations. Real datasets are more difficult to analyze than data found in a textbook. Additionally, the more explanatory variables involved, the more complex (and possibly more problematic) the analysis will be. For these reasons, your grade on the project will depend on how well your analysis meets the challenges of your dataset. Thus, I expect you to do more analysis with a simple dataset. Once you have found a dataset of interest, you will need to save it. The easiest way to do this is to save the data file as a .csv file (which can be opened in both Excel and Minitab). If you have questions about saving a dataset, please contact me and I will be happy to help.
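As a quick sanity check before submitting, the size requirements above (at least three potential explanatory variables, at least twenty observations) can be verified directly on the saved .csv file. The following is only a minimal sketch in Python, which is not part of the course's Minitab workflow; the function name `check_dataset`, the column names, and the example data are all hypothetical and made up for illustration.

```python
import csv
import io

# Minimums taken from the guidelines above.
MIN_EXPLANATORY = 3
MIN_OBSERVATIONS = 20

def check_dataset(csv_text, response):
    """Return a list of problems; an empty list means the minimums are met.

    Every column other than the response is counted as a potential
    explanatory variable (an assumption; identifier columns would need
    to be excluded by hand).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []
    if response not in columns:
        problems.append(f"response column '{response}' not found")
    n_explanatory = len([c for c in columns if c != response])
    if n_explanatory < MIN_EXPLANATORY:
        problems.append(f"only {n_explanatory} potential explanatory variables")
    if len(rows) < MIN_OBSERVATIONS:
        problems.append(f"only {len(rows)} observations")
    return problems

# Hypothetical, made-up data: one response and three explanatory columns.
header = "price,sqft,bedrooms,age\n"
body = "".join(f"{100 + i},{1000 + 10 * i},{2 + i % 3},{i}\n" for i in range(20))
print(check_dataset(header + body, response="price"))  # []
```

To check a real file, the text could be read with `open("yourfile.csv").read()` (the file name here is a placeholder) and passed in place of the made-up example.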
Project Proposal/Description of the Data Set
Your group should provide me with a brief project description including a description of the data (identifying the individuals, the response variable, the explanatory variables, and how the data were obtained) and the question(s) of interest. This project description should be a FORMAL SUMMARY/WRITE UP and can be submitted in person or electronically. In addition to the project proposal, you need to send me electronically the dataset you will be analyzing as a MINITAB file.
Deadline for project proposal and data set submission: Friday, April 6th at 10:00 am. This portion of the project is worth 15 points.
Analyzing the Data
You have been/will be exposed to many procedures and analyses in multiple linear regression. You are not expected to apply every technique we discuss in class to your dataset. However, your analysis should use the methods that are appropriate for your chosen dataset. I would suggest the following steps as a general approach:
Make scatterplots of each explanatory variable versus the response variable. These will suggest whether any transformations are necessary for your data. However, don’t go overboard; apply a transformation only when there is a strong reason to do so.
Use model selection procedures to narrow your search down to two or three candidate models.
Think about adjusted R², hypothesis tests, multicollinearity, and assumption checking when choosing your final model.
Once you have settled on the final model, comment on the usefulness of that model and possibly use it to make predictions and/or interpret the regression coefficients within the context of the problem.
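Two of the quantities named in the steps above, adjusted R² and multicollinearity, reduce to short formulas once a model has been fit. The sketch below is only an illustration in Python (not part of the Minitab workflow) and assumes the standard definitions: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p explanatory variables, and the variance inflation factor VIF = 1/(1 - R_j²), where R_j² comes from regressing explanatory variable j on the others. The function names and all numbers are made up for illustration.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with p explanatory variables and n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def variance_inflation_factor(r2_j):
    """VIF for one explanatory variable, given R^2 from regressing it on the others."""
    return 1.0 / (1.0 - r2_j)

# Made-up candidate models fit to the same n = 25 observations.
n = 25
candidates = {
    "model A (3 predictors)": (0.80, 3),
    "model B (5 predictors)": (0.81, 5),
}
for name, (r2, p) in candidates.items():
    print(f"{name}: R^2 = {r2:.3f}, adjusted R^2 = {adjusted_r_squared(r2, n, p):.3f}")

# A predictor that is 90% explained by the other predictors (made-up value).
print(f"VIF = {variance_inflation_factor(0.90):.1f}")
```

In this made-up comparison, model B has the larger R² but the smaller adjusted R², which is the kind of trade-off the adjusted statistic is meant to expose when extra explanatory variables add little.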
The complexity of your dataset may determine which techniques can be used, and simpler datasets are expected to be accompanied by more complete analyses. A project that involves the simplest of datasets and only a minimal amount of analysis would likely earn a low B or a C as a final grade. To earn a higher grade, you should demonstrate an in-depth understanding of the course material while showing that you put in the effort to build an informative model.
Written Report
General guidelines for the final report:
It should be a typed formal write-up (full sentences, no bullet points).
There is no specific length requirement, but I expect most of you will have 5-15 pages.
Only include MINITAB output when needed to explain the decisions you made or to illustrate key points in your discussion. Do not include unneeded MINITAB output.
Graphs and tables may be included throughout the text or may be collected as an appendix, as long as they are labeled and referred to in the text of the document.
General report outline:
Introduction: Your write-up should include an introduction that describes your data and where it came from, the question(s) of interest, the variables you chose to investigate, etc. The introduction should also include a brief statement of the final model you came up with.
Model Selection: You should describe the steps involved in your analysis. Make certain to include the purpose behind each step in the analysis, e.g., “we transformed the response variable because there was evidence of non-constant variance”. If you tried many models and had many false starts, you don’t need to include all of those attempts in your report, but you should include the important parts of your decision-making process that led to the final model. Similarly, you should not include the full MINITAB output for everything you did, but you should include the details from the analyses that led to your final model.
Final Model Results: You should comment on the usefulness of your final model, interpreting coefficient estimates and making predictions if that applies to your question(s) of interest. You may wish to address whether model assumptions are met and whether there are unusual observations influencing the fit of your final model.
Conclusion: This should be a discussion of what you learned about your data and the relationship with the response. You should also comment on difficulties or issues encountered with the analysis, things you might have done differently or that could be improved, and the potential for future research on this topic.
Grading of the written report will address appropriateness, accuracy, and depth of analyses as well as writing (including style, spelling, and grammar). I will provide you with some examples of exceptional written reports as well as a general rubric I will use for grading.
Optional Draft Submission: Friday, April 27th at 10:00 am. If you choose to submit a draft to me by this date, I will provide you with feedback and make suggestions as to how you could improve your analysis and/or paper.
Peer edits: Tuesday, May 1st in class. Every member of the team must bring his/her own copy of a draft of the final paper. Each individual will peer edit another group’s project. This portion of the project is worth 10 points.
Deadline for final written report: Sunday, May 6th at noon. This portion of the project is worth 60 points.
Peer/Self Reviews
Each group member will anonymously evaluate the other group members to make sure that the workload was shared among all three individuals*. In addition to evaluating your group members, you will also evaluate yourself.
*If it comes to my attention that one group member did not contribute at all, that person will do an individual project or take a zero as his/her final project grade.
Deadline for peer/self review: Sunday, May 6th at noon. This portion of the project is worth 10 points.
Grade Break Down and Due Dates
Portion of Project             Due Date                          Points
Survey                         Friday, March 9th at 10:00 am     5
Proposal and data set          Friday, April 6th at 10:00 am     15
Optional draft to instructor   Friday, April 27th at 10:00 am    0
Peer edits                     Tuesday, May 1st in class         10
Final report                   Sunday, May 6th at noon           60
Peer/self review               Sunday, May 6th at noon           10
Total                                                            100