#7208 Topic: Statistic Project
Stat 02260 - Major Project
Major Project: Analyze Data using Descriptive Statistics
Overview: This project’s objective is for the student to show mastery of descriptive statistics on a dataset of their choice. Mastery must show strong use of the technology tool “STATKEY” (http://www.lock5stat.com/StatKey/) (do not use Excel) and, when appropriate, use of a calculator. This document contains the instructions, the details for the various parts with their corresponding due dates, and the grading rubric. NOTE: IT IS VERY IMPORTANT THAT STUDENTS READ AND COMPREHEND THE INSTRUCTIONS FOR EACH PART OF THIS MAJOR PROJECT COMPLETELY. Read very carefully and do NOT skip over any instructions.
Instructions:
Part 1: Select a Data Set - For this project, you must select a dataset from the included list of provided datasets OR you can source your own custom dataset. You must declare your dataset by the end of Week #2. The requirements for the dataset you use for this project are as follows:
Data set requirements: Requirement 1: The dataset you select must have at least 25 cases.
Requirement 2: It also must have at least two categorical variables and at least two quantitative variables. (Students must make sure their dataset fits BOTH requirements or it cannot be used for this Project).
Requirement 3: Categorical variables must either have five or fewer levels OR be adapted to have five or fewer levels. (For example: Gender has two levels, Male and Female. You can often decrease the number of levels in a categorical variable by pooling more than one level under a new “OTHER” grouping.)
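As a rough illustration of Requirements 1 through 3, the sketch below checks a dataset summary against the three requirements and pools the less common levels of a categorical variable under “OTHER”. It is written in Python purely as an optional way to think through the rules before you declare your dataset; it does not replace StatKey for the analysis itself. The function names (`check_dataset`, `pool_levels`) and the sample “Major” values are made up for this example, and keeping the four most frequent levels is just one reasonable way to pool.

```python
from collections import Counter

MIN_CASES = 25   # Requirement 1
MIN_VARS = 2     # Requirement 2: at least two variables of each type
MAX_LEVELS = 5   # Requirement 3

def pool_levels(values, max_levels=MAX_LEVELS, other_label="OTHER"):
    """Keep the most frequent levels and pool the rest under other_label."""
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)  # already within the limit, nothing to pool
    # Leave one slot free for the pooled "OTHER" level.
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other_label for v in values]

def check_dataset(n_cases, categorical_levels, n_quantitative):
    """Return a list of problems; an empty list means all requirements are met.

    categorical_levels maps each categorical variable name to its number of levels.
    """
    problems = []
    if n_cases < MIN_CASES:
        problems.append(f"only {n_cases} cases; need at least {MIN_CASES}")
    if len(categorical_levels) < MIN_VARS:
        problems.append("need at least two categorical variables")
    if n_quantitative < MIN_VARS:
        problems.append("need at least two quantitative variables")
    for name, levels in categorical_levels.items():
        if levels > MAX_LEVELS:
            problems.append(f"{name} has {levels} levels; pool down to {MAX_LEVELS} or fewer")
    return problems

# Hypothetical categorical variable with 7 levels across 25 cases.
majors = (["Biology"] * 7 + ["Nursing"] * 6 + ["Business"] * 4 + ["Art"] * 3
          + ["History"] * 2 + ["Music"] * 2 + ["Physics"] * 1)
pooled = pool_levels(majors)

# Before pooling, "Major" is flagged; after pooling, no problems remain.
print(check_dataset(len(majors), {"Major": len(set(majors)), "Gender": 2}, 2))
print(check_dataset(len(pooled), {"Major": len(set(pooled)), "Gender": 2}, 2))
print(Counter(pooled))
```

If you pool levels this way, describe the new “OTHER” level in your variable table so the reader knows which original levels it contains.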
Submission requirements for Part 1: A detailed overview of your chosen dataset is needed. In a Word document, make sure your name is included at the top of the first page, along with the wording “Major Project - Part 1”. For this submission, provide a short paragraph describing your chosen dataset (MAKE SURE TO INCLUDE this description and state the number of cases in the dataset), then include a listing of all variables provided in the dataset, with an explanation of each variable and a statement of whether each variable is categorical or quantitative. (Use a table format with three headings: 1) Variable name, 2) Variable description, and 3) Variable type.) If the variable is categorical, include the number of levels.
Example:
| Variable name | Variable description | Variable type |
| Gender | Gender of the person doing the survey | Categorical (2 levels) |
(For help with the first two headings of your variable “table”, please see Appendix B in the Lock text for the appropriate format and examples of how to summarize and explain each variable.) If you are sourcing your own dataset, you are required to include the above submission and also an Excel file of the “custom” dataset to be approved (see the full instructions for choosing a custom dataset below under Option 2). Again, it is only necessary to include an Excel file of the dataset if you are submitting a custom dataset (one not vetted by the instructor).
Options for Choosing your Dataset:
Part 1 - Option 1: Choosing a dataset from THIS SPECIFIC list of non-custom datasets - [ONLY THE DATASETS NAMED BELOW ARE CONSIDERED NON-CUSTOM, since they have already been vetted by the instructor and fit the requirements of at least 25 cases, at least two categorical variables (with five or fewer levels), and at least two quantitative variables.] Students can select from the following datasets (these are considered non-custom datasets):
Provided non-custom datasets (already vetted by instructor):
Vetted Datasets are: ICUAdmissions, HollywoodMovies2011, MiamiHeat, NutritionStudy, SleepStudy, SpeedDating, StudentSurvey OR USStates
Do not just click the link/URL below and grab any dataset on that webpage. The datasets named above have been vetted by the instructor and fit the requirements. Any other dataset you choose that is not named above must be vetted by you and would be considered a CUSTOM dataset, so you will need to follow the instructions for submission of a custom dataset and all the additional requirements.
The NAMED/Vetted datasets can be reviewed from the webpage:
http://higheredbcs.wiley.com/legacy/college/lock/0470601876/data_sets/datapage.html
Only the datasets named above (for example, NutritionStudy), found at the URL above, are considered NON-custom datasets, since they have been vetted by the instructor. Review THE NAMED ONES and see which one (if any) you want to use for this Major Project (only the named datasets from that URL are considered non-custom, so no Excel file is needed in your submission). If you prefer not to use any of those already named/vetted by the instructor, you have the option of providing a custom dataset, so read the directions which follow here.
Part 2: Analyze your Data!:
In this milestone, students are required to complete the analysis portion of their project/report, using STATKEY for any graphical displays and, when needed, the calculator as well. There is no need to write an introduction or conclusion at this point, but you must complete the analysis section of your report. The headings for each part of this analysis portion of the report are given below. Be sure to submit this analysis in a Word document and include a cover page with the author’s name, the date, a note that it is a “Draft Analysis”, the course name and number, and the instructor’s name. CAUTION: Students are to AVOID trying to use other programs/graphing tools from the general internet. Remember: StatKey is the main tool to be used, followed by some need of the TI-83/84 calculator when it comes to outlier analysis.
*********TIPS on Appropriate SNIPPING OF GRAPHS from STATKEY*****************:
TIP #1: Do not snip the graph and the table at the same time; it makes both too small and hard to read.
TIP #2: Do not snip so large that you end up with StatKey menus showing. Instead, do a TIGHTER snip of the graph such that you cannot even tell it came from StatKey. Do not leave a lot of space above the graph; that is really not needed.
TIP #3: All graphs and all tables get a title (their own title; they do not share titles). How to add a title? Use the Textbox feature in Word, but make sure to lose the LINE on the box (pick the no-line option for the text box) and lose the background color (option: no background color), so that it blends nicely on the graph and looks like it belongs there, not like something added way above the graph. A heading for the section is not the same as the TITLE on a graph or the TITLE on a table.
TIP #4: Remember this is a college-level project, and presentation is part of the grading rubric.
TIP #5: Avoid snips that include any StatKey menu options.
[Example images omitted: a snip that includes StatKey menu options, and a better snip that is cleaner, tighter, easier to read, and has a title ON THE GRAPH.]
The analysis required is as follows (part a through part e), and each analysis must include its HEADING title as requested:
a. Analysis of one Quantitative Variable: For one of your quantitative variables, include summary statistics in tabular form (mean, standard deviation, five number summary) and at least one graphical display. Make sure to discuss the variable by name. Also answer the following in this section (make sure to talk about the variable in your sentences): 1. Give some conclusions about the results from this analysis (at least two sentences for this requirement). 2. Is the distribution symmetric, skewed left, or skewed right? (If your original graphical display is not appropriate for seeing the shape of the data, then SWITCH to another graphical display that does allow you to present the shape. Only include ONE graphical display.) 3. Are there any outliers? (You must show the Min. Fence, Max Fence, and IQR, and also LIST each and every outlier, using the IQR method as shown in lecture/text.)
b. Analysis of one Categorical Variable: For at least one of the categorical variables, include a frequency column and a relative frequency column in ONE table. What conclusions can be drawn from this analysis? (At least two sentences.) Make sure to discuss the variable by name.
c. Analysis of One Relationship between two Categorical Variables: Include a two-way table and discuss any relevant proportions. Does there appear to be an association between the two variables? Why (describe it) or why not (describe why not)? Make sure to discuss the variables by name.
d. Analysis of One Relationship between a Categorical Variable and a Quantitative Variable: Include a side-by-side plot (side by side dotplot, boxplot or histogram) and describe the results seen in several sentences. Also, use some summary statistics (in tabular form) to compare the groups. Does there appear to be an association between the two variables? If so, describe it. If not, why not?
e. Analysis of One Relationship between Two Quantitative Variables: For at least one appropriate pair of quantitative variables, include a scatterplot and discuss what it reveals (by stating the pattern of the correlation) in several sentences. Make sure to discuss the variables by name. If there is a linear correlation, be sure to comment on the direction (positive or negative) of that linear relationship. You must give the correlation coefficient and comment on its meaning in the context of the correlation seen in the scatterplot. NOTE: No line of best fit should be included. No table should be included. Be sure to pick, appropriately, one of the variables to be the response variable (on the y axis) and the other to be the explanatory variable (on the x axis), and be sure your scatterplot matches those selections. Would you have expected this type of correlation between the variables? Why or why not?
Mac users: you can use Google Docs to help OR submit a PDF, but you are responsible for making sure the format stays correct when converting to PDF.
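For the outlier question in part a, the fence arithmetic can be sketched as follows. This is only an optional cross-check of the values you obtain from StatKey and the TI-83/84, not a replacement for them. It assumes the usual 1.5 × IQR rule and computes Q1 and Q3 as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the number of cases is odd); StatKey or your text may use a slightly different quartile rule, so small differences are possible. The function names and the sample values are invented for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of cases is odd, the middle value is left
    out of both halves.
    """
    data = sorted(values)
    if len(data) < 4:
        raise ValueError("need at least 4 values to compute quartiles this way")
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return median(lower), median(upper)

def iqr_outliers(values):
    """Compute the IQR, both fences, and the list of outliers."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    min_fence = q1 - 1.5 * iqr
    max_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in values if v < min_fence or v > max_fence)
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "min_fence": min_fence,
        "max_fence": max_fence,
        "outliers": outliers,
    }

# Made-up sample of 9 cases with one unusually large value.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 30]
result = iqr_outliers(sample)
for name, value in result.items():
    print(name, value)
```

Any value below the Min. Fence or above the Max Fence is listed as an outlier; in this made-up sample the only one is 30.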
Part 3: Write/Finalize Your Final Report - Your first step in this final milestone of the major project is to review the “critique” of your Part 2 analysis which your instructor has given you. For this assignment, students are required to submit their full report after finalizing it (incorporating ALL relevant feedback and also completing the introduction and conclusion sections per the requirements for each).
Your report should include a Cover Page, then the report itself should include an Introduction section, your Analysis section (in the same order as detailed in the Part 2: Analyze your Data instructions), and finally a Conclusion section. All sections should have the headings as given: “Introduction”, “Analysis”, or “Conclusion”. Further details are given below.
Format of the Final report:
Cover Page: Include a title (based on your data/topic, using the format “Major Project: XXXXXXX”, where XXXXXXX is your specific title, which should be based on the type of data being analyzed). Students must also include on the cover page: the course number and name, the date of submission of the report, the instructor’s name, and finally your name.
Introduction: The report is required to have the heading “Introduction” on this section. Give a paragraph introducing your data, cite your source, and describe all relevant variables. If it is a custom dataset, include a copy of your dataset in tabular form. Make sure to state your number of cases in this description. Remember, this is your chance to introduce your dataset to your reader, and you need to be complete, since they have no prior knowledge of the dataset.
Analysis: Label this section with the heading “Analysis” and also label each analysis with its specific heading provided above in Project - Part 2. Students must keep the analysis in the order originally provided and include all requirements, including the graphs/tables and also, in paragraph form, answers to the questions posed for each specific analysis stated in Part 2.
Conclusion: Label the heading of this section “Conclusion”. Since the middle section of the report includes all the details of your analysis of the data, this section should be a brief summary of the most interesting features of your data (summarizing the key findings from the “body” of this report) in paragraph form. Make sure to include at least four key findings. In this CONCLUSION section students MUST also answer this question/final thought: What other variables would you suggest be gathered next time to increase the depth of the analysis? Be sure to state which variables, why they would be useful/helpful, and how they would give greater insight/understanding on this topic/analysis.
Grading Rubric (total possible points = 190):
| Category | Below Standard | Collegiate Level | Flawless Excellence |
| Part 1: Selecting Dataset, Summarizing/Identification of Variables | 0-9 points. Either no dataset was selected and/or the assignment was incomplete in some or all of the requirements. If not submitted by the due date, zero points are awarded. Approval could not be given due to missing required information (needed to verify that the student had a viable dataset for their project). | 10-14 points. Dataset was selected and the required summary was somewhat incomplete, but enough information was submitted to confirm the identification of the variable types such that approval could be given. | 15 points. Thorough and complete presentation of the dataset was supplied, with full summary and full details and descriptions of all variables, such that absolute approval could be given. |
| Part 2: Draft of Data Analysis | 0-34 points. Many concerns about the submission, which could include missing analysis, format problems, major presentation faults, and/or missing or faulty conclusions. Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. PRESENTATION LEVEL WAS BELOW COLLEGE LEVEL. StatKey was not used when it was appropriate to do so. Remember, for the graphs: STATKEY use is required. | 35-39 points. Almost perfect and complete analysis of the dataset was provided in the required format, but there were some issues in presentation, conclusions, or format. Titles may have been missing on only a rare graph/table or were not appropriate. Most graphs were snipped well and presented very well, but at least one graph was not in presentable form and/or easily read. | 40 points. Complete analysis of the dataset met all requirements as outlined, and all conclusions were well developed and showed depth and mastery of descriptive statistics. The expected format was followed exactly. All graphs/tables presented were 100% complete, including appropriate titles. STATKEY WAS USED as the main source of the descriptive statistics (at least for all graphs), and graphs were well presented/snipped. |
| Total possible points for Parts 1 and 2 combined: 55 points | | | |
| Part 3: Final Report | | | |
| Category | Below Standard: 0-13 points per category | Collegiate Level: 15-18 points per category | Flawless Excellence: 19 points per category |
| 4.1: Analysis of one Quantitative Variable | Wrong variable type(s) used, or the descriptive statistics created have poor design or sizing or lack readability. Missing or inadequate conclusions given, or conclusions unclear/not confirmed by the analysis. Author DID NOT incorporate targeted feedback received from the instructor’s critique (if feedback was applicable and was not used, the maximum grade comes from this rating). Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. STATKEY was not used as required. | Basic descriptive statistics were provided and met the minimum requirements, but there were smaller errors in format or presentation, or small errors in, or missing/incomplete, conclusions drawn. Author DID incorporate targeted feedback received from the instructor’s critique (if applicable). | Thorough and complete analysis and presentation/format of descriptive statistics, and the given conclusions met all requirements. Conclusions were extremely well developed and presentation was exactly as required. Author did incorporate targeted feedback received from the instructor’s critique (if applicable). |
| 4.2: Analysis of one Categorical Variable | Wrong variable type(s) used, or the descriptive statistics created have poor design or sizing or lack readability. Missing or inadequate conclusions given, or conclusions unclear/not confirmed by the analysis. Author DID NOT incorporate targeted feedback received from the instructor’s critique (if feedback was applicable and was not used, the maximum grade comes from this rating). Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. STATKEY was not used as required. | Basic descriptive statistics were provided and met the minimum requirements, but there were smaller errors in format or presentation, or small errors in, or missing/incomplete, conclusions drawn. Author DID incorporate targeted feedback received from the instructor’s critique (if applicable). | Thorough and complete analysis and presentation/format of descriptive statistics, and the given conclusions met all requirements. Conclusions were extremely well developed and presentation was exactly as required. Author did incorporate targeted feedback received from the instructor’s critique (if applicable). |
| 4.3: Analysis of One Relationship between two Categorical Variables | Wrong variable type(s) used, or the descriptive statistics created have poor design or sizing or lack readability. Missing or inadequate conclusions given, or conclusions unclear/not confirmed by the analysis. Author DID NOT incorporate targeted feedback received from the instructor’s critique (if feedback was applicable and was not used, the maximum grade comes from this rating). Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. STATKEY was not used as required. | Basic descriptive statistics were provided and met the minimum requirements, but there were smaller errors in format or presentation, or small errors in, or missing/incomplete, conclusions drawn. Author DID incorporate targeted feedback received from the instructor’s critique (if applicable). | Thorough and complete analysis and presentation/format of descriptive statistics, and the given conclusions met all requirements. Conclusions were extremely well developed and presentation was exactly as required. Author did incorporate targeted feedback received from the instructor’s critique (if applicable). |
| 4.4: Analysis of One Relationship between a Categorical Variable and a Quantitative Variable | Wrong variable type(s) used, or the descriptive statistics created have poor design or sizing or lack readability. Missing or inadequate conclusions given, or conclusions unclear/not confirmed by the analysis. Author DID NOT incorporate targeted feedback received from the instructor’s critique (if feedback was applicable and was not used, the maximum grade comes from this rating). Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. STATKEY was not used as required. | Basic descriptive statistics were provided and met the minimum requirements, but there were smaller errors in format or presentation, or small errors in, or missing/incomplete, conclusions drawn. Author DID incorporate targeted feedback received from the instructor’s critique (if applicable). | Thorough and complete analysis and presentation/format of descriptive statistics, and the given conclusions met all requirements. Conclusions were extremely well developed and presentation was exactly as required. Author did incorporate targeted feedback received from the instructor’s critique (if applicable). |
| 4.5: Analysis of One Relationship between Two Quantitative Variables | Wrong variable type(s) used, or the descriptive statistics created have poor design or sizing or lack readability. Missing or inadequate conclusions given, or conclusions unclear/not confirmed by the analysis. Correlation coefficient result was not given and/or not referenced in this section. Author DID NOT incorporate targeted feedback received from the instructor’s critique (if feedback was applicable and was not used, the maximum grade comes from this rating). Titles missing on most, if not all, graphs/tables will mean a great reduction in points awarded. STATKEY was not used as required. Included a line of best fit and/or summary table when it should not have. | Basic descriptive statistics were provided and met the minimum requirements, but there were smaller errors in format or presentation, or small errors in, or missing/incomplete, conclusions drawn. Author DID incorporate targeted feedback received from the instructor’s critique (if applicable). | Thorough and complete analysis and presentation/format of descriptive statistics, and the given conclusions met all requirements. Conclusions were extremely well developed and presentation was exactly as required. Author DID incorporate targeted feedback received from the instructor’s critique (when applicable). |
| Category | Below Standard: 0-14 points per category | Collegiate Level: 15-19 points per category | Flawless Excellence: 20 points per category |
| 4.6: Report: Introduction & Conclusion | Introduction was not complete and comprehensive, OR the conclusion was weak or missing and what was provided failed to summarize the key points from the analysis. Some findings did not even match the actual analysis. Seemed hesitant in the wording of more than one conclusion. | Introduction was complete but had errors, OR only a moderate conclusion was provided that highlighted several key points from the analysis but failed to summarize more than 2-3 key findings. Most findings matched the actual analysis, or the author seemed hesitant in the wording of at least one conclusion. | Introduction was complete, and the conclusion was well summarized, highlighting at least 4 key findings that matched back to the actual analysis, and was well developed and thoughtful. The final question was answered in the conclusion per the requirement. |
| 4.7: Report: Format/Grammar | Format was not followed and/or report was hard to read due to grammatical/format errors. | Format had at least one error, or report had at least one grammatical/format error. | Report was of excellent quality with no format or grammatical errors. |
| Total possible points for Part 3 (Final Report) = 135 points (Parts 1, 2, and 3 combined = 190 points) | | | |
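The point totals quoted in the rubric can be checked with a few lines of arithmetic. The sketch below uses only the maximum points stated in the rubric above; the dictionary and variable names are made up for this example.

```python
# Maximum points per rubric line, as stated in the rubric.
parts_1_and_2 = {"Part 1": 15, "Part 2": 40}
part_3 = {
    "4.1": 19, "4.2": 19, "4.3": 19, "4.4": 19, "4.5": 19,  # analysis sections
    "4.6": 20, "4.7": 20,                                    # report sections
}

draft_total = sum(parts_1_and_2.values())    # Parts 1 and 2 combined
final_total = sum(part_3.values())           # Part 3 (Final Report)
project_total = draft_total + final_total    # whole project

print(draft_total, final_total, project_total)  # 55 135 190
```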
CHECKLIST BY SUBMISSION (Part 1 versus Part 2 versus Final submission): NOTE: This checklist is NOT meant to be all-inclusive. It attempts to remind students of issues they should resolve prior to each submission, based on previous classes and the most common submission errors.
Part 1
For all Part 1 submissions: Make sure you included a full description of the dataset, including its name if you took it from the vetted list, and make sure you included the VARIABLE explanation table (three columns). It MUST BE IN table form, so use “insert table” in your Word document so that you have a nicely formatted table with the column headings as given.
For students who did NOT pick from the vetted list of datasets in this PDF and are instead trying to get approval for a custom dataset for Part 1, READ CAREFULLY: you are required to include, beyond all that is already mentioned for the Part 1 submission, ONE EXTRA file: an EXCEL file of the dataset, with the heading for each column in row 1 and the actual cases of the data starting in row 2. Failure to supply the actual custom dataset in an Excel file along with your required Word document for Part 1 will result in a zero for any student requesting approval of a custom dataset. (If you are taking a dataset from the vetted list given in this PDF, you are not making a custom dataset request, so there is no need to supply the EXCEL file.)
Part 2
Check the following and fix/resolve anything you have not done correctly. These are reminders of the most common errors seen in Part 2 submissions.
1. Did you include a cover sheet as required?
2. Does each table have a variable-related title? (A title that just says “summary statistics” is NOT acceptable.) SEE LECTURE 0 for more insight.
3. Does each graph have a variable(s) related title? (see Lecture 0 for more insight)
4. Does each section of the analysis include the given HEADING? (Do NOT include the instructions after the headings at all; this is a FORMAL REPORT. Do include the actual heading titles given for each section in the analysis part above. For example, “Section A: Analysis of one Quantitative Variable” would be the acceptable heading. There are also headings for Sections B, C, D, and E, so use those headings at the start of the page for each section, before giving your analysis: graphs, conclusion text, etc.)
5. In the Part E analysis, the most common errors are these:
- Did you include the table by accident? (If so, remove it from Section E, since the instructions above clearly say NOT to include it in Part E/Section E.)
- Did you remember to include a statement of the correlation coefficient in the conclusion/text area?
- Did you remember to state what kind of correlation you expected and whether or not the results from the graph surprised you? (If not, correct this.)
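For the Part E reminder about stating the correlation coefficient, it can help to see what that number measures. The sketch below computes the Pearson correlation coefficient r directly from its definition, as an optional cross-check on the value StatKey reports (StatKey remains the required tool for the report). The function name `pearson_r` and the paired values are made up for this example; x plays the role of the explanatory variable and y the response variable.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of paired values x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / sqrt(sxx * syy)

# Made-up paired data: x is explanatory, y is the response.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```

A value of r near +1 or -1 indicates a strong positive or negative linear relationship, and a value near 0 indicates little linear relationship, which is what you compare against the pattern seen in the scatterplot.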
Final submission common issues
1. Did you resolve each and every issue from the feedback given about your part 2 submission?
2. Did you make sure to update the cover sheet to no longer say “draft”?
3. Does your conclusion include what was mentioned in this document, including telling the reader what other variable(s) should be collected the next time a study of respondents is done, beyond the variables already in the dataset you used? (Do not describe analysis you could have done on variables already available in the dataset. That is NOT the purpose. Instead, tell the reader which NEW VARIABLES should be collected that were not collected this time, and why they would be interesting to have next time.)