Statistic Project
STAT 4210 Project 3 Description
Overview
We will investigate several research questions for this study, described below. Notice that they are
Motivation: Schrödinger’s Starburst
Whenever Dr. Megan gets a two-pack of Starburst candies, she is both excited and dismayed. Anecdotally, every two-pack she opens (Figure 1) has two yellow (lemon-flavored) candies, and they are her least favorite (she prefers pink and red). Until the pack is open, they can be any combination of the four flavors: cherry (red), strawberry (pink), lemon (yellow), or orange (orange), but once open they are invariably both yellow. Some amateur internet research indicates that the distribution of the flavors should be uniform; that is, all four flavors should occur with the same probability.
Figure 1: Two unopened two-packs of Starburst candies, each with two candies, also pictured, inside. All four candies inside the packs are probably yellow, if Dr. Megan’s theory holds.…
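As a rough illustration of what the uniform claim implies, the sketch below computes the chance that a two-pack is all yellow and shows two quantities one might compare against it. It is only a sketch under stated assumptions: it assumes the two candies in a pack are filled independently (the description does not say this), the function names are hypothetical, and every count in it is made up for illustration rather than taken from any data.

```python
from math import comb

FLAVORS = ("cherry", "strawberry", "lemon", "orange")

# Under the uniform claim, each flavor has probability 1/4.
p_flavor = 1 / len(FLAVORS)

# ASSUMPTION: the two candies in a pack are independent of each other.
p_both_yellow = p_flavor ** 2  # 1/16 = 0.0625


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only, not data from the study.
    n_packs, n_all_yellow = 40, 6
    print("P(both yellow) under uniformity:", p_both_yellow)
    print("P(at least", n_all_yellow, "all-yellow packs):",
          binomial_upper_tail(n_all_yellow, n_packs, p_both_yellow))
    print("Chi-square statistic:", chi_square_uniform([25, 18, 22, 15]))
```

The chi-square statistic here treats each candy as an independent observation, which may not hold for candies from the same pack; whether that condition is reasonable is the kind of assumption the Methods section asks you to assess.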
Research Questions
Research Question 1 (RQ1): Do people with different flavor preferences have different perceptions of the distribution of Starburst flavors in the two-packs?
RQ2: Is the proportion of two-packs that contain only yellow Starburst candies higher than would be expected by random chance?
RQ3: Is the claim on the internet, that the four flavors are uniformly distributed, supportable?
Your Report
Per the project description, your report will consist of four major sections (Introduction, Methods, Analysis or Results, and Conclusion or Discussion). The rubrics for the two drafts (first and final) follow the detailed explanation of the expectations for each section below.
Introduction. This section introduces the reader to the project, and includes a clear statement of the purpose of the study. The response and explanatory variables are clearly defined, as well as their level of measurement (i.e., whether they are quantitative or categorical variables). In the introduction, the authors also include some discussion of the population of interest, the degree to which the study allows for generalization to that population, and why the generalization can be made (i.e., the scope of inquiry). The
discussion regarding the scope of inquiry, and possible causal inference, comes from the design of the study and data collection method, which is discussed in the Methods section.
Methods. The goal of the methods section is to clearly lay out the processes that will be used throughout the study; those processes include data collection (e.g., the design of the study, a description of the participants), data analysis, and checking of conditions to assess the validity of the analytic tools used. Each component of the methods may be separated into subheadings for readability and organization of work (i.e., the process of designing a study is distinct from the process of analyzing the data). In the Methods section, you are expected to first state your hypotheses, both in statistical terms (symbolically) and in plain language (in words), as appropriate, using the work we have done in class and previous homework as a guide. The hypotheses should be consistent with the statistical question and purpose outlined in the introduction, and reasonable within the context of the study. All parameters included in the hypotheses or models, if models are included, must be defined in context. The authors must also describe the study and how the data will be (or were, as the case may be) collected. For example, if one conducts an observational study, what are the inclusion criteria for identifying and then sampling participants? When describing the analyses, the authors must be thorough, but concise. If all goes according to plan, what is the complete set of analyses the authors intend to do, and in what order? What would justify any follow-up analyses, or a closer look at specific relationships or comparisons? What would the authors look for in each analytical phase/step? The authors should also indicate in this phase what a large effect would look like, and what a minimum effect of interest would be, and provide some justification for that determination.
Think of this as a contract: outline only the analyses you intend to do and would potentially be able to justify doing to address your stated hypotheses -- no more, no less. Finally, once the authors have described their intended analyses, they should identify all assumptions required for those analyses and their inference to be valid. In addition, the authors should indicate how those assumptions will be assessed. If the data are already available, those assumptions should be checked at this stage, using the methods described.
Analysis/Results. In this section, the authors demonstrate that they have done only the analyses they described in the Methods section (no snooping). The results of those analyses and, when appropriate, follow-up analyses are included. The sample statistics (numerical and graphical) related and helpful to answering the statistical question guiding the study are reported, along with their standard errors. Test statistics and p-values, along with relevant confidence intervals (with stated confidence levels), should also be reported. NB: This is a very dry section.
Conclusion/Discussion. In the conclusion section, the authors review the results, and put those results into context. How do their findings help answer the statistical question they originally posed? How strong is their evidence against their null hypothesis/hypotheses?
Interpretations of the effects or sample statistics in context to aid in the decision-making process are warranted; all effects investigated should be discussed (to avoid the bias of “positive findings” and only highlighting big effects, as we saw in our readings at the beginning of the semester). Discussion of the magnitude of the effect should be included in the conclusion and incorporated into the decision-making process. The authors should also discuss how their findings relate to the greater population, identified in the Introduction section. Are there any points of caution to be made regarding this generalization, or limitations to the extent these conclusions can be generalized? That is, what can the findings of this study tell us is plausible beyond the current sample? The authors should provide some sort of answer to the statistical question. If there were any limitations to the study (some examples: limited resources yielding a small sample, excess variability because of human error or imprecise measurement systems, violations of one or more model assumptions), now is the time to mention them. These limitations might have been artificially imposed to simplify the design or model, to streamline the analytic process, or for any number of other reasons, but any limitation may impact our generalizability, our validity, or the reliability of our conclusions and must be considered. Finally, future steps should be described. Such steps may include ways to remove some or all of the limitations to improve upon the current study, additional explanatory variables of interest, or new avenues to consider. This is an opportunity to be creative, in a way that is reasonable and relates to the current study (e.g., a future step in studying college GPA using high school GPA would not change the response variable to family income; that is a new study entirely).
Formatting
All text, including tables copied from JMP, should be in a 12-point, serif font (e.g., Times, Times New Roman, Cambria). Output copied from JMP should be copy/pasted directly and should not be screenshots. The document itself should be double-spaced, with 1-inch margins all around. In the left header of the document, group member names should be listed, using the same font as the rest of the document. Reports should be well organized, in complete sentences and paragraph form throughout. Use subsections where appropriate to separate distinct work and ideas to aid in organization to enhance readability. Any included output should support and supplement your writing; neither plots nor tables should be fully rehashed in the text, but output needs to be provided to support what has been written. Do not dump all output that has been and ever will be provided in your report. Distinct tables should be numbered and titled; figures (plots, images) should be numbered and captioned. References to tables or figures in the text should be made to the number rather than via proximity (acceptable: “the scatterplot in Figure 1…”; unacceptable: “the ANOVA table below…”). Captions to figures should be descriptive (e.g., “A plot of the main effect of amount of alcohol on perceived attractiveness; there is a small effect
between 0 and 2 pints of lager and a large effect between 2 and 4 pints of lager”); table titles should be concise but state what the table is (e.g., “Mean perceived attractiveness ratings”). All mathematical copy should be created using the equation editor (see the document on eLC under Content/Homework for keyboard shortcuts to learn how to quickly type equations and avoid using the mouse). Correct mathematical notation for this class dictates that parameters (e.g., 𝜇, 𝛽, 𝜎, 𝜋), statistics (e.g., 𝑏1, 𝑀𝑆𝐸, 𝜒²), and variables (e.g., 𝑦, 𝑥, 𝑥1, 𝑥2) all be italicized. Numerals (e.g., 1, 2) are left in regular font. If you are using a word processor that does not clearly italicize its math copy, I suggest switching to Word or LaTeX. Reports must be submitted as .pdf files (.pages files or LibreOffice files will be docked credit, because they are not portable across platforms).
Table 1. Rubric – First Draft
Task  Score  Comments
States the statistical question [2]
Response and predictor(s) identified [2]
- type identified
- reasonable
- consistent with question
Scope of inquiry justified [2]
Identifies appropriate methods [2]
- overall model
- follow-up analyses
States correct hypotheses [2]
- in words
- symbolically (as appropriate)
- parameters defined (as appropriate)
Checks correct model assumptions [2]
Analyzed data per Methods section [2]
Interpretation of results [2]
- current study
- broader scope
Limitations and next steps [2]
Submission [2]
- Organization
- Formatting
- .pdf format
Total points: 20
The rough draft of Project 3 is due at the beginning of class on Monday, 25 November, and constitutes 1/3 of the project grade.
Table 2. Rubric – Final Draft
Task  Score  Comments
States the statistical question [1]
Response and predictor(s) identified [3]
- type identified
- reasonable
- consistent with question
Scope of inquiry justified [3]
Identifies appropriate methods [4]
- overall model
- follow-up analyses
States correct hypotheses [3]
- in words
- symbolically (as appropriate)
- parameters defined (as appropriate)
Checks correct model assumptions [3]
Analyzed data per Methods section [4]
Interpretation of results [4]
- current study
- broader scope
Limitations and next steps [4]
Submission [6]
- Organization
- Formatting
- .pdf format
The final report, along with the Individual Team Feedback [5 points], is due by the end of your class’s respective finals period:
• 10:10 class: Monday, 9 Dec 8:00-11:00a
• 12:20 class: Wednesday, 11 Dec 12:00-3:00p
• 1:25 class: Monday, 9 Dec 12:00-3:00p
This final report constitutes 2/3 of your project grade. Be sure to incorporate feedback received on the first draft in this final report to maximize your total score on the project.