STAT 350 PROJECT


Spring 2017 STAT 350 Project (230 points) Due Friday December 1, 2017


Objectives: Statistical Inference

Instructions

• Groups of 2 to 4 students are required.

• Only PDF files are accepted.

• NO late work is accepted.

• The names of all students in the group, with their section (time), are required at the top of the first page. Please include your complete name, especially if you have both an international name and an American name. Remember that all students must have the same instructor.

• Each student must submit his or her own statement of contribution (submitted separately). See below for details.

• Put all code in an appendix; no code is required in the main body. Be sure to clearly label which code is for which part. The output that is necessary to answer the questions is required to be in the main body of the project. You will be graded on whether you have enough or too much output included. Different people in the group can use different software packages.

• Your report should be in the same order as the questions posed. Clearly label each part.

• All discussion should be in complete English sentences.

• Only one report should be submitted per group, with each person submitting their own statement of contribution separately. Everything should be submitted on Blackboard.

• The statement of contribution should consist of what you did in the project and whether there were any problems with any individuals in the group. Please include all of your group mates' names at the top of this page also. Please rate each member of your group as poor, good, or exemplary. A chart with some guidelines is listed at the end of this assignment. People will often be good in some areas and poorer in others. Therefore, if someone is mostly good, that is the rating that you should give them. Please provide further explanations if a person is rated either poor (or unacceptable) or exemplary. This statement should not be shared with your group mates; therefore, it cannot be included in the body of the project. For the person who is submitting the report, you will need to add a separate attachment for the statement of contribution.

• If you have any questions about the project, please ask on Piazza, ask during office hours, or discuss them with your instructor.

• It is acceptable for different parts of the project to use different software packages.

• There will be no tutorials for this project; please refer to the other tutorials as needed.

Project: Statistical Inference

Throughout the semester we have learned some basic but useful statistical tools. With these tools, we can analyze problems that we may be interested in. Since most data sets contain many different types of data, it is important to be able to determine which methods are appropriate for each type of data. In this project, you are to decide on three related questions based on the US Demographic data that we have been using this semester.


By related, we mean that you should seek to understand one general situation and ask three questions pertaining to that situation. Another way of looking at this is that you should analyze the features of one variable X and its relationship to the other variables in the dataset. Hence, all inferences would involve X alone or X with another variable. Some examples of this are given below.

You will then answer each question by performing statistical inference (please see subpart "e" of "Grading for Parts C, D, and E" below for a list of the inference procedures you are allowed to consider). Furthermore, you must use at least two DIFFERENT techniques. For example, it would not be acceptable to use three one-sample t-tests for your inference procedures; however, it is acceptable to use a one-sample t-test and 2 two-sample t-tests. You may also use three different techniques, such as a 1-sample t-test, a 2-sample t-test, and ANOVA.

You are not required to use the question that you posed in Lab 1. All of the analyses must be different from those asked in the required labs. If you repeat anything that was previously asked, you will receive a 0 on that part. The variables that you can NOT use are in the following table:

Lab   Variables
6     1-sample: Average Test Score
7     2-sample: Median Income NE vs. NC, Education Spending in both periods
8     ANOVA: Average Test Score vs. Region
9     Linear Regression: Average Test Score vs. Median Income
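The rule that the three inferences must use at least two different techniques can be sketched as a small check. This is only an illustration: the function name and the technique labels are hypothetical, chosen to mirror the procedures listed in subpart e) of the grading section, and the check does not look at whether an analysis repeats one from the labs.

```python
# Hypothetical labels for the procedures named in subpart e); the
# assignment itself does not define these strings.
ALLOWED = {
    "one-sample t",
    "two-sample independent t",
    "two-sample paired t",
    "ANOVA",
    "linear regression",
}


def plan_is_acceptable(techniques):
    """Return True if the plan lists three allowed inferences
    that use at least two different techniques."""
    if len(techniques) != 3:
        return False
    if any(t not in ALLOWED for t in techniques):
        return False
    # A set keeps one copy of each label, so its size counts distinct techniques.
    return len(set(techniques)) >= 2


# Three one-sample t-tests: not acceptable under the rule.
print(plan_is_acceptable(["one-sample t"] * 3))

# One one-sample t-test and two two-sample t-tests: acceptable.
print(plan_is_acceptable(
    ["one-sample t", "two-sample independent t", "two-sample independent t"]
))
```

The two calls correspond to the unacceptable and acceptable examples given in the text.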

Note that percent college graduates, percent divorced males, and percent divorced females are proportions and so the inference techniques that have been discussed in class are not applicable to them. The following are some examples of acceptable situations and the inferences that are used to check them using a dataset that was used previously, the heights and weights of major league baseball players.

Situation: Heights of baseball players
  Inference 1: 1 sample: heights
  Inference 2: two-sample: Average heights in American League versus National League
  Inference 3: ANOVA: heights versus positions of interest (at least 3) OR two-sample: heights of two different positions

Situation: Weights of baseball players
  Inference 1: 1 sample: weights
  Inference 2: ANOVA: Weights versus positions of interest (at least 3) OR two-sample: weights of two different positions
  Inference 3: Regression: heights versus weights


Grading and Content Information:

A. (15 pts.) Introduction and question. Decide on three related questions that can be answered via inference in the US Demographic database. Please see the above for help on choosing your questions. Once you decide on the questions, briefly explain why the answers to these questions are important. This part should consist of at least one paragraph with at least one reference. References may be online as long as they are correctly cited.

B. (5 pts.) Data: Make a table of each of the variables that you are using with a brief description of each variable and whether the variable is numeric or categorical.

C. (50 pts.) Inference 1. See below for what needs to be included.

D. (50 pts.) Inference 2: See below for what needs to be included.

E. (50 pts.) Inference 3: See below for what needs to be included.

F. (20 pts.) Write a final conclusion based on Parts C, D, and E. This should be a brief summary of what you have already written in the conclusions of parts C, D, and E and a final answer to your questions in part A. You will not receive full credit unless you discuss the practical significance using non-statistical methods. Please write your response so that it is understandable to someone who has not taken statistics.

It is acceptable if the result of any of the inferences is 'not significant.' You will just need to explain in part F how 'not significant' answers the question(s) that you pose in Part A.

In addition to the points mentioned above, you will be graded on an additional 40 points. 10 points will be for organization and style. These points will consist of whether the organization of the report is easy to read, the items are in the correct order, complete English sentences are used, and whether student names and sections are at the beginning of the report. The other 30 points will be for group participation as graded by your peers. The points will be based on the submitted statements of contribution of the members of the group. If you do not submit the statement, then you will lose all of these points. The number of points may change depending on your and your group mates' statements. Because of these points, the final score on the project might be different for different members of the group.
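As a quick check on the totals, the point values stated above can be added up. This is only a tally of the numbers given in this handout; the dictionary names are illustrative.

```python
# Point values for report parts A through F, as listed above.
report_parts = {"A": 15, "B": 5, "C": 50, "D": 50, "E": 50, "F": 20}

# The additional 40 points described in the paragraph above.
additional = {"organization and style": 10, "group participation": 30}

report_total = sum(report_parts.values())
additional_total = sum(additional.values())
total = report_total + additional_total

print(report_total, additional_total, total)  # prints: 190 40 230
```

The total agrees with the 230 points stated in the project header.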


Grading for Parts C, D, and E:

a) (5 pts.) Code: The code should be clearly labeled in the appendix. You may use different software packages for the different parts.

b) (5 pts.) What statistical procedure should be used and why? Besides the technique itself, be sure to state whether you are performing a one-sided or two-sided inference with an explanation for the choice. Remember, this needs to be determined BEFORE you analyze the data.

c) (10 pts.) Determine whether the appropriate assumptions are met. Please provide all of the graphs to show that the assumptions are met and explain your decision. If the assumptions are not met for your methodology and you still perform the analysis, you will lose 25 points. If a transformation is needed, state that you have performed a transformation and explain why. The explanation should include at least the histogram of the original data. You may include additional graphs if necessary. If a transformation is used, include a complete set of the required graphs for the transformed data. You may assume that the data set is from an SRS as you have been assuming this semester. This assumption must be explicitly stated.

d) (5 pts.) Graphically display the data as appropriate for your answer in part b) with an interpretation of the output. The point of this part is to understand and explore your data, not merely to check the assumptions needed for inference as you did in part c). Some of the graphs in part d) may have already been used in part c); however, the descriptions of the graphs will be different. To determine which graphs are appropriate, please see the appropriate lab.

e) (20 pts.) Perform the appropriate inference with a significance level of 0.05. This may consist of more than one step depending on the methodology in part b). The possible methodologies are:

1) Confidence interval AND hypothesis test (Chapters 8, 9, and 10): This includes 1-sample, 2-sample independent, and 2-sample paired t procedures. Each of these procedures is considered a different type of inference.

2) ANOVA (Ch. 11): Both the hypothesis test and the multiple comparison (if appropriate) need to be included.

3) Linear regression (Ch. 12): At least one inference needs to be included besides the equation of the line.

All confidence intervals should include the interpretation. All hypothesis tests should consist of the 4 steps.

f) (5 pts.) A conclusion in words that relates to the context of the question. This should be a short paragraph that explains your conclusions for the part and is understandable to someone who has not taken a course in statistics. This should answer part of the question that you posed in Part A.
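Parts c) and e) can be illustrated with a minimal sketch, under several stated assumptions. The data are made up and are not the course data. A numeric skewness value is shown only as a supplement to the required histograms, not a replacement for them. The one-sample t procedure is chosen purely for illustration. The critical value t* = 2.262 is taken from a t table for a two-sided 95% interval with 9 degrees of freedom, and the null value is hypothetical.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (near 0 for symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def one_sample_t(values, mu0, t_crit):
    """Two-sided one-sample t procedure.

    Returns the t statistic, the confidence interval for the mean,
    and whether H0: mu = mu0 is rejected at the level implied by t_crit.
    """
    n = len(values)
    xbar = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev uses n - 1
    t_stat = (xbar - mu0) / se
    ci = (xbar - t_crit * se, xbar + t_crit * se)
    return t_stat, ci, abs(t_stat) > t_crit


# Made-up, right-skewed sample for illustration only (not the course data).
raw = [2.0, 3.1, 4.2, 5.0, 6.3, 8.1, 10.2, 13.0, 17.5, 26.0]
logged = [math.log(v) for v in raw]

# Part c): compare skewness before and after a log transformation.
skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(f"skewness before log: {skew_raw:.2f}, after log: {skew_log:.2f}")

# Part e): inference on the transformed scale at significance level 0.05.
mu0 = math.log(8.0)   # hypothetical null value, on the log scale
t_crit = 2.262        # table value: 95% two-sided, df = 9
t_stat, ci, reject = one_sample_t(logged, mu0, t_crit)
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

Because the test is run on the log scale, the null value and the interval are also on the log scale; any conclusion written for part f) would need to say so in plain language.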


Rating levels: Unacceptable, Poor, Good, Exemplary

Contribution to Group’s Tasks

Unacceptable:
• Chooses not to participate in the group
• Shows no concern for goals
• Impedes goal setting process
• Chooses not to participate in problem-solving

Poor:
• Participates inconsistently in the group and sometimes helps in the group work
• Shows sporadic concern for goals and sometimes helps in goal setting
• Offers suggestions occasionally to solve problems

Good:
• Participates in the group all or most of the time
• Shows concern for goals and participates in goal setting all or most of the time
• Offers suggestions to solve problems and sometimes encourages group participation

Exemplary:
• Always leads in group activities
• Always leads in setting goals
• Involves the whole group in problem-solving

Completion of Personal Tasks

Unacceptable:
• Impedes others from completing their assigned tasks
• Does not complete their assigned tasks

Poor:
• Sometimes helps others to complete their assigned tasks
• Completes assigned tasks some of the time

Good:
• Sometimes helps others to complete their assigned tasks
• Completes all of their assigned tasks

Exemplary:
• Actively helps others complete their assigned tasks
• Thoroughly completes their assigned tasks

Group Interaction

Unacceptable:
• Discourages sharing
• Does not participate in group discussions
• Does not listen to others

Poor:
• Shares ideas occasionally when encouraged
• Allows sharing by most group members
• Listens to others sometimes

Good:
• Shares ideas all or most of the time and sometimes encourages group members to share
• Listens and takes others’ feelings into consideration all or most of the time

Exemplary:
• Shares ideas all or most of the time
• Actively encourages all group members to share their ideas
• Listens attentively to others
• Empathetic to other people’s feelings and ideas