
Project 2: Inference for Proportions

Student name: Torie Robinson

Project Goal: Continue analyzing our topic of interest and answering research questions based on the data collected with Project 1. Assess comprehension of the Central Limit Theorem, constructing and interpreting one-sample confidence intervals for proportions, and conducting and interpreting one-sample hypothesis tests for proportions.

Project Overview: This project is broken down into two checkpoints, which you will complete according to the course schedule. All written work should be completed within this document, and you should use it like a step-by-step checklist of questions/objectives to complete. All short answer questions should be written in full sentences, with consideration given for grammar and spelling.

Before you begin working in this project, click file, then save as, and change the saved file name to include your name. For example, Rachel Sherman would use Project_2_RachelSherman.

Central Limit Theorem Checkpoint:

Checkpoint Overview: The Central Limit Theorem Checkpoint is designed to walk you through the process of verifying if the properties of the Central Limit Theorem apply to your collected data, and make observations about simulated samples compared to our own data.

To begin, we will refer to our data that we collected as part of Project 1.

1. Select one of the categorical variables for which you collected survey data. Write a sentence that identifies the categorical variable you are using.

The categorical variable I selected is the preference for sports between football and basketball.

2. Conduct a web search to find a proportion related to the categorical variable you are continuing to research in Project 2. Write a sentence that identifies the proportion, include the source.

(Example: 77% (p=0.77) of American women use Facebook, according to Pew Research. https://www.pewresearch.org/fact-tank/2021/06/01/facts-about-americans-and-facebook )

41% (p = 0.41) of U.S. adults say football is their favorite sport, according to Gallup. https://news.gallup.com/poll/610046/football-retains-dominant-position-favorite-sport.aspx#:~:text=Forty%2Done%20percent%20of%20U.S.,%25%20and%209%25%2C%20respectively

We will now use the Central Limit Theorem for Proportions, to make predictions about the sampling distribution of sample proportions.

3. If we assume requirements are met and the Central Limit Theorem for Proportions is appropriate, what shape should we expect the sampling distribution of sample proportions to have?

The distribution of these sample proportions would resemble a bell-shaped curve, centered around the population proportion of 41%.

4. Using the identified population proportion (p), your sample size (n), and the Central Limit Theorem formulas given below, predict the mean and the standard deviation for the sampling distribution of the sample proportion, for your sample size. Show your work, and round to at least four decimal places as needed.

Mean of the distribution of sample proportions: mean = p = 0.41

Standard deviation of the distribution of sample proportions: SD = √(p(1 − p)/n) = √(0.41(0.59)/60) = √(0.2419/60) ≈ 0.0635

5. In statistics, we often call values “usual” when they are within 2 standard deviations of the mean. Using your sample size and the assumed population proportion, write a sentence that identifies the usual values of the sample proportion.

The usual values of the sample proportion for those who prefer football over basketball are between 0.283 and 0.537.

0.41-2(0.0635) =0.283

0.41+2(0.0635) =0.537
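The Question 4 and Question 5 arithmetic can be double-checked with a short Python sketch. It assumes n = 60 (the sample size stated in Question 7) and p = 0.41 (the Gallup figure); the helper name `clt_summary` is hypothetical and not part of any course app.

```python
import math

def clt_summary(p: float, n: int) -> tuple[float, float, float, float]:
    """Return the mean, standard deviation, and 'usual' range (mean ± 2 SD)
    of the sampling distribution of the sample proportion."""
    mean = p                         # the distribution is centered at p
    sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat
    return mean, sd, mean - 2 * sd, mean + 2 * sd

mean, sd, low, high = clt_summary(p=0.41, n=60)
print(f"mean = {mean:.4f}, sd = {sd:.4f}")
print(f"usual values: {low:.3f} to {high:.3f}")
```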

6. Using technology, and the identified population proportion, simulate the sampling distribution of the sample proportion. Simulate at least 1,000 samples of the same size as your sample, assuming the population proportion. Include a screen clip of the simulated sampling distribution and the summary statistics. (Hint: Consider using the Sampling Distribution of the Sample Proportion App https://dcmpdatatools.utdanacenter.org/sampdist_prop/ )

7. Write a paragraph (3-4 sentences), which describe the shape, center, and spread of the simulated sampling distribution of the sample proportion from the previous question. Does this align with what the Central Limit Theorem predicts? (Hint: You predicted the mean, standard deviation, and shape in Questions 3 & 4.)

The simulated sampling distribution of the sample proportion is bell-shaped and centered around a mean of 0.41. This indicates that the most likely sample proportions cluster around the population proportion of 41% favoring football. The spread of the distribution is described by a standard deviation of approximately 0.0635, reflecting the variability in sample proportions expected for a sample size of 60. This aligns with the shape, center, and spread predicted by the Central Limit Theorem.
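As a rough stand-in for the Question 6 app, the simulation can be sketched with Python's standard library. This sketch assumes p = 0.41 and n = 60, draws 1,000 samples, and reports the mean and standard deviation of the simulated sample proportions; the seed and the helper name `simulate_phats` are arbitrary choices for illustration, so the exact values will differ slightly from the app's readout.

```python
import random
import statistics

def simulate_phats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Simulate `reps` sample proportions, each from n independent responses
    where a 'success' (prefers football) occurs with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    phats = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        phats.append(successes / n)
    return phats

phats = simulate_phats(p=0.41, n=60, reps=1000)
print(f"simulated mean = {statistics.mean(phats):.4f}")
print(f"simulated sd   = {statistics.stdev(phats):.4f}")
```

The simulated mean and standard deviation should land close to the predicted 0.41 and 0.0635.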

In the next group of questions, we will apply what we previously predicted.

8. Using technology, create a bar chart for your categorical variable. Be sure to include descriptive labels in your bar chart. Include a screen clip of your bar chart here. (Hint: Consider using the Describing and Exploring Categorical Data App https://dcmpdatatools.utdanacenter.org/eda_categorical/ )

[Screenshot: bar chart of sport preference, football vs. basketball]

9. What is the sample proportion? Use appropriate notation and show the calculations. Include the numerical values for the success size, the sample size, and the decimal equivalent of the proportion. ( Example, of sampled American women use Facebook)

p̂ = x/n = 38/60 ≈ 0.6333, so about 63.33% of sampled respondents prefer football over basketball.

10. Using 1-2 sentences, compare the sample proportion to your assumed population proportion. (Hint: Are they close to each other or far apart? Did your sample proportion surprise you? Is your sample proportion a “usual” value?)

The sample proportion of approximately 0.6333 is noticeably higher than the assumed population proportion of 0.41, and because it lies above the usual range of 0.283 to 0.537, it is not a usual value. This sample proportion did not surprise me because I also prefer football over basketball.

Now, we will check to see if the Central Limit Theorem requirements are satisfied.

11. The conditions for applying the Central Limit Theorem include sample size requirements and random sampling requirements.

a. The sample size is considered “large enough” if the following conditions are met:

np ≥ 10 and n(1 − p) ≥ 10. Using these formulas, show if the sample size conditions are met.

np = 60(0.41) = 24.6 and n(1 − p) = 60(0.59) = 35.4. Both values are well above 10, so the sample size conditions are satisfied. These conditions ensure that the sample size of 60 is sufficient for the Central Limit Theorem to apply, allowing us to make reliable inferences about the population proportion p.
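The sample size check can be sketched as a small Python function. The threshold of 10 is an assumption (some textbooks use 5 instead), and `large_enough` is a hypothetical helper name.

```python
def large_enough(p: float, n: int, threshold: float = 10) -> bool:
    """Check the sample size conditions np >= threshold and n(1 - p) >= threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    print(f"np = {expected_successes:.1f}, n(1 - p) = {expected_failures:.1f}")
    return expected_successes >= threshold and expected_failures >= threshold

print(large_enough(p=0.41, n=60))
```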

b. Was your sample a random sample? Be sure to review the definition of a random sample before answering this question.

Yes, I consider it random: although I sent the survey out to people myself, I do not know which respondents completed it and which did not.

12. Based on your answer to Question 11 would it be valid to use Central Limit Theorem concepts to further analyze this sample? Answer yes or no and explain how you arrived at this decision.

Yes, it would be valid to use Central Limit Theorem concepts to analyze this sample. The theorem assures us that with a sufficiently large and randomly selected sample, the sampling distribution of sample proportions will be approximately normal.

13. Regardless of your response to question 12, what would you expect to see if you were to take a sample of n=5000? (Hint: As the sample size increases, what do we expect to see about the shape, center, and spread?)

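The sampling distribution would become more symmetric and approach a bell-shaped curve, resembling a normal distribution. Its center would remain at the population proportion, 0.41, while its spread would shrink considerably: the standard deviation would drop from about 0.0635 for n = 60 to about √(0.41(0.59)/5000) ≈ 0.0070 for n = 5000.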
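The effect of sample size on spread can be illustrated with a brief Python sketch, assuming p = 0.41; `sd_phat` is a hypothetical helper name. Only the standard deviation changes with n, while the center stays at p.

```python
import math

def sd_phat(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

p = 0.41  # assumed population proportion (Gallup)
for n in (60, 5000):
    print(f"n = {n:>4}: center = {p:.2f}, sd = {sd_phat(p, n):.4f}")
```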

Analysis & Reflection:

14. Write 1-2 paragraphs summarizing what you have learned so far about the data you have been exploring in questions 3-13. Explain any contradictions and/or alignments between the sample proportion, the predicted results, and the population proportion. Add any additional observations or questions that you have as you move through this project.

While exploring the data on preferences between football and basketball, several points stood out. Based on a population proportion of 41% favoring football, the Central Limit Theorem predicted a sampling distribution of sample proportions that would be bell-shaped, with a mean of 0.41 and a standard deviation of approximately 0.0635 for a sample size of 60. The simulation from the linked app matched this prediction, showing a bell-shaped curve centered at 0.41 and confirming the expected shape, center, and spread of the sampling distribution. The usual range is within ±2 standard deviations of the mean, so approximately 95% of sample proportions should fall within 0.41 ± 2(0.0635), that is, between 0.283 and 0.537.

However, the actual sample data produced a sample proportion of approximately 0.6333, well above the population proportion of 0.41. This sample proportion also falls outside the usual range of 0.283 to 0.537, so chance variation alone is an unlikely explanation for the difference. This divergence suggests that the sample may not be fully representative of the entire population, potentially because of biases in how respondents were selected or in their preferences.

Confidence Intervals & Hypothesis Testing for Categorical Data Checkpoint:

Checkpoint Overview: The Confidence Intervals & Hypothesis Testing for Categorical Data Checkpoint is designed to walk you through analyzing categorical data using confidence intervals and hypothesis testing.

Before you proceed, clearly identify each of the following by replacing the ## with the correct values.

· The assumed population proportion (from Question 2) is: p = 0.41

· The sample proportion (from Question 9) is: p̂ = 0.6333

Regardless of how we answered Question 12, we are going to move forward in applying the concepts of the Central Limit Theorem to build our confidence intervals, and eventually conduct a hypothesis test. At the end, we can discuss how this may or may not have caused additional bias in our research.

Consider using the Inference For a Population Proportion App https://dcmpdatatools.utdanacenter.org/inference_prop/ to answer the following questions.

15. Using technology and 95% confidence, answer the following:

a. Provide a screenshot of the entire Stat App readout.

b. What is the point estimate of the population proportion?

p̂ = 0.6333

c. What is the margin of error?

E= ± 0.1219

d. What is the 95% confidence interval? Use interval notation.

[0.5114, 0.7552]

e. Does your 95% confidence interval contain your assumed population proportion? Answer yes or no and use the specific values as you explain how you arrived at your decision.

No. The assumed population proportion, 0.41, is not contained in the 95% confidence interval [0.5114, 0.7552], because 0.41 is less than the lower bound of 0.5114.

16. Using technology and 90% confidence, answer the following:

a. Provide a screenshot of the entire Stat App readout.

b. What is the confidence interval? Use interval notation.

[0.531, 0.7357]

c. Interpret the confidence interval in context of your research question.

I am 90% confident that the true proportion of people who prefer football over basketball is between 53.1% and 73.57%.

d. Use 1-2 sentences to compare the 90% confidence interval with the 95% confidence interval.

The 90% confidence interval, [0.531, 0.7357], is narrower than the 95% confidence interval, [0.5114, 0.7552]. This indicates that with a lower confidence level, we have a smaller range of plausible values for the population proportion.
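Both intervals can be reproduced approximately with the normal-approximation (Wald) interval, p̂ ± z*·√(p̂(1 − p̂)/n). This Python sketch assumes that is the method the app uses, and assumes x = 38 successes (0.6333 × 60); `proportion_ci` is a hypothetical helper name. Because the sketch does not round p̂ before adding the margin of error, a bound may differ from the app readout in the last decimal place.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float) -> tuple[float, float, float]:
    """Normal-approximation (Wald) confidence interval for one proportion.
    Returns (margin of error, lower bound, upper bound)."""
    phat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    se = math.sqrt(phat * (1 - phat) / n)                    # standard error of p-hat
    margin = z_star * se
    return margin, phat - margin, phat + margin

for level in (0.95, 0.90):
    e, lower, upper = proportion_ci(successes=38, n=60, confidence=level)
    print(f"{level:.0%} CI: E = {e:.4f}, [{lower:.4f}, {upper:.4f}]")
```

The 90% interval comes out narrower because its critical value (about 1.645) is smaller than the 95% value (about 1.960).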

The next set of questions will help you formulate a hypothesis about your population proportion to conduct a one-proportion hypothesis test.

17. Develop a hypothesis about your population proportion.

a. Write a sentence describing your developed hypothesis about the population proportion. (Example: I hypothesize that the proportion of Delaware women who use Facebook is different from 0.77, the proportion of all U.S. women who use Facebook.)

I hypothesize that the proportion of people who prefer football over basketball is greater than 0.41, the proportion of U.S. adults who say football is their favorite sport.

b. Write the null hypothesis in symbols. (Hint: State )

H₀: p = 0.41

c. Write the alternative hypothesis in symbols. (Hint: State )

H₁: p > 0.41

d. Is your hypothesis test one-tailed or two-tailed?

This is a one-tailed (right-tailed) test: it assesses whether the sample data provide evidence to reject the null hypothesis H₀: p = 0.41 in favor of the alternative hypothesis H₁: p > 0.41.

18. Conduct your hypothesis test at a level of significance of 5%, α = 0.05. (Consider using the Inference For a Population Proportion App https://dcmpdatatools.utdanacenter.org/inference_prop/ )

a. Provide a screenshot of the entire ShinyApp readout.

b. What is your test statistic?

My test statistic is z = 3.5173.

c. What is your p-value?

P-value =0.0002

d. What is your decision about the null hypothesis?

I reject the null hypothesis because the p-value of 0.0002 is less than the significance level α = 0.05.

e. In 1-2 sentences, write the conclusion of your hypothesis test in context of your research question.

The p-value of 0.0002 is much smaller than the significance level of 0.05. This means there is sufficient statistical evidence to conclude that the proportion of people who prefer football over basketball is greater than 0.41.
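The test statistic and p-value can be checked with a right-tailed one-proportion z-test. This Python sketch assumes x = 38 successes out of n = 60 and uses the null value p₀ = 0.41 in the standard error, which is the usual textbook form; `one_prop_z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Right-tailed one-proportion z-test of H0: p = p0 vs H1: p > p0.
    Returns (z statistic, p-value)."""
    phat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (phat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)   # area in the right tail
    return z, p_value

z, p_value = one_prop_z_test(successes=38, n=60, p0=0.41)
alpha = 0.05
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```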

19. The following questions are regarding significance of the hypothesis test results.

a. Was the conclusion of the hypothesis test statistically significant? Explain your answer.

Yes, the conclusion is statistically significant because the p-value is well below the chosen significance level of 0.05. This suggests that the observed data provide strong evidence against the null hypothesis, supporting the idea that the true proportion is indeed greater than 0.41.

b. Was the conclusion of the hypothesis test practically significant? Explain your answer.

Reflection:

20. Personal Reflection: Write a brief paragraph with your personal thoughts on your project and the process. You may wish to include anything which surprised you or things you found to be challenging along the way. What you write here should not be included in your report below.

Throughout this project, I found it interesting how statistical concepts like the Central Limit Theorem and hypothesis testing can provide insights into survey data, allowing us to draw conclusions about population preferences with a reasonable level of confidence. The challenges mainly revolved around ensuring the sample's representativeness and interpreting the practical significance of statistical findings. Overall, this project underscored the importance of statistics in drawing meaningful conclusions from my data.


Report:

21. Write a Final Report for this project. This is a cohesive report, not a series of responses to questions. You will want to use the scaffolded questions above in your project and elaborate where appropriate to summarize the results of your research, in context of your research question.

Use the following guidelines to structure your report:

a. Introduction: Set the stage for your report by providing an overview of the categorical variable from your identified population. Include the identified population proportion and cite the source. Include appropriate background information. (1 paragraph)

The categorical variable chosen for analysis is the preference for sports between football and basketball among survey respondents. According to Gallup's latest survey data, 41% of U.S. adults consider football their favorite sport ("Football Retains Dominant Position as Favorite Sport," Gallup, 2023). Our survey produced a sample proportion of 0.6333, which we used to estimate the true population proportion and to assess the reliability of our findings. This proportion indicates a strong preference for football within the population sampled, reflecting its popularity in American culture. Understanding the distribution of preferences between football and basketball provides valuable insights into societal interests and trends, particularly regarding spectator sports in the United States.


b. Summary Results: (2-3 paragraphs)

i. Summarize the results from your research and discuss the findings of your confidence intervals. Consider the relationship between the confidence intervals and the assumed population proportion.

The 95% confidence interval was calculated from 0.5114 to 0.7552. This interval suggests that we can be 95% confident that the true proportion of people who favor football over basketball lies within this range. Importantly, this interval does not include our assumed population proportion of 41%, because 0.41 is below the lower bound of 0.5114; this suggests that the proportion in the population we sampled differs from the national figure reported by Gallup. When we reduced the confidence level to 90%, the confidence interval narrowed to [0.531, 0.7357]. This reduction in interval width illustrates the trade-off between confidence level and precision: at 90% confidence, we are less certain that the interval captures the true proportion, but we obtain a narrower range of plausible values.

The relationship between these confidence intervals and the assumed population proportion is critical. The 95% interval, from 51.14% to 75.52%, provides a wider range of plausible values for the true proportion, while the narrower 90% interval, from 53.1% to 73.57%, is more precise but carries less confidence. Neither interval contains 41%, and the hypothesis test likewise supports the assertion that the proportion of football preference is greater than 0.41 given the observed data.

ii. Discuss the procedure, results, and conclusion of your hypothesis test.

The hypothesis test was conducted to investigate whether the proportion of people who prefer football over basketball is greater than 41%. With a test statistic of z = 3.5173 and a corresponding p-value of 0.0002, we reject the null hypothesis H₀: p = 0.41 at the 5% significance level. This provides strong evidence that a larger proportion of the population prefers football, supporting our alternative hypothesis H₁: p > 0.41.

c. Conclusion: Consider the implications of requirements in applying the Central Limit Theorem. Include at least one good statistical question related to your research you would like to investigate further. Include reflections about the project that would add insight. (1 paragraph)


In conclusion, the application of the Central Limit Theorem played a crucial role in our analysis. Both the confidence intervals and the hypothesis test indicate strong evidence that a majority of the people surveyed prefer football over basketball. Because our sample size was large enough (np = 24.6 and n(1 − p) = 35.4) for the sampling distribution of sample proportions to be approximately normal, I was able to construct confidence intervals and conduct a hypothesis test. Our findings suggest a strong preference for football, supported by both the point estimate and the confidence intervals derived from our sample data; extending this conclusion to all U.S. adults depends on the sample being representative of that population.
