Statistic Assignment - Research and Analyze
Assignment 7
This assignment has 27 questions. It does not require SPSS.
Each set of questions starts on its own page.
Grading note: Some questions ask students to invent examples or explain how to use statistics.
The grades for these questions will reflect the extent to which the student shows mastery of the
relevant, applicable material covered this semester. These questions also carry more weight than
the multiple choice and direct interpretation questions: Questions 1 to 20 are worth 85% of the
assignment grade, and the remaining questions are worth 15%.
Type your name at the top!
Upload on time. Due date is in the syllabus, page 5.
Invent a Correlation
Think of two variables that you expect to be correlated. Do not choose Age as one variable. Do
not choose Income and Education or other examples from the lectures. Instead, invent your own
example. You can choose a pair of quantitative variables, or a quant-dummy pair.
1. What are the two variables, and how are they measured? Include the information you
would expect to find in a codebook about the two variables. Make sure it’s clear how
they are measured.
2. What is each “case”? In other words, what is each unit of analysis, or, what is shown in
each row of the dataset? Sometimes variables are characteristics of people. Sometimes
variables are characteristics of places. In your example, what is the exact type of case?¹
3. Do you expect the correlation to be in the positive or negative direction? Just write
“positive” or “negative”.
4. Explain your answer to the previous question (positive or negative) by writing at least
one complete sentence. Don’t just repeat your expectation. Explain it.
5. Do you expect the correlation to be weak, moderate or strong? Just write “weak”,
“moderate”, or “strong”.
6. Explain your answer to the previous question (strength) by writing at least one complete
sentence. Don’t just repeat your expectation. Explain it.
¹ At the beginning of the semester, we used an example where each case was a piece of paper. Each piece of paper had three characteristics: shape, color, and number of seconds it took to cut out. Those three characteristics were the three VARIABLES that described each CASE (piece of paper).
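As an illustration of what "direction" and "strength" mean in questions 3 to 6, the following Python sketch computes a Pearson r by hand. The two variables (hours of sleep and typing errors) and every value in the lists are hypothetical, invented only for this sketch; they are not taken from the lectures or from real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (numerator of r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (used in the denominator of r).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data: each case is one person (made-up values).
hours_of_sleep = [4, 5, 6, 7, 8, 9]
typing_errors = [12, 11, 9, 8, 5, 4]

r = pearson_r(hours_of_sleep, typing_errors)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.2f} ({direction})")
```

The sign of r gives the direction, and the distance of r from zero gives the strength: values near -1 or +1 indicate a strong relationship, and values near 0 indicate a weak one.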
Invent a Crosstabulation
Think of two categorical variables that may be related. Do not use examples from lectures, class
meetings or assignments. Invent your own example.
7. What is each “case”? In other words, what is each unit of analysis, or, what is shown in each row of the dataset? Sometimes variables are characteristics of people. Sometimes
variables are characteristics of places. In your example, what is the exact type of case?²
8. Describe one of the variables. What are the different values of the variable?³
9. Describe the second variable. What are the different values of the variable?
10. Draw a blank crosstabulation. Show the values of the variables, but leave the rest blank. Don’t write in any frequencies or percents. It may be easiest to draw on paper, take a
photo with your phone, and paste the photo into your assignment.
11. Realistic: Fill in the crosstabulation with column percents that you might expect to find in real life if you had the data. Paste the filled-in crosstab into your assignment.
12. Write complete sentences to interpret two of the percents from the crosstab you filled in with realistic results.
13. No Relationship: Fill in the crosstabulation with the column percents you would expect to find if there was no relationship between the two variables. Paste the filled-in crosstab
into your assignment.
14. Write complete sentences to interpret two of the percents from the crosstab you filled in with ‘no relationship’ results.
15. Strong Relationship: Fill in the crosstabulation with the column percents you would expect to find if there was a very strong relationship between the two variables. Paste the
filled-in crosstab into your assignment.
16. Write complete sentences to interpret two of the percents from the crosstab you filled in with ‘strong relationship’ results.
² At the beginning of the semester, we used an example where each case was a piece of paper. Each piece of paper had three characteristics: shape, color, and number of seconds it took to cut out. Those three characteristics were the three VARIABLES that described each CASE (piece of paper).
³ For example, in the Intro to Data lecture at the beginning of the semester, the variable SHAPE had three values: circle, square and triangle. COLOR had two values: blue and grey.
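To make "column percents" concrete, here is a minimal Python sketch that converts a crosstab of counts into column percents. It reuses the SHAPE and COLOR values from the footnote above, but the counts are hypothetical and were chosen so that the two columns come out identical, which is the pattern a crosstab shows when the variables are unrelated.

```python
def column_percents(counts):
    """Convert a crosstab of counts into column percents.

    counts maps each column value to a dict of {row value: count}.
    Within each column, the percents sum to 100.
    """
    percents = {}
    for col, rows in counts.items():
        total = sum(rows.values())
        percents[col] = {row: 100 * n / total for row, n in rows.items()}
    return percents

# Hypothetical counts: COLOR is the column variable, SHAPE is the row variable.
counts = {
    "blue": {"circle": 10, "square": 20, "triangle": 20},
    "grey": {"circle": 20, "square": 40, "triangle": 40},
}

table = column_percents(counts)
for color, rows in table.items():
    print(color, {shape: f"{pct:.0f}%" for shape, pct in rows.items()})

# With no relationship, every column has the same percent distribution.
no_relationship = table["blue"] == table["grey"]
print("Columns identical:", no_relationship)
```

The grey column has twice as many cases as the blue column, yet the percents match. Column percents are compared across columns precisely because they remove differences in column totals.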
This semester, we’ve learned about several different data analysis techniques and statistics that
can produce conclusions like the one described in the excerpt below.
“A study from 1976 to 1989 found that young Black males engaged in more violent crime than
young White males. But when the researchers compared only employed young males of both
races, the differences in violent behavior vanished. Or, as the Urban Institute stated in a more
recent report on long-term unemployment, ‘Communities with a higher share of long-term
unemployed workers also tend to have higher rates of crime and violence.’”⁴
17. How might the researchers have used data to draw their conclusions? What statistics could they have used? What are the variables? What might the researchers have done
with those variables? Write at least three sentences addressing all these questions. Include
a lot of detail. You may also draw tables or graphs.
⁴ Kendi, I. X. (2019). How to Be an Antiracist. One World. Edited for clarity.
Assume you have data on number of Covid cases and number of residents in each zip code in the
United States for the week of November 23. Assume the data are complete, that is, Covid testing
is widespread so we can assume all Covid cases are detected and included in the dataset.
Sample dataset (assume data are available for all zip codes in the U.S.):

Zip Code    Covid Cases, Week of Nov 23    Residents
45053       276                            3690
90210       351                            19909
10010       127                            33730
95461       61                             2440
32533       1058                           27532

The following questions ask you to address how you would determine whether a specific zip
code has a lot of Covid cases.
18. If the mean of the Covid Cases variable was much, much higher than the median of that
variable, what would that indicate to you? Write complete sentences. You may include
statistics terminology but you should also explain yourself in plain language.
19. Describe how you would use the data to compare zip codes which have very different
numbers of residents. How can you know which zip codes have a lot of cases, relative to
their populations? What would you do with the data, exactly? Use a lot of detail.
20. Describe one method for determining whether a specific zip code is an outlier. What
would you do with the data, exactly? Write complete sentences.
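One possible way to work with the five sample rows is sketched below in Python: it converts raw counts into a rate with a base and compares the mean and median of the case counts. The base of 1,000 residents is an assumption made for this sketch (any base works, as long as it is the same for every zip code), and five rows are far too few to draw conclusions about the full dataset.

```python
from statistics import mean, median

# Rows copied from the sample dataset: (zip code, cases, residents).
sample = [
    ("45053", 276, 3690),
    ("90210", 351, 19909),
    ("10010", 127, 33730),
    ("95461", 61, 2440),
    ("32533", 1058, 27532),
]

# Rate with a base: cases per 1,000 residents (the base is an assumption).
BASE = 1000
rates = {zip_code: BASE * cases / residents for zip_code, cases, residents in sample}

# List zip codes from the highest rate to the lowest.
for zip_code, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{zip_code}: {rate:.1f} cases per {BASE:,} residents")

# Compare the mean and the median of the raw case counts.
case_counts = [cases for _, cases, _ in sample]
print("mean cases:", mean(case_counts))
print("median cases:", median(case_counts))
```

In these five rows, the zip code with the most cases (32533) is not the zip code with the highest rate (45053), which is why raw counts and rates can rank places differently.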
This graph is from the New York Times, March 2018.
21. Write two or three complete sentences explaining why this graph uses Guns Per 100
People instead of just number of guns in each country, and Mass Shooters Per 100
Million People instead of number of mass shooters in each country. Describe or draw
how the graph might look if it used Number of Guns and Number of Mass Shooters
instead of using rates with a base.
Researchers randomly assigned children to two groups. The “Jumpstart Group” received
language coaching from volunteers. The other group (“Business As Usual”) did not. The
researchers collected data on several outcomes.
No additional information about this study is necessary to answer these questions.
22. What is shown in the table above? (choose one)
a. Column percents in a crosstabulation
b. Results from an experiment
c. Correlation statistics
23. Which ‘Estimated Difference’ achieved the greatest level of statistical significance? That is, for which Estimated Difference can we reject the null hypothesis with the most
confidence?
a. English Language and Literacy Development
b. Social Emotional Development
c. Self-Regulation
d. Interpersonal Skills
24. Given the statements above, what would you expect from regression results? Choose one.
a. Variables measuring objective risk have higher standardized betas than variables
measuring risk perception.
b. Variables measuring objective risk have lower standardized betas than variables
measuring risk perception.
c. Variables measuring objective risk and variables measuring risk perception are
not independent variables, so they don’t have standardized betas.
Codebook
Data are from a sample of 2006 House and Senate candidates.
PERCVOTE: Candidate’s percentage of the total vote. In other words, the share of the votes
received by the candidate.
FRAISEDPC: Per capita money raised. In other words, how much money, in dollars, was raised
by the candidate divided by the number of people in the candidate’s district.
DEMPUN: Percent of the population in the candidate’s district that is unemployed. In other
words, of all people in the candidate’s district who are adults and who are in the labor force,
DEMPUN is the share who do not currently have a job.
Answer the following questions based on the Correlations SPSS output and codebook on this
page.
25. Fill in the blank: In this dataset, each case is a _______________.
26. Interpret the Pearson r correlation statistic for PERCVOTE and FRAISEDPC. Use complete sentences. Refer to page 6 of the syllabus.
27. Identify and interpret the p value for the Pearson r correlation statistic for PERCVOTE and FRAISEDPC. Use complete sentences. Use the words “statistically significant”
correctly, showing that you know what statistical significance means for this example.
The assignment is over.
To hand in this assignment, upload it as a single PDF or Word file to Blackboard. Put all your answers into one Word or PDF document. Number your answers.
This is a list of semester topics for your reference. Students should be able to do the following after completing this class:
Identify different types of variables
Interpret numbers from cross tabs
Calculate percentages, including from crosstabs
Interpret rates with a base
Calculate percent and percentage point differences
Articulate the null hypothesis, interpret p values and discuss statistical significance for a Chi Square statistic, Pearson r correlation statistic, and b coefficient from a regression.
Articulate, generally, how the difference between two numbers can be statistically significant or statistically insignificant at different confidence levels; interpret t-test results from experiments.
Identify evidence of a skewed distribution and explain how skew works
Interpret median and other percentiles including cumulative percent
Interpret standard deviation
Use standard deviation and interquartile range to establish criteria to identify outliers
Read SPSS output for correlation and descriptive statistics
Identify when research results are from correlation, chi square, regression, or a t-test.
From a multiple regression including at least one dummy variable:
  - Interpret b coefficients, p values, and standardized betas
  - Interpret adjusted R Square
  - Predict a value of y for given values of x
Interpret the correlation between two variables from a scatterplot
Interpret all the numbers in SPSS correlation matrix output with more than two variables
Create and interpret a confidence interval around a mean and around a proportion
Describe the relationship among sample size, standard error and margin of error
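The last two topics in the list can be illustrated with a short Python sketch. It assumes the usual normal-approximation formula for a confidence interval around a proportion, with 1.96 as the critical value for a 95% confidence level. The survey result of 40% and the three sample sizes are hypothetical values chosen only for illustration.

```python
from math import sqrt

Z_95 = 1.96  # critical value for a 95% confidence level (normal approximation)

def proportion_ci(p_hat, n, z=Z_95):
    """Return (standard error, margin of error, lower, upper) for a sample proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    moe = z * se                        # margin of error
    return se, moe, p_hat - moe, p_hat + moe

# Hypothetical survey result: 40% of respondents say "yes".
for n in (100, 400, 1600):
    se, moe, low, high = proportion_ci(0.40, n)
    print(f"n={n}: SE={se:.4f}, margin of error={moe:.4f}, CI=({low:.3f}, {high:.3f})")
```

Because the sample size sits under a square root, quadrupling the sample size cuts the standard error and the margin of error in half.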