Statistics Assignment - Research and Analyze

Amasrali74
Assig7_Fall20.pdf

Assignment 7

This assignment has 27 questions. It does not require SPSS.

Each set of questions starts on its own page.

Grading note: Some questions ask students to invent examples or explain how to use statistics. The grades for these questions will reflect the extent to which the student shows mastery of the relevant, applicable material covered this semester. These questions also carry more weight in the assignment grade than the multiple choice and direct interpretation questions. Specifically, Questions 1 to 20 are 85% of the assignment grade, and the remaining questions are 15%.

Type your name at the top!

Upload on time. Due date is in the syllabus, page 5.

Invent a Correlation

Think of two variables that you expect to be correlated. Do not choose Age as one variable. Do not choose Income and Education or other examples from the lectures. Instead, invent your own example. You can choose a pair of quantitative variables, or a quant-dummy pair.

1. What are the two variables, and how are they measured? Include the information you would expect to find in a codebook about the two variables. Make sure it’s clear how they are measured.

2. What is each “case”? In other words, what is each unit of analysis, or, what is shown in each row of the dataset? Sometimes variables are characteristics of people. Sometimes variables are characteristics of places. In your example, what is the exact type of case? [1]

3. Do you expect the correlation to be in the positive or negative direction? Just write “positive” or “negative”.

4. Explain your answer to the previous question (positive or negative) by writing at least one complete sentence. Don’t just repeat your expectation. Explain it.

5. Do you expect the correlation to be weak, moderate or strong? Just write “weak”, “moderate”, or “strong”.

6. Explain your answer to the previous question (strength) by writing at least one complete sentence. Don’t just repeat your expectation. Explain it.

[1] At the beginning of the semester, we used an example where each case was a piece of paper. Each piece of paper had three characteristics: shape, color, and number of seconds it took to cut out. Those three characteristics were the three VARIABLES that described each CASE (piece of paper).
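As an illustration of what questions 1 to 6 are asking for, the sketch below computes a Pearson r for a made-up example. Everything in it is an assumption for illustration only: the cases (students), the two variables (hours of sleep the night before an exam, and cups of coffee the next day), the six data values, and the helper name pearson_r. Do not reuse this example in your answer.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson r correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: each case (row) is one student.
sleep_hours = [8, 7, 6, 5, 4, 7.5]   # hours of sleep the night before
coffee_cups = [0, 1, 2, 3, 4, 1]     # cups of coffee the next day

r = pearson_r(sleep_hours, coffee_cups)
print(f"Pearson r = {r:.2f}")
```

The sign of r gives the direction of the relationship (here it is negative, since students with less sleep drink more coffee in the made-up data), and how close r is to -1 or +1 indicates its strength.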

Invent a Crosstabulation

Think of two categorical variables that may be related. Do not use examples from lectures, class meetings or assignments. Invent your own example.

7. What is each “case”? In other words, what is each unit of analysis, or, what is shown in each row of the dataset? Sometimes variables are characteristics of people. Sometimes variables are characteristics of places. In your example, what is the exact type of case? [2]

8. Describe one of the variables. What are the different values of the variable? [3]

9. Describe the second variable. What are the different values of the variable?

10. Draw a blank crosstabulation. Show the values of the variables, but leave the rest blank. Don’t write in any frequencies or percents. It may be easiest to draw on paper, take a photo with your phone, and paste the photo into your assignment.

11. Realistic: Fill in the crosstabulation with column percents that you might expect to find in real life if you had the data. Paste the filled-in crosstab into your assignment.

12. Write complete sentences to interpret two of the percents from the crosstab you filled in with realistic results.

13. No Relationship: Fill in the crosstabulation with the column percents you would expect to find if there was no relationship between the two variables. Paste the filled-in crosstab into your assignment.

14. Write complete sentences to interpret two of the percents from the crosstab you filled in with ‘no relationship’ results.

15. Strong Relationship: Fill in the crosstabulation with the column percents you would expect to find if there was a very strong relationship between the two variables. Paste the filled-in crosstab into your assignment.

16. Write complete sentences to interpret two of the percents from the crosstab you filled in with ‘strong relationship’ results.

[2] At the beginning of the semester, we used an example where each case was a piece of paper. Each piece of paper had three characteristics: shape, color, and number of seconds it took to cut out. Those three characteristics were the three VARIABLES that described each CASE (piece of paper).

[3] For example, in the Intro to Data lecture at the beginning of the semester, the variable SHAPE had three values: circle, square and triangle. COLOR had two values: blue and grey.
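Questions 11 to 16 turn on how column percents behave. The sketch below is one way to compute them, using two made-up variables (where a household is located, and whether it owns a pet) and made-up counts. The function name column_percents and all of the numbers are assumptions for illustration, not course data.

```python
def column_percents(table):
    """Convert counts to column percents.

    `table` maps each column value to a dict of {row value: count}.
    Each column's percents add up to 100.
    """
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())
        result[column] = {
            row: 100 * count / column_total for row, count in cells.items()
        }
    return result


# Hypothetical counts: each case is one household.
no_relationship = {
    "Urban": {"Owns pet": 30, "No pet": 70},
    "Rural": {"Owns pet": 60, "No pet": 140},
}
strong_relationship = {
    "Urban": {"Owns pet": 10, "No pet": 90},
    "Rural": {"Owns pet": 180, "No pet": 20},
}

for label, table in [("No relationship", no_relationship),
                     ("Strong relationship", strong_relationship)]:
    print(label)
    for column, cells in column_percents(table).items():
        print(" ", column, {row: f"{pct:.0f}%" for row, pct in cells.items()})
```

With no relationship, every column shows the same percents (30% and 70% in both columns here, even though the counts differ). With a strong relationship, the percents differ sharply from one column to the next (10% versus 90% owning a pet).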

This semester, we’ve learned about several different data analysis techniques and statistics that can produce conclusions like the one described in the excerpt below.

“A study from 1976 to 1989 found that young Black males engaged in more violent crime than young White males. But when the researchers compared only employed young males of both races, the differences in violent behavior vanished. Or, as the Urban Institute stated in a more recent report on long-term unemployment, ‘Communities with a higher share of long-term unemployed workers also tend to have higher rates of crime and violence.’” [4]

17. How might the researchers have used data to draw their conclusions? What statistics could they have used? What are the variables? What might the researchers have done with those variables? Write at least three sentences addressing all these questions. Include a lot of detail. You may also draw tables or graphs.

[4] Kendi, I. X. (2019). How to be an antiracist. One World. Edited for clarity.
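One technique that fits question 17 is to compare two groups on an outcome, then repeat the comparison within each category of a third, control variable. The sketch below shows only the arithmetic, using invented counts for two generic groups. The labels, the counts, and the function name percent_with_outcome are all hypothetical and are not taken from the study quoted above.

```python
# (group, employment status) -> (number of people, number with the outcome)
# All counts are invented for illustration.
counts = {
    ("Group A", "employed"): (800, 40),
    ("Group A", "unemployed"): (200, 40),
    ("Group B", "employed"): (500, 25),
    ("Group B", "unemployed"): (500, 100),
}


def percent_with_outcome(group, status=None):
    """Percent of a group with the outcome, optionally within one status."""
    people = 0
    with_outcome = 0
    for (g, s), (n, k) in counts.items():
        if g == group and (status is None or s == status):
            people += n
            with_outcome += k
    return 100 * with_outcome / people


for group in ("Group A", "Group B"):
    overall = percent_with_outcome(group)
    employed_only = percent_with_outcome(group, "employed")
    print(f"{group}: overall {overall:.1f}%, employed only {employed_only:.1f}%")
```

In these invented counts the overall percents differ between the groups (8.0% versus 12.5%), but the percents among employed people are identical (5.0% in both). The overall gap comes entirely from the groups having different shares of unemployed people, which is the kind of pattern a control variable can reveal.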

Assume you have data on number of Covid cases and number of residents in each zip code in the United States for the week of November 23. Assume the data are complete, that is, Covid testing is widespread so we can assume all Covid cases are detected and included in the dataset.

Sample dataset (the table appears after question 20):

The following questions ask you to address how you would determine whether a specific zip code has a lot of Covid cases.

18. If the mean of the Covid Cases variable was much, much higher than the median of that variable, what would that indicate to you? Write complete sentences. You may include statistics terminology but you should also explain yourself in plain language.

19. Describe how you would use the data to compare zip codes which have very different numbers of residents. How can you know which zip codes have a lot of cases, relative to their populations? What would you do with the data, exactly? Use a lot of detail.

20. Describe one method for determining whether a specific zip code is an outlier. What would you do with the data, exactly? Write complete sentences.
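One common approach to question 20 is the interquartile range (IQR) rule: flag any value that falls more than 1.5 times the IQR below the first quartile or above the third quartile. The sketch below assumes that rule and applies it to ten made-up rates. The function name iqr_outliers, the 1.5 multiplier, and the sample values are assumptions for illustration. Quartiles can be calculated in several slightly different ways, so SPSS or other software may give somewhat different cutoffs.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return the values lying outside the IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - multiplier * iqr
    high_fence = q3 + multiplier * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical cases per 1,000 residents for ten zip codes.
rates_per_1000 = [10, 12, 11, 13, 12, 14, 11, 13, 12, 95]

print(iqr_outliers(rates_per_1000))  # prints [95]
```

A standard deviation criterion works in a similar way: choose a cutoff, such as a set number of standard deviations from the mean, and flag the zip codes beyond it.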

Zip Code    Covid Cases, Week of Nov 23    Residents
45053       276                            3690
90210       351                            19909
10010       127                            33730
95461       61                             2440
32533       1058                           27532

(Assume data are available for all zip codes in the U.S.)
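Using only the five sample rows above, the sketch below shows the two calculations that questions 18 and 19 point toward: comparing the mean with the median of Covid Cases, and converting counts into a rate with a base. The base of 1,000 residents is an assumption chosen for readability; any base works as long as it is the same for every zip code. The variable names are illustrative.

```python
from statistics import mean, median

# zip code -> (Covid cases week of Nov 23, residents), from the sample rows
sample = {
    "45053": (276, 3690),
    "90210": (351, 19909),
    "10010": (127, 33730),
    "95461": (61, 2440),
    "32533": (1058, 27532),
}

# Question 18: a mean well above the median suggests a right-skewed variable.
cases = [c for c, _residents in sample.values()]
mean_cases = mean(cases)
median_cases = median(cases)
print(f"mean = {mean_cases:.1f}, median = {median_cases}")

# Question 19: a rate with a base puts zip codes of different sizes
# on the same footing.
rates = {z: 1000 * c / residents for z, (c, residents) in sample.items()}
for zip_code, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{zip_code}: {rate:.1f} cases per 1,000 residents")

highest_count = max(sample, key=lambda z: sample[z][0])
highest_rate = max(rates, key=rates.get)
```

In these five rows, zip code 32533 has the most cases, but zip code 45053 has the most cases per 1,000 residents, because it has far fewer residents.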

This graph is from the New York Times, March 2018.

21. Write two or three complete sentences explaining why this graph uses Guns Per 100 People instead of just number of guns in each country, and Mass Shooters Per 100 Million People instead of number of mass shooters in each country. Describe or draw how the graph might look if it used Number of Guns and Number of Mass Shooters instead of using rates with a base.

Researchers randomly assigned children to two groups. The “Jumpstart Group” received language coaching from volunteers. The other group (“Business As Usual”) did not. The researchers collected data on several outcomes.

No additional information about this study is necessary to answer these questions.

22. What is shown in the table above? (choose one)

a. Column percents in a crosstabulation

b. Results from an experiment

c. Correlation statistics

23. Which ‘Estimated Difference’ achieved the greatest level of statistical significance? That is, for which Estimated Difference can we reject the null hypothesis with the most confidence?

a. English Language and Literacy Development

b. Social Emotional Development

c. Self-Regulation

d. Interpersonal Skills

24. Given the statements above, what would you expect from regression results? Choose one.

a. Variables measuring objective risk have higher standardized betas than variables measuring risk perception.

b. Variables measuring objective risk have lower standardized betas than variables measuring risk perception.

c. Variables measuring objective risk and variables measuring risk perception are not independent variables, so they don’t have standardized betas.

Codebook

Data are from a sample of 2006 House and Senate candidates.

PERCVOTE: Candidate’s percentage of the total vote. In other words, the share of the votes received by the candidate.

FRAISEDPC: Per capita money raised. In other words, how much money, in dollars, was raised by the candidate divided by the number of people in the candidate’s district.

DEMPUN: Percent of the population in the candidate’s district that is unemployed. In other words, of all people in the candidate’s district who are adults and who are in the labor force, DEMPUN is the share who do not currently have a job.

Answer the following questions based on the Correlations SPSS output and codebook on this page.

25. Fill in the blank: In this dataset, each case is a _______________.

26. Interpret the Pearson r correlation statistic for PERCVOTE and FRAISEDPC. Use complete sentences. Use the syllabus page 6.

27. Identify and interpret the p value for the Pearson r correlation statistic for PERCVOTE and FRAISEDPC. Use complete sentences. Use the words “statistically significant” correctly, showing that you know what statistical significance means for this example.

The assignment is over.

To hand in this assignment, upload it as a single PDF or Word file to Blackboard. Put all your answers into one Word or PDF document. Number your answers.

This is a list of semester topics for your reference. Students should be able to do the following after completing this class:

• Identify different types of variables

• Interpret numbers from crosstabs

• Calculate percentages, including from crosstabs

• Interpret rates with a base

• Calculate percent and percentage point differences

• Articulate the null hypothesis, interpret p values and discuss statistical significance for a Chi Square statistic, Pearson r correlation statistic, and b coefficient from a regression

• Articulate, generally, how the difference between two numbers can be statistically significant or statistically insignificant at different confidence levels; interpret t-test results from experiments

• Identify evidence of a skewed distribution and explain how skew works

• Interpret median and other percentiles including cumulative percent

• Interpret standard deviation

• Use standard deviation and interquartile range to establish criteria to identify outliers

• Read SPSS output for correlation and descriptive statistics

• Identify when research results are from correlation, chi square, regression, or a t-test

• From a multiple regression including at least one dummy variable:

   o Interpret b coefficients, p values, and standardized betas

   o Interpret adjusted R Square

   o Predict a value of y for given values of x

• Interpret the correlation between two variables from a scatterplot

• Interpret all the numbers in SPSS correlation matrix output with more than two variables

• Create and interpret a confidence interval around a mean and around a proportion

• Describe the relationship among sample size, standard error and margin of error
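The last topic, the relationship among sample size, standard error and margin of error, can be seen with a few lines of arithmetic. The sketch below assumes a sample proportion and the usual 95% confidence multiplier of 1.96. The function names and the example values (a proportion of 0.5 and sample sizes of 100, 400 and 1,600) are assumptions for illustration.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion p with sample size n."""
    return sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Margin of error: the confidence multiplier times the standard error."""
    return z * standard_error(p, n)


for n in (100, 400, 1600):
    se = standard_error(0.5, n)
    moe = margin_of_error(0.5, n)
    print(f"n = {n}: standard error = {se:.4f}, margin of error = {moe:.4f}")
```

Because the sample size sits under a square root, quadrupling the sample size cuts the standard error, and therefore the margin of error, in half. The confidence interval around the proportion is the proportion plus or minus the margin of error.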