Need help understanding an RMarkdown file
prob5.pdf
PSET 5
FIRST LAST
2024-11-30
In this problem set we’re going to practice inference: using estimates from samples to draw conclusions about population parameters. The first half of the problem set covers confidence intervals and the second half covers hypothesis testing. Make sure to consult the book if you are struggling with these topics! See especially chapter 7.
Before starting, rename this RMarkdown file to include your name and rewrite the header above to include your name. This is worth 10 points.
Confidence Intervals
For the first half of the problem set we’re going to return to the British Election Study we worked with earlier in the term. As a reminder, this was a survey of British voters in the lead-up to the 2016 Brexit vote. The survey data are contained in the file “BES.csv”. There are four variables in the survey:
• vote: respondent’s vote intention in the EU referendum, with values:
  – “leave” (leave the EU)
  – “stay” (stay in the EU)
  – “don’t know”
  – “won’t vote”
• leave: binary version of the vote variable: 1 if the respondent intended to vote leave, 0 if they intended to vote stay, and NA if they said “don’t know” or “won’t vote”
• education: respondent’s highest education qualification:
  – 1 - no qualifications
  – 2 - secondary education
  – 3 - general certificate of education at advanced level (A level)
  – 4 - undergraduate degree
  – 5 - post graduate degree
• age: respondent’s age in years
Question 1 (10 pts)
In the code block below, read in the BES dataset. Remove missing observations using na.omit().
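A minimal sketch of the read-then-clean pattern, in case it helps. It does not use the real BES.csv: the toy data frame, its values, and the object names (toy, bes_toy, bes_clean) are made up for illustration, and the file is written to a temporary path so the example runs anywhere.

```r
# Toy stand-in for a survey file; these values are invented for illustration.
toy <- data.frame(
  vote      = c("leave", "stay", "don't know", "stay"),
  leave     = c(1, 0, NA, 0),
  education = c(2, 4, 3, NA),
  age       = c(61, 34, 45, 29)
)

# Write it to a temporary CSV so read.csv() has something to read.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file, then drop every row that has at least one NA in any column.
bes_toy   <- read.csv(path)
bes_clean <- na.omit(bes_toy)

nrow(bes_toy)    # 4 rows before cleaning
nrow(bes_clean)  # 2 rows remain (rows 3 and 4 each contain an NA)
```

Note that na.omit() drops a row if any column is missing, not only the column you plan to analyze.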
Question 2 (10 pts)
We want to estimate the average age of the British voting population and to include a measure of our uncertainty about this estimate. Calculate the sample mean of the age variable and construct a 95% confidence interval around it. You should get a confidence interval of (50.54, 50.96). Your job is to write the code that reproduces this confidence interval. Hint: consult the textbook, section 7.2.1.
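One common way to build such an interval is the large-sample normal approximation: sample mean plus or minus 1.96 standard errors. The sketch below assumes that approach (the textbook section may instead use a t critical value) and runs on simulated numbers rather than the survey, so it will not reproduce the interval quoted above.

```r
set.seed(1)                           # make the simulated sample reproducible
x <- rnorm(500, mean = 50, sd = 15)   # simulated "ages", not survey data

n      <- length(x)
x_bar  <- mean(x)                     # point estimate: the sample mean
se     <- sd(x) / sqrt(n)             # standard error of the sample mean
z_crit <- qnorm(0.975)                # about 1.96 for a 95% interval

# Lower and upper bounds of the 95% confidence interval
ci <- c(lower = x_bar - z_crit * se,
        upper = x_bar + z_crit * se)
ci
```

The same steps should carry over to a real column by replacing x with that column, for example the age variable of the cleaned data frame.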
Question 3 (10 pts)
Were leave voters older, on average, than stay voters? We found some evidence of this when we examined this survey earlier in the term, but we didn’t quantify our uncertainty about the population estimate. Calculate the sample difference in means of age for leave and stay voters. Use the leave variable and subtract the mean age of the stay voters from the mean age of the leave voters. Construct a 95% confidence interval around this difference in means. You should get a confidence interval of (7.78, 8.58). Your job is to write the code that reproduces this confidence interval. Hint: consult the textbook, section 7.2.2.
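A sketch of a difference-in-means interval, assuming the usual large-sample formula in which the standard error is the square root of the sum of each group's variance divided by its size. The data are simulated (group sizes, means, and spreads are invented), so the numbers will not match the interval quoted above.

```r
set.seed(2)
# Simulated data: 300 "leave" rows and 400 "stay" rows with invented ages.
toy <- data.frame(
  leave = rep(c(1, 0), times = c(300, 400)),
  age   = c(rnorm(300, mean = 55, sd = 15),
            rnorm(400, mean = 47, sd = 16))
)

age_leave <- toy$age[toy$leave == 1]
age_stay  <- toy$age[toy$leave == 0]

# Point estimate: mean of leave voters minus mean of stay voters
diff_hat <- mean(age_leave) - mean(age_stay)

# Standard error of a difference between two independent sample means
se_diff <- sqrt(var(age_leave) / length(age_leave) +
                var(age_stay)  / length(age_stay))

# 95% confidence interval: estimate plus or minus about 1.96 standard errors
ci_diff <- diff_hat + c(-1, 1) * qnorm(0.975) * se_diff
ci_diff
```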
Question 4 (10 pts, 5 extra points)
Interpret your difference-in-means confidence interval. In your interpretation include the following points:
• Provide a substantive interpretation of the sample difference in means estimate. You should include the estimate, indicate what it means in terms of the underlying variable, and include appropriate units of measurement.
• Provide a substantive estimate for the average difference in means in the wider population using the confidence interval (see especially the language at the bottom of pg. 208 of the textbook).
• For 5 extra points discuss the meaning of the confidence interval in terms of what we learned about what confidence intervals represent. To get the full five extra points you need to say what the “95%” represents.
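One way to build intuition for the “95%” is a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval contains the true mean. The sketch below does this with invented settings (true mean 50, samples of 200, 2000 repetitions); it illustrates the repeated-sampling idea and is not a model answer.

```r
set.seed(3)
true_mean <- 50     # known only because we are simulating the population
n_sims    <- 2000   # number of repeated samples

covered <- replicate(n_sims, {
  x     <- rnorm(200, mean = true_mean, sd = 15)   # one simulated sample
  se    <- sd(x) / sqrt(length(x))
  lower <- mean(x) - qnorm(0.975) * se
  upper <- mean(x) + qnorm(0.975) * se
  lower <= true_mean && true_mean <= upper         # did this interval cover?
})

# Share of intervals that contain the true mean; it should land near 0.95
mean(covered)
```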
Hypothesis Testing
Question 5 (10 pts)
For the second part of this problem set we’re going to return to Devah Pager’s study “The Mark of a Criminal Record.” As a reminder, Pager was interested in the effect of disclosing a criminal record on job applications. To answer her research question, Pager ran an audit experiment in which she had real individuals apply to jobs with identical resumes in the Milwaukee area. She randomized whether the person indicated they had a criminal record by including a parole officer as a reference for a random subset of the resumes.
In the code block below, read in the Pager dataset “applications.csv”. The “criminal” variable indicates whether the resume included a parole officer as a reference. The “race” variable indicates whether the research assistant posing as a job applicant was White or Black. The “call” variable indicates whether the applicant received a call back from a potential employer (1 if yes, 0 if no).
As we discussed in the midterm, Pager was also interested in how this effect might vary for Black vs. White job applicants. To simplify this section of the problem set, we’re just going to look at the effect of a criminal record on White applicants. Subset the dataset to just White applicants. You should end up with a dataframe of 300 observations. In case you are having trouble subsetting the data, we’ve created a second dataset, “applications_white_subset.csv”. If you can’t get your code to work you can read in that dataset directly (but you won’t get full points for this question).
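A minimal sketch of subsetting by a category. The toy data frame is invented, and the lowercase coding "white"/"black" is an assumption; the real file may code race differently, so it is worth checking with unique() before filtering.

```r
# Invented example rows; not taken from applications.csv.
toy <- data.frame(
  race     = c("white", "black", "white", "black", "white"),
  criminal = c(1, 0, 0, 1, 1),
  call     = c(0, 0, 1, 0, 0)
)

unique(toy$race)   # check how the categories are actually spelled

# Keep only the rows where race equals "white"
white_only <- subset(toy, race == "white")

# Equivalent base-R indexing form:
white_only2 <- toy[toy$race == "white", ]

nrow(white_only)   # 3 of the 5 toy rows are kept
```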
Question 6 (10 pts)
Let’s start by using a linear model to estimate the average causal effect of having a criminal record on the probability of receiving a call back for a job interview for White applicants. In this question, do the following steps:
1. In the code block below, fit a linear model to the data in such a way that the estimated slope coefficient is equivalent to the difference-in-means estimator you are interested in. Store the fitted model in an object called fit. You should get an estimate for the slope coefficient β̂criminal of -0.173.
2. In text below the code block, answer the following question: what is the estimated average treatment effect of having a criminal record on job call backs? Provide a full substantive answer. Make sure to include the assumption, why the assumption is reasonable, the treatment, the outcome, and the direction, size, and unit of measurement of the average treatment effect. Consult the lecture slides on experiments if you need a refresher on how to write these interpretations.
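The link between a regression slope and a difference in means can be checked on simulated data: when the only predictor is a 0/1 treatment indicator, the slope from lm() equals the treated-group mean minus the control-group mean. The sketch below uses invented call-back probabilities and a hypothetical object name fit_toy, so its estimate will not match the value quoted in step 1.

```r
set.seed(4)
# Simulated experiment: 150 control rows (criminal = 0), 150 treated rows (criminal = 1)
toy <- data.frame(criminal = rep(c(0, 1), each = 150))

# Binary outcome with invented call-back probabilities for each group
toy$call <- rbinom(300, size = 1,
                   prob = ifelse(toy$criminal == 1, 0.15, 0.30))

# Regress the outcome on the binary treatment indicator
fit_toy <- lm(call ~ criminal, data = toy)

# Difference in means computed by hand: treated minus control
diff_means <- mean(toy$call[toy$criminal == 1]) -
              mean(toy$call[toy$criminal == 0])

coef(fit_toy)["criminal"]   # slope on the treatment indicator
diff_means                  # same number, up to floating-point error
```

The intercept in this setup is the control-group mean, which is why intercept plus slope gives the treated-group mean.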
Question 7 (10 pts)
The answer we calculated above is from a sample. We want to know whether we have evidence that there is an effect of having a criminal record for the population of job applicants. To answer this question we’re going to conduct a hypothesis test. Do we have evidence the effect of having a criminal record is different from zero? Let’s start by writing out the null and alternative hypotheses. Do so in a bulleted list below (5 points each).
Question 8 (10 pts)
In the code block below, use the summary() command on your fitted model (which you saved above as “fit”). In text below the code block describe (1) what the test statistic for the treatment effect is and (2) the associated p-value. In case you are having trouble figuring out this code, we have included an image of what this output should look like in the appendix. You can answer the two written questions using this image. See especially section 7.3.2 of the textbook for how to run this code and interpret its results.
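For orientation, the coefficient table printed by summary() can also be pulled out as a matrix, which makes it easier to see where the test statistic and p-value come from. This sketch refits a model on simulated data (invented probabilities, hypothetical name fit_toy), so its numbers differ from the appendix output.

```r
set.seed(5)
# Simulated experiment with a binary treatment and a binary outcome
toy <- data.frame(criminal = rep(c(0, 1), each = 150))
toy$call <- rbinom(300, size = 1,
                   prob = ifelse(toy$criminal == 1, 0.15, 0.30))
fit_toy <- lm(call ~ criminal, data = toy)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit_toy)$coefficients
coef_table

t_stat <- coef_table["criminal", "t value"]    # test statistic for the slope
p_val  <- coef_table["criminal", "Pr(>|t|)"]   # two-sided p-value

# The test statistic is the estimate divided by its standard error
t_manual <- coef_table["criminal", "Estimate"] /
            coef_table["criminal", "Std. Error"]

# The p-value is the two-sided tail area of a t distribution
# with the model's residual degrees of freedom
p_manual <- 2 * pt(abs(t_stat), df = df.residual(fit_toy), lower.tail = FALSE)
```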
Question 9 (10 pts)
Is the effect of a criminal record statistically significant at the 5% level? Do we have enough evidence to reject the null hypothesis? Why or why not? For full points you must connect your answer to either the test statistic or the p-value of β̂criminal AND what those values would have to be greater than (for test statistic) or less than (for p-value) to indicate a statistically significant effect at the 5% level.
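The two decision rules mentioned here (test statistic versus critical value, p-value versus 0.05) always agree for a two-sided t-test. The sketch below shows this for a few hypothetical test statistics; the values in t_values are invented, and the 298 degrees of freedom assume 300 observations and two estimated coefficients.

```r
alpha    <- 0.05
df_resid <- 298                      # assumed: 300 observations minus 2 coefficients

# Critical value for a two-sided test at the 5% level (close to 1.96)
t_crit <- qt(1 - alpha / 2, df = df_resid)

# Hypothetical test statistics, chosen only to illustrate the rule
t_values <- c(0.5, 1.5, 2.5, -3.0)

# Two-sided p-value for each test statistic
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)

reject_by_t <- abs(t_values) > t_crit   # rule 1: |t| exceeds the critical value
reject_by_p <- p_values < alpha         # rule 2: p-value falls below alpha

data.frame(t = t_values, p = round(p_values, 4), reject_by_t, reject_by_p)
```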
Question 10 (5 points extra credit)
If we (1) tripled the sample size of our experiment and (2) constructed a confidence interval around β̂criminal, would the confidence interval be wider or narrower than the same confidence interval constructed with the current sample size?
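To reason about this, it can help to look at how the standard error of a difference in proportions depends on group size. The sketch below is a back-of-the-envelope calculation under stated assumptions: the helper se_diff() is hypothetical, the group proportions are invented, and the 150-per-group split assumes the 300 observations are divided evenly.

```r
# Standard error of a difference between two sample proportions,
# assuming equal group sizes (hypothetical helper for illustration).
se_diff <- function(n_per_group, p1 = 0.15, p0 = 0.30) {
  sqrt(p1 * (1 - p1) / n_per_group + p0 * (1 - p0) / n_per_group)
}

se_now     <- se_diff(150)   # assumed current design: 150 per group
se_tripled <- se_diff(450)   # three times as many observations per group

# The interval's half-width is the critical value times the standard error,
# so the ratio of standard errors is also the ratio of interval widths.
se_tripled / se_now          # equals 1 / sqrt(3), roughly 0.577
```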
Appendix
Output of summary() for fitted linear model (from question 8)