STAT Question

1

Biweekly averages of the water salinity and river discharge in Pamlico Sound, North Carolina, were recorded between 1972 and 1977. The data contain four variables: sal, the average salinity of the water over two weeks; lag, the average salinity of the water lagged two weeks; trend, a factor indicating in which of the 6 biweekly periods between March and May the observations were taken; and dis, the amount of river discharge during the two weeks for which sal is the average salinity. The data are in the data set “salinity” in the R package “boot”.

1. Calculate a bootstrap estimate of the median salinity (sal). Include your work and clean code; no screenshots!

2. Use the bootstrap to estimate the bias of that estimate.

3. For a confidence interval for the median salinity, is it necessary to use a bias-corrected and accelerated (BCa) bootstrap confidence interval instead of a bootstrap percentile confidence interval? Explain your answer.

4. Fit a simple linear least-squares regression of salinity (sal) on discharge (dis). Give the equation of the fitted line.

5. Is the fit a good one? Explain. (Hint: use the residuals to examine the assumptions of linear regression.) You don’t have to include any plots; just give the code and your explanations.

6. Obtain a LOESS fit of salinity on discharge. Describe the LOESS curve (its direction and shape).

7. Is the LOESS fit better than the linear fit? Explain.

8. Calculate Pearson’s correlation coefficient for the relationship between salinity and discharge. Interpret the coefficient.

9. Calculate Spearman’s correlation coefficient for the relationship between salinity and discharge. Interpret the coefficient.

10. Which correlation coefficient is more appropriate and why?
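The resampling behind salinity questions 1 and 2 can be sketched as follows. This is a minimal Python sketch using only the standard library, not the R “boot” package the assignment asks for; the helper name bootstrap_replicates and the toy values are hypothetical. It takes the bootstrap estimate to be the mean of the bootstrap medians (one common convention) and the bias estimate to be that mean minus the observed median, which is the quantity R's boot() reports as bias.

```python
import random
import statistics


def bootstrap_replicates(data, stat, n_boot=1000, seed=None):
    """Return n_boot values of stat, each computed on a resample of data
    drawn with replacement and of the same size as data."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


# Made-up values for illustration only; these are NOT the salinity data.
toy_sal = [4.0, 5.5, 6.1, 7.2, 7.9, 8.4, 9.0, 10.3, 12.5, 14.8]

observed = statistics.median(toy_sal)
reps = bootstrap_replicates(toy_sal, statistics.median, n_boot=2000, seed=1)
boot_estimate = statistics.fmean(reps)  # mean of the bootstrap medians
bias = boot_estimate - observed         # bootstrap estimate of bias

print(f"observed median     {observed:.3f}")
print(f"bootstrap estimate  {boot_estimate:.3f}")
print(f"estimated bias      {bias:+.3f}")
```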
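For salinity questions 4 and 5, the least-squares line sal = b0 + b1 * dis has slope b1 = Sxy / Sxx and intercept b0 = mean(sal) - b1 * mean(dis); in R, lm(sal ~ dis, data = salinity) fits the same line. The Python sketch below computes the line and its residuals, then two crude numeric residual checks standing in for the usual residual plots. The helper names, both diagnostics, and the toy data are illustrative assumptions, not standard R output.

```python
import math
import statistics


def least_squares(x, y):
    """Ordinary least-squares line y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def pearson(u, v):
    """Plain Pearson correlation, used here only as a helper."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residual_checks(x, y):
    """Fit the least-squares line; return (b0, b1, residuals, curvature, spread_ratio).

    curvature:    correlation of the residuals with (x - mean(x))**2; values far
                  from 0 hint that a straight line misses curvature.
    spread_ratio: SD of the residuals for the larger half of x divided by the SD
                  for the smaller half; values far from 1 hint at non-constant variance.
    Assumes the residuals are not all zero.
    """
    b0, b1 = least_squares(x, y)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    x_bar = statistics.fmean(x)
    curvature = pearson(resid, [(a - x_bar) ** 2 for a in x])
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    low = [resid[i] for i in order[:half]]
    high = [resid[i] for i in order[half:]]
    spread_ratio = statistics.stdev(high) / statistics.stdev(low)
    return b0, b1, resid, curvature, spread_ratio


# Made-up discharge/salinity pairs; NOT the salinity data.
dis = [22.0, 24.5, 23.1, 26.8, 28.0, 25.2, 30.4, 27.7]
sal = [11.8, 10.9, 11.5, 8.7, 7.9, 10.1, 6.2, 8.4]
b0, b1, resid, curv, spread = residual_checks(dis, sal)
print(f"fit: sal = {b0:.3f} + ({b1:.3f}) * dis")
print(f"curvature {curv:+.2f}, spread ratio {spread:.2f}")
```

With real data, a residual-versus-fitted plot and a normal Q-Q plot (R's plot() on the fitted lm object) remain the better tools; these numbers only flag obvious problems.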
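For salinity questions 6 and 7, the idea behind LOESS is to fit a weighted straight line in a moving neighborhood of each x value, with nearby points weighted most heavily. The Python sketch below is a simplified LOWESS-style smoother (local linear fits, tricube weights, no robustness iterations); the function name and span = 0.75 are assumptions for illustration. R's loess() defaults to span = 0.75 with local quadratic fits (degree = 2), so its curve will differ somewhat from this sketch.

```python
import math


def local_linear_smooth(x, y, span=0.75):
    """LOWESS-style local linear smoother (degree 1, tricube weights, no
    robustness iterations). Returns the fitted value at each x[i]."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))  # points in each neighborhood
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1]  # distance to k-th nearest point
        h = h if h > 0 else 1e-12
        # Tricube weights; points at distance >= h get weight 0.
        w = [(1 - (abs(xi - x0) / h) ** 3) ** 3 if abs(xi - x0) < h else 0.0 for xi in x]
        sw = sum(w)  # > 0 because x0 itself has weight 1
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(yw + slope * (x0 - xw))  # weighted line evaluated at x0
    return fitted


# Made-up points; NOT the salinity data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 5.2, 5.8, 6.1, 5.9, 5.2, 4.0]
for xv, fv in zip(xs, local_linear_smooth(xs, ys)):
    print(xv, round(fv, 2))
```

Reading the fitted values in order of increasing discharge shows the direction and shape of the curve; setting them beside the straight-line fitted values is one way to approach question 7.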
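For salinity questions 8 to 10, Spearman's coefficient is Pearson's coefficient computed on ranks, which is why it captures any monotone relationship rather than only a linear one. A minimal Python sketch follows; the function names are hypothetical, and in R both coefficients come from cor(), with method = "pearson" (the default) or method = "spearman".

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions i..j -> mean rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # increases with x, but not along a straight line
print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", spearman_rho(x, y))  # ranks agree perfectly, so this is 1.0
```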

A random sample of 300 customer weekday noon-hour waiting times (in minutes) at a downtown Columbus Chipotle restaurant was recorded. The data are in the R data file waittime.Rdata on the INSR website. We have downloaded data from this website in class, and instructions can also be found in INSR and in previous homework assignments.

1. Describe the distribution of waiting times.

2. Calculate and report the sample mean wait time. Is the sample mean a good measure of the center of these data? Explain.

3. Create a single bootstrap sample from the waiting times. Copy and paste its summary (from the summary function) below, along with your code (no screenshots). Give the mean and median of this sample.

4. Use bootstrapping with 1000 replications to approximate the sampling distribution of the sample mean. Describe the shape of this distribution. Is it approximately normal? Are you surprised? Why or why not? There is no need to show the histogram; just describe it. Make sure to include your code.

5. For the observed sample, the traditional estimate of the standard error of the mean (SEM), s/√n, is 0.05438. Give the bootstrap estimate of the SEM. Explain the difference between the two, if there is one.

6. Find and interpret a 95% bootstrap percentile confidence interval for the true mean weekday noon-hour waiting time at the downtown Columbus Chipotle restaurant.

7. Find and interpret a 95% bootstrap percentile confidence interval for the true median weekday noon-hour waiting time at the downtown Columbus Chipotle restaurant.

8. Estimate the bias of the sample mean and of the sample median. Comment on the estimates. How does the bias affect the validity of the confidence intervals?

9. Use the “boot” function (with “boot.ci”) to obtain a BCa interval for the mean. How does this interval compare with the bootstrap percentile interval? Which one should you use, and why?

10. Use the “boot” function (with “boot.ci”) to obtain a BCa interval for the median of the waiting times. How does this interval compare with the percentile interval? Which one should you use, and why?
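For waiting-time question 3, a single bootstrap sample is n draws with replacement from the observed n values. The Python sketch below draws one and prints the six fields R's summary() reports for a numeric vector; the quartiles use linear interpolation (method="inclusive"), intended to mirror R's default quantile type 7. The function name summary_stats and the wait times in the code are made up.

```python
import random
import statistics


def summary_stats(sample):
    """Min, quartiles, median, and mean: the six fields R's summary() prints."""
    q1, med, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {
        "Min.": min(sample),
        "1st Qu.": q1,
        "Median": med,
        "Mean": statistics.fmean(sample),
        "3rd Qu.": q3,
        "Max.": max(sample),
    }


# Made-up wait times in minutes; NOT the waittime data.
waits = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.7]

rng = random.Random(42)
boot_sample = rng.choices(waits, k=len(waits))  # one resample, with replacement

for name, value in summary_stats(boot_sample).items():
    print(f"{name:8s} {value:.3f}")
```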
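For waiting-time questions 4 and 5, the Python sketch below builds 1000 bootstrap means, measures their skewness, and compares the bootstrap SEM (the standard deviation of the bootstrap means) with the traditional s/√n. The data and function names are hypothetical. With unlimited replications the bootstrap SEM tends to s·√((n - 1)/n)/√n, slightly below s/√n (a factor of about 0.998 when n = 300), so any remaining difference mostly reflects Monte Carlo variation from using only 1000 replications.

```python
import math
import random
import statistics


def bootstrap_means(data, n_boot=1000, seed=None):
    """Means of n_boot resamples of data (same size, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)]


def skewness(values):
    """Moment skewness m3 / m2**1.5: about 0 for a symmetric shape, > 0 for a right tail."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


def traditional_sem(data):
    """Classical SE of the mean: s / sqrt(n), where s uses the n - 1 divisor."""
    return statistics.stdev(data) / math.sqrt(len(data))


# Made-up, right-skewed wait times (minutes); NOT the waittime data.
waits = [1.0] * 30 + [2.0] * 15 + [5.0] * 4 + [20.0]

means = bootstrap_means(waits, n_boot=1000, seed=7)
print("skewness of data:           ", round(skewness(waits), 2))
print("skewness of bootstrap means:", round(skewness(means), 2))
print("traditional SEM:", round(traditional_sem(waits), 4))
print("bootstrap SEM:  ", round(statistics.stdev(means), 4))
```

The much smaller skewness of the bootstrap means is the central limit theorem at work: averaging pulls the sampling distribution of the mean toward normal even when the data themselves are skewed.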
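For waiting-time questions 6 to 8, a percentile interval reads the 2.5th and 97.5th percentiles straight off the sorted bootstrap replicates, and the bias estimate is the mean of the replicates minus the observed statistic. The Python sketch below (hypothetical function names, made-up data) does both for the mean and the median. R's boot.ci(type = "perc") may interpolate between order statistics differently, so its endpoints can differ slightly.

```python
import math
import random
import statistics


def type7_quantile(sorted_vals, p):
    """Linear-interpolation quantile of a sorted list (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def percentile_ci_and_bias(data, stat, n_boot=1000, level=0.95, seed=None):
    """Return ((lower, upper), bias) for stat, from n_boot bootstrap resamples.

    The interval takes the (1 - level)/2 and (1 + level)/2 quantiles of the
    bootstrap replicates; bias is mean(replicates) - stat(data).
    """
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    ci = (type7_quantile(reps, alpha), type7_quantile(reps, 1 - alpha))
    return ci, statistics.fmean(reps) - stat(data)


# Made-up wait times (minutes); NOT the waittime data.
waits = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.7, 0.9, 1.2, 1.5, 2.2]

for name, stat in [("mean", statistics.fmean), ("median", statistics.median)]:
    (lo, hi), bias = percentile_ci_and_bias(waits, stat, n_boot=2000, seed=11)
    print(f"{name:6s} 95% percentile CI ({lo:.3f}, {hi:.3f})  bias {bias:+.4f}")
```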
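For waiting-time questions 9 and 10, the R route is boot() followed by boot.ci(..., type = "bca"). The Python sketch below spells out what a BCa interval computes, under stated assumptions: a bias correction z0 from the share of bootstrap replicates below the observed statistic (ties are split half-and-half here; implementations differ on ties), an acceleration a from jackknife leave-one-out values, and percentile levels adjusted to Phi(z0 + (z0 + z_alpha) / (1 - a(z0 + z_alpha))) before the quantiles are read off. The function name bca_ci and the data are hypothetical, and results will not match boot.ci() digit for digit.

```python
import math
import random
import statistics
from statistics import NormalDist


def type7_quantile(sorted_vals, p):
    """Linear-interpolation quantile of a sorted list (R's default, type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bca_ci(data, stat, n_boot=2000, level=0.95, seed=None):
    """BCa bootstrap interval for stat(data); data must be a list."""
    nd = NormalDist()  # standard normal: cdf is Phi, inv_cdf is its inverse
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0: share of replicates below theta_hat (ties count half),
    # clamped away from 0 and 1 so inv_cdf stays finite.
    below = sum(r < theta_hat for r in reps) + 0.5 * sum(r == theta_hat for r in reps)
    prop = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop)

    # Acceleration a from the jackknife (leave-one-out) values of stat.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(p):
        """Percentile level after the BCa adjustment of the normal quantile z_p."""
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = (1 - level) / 2
    return type7_quantile(reps, adjusted(alpha)), type7_quantile(reps, adjusted(1 - alpha))


# Made-up wait times (minutes); NOT the waittime data.
waits = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.7, 0.9, 1.2, 1.5, 2.2]
print("BCa for mean:  ", bca_ci(waits, statistics.fmean, seed=21))
print("BCa for median:", bca_ci(waits, statistics.median, seed=21))
```

When z0 and a are both close to 0, the adjusted levels collapse back to alpha and 1 - alpha and the BCa interval matches the percentile interval; large values of either signal that the percentile interval may be off.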