Final Assignment

Instructions and rules:

Dear students, to avoid any confusion after submission, let us set some rules:

· Your solutions must be numbered properly. If you do not provide the number of the question you are referring to, you will not receive any credit, even if you have the correct answer!

· For Part 2, which is more empirical, I still want your written responses on paper. Please do not submit only the Excel file. Again, your written responses must be numbered to match the questions they answer.

· Late submissions are not accepted, even if they are only 1 second late. Accepting them would not be fair to other students. Please follow this rule and do not leave your submission to the last minute.

Part 1:

1. Hypothesis Testing: The average years of education of individuals in the United States is 15.5 years, with a standard deviation of 3.5 years. Bob has 12.5 years of education. Find what percentage of individuals have fewer years of education than Bob (assume that years of education is a normally distributed variable).

2. Hypothesis Testing: It is believed that the average temperature on a normal winter day is A researcher claims that we are experiencing warmer weather during the winter due to global warming. She gathers a data set of 160 normal winter days and gets a sample average of The standard deviation she observes is Is her claim statistically significant, that is, does her sample reject or fail to reject the null hypothesis? (Provide both approaches: z-score and p-value.)
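The two approaches named in question 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `one_sided_z_test`, the 5% significance level, and every number in the example call are hypothetical and are not the values from the question. It also assumes an upper-tailed test (the claim is "warmer") and uses the sample standard deviation in place of the population one, which is a common large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(sample_mean, mu0, sd, n, alpha=0.05):
    """Upper-tailed z-test of H0: mu = mu0 against H1: mu > mu0."""
    standard_error = sd / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # p-value approach: probability of a z at least this large under H0
    p_value = 1.0 - NormalDist().cdf(z)
    # z-score approach: compare z with the critical value for alpha
    z_critical = NormalDist().inv_cdf(1.0 - alpha)
    reject_h0 = p_value < alpha
    return z, p_value, z_critical, reject_h0

# Hypothetical numbers for illustration only (NOT the values from question 2)
z, p_value, z_critical, reject_h0 = one_sided_z_test(
    sample_mean=41.0, mu0=40.0, sd=8.0, n=160
)
print(f"z = {z:.3f}, critical z = {z_critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Both approaches should always lead to the same decision: z exceeds the critical value exactly when the p-value is below the significance level.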

3. Expected Value and Variance: The random variable D takes on the values 1, 2, 3, 4, 5, 6, 10 with probabilities 0.301, 0.202, 0.044, 0.102, 0.055, 0.011, 0.285. What are the expected value, variance, and standard deviation of D? (If you use Excel, you should also provide detailed calculations for all 3 on paper, following their formulas.)
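For reference, the standard formulas for a discrete random variable are written out below. The notation is an assumption made here for illustration: d_i denotes the i-th value of D and p_i its probability.

```latex
\begin{align*}
E[D] &= \sum_{i} d_i \, p_i \\
\operatorname{Var}(D) &= \sum_{i} \bigl(d_i - E[D]\bigr)^2 \, p_i \;=\; E[D^2] - \bigl(E[D]\bigr)^2 \\
\operatorname{SD}(D) &= \sqrt{\operatorname{Var}(D)}
\end{align*}
```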

Part 2:

4. Open the Excel file and carefully read the Data-description.

5. How many variables are in the data? How many observations do we have in the data? How many dummy/binary variables do we have in the data (if any)?

6. What is the description of the variable “income”? What is its sample average? What is its unit? Provide a histogram of “income”. The bins are your choice, but pick them wisely! Report which bin contains the greatest number of observations.

7. What are the sample averages of “age”, “fatalityrate”, and “sb_useage”? What do these values mean?

8. What are the sample variance and the standard deviation of the variables “age”, “income”, “sb_useage”, and “fatalityrate”? (Note: you should report your numbers with the appropriate unit of measurement. For example, if the variable is money, is it in dollars or cents? If it is age, is it in years or months?)

9. What is the confidence interval of the sample mean of “income”? (Hint: first calculate the mean, which you did in question 6, then the margin of error, and then add and subtract the margin of error from the sample mean to find the confidence interval.) How is this interval related to hypothesis testing?
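The three steps in the hint can be sketched in Python as follows. This is a minimal illustration, not the required method: the function name `mean_confidence_interval`, the 95% confidence level (the question does not state one), the use of a normal critical value (reasonable for large samples), and the sample values are all assumptions made here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) bounds: sample mean -/+ margin of error."""
    n = len(values)
    x_bar = mean(values)          # step 1: sample mean
    s = stdev(values)             # sample standard deviation (n - 1 divisor)
    # Normal critical value, e.g. about 1.96 for 95% (large-sample assumption)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * s / sqrt(n)      # step 2: margin of error
    return x_bar - margin, x_bar + margin  # step 3: subtract and add

# Hypothetical values for illustration only (NOT the assignment data);
# a list this short is used just to keep the arithmetic easy to follow.
sample = [10, 12, 14, 16, 18]
lower, upper = mean_confidence_interval(sample)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```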

10. Provide a scatter plot of “fatalityrate” over “sb_useage”. (fatalityrate on the vertical and sb_useage on the horizontal axis.)

11. Do you see any correlation in the scatter plot created in the previous question? What is the value of the correlation between “fatalityrate” and “sb_useage”? Is it positive or negative? What does it mean?

12. Run a regression of “fatalityrate” on “sb_useage” (“fatalityrate” is the dependent/y variable).

13. What is the predicted intercept?

14. What is the predicted slope?

15. What is the effect of “sb_useage” on “fatalityrate”? (You should read and report the slope. The slope is the effect of the x variable on y: a 1-unit increase in x changes the predicted y by B1 units.)

16. Write down the predicted equation. (You can use Excel for this and add the trend line.)

17. What is the R-squared of the regression? Is our regression a good fit?

18. Is B1 statistically significant? Why or why not?

19. What is the confidence interval for B1? What does this interval mean?
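The intercept, slope, and R-squared asked for in questions 13 through 17 can be computed with the usual least-squares formulas, sketched below in Python. The function name `fit_line` and the data points are hypothetical and chosen only for illustration; they are not the assignment data, and Excel's regression output should report the same quantities.

```python
from statistics import mean

def fit_line(x, y):
    """Simple least-squares fit of y = B0 + B1 * x; returns (B0, B1, R^2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                       # B1: change in y per 1 unit of x
    intercept = y_bar - slope * x_bar       # B0: predicted y when x = 0
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared

# Hypothetical data for illustration only (NOT the assignment data)
x = [0.2, 0.4, 0.6, 0.8]
y = [3.0, 2.6, 2.1, 1.9]
intercept, slope, r_squared = fit_line(x, y)
print(f"predicted y = {intercept:.2f} + ({slope:.2f}) * x, R^2 = {r_squared:.3f}")
```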

Do not forget to attach the plot (from question 10) and the fitted line.