Business statistics lab assignment
Project-3
QBA337- Applied Business Statistics
Due: 07-18-2021 at 11:59 PM
• You can discuss this lab-work with any other student in the class.
• You can get help or clarification from the instructor.
• You are permitted to use the course notes, books, the internet, the R scripts posted on Blackboard, and any other materials you like.
• For each question, you need to fully interpret your results. Your report should be typed. You should include your code as an appendix and cite your output properly.
• Upload your lab-work file to the lab assignment folder on Blackboard.
• Late submissions are not accepted.
• The data for Project-3 are available in the Data folder.
• Use of RStudio is required for all projects in this class.
1. When trying to decide what car to buy, real value is not necessarily determined by how much
you spend on the initial purchase. Instead, cars that are reliable and don’t cost much to own
often represent the best values. But no matter how reliable or inexpensive a car may be to
own, it must also perform well.
To measure value, Consumer Reports developed a statistic referred to as a value score. The
value score is based upon five-year owner costs, overall road-test scores, and predicted reliability
ratings. Five-year owner costs are based on the expenses incurred in the first five
years of ownership, including depreciation, fuel, maintenance and repairs, and so on. Using
a national average of 12,000 miles per year, an average cost per mile driven is used as the
measure of five-year owner costs. Road-test scores are the results of more than 50 tests and
evaluations and are based upon a 100-point scale, with higher scores indicating better performance,
comfort, convenience, and fuel economy. The highest road-test score obtained in
the tests conducted by Consumer Reports was a 99 for a Lexus LS 460L. Predicted-reliability
ratings (1 = Poor, 2 = Fair, 3 = Good, 4 = Very Good, and 5 = Excellent) are based on
data from Consumer Reports’ Annual Auto Survey.
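As a rough illustration of the cost-per-mile measure described above, the following R sketch divides a five-year owner cost by the miles driven over five years. The dollar figure is a hypothetical value chosen for the arithmetic and is not taken from familysedan.xls; only the 12,000 miles per year comes from the text.

```r
# Hypothetical total owner cost over five years, in dollars (assumed value)
five_year_cost <- 36000

# National average mileage stated in the text
miles_per_year <- 12000
years <- 5

# Cost per mile = total five-year cost / total miles driven in five years
cost_per_mile <- five_year_cost / (miles_per_year * years)
print(cost_per_mile)
```

Under these assumed numbers, the car would cost 0.60 dollars per mile, which is the scale on which the cost/mile variable is expressed.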
A car with a value score of 1.0 is considered to be “average-value.” A car with a value score of
2.0 is considered to be twice as good a value as a car with a value score of 1.0; a car with a value
score of 0.5 is considered half as good as average; and so on. The data (familysedan.xls)
for 20 family sedans, including the price ($) of each car tested, follow.
Managerial Report
(a) Develop numerical summaries of the data.
(b) Use regression analysis to develop an estimated regression equation that could be used
to predict the value score given the price of the car.
(c) Use regression analysis to develop an estimated regression equation that could be used
to predict the value score given the five-year owner costs (cost/mile).
(d) Use regression analysis to develop an estimated regression equation that could be used
to predict the value score given the road-test score.
(e) Use regression analysis to develop an estimated regression equation that could be used
to predict the value score given the predicted-reliability rating.
(f) What conclusions can you derive from your analysis?
2. Buckeye Creek Amusement Park is open from the beginning of May to the end of October.
Buckeye Creek relies heavily on the sale of season passes. The sale of season passes brings in
significant revenue prior to the park opening each season, and season pass holders contribute
a substantial portion of the food, beverage, and novelty sales in the park. Greg Ross, director
of marketing at Buckeye Creek, has been asked to develop a targeted marketing campaign to
increase season pass sales.
Greg has data (buckeyecreek.xls) for last season that show the number of season pass
holders for each zip code within 50 miles of Buckeye Creek. He has also obtained the total
population of each zip code from the U.S. Census Bureau website. Greg thinks it may be
possible to use regression analysis to predict the number of season pass holders in a zip code
given the total population of a zip code. If this is possible, he could then conduct a direct mail
campaign that would target zip codes that have fewer than the expected number of season
pass holders.
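The idea of targeting zip codes with fewer than the expected number of season pass holders can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame, its column names, and the zip labels A to E are hypothetical and are not the values in buckeyecreek.xls. A zip code falls below expectation when its residual (observed minus fitted) is negative.

```r
# Hypothetical data for illustration only; not the values in buckeyecreek.xls
zips <- data.frame(
  zip          = c("A", "B", "C", "D", "E"),
  population   = c(10000, 20000, 30000, 40000, 50000),
  pass_holders = c(110, 190, 320, 380, 500),
  stringsAsFactors = FALSE
)

# Simple linear regression of pass holders on population
fit <- lm(pass_holders ~ population, data = zips)

# Expected pass holders and residuals (observed minus expected)
zips$expected <- fitted(fit)
zips$residual <- resid(fit)

# Zip codes with fewer pass holders than the model expects, largest shortfall first
under <- zips[zips$residual < 0, ]
under <- under[order(under$residual), ]
print(under)
```

With the real data, the same steps would list candidate zip codes for the direct mail campaign, although whether the fitted model is reliable enough for that purpose depends on the significance test, fit, and residual analysis requested below.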
(a) Compute descriptive statistics and construct a scatter diagram for the data. Discuss
your findings.
(b) Using simple linear regression, develop an estimated regression equation that could be
used to predict the number of season pass holders in a zip code given the total population
of the zip code.
(c) Test for a significant relationship at the .05 level of significance.
(d) Did the estimated regression equation provide a good fit?
(e) Use residual analysis to determine whether the assumed regression model is appropriate.
(f) Discuss whether and how the estimated regression equation should be used to guide the marketing
campaign.
(g) What other data might be useful to predict the number of season pass holders in a zip
code?