Statistics
Discussion 1
You have learned about inferences on population variances, comparison of multiple proportions, and tests of independence and goodness of fit. Answer the following questions in detail, applying what you have learned from this week's readings and lectures. Include hypothetical examples wherever applicable.
1. Describe how chi squared and F random variables are generated. What are the properties of the distribution of these random variables?
2. Discuss the objective in testing hypotheses on variance of one population, and variances of two populations, and the underlying assumptions. Explain formulation of the hypothesis, the test statistic, the rationale for rejecting the null, the criterion for choosing the rejection region, possible test outcomes, and the criterion for evaluating the p value. Provide hypothetical examples of formulating hypotheses on variance of a population, and uniformity of variance across two populations.
3. Explain the chi square tests of uniformity of a proportion across several populations, goodness of fit, and independence. Explain formulation of the hypothesis, the test statistic, the rationale for rejecting the null, the criterion for choosing the rejection region, possible test outcomes, and the criterion for evaluating the p value. Provide a hypothetical example of formulating the hypotheses in each case, including uniformity of a proportion across several populations.
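As a starting point for question 1, the sketch below simulates how the two random variables are generated: a chi squared variable as a sum of squared independent standard normal draws, and an F variable as the ratio of two independent chi squared variables, each divided by its degrees of freedom. The degrees of freedom, the seed, and the number of draws are hypothetical choices made only for illustration.

```python
import random

def chi_square_draw(df, rng):
    """One chi-square draw: the sum of df squared independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(df1, df2, rng):
    """One F draw: ratio of two independent chi-squares, each divided by its df."""
    return (chi_square_draw(df1, rng) / df1) / (chi_square_draw(df2, rng) / df2)

rng = random.Random(12345)   # arbitrary fixed seed so the run is reproducible
n_draws = 20000              # hypothetical simulation size
df1, df2 = 5, 20             # hypothetical degrees of freedom

chi_sample = [chi_square_draw(df1, rng) for _ in range(n_draws)]
f_sample = [f_draw(df1, df2, rng) for _ in range(n_draws)]

chi_mean = sum(chi_sample) / n_draws
f_mean = sum(f_sample) / n_draws

# Both variables are nonnegative and right-skewed; their theoretical means
# are df1 for the chi-square and df2 / (df2 - 2) for the F (when df2 > 2).
print(f"chi-square({df1}): sample mean {chi_mean:.3f}, theoretical mean {df1}")
print(f"F({df1}, {df2}): sample mean {f_mean:.3f}, theoretical mean {df2 / (df2 - 2):.3f}")
```

The sample means should land close to the theoretical means, which is one way to check the stated properties of each distribution.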
Activity 1 – CLO 1, CLO 2, CLO 3, CLO 4
1. Ball bearing manufacturing is a highly precise business in which minimal part variability is critical. Large variances in the size of the ball bearings cause bearing failure and rapid wear-out. Production standards call for a maximum variance of .0001 square inches. Gerry Liddy has gathered a sample of 15 bearings that shows a sample standard deviation of .014 inches. Use ɑ = .10.
a. Please determine whether the sample indicates that the maximum acceptable variance is being exceeded.
b. What is the p value?
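One possible way to set up this test is sketched below, assuming the hypotheses H0: σ² ≤ .0001 versus Ha: σ² > .0001 and a normally distributed population. The helper `chi2_sf_even_df` is a hypothetical name introduced here; it uses a closed form for the upper-tail area that holds only when the degrees of freedom are even, which is the case for 14. The critical value 21.064 is the standard table value for an upper-tail area of .10 with 14 degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail area P(X > x) for a chi-square variable with EVEN df."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)          # i = 0 term of the finite sum
    total = term
    for i in range(1, df // 2):
        term *= half / i            # now exp(-x/2) * (x/2)^i / i!
        total += term
    return total

n = 15                  # sample size
sample_sd = 0.014       # sample standard deviation (inches)
sigma0_sq = 0.0001      # maximum variance allowed by the standard
alpha = 0.10

df = n - 1
chi2_stat = df * sample_sd ** 2 / sigma0_sq   # (n - 1) s^2 / sigma0^2
chi2_critical = 21.064                        # table value, alpha = .10, df = 14
p_value = chi2_sf_even_df(chi2_stat, df)
reject_null = chi2_stat > chi2_critical

print(f"chi-square = {chi2_stat:.2f}, critical value = {chi2_critical}")
print(f"p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```

The rejection-region rule and the p-value rule should agree: rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value is below ɑ.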
2. The grade point averages of 352 students who completed a college course in financial accounting have a standard deviation of .940. The grade point averages of 73 students who dropped out of the same course have a standard deviation of .797.
a. Do the data indicate a difference between the variances of grade point averages for students who completed a financial accounting course and students who dropped out?
b. Use ɑ = .05 level of significance.
c. What is the p value?
Note: With 351 and 72 degrees of freedom, the critical value F(ɑ/2) that leaves an area of 0.025 in the upper tail is 1.466.
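A minimal sketch of the two-variance test follows, assuming H0: σ1² = σ2² versus Ha: σ1² ≠ σ2² and the usual convention of placing the larger sample variance in the numerator so that only the upper critical value is needed. The critical value 1.466 is taken from the note above. The p-value is not computed here because the Python standard library has no F distribution; an F upper-tail function such as `scipy.stats.f.sf` could be used for that step.

```python
n1, s1 = 352, 0.940   # students who completed the course
n2, s2 = 73, 0.797    # students who dropped out
alpha = 0.05

# Larger sample variance goes in the numerator for the two-tailed test.
f_stat = s1 ** 2 / s2 ** 2
df_num, df_den = n1 - 1, n2 - 1

f_critical = 1.466    # F(alpha/2) with 351 and 72 df, from the note above
reject_null = f_stat > f_critical

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
print(f"critical value = {f_critical}, reject H0 at alpha = {alpha}: {reject_null}")
# For the two-tailed p-value, double the upper-tail area beyond f_stat.
```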
Activity 2 – CLO 1, CLO 2, CLO 3, CLO 4
This activity requires a detailed analysis and answers to the three questions below:
1. As listed by The Art Newspaper’s Visitor Figures Survey (https://www.theartnewspaper.com/visitor-figures-2017), the five most-visited art museums in the world are the Louvre Museum, the National Museum in China, the Metropolitan Museum of Art, the Vatican Museums, and the British Museum. Table 1 shows how a sample of visitors rated each museum.
Table 1
Frequencies of Ratings of the Museums by the Visitors Included in the Sample
| Rating | Louvre | National Museum in China | Metropolitan Museum of Art | Vatican Museums | British Museum |
| --- | --- | --- | --- | --- | --- |
| Spectacular | 113 | 88 | 94 | 98 | 96 |
| Not spectacular | 37 | 44 | 46 | 72 | 64 |
Use the sample data provided in Table 1 to answer the following questions:
a. Calculate the point estimate of the population proportion of visitors who rated each of these museums as spectacular.
b. Conduct a hypothesis test to determine if the population proportion of visitors who rated the museum as spectacular is equal for these five museums using ɑ = .05 level of significance. What is the p-value?
c. If the null hypothesis is rejected, perform post-hoc test(s) using the same ɑ and state your conclusions.
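The three parts above can be sketched as follows, assuming H0: all five population proportions are equal. The post-hoc step uses the Marascuilo pairwise comparison procedure, which is one common choice and an assumption here, since the assignment does not name a method. The value 9.488 is the standard chi square table value for an upper-tail area of .05 with 4 degrees of freedom, and the p-value line uses a closed form that holds only for 4 degrees of freedom.

```python
import math

museums = ["Louvre", "National Museum in China", "Metropolitan Museum of Art",
           "Vatican Museums", "British Museum"]
spectacular = [113, 88, 94, 98, 96]
not_spectacular = [37, 44, 46, 72, 64]

# (a) Point estimate of the proportion rating each museum as spectacular.
totals = [a + b for a, b in zip(spectacular, not_spectacular)]
p_hat = [a / n for a, n in zip(spectacular, totals)]

# (b) Chi-square statistic: expected counts use the pooled proportion under H0.
pooled = sum(spectacular) / sum(totals)
chi2_stat = 0.0
for yes, no, n in zip(spectacular, not_spectacular, totals):
    e_yes, e_no = n * pooled, n * (1 - pooled)
    chi2_stat += (yes - e_yes) ** 2 / e_yes + (no - e_no) ** 2 / e_no

df = len(museums) - 1
half = chi2_stat / 2
p_value = math.exp(-half) * (1 + half)   # upper-tail area, valid only for df = 4
chi2_critical = 9.488                    # table value, alpha = .05, df = 4

# (c) Marascuilo pairwise comparisons, run only if H0 is rejected.
significant_pairs = []
if chi2_stat > chi2_critical:
    for i in range(len(museums)):
        for j in range(i + 1, len(museums)):
            diff = abs(p_hat[i] - p_hat[j])
            cv = math.sqrt(chi2_critical) * math.sqrt(
                p_hat[i] * (1 - p_hat[i]) / totals[i]
                + p_hat[j] * (1 - p_hat[j]) / totals[j])
            if diff > cv:
                significant_pairs.append((museums[i], museums[j]))

for name, p in zip(museums, p_hat):
    print(f"{name}: {p:.4f}")
print(f"chi-square = {chi2_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
print("significantly different pairs:", significant_pairs)
```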
2. A Financial Times/Harris Poll surveyed people in six countries to assess attitudes toward a variety of alternate forms of energy. The data in Table 2 are a portion of the poll’s findings concerning whether people favor or oppose the building of new nuclear power plants.
Table 2
The Frequencies of the Respondents’ Opinions about a New Nuclear Power Plant by Country of Origin
| Opinion | Great Britain | France | Italy | Spain | Germany | United States |
| --- | --- | --- | --- | --- | --- | --- |
| Strongly favor | 141 | 161 | 298 | 133 | 128 | 204 |
| Favor more than oppose | 348 | 366 | 309 | 222 | 272 | 326 |
| Oppose more than favor | 381 | 334 | 219 | 311 | 322 | 316 |
| Strongly oppose | 217 | 215 | 219 | 443 | 398 | 174 |
Use the sample data provided in Table 2 to answer the following questions:
a. How large was the sample in this poll?
b. Conduct a hypothesis test to determine whether people’s attitude toward building new nuclear power plants is independent of country using ɑ = .05 level of significance. What is the p-value and what is your conclusion?
c. Using the percentage of respondents who “strongly favor” and “favor more than oppose,” which country has the most favorable attitude toward building new nuclear power plants? Which country has the least favorable attitude?
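A possible outline of the independence test is sketched below, assuming H0: attitude toward new nuclear power plants is independent of country. Expected frequencies are computed as (row total × column total) / sample size. The helper `chi2_sf` is a hypothetical name introduced here; it evaluates the chi square upper-tail area for integer degrees of freedom using only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:                       # even df: finite Poisson-type sum
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))    # odd df: erfc term plus a finite sum
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total

countries = ["Great Britain", "France", "Italy", "Spain", "Germany", "United States"]
observed = {
    "Strongly favor":         [141, 161, 298, 133, 128, 204],
    "Favor more than oppose": [348, 366, 309, 222, 272, 326],
    "Oppose more than favor": [381, 334, 219, 311, 322, 316],
    "Strongly oppose":        [217, 215, 219, 443, 398, 174],
}

row_totals = {label: sum(row) for label, row in observed.items()}
col_totals = [sum(row[j] for row in observed.values()) for j in range(len(countries))]
grand_total = sum(col_totals)             # (a) sample size

chi2_stat = 0.0                           # (b) test of independence
for label, row in observed.items():
    for j, obs in enumerate(row):
        expected = row_totals[label] * col_totals[j] / grand_total
        chi2_stat += (obs - expected) ** 2 / expected
df = (len(observed) - 1) * (len(countries) - 1)
p_value = chi2_sf(chi2_stat, df)

favorable = {                             # (c) share of the two "favor" rows
    c: (observed["Strongly favor"][j] + observed["Favor more than oppose"][j]) / col_totals[j]
    for j, c in enumerate(countries)
}
most_favorable = max(favorable, key=favorable.get)
least_favorable = min(favorable, key=favorable.get)

print(f"n = {grand_total}, chi-square = {chi2_stat:.2f}, df = {df}, p-value = {p_value:.3g}")
print(f"most favorable: {most_favorable}, least favorable: {least_favorable}")
```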
3. Based on 2017 sales, the six top-selling compact cars are the Honda Civic, Toyota Corolla, Nissan Sentra, Hyundai Elantra, Chevrolet Cruze, and Ford Focus (New York Daily News, http://www.nydailynews.com/autos/street-smarts/best-selling-small-cars-2016-list-article-1.2945432). The 2017 market shares are: Honda Civic 20%, Toyota Corolla 17%, Nissan Sentra 12%, Hyundai Elantra 10%, Chevrolet Cruze 10%, and Ford Focus 8%, with other small car models making up the remaining 23%.
A sample of 400 compact car sales in Chicago showed the number of vehicles sold in Table 3.
Table 3
The Frequencies of Sales of Car Brands
| Car model | Frequency |
| --- | --- |
| Honda Civic | 98 |
| Toyota Corolla | 72 |
| Nissan Sentra | 54 |
| Hyundai Elantra | 44 |
| Chevrolet Cruze | 42 |
| Ford Focus | 25 |
| Others | 65 |
Use a goodness of fit test to determine whether the sample data indicate that the market shares for compact cars in Chicago differ from the market shares suggested by nationwide 2017 sales, using the ɑ = .05 level of significance.
a. What is the p-value and what is your conclusion?
b. If the Chicago market appears to differ significantly from the nationwide sales, which categories contribute most to this difference?
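The goodness of fit calculation can be sketched as below, assuming H0: the Chicago market shares equal the nationwide shares. Expected frequencies are the sample size times each nationwide share, and each category's contribution to the statistic is kept so the largest ones can be identified for part (b). The p-value loop uses a closed form that holds only for even degrees of freedom, which is the case here with 6.

```python
import math

shares = {   # nationwide 2017 market shares stated in the problem
    "Honda Civic": 0.20, "Toyota Corolla": 0.17, "Nissan Sentra": 0.12,
    "Hyundai Elantra": 0.10, "Chevrolet Cruze": 0.10, "Ford Focus": 0.08,
    "Others": 0.23,
}
observed = {  # Chicago sample from Table 3
    "Honda Civic": 98, "Toyota Corolla": 72, "Nissan Sentra": 54,
    "Hyundai Elantra": 44, "Chevrolet Cruze": 42, "Ford Focus": 25,
    "Others": 65,
}

n = sum(observed.values())
contributions = {}
for model, obs in observed.items():
    expected = n * shares[model]
    contributions[model] = (obs - expected) ** 2 / expected

chi2_stat = sum(contributions.values())
df = len(observed) - 1

# Upper-tail area for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
half = chi2_stat / 2
term = math.exp(-half)
p_value = term
for i in range(1, df // 2):
    term *= half / i
    p_value += term

# Categories ranked by their contribution to the statistic, largest first.
ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)

print(f"chi-square = {chi2_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
for model, value in ranked:
    print(f"{model}: {value:.3f}")
```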
Discussion 2 – CLO 5, CLO 12
You have learned about multiple linear regression. Answer the following questions in detail, applying what you have learned from the readings and lectures. Include hypothetical examples wherever applicable.
1. Explain the multiple linear regression model, the independent variables and the dependent variable, assumptions of the model, objectives, and the approach taken to construct the model, and using the model for prediction.
2. Explain the analysis of variance (ANOVA) test on significance of the regression and how the result of this test is interpreted. Discuss the hypotheses on the coefficients of the regression and how the results of testing these hypotheses are interpreted regarding the significance of these coefficients; include both unidirectional and bidirectional situations. Also explain the coefficient of determination, the adjusted coefficient of determination, and their significance.
3. Describe how multicollinearity can have adverse effects in constructing the regression model, how it is identified, and how normality of residuals is verified.
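One common way to identify multicollinearity is the variance inflation factor (VIF). The sketch below assumes a model with exactly two independent variables, in which case the VIF of either variable is 1 / (1 − r²), where r is their sample correlation. The data values are hypothetical, and the threshold mentioned in the comment is a rule of thumb rather than a fixed standard.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors that move almost in lockstep.
x1 = [2.0, 3.1, 4.2, 5.0, 6.1, 7.3, 8.0, 9.2]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]

r = pearson_r(x1, x2)
vif = 1.0 / (1.0 - r ** 2)   # same value for both predictors when there are only two

# A VIF above roughly 5 to 10 is often read as a sign of problematic multicollinearity.
print(f"r = {r:.4f}, VIF = {vif:.1f}")
```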
Activity 3 – CLO 5, CLO 11, CLO 12
The Tire Rack, America’s leading online distributor of tires and wheels, conducts extensive testing to provide customers with products that are right for their vehicle, driving style, and driving conditions. In addition, the Tire Rack maintains an independent consumer survey to help drivers help each other by sharing their long-term tire experiences. The following data show survey ratings (1 to 10 scale with 10 being the highest rating) for 18 maximum performance summer tires. The variable Steering rates the tire’s steering responsiveness, Tread Wear rates quickness of wear based on the driver’s expectations, and Buy Again rates the driver’s overall tire satisfaction and desire to purchase the same tire again. Values shown below are averages as obtained from surveys.
Table 4
Sample of Ratings of Eighteen Maximum Performance Summer Tires.
| Tire | Steering | Treadwear | Buy Again |
| --- | --- | --- | --- |
| Goodyear Assurance TripleTred | 8.9 | 8.5 | 8.1 |
| Michelin HydroEdge | 8.9 | 9.0 | 8.3 |
| Michelin Harmony | 8.3 | 8.8 | 8.2 |
| Dunlop SP 60 | 8.2 | 8.5 | 7.9 |
| Goodyear Assurance ComforTred | 7.9 | 7.7 | 7.1 |
| Yokohama Y372 | 8.4 | 8.2 | 8.9 |
| Yokohama Aegis LS4 | 7.9 | 7.0 | 7.1 |
| Kumho Power Star 758 | 7.9 | 7.9 | 8.3 |
| Goodyear Assurance | 7.6 | 5.8 | 4.5 |
| Hankook H406 | 7.8 | 6.8 | 6.2 |
| Michelin Energy LX4 | 7.4 | 5.7 | 4.8 |
| Michelin MX4 | 7.0 | 6.5 | 5.3 |
| Michelin Symmetry | 6.9 | 5.7 | 4.2 |
| Kumho 722 | 7.2 | 6.6 | 5.0 |
| Dunlop SP 40 A/S | 6.2 | 4.2 | 3.4 |
| Bridgestone Insignia SE20 | 5.7 | 5.5 | 3.6 |
| Goodyear Integrity | 5.7 | 5.4 | 2.9 |
| Dunlop SP20 FE | 5.7 | 5.0 | 3.3 |
Use the data provided in Table 4 to answer the following questions:
1. Provide descriptive statistics of the data.
2. Develop two simple regression models that can be used to predict the Buy Again rating given the Steering Rating in one and the Tread Wear rating in the other. State the hypotheses on the coefficients, justify formulation of these hypotheses, and interpret the results. Use ɑ = .05. Include all phases of assessment of the model.
3. Develop a multiple regression model that can be used to predict the Buy Again rating given the Steering rating and the Tread Wear rating. State the hypotheses on the coefficients, justify formulation of these hypotheses, and interpret the results. Use ɑ = .05. Include all phases of assessment of the model and do not forget to check multicollinearity.
4. Does combining the two independent variables improve the coefficient of determination? Please explain.
5. Choose a combination of steering and treadwear not given in the above table and find the expected Buy Again for this combination.
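The fitting step behind questions 2 through 5 can be sketched with ordinary least squares, as below. This is only the estimation part: it computes the coefficient of determination for each simple model, the coefficients and (adjusted) coefficient of determination for the two-variable model, and one prediction. The t and F tests on the coefficients, residual checks, and the multicollinearity check are not included. The combination Steering = 8.0, Treadwear = 7.5 is a hypothetical choice that does not appear in Table 4.

```python
steering = [8.9, 8.9, 8.3, 8.2, 7.9, 8.4, 7.9, 7.9, 7.6,
            7.8, 7.4, 7.0, 6.9, 7.2, 6.2, 5.7, 5.7, 5.7]
treadwear = [8.5, 9.0, 8.8, 8.5, 7.7, 8.2, 7.0, 7.9, 5.8,
             6.8, 5.7, 6.5, 5.7, 6.6, 4.2, 5.5, 5.4, 5.0]
buy_again = [8.1, 8.3, 8.2, 7.9, 7.1, 8.9, 7.1, 8.3, 4.5,
             6.2, 4.8, 5.3, 4.2, 5.0, 3.4, 3.6, 2.9, 3.3]
n = len(buy_again)

def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

s11, s22 = cross(steering, steering), cross(treadwear, treadwear)
s12 = cross(steering, treadwear)
s1y, s2y = cross(steering, buy_again), cross(treadwear, buy_again)
syy = cross(buy_again, buy_again)

# Simple regressions: R^2 equals the squared correlation with Buy Again.
r2_steering = s1y ** 2 / (s11 * syy)
r2_treadwear = s2y ** 2 / (s22 * syy)

# Multiple regression: solve the 2x2 normal equations on centered data.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det      # Steering coefficient
b2 = (s2y * s11 - s1y * s12) / det      # Treadwear coefficient
b0 = mean(buy_again) - b1 * mean(steering) - b2 * mean(treadwear)

r2_multiple = (b1 * s1y + b2 * s2y) / syy
adj_r2 = 1 - (1 - r2_multiple) * (n - 1) / (n - 2 - 1)   # two predictors

new_steering, new_treadwear = 8.0, 7.5  # hypothetical combination not in Table 4
predicted = b0 + b1 * new_steering + b2 * new_treadwear

print(f"R^2 (Steering only) = {r2_steering:.4f}")
print(f"R^2 (Treadwear only) = {r2_treadwear:.4f}")
print(f"Buy Again = {b0:.4f} + {b1:.4f} Steering + {b2:.4f} Treadwear")
print(f"R^2 = {r2_multiple:.4f}, adjusted R^2 = {adj_r2:.4f}")
print(f"predicted Buy Again at ({new_steering}, {new_treadwear}) = {predicted:.2f}")
```

Because adding a variable can never lower the ordinary coefficient of determination, the adjusted version is the more informative figure when comparing the one-variable and two-variable models.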
Professional Assignment 1 – CLO 3, CLO 5, CLO 6, CLO 11, CLO 12
Consumer Research, Inc., is an independent agency that conducts research on consumer attitudes and behaviors for a variety of firms. In one study, a client asked for an investigation of consumer characteristics that can be used to predict the amount charged by credit card users. Data were collected on annual income, household size, and annual credit card charges for a sample of 50 consumers and are given in Table 5.
a. Provide descriptive statistics of the data, develop a multiple regression model that can be used to predict the Amount Charged given the Annual Income and Household Size, state the hypotheses on the coefficients, justify formulation of these hypotheses, and interpret the results. Use ɑ = .05. Include all phases of assessment of the model and do not forget to check multicollinearity.
b. To assess robustness of the software, repeat part (a) but this time use full representation of Annual Income. What conclusions can you make?
c. Choose a combination of Annual Income and Household Size not given in the table and find the expected Amount Charged for this combination.
Use ɑ = .05 in testing all hypotheses
Table 5
Data on Annual Income, Household Size, and Amount Charged
|
|