RSH 901 Weeks 6, 7, and 8 ($75)

RSH901_Discussion_Assignment112.docx

WEEK 6:

Assignment:

1. Case: Data Mining using Chi Squared Part 1 in the textbook.  Study the case carefully and then answer one of the first four questions from the Questions for Thought section.

2. Case: Data Mining using Chi Squared Part 2 in the textbook.  Using the same case answer one question from questions 5-8 from the Questions for Thought section. 

3. Select one of the following:

a. Review your chosen thesis and check whether the topics from this module have been used in it. Discuss how they have been used and suggest any improvements.

Or

b. Damaged Machines - Problem 39 in Chapter 16.

An appliance manufacturer stockpiles washers and dryers in a large warehouse for shipment to retail stores. Some appliances get damaged in handling. The long-term goal has been to keep the level of damaged machines below 2%. In a recent test, an inspector randomly checked 60 washers and discovered that 5 of them had scratches or dents. Test the null hypothesis H0: p ≤ 0.02, in which p represents the probability of a damaged washer.

(a) Do these data supply enough evidence to reject H0? Use a binomial model from Chapter 11 to obtain the p-value.

(b) What assumption is necessary in order to use the binomial model for the count of the number of damaged washers?

(c) Test H0 by using a normal model for the sampling distribution of the sample proportion p̂. Does this test reject H0?

(d) Which test procedure should be used to test H0? Explain your choice.
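
As a rough illustration of parts (a) and (c), the sketch below computes the exact binomial tail probability and the normal-model z-statistic from the numbers stated in the problem (60 washers inspected, 5 damaged, hypothesized rate 0.02). It assumes the null hypothesis is H0: p ≤ 0.02, tested against the one-sided alternative p > 0.02. The function names are illustrative and do not come from the textbook.

```python
from math import comb, sqrt
from statistics import NormalDist

n, damaged, p0 = 60, 5, 0.02  # values stated in the problem

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def z_statistic(x, n, p0):
    """z-statistic for the sample proportion, using the SE implied by p0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

exact_p = binomial_upper_tail(damaged, n, p0)
z = z_statistic(damaged, n, p0)
normal_p = 1.0 - NormalDist().cdf(z)  # one-sided upper-tail p-value

print(f"exact binomial p-value: {exact_p:.4f}")
print(f"z = {z:.2f}, normal-model p-value: {normal_p:.5f}")
print(f"expected damaged under H0: n * p0 = {n * p0:.1f}")
```

The last line prints the expected count under H0, which is relevant to part (d): a common rule of thumb asks for roughly 10 or more expected successes and failures before relying on the normal model.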

4. New Contact Lens - Problem 24 in Chapter 17.

Doctors tested a new type of contact lens. Volunteers who normally wear contact lenses were given a standard type of lens for one eye and a lens made of the new material for the other. After a month of wear, the volunteers rated the level of perceived comfort for each eye.

(a) Should the new lens be used for the left or right eye for every patient?

(b) How should the data on comfort be analyzed?

5. Stock Movement - Problem 27 in Chapter 17.

A stock market analyst recorded the number of stocks that went up or went down each day for 5 consecutive days, producing a contingency table with 2 rows (up or down) and 5 columns (Monday through Friday). Are these data suitable for applying the chi-squared test of independence?
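
For reference, the mechanics of the chi-squared statistic on a 2-by-5 table can be sketched as follows. The counts are invented for illustration and are not from the textbook. The p-value helper uses a closed form that holds only for 4 degrees of freedom, which is what a 2-by-5 table gives.

```python
from math import exp

# Hypothetical counts (NOT from the textbook): rows = up, down; columns = Mon..Fri.
table = [
    [520, 480, 610, 450, 500],  # stocks that went up
    [480, 520, 390, 550, 500],  # stocks that went down
]

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def chi_squared_p_value_df4(stat):
    """Upper-tail probability for chi-squared with exactly 4 degrees of freedom.

    Uses the closed form exp(-x/2) * (1 + x/2), valid only for df = 4.
    """
    return exp(-stat / 2) * (1 + stat / 2)

stat = chi_squared_statistic(table)
df = (len(table) - 1) * (len(table[0]) - 1)  # (2 - 1) * (5 - 1) = 4
print(f"chi-squared = {stat:.2f}, df = {df}, p-value = {chi_squared_p_value_df4(stat):.4g}")
```

The arithmetic runs on any table of counts. Whether the result means anything depends on the conditions of the test, in particular whether the counted observations are independent of one another, which is what the problem asks you to judge.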

WEEK 7:

Assignment:

1. Case: Analyzing Experiments in the textbook. 

Study the case carefully and then answer question 1 and one other from question 2 through question 5 of the Questions for Thought section. 

2. Select one of the following:

a. Study your chosen thesis and find out whether any of the topics covered in this module are applicable. Have they been applied correctly? What improvements would you suggest, and why? Support your thinking.

Or

b. OECD Part 1 - Problem 45 in Chapter 19.

The Organization for Economic Cooperation and Development (OECD) tracks various summary statistics of the member economies. The countries lie in Europe, parts of Asia, and North America. Two variables of interest are GDP (gross domestic product per capita, a measure of the overall production in an economy per citizen) and trade balances (measured as a percentage of GDP). Exporting countries tend to have large positive trade balances. Importers have negative balances. These data are from the 2005 report of the OECD.

(a) Describe the association in the scatterplot of GDP on Trade Balance. Does the association in this plot move in the right direction? Does the association appear linear?

(b) Estimate the least squares linear equation for GDP on Trade Balance. Interpret the fitted intercept and slope. Be sure to include their units. Note if either estimate represents a large extrapolation and is consequently not reliable.

(c) Interpret r² and s_e associated with the fitted equation. Attach units to these summary statistics as appropriate.

(d) Plot the residuals from this regression. After considering this plot, does s_e provide an adequate summary of the residual variation?

(e) Which country has the largest values of both variables? Is it the country that you expected?

(f) Locate the United States in the scatterplot and find the residual for the United States. Interpret the value of the residual for the United States.
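
The quantities asked for in parts (b) through (d) and (f) can be computed along the following lines. This is a minimal sketch on made-up numbers, not the OECD data, and the helper name `least_squares` is an assumption introduced here.

```python
from math import sqrt

# Hypothetical data (NOT the OECD values): trade balance (% of GDP), GDP per capita ($).
trade_balance = [-6.0, -3.5, -1.0, 0.5, 2.0, 4.5, 8.0]
gdp = [21000, 24500, 25000, 27500, 31000, 30500, 39000]

def least_squares(x, y):
    """Return (intercept, slope, r_squared, s_e, residuals) for y regressed on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    r_squared = 1 - sse / syy
    s_e = sqrt(sse / (n - 2))  # residual SD uses n - 2 degrees of freedom
    return intercept, slope, r_squared, s_e, residuals

b0, b1, r2, se, resid = least_squares(trade_balance, gdp)
print(f"Estimated GDP = {b0:,.0f} + {b1:,.0f} * Trade Balance")
print(f"r^2 = {r2:.2f} (unitless), s_e = ${se:,.0f}")
```

With these units the intercept and s_e are in dollars per capita, the slope is in dollars per percentage point of trade balance, and r² has no units. The residual for a single country is its entry in `resid`.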

3. OECD Part 2 - Problem 45 in Chapter 19.

The Organization for Economic Cooperation and Development (OECD) tracks various summary statistics of its member economies. The countries lie in Europe, parts of Asia, and North America. Two variables of interest are GDP (gross domestic product per capita, a measure of the overall production in an economy per citizen) and trade balances (measured as a percentage of GDP). Exporting countries tend to have large positive trade balances. Importers have negative balances. These data are from the 2005 report of the OECD. Formulate the SRM with GDP as the response and Trade Balance as the explanatory variable.

(a) On average, what is the per capita GDP for countries with balanced imports and exports (i.e., with trade balance zero)? Give your answer as a range, suitable for presentation.

(b) The foreign minister of Krakozia has claimed that by increasing the trade surplus of her country by 2%, she expects to raise GDP per capita by $4,000. Is this claim plausible given this model?

(c) Suppose that OECD uses this model to predict the GDP for a country with balanced trade. Give the 95% prediction interval.

(d) Do your answers for parts (a) and (c) differ from each other? Should they?
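
Parts (a) and (c) rest on two different intervals from the simple regression model. As a reminder, and using standard notation that is assumed here rather than taken from the assignment (fitted value \hat{y}, residual standard deviation s_e, sample size n, and sample mean \bar{x} and standard deviation s_x of the explanatory variable), the usual textbook forms are:

```latex
% Fitted value at a chosen x (x = 0 for balanced trade)
\hat{y}(x) = b_0 + b_1 x

% 95% confidence interval for the MEAN response at x
\hat{y}(x) \pm t_{0.025,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{(n-1)\,s_x^2}}

% 95% prediction interval for a SINGLE new response at x
\hat{y}(x) \pm t_{0.025,\,n-2}\; s_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{(n-1)\,s_x^2}}
```

The only difference is the extra 1 under the square root, which accounts for the variation of an individual country around the mean.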

4. OECD Part 3 - Problem 45 in Chapter 19.

The Organization for Economic Cooperation and Development (OECD) tracks summary statistics of the member economies. The countries are located in Europe, parts of Asia, and North America. Two variables of interest are GDP (gross domestic product per capita, a measure of the overall production in an economy per citizen) and trade balance (measured as a percentage of GDP). Exporting countries have positive trade balances; importers have negative trade balances. These data are from the 2005 report of the OECD. Formulate the SRM with GDP as the response and Trade Balance as the explanatory variable.

(a) On average, what is the per capita GDP for countries with balanced imports and exports (i.e., with trade balance zero)? Give your answer as a range, suitable for presentation.

(b) The foreign minister of Krakozia has claimed that by increasing the trade surplus of her country by 2%, she expects to raise GDP per capita by $4,000. Is this claim plausible given this model?

(c) Suppose that OECD uses this model to predict the GDP for a country with balanced trade. Give the 95% prediction interval.

(d) Do your answers for parts (a) and (c) differ from each other? Should they?

5. OECD Part 4 - Problem 45 in Chapter 19.

An analyst at the United Nations is developing a model that describes GDP (gross domestic product per capita, a measure of the overall production in an economy per citizen) among developed countries. She is using national data for 29 countries from the 2005 report of the Organization for Economic Cooperation and Development (OECD). She started with the equation (estimated by least squares):

Estimated per capita GDP = $26,714 + $1,441 Trade Balance

The trade balance is measured as a percentage of GDP. Exporting countries tend to have large positive trade balances. Importers have negative balances. This equation explains only 37% of the variation in per capita GDP, so she added a second explanatory variable, the number of kilograms of municipal waste per person.

(a) Examine scatterplots of the response versus the two explanatory variables as well as the scatterplot between the explanatory variables. Do you notice any unusual features in the data? Do the relevant plots appear straight enough for multiple regression?

(b) Do you think, before fitting the multiple regression, that the partial slope for trade balance will be the same as in the equation shown? Explain.

(c) Fit the multiple regression that expands the one-predictor equation by adding the second explanatory variable to the model. Summarize the estimates obtained for the fitted model.

(d) Does the estimated model appear to meet the conditions for the use of the MRM?

(e) Draw the path diagram for this estimated model. Use it to explain why the estimated slope for the trade balance has become smaller than in the simple regression shown.

(f) Give a confidence interval, to presentation precision, for the slope of the municipal waste variable. Does this interval imply that countries can increase their GDP by encouraging residents to produce more municipal waste? 
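
The path diagram in part (e) encodes a standard identity linking the simple-regression slope to the partial slopes. The symbols below are assumptions introduced for illustration: x_1 is trade balance, x_2 is municipal waste, b_1 and b_2 are the partial slopes from the multiple regression, and d_{21} is the slope from regressing x_2 on x_1.

```latex
% Multiple regression (partial slopes b_1, b_2)
\hat{y} = b_0 + b_1 x_1 + b_2 x_2

% Regression of the second explanatory variable on the first
\hat{x}_2 = d_0 + d_{21}\, x_1

% Marginal slope from the one-predictor fit = direct effect + indirect effect
b_{\text{marginal}} = b_1 + d_{21}\, b_2
```

When d_{21} and b_2 have the same sign, the indirect term is positive, so the partial slope b_1 is smaller than the marginal slope.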

WEEK 8:

Benchmark Assignment

This is a benchmark assignment for DCS students. Store your submission with any grading feedback in your Professional's Portfolio and use the following tag: DCS-PG3

Assignment:

1. Case: Automated Modeling in the textbook.

Study the case carefully and then answer question 1 and one other from questions 2-8 from the Questions for Thought section.

2. Select one of the following:

a. Review your chosen thesis and assess whether the topics covered in this module have been used. Discuss how they were applied and how that application could be improved.

Or

b. R & D Expenses - Problem 43 in Chapter 19.

This data file contains a variety of accounting and financial values that describe 493 companies operating in several technology industries in 2004: software, systems design, and semiconductor manufacturing. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both columns are reported in millions of dollars.

(a) Scatterplot R&D Expense on Assets. Does a line seem to you to be a good summary of the relationship between these two variables? Describe the outlying companies.

(b) Estimate the least squares linear equation for R&D Expense on Assets. Interpret the fitted intercept and slope. Be sure to include their units. Note if either estimate represents a large extrapolation and is consequently not reliable.

(c) Interpret the summary values r² and s_e associated with the fitted equation. Attach units to these summary statistics as appropriate. Does the value of r² seem fair to you as a characterization of how well the equation summarizes the association?

(d) Inspect the histograms of the x- and y-variables in this regression. Do the shapes of these histograms anticipate some aspects of the scatterplot and the linear relationship between these variables?

(e) Plot the residuals from this regression. Does this plot reveal patterns in the residuals? Does s_e provide an adequate summary of the residual variation?

3. R & D Expenses - Problem 43 in Chapter 19.

This data file contains a variety of accounting and financial values that describe 324 companies operating in the information sector in 2010. The largest of these provide telephone services. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both columns are reported in millions of dollars. These data need to be expressed on a log scale; otherwise, outlying companies dominate the analysis. Use the natural logs of both variables rather than the original variables in the data table. (Note that the variables are recorded in millions, so 1,000 = 1 billion.)

(a) What difference in R&D spending (as a percentage) is associated with a 1% increase in the assets of a firm? Give your answer as a range, rounded to meaningful precision.

(b) Revise your model to use base 10 logs of assets and R&D expenses. Does using a different base for both log transformations affect your answer to part (a)?

(c) Find a 95% prediction interval for the R&D expenses of a firm with $1 billion in assets. Be sure to express your range on a dollar scale. Do you expect this interval to have 95% coverage? 
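
The sketch below illustrates the mechanics behind parts (a) through (c) on invented numbers (not the textbook data): fitting the log-log line, comparing natural and base-10 logs, and moving a log-scale prediction back to dollars. The helper `fit_line` is a hypothetical name introduced here.

```python
from math import exp, log, log10

# Hypothetical firms (NOT the textbook data): assets and R&D expenses, in $ millions.
assets = [50, 120, 400, 1000, 2500, 9000, 30000]
rd = [6, 11, 30, 65, 120, 380, 900]

def fit_line(x, y):
    """Least squares (intercept, slope) for y regressed on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

a_ln, b_ln = fit_line([log(v) for v in assets], [log(v) for v in rd])
a_10, b_10 = fit_line([log10(v) for v in assets], [log10(v) for v in rd])

# The slope (elasticity) is the same in either base; only the intercept rescales.
print(f"slope with natural logs: {b_ln:.4f}")
print(f"slope with base-10 logs: {b_10:.4f}")

# A firm with $1 billion in assets has x = log(1000) because units are $ millions.
log_prediction = a_ln + b_ln * log(1000)
print(f"predicted R&D: ${exp(log_prediction):,.1f} million")
```

In a log-log fit the slope is read as the percentage change in R&D associated with a 1% change in assets. To express a log-scale interval in dollars, exponentiate both of its endpoints in the same way as the point prediction.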

4. R & D Expenses - Problem 43 in Chapter 22.

This table contains accounting and financial data that describe 324 companies operating in the information sector in 2010. The largest of these provide telephone services. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both columns are reported in millions of dollars. Use the logs of both variables rather than the originals. (That is, set Y to the natural log of R&D expenses, and set X to the natural log of assets. Note that the variables are recorded in millions, so 1,000 = 1 billion.)

(a) What problem with the use of the SRM is evident in the scatterplot of y on x as well as in the plot of the residuals from the fitted equation on x?

(b) If the residuals are nearly normal, of the values that lie outside the 95% prediction intervals, what proportion should be above the fitted equation?

(c) Based on the property of residuals identified in part (b), can you anticipate that these residuals are not nearly normal—without needing the normal quantile plot?

5. R & D Expenses - Problem 43 in Chapter 23.

This data table contains accounting and financial data that describe 324 companies operating in the information sector. The variables include the expenses on research and development (R&D), total assets of the company, and the cost of goods sold (CGS). All columns are reported in millions of dollars; the variables are recorded in millions, so 1,000 = 1 billion. Use natural logs of all variables rather than the originals.

(a) Examine scatterplots of the log of spending on R&D versus the log of total assets and the log of the cost of goods sold. Then consider the scatterplot of the log of total assets versus the log of the cost of goods sold. Do you notice any unusual features in the data? Do the relevant plots appear straight enough for multiple regression?

(b) Fit the indicated multiple regression and show a summary of the estimated features of the model.

(c) Does the estimated model appear to meet the conditions for the use of the MRM?

(d) Does the fit of this model explain statistically significantly more variation in the log of spending on R&D than a model that uses the log of assets alone?

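The multiple regression in part (b) has all variables on a natural log scale. To interpret the equation, note that the sum of natural logs is the log of the product,

\[ \log_e x_1 + \log_e x_2 = \log_e (x_1 x_2) \]

and that

\[ b \log_e x = \log_e x^{b} \]

Hence, the equation

\[ \log_e y = b_0 + b_1 \log_e x_1 + b_2 \log_e x_2 \]

is equivalent to

\[ y = e^{b_0} \, x_1^{b_1} \, x_2^{b_2} \]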

The slopes in the log-log regression are exponents in an equation that describes y as the product of the explanatory variables raised to different powers. These powers are the partial elasticities of the response with respect to the predictors. (See Chapter 20 for a discussion of elasticities.)

(e) Interpret the slope for the log of the cost of goods sold in the equation estimated by the fitted model in part (b). Include the confidence interval in your calculation.

(f) The marginal elasticity of R&D spending with respect to CGS is about 0.60. Why is the partial elasticity in the multiple regression for CGS so different? Is it really that different?
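
The note before part (e), that slopes in a log-log regression act as exponents in a product, can be checked numerically. The coefficients and input values below are hypothetical, chosen only for illustration; they are not estimates from the data table.

```python
from math import exp, log

# Hypothetical coefficients for illustration only (NOT estimates from the data).
b0, b1, b2 = -1.2, 0.55, 0.30

def predict_log_scale(x1, x2):
    """log(y) = b0 + b1*log(x1) + b2*log(x2)."""
    return b0 + b1 * log(x1) + b2 * log(x2)

def predict_product_form(x1, x2):
    """y = exp(b0) * x1**b1 * x2**b2."""
    return exp(b0) * x1 ** b1 * x2 ** b2

x1, x2 = 1000.0, 400.0  # e.g. assets and CGS in $ millions (made-up values)
y_from_logs = exp(predict_log_scale(x1, x2))
y_from_product = predict_product_form(x1, x2)
print(y_from_logs, y_from_product)  # the two forms agree up to rounding error

# Raising x2 by 1% while holding x1 fixed multiplies y by 1.01**b2,
# which is close to a b2 percent increase.
ratio = predict_product_form(x1, 1.01 * x2) / y_from_product
print(f"percent change in y: {100 * (ratio - 1):.3f}% (b2 = {b2})")
```

This is why a partial slope in the log-log model is read as a partial elasticity: the approximate percentage change in the response for a 1% change in one predictor, with the other held fixed.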