statstics regression ang excel and two-way tables

dabawang.
Rec5A-RegressionwithExcel1.docx

STAT 1430 Recitation 5A Regression with Excel

Nash Information Services provides information and analytical services to the movie industry, including analyses to predict movie revenue.

To study movie revenue, they chose a simple random sample of 40 movies released over a 5-year period (2003-2008), and collected data on each movie. We can use this data to predict movie revenues for future movies.

This data set is listed in the Module 3 section of Carmen as “BoxOffice.xls”. (Note: money is in millions of dollars.)

The variable names (left to right) are:

· Title: the name of the movie

· USRelease: the date the movie was first shown

· Genre: what type of movie is it?

· Rating: what age group can see the movie

· Rating1: whether or not there are age restrictions for the movie (1=yes,0=no)

· Budget: cost to make the movie

· Opening: total box office revenue during opening weekend

· Theaters: number of theaters showing the movie during opening weekend.

· IntRevenue: entire amount of money made outside the U.S.

· USRevenue: entire amount of money the movie made in the United States during the entire time it was shown

· WorldRevenue: entire amount of money made both in and out of U.S.

· Profit: whether movie made a profit (1=yes, 0=no)

Let’s suppose we ultimately want to predict box Total U.S. Box Office Revenue for a movie, using data from these previous movies. We start by looking for variables with which it has a strong relationship. (Remember, we can only look for linear relationships in this class.)

1. Why is Total U.S. Box Office Revenue considered the “Y” (dependent) variable in this case?

2. We need to find an appropriate X (independent) variable to help us predict Total U.S. Box Office Revenue. Which of the variables in this data set are eligible as potential candidates? (Note only certain types of variables can qualify for this type of analysis.)

3. Explain why Total World Box Office Revenue wouldn’t be a fair variable to use to predict Total U.S. Box Office Revenue. (You can then cross it off your list above).

4. To look for potential relationships that any of these variables have with Total U.S. Box Office Revenue, use EXCEL to make the appropriate scatterplots. SKETCH them here and have ready on the computer in case your TA asks. Be sure to include titles and axis names. Don’t spend too much time making them perfect, but try to capture the general shape of the scatterplot

5. Now use EXCEL to calculate the appropriate correlations to quantify these relationships. Label and INTERPRET each correlation, using the 3 items we learned in lecture.

6. Based on BOTH the scatterplots AND the correlations, which variable would do the best job of predicting Total U.S. Box Office Revenue? Justify your answer completely.

7. The first method we discussed in lecture to find the best fitting line was to calculate 5 descriptive statistics and use them in the formulas to find the slope and y-intercept of the best fitting line. For this data set, use EXCEL to calculate those 5 descriptive statistics, using the variable you selected in #6 above. Write them down and label them as we did in lecture.

8. Use those 5 descriptive statistics above to calculate the equation of the best-fitting line. Write it down and show your work. (If needed use your notes or see formula sheet on Carmen/Course Info.) Label X and Y.

9. The 2nd method we learned to find the best fitting line was to use EXCEL to do an actual regression analysis. Using Excel, do a regression analysis and write down the equation of the best fitting line using the coefficients it gives you. Write down the equation, labeling X and Y. Check to see how it matches with your above answer. (There may be round-off error but not much.)

10. Using the Excel output and your notes, what % of the variability in U.S revenue can be explained by Opening Weekend Revenue? Does it do a good job of predicting U.S. revenue? Why?

11. For what values of Opening Revenue is it appropriate and safe to make predictions about U.S. Revenue without extrapolation? (Use Excel scatterplot or statistics to help you figure it out.)

12. Find the movie Madagascar in your data set.

a. Find the observed U.S. Revenue for this movie from the data. (Use proper units.)

b. Find the predicted U.S. Revenue for this movie from the best fitting line you calculated in above. (How do we use X to predict Y in an equation?)

c. Calculate the residual for this movie from the formula we used in lecture.

d. Did this movie make MORE or LESS money than expected? Explain briefly.

13. Make a residual plot of this whole data set. You don’t need to sketch your plot, but keep it on your computer for the rest of this assignment.

To do this:

Go to Data/Data Analysis, and select “Regression”/OK. When the box comes up click “Residual PLOTS” and click “New Workbook”. To tell it what data you want, click the “X” box and highlight all the values of X you want to use, starting with the 2nd row in the data set (leave off the label). Then click the “Y” box and highlight all the values you want to use, starting with the 2nd row (leave off the label). Then select OK. You will see the residual plot right away, and if you scroll down through the analysis at the bottom you see the actual residuals for each value in the data set.

Question:

A few of the residuals are outside the range of -$100 million to $100 million. Using your Excel regression output and the data set, find the movies that have residuals outside this range. Include their names and the values of their residuals as shown by Excel. (You will have to search your data to find them.)

14. Now look through your Excel output and data set to find the one movie that has the most POSITIVE residual and write down the movie name, its actual US revenue, and the revenue that was predicted for that movie using the best-fitting line you found previously. Then calculate the actual value of the residual and compare to what Excel got in the previous problem. They should match (subject to some rounding error.)

Movie:

Actual U.S. revenue for this movie:

The predicted U.S. revenue I calculated using the best fitting line:

Residual I calculated from the results:

Residual I calculated from the computer output:

Did this move make MORE or LESS U.S. Revenue than predicted?