Data Review Project
Name: _________________________________________
150 points
Please be sure to complete each task listed to receive full credit. All information must be word processed and placed in bullet points unless otherwise specified. Please make sure that your name, the name of the class, and my name are in the upper left-hand corner of the first page of your assignment.
North Valley Real Estate Project (Appendix A; page 746)
1. Look over the following variables in the report on homes sold in the area last year: selling price, number of bedrooms, township, and mortgage type.
   a. Which of the variables are qualitative and which are quantitative?
   b. How is each variable measured? Determine the level of measurement for each of the variables.
2. For the variable price, organize the selling prices into a frequency distribution using five class intervals.
   a. Around what values do the selling prices tend to cluster?
   b. What is the typical selling price in the first class? What is the typical selling price in the last class?
   c. What is the mean sales price? What is the standard deviation?
   d. What is the price range? What is the median sales price?
   e. About 95% of the sales prices are between what two values?
   f. Create a cumulative relative frequency distribution. At or below what price did fifty percent of the homes sell last year? What is the lowest selling price among the top ten percent of homes sold? What percent of the homes sold for less than $300,000?
   g. Construct a box plot for sales price.
3. Construct a bar chart for each of the following variables: bedrooms, bathrooms, and township.
   a. What is the mode of each of these variables?
4. Develop a scatter diagram with price on the vertical axis and the size of the home on the horizontal axis.
   a. Is the relationship between the two variables indirect or direct?
   b. Develop the same scatter diagram but include only homes without a pool.
   c. Develop the same scatter diagram but include only homes with a pool.
   d. Compare the relationships with and without a pool.
5. Sort the data into a table that shows the number of homes that have a garage attached versus those that don’t in each of the five townships. If a home is selected at random, compute the following probabilities:
   a. The home has a garage attached.
   b. The home does not have a garage attached, given that it is in Township 5.
   c. The home has a garage attached and is in Township 3.
   d. The home does not have a garage attached or is in Township 2.
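If you choose to check your task 5 table with software, the following is a minimal Python sketch of how the four probabilities come out of the joint counts. The `homes` list is hypothetical placeholder data, not the North Valley records, and the helper `p` is an illustrative name; substitute the actual township and garage values from the data set.

```python
from collections import Counter

# Hypothetical records (township, garage_attached); NOT the real data set.
homes = [
    (1, True), (1, False), (2, True), (2, True), (3, True),
    (3, False), (4, True), (5, False), (5, True), (5, False),
]

counts = Counter(homes)   # joint counts: (township, garage) -> number of homes
total = len(homes)

def p(event):
    """Probability that a randomly selected home satisfies `event`."""
    return sum(n for key, n in counts.items() if event(*key)) / total

# a. P(garage attached)
p_garage = p(lambda t, g: g)
# b. P(no garage | Township 5) = P(no garage and Township 5) / P(Township 5)
p_no_garage_given_t5 = p(lambda t, g: (not g) and t == 5) / p(lambda t, g: t == 5)
# c. P(garage attached and Township 3)
p_garage_and_t3 = p(lambda t, g: g and t == 3)
# d. P(no garage or Township 2); homes meeting both conditions count once
p_no_garage_or_t2 = p(lambda t, g: (not g) or t == 2)

print(p_garage, p_no_garage_given_t5, p_garage_and_t3, p_no_garage_or_t2)
```

Part b divides by the Township 5 total rather than by all homes, and part d counts a home once even if it meets both conditions.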
1 Derived from assignments from Lind, Statistical Techniques in Business and Economics, 17e
6. Create a probability distribution, and compute its mean and standard deviation, for each of the following variables:
   a. The number of bedrooms
   b. The number of bathrooms
7. The mean number of days on the market is 30, with a standard deviation of 10 days. Create a frequency distribution of days on the market.
   a. What do you observe in the frequency distribution?
   b. Use the normal distribution to estimate the number of homes on the market for more than 24 days. Compare this to the actual results.
   c. If days on the market is normally distributed, how many homes should be on the market for more than the mean number of days? Compare this to the actual number of homes. Is the normal distribution a good approximation of the actual results?
8. Assume the 105 homes are a population. A sample is taken using the following records: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.
   a. Calculate the sample mean and sample standard deviation for the sales price.
   b. Compare the sample mean and standard deviation to your answers from #2 part c.
   c. Determine the likelihood of a sample mean price this high or higher.
   d. Based on this sample, develop a 95% confidence interval for each of the following:
      i. Mean selling price
      ii. Mean days on the market
      iii. Proportion of homes with a pool
9. A new agent was assigned the 20 homes to sell. The agent reviewed his sample and felt its mean selling price was significantly different from the firm’s average selling price.
   a. Does the agent feel his sample mean is higher or lower than the firm’s average price?
   b. What would be the null and the alternative hypothesis statements?
   c. Conduct a hypothesis test at the 0.05 significance level to see if the agent is correct.
10. Conduct a two-sample hypothesis test for the following scenarios at a 0.05 significance level:
   a. Can we conclude that there is a difference in the mean selling price of homes with a pool and homes without?
   b. Can we conclude that there is a difference in the mean selling price of homes with an attached garage and homes without an attached garage?
   c. Can we conclude that there is a difference in the mean selling price of homes that are in default on the mortgage and homes that are not?
11. Test the following scenarios using ANOVA. Note that the significance level changes based on the scenario.
   a. At the 0.02 significance level, is there a difference in the variability of the selling prices of the homes that have a pool versus those that do not have a pool?
   b. At the 0.10 significance level, is there a difference in the mean selling price of the homes among the five townships?
   c. At the 0.10 significance level, is there a difference in the mean selling price of the homes among the selling agents?
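For tasks 11b and 11c, the one-way ANOVA F statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `one_way_anova_f` and the three small groups of prices are hypothetical, not taken from the data set. Compare the resulting F with the critical value from the F table at the significance level given in each part.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Treatment (between-group) and error (within-group) sums of squares
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical selling prices ($000) for three groups; NOT the real data set.
groups = [[200, 220, 240], [260, 280, 300], [230, 250, 270]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f_stat, df1, df2)
```

The degrees of freedom returned alongside F are the ones needed to look up the critical value.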
12. For the following scenarios, determine the regression equation, then use the information as directed in each part.
   a. Let the selling price be the dependent variable and size of the home the independent variable. Then, estimate the selling price for a home with an area of 2,200 square feet and provide a 95% prediction interval for the selling price of a home of this size.
   b. Let days-on-the-market be the dependent variable and price be the independent variable. Then, estimate the days-on-the-market of a home that is priced at $300,000 and provide a 95% prediction interval for days-on-the-market for that price.
13. Use selling price of the home as the dependent variable and determine the regression equation using the size of the house, number of bedrooms, days on the market, and number of bathrooms as independent variables.
   a. Which independent variables have a significant relationship with the dependent variable?
   b. Do you see any multicollinearity problems?
   c. Determine a multiple regression equation based on your analysis in parts a and b and your R-square results.
   d. Evaluate the addition of the variables: pool or garage. Report any changes in your results.
   e. Develop a histogram of the residuals from your final regression. Is the normality assumption met?
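For the prediction intervals in task 12, the calculation can be sketched in Python as follows. The function name `prediction_interval`, the five data points, and the units are hypothetical placeholders, not the North Valley data. The critical t value is passed in as an assumption; look it up in the t table for n - 2 degrees of freedom (3.182 is the two-tailed 95% value for the 3 degrees of freedom in this toy example).

```python
import math
from statistics import mean

def prediction_interval(x, y, x_new, t_crit):
    """Least-squares line y = a + b*x and a prediction interval at x_new."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    # Standard error of estimate from the residuals
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    y_hat = a + b * x_new
    # The leading "1 +" makes this an interval for an individual value
    margin = t_crit * s_yx * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat, y_hat - margin, y_hat + margin

# Hypothetical data: size (000 sq ft) and price ($000); NOT the real data set.
size = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [150, 190, 260, 290, 360]
y_hat, low, high = prediction_interval(size, price, 2.2, 3.182)
print(y_hat, low, high)
```

Dropping the leading 1 under the square root would give a confidence interval for the mean price at that size instead, which is narrower and is not what task 12 asks for.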