STAT 250 Fall 2017 Data Analysis Assignment 2

Your submitted document should include the following items. Points will be deducted if the following are not included. For an example of formatting, see the sample problem/solution document posted on Blackboard.

1. Type your name and STAT 250 with your correct section number (e.g. STAT 250-xxx), right justified at the top of page 1 of your document.

2. Type Data Analysis Assignment 2 centered on page 1 under your name.

3. Number your pages across your entire solutions document.

4. Your document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep your solutions in order. You should not include the questions in your submitted document. Please see the posted model solution as an example.

5. Generate all requested graphs using StatCrunch. Graphs must be appropriately titled and should refer to the context of the question. Graphical displays must include labels with units if appropriate for each axis.

6. Upload your solutions document onto Blackboard as a Word or pdf document using the link provided by your instructor.

Elements of good technical writing:

Use complete and coherent sentences to answer the questions.

Again, graphs must be appropriately titled and should refer to the context of the question.

Again, graphical displays must include labels with units if appropriate for each axis.

Units should always be included when referring to numerical values.

When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”

Ensure that all graphs and tables appear on one page and are not split across two pages.

Type all mathematical calculations when directed to compute an answer ‘by-hand.’ Pictures of actual handwritten work are not accepted.

When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: √x can be written as sqrt(x), p̂ can be written as p-hat, and x̄ can be written as x-bar.

Problem 1: Record Low and High Temperatures of 50 States

Use the data set named “Record Temperatures” to answer the following questions. The value of the “LowTemp” variable is the record low temperature for a state in degrees Fahrenheit. The value of the “HighTemp” variable is the record high temperature for a state in degrees Fahrenheit.

a) Investigate the distribution of record low temperatures by producing a properly titled and labeled frequency histogram using the “LowTemp” variable. What is the shape of this distribution?

b) Calculate the mean and standard deviation of record low temperatures using StatCrunch. (For this part, just include the table from StatCrunch or type your answers).

c) For the record low temperatures, determine how well the Empirical Rule does in predicting the percentage of observations within some number of standard deviations of the mean by answering the following.

i) Calculate the intervals corresponding to one, two, and three standard deviations about the mean. (Do not round the values for the mean and standard deviation found in part (b) before computing the intervals. Use the values as they appear in the StatCrunch output.) Round the endpoints of the final intervals correctly to two decimal places. Present the three intervals including units as (low value, high value).

ii) Use StatCrunch to determine the actual percentage of observations falling in each of these intervals. (Sort the variable values from least to greatest using Data > Sort and count the number of states with values falling in each interval. Then, divide that count by 50.)

iii) Compare the actual percentages to what the Empirical Rule predicts for each interval. Summarize your findings in three sentences.
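The interval and counting steps in part (c) can also be sketched outside StatCrunch. The Python below is only an illustration under stated assumptions: the function name `empirical_rule_check` and the list `placeholder_lows` are made up for this sketch, the values are not the Record Temperatures data, and it assumes the standard deviation reported by StatCrunch is the sample standard deviation (n - 1 in the denominator). The mean and standard deviation are left unrounded until the interval endpoints are formed, as part (i) requires.

```python
from statistics import mean, stdev


def empirical_rule_check(values):
    """Return (k, low, high, percent_inside) for k = 1, 2, 3 standard deviations."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, assumed to match StatCrunch
    n = len(values)
    rows = []
    for k in (1, 2, 3):
        # Build the interval from the unrounded mean and standard deviation.
        low, high = x_bar - k * s, x_bar + k * s
        # Count the observations that fall inside the interval.
        count = sum(1 for v in values if low <= v <= high)
        # Round only the final endpoints to two decimal places.
        rows.append((k, round(low, 2), round(high, 2), 100 * count / n))
    return rows


# Made-up placeholder values in degrees Fahrenheit, NOT the assignment data.
placeholder_lows = [-10, -20, -30, -40, -50, -60, -35, -25, -45, -15]

for k, low, high, pct in empirical_rule_check(placeholder_lows):
    print(f"within {k} SD: ({low}, {high}) degrees F holds {pct:.0f}% of the values")
```

The Empirical Rule predicts roughly 68%, 95%, and 99.7% for the three intervals, so the printed percentages are the figures to compare against those benchmarks.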

d) Investigate whether Virginia’s record low or high temperature is more extreme compared to the other 49 states.

i) By looking at the data set in StatCrunch, find the record low temperature and record high temperature for Virginia. State each here.

ii) Using the mean and standard deviation of record low temperatures found in part (b), compute the z-score for Virginia’s record low temperature. (Show your work. Do not round the values for the mean and standard deviation found in part (b) before computing the z-score. Use the values as they appear in the StatCrunch output.)

iii) Calculate the mean and standard deviation of record high temperatures using StatCrunch. Copy the table from StatCrunch showing the mean and standard deviation into your document. Use these to compute the z-score for Virginia’s record high temperature. (Show your work. Do not round the values for the mean and standard deviation before computing the z-score.)

iv) Based on comparison of the z-scores, is Virginia’s record low temperature or record high temperature more extreme compared to the other 49 states? Note, the more extreme temperature has a z-score further away from 0. Justify your answer in one or two sentences.
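The z-score comparison in part (d) follows the formula z = (value - mean) / standard deviation. The short Python sketch below shows the arithmetic and the comparison of absolute values. Every number in it is a hypothetical placeholder, not Virginia's record temperatures and not the StatCrunch output, and the helper names `z_score` and `more_extreme` are introduced only for this illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


def more_extreme(z_low, z_high):
    """Report which z-score lies further from 0."""
    if abs(z_low) > abs(z_high):
        return "record low"
    if abs(z_high) > abs(z_low):
        return "record high"
    return "equally extreme"


# Hypothetical placeholder numbers in degrees Fahrenheit (assumptions only).
z_low = z_score(value=-30.0, mean=-40.0, sd=20.0)
z_high = z_score(value=108.0, mean=114.0, sd=8.0)

print(f"z for record low:  {z_low:.2f}")
print(f"z for record high: {z_high:.2f}")
print("More extreme:", more_extreme(z_low, z_high))
```

The sign of a z-score only tells you whether the value is above or below the mean, so the comparison uses the distance from 0.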

Problem 2: Fairfax City Home Sales 2017

The data set named “Fairfax City Home Sales” contains a random sample of homes sold in Fairfax City in 2017. This sample was provided by a local realtor using the Multiple Listing Service (MLS). The goal of this problem is to explore various factors that may help a buyer predict the selling (or closing) price of a home. Consider four explanatory variables:

· The Year the home was built (variable named “Year”).

· The number of days the home has been listed (variable named “Days”).

· The taxable living area in square feet (variable named “TLArea”).

· Lot Size in acres (variable named “Acres”).

This problem considers using each of these explanatory variables to attempt to predict a home’s closing price (variable named “Price”).

a) Investigate the relationship between the explanatory variable “Year” and response variable “Price” by doing the following:

i) Make a scatterplot.

ii) Calculate the correlation coefficient.

iii) Interpret the scatterplot and correlation coefficient in terms of trend, strength, and shape (form) in one complete sentence.

b) Repeat part (a) for the explanatory variable “Days.”

c) Repeat part (a) for the explanatory variable “TLArea.”

d) Repeat part (a) for the explanatory variable “Acres.”

e) Based on your findings in parts (a) through (d), which of the four explanatory variables would be most appropriate for predicting the response variable “Price?” Justify your choice in one sentence.

f) For the “most appropriate” variable identified in part (e), run a Simple Linear Regression analysis in StatCrunch. Copy and paste only the StatCrunch results output (no tables).

g) Add the fitted line plot to your solutions. This graph appears on page 2 of your output.

h) State the regression equation.

i) Interpret the slope of the regression line (in context of this data set).

j) Is it meaningful to interpret the y-intercept? Why or why not?

k) State r-squared (i.e., the coefficient of determination) and explain what this value means in context of the data set.

l) Use the regression equation from part (h) to predict the selling price for a specific home with the following values: TLArea was 3500 square feet, Acres was 0.33, Year was 2010, and Days was 10. State your predicted value in a sentence that is in context of the data. Don’t forget units! Note: You can do this calculation “by hand” or using StatCrunch.

m) Is your prediction in part (l) an example of extrapolation? Why or why not?
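The quantities requested in this problem (correlation coefficient, regression equation, r-squared, a prediction, and an extrapolation check) can be sketched with the usual least-squares formulas. The Python below is a minimal illustration, not the StatCrunch procedure. The lists `area` and `price` are made-up placeholder values rather than the Fairfax City Home Sales data, using living area as the explanatory variable is an assumption for the example only, and the function names are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line and correlation for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)            # correlation coefficient
    slope = sxy / sxx                    # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar    # predicted y when x = 0
    return {"r": r, "r_squared": r ** 2, "slope": slope, "intercept": intercept}


def predict(fit, x_new):
    """Predicted response from the fitted line."""
    return fit["intercept"] + fit["slope"] * x_new


def is_extrapolation(x, x_new):
    """True when x_new lies outside the range of the observed x values."""
    return x_new < min(x) or x_new > max(x)


# Made-up placeholder data (square feet, dollars), NOT the assignment data.
area = [1000, 1500, 2000, 2500, 3000]
price = [300000, 380000, 450000, 540000, 600000]

fit = fit_line(area, price)
print(f"r = {fit['r']:.4f}, r-squared = {fit['r_squared']:.4f}")
print(f"Price-hat = {fit['intercept']:.2f} + {fit['slope']:.2f} * area")
print(f"Predicted price at 3500 sq ft: ${predict(fit, 3500):,.2f}")
print("Extrapolation?", is_extrapolation(area, 3500))
```

In this sketch the extrapolation check compares the new x value only with the smallest and largest observed x values, which is the comparison part (m) asks you to reason about.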

Problem 3: Simulating Rolling Two Dice

We will be comparing empirical (relative frequencies based on an observation of a real-life process) to theoretical (long-run relative frequency) probabilities. We will use StatCrunch to simulate rolling two dice. Conduct the following simulation by using the steps below:

a) Step 1: Under Applets > Simulation, select Dice rolling from the menu.

Step 2: In the window, enter 6 for the number of sides and 2 for the number of dice.

Step 3: Select Compute!

Step 4: Select 1000 runs to simulate rolling the two dice 1000 times as shown below. The result of this simulation will appear as a bar graph.

Step 5: Clear the box to the right of “Sum of 2 rolls” for part (a) (none of the bars in the chart will now be highlighted).

Step 6: Copy your chart into your document for your answer to part (a).

NOTE: You will use this result to answer parts (c)-(e).

[Figure: screenshot of the StatCrunch dice rolling applet. Your result will appear in the chart area. Box 1: use the down arrow to change the equality/inequality sign (>=, >, =, <=, or <). Box 2: use this box to enter specific values in parts (c)-(e).]

Using your result from the 1000-run simulation found in part (a), find the following three proportions for parts (c)-(e) and then compare these empirical probabilities with their theoretical probabilities. DO NOT GENERATE ANOTHER RESULT. You only need to adjust the information in boxes 1 and 2 above to answer parts (c)-(e).

b) Build the probability distribution for the sum of two dice (i.e. list the possible values of the random variable X, the sum of two dice, and each value’s corresponding probability) in table form and include this table in your document (you can do this in StatCrunch or by listing the values and corresponding probabilities in Word). Use the sample space (shown in examples in class and textbook page 212) and theoretical probability to calculate your probabilities. Present the probabilities as fractions out of 36 in the table. Note, you can use this table to calculate your theoretical probabilities in parts (c)-(e).

c) Under Event in the applet, enter: “Sum of 2 rolls equals 5.” Use options->copy to copy this chart into your document.

Now calculate the theoretical probability that “Sum of 2 rolls equals 5” using the sample space of 36 possible outcomes or the probability distribution built in part (b). State this probability as a decimal to three decimal places in a sentence.

In another sentence, compare your empirical probability (found in the simulation) to the theoretical probability of obtaining a sum of 2 rolls equal to 5. Remember to justify your answer by including the values.

d) Under Event now find: “Sum of 2 rolls greater than or equal to 5.” Use options->copy to copy this chart into your document.

Now calculate the theoretical probability that “Sum of 2 rolls greater than or equal to 5” using the sample space of 36 possible outcomes or the probability distribution built in part (b). State this probability as a decimal to three decimal places in a sentence.

In another sentence, compare the empirical probability (found in the simulation) to the theoretical probability of obtaining a sum of two rolls that is greater than or equal to 5.

e) Under Event find: “Sum of 2 rolls less than 5.” Use options->copy to copy your answer into your document.

Now calculate the theoretical probability that “Sum of 2 rolls less than 5” using the sample space of 36 possible outcomes or the probability distribution built in part (b). State this probability as a decimal to three decimal places in a sentence.

In another sentence, compare the empirical probability (found in the simulation) to the theoretical probability of obtaining a sum of two rolls that is less than 5.
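The comparison of empirical and theoretical probabilities in this problem can be sketched in Python as well. This is only an illustration of the idea, not a replacement for the StatCrunch applet: the function names are hypothetical, the seed is arbitrary, and the example event (a sum equal to 7) is chosen for the sketch and is not one of the assigned events. The theoretical side enumerates the 36 equally likely outcomes, and the empirical side is a relative frequency from 1000 simulated rolls.

```python
import random
from collections import Counter
from fractions import Fraction


def outcome_counts():
    """Map each possible sum to its number of outcomes out of 36."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return dict(sorted(counts.items()))


def theoretical_probability(counts, event):
    """Probability of the event from the 36 equally likely outcomes."""
    favorable = sum(c for total, c in counts.items() if event(total))
    return Fraction(favorable, 36)


def simulate_sums(runs, seed=None):
    """Simulate rolling two fair six-sided dice and return the sums."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(runs)]


def empirical_probability(sums, event):
    """Relative frequency of the event among the simulated sums."""
    return sum(1 for s in sums if event(s)) / len(sums)


counts = outcome_counts()
for total, c in counts.items():
    print(f"P(X = {total}) = {c}/36")

# Example event for illustration only (an assumption, not an assigned part).
def event(s):
    return s == 7

sums = simulate_sums(1000, seed=250)
print(f"theoretical: {float(theoretical_probability(counts, event)):.3f}")
print(f"empirical:   {empirical_probability(sums, event):.3f}")
```

Because the simulation is random, the empirical value changes with the seed and will usually be close to, but not exactly equal to, the theoretical value.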
