Data Analysis Project
Real Estate Pricing Project
1. Introduction
Venture Catalyst Services, a business management consulting firm based in Mercer Island, would like to know whether the cost of housing in the immediate surrounding areas is making it difficult to find and retain employees. Our goal for this analysis was to determine how location, home square footage, and other variables such as the number of bedrooms and bathrooms affect the prices of homes in the surrounding area.
Mercer Island is a small island in the southern part of Lake Washington, between Downtown Seattle and Bellevue. Mercer Island and the surrounding neighborhoods of Bellevue, Medina, and Clyde Hill are affluent parts of the greater Seattle area. Because Mercer Island is close to both the lake and downtown Seattle, residents can choose between spending the day on the beach and shopping or exploring the city. Bellevue lies between Lake Washington and Lake Sammamish, so homeowners can enjoy views of either lake or visit attractions in the region, such as the Museum of Flight. The city of Medina is located on a peninsula in Lake Washington, so it offers views of the lake as well as plenty of activities on it. Finally, Clyde Hill is between Medina and Bellevue, so homeowners can take advantage of the attractions those cities offer. The comparison neighborhoods were chosen at random; Mercer Island was included because it is where Venture Catalyst Services is located.
2. Data
We collected the data on November 29, 2021. Venture Catalyst Services is located in Mercer Island (zip code 98040), so we also looked up data for the following surrounding areas: 98006 (Bellevue), 98039 (Medina), and 98004 (Clyde Hill/Bellevue). The observations were randomly chosen from various online real estate websites. The main limitation was that few homes were listed for sale at the time, so we had to search additional websites for actual home prices. We also made sure that every home had a value for each variable used in our analysis: asking price, number of bedrooms, number of bathrooms, house square footage, lot square footage, age of the home as of 2021, and address with zip code.
The mean asking price was $3,660,438, with a standard deviation of $2,492,621.98. The standard deviation is roughly two-thirds of the mean, so asking prices in the sample varied widely.
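One way to put that spread on a relative scale is the coefficient of variation, the standard deviation divided by the mean. The short sketch below uses only the two summary figures reported above; the variable names are illustrative and not part of the original analysis.

```python
# Summary statistics for asking price, as reported in the Data section.
mean_price = 3_660_438.00
std_price = 2_492_621.98

# Coefficient of variation: the standard deviation as a share of the mean.
cv = std_price / mean_price

print(f"Coefficient of variation: {cv:.1%}")
```

A value this large suggests that the sample mixes moderately priced homes with very expensive ones, which is worth keeping in mind when reading the regression results.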
3. Results
Regression Results for Prices of Homes in Mercer Island and the Surrounding Area

Dependent Variable: Home Prices

| Variable | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| Intercept | -305,629.45 (-0.50) | | |
| Bellevue | -404,156.55 (-0.68) | | |
| Medina | 2,516,618.23 (4.20)* | | |
| Clyde Hill | 638,473.886 (1.07) | | |
| Home Sq Ft | 851.364047 (6.83)* | 701.401898 | |
| Bedrooms | | -509,434.9121 | |
| Bathrooms | | 582,982.5667 | |
| R-Squared | 0.601 | 0.448 | |
| R-Squared adjusted | 0.572 | 0.418 | |

Number of observations: 60
T ratios are in parentheses. *Significant at 5%.
Model 1:
[Figure: correlation matrix for Model 1]
Model 1 looks at the relationship between home prices and two factors: location and home square footage. Because Venture Catalyst Services is located in Mercer Island (zip code 98040), we used Mercer Island as the baseline and converted the three surrounding zip codes into dummy variables: Dbellevue, Dmedina, and Dclydehill. According to the correlation matrix for Model 1, there is no evidence of multicollinearity among the variables. The Medina dummy and home square footage have the strongest correlations with home price.
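The dummy-variable setup can be sketched in a few lines of Python. The coefficients are the Model 1 values from the regression table; the zip-to-dummy mapping is an assumption inferred from the Data section (98004 is listed there as Clyde Hill/Bellevue and is treated as Dclydehill here), and the function names are hypothetical.

```python
# Model 1 coefficients as reported in the regression table.
MODEL_1 = {
    "intercept": -305_629.45,
    "Dbellevue": -404_156.55,
    "Dmedina": 2_516_618.23,
    "Dclydehill": 638_473.886,
    "home_sqft": 851.364047,
}

# Assumed mapping from zip code to dummy variable (inferred, not confirmed).
ZIP_TO_DUMMY = {"98006": "Dbellevue", "98039": "Dmedina", "98004": "Dclydehill"}
BASELINE_ZIP = "98040"  # Mercer Island: all three dummies are 0


def encode_location(zip_code):
    """Return the three location dummies for a zip code."""
    dummies = {"Dbellevue": 0, "Dmedina": 0, "Dclydehill": 0}
    if zip_code in ZIP_TO_DUMMY:
        dummies[ZIP_TO_DUMMY[zip_code]] = 1
    elif zip_code != BASELINE_ZIP:
        raise ValueError(f"zip code {zip_code} is not in the sample")
    return dummies


def predict_price(zip_code, home_sqft):
    """Fitted asking price from Model 1 for a given location and size."""
    price = MODEL_1["intercept"] + MODEL_1["home_sqft"] * home_sqft
    for name, value in encode_location(zip_code).items():
        price += MODEL_1[name] * value
    return price


# Two homes of the same size, one in Mercer Island and one in Medina.
mercer = predict_price("98040", 3000)
medina = predict_price("98039", 3000)
print(f"Mercer Island: ${mercer:,.2f}")
print(f"Medina:        ${medina:,.2f}")
print(f"Difference:    ${medina - mercer:,.2f}")
```

Because Mercer Island is the baseline, each dummy coefficient is the estimated price difference from a Mercer Island home of the same square footage, so the difference printed for the two homes equals the Medina coefficient.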
Based on the F-test at a 5% significance level, there is an overall statistically significant linear relationship between the price of homes and the combination of location and home square footage. Based on the t-tests on the individual variables, the Medina dummy and home square footage are both statistically significant at a 5% significance level, while the Bellevue and Clyde Hill dummies are not. The R-squared shows that 60.1 percent of the variation in home prices is associated with the variation in location and home square footage. Adjusted for the number of independent variables, that figure is 57.2 percent.
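The adjustment can be checked by hand with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of independent variables. The sketch below assumes k = 4 for Model 1 (three location dummies plus home square footage) and k = 3 for Model 2, with n = 60 as stated in the table; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a linear regression with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Model 1: three location dummies plus home square footage.
model_1 = adjusted_r_squared(0.601, n_obs=60, n_predictors=4)

# Model 2: bedrooms, bathrooms, and home square footage.
model_2 = adjusted_r_squared(0.448, n_obs=60, n_predictors=3)

print(f"Model 1 adjusted R-squared: {model_1:.3f}")
print(f"Model 2 adjusted R-squared: {model_2:.3f}")
```

Both results round to the adjusted values reported in the table (0.572 and 0.418).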
Model 2:
Model 2 regresses the current asking price of the homes (the dependent variable) on three independent variables: the number of bedrooms, the number of bathrooms, and the home square footage. The R-squared is 0.448, which means that approximately 45% of the variability in asking price is associated with this set of independent variables. The adjusted R-squared is 0.418. Adjusted R-squared is always at or below R-squared because it penalizes each added variable, so the gap by itself does not show which variables are significant; the individual t-tests below do that. Overall, the model shows a moderate relationship between asking price and the combination of bedrooms, bathrooms, and home square footage.
The overall F-test has a p-value of 2.505E-07, so we reject the null hypothesis that there is no linear relationship between the independent variables and asking price, in favor of the alternative hypothesis that at least one of the independent variables is related to asking price.
The p-values for the individual variables show how each one relates to asking price. The p-value for bedrooms is 0.079, which is larger than the 0.05 significance level, so the number of bedrooms is not statistically significant. Similarly, the p-value for bathrooms is 0.064, which is also greater than 0.05, so the number of bathrooms is not statistically significant. The p-value for home square footage is 0.0074, which is less than 0.05, so home square footage has a statistically significant relationship with asking price.
Overall, Model 2 indicates that the number of bedrooms and the number of bathrooms are not statistically significant predictors of asking price at the 5% level, while home square footage is.
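The decisions for Model 2 follow a simple rule: a variable is significant at the 5% level when its p-value is below 0.05. The sketch below applies that rule to the three reported p-values and also computes the overall F statistic implied by the reported R-squared, F = (R-squared / k) / ((1 - R-squared) / (n - k - 1)). Because it starts from the rounded R-squared of 0.448, the F value is approximate; the names used are illustrative.

```python
ALPHA = 0.05  # significance level used throughout the report

# p-values for the Model 2 coefficients, as reported in the text.
P_VALUES = {"bedrooms": 0.079, "bathrooms": 0.064, "home_sqft": 0.0074}


def is_significant(p_value, alpha=ALPHA):
    """True when the null hypothesis of a zero coefficient is rejected."""
    return p_value < alpha


def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic implied by R-squared for a model with an intercept."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained


decisions = {name: is_significant(p) for name, p in P_VALUES.items()}
f_stat = overall_f_statistic(0.448, n_obs=60, n_predictors=3)

for name, significant in decisions.items():
    label = "significant" if significant else "not significant"
    print(f"{name}: p = {P_VALUES[name]} -> {label} at the 5% level")
print(f"Approximate overall F statistic (3 and 56 degrees of freedom): {f_stat:.2f}")
```

Only home square footage passes the 5% threshold, which matches the conclusion drawn above.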
4. Conclusion