Project B

nirmak

QMB 3600 Quantitative Methods in Business

Project B: Correlation and Linear Regression

Choose a real world business data set of at least 15 pairs of numeric data points.

Immediately check to see if the p-value is less than 0.10. If p-value is greater than 0.10, use a different data set.

Enter the data into Excel and use “Regression” from the Data Analysis add-in to analyze it. Copy all the information from Excel into Word in a presentable fashion. Answer the following questions in Word. Be sure to number them. Submit the Word and Excel file into the drop box.

Only the Word file will be graded, so put everything in the Word file.

Please make sure that you answer all the parts of this project. Please do not leave any of them out.

PART I: CALCULATION AND ANALYSIS

Correlation

1. Name your source and include the link to where the data was found. Give your data set in an organized, presentable fashion.

Index Mundi

http://www.indexmundi.com/facts/united-states/quick-facts/florida/population-density#table

 

County       Poverty level (%)    Median income ($)
Alachua      24.9                 42,149
Baker        17.3                 49,236
Bay          14.7                 47,461
Bradford     18.2                 40,259
Brevard      13.5                 48,039
Broward      14.3                 51,251
Calhoun      23.5                 32,780
Charlotte    12.6                 44,378
Citrus       16.8                 39,100
Clay         9.8                  59,482
Collier      14.1                 55,843
Columbia     19.8                 38,070
DeSoto       29.6                 34,963
Dixie        17.4                 33,981
Duval        16.9                 48,323

2. Explain the variables and what the units mean.

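The variables are x (poverty level) and y (median income). Poverty level is a percentage of each county's population, and median income is measured in dollars. I’m comparing both across 15 counties of Florida.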

3. Explain the reasoning for why one variable is independent and the other is dependent.

I’m testing the poverty level and median income of different Florida counties to see whether they correlate. Poverty level is the independent variable (x) and median income is the dependent variable (y), because the question is whether a high poverty rate means the income is going to be high or low.

4. Choose α and explain why you chose it.

I chose 0.05 as my α because it is the most commonly used significance level and there is no need for a stricter (lower) α in this study.

5. Give your p-value.

0.002307344

6. Compare α and p-value

α = 0.05 > p-value = 0.0023

7. Tell if there is positive, negative or no significant correlation.

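In this case, we reject the null hypothesis of no correlation. The result is a statistically significant negative correlation: counties with a higher poverty level tend to have a lower median income.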
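The p-value that Excel reports for the slope comes from a t statistic with n - 2 = 13 degrees of freedom. As a rough check, and assuming the standard relationship between r and t for simple linear regression, the statistic can be recovered from r (which takes the sign of the slope, so it is negative here) and the R Square value in the summary output:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{-0.7233\,\sqrt{13}}{\sqrt{1-0.5232}}
  \approx \frac{-2.608}{0.6905}
  \approx -3.777
```

This agrees with the t Stat of -3.7767 that Excel lists for poverty level, which is the value behind the p-value of 0.0023.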

8. Give the bottom line conclusion; does the “y” depend on the “x”?

In this study I can confirm that y (median income) does depend on x (poverty level).

9. Give the critical value and r.

r = -0.723304289 (Excel reports Multiple R = 0.723304289; r takes the negative sign of the slope)

Critical value is 0.514

10. Compare r and critical value and tell if there is significant correlation.

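|r| = 0.723304289 > 0.514

Because |r| is greater than the critical value, there is significant correlation, and because r is negative, the correlation is negative.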
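As a cross-check on the Excel output, r can be recomputed directly from the 15 data pairs. The sketch below is a minimal Python version, not part of the original Excel work; the function name pearson_r is only illustrative, and the critical value 0.514 is the one quoted in question 9.

```python
import math

# Data from the table in question 1 (15 Florida counties).
poverty = [24.9, 17.3, 14.7, 18.2, 13.5, 14.3, 23.5, 12.6,
           16.8, 9.8, 14.1, 19.8, 29.6, 17.4, 16.9]          # percent
income = [42149, 49236, 47461, 40259, 48039, 51251, 32780, 44378,
          39100, 59482, 55843, 38070, 34963, 33981, 48323]    # dollars


def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(poverty, income)
critical_value = 0.514  # assumed from question 9 (n = 15, alpha = 0.05)
significant = abs(r) > critical_value
print(f"r = {r:.4f}, significant: {significant}")
```

The comparison uses the absolute value of r because the critical value only measures strength; the sign of r gives the direction of the correlation.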

11. Give r2 and explain what it means, using the specific variables of the project.

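r2 is 0.523169095, which means that about 52.3% of the variation in median income across these counties is explained by the poverty level. Since the slope is negative, counties with a higher poverty rate tend to have a lower median income.

This also means that the remaining 47.68% comes from other factors that are not part of this study.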
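The split between explained and unexplained variation can be read off the sums of squares in the ANOVA section of the Excel output. A short worked version, using the SS values reported there:

```latex
r^{2} = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}
      = \frac{465{,}470{,}844.6}{889{,}713{,}955.3}
      \approx 0.5232,
\qquad
1 - r^{2} = \frac{SS_{\text{Residual}}}{SS_{\text{Total}}}
          = \frac{424{,}243{,}110.7}{889{,}713{,}955.3}
          \approx 0.4768
```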

12. Give other causes of variation that are not part of the model.

· Local job openings

· One county might have more job opportunities

· Bigger employers that pay more in different counties

Regression

13. Using Excel, create a scatter plot with the regression line.

14. Give the regression equation.

y=-1117.5x+63978
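The equation above comes from Excel. For readers who want to reproduce it, the following is a minimal Python sketch of the ordinary least-squares formulas applied to the same 15 data pairs; the function name least_squares is hypothetical and this is not the method used in the project itself.

```python
# Data from the table in question 1 (15 Florida counties).
poverty = [24.9, 17.3, 14.7, 18.2, 13.5, 14.3, 23.5, 12.6,
           16.8, 9.8, 14.1, 19.8, 29.6, 17.4, 16.9]          # percent
income = [42149, 49236, 47461, 40259, 48039, 51251, 32780, 44378,
          39100, 59482, 55843, 38070, 34963, 33981, 48323]    # dollars


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    return slope, intercept


slope, intercept = least_squares(poverty, income)
print(f"y = {slope:.1f}x + {intercept:.0f}")
```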

15. Explain what the slope ( ) means in the model (for your x and y). Be sure to give units.

.05 Median income

50 Poverty level

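Slope is -1117.5 dollars per percentage point. For every 1 percentage point increase in a county's poverty level, the predicted median income decreases by about $1,117.50.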

16. Pick 3 data points that will be used to find their residuals. Give the names of the data points, why they were chosen, and why they are important.

Clay County, for having the lowest poverty level and highest median income

DeSoto County, for having the highest poverty level and one of the lowest median incomes

Baker County, for having an average poverty level and an above-average median income

17. Find the residuals for the 3 points chosen in #16.

Clay 6455.905806

DeSoto 4063.30594

Baker 4591.117978
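Each residual is the actual median income minus the income predicted by the regression line. As an illustrative check of the three values above, here is a small Python sketch that uses the intercept and slope from the Excel coefficient table; the names residual and counties are only for this example.

```python
# Coefficients copied from the Excel regression output.
INTERCEPT = 63977.54476    # dollars
SLOPE = -1117.494956       # dollars per percentage point of poverty

# (poverty level in percent, median income in dollars) for the three chosen counties.
counties = {
    "Clay": (9.8, 59482),
    "DeSoto": (29.6, 34963),
    "Baker": (17.3, 49236),
}


def residual(poverty_percent, actual_income):
    """Actual median income minus the income predicted by the regression line."""
    predicted = INTERCEPT + SLOPE * poverty_percent
    return actual_income - predicted


residuals = {name: residual(x, y) for name, (x, y) in counties.items()}
for name, value in residuals.items():
    print(f"{name}: {value:.2f}")
```

All three residuals are positive, meaning each of these counties has a higher median income than the line predicts for its poverty level.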

18. Decide if the residuals are small or large and explain why.

The residuals are large because all three predictions were off by more than 4000 dollars. Clay was 6455 dollars off from the regression line, DeSoto was 4063 dollars off, and Baker was 4591 dollars off.

19. Make one prediction. For a time series, predict the next year. Otherwise, make a prediction about a fictitious data point that is realistic and relevant to business.

Predicted median income = 63977.544 + (-1117.494) * poverty percentage

For a fictitious county with a poverty level of 35%: 63977.544 + (-1117.494) * 35 = 24,865.25
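The prediction is just the regression equation evaluated at one x value. A minimal Python sketch, assuming the rounded coefficients used in this answer and entering the poverty level as a percentage (35, not 0.35), since that is how the data was entered; the function name predict_median_income is hypothetical:

```python
INTERCEPT = 63977.544   # dollars, rounded as in the prediction formula
SLOPE = -1117.494       # dollars per percentage point of poverty


def predict_median_income(poverty_percent):
    """Predicted median income in dollars for a poverty level given in percent."""
    return INTERCEPT + SLOPE * poverty_percent


prediction = predict_median_income(35)
print(f"{prediction:,.2f}")
```

A poverty level of 35% is above the highest value in the data set (29.6% for DeSoto), so this is an extrapolation and should be read with some caution.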

PART II: CRITICAL THINKING AND APPLYING TO THE REAL WORLD

20. Why did you pick this topic and how is it important to you?

This topic caught my interest because some areas have a very high poverty level but, at the same time, an above-average median income. I thought it was interesting to calculate and compare the numbers, which show how much the counties vary.

21. How can this model be of use to a real world business? Can it help solve any problems?

One business use that comes to mind is choosing where to open a new business: you could look for the county with the lowest poverty level or the highest median income. The numbers can help direct where to open the new location. Wouldn’t you want to open a new store or restaurant in a county that has a higher median income? A higher median income could mean people have the money to go out and eat.

22. Explain what type I error would be for this model and how type I error can be dangerous to a business using this model. How concerned are you about this and why?

A type I error for this model would be concluding that poverty level and median income are significantly correlated when in reality they are not. A business relying on that false relationship might assume that a county with a low poverty rate has a high median income when the income is actually low. That could hurt a business in many ways, for example if someone opens a new store based on a median income estimate that turns out to be false. I’m concerned about this because business owners rely on data like this and sometimes the data can have an error.

For each misuse of regression below, explain why this model is or is not vulnerable to it:

23. Causation and causality

In this case the model is not vulnerable to this misuse, because I am not claiming that one variable causes the other. Here we are testing the poverty rate against the median income of certain counties. Counties with higher income generally have a lower poverty rate, but that might not always be the case, and there could be many other reasons why the poverty rate is high or low. So you can’t base it on income alone. There is correlation but no proven causality.

24. Time dependent causation and causality

The fact that median income and poverty percentage are going up at the same time in the counties doesn’t mean that one affects the other. There could be many unlisted reasons why they have both been going up during the past 5 years.

25. Randumbness

This project is vulnerable to randumbness because it might look like there is a process or a reason why the lower-poverty counties have higher median incomes, but it could all be random. There are many reasons why poverty is up or down and why income is higher or lower. The pattern in this situation could be completely random.

26. Regression to the moon

In this model we are testing the poverty rate and median income. The median income can always go up, but the poverty rate can only go down to 0% and never be negative. At some point the poverty level is going to have to level off. So this model is vulnerable to regression to the moon.

27. False linear assumption (Study the scatter plot and determine if the data might actually not be linear, and if so, what other type of correlation it might be.)

The data has a negative correlation, and the points in the scatter plot still follow a roughly straight line, so a linear model is reasonable.

Learning

28. What did you learn about this topic? (Write at least 4 sentences)

What I learned about this topic is that you can use this type of research and calculation for so many business examples. You can predict so many outcomes using it. It can be very beneficial to business owners who are trying to do research ahead of time before they take the first steps. I also learned a lot about Excel and how it is a very nice tool to have for any business owner or person who wants to conduct research on a certain subject. Causation and causality were also interesting: a lot of times something will be held responsible for an outcome just because the facts from the past line up with it. Overall this project was unique and I might use it again one day.

29. What did you learn about statistics? (Write at least 4 sentences)

I learned that statistics can be very useful in many situations and can be used to help support your theory and answer. Also, even though statistics are based on facts, they might still end up having an error. I also learned that we use statistics all the time in business, medicine, and everyday life, and when trying to prove a point. It’s a very powerful subject that a lot of people need to use for many reasons.

Presentation

30. Present this project in a professional way so that you can be proud of it. Save it so that you can show an example of applying quantitative methods to your current or future employer. It should be written clearly, error free, and visually attractive. Include at least one photograph or graphic.

· This presentation tests the median income and poverty percentage in certain counties of Florida.

· The following page shows the calculations and residual output for each county based on the income and poverty percentage.

· In this project I will also make a prediction, using the information that I gathered, to see how the poverty rate relates to the income.

· My data for the poverty rate and median income was obtained from this website: http://www.indexmundi.com/facts/united-states/quick-facts/florida/population-density#table

[Scatter plot with regression line: Median Income (y-axis) vs. Poverty Level (x-axis)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.723304289
R Square             0.523169095
Adjusted R Square    0.486489794
Standard Error       5712.625092
Observations         15

ANOVA
              df    SS             MS             F              Significance F
Regression    1     465470844.6    465470844.6    14.26333352    0.002307344
Residual      13    424243110.7    32634085.44
Total         14    889713955.3

                 Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept        63977.54476     5401.187914       11.84508774     2.44182E-08    52308.98769     75646.10184
poverty level    -1117.494956    295.8932466       -3.776682872    0.002307344    -1756.733452    -478.2564607

RESIDUAL OUTPUT

Observation    Predicted median income    Residuals
Alachua        36151.92035                5997.079645
Baker          44644.88202                4591.117978
Bay            47550.36891                -89.36890811
Bradford       43639.13656                -3380.136561
Brevard        48891.36286                -852.3628556
Broward        47997.36689                3253.633109
Calhoun        37716.41329                -4936.413293
Charlotte      49897.10832                -5519.108316
Citrus         45203.6295                 -6103.6295
Clay           53026.09419                6455.905806
Collier        48220.86588                7622.134118
Columbia       41851.14463                -3781.144631
DeSoto         30899.69406                4063.30594
Dixie          44533.13253                -10552.13253
Duval          45091.88                   3231.119996