Statistical Analysis Subject

SOUTHERN CROSS UNIVERSITY

School of Business and Tourism

MAT10251 Statistical Analysis

PROJECT COVER SHEET

Please complete all of the following details and then make these sheets the first pages of your project – do not send it as a separate document.

Your project must be submitted as a Word document.

PART C

Student Name:

Ashish Neupane

Student ID No.:

22754842

Tutor’s name:

Dr Badri P. Bhattrai

Due date:

19th January, 2018

Date submitted:

23rd Jan 2018

Declaration:

I have read and understand the Rules Relating to Awards ( Rule 3 Section 18 – Academic Misconduct Including Plagiarism ) as contained in the SCU Policy Library. I understand the penalties that apply for plagiarism and agree to be bound by these rules. The work I am submitting electronically is entirely my own work.

Signed:

(please type your name)

Ashish Neupane

Date:

23rd January, 2018

STUDENT NAME: Ashish Neupane

STUDENT ID NUMBER: 22754842

MAT10251 – Statistical Analysis

Project Part C

Complete the summary table below.

Sample Number (last digit of your student ID number)

02

Level of Significance

0.05

Value: 20%

PLEASE ENSURE YOU KEEP A COPY OF YOUR PROJECT

Marking and Feedback Sheet

Comments

Written Answer Part C

Delete the italic text and add your content.

Each answer below should:

· Introduce and put the question in context

· Include appropriate Excel output.

· Present the results of your procedures, intervals or tests without unnecessary statistical jargon

Question 1

100 to 200 words and 1 to 2 pages

Use your answer to Question 1:

Is there a difference in the average price of cars, of the specified make and model for sale in the specified state, for sale privately and by a used car dealer?

to provide a justified answer to your relative’s question. That is, is there a difference in price between cars sold by a dealer and those sold privately?

Questions 2 and 3

200 to 500 words and 2 to 4 pages

Use the simple and multiple linear regression models developed in Questions 2 and 3 to provide a linear model to predict the price of a used car from age and/or transmission type and/or odometer reading, to answer your relative’s question. That is, how the value of the car they purchase will depreciate?

· Explain choice of independent and dependent variables.

· Include your scatter plot and discuss any apparent relationship between price and age. Comment on the strength, shape and sign of the relationship.

· Include and justify the simple or multiple linear model which best fits the data.

· Discuss and interpret the values of the regression and correlation coefficients of the best model.

· Present the results without unnecessary statistical jargon.

· Provide an answer to your relative’s question. That is, how the value of the car that they purchase will depreciate?

As these answers are part of a letter or emails to a relative you can use informal or casual written language

Appendices Part C

Delete the italic text and add your content.

This section should include appropriate graphs, Excel output and any necessary steps for the required statistical tasks.

Tests should show full statistical working including

· Random variable/s defined

· Any required assumptions mentioned

· Statistical calculations, including Excel output

· Hypotheses and decision for tests

· Conclusion for any hypothesis test.

Appendix C.1 Statistical answer for Question 1

Is there a difference in the average price of cars, of the specified make and model for sale in the specified state, for sale privately and by a used car dealer?

Appendix C.2 Statistical answer for Question 2 and Question 3

Assumptions and Variables Defined

Define dependent and independent variables for both simple and multiple linear regression models.

Mention any assumptions required for the simple/multiple linear regression models.

Simple Linear Regression Model

· Develop a simple linear regression model

· Include interpretation of regression and correlation coefficients.

Multiple Linear Regression Model

· Develop a multiple linear regression model with three independent variables

· Include interpretation of multiple regression and correlation coefficients for the multiple regression model

· Determine which independent variables make a significant contribution to the regression model.

· State, and justify, the simple or multiple linear model which best fits the data.

Letter

Mr Ram Prasad Kuikel

38 Robinson Street

Riverstone NSW

19th January 2018

Dear Ramu,

I am writing in reply to your question about the difference in price between cars for sale privately and cars for sale through a used car dealer. This letter should also help you understand how the price of a car depreciates with its age, odometer reading and transmission type.

When buying a car, it is important to know how prices differ between cars of the same make and model. I have looked at a sample of cars available in the state, which should help you make a better choice.

The boxplot below shows the prices of cars sold privately and by used car dealers.

No. 1

Z Test for Differences in Two Means

Data
Hypothesized Difference                     0
Level of Significance                       0.05

Population 1 Sample
Sample Size                                 89
Sample Mean                                 15773.61798
Sample Standard Deviation                   4840.665989

Population 2 Sample
Sample Size                                 32
Sample Mean                                 11741.1875
Sample Standard Deviation                   4132.159152

Intermediate Calculations
Difference in Sample Means                  4032.430478
Standard Error of the Difference in Means   892.6741
Z Test Statistic                            4.5172

Two-Tail Test
Lower Critical Value                        -1.9600
Upper Critical Value                        1.9600
p-Value                                     0.00001
Reject the null hypothesis

Upper-Tail Test
Upper Critical Value                        1.644853627
p-Value                                     0.000003
Reject the null hypothesis

Lower-Tail Test
Lower Critical Value                        -1.644853627
p-Value                                     0.999997
Do not reject the null hypothesis

Confidence Interval Estimate for the Difference Between Two Means

Data
Confidence Level         95.00%

Intermediate Calculations
Z Value                  1.9600
Interval Half Width      1749.609066

Confidence Interval
Interval Lower Limit     2282.821411
Interval Upper Limit     5782.0395
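The confidence interval above can be checked from the summary statistics alone. The short Python sketch below is only an illustration and is not part of the original Excel working; the variable names are assumptions chosen for readability, and it uses the same large-sample z approach as the Excel output.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the Excel output (Population 1 and Population 2)
n1, mean1, sd1 = 89, 15773.61798, 4840.665989
n2, mean2, sd2 = 32, 11741.1875, 4132.159152

confidence = 0.95

# Standard error of the difference in means (unpooled, large-sample z approach)
std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)

# Two-sided critical value, about 1.96 for 95% confidence
z_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

difference = mean1 - mean2
half_width = z_value * std_error
lower, upper = difference - half_width, difference + half_width

print(f"Difference in sample means: {difference:.2f}")
print(f"Interval half width: {half_width:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Because the whole interval lies above zero, it agrees with the two-tail test result: a difference of zero is not a plausible value at the 5% level.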

Since the p-value (0.00001) is less than the level of significance (0.05), using the p-value approach we reject the null hypothesis that the average prices are the same. If there were really no difference in average price between cars for sale privately and cars for sale by a used car dealer, the probability of getting a difference as large as the one in our sample would be about 0.00001. This is a very unlikely event. Therefore, my sample provides evidence of a real difference in the average price of cars of the same make and model in the specified state for sale privately and by a used car dealer. In the sample, the dealer cars cost about $4,032 more on average than the private cars.

The age of a car influences its price, so I would expect the price to depend on the age of the car. I have therefore constructed a linear model with price as the dependent variable and age as the independent variable, so that the price can be predicted from the age of the car.

The scatter diagram below shows the relationship between the age and the price of the used cars in the sample. As expected, the younger the car, the higher its price. The relationship is negative and approximately linear.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.877278
R Square             0.769616
Adjusted R Square    0.76768
Standard Error       2399.539
Observations         121

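As expected from the scatter plot, the correlation coefficient is r = -0.877 (Excel reports its size, 0.877, as Multiple R; the negative sign comes from the downward slope of the fitted line). This shows a strong negative linear relationship between the price and the age of the cars.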

ANOVA
            df    SS         MS         F          Significance F
Regression  1     2.29E+09   2.29E+09   397.5287   9.66E-40
Residual    119   6.85E+08   5757789
Total       120   2.97E+09

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  20986.63      383.1138        54.77909  2.93E-86  20228.02   21745.23   20228.02     21745.23
Age        -1274.85      63.94043        -19.9381  9.66E-40  -1401.46   -1148.24   -1401.46     -1148.24

The p-values of two of the independent variables, age and odometer, in the multiple regression output are less than 0.05. This indicates that both age and odometer make a significant contribution to the model and should be included. That is, including age and odometer has resulted in a stronger model.

The p-value of the third independent variable, transmission, is more than 0.05, so it does not make a significant contribution to the model and should not be included.

Therefore, the multiple regression model:

Price = 20986.63 - 997 x age (years) - 0.05 x odometer (kms) + 432 x transmission

obtained from the regression output will allow you to estimate the relationship between these four variables.
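To see what the equation means in practice, it can be written as a small function. The Python sketch below is only an illustration: the function name and the example car are hypothetical, the coefficients are copied from the equation above, and the transmission coding (1 = automatic, 0 = manual) is the one defined in the appendix.

```python
def predict_price(age_years, odometer_kms, transmission):
    """Estimated price ($) from the multiple regression equation in the letter.

    transmission: 1 = automatic, 0 = manual (coding taken from the appendix).
    """
    return 20986.63 - 997 * age_years - 0.05 * odometer_kms + 432 * transmission


# Hypothetical example car: 5 years old, 80,000 km, automatic
example_price = predict_price(5, 80_000, 1)
print(f"Estimated price: ${example_price:,.2f}")

# Estimated loss in value from one extra year of age, all else unchanged
one_year_drop = predict_price(5, 80_000, 1) - predict_price(6, 80_000, 1)
print(f"Estimated drop for one more year: ${one_year_drop:,.2f}")
```

Under this equation, each extra year of age lowers the estimated price by about $997 and each extra 1,000 km by about $50, when the other variables are held fixed.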

However, while the equation above will predict the price of a car, the prediction will not be exact. In particular, the coefficient of determination for the model with age alone is r2 = 0.7696, which means that about 77% of the variation in price is explained by age. That is, other factors will also influence the price of your car.

Please contact me if you want any further information.

Yours sincerely,

Ashish Neupane

Appendix C

Appendix C.1 Statistical answer for Question 1

Hypothesis test for the difference between the means of two independent samples

To answer Question 1

Let,

X = price of a car for private sale

X1 = price of a car for dealer sale

Then, µ = mean price for private sale

µ1 = mean price for dealer sale

X and X1 are independent, with n = 32 and n1 = 89

Choice of test with justification

The boxplots below indicate that the distributions of the prices of private and dealer cars of the specified make and model are approximately normal.

If using a z-test

Both sample sizes are larger than 30, so the Central Limit Theorem (CLT) applies. Therefore, the sampling distribution of the difference in sample means is approximately normal, and the z-test for the difference between two independent means can be used to test whether there is a difference in the average price of private and dealer cars.

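Hypotheses

H0: µ - µ1 = 0 (there is no difference in the mean price)

H1: µ - µ1 ≠ 0 (there is a difference in the mean price)

Use a level of significance of 5%

Calculation

Excel output of the two-tail z-test for independent samples.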

Data
Hypothesized Difference                     0
Level of Significance                       0.05

Population 1 Sample
Sample Size                                 89
Sample Mean                                 15773.61798
Sample Standard Deviation                   4840.665989

Population 2 Sample
Sample Size                                 32
Sample Mean                                 11741.1875
Sample Standard Deviation                   4132.159152

Intermediate Calculations
Difference in Sample Means                  4032.430478
Standard Error of the Difference in Means   892.6741
Z Test Statistic                            4.5172

Two-Tail Test
Lower Critical Value                        -1.9600
Upper Critical Value                        1.9600
p-Value                                     0.00001
Reject the null hypothesis
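The intermediate calculations in the Excel output can be reproduced from the summary statistics. The following Python sketch is an illustration only and is not part of the original Excel working; the variable names are assumptions, and it uses the standard normal distribution from Python's standard library for the critical value and p-value.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics copied from the Excel output
n1, mean1, sd1 = 89, 15773.61798, 4840.665989   # Population 1 sample
n2, mean2, sd2 = 32, 11741.1875, 4132.159152    # Population 2 sample
alpha = 0.05
hypothesized_difference = 0.0

# Standard error of the difference in means and the z test statistic
std_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
z_stat = (mean1 - mean2 - hypothesized_difference) / std_error

# Two-tail critical value and p-value from the standard normal distribution
standard_normal = NormalDist()
critical_value = standard_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - standard_normal.cdf(abs(z_stat)))

reject_null = p_value < alpha

print(f"Standard error: {std_error:.4f}")
print(f"Z test statistic: {z_stat:.4f}")
print(f"Critical values: -{critical_value:.4f} and {critical_value:.4f}")
print(f"Two-tail p-value: {p_value:.6f}")
print("Reject the null hypothesis" if reject_null else "Do not reject the null hypothesis")
```

The test statistic lies well outside the critical values of -1.96 and 1.96, so the critical value approach and the p-value approach lead to the same decision.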

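Decision

Since the p-value = 0.00001 is less than 0.05, reject the null hypothesis at the 5% level of significance.

Conclusion

Therefore, the sample provides evidence, at the 5% level of significance, of a difference in the average price of cars for sale privately and by a used car dealer.

Answers for Question 2

Let,

X = car age (independent variable)

Y = car price (dependent variable)

From the scatter plot, the assumption of a linear relationship between price and age appears reasonable.

There is a negative relationship between price and age.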

Equation and Coefficients

From the regression output:

Price = 20986.63 - 1274.85 x Age

 

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept  20986.63      383.1138        54.77909  2.93E-86  20228.02   21745.23   20228.02     21745.23
Age        -1274.85      63.94043        -19.9381  9.66E-40  -1401.46   -1148.24   -1401.46     -1148.24

Regression Statistics
Multiple R           0.877278
R Square             0.769616
Adjusted R Square    0.76768
Standard Error       2399.539
Observations         121

Correlation coefficient

r = -0.877278 (Multiple R = 0.877278, with the negative sign taken from the slope)

r2 = 0.769616

Gradient: b1 = -1274.85 shows that for every additional year of age, the price of a Corolla in Victoria is reduced by $1,274.85 on average.

Intercept: b0 = 20986.63

If the age of the Corolla were zero, the predicted price would be $20,986.63.
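The gradient and intercept can be turned into a simple prediction rule. The Python sketch below is only an illustration, using the two coefficients from the regression output; the function name and the ages chosen are hypothetical.

```python
B0 = 20986.63   # intercept: predicted price ($) at age zero
B1 = -1274.85   # gradient: change in predicted price ($) per extra year of age


def predicted_price(age_years):
    """Predicted price ($) from the simple linear model Price = b0 + b1 x Age."""
    return B0 + B1 * age_years


# Predicted prices at a few illustrative ages within the range of the sample
for age in (0, 1, 5, 10):
    print(f"Age {age:>2} years: ${predicted_price(age):,.2f}")
```

The sample only covers cars up to 16 years old, so predictions for much older cars would be extrapolation and should be treated with caution.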

Interpretation of correlation coefficient

Correlation coefficient: r = -0.877278

r is close to -1, which shows a strong negative linear relationship between price and age.

Coefficient of determination: r2 = 0.769616 = 76.96%.

This indicates that approximately 76.96% of the variation in price is explained by age.
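The figures in the regression output are linked to each other, which gives a way to check them. The Python sketch below is an illustration only, with assumed variable names: it recovers R Square from the F statistic and the degrees of freedom in the ANOVA table, then the Adjusted R Square, and gives the correlation coefficient the sign of the slope.

```python
from math import sqrt

# Values copied from the Excel regression output for price against age
f_stat = 397.5287
df_regression = 1
df_residual = 119
n_observations = 121
slope = -1274.85

# Since F = (SSR / df_regression) / (SSE / df_residual),
# R Square = SSR / (SSR + SSE) can be written in terms of F
r_squared = (f_stat * df_regression) / (f_stat * df_regression + df_residual)

# Adjusted R Square penalises the number of independent variables
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / df_residual

# The correlation coefficient takes the sign of the slope
r = sqrt(r_squared) if slope > 0 else -sqrt(r_squared)

print(f"R Square: {r_squared:.6f}")
print(f"Adjusted R Square: {adjusted_r_squared:.5f}")
print(f"Correlation coefficient r: {r:.6f}")
```

The results match the reported R Square of 0.769616 and Adjusted R Square of 0.76768, and they confirm that the percentage of variation explained is about 76.96%.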

Question 3

Excel output for Question 3

Regression Statistics
Multiple R           0.799615
R Square             0.639384
Adjusted R Square    0.636354
Standard Error       3002.093
Observations         121

ANOVA
            df    SS         MS         F          Significance F
Regression  1     1.9E+09    1.9E+09    210.9909   4E-28
Residual    119   1.07E+09   9012561
Total       120   2.97E+09

                Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept       20485.19      482.4053        42.46469  9.67E-74  19529.98   21440.4    19529.98     21440.4
Odometer (kms)  -0.08071      0.005557        -14.5255  4E-28     -0.09172   -0.06971   -0.09172     -0.06971
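The two sets of regression statistics shown in this appendix (price against age, and price against odometer reading) can be compared side by side. The Python sketch below is only an illustration: the dictionary layout is an assumption, and choosing the model with the highest Adjusted R Square is one common criterion rather than the only way to pick the best model.

```python
# Regression statistics copied from the two Excel outputs in this appendix
models = {
    "age": {"r_squared": 0.769616, "adj_r_squared": 0.76768, "std_error": 2399.539},
    "odometer": {"r_squared": 0.639384, "adj_r_squared": 0.636354, "std_error": 3002.093},
}

# Pick the model that explains the most variation after adjusting for model size
best_model = max(models, key=lambda name: models[name]["adj_r_squared"])

for name, stats in models.items():
    print(
        f"{name:<9} Adjusted R Square = {stats['adj_r_squared']:.4f}, "
        f"Standard Error = {stats['std_error']:.1f}"
    )
print(f"Better fit on this criterion: {best_model}")
```

On these figures the age model explains more of the variation in price and has the smaller standard error, so it fits the sample better than the odometer model.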

Let, A = age (independent variable)

O = odometer reading (independent variable)

T = transmission (independent variable; 1 = Automatic, 0 = Manual)

Y = price, $ (dependent variable)

[Scatter plot: "Price and Age", showing the price of each car in the sample against its age]

                                                           Max Marks  Mark
Cover sheet or sample incorrect                            -2
Incorrect format, including file name                      -2
Statistical Inference Question 1
Choice of technique, assumptions & other required steps    5
Calculation (Excel output)                                 3
Decision and conclusion                                    2
Regression and Correlation
Assumptions and random variables defined                   2
Simple Linear Model Question 2
Scatter plot                                               3
Equation and coefficients                                  2
Interpretation of regression & correlation coefficients    2
Multiple Linear Model Question 3
Equation, Coefficients and p-values                        3.5
Interpretation of regression & correlation coefficients    3.5
Statistical Inference
Choice of technique and other required steps               2
Decision and conclusion                                    2
Best model                                                 1
Total Statistical Calculations                             31         0.0
Written Answer
Question 1
Introduction, discussion and results                       2
Question 2 & 3
Introduction                                               1
Interpretation of scatter plot                             2
Introduction and discussion of best model                  2
Structure, grammar and spelling                            2
Total Written Answer                                       9          0.0
Total Part C                                               40         0.0
