Statistics assignment

Assignment 4 Part 2

(Deadline: Dec 4th 11:59 p.m. EST)

Marks for each part of each question are indicated in brackets in the questions below. Please see course outline for Part 2 Marking Scheme for penalties.

For questions 2 and 3, there are a total of 2 bonus marks for presenting your answers in the PLAN/DO/REPORT format.

PLAN: (1 bonus mark) In one or two sentences state the overall business objective.

DO: (no bonus marks, but marks for calculations as asked in the questions). Do the numerical calculations asked for in the question.

REPORT: (1 bonus mark) Relate the numerical results from “DO” to the business objective given in “PLAN.” One sentence for each numerical result.

Total marks: 25. This assignment is worth 5% of the course mark. The maximum grade is 25 marks. If the total grade adds up to more than 25 marks after including the bonus marks, the additional marks will not be carried over.

Question 1. [12.5 pts, 5 parts] Please download the dataset “economic_dashboard” in the desired format to answer the following questions using your preferred software.

Statistics Canada releases monthly data to measure Canada’s economic health. The numbers in the dataset are from Tables 36-10-0434-01, 18-10-0004-01, 14-10-0289-01, and 20-10-0008-01. The data show the monthly Gross Domestic Product (GDP) in millions of CAD (CAD = Canadian dollars), the Consumer Price Index, the actual hours worked at the main job in thousands of hours, and sales in the retail sector in thousands of CAD. Actual hours are the number of hours actually worked by the respondent during the reference week, including paid and unpaid hours.

Variable name    Definition
Date             Month-year
CPI              Consumer Price Index, monthly
Hours            Actual hours worked at main job, monthly (x 1,000)
GDP              Gross domestic product (GDP) at basic prices, monthly (x 1,000,000 CAD)
Retail           Retail trade sales, monthly (x 1,000 CAD)

Note on Hours: a value of 1 represents 1,000 hours. For example, the value of 621,884 in the first row represents more than 621 million hours.

Note: figures for GDP and Sales in retail sector are not yet reported for the month of September. These are missing values. Missing values do not appear in your graphs and calculations.

Q1.a. [3 points] A student in ADM2303 wants to use this dataset to understand the association between CPI and GDP and the association between GDP and Hours worked in the main job. Prepare two scatter plots to show these associations (copy them into your solution and label them Figure 1 and Figure 2) and then describe the association between the variables in each plot.

(Note! In the marking scheme, points will be deducted for plots with no caption and/or no axis labels.)

Q1.b. [2 points] The graph below shows the association between GDP and sales in the retail sector.

Figure 3

Using software, calculate the correlation coefficient between GDP and sales in the retail sector and report it. Then comment on how the association between these two variables is reflected in the correlation coefficient (discuss strength and direction).
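As an illustration of what the software is computing here, the following is a minimal sketch of the Pearson correlation coefficient using only the Python standard library. The function name `pearson_r` and the data values are hypothetical and are not taken from the economic_dashboard dataset. The sketch assumes missing values are stored as `None` and drops any pair with a missing value, in line with the note that missing values do not enter the calculations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, skipping pairs where either value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, NOT the assignment data.
# The last pair is dropped because the first series has a missing value.
gdp = [1.0, 2.0, 3.0, 4.0, None]
retail = [2.1, 3.9, 6.2, 8.0, 9.5]
r = pearson_r(gdp, retail)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear association, and the sign gives the direction.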

Q1.c. [2.5 points] Prepare a histogram and a boxplot for hours of work at the main job. Copy your plots and label them Figure 4 and Figure 5. Then comment on the data distribution. Also explain whether you see any unusual observations.
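One common way to flag unusual observations on a boxplot is the 1.5 x IQR rule. The sketch below assumes that convention, which may differ from what your software or the course uses. The function name `tukey_outliers` and the data values are hypothetical, and quartile definitions vary between software packages, so the fences may differ slightly from those your tool draws.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing (None) entries."""
    clean = sorted(v for v in values if v is not None)
    # statistics.quantiles uses the "exclusive" method by default;
    # other software may use a different quartile convention.
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < lower or v > upper]

# Made-up illustrative values, NOT the assignment data.
print(tukey_outliers([10, 12, 14, 16, 18, 60]))
```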

Q1.d. [4 points] Using software, complete the following table (note that M. CAD denotes millions of CAD, and K hours denotes thousands of hours):

Table 1 - Summary statistics

                      GDP (M. CAD)    Hours of work at the main job (K hours)
Mean
Median
Standard deviation
IQR
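The four statistics in Table 1 can be computed in most software. The following is a minimal sketch in standard-library Python, assuming missing values are stored as `None`. The function name `summarize` and the data values are hypothetical. It uses the sample standard deviation (n - 1 in the denominator) and the default quartile method of `statistics.quantiles`; other software may use a different quartile convention and report a slightly different IQR.

```python
import statistics

def summarize(values):
    """Mean, median, sample standard deviation, and IQR, ignoring missing (None) entries."""
    clean = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(clean, n=4)  # default method: "exclusive"
    return {
        "mean": statistics.mean(clean),
        "median": statistics.median(clean),
        "sd": statistics.stdev(clean),  # sample standard deviation (n - 1)
        "iqr": q3 - q1,
    }

# Made-up illustrative values, NOT the assignment data.
summary = summarize([10, 12, 14, 16, 18, None])
print(summary)
```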

Q1.e. [1 point] Based on Figures 4 and 5, which summary statistics are more suitable for describing the data distribution of hours of work at the main job? Explain why.

Question 2. [6.5 pts, 4 parts] Suppose an analyst at Ottawa Hospital wants to compare the mean per-patient costs of diagnostic imaging (DI) between two campuses: Civic and General. This cost varies from patient to patient because the equipment and the time each patient needs at the hospital vary. The analyst looks at the data distribution of per-patient cost at each campus and finds both distributions slightly skewed to the right. The analyst randomly selects the costs for 50 patients from DI at the Civic campus and 30 patients from DI at the General campus.

Q2.a. [1 point] What do you think the histogram of the mean costs of imaging at the Civic campus would look like? Justify your answer.

Q2.b. [2 points] Suppose the analyst’s calculations show that the mean cost of DI per patient at the Civic campus is 213 CAD with a standard deviation of 25 CAD. What is the probability that the mean cost of DI for the 50 randomly selected patients is more than 205 CAD?
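Questions of this type are usually approached through the sampling distribution of the mean. The sketch below assumes the sample mean is approximately normal with mean mu and standard error sigma / sqrt(n), and treats the stated mean and standard deviation as known population values. The function name `prob_sample_mean_above` and the numbers in the example call are hypothetical and deliberately differ from those in the question.

```python
import math
from statistics import NormalDist

def prob_sample_mean_above(threshold, mu, sigma, n):
    """P(sample mean > threshold), assuming the sample mean is ~ Normal(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    return 1.0 - NormalDist(mu, se).cdf(threshold)

# Made-up illustrative values, NOT the assignment's numbers:
# se = 12 / sqrt(36) = 2, so the threshold sits one standard error below mu.
p_above = prob_sample_mean_above(threshold=98.0, mu=100.0, sigma=12.0, n=36)
print(round(p_above, 4))
```

When a standard error is given directly rather than a standard deviation, the same normal calculation applies with that standard error used in place of sigma / sqrt(n).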

Q2.c. [2.5 points] Suppose the analyst uses the data from 30 patients randomly selected from the General campus and sees that the mean DI cost is 220 CAD with a standard error of 28 CAD. What is the probability that the mean cost of DI for patients at the General campus is more than the mean cost of DI at the Civic campus (213 CAD)? For the purposes of answering this question, you may assume that 213 CAD is a constant.

Q2.d. [1 point] What assumptions did you make about the data distribution in Q2.c to be able to solve the question?

Question 3. [6 pts, 3 parts] In the recent U.S. presidential election, 49.9% of voters in Pennsylvania voted for Joe Biden (as of Saturday, Nov 14th, 2020; this figure might change once ballot counting is complete).

Q3.a. [2 points] What is the probability that, in a group of 1,000 randomly selected voters from Pennsylvania, we find at least 500 voters who voted for Joe Biden?
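A count of "successes" in n independent draws can be handled either with the exact binomial distribution or with a normal approximation. The sketch below shows both side by side, under the assumption that voters are sampled independently with a constant probability p. The function names and the numbers in the example are hypothetical and are not the assignment's values. Whether to apply the continuity correction shown here depends on the convention used in the course.

```python
import math
from statistics import NormalDist

def binom_tail_exact(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

def binom_tail_normal(n, k, p):
    """Normal approximation to P(X >= k), with a continuity correction of 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return 1.0 - NormalDist(mean, sd).cdf(k - 0.5)

# Made-up illustrative values, NOT the assignment's numbers.
exact = binom_tail_exact(n=20, k=12, p=0.5)
approx = binom_tail_normal(n=20, k=12, p=0.5)
print(round(exact, 4), round(approx, 4))
```

The two results should be close when both n*p and n*(1 - p) are reasonably large.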

Q3.b. [1 point] What assumptions did you make in order to proceed with calculations in Q3.a?

Q3.c. [3 points] What is the probability that, in a group of 1,000 randomly selected voters from Pennsylvania, the proportion of individuals who voted for Joe Biden is between 0.5 and 0.51?
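For a sample proportion, a common approach is to treat it as approximately normal with mean p and standard error sqrt(p(1 - p) / n). The sketch below assumes that approximation holds (independent draws and a large enough sample) and uses no continuity correction. The function name `prob_proportion_between` and the example numbers are hypothetical and differ from those in the question.

```python
import math
from statistics import NormalDist

def prob_proportion_between(lo, hi, p, n):
    """P(lo < sample proportion < hi), assuming p_hat ~ Normal(p, sqrt(p * (1 - p) / n))."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the sample proportion
    dist = NormalDist(p, se)
    return dist.cdf(hi) - dist.cdf(lo)

# Made-up illustrative values, NOT the assignment's numbers:
# se = sqrt(0.4 * 0.6 / 600) = 0.02, so the interval is p plus or minus one standard error.
p_between = prob_proportion_between(lo=0.38, hi=0.42, p=0.4, n=600)
print(round(p_between, 4))
```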

[Figure 3: Scatterplot of GDP vs. sales in retail sector. Vertical axis: GDP in M. CAD (ticks from 1.6e+06 to 2.0e+06). Horizontal axis: sales in retail sector in K. CAD (ticks from 3.5e+07 to 5.5e+07).]