Assignment 4


Assignment 4 Part 2

(Deadline: Dec 4th 11:59 p.m. EST)

Marks for each part of each question are indicated in brackets in the questions below. Please see
the course outline for the Part 2 Marking Scheme for penalties.

For questions 2 and 3, a total of 2 bonus marks are available for presenting the answers in the
PLAN/DO/REPORT format.

PLAN: (1 bonus mark) In one or two sentences, state the overall business objective.

DO: (no bonus marks, but marks for calculations as asked in the questions) Do the numerical
calculations asked in the question.

REPORT: (1 bonus mark) Relate the numerical results from “DO” to the business objective given in
“PLAN.” One sentence for each numerical result.

Total marks: 25. This assignment is worth 5% of the course mark.

Question 1. [12.5 pts, 5 parts] Please download the dataset “economic_dashboard” in the
desired format and answer the following questions using your preferred software.

Statistics Canada releases monthly data to measure Canada’s economic health. The numbers in

the dataset are from Tables 36-10-0434-01, 18-10-0004-01, 14-10-0289-01, and 20-10-0008-01.

The data show the monthly Gross Domestic Product (GDP) in millions of CAD (CAD = Canadian
dollars), the Consumer Price Index, the actual hours¹ worked at the main job in thousands of
hours, and sales in the retail sector in thousands of CAD:

Variable name   Definition
Date            Month-year
CPI             Consumer Price Index, monthly
Hours           Actual hours worked at main job, monthly (x 1,000)²
GDP             Gross domestic product (GDP) at basic prices, monthly (x 1,000,000 CAD)
Retail          Retail trade sales, monthly (x 1,000 CAD)

Note: figures for GDP and retail sales are not yet reported for the month of September.
These are missing values. Missing values do not appear in your graphs and calculations.

¹ Number of hours actually worked by the respondent during the reference week, including paid and unpaid hours.

² A value of 1 represents 1,000 hours. For example, the value of 621,884 in the first row represents more than 621 million hours.

Q1.a. [3 points] A student in ADM2303 wants to use this dataset to understand the association
between CPI and GDP and the association between GDP and hours worked at the main job.
Prepare two scatter plots to show these associations (copy them into your solution and label them
Figure 1 and Figure 2) and then describe the association between the variables in each plot.
(Note! In the marking scheme, points will be deducted for plots with no caption and/or no
axis labels.)

Q1.b. [2 points] The graph below shows the association between GDP and sales in the retail sector.

Figure 3

Using software, calculate the correlation coefficient between GDP and sales in the retail sector and
report it. Then, comment on how the association between these two variables is reflected in the
correlation coefficient (discuss strength and direction).

Q1.c. [2.5 points] Prepare a histogram and a boxplot for hours worked at the main job. Copy your
plots and label them Figure 4 and Figure 5. Then, comment on the data distribution. Also, explain
whether you see any unusual observations.

Q1.d. [4 points] Using software, complete the following table (note that M. CAD denotes
millions of CAD, and K hours denotes thousands of hours):

Table 1 - Summary statistics

                     GDP (M. CAD)   Hours of work at the main job (K hours)
Mean
Median
Standard deviation
IQR
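
The four statistics in Table 1 can be computed with most tools. A minimal sketch using Python's standard library is shown below. The values are hypothetical, not the assignment data, and the names `hours` and `summary` are assumptions made for illustration.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical values in thousands of hours, NOT from the dataset.
hours = [610.0, 615.0, 621.0, 598.0, 640.0, 633.0, 627.0, 450.0]

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, _, q3 = quantiles(hours, n=4)

summary = {
    "mean": mean(hours),
    "median": median(hours),
    "sd": stdev(hours),  # sample standard deviation (n - 1 denominator)
    "iqr": q3 - q1,      # interquartile range
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Software packages use different quartile conventions, so the IQR reported by another tool may differ slightly from this one.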

Q1.e. [1 point] Based on Figures 4 and 5, which summary statistics are more suitable for describing
the distribution of hours worked at the main job? Explain why.

Question 2. [6.5 pts, 4 parts] Suppose an analyst at Ottawa Hospital wants to compare the mean
per-patient cost of diagnostic imaging (DI) between two campuses: Civic and General. This cost
varies from patient to patient because the equipment and the time each patient needs at the
hospital vary. The analyst looks at the distribution of per-patient cost at each campus and finds
both distributions slightly skewed to the right. The analyst randomly selects the costs for 50
patients from DI at the Civic campus and 30 patients from DI at the General campus.

Q2.a. [1 point] What do you think the histogram of the mean costs of imaging at the Civic campus
would look like? Justify your answer.

Q2.b. [2 points] Suppose the analyst’s calculations show that the mean cost of DI per patient at
the Civic campus is 213 CAD with a standard deviation of 25 CAD. What is the probability that the
mean cost of DI for the 50 randomly selected patients is more than 205 CAD?
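
One way to set up this kind of calculation is sketched below in Python. It assumes that the stated mean and standard deviation are treated as the population values and that the sample mean is approximately normal with standard error sigma / sqrt(n). The function name `prob_sample_mean_exceeds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) under a normal model for the sample mean.

    Assumes mu and sigma are the population mean and standard deviation,
    and that n is large enough for the normal model to be reasonable.
    """
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)


# Values from the question, used under the assumptions stated above.
p = prob_sample_mean_exceeds(threshold=205, mu=213, sigma=25, n=50)
print(f"P(sample mean > 205) = {p:.4f}")
```

The same function applies to other thresholds and sample sizes, provided the normal model for the sample mean is justified.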

Q2.c. [2.5 points] Suppose the analyst uses the data from 30 patients randomly selected from the
General campus and sees that the mean DI cost is 220 CAD with a standard error of 28 CAD. What
is the probability that the mean cost of DI for patients at the General campus is more than the
mean cost of DI at the Civic campus (213 CAD)? For the purposes of answering this question, you
may assume that 213 CAD is a constant.

Q2.d. [1 point] What assumptions did you make about the data distribution in Q2.c in order to
solve the question?

Question 3. [6 pts, 3 parts] In the recent U.S. presidential election, 49.9% of voters in
Pennsylvania voted for Joe Biden (as of Saturday, Nov 14th, 2020; this figure might change once
ballot counting is complete).

Q3.a. [2 points] What is the probability that in a group of 1000 randomly selected voters from
Pennsylvania, we find at least 500 voters who voted for Joe Biden?

Q3.b. [1 point] What assumptions did you make in order to proceed with the calculations in Q3.a?

Q3.c. [3 points] What is the probability that in a group of 1000 randomly selected voters from
Pennsylvania, the proportion of individuals who voted for Joe Biden is between 0.5 and 0.51?
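
Questions about a sample proportion like the ones above are often approached with a normal model for the proportion, with mean p and standard deviation sqrt(p(1 - p)/n). The Python sketch below illustrates that model and compares it with an exact binomial sum. It assumes independent voters and uses no continuity correction; the helper names `proportion_model` and `exact_at_least` are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def proportion_model(p, n):
    """Normal model for a sample proportion (assumes independent draws)."""
    return NormalDist(p, sqrt(p * (1 - p) / n))


def exact_at_least(k, n, p):
    """Exact binomial probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


model = proportion_model(p=0.499, n=1000)

# Normal approximation, without a continuity correction.
approx_at_least_half = 1 - model.cdf(0.5)
approx_between = model.cdf(0.51) - model.cdf(0.5)

# Exact binomial value for comparison with the approximation.
exact = exact_at_least(500, 1000, 0.499)

print(f"normal approx, proportion >= 0.5:       {approx_at_least_half:.4f}")
print(f"normal approx, 0.5 <= proportion <= 0.51: {approx_between:.4f}")
print(f"exact binomial, X >= 500:               {exact:.4f}")
```

The approximate and exact values differ a little because the count is discrete; a continuity correction would narrow the gap.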