Assignment 4
Assignment 4 Part 2
(Deadline: Dec 4th 11:59 p.m. EST)
Marks for each part of each question are indicated in brackets in the questions below. Please see
the Part 2 Marking Scheme in the course outline for penalties.
For Questions 2 and 3, a total of 2 bonus marks are available for presenting your answers in the
PLAN/DO/REPORT format.
PLAN: (1 bonus mark) In one or two sentences state the overall business
objective.
DO: (no bonus marks, but marks for calculations as asked in the
questions) Do the numerical calculations asked for in the question.
REPORT: (1 bonus mark) Relate the numerical results from “DO” to the
business objective given in “PLAN.” One sentence for each numerical result.
Total marks: 25. This assignment is worth 5% of the course mark.
Question 1. [12.5 pts, 5 parts] Please download the dataset “economic_dashboard” in the
desired format to answer the following questions using your preferred software.
Statistics Canada releases monthly data to measure Canada’s economic health. The numbers in
the dataset are from Tables 36-10-0434-01, 18-10-0004-01, 14-10-0289-01, and 20-10-0008-01.
The data show monthly gross domestic product (GDP) in millions of CAD (CAD = Canadian
dollars), the Consumer Price Index (CPI), actual hours worked at the main job in thousands of
hours (see footnote 1), and retail trade sales in thousands of CAD:
Variable name   Definition
Date            Month-year
CPI             Consumer Price Index, monthly
Hours           Actual hours worked at main job, monthly (x 1,000) (see footnote 2)
GDP             Gross domestic product (GDP) at basic prices, monthly (x 1,000,000 CAD)
Retail          Retail trade sales, monthly (x 1,000 CAD)
Note: figures for GDP and retail trade sales are not yet reported for the month of September.
These are missing values. Missing values will not appear in your graphs and calculations.
1 Number of hours actually worked by the respondent during the reference week, including paid and unpaid hours.
2 A value of 1 represents 1,000 hours. For example, the value of 621,884 in the first row represents 621,884,000 hours (more than 600 million hours).
Q1.a. [3 points] A student in ADM2303 wants to use this dataset to understand the association
between CPI and GDP and the association between GDP and hours worked at the main job.
Prepare two scatter plots to show these associations (copy them into your solution and label them
Figure 1 and Figure 2), and then describe the association between the variables in each plot.
(Note: under the marking scheme, points will be deducted for plots with no caption and/or no
axis labels.)
Q1.b. [2 points] The graph below shows the association between GDP and sales in the retail sector.
Figure 3
Using software, calculate the correlation coefficient between GDP and sales in the retail sector and
report it. Then comment on how the association between these two variables is reflected in the
correlation coefficient (discuss strength and direction).
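As a reference for what statistical software computes here, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample values are hypothetical and are not taken from the economic_dashboard dataset; the sketch assumes missing values are stored as `None` and drops any pair in which either value is missing.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation, skipping pairs where either value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only; NOT the assignment data.
gdp = [100.0, 102.0, 105.0, 103.0, 108.0, None]
retail = [50.0, 51.5, 53.0, 52.0, 55.0, None]

r = pearson_r(gdp, retail)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the linear association, and its distance from 0 (toward -1 or 1) gives the strength.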
Q1.c. [2.5 points] Prepare a histogram and a boxplot for hours of work at the main job. Copy your plots
and label them Figure 4 and Figure 5. Then comment on the data distribution. Also, state whether you
see any unusual observations and explain.
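Many boxplot implementations flag a point as unusual when it lies more than 1.5 x IQR beyond the quartiles. The sketch below applies that rule in plain Python. The function name `boxplot_outliers` and the sample values are hypothetical, and the quartile method used here (the default of `statistics.quantiles`) is an assumption that may differ slightly from the one your software uses.

```python
from statistics import quantiles

def boxplot_outliers(values):
    """Return the values lying outside the 1.5 x IQR fences, ignoring None."""
    data = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in data if v < lower_fence or v > upper_fence]

# Hypothetical values for illustration only; NOT the assignment data.
sample = [10, 11, 12, 13, 14, 15, 16, 100, None]
print(boxplot_outliers(sample))
```

A flagged point is only a candidate for an unusual observation; whether it is genuinely unusual still requires judgment in the context of the data.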
Q1.d. [4 points] Using software, complete the following table (note that M. CAD denotes
millions of CAD, and K hours denotes thousands of hours):
Table 1 - Summary statistics
                      GDP (M. CAD)    Hours of work at the main job (K hours)
Mean
Median
Standard deviation
IQR
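The four statistics in Table 1 can be sketched with Python's standard library as follows. The function name `summarize` and the sample values are hypothetical. The sketch assumes missing values are stored as `None` and are dropped first, and it uses the sample (n - 1) standard deviation. Quartile conventions vary between software packages, so the IQR reported by your software may differ slightly.

```python
from statistics import mean, median, stdev, quantiles

def summarize(values):
    """Mean, median, sample standard deviation, and IQR, ignoring None."""
    data = [v for v in values if v is not None]
    q1, _, q3 = quantiles(data, n=4)  # quartiles, default "exclusive" method
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": stdev(data),  # sample standard deviation (divides by n - 1)
        "iqr": q3 - q1,
    }

# Hypothetical values for illustration only; NOT the assignment data.
stats = summarize([2, 4, 4, 4, 5, 5, 7, 9, None])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```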
Q1.e. [1 point] Based on Figures 4 and 5, which summary statistics are more suitable for describing
the distribution of hours of work at the main job? Explain why.
Question 2. [6.5 pts, 4 parts] Suppose an analyst at Ottawa Hospital wants to compare the mean
per-patient cost of diagnostic imaging (DI) between two campuses: Civic and General. This cost
varies from patient to patient because the equipment and the time each patient needs at the hospital vary.
The analyst looks at the distribution of per-patient cost at each campus and finds both
distributions slightly skewed to the right. The analyst randomly selects the costs for 50 patients
from DI at the Civic campus and 30 patients from DI at the General campus.
Q2.a. [1 point] What do you expect the distribution to look like in a histogram of the mean
costs of imaging at the Civic campus? Justify your answer.
Q2.b. [2 points] Suppose the analyst’s calculations show that the mean cost of DI per patient at
the Civic campus is 213 CAD with a standard deviation of 25 CAD. What is the probability that the mean
cost of DI for the 50 randomly selected patients is more than 205 CAD?
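Questions of this kind are usually handled with the sampling distribution of the mean: when the Central Limit Theorem applies, the sample mean is approximately Normal with mean mu and standard deviation sigma / sqrt(n). The sketch below illustrates that calculation. The function name `prob_sample_mean_above` and the numbers used are hypothetical and are deliberately not the values from the question.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(mu, sigma, n, threshold):
    """P(sample mean > threshold), assuming the sample mean is approximately
    Normal(mu, sigma / sqrt(n)) (Central Limit Theorem)."""
    standard_error = sigma / sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)

# Hypothetical values for illustration only: standard error = 20 / 5 = 4,
# so the threshold of 104 sits one standard error above the mean (z = 1).
p = prob_sample_mean_above(mu=100.0, sigma=20.0, n=25, threshold=104.0)
print(round(p, 4))  # about 0.1587
```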
Q2.c. [2.5 points] Suppose the analyst uses the data from 30 patients randomly selected from
the General campus and sees that the mean DI cost is 220 CAD with a standard error of 28 CAD. What
is the probability that the mean cost of DI for patients at the General campus is more than the mean cost
of DI at the Civic campus (213 CAD)? For the purposes of answering this question, you may assume
that 213 CAD is a constant.
Q2.d. [1 point] What assumptions did you make about the data distribution in Q2.c in order to solve
the question?
Question 3. [6 pts, 3 parts] In the recent U.S. presidential election, 49.9% of voters in
Pennsylvania voted for Joe Biden (as of Saturday, Nov 14th, 2020; this figure might change once
the ballot counting is complete).
Q3.a. [2 points] What is the probability that, in a group of 1,000 randomly selected voters from
Pennsylvania, we find at least 500 voters who voted for Joe Biden?
Q3.b. [1 point] What assumptions did you make in order to proceed with the calculations in Q3.a?
Q3.c. [3 points] What is the probability that, in a group of 1,000 randomly selected voters from
Pennsylvania, the proportion of individuals who voted for Joe Biden is between 0.5 and 0.51?
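Probabilities about a sample proportion are commonly approximated with a Normal model whose mean is p and whose standard deviation is sqrt(p(1 - p) / n). The sketch below illustrates that approach. The function name `prob_proportion_between`, the numbers used, and the "at least 10 expected successes and failures" check are assumptions for illustration (a common rule of thumb, not necessarily the condition used in this course), and the values are not those from the question.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_between(p, n, low, high):
    """P(low < sample proportion < high) under the Normal approximation."""
    # Rule-of-thumb check (an assumption): expect at least 10 successes and failures.
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("Normal approximation may be poor for this p and n")
    standard_error = sqrt(p * (1 - p) / n)
    dist = NormalDist(p, standard_error)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical values for illustration only: standard error = 0.05,
# so the interval spans one standard error on each side of p.
prob = prob_proportion_between(p=0.5, n=100, low=0.45, high=0.55)
print(round(prob, 4))  # about 0.6827
```

A question stated in terms of a count (for example, at least k voters out of n) can be converted to a proportion by dividing the count by n before applying the same model.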