
Exploratory Data Analysis Statistics (exercises)

Aleksandra Pawłowska

May 26, 2020

Glossary

Five-number summary: An exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.
Box plot: A graphical summary of data based on a five-number summary.

Aleksandra Pawłowska Exploratory Data Analysis

Task 1

Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25. Provide the five-number summary for the data.


Task 1 – solution

Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25. Provide the five-number summary for the data.
Five-number summary: 15, 22.5, 26, 29, 34
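The slides do not say which quartile rule is used. The sketch below assumes the location-index rule i = p·n/100 (average the i-th and (i+1)-th sorted values when i is a whole number, otherwise round i up and take that value), which reproduces the answers given for Task 1 and Task 3. The names `percentile` and `five_number_summary` are hypothetical helpers, not part of any library. Other conventions, such as Excel's QUARTILE.EXC or the default of Python's `statistics.quantiles`, can give slightly different quartiles.

```python
def percentile(data, p):
    """p-th percentile (integer 0 < p < 100) via the location rule
    i = p*n/100: if i is a whole number, average the i-th and (i+1)-th
    smallest values; otherwise round i up and take that value."""
    xs = sorted(data)
    n = len(xs)
    num = p * n  # equals 100 * i, kept as an integer to avoid rounding error
    if num % 100 == 0:
        i = num // 100
        return (xs[i - 1] + xs[i]) / 2
    i = -(-num // 100)  # ceiling of num / 100
    return xs[i - 1]


def five_number_summary(data):
    """Smallest value, Q1, median, Q3, largest value."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


print(five_number_summary([27, 25, 20, 15, 30, 34, 28, 25]))
# → (15, 22.5, 26.0, 29.0, 34)
```

With n = 8, the Q1 location is 0.25 · 8 = 2, a whole number, so Q1 is the average of the 2nd and 3rd sorted values, (20 + 25) / 2 = 22.5.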


Task 2

Show the box plot for the data in Task 1 (use Excel).


Task 2 – solution

Show the box plot for the data in Task 1 (use Excel).
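Before drawing the plot in Excel, it can help to know what it should show. The hypothetical helper `box_plot_elements` below takes the quartiles from the Task 1 five-number summary as given and assumes the usual convention that the whiskers extend to the most extreme data values inside the limits Q1 − 1.5 · IQR and Q3 + 1.5 · IQR. Excel's box-and-whisker chart may compute quartiles with a different rule, so its box edges can differ slightly from 22.5 and 29.

```python
def box_plot_elements(data, q1, median, q3):
    """Return the pieces needed to draw a box plot.

    Box: from q1 to q3, with a line at the median.
    Limits: q1 - 1.5*IQR and q3 + 1.5*IQR.
    Whiskers: run to the most extreme data values still inside the limits.
    Outliers: values outside the limits, plotted individually.
    """
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower <= x <= upper]
    outliers = sorted(x for x in data if x < lower or x > upper)
    return {
        "box": (q1, median, q3),
        "limits": (lower, upper),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


data = [27, 25, 20, 15, 30, 34, 28, 25]
# Quartiles taken from the Task 1 five-number summary
elements = box_plot_elements(data, q1=22.5, median=26, q3=29)
print(elements)
```

For the Task 1 data the limits are 12.75 and 38.75. Every value falls inside them, so the whiskers run from 15 to 34 and no points are plotted as outliers.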


Task 3

Show the five-number summary and the box plot for the following data: 5, 15, 18, 10, 8, 12, 16, 10, 6.


Task 3 – solution

Show the five-number summary and the box plot for the following data: 5, 15, 18, 10, 8, 12, 16, 10, 6.
Five-number summary: 5, 8, 10, 15, 18
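The slide gives only the result. A worked version of the arithmetic, assuming the same location-index rule that reproduces the Task 1 answer (location i = p · n / 100, rounded up when it is not a whole number), together with the limits needed to draw the box plot:

```latex
\begin{aligned}
&\text{Sorted data } (n = 9):\; 5,\ 6,\ 8,\ 10,\ 10,\ 12,\ 15,\ 16,\ 18\\
&i_{Q_1} = 0.25 \cdot 9 = 2.25 \;\Rightarrow\; \text{3rd value: } Q_1 = 8\\
&i_{\text{med}} = 0.50 \cdot 9 = 4.5 \;\Rightarrow\; \text{5th value: median} = 10\\
&i_{Q_3} = 0.75 \cdot 9 = 6.75 \;\Rightarrow\; \text{7th value: } Q_3 = 15\\
&\text{IQR} = 15 - 8 = 7, \qquad 1.5 \cdot \text{IQR} = 10.5\\
&\text{lower limit} = 8 - 10.5 = -2.5, \qquad \text{upper limit} = 15 + 10.5 = 25.5
\end{aligned}
```

All nine values lie between -2.5 and 25.5, so the box plot has no outliers and its whiskers run from 5 to 18.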


Task 3 – solution


Task 4

A data set has a first quartile of 42 and a third quartile of 50. Compute the lower and upper limits for the corresponding box plot. Should a data value of 65 be considered an outlier?


Task 4 – solution

A data set has a first quartile of 42 and a third quartile of 50. Compute the lower and upper limits for the corresponding box plot. Should a data value of 65 be considered an outlier?
IQR = 50 - 42 = 8 and 1.5 × IQR = 12, so lower limit = 42 - 12 = 30 and upper limit = 50 + 12 = 62. Since 65 > 62, a value of 65 is an outlier.
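The limit rule used here (1.5 × IQR below Q1 and above Q3) can be written as two small functions. The names `box_plot_limits` and `is_outlier` are hypothetical; this is a sketch of the rule, not a library API.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """A value is an outlier if it lies strictly outside the limits."""
    lower, upper = box_plot_limits(q1, q3)
    return x < lower or x > upper


print(box_plot_limits(42, 50))  # → (30.0, 62.0)
print(is_outlier(65, 42, 50))   # → True
```

Only Q1 and Q3 are needed, which is why the task can be answered without the underlying data.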


Task 5

Annual sales, in millions of dollars, for 21 pharmaceutical companies follow.

1 Provide a five-number summary.
2 Compute the lower and upper limits.
3 Do the data contain any outliers?
4 Johnson & Johnson’s sales are the largest on the list at $14,138 million. Suppose a data entry error (a transposition) had been made and the sales had been entered as $41,138 million. Would the method of detecting outliers in part 3 identify this problem and allow for correction of the data entry error?
5 Show a box plot (use Excel).


Task 5 – solution

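1 Provide a five-number summary.
608, 1872, 4019, 8305, 14138

2 Compute the lower and upper limits.
IQR = 8305 - 1872 = 6433 and 1.5 × IQR = 9649.5, so lower limit = 1872 - 9649.5 = -7777.5 and upper limit = 8305 + 9649.5 = 17954.5

3 Do the data contain any outliers?
No. The smallest value (608) is above the lower limit and the largest value (14,138) is below the upper limit, so the data contain no outliers.

4 Johnson & Johnson’s sales are the largest on the list at $14,138 million. Suppose a data entry error (a transposition) had been made and the sales had been entered as $41,138 million. Would the method of detecting outliers in part 3 identify this problem and allow for correction of the data entry error?
Yes. Changing only the largest value does not change Q1 or Q3, so the limits stay the same. Because 41,138 is above the upper limit of 17,954.5, it would be flagged as an outlier; reviewing the flagged value would reveal the entry error so it could be corrected.

5 Show a box plot.
See next slide.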
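The outlier check in parts 3 and 4 can be sketched using only the quartiles from the five-number summary. The full list of 21 sales figures is not reproduced here, so the sketch checks only the smallest and largest values; that is enough, because if neither extreme lies outside the limits, no value between them can. The names `limits` and `flag_outliers` are hypothetical, and the sketch assumes Q1 and Q3 are unchanged by the transposition, since only the largest value changes.

```python
# Quartiles from the five-number summary (millions of dollars)
Q1, Q3 = 1872, 8305


def limits(q1, q3):
    """Lower and upper limits: 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_outliers(values, q1, q3):
    """Return the values lying strictly outside the limits."""
    lower, upper = limits(q1, q3)
    return [v for v in values if v < lower or v > upper]


print(limits(Q1, Q3))                       # → (-7777.5, 17954.5)
print(flag_outliers([608, 14138], Q1, Q3))  # correct entry → []
print(flag_outliers([608, 41138], Q1, Q3))  # transposed entry → [41138]
```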


Task 5 – solution
