statistics exam
Exploratory Data Analysis Statistics (exercises)
Aleksandra Pawłowska
May 26, 2020
Glossary
Five-number summary: An exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.
Box plot: A graphical summary of data based on a five-number summary.
Aleksandra Pawłowska Exploratory Data Analysis
Task 1
Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25. Provide the five-number summary for the data.
Task 1 – solution
Consider a sample with data values of 27, 25, 20, 15, 30, 34, 28, and 25. Provide the five-number summary for the data.
Five-number summary: 15, 22.5, 26, 29, 34
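The answer above can be checked with a short sketch. The slides do not say which quartile convention is used. The code assumes the location rule i = (p/100) · n on the sorted data: when i is a whole number, average the values in positions i and i + 1, otherwise round i up and take that position. This rule reproduces the stated answers for Task 1 and Task 3, and other conventions (for example Excel's built-in quartile functions) can give different quartiles. The function names are illustrative.

```python
def percentile(sorted_values, p):
    """p-th percentile (0 < p < 100) via the location rule i = (p / 100) * n.

    Assumed convention, chosen because it matches the answers on the slides.
    """
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)  # integer arithmetic avoids float issues
    if remainder == 0:
        # i is a whole number: average positions i and i + 1 (1-based)
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # i is not whole: round up and take that position (1-based)
    return sorted_values[whole]


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    values = sorted(data)
    return (
        values[0],
        percentile(values, 25),
        percentile(values, 50),
        percentile(values, 75),
        values[-1],
    )


task1 = [27, 25, 20, 15, 30, 34, 28, 25]
print(five_number_summary(task1))  # (15, 22.5, 26.0, 29.0, 34)
```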
Task 2
Show the box plot for the data in Task 1 (use Excel).
Task 2 – solution
Show the box plot for the data in Task 1 (use Excel).
Task 3
Show the five-number summary and the box plot for the following data: 5, 15, 18, 10, 8, 12, 16, 10, 6.
Task 3 – solution
Show the five-number summary and the box plot for the following data: 5, 15, 18, 10, 8, 12, 16, 10, 6.
Five-number summary: 5, 8, 10, 15, 18
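The quantities needed to draw the box plot by hand can be derived from the summary. The sketch below takes Q1 = 8 and Q3 = 15 from the answer above and assumes the 1.5 × IQR rule for the limits, which is consistent with the Task 4 answer. The whiskers run to the smallest and largest data values inside the limits. The function name and the dictionary keys are illustrative.

```python
def box_plot_elements(data, q1, q3, k=1.5):
    """Limits, whisker ends, and outliers for a box plot (1.5 * IQR rule assumed)."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "whisker_low": min(inside),   # smallest value inside the limits
        "whisker_high": max(inside),  # largest value inside the limits
        "outliers": sorted(x for x in data if x < lower_limit or x > upper_limit),
    }


task3 = [5, 15, 18, 10, 8, 12, 16, 10, 6]
elements = box_plot_elements(task3, q1=8, q3=15)
print(elements["lower_limit"], elements["upper_limit"])    # -2.5 25.5
print(elements["whisker_low"], elements["whisker_high"])   # 5 18
print(elements["outliers"])                                # []
```

Since no value falls outside the limits, the whiskers end at the smallest and largest values, 5 and 18.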
Task 3 – solution (box plot)
Task 4
A data set has a first quartile of 42 and a third quartile of 50. Compute the lower and upper limits for the corresponding box plot. Should a data value of 65 be considered an outlier?
Task 4 – solution
A data set has a first quartile of 42 and a third quartile of 50. Compute the lower and upper limits for the corresponding box plot. Should a data value of 65 be considered an outlier?
IQR = 50 - 42 = 8, so lower limit = 42 - 1.5 × 8 = 30 and upper limit = 50 + 1.5 × 8 = 62. Since 65 > 62, the value 65 is an outlier.
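The same arithmetic as a minimal sketch, assuming the usual 1.5 × IQR rule (the multiplier is not stated on the slide, but it reproduces the limits 30 and 62). The function name `box_plot_limits` is illustrative.

```python
def box_plot_limits(q1, q3, k=1.5):
    """Return (lower limit, upper limit) as Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = box_plot_limits(42, 50)
print(lower, upper)               # 30.0 62.0
print(65 < lower or 65 > upper)   # True: 65 lies above the upper limit
```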
Task 5
Annual sales, in millions of dollars, for 21 pharmaceutical companies follow.
1 Provide a five-number summary.
2 Compute the lower and upper limits.
3 Do the data contain any outliers?
4 Johnson & Johnson’s sales are the largest on the list at $14,138 million. Suppose a data entry error (a transposition) had been made and the sales had been entered as $41,138 million. Would the method of detecting outliers in part 3 identify this problem and allow for correction of the data entry error?
5 Show a box plot (use Excel).
Task 5 – solution
1 Provide a five-number summary. 608, 1872, 4019, 8305, 14138
2 Compute the lower and upper limits. IQR = 8305 - 1872 = 6433, so lower limit = 1872 - 1.5 × 6433 = -7777.5 and upper limit = 8305 + 1.5 × 6433 = 17954.5
3 Do the data contain any outliers? No. The smallest value (608) and the largest value (14138) both lie between the limits.
4 Johnson & Johnson’s sales are the largest on the list at $14,138 million. Suppose a data entry error (a transposition) had been made and the sales had been entered as $41,138 million. Would the method of detecting outliers in part 3 identify this problem and allow for correction of the data entry error? Yes. 41138 exceeds the upper limit of 17954.5, so it would be flagged as an outlier, and reviewing the flagged value would reveal the data entry error.
5 Show a box plot. See next slide
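Parts 3 and 4 can be checked with a short sketch that uses only the quartiles from part 1, again assuming the 1.5 × IQR rule. The quartiles stay the same under the transposition: Johnson & Johnson's value is the largest either way, so the order of the other 20 values, and with it Q1 and Q3, does not change. The function name `is_outlier` is illustrative.

```python
def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside [Q1 - k * IQR, Q3 + k * IQR]."""
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


Q1, Q3 = 1872, 8305  # quartiles from part 1

print(is_outlier(14138, Q1, Q3))  # False: the correct value is within the limits
print(is_outlier(41138, Q1, Q3))  # True: the transposed value is flagged
```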
Task 5 – solution (box plot)