Data analytics
31. There is no consistent way of defining an outlier that
everyone agrees upon. For example, some people define
an outlier as any observation more than three standard
deviations from the mean. Other people use the box
plot definition, where an outlier (moderate or extreme)
is any observation more than 1.5 IQR from the edges of
the box, and some people care only about the extreme
box plot-type outliers, those that are more than 3.0 IQR
from the edges of the box. The file P02_18.xlsx contains daily
percentage changes in the S&P 500 index over several
years. Identify outliers—days when the percentage
change was unusually large in either a negative or positive
direction—according to each of these three definitions.
Which definition produces the most outliers?
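The three definitions in Problem 31 can also be compared outside Excel. The following is a minimal Python sketch, not the textbook's method. It assumes the sample standard deviation and "inclusive" quartiles (the convention used by Excel's QUARTILE.INC); the function name outlier_flags and the sample values are made up for illustration and are not the contents of P02_18.xlsx.

```python
import statistics


def outlier_flags(values):
    """Return the outliers found under each of the three definitions."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    # Assumption: inclusive quartiles, so Q1 and Q3 are the edges of the box.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    def beyond_box(k):
        # Observations more than k IQRs below Q1 or above Q3.
        return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

    return {
        "3_sd": [x for x in values if abs(x - mean) > 3 * sd],
        "1.5_iqr": beyond_box(1.5),
        "3.0_iqr": beyond_box(3.0),
    }


# Made-up daily percentage changes, for illustration only.
sample = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.2, 0.2, 5.0, -1.0]
flags = outlier_flags(sample)
counts = {name: len(found) for name, found in flags.items()}
print(counts)
```

Because the 3.0 IQR fences lie outside the 1.5 IQR fences, every observation flagged by the 3.0 IQR rule is also flagged by the 1.5 IQR rule. How the three-standard-deviation rule compares depends on the shape of the data.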
37. The file P02_35.xlsx contains data from a survey of
500 randomly selected households. Use Excel filters to
answer the following questions.
a. Identify households that own their home and have a
monthly home mortgage payment in the top quartile
of the monthly payments for all households.
b. Identify households with monthly expenditures on
utilities that are within two standard deviations of
the mean monthly expenditure on utilities for all
households.
c. Identify households with total indebtedness (excluding
home mortgage) less than 10% of the household’s
primary annual income level.
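The filter conditions in Problem 37 can be expressed as simple predicates. The following is a hedged Python sketch rather than the Excel procedure the problem asks for. The field names (owns_home, payment, utilities, debt, income) and the eight rows are hypothetical; the actual columns in P02_35.xlsx may be named and coded differently. It also assumes that "top quartile" means at or above the third quartile computed over all households.

```python
import statistics

# Made-up rows with hypothetical field names, for illustration only.
households = [
    {"id": 1, "owns_home": True, "payment": 1500, "utilities": 200, "debt": 2000, "income": 60000},
    {"id": 2, "owns_home": False, "payment": 900, "utilities": 210, "debt": 9000, "income": 50000},
    {"id": 3, "owns_home": True, "payment": 700, "utilities": 190, "debt": 1000, "income": 40000},
    {"id": 4, "owns_home": True, "payment": 1200, "utilities": 205, "debt": 8000, "income": 70000},
    {"id": 5, "owns_home": False, "payment": 1600, "utilities": 195, "debt": 500, "income": 30000},
    {"id": 6, "owns_home": True, "payment": 800, "utilities": 900, "debt": 4000, "income": 30000},
    {"id": 7, "owns_home": True, "payment": 1000, "utilities": 200, "debt": 12000, "income": 90000},
    {"id": 8, "owns_home": False, "payment": 600, "utilities": 200, "debt": 100, "income": 20000},
]


def top_quartile_owners(rows):
    """Part a: owners whose payment is at or above Q3 of all payments."""
    payments = [r["payment"] for r in rows]
    q3 = statistics.quantiles(payments, n=4, method="inclusive")[2]
    return [r["id"] for r in rows if r["owns_home"] and r["payment"] >= q3]


def typical_utilities(rows):
    """Part b: utilities within two standard deviations of the mean."""
    values = [r["utilities"] for r in rows]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [r["id"] for r in rows if abs(r["utilities"] - mean) <= 2 * sd]


def low_debt(rows):
    """Part c: non-mortgage debt below 10% of primary annual income."""
    return [r["id"] for r in rows if r["debt"] < 0.10 * r["income"]]


print(top_quartile_owners(households))
print(typical_utilities(households))
print(low_debt(households))
```

Note that parts a and b compare each household against a statistic of the whole column, so the threshold is computed once before filtering, while part c compares two values within the same row.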
56. The file P02_56.xlsx contains monthly values of
indexes that measure the amount of energy necessary
to heat or cool buildings due to outside temperatures.
(See the explanation in the Source sheet of the file.)
These are reported for each state in the United States
and also for several regions, as listed in the Locations
sheet, from 1931 to 2000. Create summary measures
and/or charts to see whether there is any indication of
temperature changes (global warming?) through time,