Data analytics

31. There is no consistent way of defining an outlier that everyone agrees upon. For example, some people define an outlier as any observation more than three standard deviations from the mean. Other people use the box plot definition, where an outlier (moderate or extreme) is any observation more than 1.5 IQR from the edges of the box, and some people care only about the extreme box plot-type outliers, those that are more than 3.0 IQR from the edges of the box. The file P02_18.xlsx contains daily percentage changes in the S&P 500 index over several years. Identify outliers (days when the percentage change was unusually large in either a negative or positive direction) according to each of these three definitions. Which definition produces the most outliers?
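
The three definitions in problem 31 can be compared side by side with a short script. The following is a minimal sketch, not a solution to the problem: the function name `count_outliers` and the sample values are made up for illustration and are not taken from P02_18.xlsx. It assumes the sample standard deviation and the "inclusive" quartile method, which should correspond to Excel's QUARTILE.INC; other quartile conventions can shift the fences slightly.

```python
import statistics

def count_outliers(values):
    """Count outliers under the three definitions from problem 31."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    # Quartiles with linear interpolation over the full data range (assumption)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    def outside(lo, hi):
        # Number of observations strictly below lo or strictly above hi
        return sum(1 for v in values if v < lo or v > hi)

    return {
        "3_sd": outside(mean - 3 * sd, mean + 3 * sd),
        "1.5_iqr": outside(q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "3_iqr": outside(q1 - 3 * iqr, q3 + 3 * iqr),
    }

# Made-up percentage changes for illustration only, NOT the S&P 500 data
sample = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 1.0, -6.0]
counts = count_outliers(sample)
print(counts)
```

Because the 3.0 IQR fences always lie outside the 1.5 IQR fences, the 1.5 IQR count can never be smaller than the 3.0 IQR count. How the three-standard-deviation count compares depends on the data.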

37. The file P02_35.xlsx contains data from a survey of 500 randomly selected households. Use Excel filters to answer the following questions.

a. Identify households that own their home and have a monthly home mortgage payment in the top quartile of the monthly payments for all households.
b. Identify households with monthly expenditures on utilities that are within two standard deviations of the mean monthly expenditure on utilities for all households.
c. Identify households with total indebtedness (excluding home mortgage) less than 10% of the household’s primary annual income level.
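
The problem asks for Excel filters, but the logic behind each of the three filters can be checked in a few lines of code. The sketch below uses a handful of invented records, and every field name in it (`owns_home`, `mortgage`, `utilities`, `debt`, `income`) is hypothetical; the real columns in P02_35.xlsx may be named and coded differently. It also assumes that "top quartile" means at or above the third quartile of all households' payments.

```python
import statistics

# Hypothetical records and field names, made up for illustration;
# the real columns in P02_35.xlsx may differ.
households = [
    {"id": 1, "owns_home": True,  "mortgage": 1500, "utilities": 200, "debt": 3000,  "income": 60000},
    {"id": 2, "owns_home": True,  "mortgage": 900,  "utilities": 220, "debt": 9000,  "income": 50000},
    {"id": 3, "owns_home": False, "mortgage": 0,    "utilities": 180, "debt": 2000,  "income": 40000},
    {"id": 4, "owns_home": True,  "mortgage": 2100, "utilities": 240, "debt": 12000, "income": 90000},
    {"id": 5, "owns_home": True,  "mortgage": 700,  "utilities": 210, "debt": 1000,  "income": 45000},
    {"id": 6, "owns_home": False, "mortgage": 0,    "utilities": 190, "debt": 8000,  "income": 30000},
    {"id": 7, "owns_home": True,  "mortgage": 1800, "utilities": 600, "debt": 4000,  "income": 80000},
    {"id": 8, "owns_home": True,  "mortgage": 1100, "utilities": 230, "debt": 20000, "income": 70000},
]

def top_quartile_owners(rows):
    """Part a: owners whose payment is at or above Q3 of all households."""
    payments = [r["mortgage"] for r in rows]
    q3 = statistics.quantiles(payments, n=4, method="inclusive")[2]
    return [r["id"] for r in rows if r["owns_home"] and r["mortgage"] >= q3]

def typical_utilities(rows):
    """Part b: utilities within two sample standard deviations of the mean."""
    utils = [r["utilities"] for r in rows]
    mean, sd = statistics.mean(utils), statistics.stdev(utils)
    return [r["id"] for r in rows if abs(r["utilities"] - mean) <= 2 * sd]

def low_debt(rows):
    """Part c: non-mortgage debt below 10% of annual income."""
    return [r["id"] for r in rows if r["debt"] < 0.10 * r["income"]]

print(top_quartile_owners(households))
print(typical_utilities(households))
print(low_debt(households))
```

In Excel, the same thresholds (third quartile, mean plus or minus two standard deviations, and a debt-to-income ratio column) would typically be computed in helper cells first and then used as the filter criteria.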

56. The file P02_56.xlsx contains monthly values of indexes that measure the amount of energy necessary to heat or cool buildings due to outside temperatures. (See the explanation in the Source sheet of the file.) These are reported for each state in the United States and also for several regions, as listed in the Locations sheet, from 1931 to 2000. Create summary measures and/or charts to see whether there is any indication of temperature changes (global warming?) through time, and report your findings.
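
Two simple summary measures for a question like this are averages by decade and the slope of a least-squares line through time. The sketch below shows both under stated assumptions: the helper names `decade_means` and `ols_slope` are hypothetical, the records are made-up (year, value) pairs rather than values from P02_56.xlsx, and the monthly data are assumed to have already been aggregated to one value per year so that seasonal swings do not dominate.

```python
from collections import defaultdict
from statistics import mean

def decade_means(records):
    """Average the index by decade; records are (year, value) pairs."""
    buckets = defaultdict(list)
    for year, value in records:
        buckets[year // 10 * 10].append(value)  # 1935 -> 1930
    return {decade: mean(vals) for decade, vals in sorted(buckets.items())}

def ols_slope(records):
    """Least-squares slope of value on year (index units per year)."""
    xs = [year for year, _ in records]
    ys = [value for _, value in records]
    xbar, ybar = mean(xs), mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Made-up annual values for illustration only, NOT taken from P02_56.xlsx
records = [(1931, 500), (1935, 510), (1942, 495), (1948, 505)]
by_decade = decade_means(records)
slope = ols_slope(records)
print(by_decade, slope)
```

A slope near zero or decade averages that move without a clear direction would give no indication of a trend. Any apparent trend should also be checked separately for the heating and cooling indexes and for different locations before reporting it.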