Info project 4


INFO 1010

SINGLE VARIABLE MEASURES OF VARIABILITY


Measures of Variability

It is often desirable to consider measures of variability (dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.


Measures of Variability

Range

Interquartile Range

Variance

Standard Deviation

Coefficient of Variation


Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile.

It is the range for the middle 50% of the data.

Unlike the range, it is not sensitive to extreme data values.


Interquartile Range

Apartment rents (n = 70), in ascending order:

525, 530, 530, 535, 535, 535, 535, 535, 540, 540

540, 540, 540, 545, 545, 545, 545, 545, 550, 550

550, 550, 550, 550, 550, 560, 560, 560, 565, 565

565, 570, 570, 572, 575, 575, 575, 580, 580, 580

580, 585, 590, 590, 590, 600, 600, 600, 600, 610

610, 615, 625, 625, 625, 635, 649, 650, 670, 670

675, 675, 680, 690, 700, 700, 700, 700, 715, 715

3rd Quartile (Q3) = 625

1st Quartile (Q1) = 545

IQR = Q3 - Q1 = 625 - 545 = 80
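The quartiles above can be reproduced with Python's standard `statistics` module. This is a sketch, assuming the default `method="exclusive"` of `statistics.quantiles`, which places the p-th quantile at position p(n + 1); quartile conventions differ between tools (Excel's QUARTILE.INC, for example, interpolates differently), so other software may report slightly different Q1 and Q3. The variable names are ours, and the last lines swap in a hypothetical extreme rent of 2000 to show that the range reacts to an extreme value while the IQR does not.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

# statistics.quantiles defaults to method="exclusive", which places the
# p-th quantile at position p * (n + 1) and interpolates between neighbors.
q1, median, q3 = statistics.quantiles(rents, n=4)
iqr = q3 - q1
data_range = max(rents) - min(rents)
print(q1, q3, iqr, data_range)  # 545.0 625.0 80.0 190

# Hypothetical change: the largest rent recorded as 2000 instead of 715.
# The range grows sharply; the IQR, which uses only the middle 50%, does not.
extreme = rents[:-1] + [2000]
eq1, _, eq3 = statistics.quantiles(extreme, n=4)
print(max(extreme) - min(extreme), eq3 - eq1)  # 1475 80.0
```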


Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation (xᵢ) and the mean (x̄ for a sample, μ for a population).

The variance is useful in comparing the variability of two or more variables.

=VAR.S(data cell range)
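A minimal sketch of the sample variance in Python, computed once from the definition (squared deviations from the mean, divided by n − 1) and once with `statistics.variance`, which, like Excel's VAR.S, uses the n − 1 divisor. The rent list is the Apartment Rents example; the variable names are ours.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

n = len(rents)
mean = sum(rents) / n  # sample mean, x-bar
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - mean) ** 2 for x in rents) / (n - 1)

# statistics.variance also divides by n - 1, matching Excel's VAR.S.
print(round(mean, 2), round(s2, 2), round(statistics.variance(rents), 2))
# 590.8 2996.16 2996.16
```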


Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

=STDEV.S(data cell range)
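The square-root relationship can be checked directly. This sketch assumes Python's `statistics.stdev`, which, like Excel's STDEV.S, is the sample standard deviation (n − 1 divisor); the variable names are ours.

```python
import math
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

s2 = statistics.variance(rents)  # sample variance, in squared rent units
s = statistics.stdev(rents)      # sample standard deviation, in rent units
# The standard deviation is the positive square root of the variance.
print(math.isclose(s, math.sqrt(s2)), round(s, 2))  # True 54.74
```

Because s is in the same units as the rents, it reads directly as "a typical rent is about 55 away from the mean", which is why it is easier to interpret than the variance.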


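Sample Variance, Standard Deviation, and Coefficient of Variation

Apartment Rents

Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$

Coefficient of Variation

$$\left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{590.80} \times 100 \right)\% = 9.27\%$$

The standard deviation is about 9% of the mean.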
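A short sketch of the coefficient of variation for the rents, assuming Python's `statistics.mean` and `statistics.stdev` (sample standard deviation). Variable names are ours. With full-precision s the ratio is about 9.26%; the 9.27% above comes from using the rounded s = 54.74, so the code prints one decimal place.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

mean = statistics.mean(rents)
s = statistics.stdev(rents)
# Coefficient of variation: standard deviation as a percentage of the mean.
cv = s / mean * 100
print(f"mean = {mean:.2f}, s = {s:.2f}, CV = {cv:.1f}%")
# mean = 590.80, s = 54.74, CV = 9.3%
```

Because the CV is unit-free, it can compare the relative spread of variables measured on different scales.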


Measures of Distribution

Distribution Shape

z-Scores

Chebyshev’s Theorem

Empirical Rule

Detecting Outliers


Distribution Shape: Skewness

An important measure of the shape of a distribution is called skewness.

The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$
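The formula translates almost term by term into Python. This is a sketch: `sample_skewness` is our own helper name, and it assumes `statistics.stdev` (the n − 1 sample standard deviation) for s. This adjusted formula is the one Excel's SKEW function uses, so spreadsheet results should agree up to rounding.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(data)
    if n < 3:
        raise ValueError("sample skewness needs at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

print(round(sample_skewness(rents), 2))  # 0.92
```

A perfectly symmetric set such as `[1, 2, 3, 4, 5]` returns 0 (up to floating-point rounding), while the long right tail of the rents gives a positive value.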


Distribution Shape: Skewness

Symmetric (not skewed)

[Figure: relative frequency histogram of a symmetric distribution]

Skewness = 0

Skewness is zero.

Mean and median are equal.


Distribution Shape: Skewness

Moderately Skewed Left

[Figure: relative frequency histogram of a moderately left-skewed distribution]

Skewness = -.31

Skewness is negative.

Mean will usually be less than the median.


Distribution Shape: Skewness

Moderately Skewed Right

[Figure: relative frequency histogram of a moderately right-skewed distribution]

Skewness = .31

Skewness is positive.

Mean will usually be more than the median.


Distribution Shape: Skewness

Highly Skewed Right

[Figure: relative frequency histogram of a highly right-skewed distribution]

Skewness = 1.25

Skewness is positive (often above 1.0).

Mean will usually be more than the median.


Distribution Shape: Skewness

Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.

525, 530, 530, 535, 535, 535, 535, 535, 540, 540

540, 540, 540, 545, 545, 545, 545, 545, 550, 550

550, 550, 550, 550, 550, 560, 560, 560, 565, 565

565, 570, 570, 572, 575, 575, 575, 580, 580, 580

580, 585, 590, 590, 590, 600, 600, 600, 600, 610

610, 615, 625, 625, 625, 635, 649, 650, 670, 670

675, 675, 680, 690, 700, 700, 700, 700, 715, 715


Distribution Shape: Skewness

Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents]

Skewness = .92


Empirical Rule

When the data are believed to approximate a bell-shaped distribution …

The empirical rule is based on the normal distribution, which is covered in Chapter 6.

The empirical rule can be used to determine the approximate percentage of data values that will be within a specified number of standard deviations of the mean.


Empirical Rule

For data having a bell-shaped distribution:

68.26% of the values of a normal random variable are within ±1 standard deviation of its mean.

95.44% of the values of a normal random variable are within ±2 standard deviations of its mean.

99.72% of the values of a normal random variable are within ±3 standard deviations of its mean.
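The percentages above can be compared with what the apartment rents actually show. This sketch assumes `math.erf`, since the share of a normal distribution within k standard deviations of its mean is erf(k/√2); the exact values are 68.27%, 95.45% and 99.73%, and the slide's slightly lower figures come from a normal table rounded to four decimals. The variable names are ours.

```python
import math
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

mean = statistics.mean(rents)
s = statistics.stdev(rents)
n = len(rents)

for k in (1, 2, 3):
    # Exact share of a normal distribution within k standard deviations of its mean.
    normal_share = math.erf(k / math.sqrt(2))
    # Observed share of the rents within k sample standard deviations of the sample mean.
    inside = sum(1 for x in rents if abs(x - mean) <= k * s)
    print(f"k = {k}: normal {normal_share:.2%}, rents {inside}/{n} = {inside / n:.2%}")
```

For the rents this finds 48, 68 and 70 of the 70 values within 1, 2 and 3 standard deviations. Since the rents are right-skewed (skewness .92), the empirical rule is only a rough guide for them.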


Empirical Rule

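[Figure: normal curve over x with marks at μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; 68.26%, 95.44%, and 99.72% of the area lies within 1, 2, and 3 standard deviations of μ]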


z-Scores

Standardized Values for Apartment Rents


z-Scores

The z-score is often called the standardized value. It denotes the number of standard deviations a data value xᵢ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel's STANDARDIZE function can be used to compute the z-score.
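A minimal sketch of the z-score calculation for the rents. `z_score` is a hypothetical helper whose argument order mirrors Excel's STANDARDIZE(x, mean, standard_dev); it uses the sample mean and the sample standard deviation from Python's `statistics` module.

```python
import statistics

def z_score(x, mean, std_dev):
    """Standardized value: how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

mean = statistics.mean(rents)
s = statistics.stdev(rents)
z = [round(z_score(x, mean, s), 2) for x in rents]
print(z[0], z[-1])  # -1.2 2.27
```

Rents below the mean (590.80) get negative z-scores and rents above it get positive ones, matching the sign rules that follow.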


z-Scores

A data value less than the sample mean will have a z-score less than zero.

A data value greater than the sample mean will have a z-score greater than zero.

A data value equal to the sample mean will have a z-score of zero.

An observation's z-score is a measure of the relative location of the observation in a data set.


Z-Scores and Distributions


Detecting Outliers

An outlier is an unusually small or unusually large value in a data set.

A data value with a z-score less than -3 or greater than +3 might be considered an outlier.

It might be:

- an incorrectly recorded data value
- a data value that was incorrectly included in the data set
- a correctly recorded data value that belongs in the data set
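The ±3 rule above can be applied mechanically. In this sketch `z_outliers` is our own helper name, the sample mean and sample standard deviation come from Python's `statistics` module, and the threshold of 3 follows the slide. The second call uses a hypothetical recording error (715 typed as 7150) to show the kind of value the rule flags.

```python
import statistics

def z_outliers(data, threshold=3.0):
    """Return values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / s) > threshold]

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

print(z_outliers(rents))  # []

# Hypothetical recording error: the last rent typed as 7150 instead of 715.
print(z_outliers(rents[:-1] + [7150]))  # [7150]
```

A flagged value is only a candidate; deciding which of the three cases applies still requires checking the data source.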


Chebyshev’s Theorem

At least (1 − 1/z²) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

Chebyshev's theorem requires z > 1, but z need not be an integer.

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.
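Chebyshev's bound can be checked against the rents. This is a sketch: the bound 1 − 1/z² is computed for each z and compared with the observed share of rents within z sample standard deviations of the sample mean. The value z = 1.5 is our own illustrative choice of a non-integer z; variable names are ours.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

mean = statistics.mean(rents)
s = statistics.stdev(rents)
n = len(rents)

# z need not be an integer; 1.5 is included as an illustrative non-integer case.
for z in (1.5, 2, 3, 4):
    bound = 1 - 1 / z**2  # Chebyshev: at least this share lies within z std devs
    share = sum(1 for x in rents if abs(x - mean) <= z * s) / n
    print(f"z = {z}: at least {bound:.0%}, observed {share:.0%}")
```

The observed shares (86%, 97%, 100%, 100%) all exceed the guaranteed minimums (56%, 75%, 89%, 94%). Chebyshev's theorem holds for any data set, so it is conservative for well-behaved data like these.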


Five-Number Summaries and Box Plots

Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries and box plots.


Five-Number Summary

1. Smallest Value

2. First Quartile

3. Median

4. Third Quartile

5. Largest Value
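The five numbers can be collected in one call. This sketch assumes the default `method="exclusive"` of `statistics.quantiles`, which gives the same Q1 = 545 and Q3 = 625 as the interquartile range example; `five_number_summary` is our own helper name.

```python
import statistics

def five_number_summary(data):
    """Smallest value, Q1, median, Q3, largest value."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    return min(data), q1, median, q3, max(data)

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

print(five_number_summary(rents))  # (525, 545.0, 575.0, 625.0, 715)
```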


Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.


Box Plot

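Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits. The limits are located at Q1 − 1.5(IQR) = 545 − 1.5(80) = 425 and Q3 + 1.5(IQR) = 625 + 1.5(80) = 745.

[Figure: box plot of the apartment rents on an axis from 500 to 725]

Smallest value inside limits = 525

Largest value inside limits = 715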
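The limits, whisker ends, and any outliers can be computed directly. This is a sketch that assumes the 1.5 × IQR limits and the default `method="exclusive"` quartiles from Python's `statistics.quantiles`; the variable names are ours.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example, in ascending order.
rents = (
    [525] + [530] * 2 + [535] * 5 + [540] * 5 + [545] * 5 + [550] * 7
    + [560] * 3 + [565] * 3 + [570] * 2 + [572] + [575] * 3 + [580] * 4
    + [585] + [590] * 3 + [600] * 4 + [610] * 2 + [615] + [625] * 3
    + [635, 649, 650] + [670] * 2 + [675] * 2 + [680, 690] + [700] * 4 + [715] * 2
)

q1, _, q3 = statistics.quantiles(rents, n=4)
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr  # limits are located, not drawn
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that still fall inside the limits;
# anything beyond the limits would be plotted individually as an outlier.
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in rents if x < lower_limit or x > upper_limit]
print(lower_limit, upper_limit, whisker_low, whisker_high, outliers)
# 425.0 745.0 525 715 []
```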


Data Dashboards: Adding Numerical Measures to Improve Effectiveness

The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.

Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels.

Dashboards are often interactive.

Data dashboards are not limited to graphical displays.


Standardized values (z-scores) for the 70 apartment rents, in the same ascending order as the rents:

-1.20, -1.11, -1.11, -1.02, -1.02, -1.02, -1.02, -1.02, -0.93, -0.93

-0.93, -0.93, -0.93, -0.84, -0.84, -0.84, -0.84, -0.84, -0.75, -0.75

-0.75, -0.75, -0.75, -0.75, -0.75, -0.56, -0.56, -0.56, -0.47, -0.47

-0.47, -0.38, -0.38, -0.34, -0.29, -0.29, -0.29, -0.20, -0.20, -0.20

-0.20, -0.11, -0.01, -0.01, -0.01, 0.17, 0.17, 0.17, 0.17, 0.35

0.35, 0.44, 0.62, 0.62, 0.62, 0.81, 1.06, 1.08, 1.45, 1.45

1.54, 1.54, 1.63, 1.81, 1.99, 1.99, 1.99, 1.99, 2.27, 2.27