

CHAPTER 2

Descriptive Statistics

OBJECTIVE To present graphical and numerical methods for exploring, summarizing, and describing data

CONTENTS

2.1 Graphical and Numerical Methods for Describing Qualitative Data

2.2 Graphical Methods for Describing Quantitative Data

2.3 Numerical Methods for Describing Quantitative Data

2.4 Measures of Central Tendency

2.5 Measures of Variation

2.6 Measures of Relative Standing

2.7 Methods for Detecting Outliers

2.8 Distorting the Truth with Descriptive Statistics

STATISTICS IN ACTION Characteristics of Contaminated Fish in The Tennessee River, Alabama


MENDMC02_0131877062.QXD 3/25/06 2:49 AM Page 1

MARKED SET

ENERGY


2.1 Graphical and Numerical Methods for Describing Qualitative Data

Assuming you have collected a data set of interest to you, how can you make sense out of it? That is, how can you organize and summarize the data set to make it more comprehensible and meaningful? In this chapter, we look at several basic statistical tools for describing data. These involve graphs and charts that rapidly convey a visual picture of the data, and numerical measures that describe certain features of the data. The proper procedure to use depends on the type of data (quantitative or qualitative) that we want to describe.

When describing qualitative observations, we define the categories in such a way that each observation can fall in one and only one category. The data set is then described numerically by giving the number of observations, or the proportion of the total number of observations, that fall in each of the categories.

Definition 2.1 The category frequency for a given category is the number of observations that fall in that category.

Definition 2.2 The category relative frequency for a given category is the proportion of the total number of observations that fall in that category.

To illustrate, consider a problem of interest to researchers investigating the safety of nuclear power reactors and the hazards of using energy. The researchers discovered 45 energy-related accidents worldwide since 1977 that resulted in multiple fatalities. Table 2.1 summarizes the researchers’ findings. In this application, the qualitative variable of interest is the cause of the fatal energy-related accident. You can see from Table 2.1 that the data for the 45 accidents fall into six categories (causes). The summary table gives both the frequency and relative frequency of each cause category. Clearly, a gas explosion was the most frequent cause, occurring in 28 of the 45 accidents (approximately 62%).
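The relative frequencies in Table 2.1 follow directly from Definitions 2.1 and 2.2. The short Python sketch below is only an illustration (the names `frequencies` and `relative_frequencies` are hypothetical, not part of any statistical package); it divides each category frequency by the total number of accidents.

```python
# Category frequencies copied from Table 2.1 (cause -> number of accidents).
frequencies = {
    "Coal mine collapse": 7,
    "Dam failure": 4,
    "Gas explosion": 28,
    "Lightning": 1,
    "Nuclear reactor": 1,
    "Oil fire": 4,
}


def relative_frequencies(freqs):
    """Return each category's proportion of the total number of observations."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}


rel = relative_frequencies(frequencies)

# Print a summary table: category, frequency, relative frequency.
for category, share in rel.items():
    print(f"{category:<20}{frequencies[category]:>4}{share:>8.3f}")
```

Because every accident falls in exactly one category, the relative frequencies add to 1, which is a quick check on any summary table of this kind.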

Graphical descriptions of qualitative data sets are usually achieved using bar graphs or pie charts; these figures are often constructed by a computer. Bar graphs give the frequency (or relative frequency) corresponding to each category, with the height or length of the bar proportional to the category frequency (or relative frequency). Pie charts divide a complete circle (a pie) into slices, one corresponding to each category, with the central angle of the slice proportional to the category relative frequency. Examples of these familiar graphical methods are shown in Figures 2.1 and 2.2.

TABLE 2.1 Summary Frequency Table for Cause of Energy-Related Fatal Accidents

Category (Cause)      Frequency (Number of Accidents)   Relative Frequency (Proportion)
Coal mine collapse    7                                 .156
Dam failure           4                                 .089
Gas explosion         28                                .622
Lightning             1                                 .022
Nuclear reactor       1                                 .022
Oil fire              4                                 .089
Totals                45                                1.000

Source: “Safety of Nuclear Power Reactors.” Nuclear Issues Briefing Paper 14, November 2004.

FIGURE 2.1 MINITAB bar chart for cause of energy-related fatal accidents

Figure 2.1 is a vertical bar graph produced by MINITAB that describes the data in Table 2.1. (Bar graphs can be vertical or horizontal.) Each bar corresponds to one of the six causes, and the height of the bar is proportional to the number of fatal accidents that fall in that cause category.

There were approximately 144,000 industrial robots operating in North America in 2004. Figure 2.2 is a MINITAB pie chart showing the percentages of new industrial robot units assigned to various tasks. The six task categories are (1) spot welding, (2) arc welding, (3) material removal, (4) material handling, (5) assembly, and (6) dispensing/coating. A pie chart shows a section of the pie for each category, where the size of the pie slice is proportional to the category relative frequency (percentage). The pie chart not only gives the exact percentage of new industrial robots assigned to each task but it also provides a rapid visual comparison of the relative frequencies. You can clearly see that material handling (34%) and spot welding (32%) are the two major uses of new industrial robots.

Vertical bar graphs like Figure 2.1 can be enhanced by arranging the bars on the graph in the form of a Pareto diagram. A Pareto diagram (named for the Italian economist Vilfredo Pareto) is a frequency bar graph with the bars displayed in order of height, starting with the tallest bar on the left. Pareto diagrams are popular graphical tools in process and quality control, where the heights of the bars often represent frequencies of problems (e.g., defects, accidents, breakdowns, and failures) in the production process. Because the bars are arranged in descending order of height, it is easy to identify the areas with the most severe problems.

An SPSS Pareto diagram for the energy-related accident data summarized in Table 2.1 is displayed in Figure 2.3. Since the relative frequencies associated with the six cause categories are arranged in decreasing order, it is easy to identify the cause (gas explosion) of the most accidents. In addition to the bars with decreasing heights, the Pareto diagram also shows a plot of the cumulative proportion of accidents (called a “cum” line) superimposed over the bars. The cum line scale appears on the right side of the Pareto diagram in Figure 2.3.
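The ordering and the “cum” line of a Pareto diagram can be computed before any plotting is done. The following Python sketch is an illustration only, not the SPSS procedure; the function name `pareto_table` is hypothetical. It sorts the Table 2.1 categories by descending frequency and accumulates the proportion of accidents from left to right.

```python
# Category frequencies copied from Table 2.1 (cause -> number of accidents).
frequencies = {
    "Coal mine collapse": 7,
    "Dam failure": 4,
    "Gas explosion": 28,
    "Lightning": 1,
    "Nuclear reactor": 1,
    "Oil fire": 4,
}


def pareto_table(freqs):
    """Return (category, frequency, cumulative proportion) rows, tallest bar first."""
    total = sum(freqs.values())
    ordered = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count                     # cumulative count up to this bar
        rows.append((category, count, running / total))
    return rows


rows = pareto_table(frequencies)
for category, count, cumulative in rows:
    print(f"{category:<20}{count:>4}{cumulative:>8.3f}")
```

The last cumulative proportion is always 1, since by then every observation has been counted; categories with equal frequencies keep the order in which they were entered.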


FIGURE 2.2 MINITAB pie chart for industrial robot applications

Source: Robotic Industries Association.

FIGURE 2.3 SPSS Pareto diagram for cause of energy-related fatal accidents

ROBOTS


FIGURE 2.4 SAS analysis of ice types for meltponds

Example 2.1 The National Snow and Ice Data Center (NSIDC) collects data on the albedo, depth, and physical characteristics of ice meltponds in the Canadian Arctic. Environmental engineers at the University of Colorado are using these data to study how climate impacts the sea ice. Data for 504 ice meltponds located in the Barrow Strait in the Canadian Arctic are saved in the PONDICE file. One variable of interest is the type of ice observed for each pond. Ice type is classified as first-year ice, multiyear ice, or landfast ice. Construct a summary table and a horizontal bar graph to describe the ice types of the 504 meltponds. Interpret the results.

Solution The data in the PONDICE file were analyzed using SAS. Figure 2.4 shows a SAS summary table for the three ice types. Of the 504 meltponds, 88 had first-year ice, 220 had multiyear ice, and 196 had landfast ice. The corresponding proportions (or relative frequencies) are 88/504 = .175, 220/504 = .436, and 196/504 = .389. These proportions are shown in the “Percent” column in the table and in the accompanying SAS horizontal bar graph in Figure 2.4. The University of Colorado researchers used this information to estimate that about 17% of meltponds in the Canadian Arctic have first-year ice.

Summary of Graphical Descriptive Methods for Qualitative Data

Bar Graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage.

Pie Chart: The categories (classes) of the qualitative variable are represented by slices of a pie (circle). The size of each slice is proportional to the class relative frequency.

Pareto Diagram: A bar graph with the categories (classes) of the qualitative variable (i.e., the bars) arranged by height in descending order from left to right.

Applied Exercises

2.1 PhD degrees in engineering. The following table lists the number of doctoral degrees awarded in engineering and the number of students enrolled full-time in engineering doctoral programs at U.S. universities for the years 1999 to 2003. According to the American Society for Engineering Education, the “number of doctoral degrees awarded in engineering has remained flat since 1999,” but “the doctoral enrollment has increased markedly over the same time frame.” Use bar graphs to support these statements.

Year   Number of PhD Degrees   Doctoral Enrollment (Full-Time)

1999 5945 31,536

2000 5990 33,312

2001 6085 35,425

2002 5802 40,949

2003 5870 45,462

Source: American Society for Engineering Education. Prism, October 2004.

2.2 Engineering jobs related to studies. Each month, Mechanical Engineering magazine reports the results of a survey of its readers. The November 2004 issue gave the results of responses to the question, “Does your job reflect the course of engineering studies you pursued in school?” The data are typically summarized in a pie chart. Form a pie chart for the summary data in the accompanying table. Interpret the results.

Does your job reflect the course of engineering studies you pursued in school?   Percentage of Responses

Yes. My job is a close match. 29

No. It’s engineering, but not what I studied. 32

Job is not engineering related. 9

Currently unemployed. 30

Source: Mechanical Engineering, Vol. 126, No. 11, November 2004.

2.3 Beach erosional hotspots. Beaches that exhibit high erosion rates relative to the surrounding beach are defined as erosional hotspots. The U.S. Army Corps of Engineers is conducting a study of beach hotspots using an online questionnaire. In early 2002, information on six beach hotspots was collected. Some of the data are listed in the table.

a. Identify each variable recorded as quantitative or qualitative.

b. Form a pie chart for the beach condition of the six hotspots.

c. Form a pie chart for the nearshore bar condition of the six hotspots.

d. Comment on the reliability of using the pie charts to make inferences about all beach hotspots in the country.

Beach Hotspot        Beach Condition   Nearshore Bar Condition   Long-Term Erosion Rate (miles/year)
Miami Beach, FL      No dunes/flat     Single, shore parallel    4
Coney Island, NY     No dunes/flat     Other                     13
Surfside, CA         Bluff/scarp       Single, shore parallel    35
Monmouth Beach, NJ   Single dune       Planar                    Not estimated
Ocean City, NJ       Single dune       Other                     Not estimated
Spring Lake, NJ      Not observed      Planar                    14

Source: “Identification and characterization of erosional hotspots.” William & Mary Virginia Institute of Marine Science, U.S. Army Corps of Engineers Project Report, March 18, 2002.

2.4 Management system failures. The U.S. Chemical Safety and Hazard Investigation Board (CSB) is responsible for determining the root cause of industrial accidents. Since its creation in 1998, the CSB has identified 83 incidents that were caused by management system failures. (Process Safety Progress, Dec. 2004.) The accompanying table gives a breakdown of the root causes of these 83 incidents. Construct a Pareto diagram for the data and interpret the graph.

Management System Cause Category Number of Incidents

Engineering & Design 27

Procedures & Practices 24

Management & Oversight 22

Training & Communication 10

Total 83

Source: Blair, A. S. “Management System Failures Identified in Incidents Investigated by the U.S. Chemical Safety and Hazard Investigation Board.” Process Safety Progress, Vol. 23, No. 4, Dec. 2004 (Table 1).

2.5 Unauthorized computer use. The Computer Security Institute (CSI) conducts an annual survey of computer crime at U.S. businesses. CSI sends survey questionnaires to computer security personnel at all U.S. corporations and government agencies. In 2001, 64% of the respondents admitted unauthorized use of computer systems at their firms during the year. (Computer Security Issues & Trends, Spring 2001.) One survey question asked, “If your business website suffered unauthorized use, where did the attack come from, inside or outside the company?” The responses for those business websites that did, in fact, experience unauthorized use are summarized in the table for two survey years, 1999 (125 reported attacks) and 2001 (163 reported attacks).

a. Construct a bar chart to describe the sources of unauthorized computer use in 1999.

b. Construct a bar chart to describe the sources of unauthorized computer use in 2001.

c. Compare the responses for the two years. What inference can be made from the charts?

WWW Site Attack   Percentage in 1999   Percentage in 2001

Inside 7 4

Outside 38 47

Both 41 22

Don’t Know 14 26

Totals 100 100

Source: “2001 CSI/FBI computer crime and security survey.” Computer Security Issues & Trends, Vol. 7, No. 1, Spring 2001.

2.6 Top PWB manufacturers. Circuit World (Vol. 26, 2000) reported on the top 61 printed wiring board (PWB) manufacturers in the world. The company, country of origin, and annual revenue (in millions of dollars) for these PWB manufacturers are saved in the PWB file. The first 10 observations are shown in the accompanying table. Construct a bar graph to describe the country of origin of the top 61 PWB manufacturers. There is concern about the viability of the PWB industry in Europe due to the lack of companies originating from Europe. Does the bar graph support this belief?

(First 10 companies shown)

Rank Company Country Revenue

1 CMK Japan 1,085

2 Viasystems USA 980

3 Ibiden Japan 890

4 Hadco USA 665

5 Nippon Mektron Japan 660

6 Hitachi Chemical Group Japan 616

7 Compeq Mfg Taiwan 560

8 Mitsubishi Gas Chem Group Japan 530

9 Shinko Electric Japan 478

10 Johnson Matthey USA 405

Source: Britton, P. “Competitive PWB manufacturing: What is needed to Maintain a Viable Industry in Europe?” Circuit World, Vol. 26, No. 3, 2000 (Table 1).

2.7 Benford’s Law of Numbers. According to Benford’s Law, certain digits (1, 2, 3, …, 9) are more likely to occur as the first significant digit in a randomly selected number than other digits. For example, the law predicts that the number 1 is the most likely to occur (30% of the time) as the first digit. In a study reported in the American Scientist (July–Aug. 1998) to test Benford’s Law, 743 first-year college students were asked to write down a six-digit number at random. The first significant digit of each number was recorded and its distribution summarized in the following table.

Digits

First Digit Number of Occurrences

1 109

2 75

3 77

4 99

5 72

6 117

7 89

8 62

9 43

Total 743

Source: Hill, T.P. “The first digit phenomenon.” American Scientist, Vol. 86, No. 4, July–Aug. 1998, p. 363 (Figure 5).

a. Describe the first digit of the “random guess” data with a Pareto diagram.

b. Does the graph support Benford’s Law? Explain.

2.8 Software defects. The PROMISE Software Engineering Repository is a collection of data sets available to serve researchers in building predictive software models. One such data set, saved in the SWDEFECTS file, contains information on 498 modules of software code. Each module was analyzed for defects and classified as “true” if it contained defective code and “false” if not. Access the data file and produce a pie chart for the defect variable. Use the pie chart to make a statement about the likelihood of defective software code.

2.9 Extinct New Zealand birds. Refer to the Evolutionary Ecology Research (July, 2003) study of the patterns of extinction in the New Zealand bird population, Exercise 1.10 (p. xxx). Data on flight capability (volant or flightless), habitat (aquatic, ground terrestrial, or aerial terrestrial), nesting site (ground, cavity within ground, tree, cavity above ground), nest density (high or low), diet (fish, vertebrates, vegetables, or invertebrates), body mass (grams), egg length (millimeters), and extinct status (extinct, absent from island, present) for 132 bird species at the time of the Maori colonization of New Zealand are saved in the NZBIRDS file. Use a graphical method to investigate the theory that extinct status is related to flight capability, habitat, and nest density.


TABLE 2.2 EPA Mileage Ratings on 100 Cars

36.3 41.0 36.9 37.1 44.9 36.8 30.0 37.2 42.1 36.7

32.7 37.3 41.2 36.6 32.9 36.5 33.2 37.4 37.5 33.6

40.5 36.5 37.6 33.9 40.2 36.4 37.7 37.7 40.0 34.2

36.2 37.9 36.0 37.9 35.9 38.2 38.3 35.7 35.6 35.1

38.5 39.0 35.5 34.8 38.6 39.4 35.3 34.4 38.8 39.7

36.3 36.8 32.5 36.4 40.5 36.6 36.1 38.2 38.4 39.3

41.0 31.8 37.3 33.1 37.0 37.6 37.0 38.7 39.0 35.8

37.0 37.2 40.7 37.4 37.1 37.8 35.9 35.6 36.7 34.5

37.1 40.3 36.7 37.0 33.9 40.1 38.0 35.2 34.8 39.5

39.9 36.9 32.9 33.8 39.8 34.0 36.8 35.0 38.1 36.9

2.10 Groundwater contamination in wells. In New Hampshire, about half the counties mandate the use of reformulated gasoline. This has led to an increase in the contamination of groundwater with methyl tert-butyl ether (MTBE). Environmental Science & Technology (Jan. 2005) reported on the factors related to MTBE contamination in public and private New Hampshire wells. Data were collected for a sample of 223 wells. These data are saved in the MTBE file. Three of the variables are qualitative in nature: well class (public or private), aquifer (bedrock or unconsolidated), and detectable level of MTBE (below limit or detect). (Note: A detectable level of MTBE occurs if the MTBE value exceeds .2 micrograms per liter.) The data for 10 selected wells are shown in the accompanying table. Use graphical methods to describe each of the three qualitative variables for all 223 wells.

(10 selected observations from 223)

Well Class   Aquifer   Detect MTBE

Private Bedrock Below limit

Private Bedrock Below limit

Public Unconsolidated Detect

Public Unconsolidated Below limit

Public Unconsolidated Below limit

Public Unconsolidated Below limit

Public Unconsolidated Detect

Public Unconsolidated Below limit

Public Unconsolidated Below limit

Public Bedrock Detect

Public Bedrock Detect

Source: Ayotte, J.D., Argue, D.M., and McGarry, F.J., “Methyl tert-butyl ether occurrence and related factors in public and private wells in southeast New Hampshire.” Environmental Science & Technology, Vol. 39, No. 1, Jan. 2005.

2.2 Graphical Methods for Describing Quantitative Data

Recall from Section 1.3 that quantitative data sets consist of data that are recorded on a meaningful numerical scale. For describing, summarizing, and detecting patterns in such data, we can use three graphical methods: dot plots, stem-and-leaf displays, and histograms. Since most statistical software packages can be used to construct these displays, we’ll focus here on their interpretation rather than their construction.

For example, the Environmental Protection Agency (EPA) performs extensive tests on all new car models to determine their mileage ratings. Suppose that the 100 measurements in Table 2.2 represent the results of such tests on a certain new car model. How can we summarize the information in this rather large sample?

A visual inspection of the data indicates some obvious facts. For example, most of the mileages are in the 30s, with a smaller fraction in the 40s. But it is difficult to provide much additional information on the 100 mileage ratings without resorting to some method of summarizing the data. One such method is a dot plot.

Dot Plots

EPAGAS

A MINITAB dot plot for the 100 EPA mileage ratings is shown in Figure 2.5. The horizontal axis of Figure 2.5 is a scale for the quantitative variable in miles per gallon (mpg). The numerical value of each measurement in the data set is located on the horizontal scale by a dot. When data values repeat, the dots are placed above one another, forming a pile at that particular numerical location. As you can see, this dot plot verifies that almost all of the mileage ratings are in the 30s, with most falling between 35 and 40 miles per gallon.
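A dot plot is essentially a tally of how often each value occurs. As a rough sketch (this is not MINITAB's algorithm, and the names `mpg` and `piles` are illustrative only), the Python snippet below tallies the Table 2.2 mileages and prints a text dot plot turned on its side, with one row per distinct value and one dot per observation.

```python
from collections import Counter

# The 100 EPA mileage ratings, copied row by row from Table 2.2.
mpg = [
    36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7,
    32.7, 37.3, 41.2, 36.6, 32.9, 36.5, 33.2, 37.4, 37.5, 33.6,
    40.5, 36.5, 37.6, 33.9, 40.2, 36.4, 37.7, 37.7, 40.0, 34.2,
    36.2, 37.9, 36.0, 37.9, 35.9, 38.2, 38.3, 35.7, 35.6, 35.1,
    38.5, 39.0, 35.5, 34.8, 38.6, 39.4, 35.3, 34.4, 38.8, 39.7,
    36.3, 36.8, 32.5, 36.4, 40.5, 36.6, 36.1, 38.2, 38.4, 39.3,
    41.0, 31.8, 37.3, 33.1, 37.0, 37.6, 37.0, 38.7, 39.0, 35.8,
    37.0, 37.2, 40.7, 37.4, 37.1, 37.8, 35.9, 35.6, 36.7, 34.5,
    37.1, 40.3, 36.7, 37.0, 33.9, 40.1, 38.0, 35.2, 34.8, 39.5,
    39.9, 36.9, 32.9, 33.8, 39.8, 34.0, 36.8, 35.0, 38.1, 36.9,
]

# Height of the pile of dots at each distinct value.
piles = Counter(mpg)

# Sideways dot plot: one row per value, one dot per repeated observation.
for value in sorted(piles):
    print(f"{value:5.1f} {'.' * piles[value]}")

# The tallest pile, and how many ratings fall from 35 up to (not including) 40.
tallest_value, tallest_count = piles.most_common(1)[0]
in_35_to_40 = sum(1 for x in mpg if 35 <= x < 40)
print(tallest_value, tallest_count, in_35_to_40)
```

Counting the ratings from 35 up to 40 gives a number to go with the visual impression that most of the dots lie in that range.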

Stem-and-Leaf Display

Another graphical representation of these same data, a MINITAB stem-and-leaf display, is shown in Figure 2.6. In this display the stem is the portion of the measurement (mpg) to the left of the decimal point, and the remaining portion to the right of the decimal point is the leaf.

In Figure 2.6, the stems for the data set are listed in the second column from the smallest (30) to the largest (44). Then the leaf for each observation is listed to the right in the row of the display corresponding to the observation’s stem. For example, the leaf 3 of the first observation (36.3) in Table 2.2 appears in the row corresponding to the stem 36. Similarly, the leaf 7 for the second observation (32.7) in Table 2.2 appears in the row corresponding to the stem 32, and the leaf 5 for the third observation (40.5) appears in the row corresponding to the stem 40. (The stems and leaves for these first three observations are highlighted in Figure 2.6.) Typically, the leaves in each row are ordered as shown in the MINITAB stem-and-leaf display.

FIGURE 2.5 MINITAB dot plot for 100 EPA mileage ratings (horizontal axis: MPG, scaled from 30.0 to 45.0)

FIGURE 2.6 MINITAB stem-and-leaf display for 100 mileage ratings

Stem-and-leaf of MPG   N = 100
Leaf Unit = 0.10

   1   30  0
   2   31  8
   6   32  5799
  12   33  126899
  18   34  024588
  29   35  01235667899
  49   36  01233445566777888999
 (21)  37  000011122334456677899
  30   38  0122345678
  20   39  00345789
  12   40  0123557
   5   41  002
   2   42  1
   1   43
   1   44  9

The stem-and-leaf display presents another compact picture of the data set. You can see at a glance that the 100 mileage readings were distributed between 30.0 and 44.9, with most of them falling in stem rows 35 to 39. The six leaves in stem row 34 indicate that six of the 100 readings were at least 34.0 but less than 35.0. Similarly, the eleven leaves in stem row 35 indicate that eleven of the 100 readings were at least 35.0 but less than 36.0. Only five cars had readings equal to 41 or larger, and only one was as low as 30.

FIGURE 2.7 SPSS histogram for 100 EPA mileage ratings (horizontal axis: MPG, from 30.0 to 45.0 in steps of 1.5; vertical axis: Frequency, from 0 to 40)

Steps to Follow in Constructing a Stem-and-Leaf Display

Step 1 Divide each observation in the data set into two parts, the stem and the leaf. For example, the stem and leaf of the mileage 31.8 are 31 and 8, respectively:

Stem Leaf

31 8

Step 2 List the stems in order in a column, starting with the smallest stem and ending with the largest.

Step 3 Proceed through the data set, placing the leaf for each observation in the appropriate stem row. Optionally, you may want to arrange the leaves in each row in ascending order.
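The three steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming nonnegative data recorded to one decimal place (as in Table 2.2); the function name `stem_and_leaf` is hypothetical, and the output omits the cumulative counts that MINITAB prints in its first column.

```python
from collections import defaultdict

# The 100 EPA mileage ratings, copied row by row from Table 2.2.
mpg = [
    36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7,
    32.7, 37.3, 41.2, 36.6, 32.9, 36.5, 33.2, 37.4, 37.5, 33.6,
    40.5, 36.5, 37.6, 33.9, 40.2, 36.4, 37.7, 37.7, 40.0, 34.2,
    36.2, 37.9, 36.0, 37.9, 35.9, 38.2, 38.3, 35.7, 35.6, 35.1,
    38.5, 39.0, 35.5, 34.8, 38.6, 39.4, 35.3, 34.4, 38.8, 39.7,
    36.3, 36.8, 32.5, 36.4, 40.5, 36.6, 36.1, 38.2, 38.4, 39.3,
    41.0, 31.8, 37.3, 33.1, 37.0, 37.6, 37.0, 38.7, 39.0, 35.8,
    37.0, 37.2, 40.7, 37.4, 37.1, 37.8, 35.9, 35.6, 36.7, 34.5,
    37.1, 40.3, 36.7, 37.0, 33.9, 40.1, 38.0, 35.2, 34.8, 39.5,
    39.9, 36.9, 32.9, 33.8, 39.8, 34.0, 36.8, 35.0, 38.1, 36.9,
]


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for data recorded to one decimal place."""
    rows = defaultdict(list)
    for value in values:
        tenths = round(value * 10)        # 31.8 -> 318; rounding guards against float error
        stem, leaf = divmod(tenths, 10)   # Step 1: 318 -> stem 31, leaf 8
        rows[stem].append(leaf)           # Step 3: place the leaf in its stem row
    # Step 2: list every stem from smallest to largest, keeping empty rows such as 43.
    stems = range(min(rows), max(rows) + 1)
    return {stem: sorted(rows.get(stem, [])) for stem in stems}


display = stem_and_leaf(mpg)
for stem, leaves in display.items():
    print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stem rows preserves the gaps in the data, which is why the row for stem 43 appears even though no mileage falls there.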

Histograms

An SPSS histogram for these 100 EPA mileage readings is shown in Figure 2.7. The horizontal axis of Figure 2.7, which gives the miles per gallon for a given automobile, is divided into class intervals commencing with the interval from 30.0–31.5 and proceeding in intervals of equal size to 43.5–45.0 mpg. The vertical axis gives the number (or frequency) of the 100 readings that fall in each interval. It appears that about 33 of the 100 cars, or 33%, obtained a mileage between 36.0 and 37.5. This class interval contains the highest frequency, and the intervals tend to contain a smaller number of the measurements as the mileages get smaller or larger.

Histograms can be used to display either the frequency or relative frequency of the measurements falling into the class intervals. The class intervals, frequencies, and relative frequencies for the EPA car mileage data are shown in the summary table, Table 2.3.*

*SPSS, like many software packages, will classify an observation that falls on the borderline of a class interval into the next higher interval. For example, the gas mileage of 37.5, which falls on the border between the class intervals 36.0–37.5 and 37.5–39.0, is classified into the 37.5–39.0 class. The frequencies in Table 2.3 reflect this convention.

By summing the relative frequencies in the intervals 34.5–36.0, 36.0–37.5, and 37.5–39.0, you can see that 65% of the mileages are between 34.5 and 39.0. Similarly, only 2% of the cars obtained a mileage rating over 42.0. Many other summary statements can be made by further study of the histogram and accompanying summary table. Note that the sum of all class frequencies will always equal the sample size, n. Some recommendations for selecting the number of intervals in a histogram for smaller data sets are given in the following box.

Determining the Number of Classes in a Histogram

Number of Observations in Data Set Number of Classes

Less than 25 5–6

25–50 7–14

More than 50 15–20
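The rule of thumb in the box amounts to a simple lookup. The sketch below encodes it in Python purely for illustration; the function name `recommended_classes` is hypothetical, and the returned pair is the suggested range for the number of classes, not a requirement.

```python
def recommended_classes(n_observations):
    """Return the (low, high) range of class counts suggested by the box above."""
    if n_observations < 25:
        return (5, 6)
    if n_observations <= 50:
        return (7, 14)
    return (15, 20)


# A few sample sizes and the suggested ranges.
for n in (9, 25, 50, 100):
    low, high = recommended_classes(n)
    print(f"n = {n:>3}: use {low} to {high} classes")
```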

Although histograms provide good visual descriptions of data sets—particularly very large ones—they do not let us identify individual measurements. In contrast, each of the original measurements is visible to some extent in a dot plot and clearly visible in a stem-and-leaf display. The stem-and-leaf display arranges the data in ascending order, so it’s easy to locate the individual measurements. For example, in Figure 2.6 we can easily see that two of the gas mileage measurements are equal to 36.3, but can’t see that fact by inspecting the histogram in Figure 2.7. However, stem-and-leaf displays can become unwieldy for very large data sets. A very large number of stems and leaves causes the vertical and horizontal dimensions of the display to become cumbersome, diminishing the usefulness of the visual display.

TABLE 2.3 Class Intervals, Frequencies, and Relative Frequencies for the Car Mileage Data

Class Interval Frequency Relative Frequency

30.0–31.5 1 .01

31.5–33.0 5 .05

33.0–34.5 9 .09

34.5–36.0 14 .14

36.0–37.5 33 .33

37.5–39.0 18 .18

39.0–40.5 12 .12

40.5–42.0 6 .06

42.0–43.5 1 .01

43.5–45.0 1 .01

Totals 100 1.00
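The entries of Table 2.3 can be reproduced from the raw mileages in Table 2.2. The Python sketch below is an illustration under stated assumptions: the function name `class_frequencies` is hypothetical, the ten classes of width 1.5 starting at 30.0 are those of Figure 2.7, and a value on a class boundary is counted in the higher class, following the convention described in the footnote.

```python
# The 100 EPA mileage ratings, copied row by row from Table 2.2.
mpg = [
    36.3, 41.0, 36.9, 37.1, 44.9, 36.8, 30.0, 37.2, 42.1, 36.7,
    32.7, 37.3, 41.2, 36.6, 32.9, 36.5, 33.2, 37.4, 37.5, 33.6,
    40.5, 36.5, 37.6, 33.9, 40.2, 36.4, 37.7, 37.7, 40.0, 34.2,
    36.2, 37.9, 36.0, 37.9, 35.9, 38.2, 38.3, 35.7, 35.6, 35.1,
    38.5, 39.0, 35.5, 34.8, 38.6, 39.4, 35.3, 34.4, 38.8, 39.7,
    36.3, 36.8, 32.5, 36.4, 40.5, 36.6, 36.1, 38.2, 38.4, 39.3,
    41.0, 31.8, 37.3, 33.1, 37.0, 37.6, 37.0, 38.7, 39.0, 35.8,
    37.0, 37.2, 40.7, 37.4, 37.1, 37.8, 35.9, 35.6, 36.7, 34.5,
    37.1, 40.3, 36.7, 37.0, 33.9, 40.1, 38.0, 35.2, 34.8, 39.5,
    39.9, 36.9, 32.9, 33.8, 39.8, 34.0, 36.8, 35.0, 38.1, 36.9,
]


def class_frequencies(values, lower, width, n_classes):
    """Count the observations in each class [lo, hi); a borderline value goes up."""
    edges = [lower + k * width for k in range(n_classes + 1)]
    return [
        sum(1 for x in values if lo <= x < hi)
        for lo, hi in zip(edges, edges[1:])
    ]


counts = class_frequencies(mpg, lower=30.0, width=1.5, n_classes=10)
relative = [count / len(mpg) for count in counts]

# Print the class interval, frequency, and relative frequency.
for k, (count, share) in enumerate(zip(counts, relative)):
    lo = 30.0 + 1.5 * k
    print(f"{lo:4.1f}-{lo + 1.5:4.1f}{count:>5}{share:>7.2f}")

# Share of mileages from 34.5 up to 39.0 (classes 4 through 6 of the table).
share_34_5_to_39 = sum(relative[3:6])
print(round(share_34_5_to_39, 2))
```

The class boundaries here are multiples of 0.5, which are represented exactly in floating point; with other boundaries, comparisons at the edges may need extra care.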


Steps to Follow in Constructing a Histogram

Step 1 Calculate the range of the data:

Range = Largest observation - Smallest observation

Step 2 Divide the range into between 5 and 20 classes of equal width. The number of classes is arbitrary, but you will obtain a better graphical description if you use a small number of classes for a small amount of data and a larger number of classes for larger data sets (see the rule of thumb in the previous box). The lowest (or first) class boundary should be located below the smallest measurement, and the class width should be chosen so that no observation can fall on a class boundary.

Step 3 For each class, count the number of observations that fall in that class. This number is called the class frequency.

Step 4 Calculate each class relative frequency:

Class relative frequency = Class frequency / Total number of measurements

Step 5 The histogram is essentially a bar graph in which the categories are classes. In a frequency histogram, the heights of the bars are determined by the class frequency. Similarly, in a relative frequency histogram, the heights of the bars are determined by the class relative frequency.

Example 2.2 The IRONORE file contains data on the percentage iron content for 390 iron-ore specimens collected in Japan. Figure 2.8 is a relative frequency histogram for the 390 iron-ore measurements produced using SAS.

a. Interpret the graph.

b. Visually estimate the fraction of iron-ore measurements that lie between 64.6 and 65.8.

Solution a. Note that the classes are marked off in intervals of .4 along the horizontal axis of the SAS histogram in Figure 2.8, with the midpoint (rather than the lower and upper boundaries) of each interval shown. The histogram shows that the percentage iron-ore measurements tend to pile up near 66; that is, the class from 65.6 to 66.4 has the greatest relative frequency.

b. The bars that fall in the interval from 64.6 to 65.8 are shaded in Figure 2.8. This shaded portion represents approximately 40% of the total area of the bars for the complete distribution. Thus, about 40% of the 390 iron-ore measurements lie between 64.6 and 65.8.

Interpreting a Relative Frequency Distribution

The percentage of the total number of measurements falling within a particular interval is proportional to the area of the bar that is constructed above the interval. For example, if 30% of the area under the distribution lies over a particular interval, then 30% of the observations fall in that interval.

FIGURE 2.8 SAS histogram for iron-ore data

Most statistical software packages can be used to generate histograms, stem-and-leaf displays, and dot plots. All three are useful tools for graphically describing data sets. We recommend that you generate and compare the displays whenever you can. You’ll find that histograms are generally more useful for very large data sets, while stem-and-leaf displays and dot plots provide useful detail for smaller data sets.

Summary of Graphical Descriptive Methods for Quantitative Data Dot Plot: The numerical value of each quantitative measurement in the data set is represented by a dot on a horizontal scale. When data values repeat, the dots are placed above one another vertically.

Stem-and-Leaf Display: The numerical value of the quantitative variable is partitioned into a “stem” and a “leaf.” The possible stems are listed in order in a column. The leaf for each quantitative measurement in the data set is placed in the corresponding stem row. Leaves for observations with the same stem value are listed in increasing order horizontally.

Histogram: The possible numerical values of the quantitative variable are partitioned into class intervals, where each interval has the same width. These intervals form the scale of the horizontal axis. The frequency or relative frequency of observations in each class interval is determined. A vertical bar is placed over each class interval with height equal to either the class frequency or class relative frequency.

Applied Exercises

LICHEN

2.11 Radioactive lichen. Lichen has a high absorbance capacity for radiation fallout from nuclear accidents. Since lichen is a major food source for Alaskan caribou, and caribou are, in turn, a major food source for many Alaskan villagers, it is important to monitor the level of radioactivity in lichen. Researchers at the University of Alaska, Fairbanks, collected data on nine lichen specimens at various locations for this purpose. The amount of the radioactive element, cesium-137, was measured (in microcuries per milliliter) for each specimen. The data values, converted to logarithms, are given in the table. (Note: The closer the value is to zero, the greater the amount of cesium in the specimen.)


MINITAB output for Exercise 2.13

Location

Bethel −5.50 −5.00

Eagle Summit −4.15 −4.85

Moose Pass −6.05

Turnagain Pass −5.00

Wickersham Dome −4.10 −4.50 −4.60

Source: Lichen Radionuclide Baseline Research project, 2003.

a. Construct a dot plot for the nine measurements.

b. Construct a stem-and-leaf display for the nine measurements.

c. Construct a histogram plot for the nine measurements.

d. Which of the three graphs, parts a–c, is more informative?

e. What proportion of the measurements have a radioactivity level of −5.00 or lower?

2.12 Spectral images of asteroids. Researchers at the Massachusetts Institute of Technology (MIT) studied the spectroscopic properties of main-belt asteroids having diameters smaller than 10 kilometers. Asteroids were observed with the Hiltner telescope at the MIT Observatory; the number N of independent spectral image exposures for each observation was recorded. The data for 40 asteroid observations, obtained from Science (Apr. 9, 1993), are listed here.

3 4 3 3 1 4 1 3 2 3

1 1 4 2 3 3 2 6 1 1

3 3 2 2 2 2 1 3 2 1

6 1 3 2 2 1 2 2 4 2

Source: Binzel, R. P., and Xu, S., “Chips off of Asteroid 4 Vesta: Evidence for the parent body of basaltic achondrite meteorites.” Science. Vol. 260, Apr. 3, 1993, p. 187 (Table 1).

a. Summarize the data with a stem-and-leaf display.

b. What proportion of asteroid observations resulted in exactly one spectral image exposure?

ASTEROIDS

EARTHQUAKE

2.13 Earthquake aftershock magnitudes. Seismologists use the term aftershock to describe the smaller earthquakes that follow a main earthquake. Following the Northridge earthquake of January 17, 1994, the Los Angeles area experienced 2929 aftershocks in a 3-week period. The magnitudes (measured on the Richter scale) for these aftershocks were recorded by the U.S. Geological Survey and are saved in the EARTHQUAKE file. A MINITAB relative frequency histogram for these magnitudes is shown below.

a. Estimate the percentage of the 2929 aftershocks measuring between 1.5 and 2.5 on the Richter scale.

b. Estimate the percentage of the 2929 aftershocks measuring greater than 3.0 on the Richter scale.

VOLTAGE

2.14 Process voltage readings. A Harris Corporation/University of Florida study was undertaken to determine whether a manufacturing process performed at a remote location can be established locally. Test devices (pilots) were set up at both the old and new locations and voltage readings on the process were obtained. A “good process” was considered to be one with voltage readings of at least 9.2 volts (with larger readings being better than smaller readings). The table contains voltage readings for 30 production runs at each location.

Old Location             |  New Location
 9.98   10.12    9.84    |   9.19   10.01    8.82
10.26   10.05   10.15    |   9.63    8.82    8.65
10.05    9.80   10.02    |  10.10    9.43    8.51
10.29   10.15    9.80    |   9.70   10.03    9.14
10.03   10.00    9.73    |  10.09    9.85    9.75
 8.05    9.87   10.01    |   9.60    9.27    8.78
10.55    9.55    9.98    |  10.05    8.83    9.35
10.26    9.95    8.72    |  10.12    9.39    9.54
 9.97    9.70    8.80    |   9.49    9.48    9.36
 9.87    8.72    9.84    |   9.37    9.64    8.68

Source: Harris Corporation, Melbourne, FL.

a. Construct a relative frequency histogram for the voltage readings of the old process.

b. Construct a stem-and-leaf display for the voltage readings of the old process. Which of the two graphs in parts a and b is more informative?

c. Construct a frequency histogram for the voltage readings of the new process.

d. Compare the two graphs in parts a and c. (You may want to draw the two histograms on the same graph.) Does it appear that the manufacturing process can be established locally (i.e., is the new process as good as or better than the old)?

2.15 Sanitation inspection of cruise ships. To minimize the potential for gastrointestinal disease outbreaks, all passenger cruise ships arriving at U.S. ports are subject to unannounced sanitation inspections. Ships are rated on a 100-point scale by the Centers for Disease Control and Prevention. A score of 86 or higher indicates that the ship is providing an accepted standard of sanitation. The May 2004 sanitation scores for 174 cruise ships are saved in the SHIPSANIT file. The first five and last five observations in the data set are listed in the accompanying table.

(selected observations)

Ship Name Sanitation Score

Adonia 99

Adventure of the Seas 97

AIDAAura 99

AIDAAvita 98

Albatross 96

· ·

· ·

Volendam 97

Voyager of the Seas 97

Wind Spirit 97

Wind Surf 98

World Discoverer 89

Source: National Center for Environmental Health, Centers for Disease Control and Prevention, May 24, 2004.

a. Generate a stem-and-leaf display of the data. Identify the stems and leaves of the graph.

b. A score of 86 or higher at the time of inspection indicates the ship is providing an accepted standard of sanitation. Use the stem-and-leaf display to estimate the proportion of ships that have an accepted sanitation standard.

c. Locate the inspection score of 74 (Nautilus Explorer) on the stem-and-leaf display.

2.16 Surface roughness of pipe. Oil field pipes are internally coated in order to prevent corrosion. Engineers at the University of Louisiana, Lafayette, investigated the influence that coating may have on the surface roughness of oil field pipes (Anti-corrosion Methods and Materials, Vol. 50, 2003). A scanning probe instrument was used to measure the surface roughness of 20 sample sections of coated interior pipe. The data (in micrometers) are provided in the table. Describe the sample data with an appropriate graph.

1.72 2.50 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40

2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95

Source: Farshad, F. and Pesacreta, T. “Coated Pipe Interior Surface Roughness as Measured by Three Scanning Probe Instruments.” Anti-corrosion Methods and Materials, Vol. 50, No. 1, 2003 (Table III).

2.17 Groundwater contamination in wells. Refer to the Environmental Science & Technology (Jan. 2005) study of the factors related to MTBE contamination in 223 New Hampshire wells, Exercise 2.10 (p. xxx). The data are saved in the MTBE file. Two of the many quantitative variables measured for each well are the pH level (standard units) and the MTBE level (micrograms per liter).

a. Construct a histogram for the pH levels of the sampled wells. From the histogram, estimate the proportion of wells with pH values less than 7.0.

b. For those wells with detectable levels of MTBE, construct a histogram for the MTBE values. From the histogram, estimate the proportion of contaminated wells with MTBE values that exceed 5 micrograms per liter.

2.18 Insecticides on orchards. Refer to the Environmental Science & Technology study of insecticides used on dormant orchards in the San Joaquin Valley, California, Exercise 1.3. Ambient air samples were collected and analyzed daily at an orchard site during the most intensive period of spraying. The thion and oxon levels (in ng/m³) in the air samples are recorded in the table, as well as the oxon/thion ratios.

SHIPSANIT

ROUGHPIPE

ORCHARD

MTBE


[Frequency histogram for Exercise 2.19. Horizontal axis: Diameter (nanometers), scaled from 2 to 14. Vertical axis: Number of particles, scaled from 0 to 135. Bar heights shown: 35, 130, 70, 15, 5.]

Source: Roy, B. and Chakravorty, D. “Ultrafine copper particles grown in a glass ceramic.” Journal of Applied Physics, Vol. 74, No.6, Sept. 15, 1993, p. 4192 (Figure 3).

Date      Condition   Thion    Oxon                Oxon/Thion Ratio
Jan. 15   Fog          38.2    10.3                .270
     17   Fog          28.6     6.9                .241
     18   Fog          30.2     6.2                .205
     19   Fog          23.7    12.4                .523
     20   Fog          62.3    (Air sample lost)   --
     20   Clear        74.1    45.8                .618
     21   Fog          88.2     9.9                .112
     21   Clear        46.4    27.4                .591
     22   Fog         135.9    44.8                .330
     23   Fog         102.9    27.8                .270
     23   Cloud        28.9     6.5                .225
     25   Fog          46.9    11.2                .239
     25   Clear        44.3    16.6                .375

Source: Selber, J. N., et al., “Air and Fog Deposition Residues of Four Organophosphate Insecticides used on Dormant Orchards in the San Joaquin Valley, California.” Environmental Science & Technology. Vol. 27, No. 10, Oct. 1993, p. 2240 (Table V).

a. Summarize the daily oxon/thion ratios with a stem-and-leaf display.

b. Comment on the statement, “The oxon/thion ratio for the insecticide chlorpyrifos is greater in the clear air than in fog air.”

2.19 Growing ultrafine copper particles. Scientists in India experimented with growing copper nanoparticles within a glass medium (Journal of Applied Physics, Sept. 1993). A glass ceramic was subjected to an alkali/copper ion exchange reaction followed by a reduction treatment in hydrogen. Upon drying, a sample of 255 copper particles was extracted from the glass surface. The diameters of the copper particles were measured and are described by the accompanying frequency histogram.

a. Approximately how many copper particles had a diameter between 5 and 7 nanometers?

b. Convert the frequency histogram to a relative frequency histogram.

c. Approximately what proportion of copper particles exceeded 9 nanometers in diameter?

2.20 Estimating the age of glacial drifts. Tills are glacial drifts consisting of a mixture of clay, sand, gravel, and boulders. Engineers from the University of Washington’s Department of Earth and Space Sciences studied the chemical makeup of buried tills in order to estimate the age of the glacial drifts in Wisconsin. (American Journal of Science, Jan. 2005.) The ratio of the elements aluminum (Al) and beryllium (Be) in sediment is related to the duration of burial. The Al/Be ratios for a sample of 26 buried till specimens are given in the table. With the aid of a graph, estimate the proportion of till specimens with an Al/Be ratio that exceeds 4.5.

3.75 4.05 3.81 3.23 3.13 3.30 3.21 3.32 4.09 3.90 5.06 3.85 3.88

4.06 4.56 3.60 3.27 4.09 3.38 3.37 2.73 2.95 2.25 2.73 2.55 3.06

Source: Adapted from American Journal of Science, Vol. 305, No. 1, Jan. 2005, p. 16 (Table 2).

TILLRATIO

2.3 Numerical Methods for Describing Quantitative Data

Numerical descriptive measures are numbers computed from a data set to help us create a mental image of its relative frequency histogram. The measures that we will present fall into three categories: (1) those that help to locate the center of the relative frequency distribution, (2) those that measure its spread, and (3) those that describe the relative position of an observation within the data set. These categories are called, respectively, measures of central tendency, measures of variation, and measures of relative standing. In the definitions that follow, we will denote the variable observed to create a data set by the symbol y and the n measurements of a data set by $y_1, y_2, \ldots, y_n$.


Numerical descriptive measures computed from sample data are often called statistics. In contrast, numerical descriptive measures of the population are called parameters. Their values are typically unknown and are usually represented by Greek symbols. For example, we will see that the average value of the population is represented by the Greek letter µ. Although we could calculate the value of this parameter if we actually had access to the entire population, we generally wish to avoid doing so, for economic or other reasons. Thus, as you will subsequently see, we will sample the population and then use the sample statistic to infer, or make decisions about, the value of the population parameter of interest.

Definition 2.3 A statistic is a numerical descriptive measure computed from sample data.

Definition 2.4 A parameter is a numerical descriptive measure of a population.

2.4 Measures of Central Tendency

The three most common measures of central tendency are the arithmetic mean, the median, and the mode. Of the three, the arithmetic mean (or mean, as it is commonly called) is used most frequently in practice.

Definition 2.5 The arithmetic mean of a set of n measurements, $y_1, y_2, \ldots, y_n$, is the average of the measurements:

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Typically, the symbol $\bar{y}$ is used to represent the sample mean (i.e., the mean of a sample of n measurements), whereas the Greek letter µ represents the population mean.

Example 2.3 Calculate the mean for the set of sample measurements: 4, 6, 1, 2, 3.

Solution Substitution into the formula for $\bar{y}$, with $n = 5$, yields

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{4 + 6 + 1 + 2 + 3}{5} = 3.2$$

Definition 2.6 The median of a set of n measurements, $y_1, y_2, \ldots, y_n$, is the middle number when the measurements are arranged in ascending (or descending) order, i.e., the value of y located so that half the area under the relative frequency histogram lies to its left and half the area lies to its right. We will use the symbol m to represent the sample median and the symbol τ to represent the population median.

If the number of measurements in a data set is odd, the median is the measurement that falls in the middle when the measurements are arranged in increasing order. For example, the median of the $n = 5$ sample measurements of Example 2.3 is $m = 3$. If the number of measurements is even, the median is defined to be the mean of the two middle measurements when the measurements are arranged in increasing order. For example, the median of the $n = 6$ measurements, 1, 4, 5, 8, 10, 11, is

$$m = \frac{5 + 8}{2} = 6.5$$

Calculating the Median of Small Sample Data Sets

Let $y_{(i)}$ denote the ith value of y when the sample of n measurements is arranged in ascending order. Then the sample median is calculated as follows:

$$m = \begin{cases} y_{[(n+1)/2]} & \text{if } n \text{ is odd} \\[4pt] \dfrac{y_{(n/2)} + y_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}$$

Definition 2.7 The mode of a set of n measurements, $y_1, y_2, \ldots, y_n$, is the value of y that occurs with the greatest frequency.

If the outline of a relative frequency histogram were cut from a piece of plywood, it would be perfectly balanced over the point that locates its mean, as illustrated in Figure 2.9a. As noted in Definition 2.6, half the area under the relative frequency distribution will lie to the left of the median, and half will lie to the right, as shown in Figure 2.9b. The mode will locate the point at which the greatest frequency occurs, i.e., the peak of the relative frequency distribution, as shown in Figure 2.9c.

FIGURE 2.9 Interpretations of the mean, median, and mode for a relative frequency distribution. Each panel plots relative frequency against y: a. Mean (point of balance); b. Median (50% of area on each side); c. Mode (peak point).

Although the mean is often the preferred measure of central tendency, it is sensitive to very large or very small observations. Consequently, the mean will shift toward the direction of skewness (i.e., the tail of the distribution) and may be misleading in some situations. For example, if a data set consists of the first-year starting salaries of civil engineering graduates, the high starting salaries of a few graduates will influence the mean more than the median. For this reason, the median is sometimes called a resistant measure of central tendency, since it, unlike the mean, is resistant to the influence of extreme observations. For data sets that are extremely skewed (e.g., the starting salaries of civil engineering graduates), the median would better represent the “center” of the distribution data.

Rarely is the mode the preferred measure of central tendency. The mode is preferred over the mean or median only if the relative frequency of occurrence of y is of interest. For example, a supplier of carpenter’s materials would be interested in the modal length (in inches) of nails he sells.

In summary, the best measure of central tendency for a data set depends on the type of descriptive information you want. Most of the inferential statistical methods discussed in this text are based, theoretically, on mound-shaped distributions of data with little or no skewness. For these situations, the mean and the median will be, for all practical purposes, the same. Since the mean has nicer mathematical properties than the median, it is the preferred measure of central tendency for these inferential techniques.
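The three measures of central tendency are simple to compute directly from their definitions. The following Python sketch is one way to do so; the language and the function names are our own choices and are not taken from the text. It uses the measurements of Example 2.3, the six measurements 1, 4, 5, 8, 10, 11 from the discussion of the median, and one hypothetical data set (labeled in the code) in which a single value is made extreme to show how the mean and median react.

```python
from collections import Counter


def mean(ys):
    """Arithmetic mean: the sum of the measurements divided by n."""
    return sum(ys) / len(ys)


def median(ys):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(ys)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # y_((n+1)/2) in 1-based notation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of y_(n/2) and y_(n/2+1)


def modes(ys):
    """All values that occur with the greatest frequency (there may be ties)."""
    counts = Counter(ys)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


example_2_3 = [4, 6, 1, 2, 3]
print(mean(example_2_3))               # 3.2
print(median(example_2_3))             # 3
print(median([1, 4, 5, 8, 10, 11]))    # 6.5

# Hypothetical variation (not from the text): replace the 6 by 60.
with_extreme_value = [4, 60, 1, 2, 3]
print(mean(with_extreme_value))        # 14.0, pulled toward the extreme value
print(median(with_extreme_value))      # 3, unchanged
```

The last two lines illustrate why the median is called a resistant measure: one extreme observation moves the mean from 3.2 to 14.0 but leaves the median at 3. Note that `modes` returns a list, since a data set in which several values tie for the greatest frequency has more than one mode.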

Example 2.4 Refer to Example 2.3 and the percentage iron content measurements for the 390 iron-ore specimens. Find the mean, median, and mode for this data set. Interpret the results.

Solution Since the data set (saved in the IRONORE file) is large, we used EXCEL to obtain these measures of central tendency. Figure 2.10 shows a portion of the EXCEL spreadsheet with the descriptive statistics. The mean, median, and mode (highlighted on the printout) are 65.74%, 65.83%, and 65.81%, respectively.

The mean implies that the average content of iron ore in the sample of specimens is 65.74%; the median implies that half (or 195) of the specimens in the sample had measurements below 65.83%; and the mode indicates that the percent iron content that occurred most often in the sample was 65.81%. Note that all three measures of central tendency, when located on the histogram, Figure 2.7 (p. xxx), happen to be good indicators of the “center” of the data set.

FIGURE 2.10 EXCEL descriptive statistics for iron-ore content measurements


Applied Exercises

2.21 Measures of central tendency. Find the mean, median, and mode for each of the following data sets.

a. 4, 3, 10, 8, 5

b. 9, 6, 12, 4, 4, 2, 5, 6

LICHEN

2.22 Radioactive lichen. Refer to the University of Alaska study to monitor the level of radioactivity in lichen, Exercise 2.11 (p. xxx). The amount of the radioactive element cesium-137 (measured in microcuries per milliliter) for each of nine lichen specimens is repeated in the table.

Location

Bethel −5.50 −5.00

Eagle Summit −4.15 −4.85

Moose Pass −6.05

Turnagain Pass −5.00

Wickersham Dome −4.10 −4.50 −4.60

Source: Lichen Radionuclide Baseline Research project, 2003.

a. Find the mean, median, and mode of the radioactivity levels.

b. Interpret the value of each measure of central tendency, part a.

ASTEROIDS

2.23 Spectral images of asteroids. Find and interpret the mean, median, and mode for the data set of spectral image exposures of asteroids in Exercise 2.12 (p. xxx). Which is the preferred measure of central tendency? Explain.

AMMONIA

2.24 Ammonia in car exhaust. Three-way catalytic converters have been installed in new vehicles in order to reduce pollutants from motor vehicle exhaust emissions. However, these converters unintentionally increase the level of ammonia in the air. Environmental Science & Technology (Sept. 1, 2000) published a study on the ammonia levels near the exit ramp of a San Francisco highway tunnel. The data in the table represent daily ammonia concentrations (parts per million) on eight randomly selected days during afternoon drive-time in the summer of a recent year.

1.53 1.50 1.37 1.51 1.55 1.42 1.41 1.48

a. Find the mean daily ammonia level in air in the tunnel.

b. Find the median ammonia level.

c. Interpret the values obtained in parts a and b.

VOLTAGE

2.25 Process voltage readings. Find and interpret the mean, median, and mode for each of the voltage readings data sets in Exercise 2.14 (p. xxx). Which is the preferred measure of central tendency? Explain.

SHIPSANIT

2.26 Sanitation inspection of cruise ships. Refer to the Centers for Disease Control study of sanitation levels for 174 international cruise ships, Exercise 2.15 (p. xxx). (Recall that sanitation scores ranged from 0 to 100.) Find and interpret numerical descriptive measures of central tendency for the sanitation levels.

ORCHARD

2.27 Insecticides on orchards. Find and interpret the mean, median, and mode for the oxon/thion ratio data set in Exercise 2.18 (p. xxx). Which is the preferred measure of central tendency? Explain.

GOBIANTS

2.28 Mongolian desert ants. The Journal of Biogeography (Dec. 2003) published an article on the first comprehensive study of ants in Mongolia (Central Asia). Botanists placed seed baits at 11 study sites and observed the ant species attracted to each site. Some of the data recorded at each study site are provided in the table.

Site  Region        Annual          Max. Daily   Total Plant  Number of    Species
                    Rainfall (mm)   Temp. (°C)   Cover (%)    Ant Species  Diversity Index
 1    Dry Steppe    196              5.7         40            3            .89
 2    Dry Steppe    196              5.7         52            3            .83
 3    Dry Steppe    179              7.0         40           52           1.31
 4    Dry Steppe    197              8.0         43            7           1.48
 5    Dry Steppe    149              8.5         27            5            .97
 6    Gobi Desert   112             10.7         30           49            .46
 7    Gobi Desert   125             11.4         16            5           1.23
 8    Gobi Desert    99             10.9         30            4
 9    Gobi Desert   125             11.4         56            4            .76
10    Gobi Desert    84             11.4         22            5           1.26
11    Gobi Desert   115             11.4         14            4            .69

Source: Pfeiffer, M., et al., “Community Organization and Species Richness of Ants in Mongolia Along an Ecological Gradient from Steppe to Gobi Desert.” Journal of Biogeography, Vol. 30, No. 12, Dec. 2003 (Tables 1 and 2).

a. Find the mean, median, and mode for the number of ant species discovered at the 11 sites. Interpret each of these values.

b. Which measure of central tendency would you recommend to describe the center of the number of ant species distribution? Explain.

c. Find the mean, median, and mode for the total plant cover percentage at the 5 Dry Steppe sites only.

d. Find the mean, median, and mode for the total plant cover percentage at the 6 Gobi Desert sites only.

e. Based on the results, parts c and d, does the center of the total plant cover percentage distribution appear to be different at the two regions?

LEFTEYE

2.29 Eye refractive study. The conventional method of measuring the refractive status of an eye involves three quantities: (1) sphere power, (2) cylinder power, and (3) axis. Optometric researchers studied the variation in these three measures of refraction. (Optometry and Vision Science, June 1995.) Twenty-five successive refractive measurements were obtained on the eyes of over 100 university students. The cylinder power measurements for the left eye of one particular student (ID #11) are listed in the table. (Note: All measurements are negative values.)

.08 .08 1.07 .09 .16 .04 .07 .17 .11

.06 .12 .17 .20 .12 .17 .09 .07 .16

.15 .16 .09 .06 .10 .21 .06

Source: Rubin, A., and Harris, W. F., “Refractive Variation During Autorefraction: Multivariate Distribution of Refractive Status.” Optometry and Vision Science, Vol. 72, No. 6, June 1995, p. 409 (Table 4).

a. Find measures of central tendency for the data and interpret their values.

b. Note that the data contains one unusually large (negative) cylinder power measurement relative to the other measurements in the data set. Find this measurement. (In Section 2.7, we call this value an outlier.)

c. Delete the outlier, part b, from the data set and recalculate the measures of central tendency. Which measure is most affected by the deletion of the outlier?

2.30 Active nuclear power plants. The U.S. Energy Information Administration monitors all nuclear power plants operating in the United States. The table lists the number of active nuclear power plants operating in each of a sample of 20 states.

a. Find the mean, median, and mode of this data set.

b. Eliminate the largest value from the data set and repeat part a. What effect does dropping this measurement have on the measures of central tendency found in part a?

c. Arrange the 20 values in the table from lowest to highest. Next, eliminate the lowest two values and the highest two values from the data set and find the mean of the remaining data values. The result is called a 10% trimmed mean, since it is calculated after removing the highest 10% and the lowest 10% of the data values. What advantages does a trimmed mean have over the regular arithmetic mean?

State Number of Power Plants

Alabama 5

Arizona 3

California 4

Florida 5

Georgia 4

Illinois 13

Kansas 1

Louisiana 2

Massachusetts 1

Mississippi 1

New Hampshire 1

New York 6

North Carolina 5

Ohio 2

Pennsylvania 9

South Carolina 7

Tennessee 3

Texas 4

Vermont 1

Wisconsin 3

Source: Statistical Abstract of the United States, 2000 (Table 966). U.S. Energy Information Administration, Electric Power Annual.

NUCLEAR

2.5 Measures of Variation

The most commonly used measures of data variation are the range, the variance, and the standard deviation.

Definition 2.8 The range is equal to the difference between the largest and the smallest measurements in a data set:

Range = Largest measurement - Smallest measurement


Definition 2.9 The variance of a sample of n measurements, $y_1, y_2, \ldots, y_n$, is defined to be

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1}$$

The population variance is defined to be

$$\sigma^2 = \frac{\sum_{i=1}^{n} (y_i - \mu)^2}{n}$$

for a finite population with n measurements.

Definition 2.10 The standard deviation of a sample of n measurements is equal to the square root of the variance:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

The population standard deviation is $\sigma = \sqrt{\sigma^2}$.

Example 2.5 Find the variance and standard deviation for the $n = 5$ sample observations: 1, 3, 2, 2, 4.

Solution We must first calculate $\sum_{i=1}^{n} y_i$ and $\sum_{i=1}^{n} y_i^2$:

$$\sum_{i=1}^{n} y_i = 1 + 3 + 2 + 2 + 4 = 12$$

$$\sum_{i=1}^{n} y_i^2 = (1)^2 + (3)^2 + (2)^2 + (2)^2 + (4)^2 = 34$$

Then the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1} = \frac{34 - \dfrac{(12)^2}{5}}{4} = 1.3$$

and the sample standard deviation is

$$s = \sqrt{s^2} = \sqrt{1.3} = 1.1402$$

It is possible that two different data sets could possess the same range but differ greatly in the amount of variation in the data. Consequently, the range is a relatively insensitive measure of data variation. It is used primarily in industrial quality control where the inferential procedures are based on small samples (i.e., small values of n). The variance has theoretical significance but is difficult to interpret since the units of measurement on the variable y of interest are squared (e.g., feet², ppm², etc.). The units of measurement on the standard deviation, however, are the same as the units on y (e.g., feet, ppm). When combined with the mean of the data set, the standard deviation is easily interpreted.
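The calculations of Example 2.5 can be checked with a short program. The Python sketch below (our choice of language; the function names are hypothetical) computes the range, and computes the sample variance in two ways: from the squared deviations about the mean and from the shortcut formula based on the two sums $\sum y_i$ and $\sum y_i^2$. Because of floating-point rounding, the two variance results may differ in the last few decimal places.

```python
import math


def sample_range(ys):
    """Largest measurement minus smallest measurement."""
    return max(ys) - min(ys)


def sample_variance(ys):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)


def sample_variance_shortcut(ys):
    """Shortcut formula: (sum of y^2 - (sum of y)^2 / n) / (n - 1)."""
    n = len(ys)
    sum_y = sum(ys)
    sum_y_squared = sum(y * y for y in ys)
    return (sum_y_squared - sum_y ** 2 / n) / (n - 1)


example_2_5 = [1, 3, 2, 2, 4]

s2 = sample_variance(example_2_5)
s = math.sqrt(s2)

print(sample_range(example_2_5))                        # 3
print(round(s2, 4))                                     # 1.3
print(round(sample_variance_shortcut(example_2_5), 4))  # 1.3
print(round(s, 4))                                      # 1.1402
```

Both variance functions divide by n - 1, so they compute the sample variance $s^2$, not the population variance $\sigma^2$.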

Two useful rules for interpreting the standard deviation are the Empirical Rule and Chebyshev’s Rule.

The Empirical Rule

If a data set has an approximately mound-shaped, symmetric distribution, then the following rules of thumb may be used to describe the data set:

1. Approximately 68% of the measurements will lie within 1 standard deviation of their mean (i.e., within the interval $\bar{y} \pm s$ for samples and $\mu \pm \sigma$ for populations).

2. Approximately 95% of the measurements will lie within 2 standard deviations of their mean (i.e., within the interval $\bar{y} \pm 2s$ for samples and $\mu \pm 2\sigma$ for populations).

3. Almost all the measurements will lie within 3 standard deviations of their mean (i.e., within the interval $\bar{y} \pm 3s$ for samples and $\mu \pm 3\sigma$ for populations).

Chebyshev’s Rule

Chebyshev’s Rule applies to any data set, regardless of the shape of the frequency distribution of the data.

a. It is possible that very few of the measurements will fall within 1 standard deviation of the mean, i.e., within the interval ($\bar{y} \pm s$) for samples and ($\mu \pm \sigma$) for populations.

b. At least $\frac{3}{4}$ of the measurements will fall within 2 standard deviations of the mean, i.e., within the interval ($\bar{y} \pm 2s$) for samples and ($\mu \pm 2\sigma$) for populations.

c. At least $\frac{8}{9}$ of the measurements will fall within 3 standard deviations of the mean, i.e., within the interval ($\bar{y} \pm 3s$) for samples and ($\mu \pm 3\sigma$) for populations.

d. Generally, for any number k greater than 1, at least ($1 - 1/k^2$) of the measurements will fall within k standard deviations of the mean, i.e., within the interval ($\bar{y} \pm ks$) for samples and ($\mu \pm k\sigma$) for populations.

The Empirical Rule is the result of the practical experience of researchers in many fields who have observed many different types of real-life data sets. Chebyshev’s Rule is derived from a theorem proved by the Russian mathematician Pafnuty L. Chebyshev (1821–1894). Both rules, described in the boxes, give the percentage of measurements in a data set that fall in the interval $\bar{y} \pm ks$, where k is any integer.

Example 2.6 Refer to Example 2.4 (p. xxx) and the data on percent iron content of iron-ore specimens. Use a rule of thumb to describe the distribution of iron content measurements. In particular, estimate the number of the 390 iron-ore specimens that have iron content measurements that fall within 2 standard deviations of the mean.

Solution From the EXCEL printout, Figure 2.10 (p. xxx), we found the sample mean $\bar{y} = 65.74\%$. The figure also shows the standard deviation as $s = .69\%$. Using these values, we form the intervals $\bar{y} \pm s$, $\bar{y} \pm 2s$, and $\bar{y} \pm 3s$. Applying both the Empirical Rule and Chebyshev’s Rule, we can estimate the proportions of the $n = 390$ iron content measurements to fall within the intervals. These proportions are given in Table 2.4.

TABLE 2.4 Applying Rules of Thumb to the 390 Iron Content Measurements

k   $\bar{y} \pm ks$    Expected Proportion     Expected Proportion       Actual
                        using Empirical Rule    using Chebyshev’s Rule    Proportion
1   (65.05, 66.43)      ≈ .68                   at least 0                .744
2   (64.36, 67.12)      ≈ .95                   at least .75              .947
3   (63.67, 67.81)      ≈ 1.00                  at least .889             .980

You can see that, for each of the three intervals, the actual proportion of the iron-ore specimens that have iron measurements in the interval is very close to that approximated by the Empirical Rule. Such a result is expected since the relative frequency histogram of the 390 measurements (shown in Figure 2.8, p. xxx) is mound-shaped and nearly symmetric. Although it can be applied to any data set, Chebyshev’s Rule tends to be conservative, providing a lower bound on the percentage of measurements that fall in the interval. Consequently, our best estimate of the percentage of iron content measurements that fall within 2 standard deviations of the mean is obtained using the Empirical Rule: approximately 95%.

Since many data sets encountered in engineering and the sciences are approximately mound-shaped, scientists often apply the Empirical Rule to estimate a range where most of the measurements fall. The interval $\bar{y} \pm 2s$ is typically selected since it captures about 95% of the data.
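The interval endpoints and expected proportions in Table 2.4 follow directly from the summary values quoted in Example 2.6, and they can be reproduced with a brief program. The Python sketch below is an illustration only: the language and the function names are our own, and the value 1.00 stored for k = 3 stands in for the Empirical Rule’s “almost all.” The actual proportions in the table require the 390 individual measurements, which are not reproduced here; the function `proportion_within` simply shows how such a proportion would be counted for any data set.

```python
import math


def interval(center, spread, k):
    """Return the interval center +/- k * spread as a (low, high) pair."""
    return (center - k * spread, center + k * spread)


def chebyshev_lower_bound(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of the data lie within k standard deviations."""
    if k <= 1:
        return 0.0  # the rule gives no useful guarantee for k <= 1
    return 1 - 1 / k ** 2


def proportion_within(ys, k):
    """Actual proportion of ys inside ybar +/- k*s (s uses the n - 1 divisor)."""
    n = len(ys)
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    low, high = interval(ybar, s, k)
    return sum(low <= y <= high for y in ys) / n


# Empirical Rule rules of thumb (k = 3 is "almost all", written here as 1.00).
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 1.00}

# Summary values quoted in Example 2.6.
ybar, s, n = 65.74, 0.69, 390

for k in (1, 2, 3):
    low, high = interval(ybar, s, k)
    print(k, round(low, 2), round(high, 2),
          EMPIRICAL[k], round(chebyshev_lower_bound(k), 3))

# Rough counts of specimens within 2 standard deviations of the mean.
print(EMPIRICAL[2] * n)               # Empirical Rule estimate
print(chebyshev_lower_bound(2) * n)   # Chebyshev's guaranteed minimum
```

For the 390 specimens, the Empirical Rule estimate is about 370 measurements within 2 standard deviations of the mean, while Chebyshev’s Rule guarantees only that at least 292.5 (that is, at least 293) fall there.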

Applied Exercises

AMMONIA

2.31 Ammonia in car exhaust. Refer to the Environmental Science & Technology (Sept. 1, 2000) study on the ammonia levels near the exit ramp of a San Francisco highway tunnel, Exercise 2.24 (p. xxx). The data (in parts per million) for 8 days during afternoon drive-time are reproduced in the table.

1.53 1.50 1.37 1.51 1.55 1.42 1.41 1.48

a. Find the range of the ammonia levels.

b. Find the variance of the ammonia levels.

c. Find the standard deviation of the ammonia levels.

d. Suppose the standard deviation of the daily ammonia levels during morning drive-time at the exit ramp is 1.45 ppm. Which time, morning or afternoon drive-time, has more variable ammonia levels?

NUCLEAR

2.32 Active nuclear power plants. Refer to Exercise 2.30 (p. xxx) and the U.S. Energy Information Administration’s data on the number of nuclear power plants operating in each of 20 states. The data are saved in the NUCLEAR file.

a. Find the range, variance, and standard deviation of this data set.

b. Eliminate the largest value from the data set and repeat part a. What effect does dropping this measurement have on the measures of variation found in part a?

c. Eliminate the smallest and largest value from the data set and repeat part a. What effect does dropping both of these measurements have on the measures of variation found in part a?

FTC

2.33 Level of nicotine in cigarette smoke. Periodically, the Federal Trade Commission (FTC) ranks domestic cigarette brands according to tar, nicotine, and carbon monoxide content. The test results are obtained by using a sequential smoking machine to “smoke” cigarettes to a 23-millimeter butt length. The tar, nicotine, and carbon monoxide concentrations (rounded to the nearest milligram) in the residual “dry” particulate matter of the smoke are then measured. The nicotine levels of 500 cigarette brands recently tested by the FTC are saved in the FTC file. The accompanying SAS printouts describe the nicotine distribution.

a. Examine the relative frequency histogram for nicotine level. Use an appropriate rule of thumb to describe the data set.

b. Locate the mean and standard deviation of the nicotine levels on the printout, then compute the interval $\bar{y} \pm 2s$.

c. Based on your answer to part a, estimate the percentage of cigarettes with nicotine levels in the interval formed in part b.

d. Use the information in the SAS histogram to determine the actual percentage of nicotine levels that fall within the interval formed in part b. Does your answer agree with your estimate of part c?

2.34 Software defects. Refer to Exercise 2.8 (p. xxx) and the PROMISE Software Engineering Repository data set that contains information on 498 modules of software code. One possible predictor of whether a module of code contains defects is the number of lines of code. The MINITAB printout given here shows summary statistics for number of lines of code for modules that contain defects and modules that do not. Use the means and standard deviations to compare the distributions of lines of code for defective (“true”) and nondefective (“false”) modules.

SWDEFECTS

SAS output for Exercise 2.33

MINITAB output for Exercise 2.34

Descriptive Statistics: MLOC

Variable  defect    N   Mean  StDev  Minimum     Q1  Median     Q3  Maximum
MLOC      false   449  26.17  37.64     1.10   8.00   15.00  29.00   423.00
          true     49  61.51  67.49     1.00  21.00   41.00  85.50   411.00

2.35 Groundwater contamination in wells. Refer to the Environmental Science & Technology (Jan. 2005) study of the MTBE contamination in New Hampshire wells, Exercise 2.10 (p. xxx). Consider only the data for those wells with detectable levels of MTBE. The accompanying MINITAB printout gives summary statistics for MTBE levels (micrograms per liter) of public and private wells.

a. Find an interval that will contain most (about 95%) of the MTBE values for private New Hampshire wells.

b. Find an interval that will contain most (about 95%) of the MTBE values for public New Hampshire wells.

MTBE

MINITAB output for Exercise 2.35

2.36 Monitoring impedance to leg movements. In an experiment to monitor the impedance to leg movement, Korean engineers attached electrodes to the ankles and knees of volunteers. Of interest was the signal-to-noise ratio (SNR) of impedance changes, where the signal is the magnitude of the leg movement and noise is the impedance change resulting from interferences such as knee flexes and hip extensions. For a particular ankle–knee electrode pair, a sample of 10 volunteers had SNR values with a mean of 19.5 and a standard deviation of 4.7. (IEICE Transactions on Information & Systems, Jan. 2005.) Assuming the distribution of SNR values in the population is mound-shaped and symmetric, give an interval that contains about 95% of all SNR values in the population. Would you expect to observe an SNR value of 30?

2.37 Bearing strength of concrete FRP strips. Fiber-reinforced polymer (FRP) composite materials are the standard for strengthening, retrofitting, and repairing concrete structures. Typically, FRP strips are fastened to the concrete with epoxy adhesive. Engineers at the University of Wisconsin–Madison have developed a new method of fastening the FRP strips using mechanical anchors. (Composites Fabrication Magazine, Sept. 2004.) To evaluate the new fastening method, 10 specimens of pultruded FRP strips mechanically fastened to highway bridges were tested for bearing strength. The strength measurements (recorded in megapascals, MPa) are shown in the table. Use the sample data to give an interval that is likely to contain the bearing strength of a pultruded FRP strip.

240.9 248.8 215.7 233.6 231.4 230.9 225.3 247.3 235.5 238.0

Source: Data are simulated from summary information provided in Composites Fabrication Magazine, Sept. 2004, p. 32 (Table 1).

FRP

2.38 Velocity of Winchester bullets. The American Rifleman (June 1993) reported on the velocity of ammunition fired from the FEG P9R pistol, a 9-mm gun manufactured in Hungary. Field tests revealed that Winchester bullets fired from the pistol had a mean velocity (at 15 feet) of 936 feet per second and a standard deviation of 10 feet per second. Tests were also conducted with Uzi and Black Hills ammunition.

a. Describe the velocity distribution of Winchester bullets fired from the FEG P9R pistol.

b. A bullet, brand unknown, is fired from the FEG P9R pistol. Suppose the velocity (at 15 feet) of the bullet is 1000 feet per second. Is the bullet likely to be manufactured by Winchester? Explain.

2.39 Speed of light from galaxies. Astronomers theorize that cold dark matter (CDM) caused the formation of galaxies and clusters of galaxies in the universe. The theoretical CDM model requires an estimate of the velocity of light emitted from the galaxy cluster. The Astronomical Journal (July 1995) published a study of observed velocities for galaxies in four different galaxy clusters. Galaxy velocity was measured in kilometers per second (km/s) using a spectrograph and high-power telescope.

a. The observed velocities of 103 galaxies located in the cluster named A2142 are summarized in the accompanying histogram. Comment on whether the Empirical Rule is applicable for describing the velocity distribution for this cluster.

Histogram for Exercise 2.39: A2142 velocity distribution (horizontal axis: observed velocity, /1000 km/sec, marked at 20, 25, and 30; vertical axis: number of galaxies, marked at 0, 5, 10, and 15)

Source: Oegerle, W.R., Hill, J. M., and Fitchett, M. J., “Observations of high dispersion clusters of galaxies: Constraints on cold dark matter.” The Astronomical Journal, Vol. 110, No. 1, July 1995, p. 37 (Figure 1).

b. The mean and standard deviation of the 103 velocities observed in galaxy cluster A2142 were reported as $\bar{y} = 27{,}117$ km/s and $s = 1{,}280$ km/s, respectively. Use this information to construct an interval that captures approximately 95% of the galaxy velocities in the cluster.

c. Recommend a single velocity value to be used in the CDM model for galaxy cluster A2142. Explain your reasoning.

2.40 Spectral images of asteroids. The asteroid data of Exercise 2.12 are reproduced here, followed by a MINITAB printout giving descriptive statistics.

ASTEROIDS

3 4 3 3 1 4 1 3 2 3

1 1 4 2 3 3 2 6 1 1

3 3 2 2 2 2 1 3 2 1

6 1 3 2 2 1 2 2 4 2

Source: Binzel, R. P., and Xu, S. “Chips off of Asteroid 4 Vesta: Evidence for the parent body of basaltic achondrite meteorites.” Science, Vol. 260, Apr. 3, 1993, p. 187 (Table 1).

a. Locate $\bar{y}$ and $s$ on the printout.

b. Construct the intervals $\bar{y} \pm s$, $\bar{y} \pm 2s$, and $\bar{y} \pm 3s$.

c. Count the number of observations that fall within each interval.

d. Which rule of thumb, the Empirical Rule or Chebyshev’s Rule, is best for describing the data?

MINITAB printout for Exercise 2.40

2.6 Measures of Relative Standing

Test scores and some types of sociological and health data are often reported in a manner that describes the location of an observation relative to the other scores in the distribution. Two measures of the relative standing of an observation are percentiles and z-scores.

Definition 2.11 The 100pth percentile of a data set is a value of y located so that 100p% of the area under the relative frequency distribution for the data lies to the left of the 100pth percentile and $100(1 - p)\%$ of the area lies to its right. (Note: $0 \le p \le 1$.)

For example, if your grade in an industrial engineering class was located at the 84th percentile, then 84% of the grades were lower than your grade and 16% were higher.

The median is the 50th percentile. The 25th percentile, the median, and the 75th percentile are called the lower quartile, the midquartile, and the upper quartile, respectively, for a data set.

Definition 2.12 The lower quartile, $Q_L$, for a data set is the 25th percentile.

Definition 2.13 The midquartile (or median), m, for a data set is the 50th percentile.

Definition 2.14 The upper quartile, $Q_U$, for a data set is the 75th percentile.

For large data sets (e.g., populations), quartiles are found by locating the corresponding areas under the curve (relative frequency distribution). However, when the data set of interest is small, it may be impossible to find a measurement in the data set that exceeds, say, exactly 25% of the remaining measurements. Consequently, the 25th percentile (or lower quartile) for the data set is not well defined. The following box contains a few rules for finding quartiles and other percentiles with small data sets.

Finding Quartiles (and Percentiles) with Small Data Sets

Step 1 Rank the measurements in the data set in increasing order of magnitude. Let $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$ represent the ranked measurements.

Step 2 Calculate the quantity $\ell = \frac{1}{4}(n + 1)$ and round to the nearest integer. The measurement with this rank, denoted $y_{(\ell)}$, represents the lower quartile or 25th percentile. [Note: If $\ell = \frac{1}{4}(n + 1)$ falls halfway between two integers, round up.]

Step 3 Calculate the quantity $u = \frac{3}{4}(n + 1)$ and round to the nearest integer. The measurement with this rank, denoted $y_{(u)}$, represents the upper quartile or 75th percentile. [Note: If $u = \frac{3}{4}(n + 1)$ falls halfway between two integers, round down.]

General To find the pth percentile, calculate the quantity $i = p(n + 1)/100$ and round to the nearest integer. The measurement with this rank, denoted $y_{(i)}$, is the pth percentile.

Example 2.7 Freckles are defects that sometimes form during the solidification of alloy ingots. A freckle index has been developed to measure the level of freckling on the ingot. A team of engineers conducted several experiments to measure the freckle index of a certain type of superalloy (Journal of Metallurgy, Sept. 2004). The data for $n = 18$ alloy tests are shown in Table 2.5. Create a stem-and-leaf display for the data and use it to find the lower quartile for the 18 freckle indexes.

Solution The data of Table 2.5 are saved in the FRECKLE file. A MINITAB stem-and-leaf display for the data is shown in Figure 2.11. We’ll use this graph to help find the lower quartile for the data set.

From the box, the lower quartile $Q_L$ is the observation $y_{(\ell)}$ when the data are arranged in increasing order, where $\ell = \frac{1}{4}(n + 1)$. Since $n = 18$, $\ell = \frac{1}{4}(19) = 4.75$. Rounding up, we obtain $\ell = 5$. Thus, the lower quartile, $Q_L$, will be the fifth observation when the data are arranged in order from smallest to largest, i.e., $Q_L = y_{(5)}$. For small data sets, a stem-and-leaf display is useful for finding quartiles and percentiles. You can see that the fifth observation is the fifth leaf in stem row 0. This value corresponds to a freckle index of 4.1. Thus, for this small data set, $Q_L = 4.1$.
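The ranking-and-rounding steps in the box can be sketched in code. The Python example below is a minimal illustration under the stated rounding conventions (round a halfway value up for the lower quartile, down for the upper quartile); the function names are hypothetical, and statistical software such as MINITAB may use different interpolation rules and report slightly different quartiles.

```python
import math

# Freckle indexes for the 18 superalloy tests (Table 2.5).
freckle = [30.1, 22.0, 14.6, 16.4, 12.0, 2.4, 22.2, 10.0, 15.1,
           12.6, 6.8, 4.1, 2.5, 1.4, 33.4, 16.8, 8.1, 3.2]


def round_half_up(x):
    """Round to the nearest integer; a value halfway between integers goes up."""
    return math.floor(x + 0.5)


def round_half_down(x):
    """Round to the nearest integer; a value halfway between integers goes down."""
    return math.ceil(x - 0.5)


def lower_quartile(data):
    """Steps 1-2 of the box: rank the data, then take the value of rank (n + 1)/4."""
    ranked = sorted(data)
    rank = round_half_up((len(ranked) + 1) / 4)
    return ranked[rank - 1]  # ranks are 1-based, list indexes are 0-based


def upper_quartile(data):
    """Steps 1 and 3 of the box: rank the data, then take the value of rank 3(n + 1)/4."""
    ranked = sorted(data)
    rank = round_half_down(3 * (len(ranked) + 1) / 4)
    return ranked[rank - 1]


print(lower_quartile(freckle))  # 4.1, matching Example 2.7
```

For the freckle data, the lower-quartile rank is 4.75, which rounds to 5, so the function returns the fifth ranked value.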

TABLE 2.5 Freckle Indexes for 18 Superalloys

30.1 22.0 14.6 16.4 12.0 2.4 22.2 10.0 15.1

12.6 6.8 4.1 2.5 1.4 33.4 16.8 8.1 3.2

Source: Yang, W. H., et al., “A Freckle Criterion for the Solidification of Superalloys with a Tilted Solidification Front,” Journal of Metallurgy, Vol. 56, No. 9, Sept. 2004 (Table IV).

FIGURE 2.11 MINITAB stem-and-leaf display for freckle index of alloys

FRECKLE


Another useful measure of relative standing is a z-score. By definition, a z-score describes the location of an observation y relative to the mean in units of the standard deviation. Negative z-scores indicate that the observation lies to the left of the mean; positive z-scores indicate that the observation lies to the right of the mean. Also, we know from the Empirical Rule that most of the observations in a data set will be less than 2 standard deviations from the mean (i.e., will have z-scores less than 2 in absolute value) and almost all will be within 3 standard deviations of the mean (i.e., will have z-scores less than 3 in absolute value).

Definition 2.15 The z-score for a value y of a data set is the distance that y lies above or below the mean, measured in units of the standard deviation:

Sample z-score: $z = \dfrac{y - \bar{y}}{s}$

Population z-score: $z = \dfrac{y - \mu}{\sigma}$

Example 2.8 Refer to Example 2.6 and the data on percentage iron content for 390 iron-ore specimens. Find and interpret the z-score for the measurement of 66.56%.

Solution Recall that the mean and standard deviation of the sample data (shown in Figure 2.10) are $\bar{y} = 65.74$ and $s = .69$. Substituting $y = 66.56$ into the formula for z, we obtain

$z = (y - \bar{y})/s = (66.56 - 65.74)/.69 = 1.19$

Since the z-score is positive, we conclude that the iron content value of 66.56% lies a distance of 1.19 standard deviations above (to the right of) the sample mean of 65.74%.
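The calculation in Example 2.8 is a single line of arithmetic, shown here as a short Python sketch. The function name `z_score` is a hypothetical helper; the numerical inputs are the sample values quoted in the example.

```python
def z_score(y, mean, std_dev):
    """Distance of y from the mean, measured in units of the standard deviation."""
    return (y - mean) / std_dev


# Example 2.8: iron content of 66.56% with sample mean 65.74 and s = .69.
z = z_score(66.56, 65.74, 0.69)
print(round(z, 2))  # 1.19
```

A positive result places the observation to the right of the mean; a negative result would place it to the left.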

Applied Exercises

2.41 Phosphorus standards in the Everglades. A key pollutant of the Florida Everglades is total phosphorus (TP). Chance (Summer 2003) reported on a study to establish standards for TP water quality in the Everglades. The Florida Department of Environmental Protection (DEP) collected data on TP concentrations at 28 Everglades sites. The 75th percentile of the TP distribution was found to be 10 micrograms per liter. The DEP recommended this value be used as a TP standard for the Everglades; i.e., any site with a TP reading exceeding 10 micrograms per liter would be considered unsafe. Interpret this 75th percentile value. Give a reason why it was selected as a TP standard by the DEP.

2.42 Spectral images of asteroids. Refer to the asteroid data given in Exercises 2.12 and 2.40.

a. Find the lower and upper quartiles for the data set. Interpret these values.

b. Find the z-score for an asteroid observation with six independent spectral image exposures. Interpret this value.

2.43 Cyanide in soil. Environmental Science & Technology (Oct. 1993) reported on a study of contaminated soil in The Netherlands. A total of 72 400-gram soil specimens were sampled, dried, and analyzed for the contaminant cyanide. The cyanide concentration (milligrams per kilogram of soil) of each soil specimen was determined using an infrared microscopic method. The cyanide concentration levels are summarized in the accompanying table. Fully interpret the descriptive statistics shown in the table.

Summary Statistics for Cyanide Levels in Soil (mg/kg)

Sample size 72 Lower quartile 13.8

Mean 84.0 Upper quartile 88.5

Variance 6,400 Minimum 2.0

Median 28.8 Maximum 3,320.0

Source: Lamé, F. P. J., and Defize, P. R., “Sampling of contaminated soil: Sampling error in relation to sample size and segregation.” Environmental Science & Technology, Vol. 27, No. 10, Oct. 1993, p. 2039 (Table II).

ASTEROIDS

2.44 Extinct New Zealand birds. Refer to the Evolutionary Ecology Research (July 2003) study of the patterns of extinction in the New Zealand bird population, Exercise 2.9 (p. xxx). Consider the data on the egg length (measured in millimeters) for the 132 bird species saved in the NZBIRDS file.

a. Find the 10th percentile for the egg length distribution and interpret its value.

b. The Moas, P. australis bird species has an egg length of 205 millimeters. Find the z-score for this species of bird and interpret its value.

2.45 Sanitation inspection of cruise ships. Refer to the sanitation levels of cruise ships, Exercise 2.15 (p. xxx), saved in the SHIPSANIT file.

a. Give a measure of relative standing for the Nautilus Explorer’s score of 74. Interpret the result.

b. Give a measure of relative standing for the Rotterdam’s score of 93. Interpret the result.

2.46 Lead in drinking water. The U.S. Environmental Protection Agency (EPA) sets a limit on the amount of lead permitted in drinking water. The EPA Action Level for lead is .015 milligrams per liter (mg/L) of water. Under EPA guidelines, if 90% of a water system’s study samples have a lead concentration less than .015 mg/L, the water is considered safe for drinking. I (co-author Sincich) received a report on a study of lead levels in the drinking water of homes in my subdivision. The 90th percentile of the study sample had a lead concentration of .00372 mg/L. Are water customers in my subdivision at risk of drinking water with unhealthy lead levels? Explain.

2.47 Process voltage readings. Refer to the Harris Corporation study on voltage readings at two locations, Exercise 2.14 (p. xxx).

a. Calculate the z-score for a voltage reading of 10.50 at the old location.

b. Calculate the z-score for a voltage reading of 10.50 at the new location.

c. Based on the results of parts a and b, at which location is a voltage reading of 10.50 more likely to occur? Explain.

2.48 Eye refractive study. Refer to the Optometry and Vision Science (June 1995) study of refractive variation in eyes, Exercise 2.29 (p. xxx). The 25 cylinder power measurements are saved in the LEFTEYE file.

a. Find the 10th percentile of cylinder power measurements. Interpret the result.

b. Find the 95th percentile of cylinder power measurements. Interpret the result.

c. Calculate the z-score for the cylinder power measurement of 1.07. Interpret the result.

NZBIRDS

SHIPSANIT

VOLTAGE

LEFTEYE

2.7 Methods for Detecting Outliers

Sometimes inconsistent observations are included in a data set. For example, when we discuss starting salaries for college graduates with bachelor’s degrees, we generally think of traditional college graduates: those near 22 years of age with 4 years of college education. But suppose one of the graduates is a 34-year-old PhD chemical engineer who has returned to the university to obtain a bachelor’s degree in metallurgy. Clearly, the starting salary for this graduate could be much larger than the other starting salaries because of the graduate’s additional education and experience, and we probably would not want to include it in the data set. Such an errant observation, which lies outside the range of the data values that we want to describe, is called an outlier.

Definition 2.16 An observation y that is unusually large or small relative to the other values in a data set is called an outlier. Outliers typically are attributable to one of the following causes:

1. The measurement is observed, recorded, or entered into the computer incorrectly.

2. The measurement comes from a different population.

3. The measurement is correct, but represents a rare (chance) event.

The most obvious method for determining whether an observation is an outlier is to calculate its z-score (Section 2.6).

Example 2.9 Refer to the sample data on 45 energy-related accidents worldwide since 1977 that resulted in multiple fatalities. (The data are saved in the ENERGY file.) In addition to the cause of the fatal energy-related accident, the data set also contains information on the number of fatalities for each accident. The first observation in the data set is a dam failure accident that occurred in India in 1979, killing 2500 people. Is this observation an outlier?

Solution Descriptive statistics on the number of fatalities for the 45 energy-related accidents are displayed in the SPSS printout, Figure 2.12. The mean and standard deviation, highlighted on the printout, are $\bar{y} = 243.3$ and $s = 398.2$. Consequently, the z-score for the observation with a number of fatalities of $y = 2500$ is

$z = (y - \bar{y})/s = (2500 - 243.3)/398.2 = 5.67$

The Empirical Rule states that almost all the observations in a data set will have z-scores less than 3 in absolute value, while Chebyshev’s Rule guarantees that at most $\frac{1}{9}$ (or 11%) will have z-scores greater than 3 in absolute value. Since a z-score as large as 5.67 is rare, the measurement $y = 2500$ is called an outlier. Although this value was correctly recorded, the 1979 accident was attributed to heavy flooding in India, causing one of the first hydroelectric dams in the country to collapse.

FIGURE 2.12 SPSS descriptive statistics for number of energy-related fatal accidents

ENERGY

Another procedure for detecting outliers is to construct a box plot of the sample data. With this method, we construct intervals similar to the $\bar{y} \pm 2s$ and $\bar{y} \pm 3s$ intervals of the Empirical Rule; however, the intervals are based on a quantity called the interquartile range instead of the standard deviation s.

Definition 2.17 The interquartile range, IQR, is the distance between the upper and lower quartiles:

$\mathrm{IQR} = Q_U - Q_L$

The intervals $[Q_L - 1.5(\mathrm{IQR}),\ Q_U + 1.5(\mathrm{IQR})]$ and $[Q_L - 3(\mathrm{IQR}),\ Q_U + 3(\mathrm{IQR})]$ are the key to detecting outliers with a box plot.

The elements of a box plot are listed in the next box. A box plot is relatively easy to construct for small data sets because the quartiles and interquartile range can be quickly determined. However, since almost all statistical software packages include box plot routines, we’ll use the computer to construct a box plot.

Elements of a Box Plot

1. A rectangle (the box) is drawn with the ends (the hinges) drawn at the lower and upper quartiles ($Q_L$ and $Q_U$). The median of the data is shown in the box, usually by a line.

2. The points at distances 1.5(IQR) from each hinge mark the inner fences of the data set. Lines (the whiskers) are drawn from each hinge to the most extreme measurement inside the inner fence.

Lower inner fence $= Q_L - 1.5(\mathrm{IQR})$
Upper inner fence $= Q_U + 1.5(\mathrm{IQR})$

3. A second pair of fences, the outer fences, appear at a distance of 3 interquartile ranges, 3(IQR), from the hinges. One symbol (e.g., “∗”) is used to represent measurements falling between the inner and outer fences, and another (e.g., “0”) is used to represent measurements beyond the outer fences. Thus, outer fences are not shown unless one or more measurements lie beyond them.

Lower outer fence $= Q_L - 3(\mathrm{IQR})$
Upper outer fence $= Q_U + 3(\mathrm{IQR})$

4. The symbols used to represent the median and the extreme data points (those beyond the fences) will vary depending on the software you use to construct the box plot. (You may use your own symbols if you are constructing a box plot by hand.) You should consult the program’s documentation to determine exactly which symbols are used.

Aids to the Interpretation of Box Plots

1. Examine the length of the box. The IQR is a measure of the sample’s variability and is especially useful for the comparison of two samples.

2. Visually compare the lengths of the whiskers. If one is clearly longer, the distribution of the data is probably skewed in the direction of the longer whisker.

3. Analyze any measurements that lie beyond the fences. Fewer than 5% should fall beyond the inner fences, even for very skewed distributions. Measurements beyond the outer fences are probably outliers, with one of the following explanations:

a. The measurement is incorrect. It may have been observed, recorded, or entered into the computer incorrectly.

b. The measurement belongs to a population different from the population that the rest of the sample was drawn from.

c. The measurement is correct and from the same population as the rest. Generally, we accept this explanation only after carefully ruling out all others.

Example 2.10 Refer to Example 2.9 (p. xxx) and the data on number of fatalities for the 45 energy-related accidents saved in the ENERGY file. Construct a box plot for the data and use it to identify any outliers.

Solution We used MINITAB to form a box plot for the fatalities data. The box plot, as well as some descriptive statistics, is shown in Figure 2.13.

From the printout, the lower and upper quartiles are $Q_L = 69$ and $Q_U = 253$, respectively. These values form the edges of the box in Figure 2.13. (The median, $m = 114$, is shown inside the box with a horizontal line.) The interquartile range, $\mathrm{IQR} = Q_U - Q_L = 253 - 69 = 184$, is used to form the fences and whiskers of the box plot. Several highly suspect outliers (identified by asterisks) are shown on Figure 2.13. There appear to be two outliers with values of around 500 fatalities, one with about 1000 fatalities, and one with 2500 fatalities. (Note: The largest outlier is the observation identified in Example 2.9.)

ENERGY
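The fence formulas can be applied directly to the quartiles reported in Example 2.10. The Python sketch below is illustrative only: the function names `box_plot_fences` and `classify` are hypothetical, and because software packages may compute quartiles and draw fences by slightly different conventions, the points flagged on a particular printout need not match this hand calculation exactly.

```python
def box_plot_fences(q_lower, q_upper):
    """Return the IQR and the inner and outer fences from the two quartiles."""
    iqr = q_upper - q_lower
    return {
        "iqr": iqr,
        "inner": (q_lower - 1.5 * iqr, q_upper + 1.5 * iqr),
        "outer": (q_lower - 3 * iqr, q_upper + 3 * iqr),
    }


def classify(y, fences):
    """Label a measurement using the box plot rule of thumb."""
    inner_lo, inner_hi = fences["inner"]
    outer_lo, outer_hi = fences["outer"]
    if y < outer_lo or y > outer_hi:
        return "highly suspect outlier"  # beyond the outer fences
    if y < inner_lo or y > inner_hi:
        return "suspect outlier"         # between inner and outer fences
    return "not flagged"


# Quartiles reported in Example 2.10 for the fatalities data.
fences = box_plot_fences(69, 253)
print(fences["iqr"])            # 184
print(fences["inner"])          # (-207.0, 529.0)
print(fences["outer"])          # (-483.0, 805.0)
print(classify(2500, fences))   # highly suspect outlier
```

With these quartiles, the 1979 dam failure (2500 fatalities) lies far beyond the upper outer fence of 805.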

The z-score and box plot methods both establish rule-of-thumb limits outside of which a y value is deemed to be an outlier. Usually, the two methods produce similar results. However, the presence of one or more outliers in a data set can inflate the value of s used to calculate the z-score. Consequently, it will be less likely that an errant observation would have a z-score larger than 3 in absolute value. In contrast, the values of the quartiles used to calculate the fences for a box plot are not affected by the presence of outliers.

Rules of Thumb for Detecting Outliers*

             Suspect Outliers                              Highly Suspect Outliers

Box Plots:   Data points between inner and outer fences    Data points beyond outer fences

z-Scores:    $2 < |z| < 3$                                 $|z| > 3$
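The z-score row of the table translates into a simple check. The following Python sketch applies it to the observation from Example 2.9; the function name `classify_by_z` is hypothetical, and treating a z-score of exactly 3 in absolute value as merely "suspect" is an assumption made here, since the table does not address that boundary.

```python
def classify_by_z(y, mean, std_dev):
    """Return the z-score of y and its rule-of-thumb label."""
    z = (y - mean) / std_dev
    if abs(z) > 3:
        label = "highly suspect outlier"
    elif abs(z) > 2:
        label = "suspect outlier"  # assumption: |z| exactly 3 falls here
    else:
        label = "not flagged"
    return z, label


# Example 2.9: 2500 fatalities, with sample mean 243.3 and s = 398.2.
z, label = classify_by_z(2500, 243.3, 398.2)
print(round(z, 2), label)  # 5.67 highly suspect outlier
```

Note that the mean and standard deviation passed in are themselves computed from data that include the outlier, which is the inflation effect described in the paragraph above.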

FIGURE 2.13 MINITAB descriptive statistics and box plot for number of energy-related fatalities

Applied Exercises

2.49 Barium content of clinkers. Paving bricks, called clinkers, were examined for trace elements in order to determine the origin (e.g., factory) of the clinker. (Advances in Cement Research, Jan. 2004.) The barium content (mg/kg) for each in a sample of 200 clinkers was measured, yielding the following summary statistics: $Q_L = 115$, $m = 170$, and $Q_U = 260$.

a. Interpret the value of the median, m.

b. Interpret the value of the lower quartile, $Q_L$.

c. Interpret the value of the upper quartile, $Q_U$.

d. Find the interquartile range, IQR.

e. Find the endpoints of the inner fence in a box plot for barium content.

f. The researchers found no clinkers with a barium content beyond the boundaries of the inner fences. What does this imply?

2.50 Spectral images of asteroids. Refer to the asteroid data given in Exercises 2.12 and 2.40 (p. xxx and p. xxx).

a. Construct a box plot for the data. Do you detect any outliers?

b. Use the method of z-scores to detect outliers.

2.51 Process voltage readings. Refer to the voltage reading data supplied in Exercise 2.14 (p. xxx).

a. Construct a box plot for the data at the old location. Do you detect any outliers?

b. Use the method of z-scores to detect outliers at the old location.

c. Construct a box plot for the data at the new location. Do you detect any outliers?

d. Use the method of z-scores to detect outliers at the new location.

e. Compare the distributions of voltage readings at the two locations by placing the box plots, parts a and c, side by side vertically.

2.52 Sanitation inspection of cruise ships. Refer to the data on sanitation levels of cruise ships, Exercise 2.15 (p. xxx).

a. Use the box plot method to detect any outliers in the data.

b. Use the z-score method to detect any outliers in the data.

c. Do the two methods agree? If not, explain why.

2.53 Zinc phosphide in sugarcane. A chemical company produces a substance composed of 98% cracked corn particles and 2% zinc phosphide for use in controlling rat populations in sugarcane fields. Production must be carefully controlled to maintain the 2% zinc phosphide because too much zinc phosphide will cause damage to the sugarcane and too little will be ineffective in controlling the rat population. Records from past production indicate that the distribution of the actual percentage of zinc phosphide present in the substance is approximately mound-shaped, with a mean of 2.0% and a standard deviation of .08%. Suppose one batch chosen randomly actually contains 1.80% zinc phosphide. Does this indicate that there is too little zinc phosphide in today’s production? Explain your reasoning.

ASTEROIDS

VOLTAGE

SHIPSANIT

2.54 Sensor motion of a robot. Researchers at Carnegie Mellon University developed an algorithm for estimating the sensor motion of a robotic arm by mounting a camera with inertia sensors on the arm. (The International Journal of Robotics Research, Dec. 2004.) Two variables of interest were the error of estimating arm rotation (measured in radians) and the error of estimating arm translation (measured in centimeters). Data for 11 experiments are listed in the table. In each experiment, the perturbation of camera intrinsics and projections was varied.

Trial   Perturbed     Perturbed      Rotation Error   Translation
        Intrinsics    Projections    (radians)        Error (cm)

1       No            No             .0000034         .0000033

2       Yes           No             .032             1.0

3       Yes           No             .030             1.3

4       Yes           No             .094             3.0

5       Yes           No             .046             1.5

6       Yes           No             .028             1.3

7       No            Yes            .27              22.9

8       No            Yes            .19              21.0

9       No            Yes            .42              34.4

10      No            Yes            .57              29.8

11      No            Yes            .32              17.7

Source: Strelow, D., and Singh, S., “Motion Estimation from Image and Inertial Measurements.” The International Journal of Robotics Research, Vol. 23, No. 12, Dec. 2004 (Table 4).

a. Find $\bar{y}$ and s for translation errors in trials with perturbed intrinsics but no perturbed projections.

b. Find $\bar{y}$ and s for translation errors in trials with perturbed projections but no perturbed intrinsics.

c. A trial resulted in a translation error of 4.5 cm. Is this value an outlier for trials with perturbed intrinsics but no perturbed projections? For trials with perturbed projections but no perturbed intrinsics? What type of camera perturbation most likely occurred for this trial?

2.8 Distorting the Truth with Descriptive Statistics

A picture may be “worth a thousand words,” but pictures can also color messages or distort them. In fact, the pictures in statistics (histograms, bar charts, and other graphical descriptions) are susceptible to distortion, so we have to examine each of them with care. In this section, we begin by mentioning a few of the pitfalls to watch for when interpreting a chart or a graph. Then we discuss how numerical descriptive statistics can be used to distort the truth.

SENSOR

COLLISION

FIGURE 2.14 MINITAB bar graph of vessel collisions by location

One common way to change the impression conveyed by a graph is to change the scale on the vertical axis, the horizontal axis, or both. For example, consider the data on collisions of large marine vessels operating in European waters over a 5-year period summarized in Table 2.6. Figure 2.14 is a MINITAB bar graph showing the frequency of collisions for each of the three locations. The graph shows that “in port” collisions occur more often than collisions “at sea” or collisions in “restricted waters.”

Suppose you want to use the same data to exaggerate the difference between the number of “in port” collisions and the number of collisions in “restricted waters.” One way to do this is to increase the distance between successive units on the vertical axis—that is, stretch the vertical axis by graphing only a few units per inch. A telltale sign of stretching is a long vertical axis, but this is often hidden by starting the vertical axis at some point above the origin, 0. Such a graph is shown in the SPSS printout, Figure 2.15. By starting the bar chart at 250 collisions (instead of 0), it appears that the frequency of “in port” collisions is many times larger than the frequency of collisions in “restricted waters.”
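The effect of starting the vertical axis at 250 can be checked with a little arithmetic. The following is a minimal sketch, using only the counts in Table 2.6; the helper names `drawn_height` and `apparent_ratio` are illustrative and not part of any statistical package.

```python
# Collision counts from Table 2.6.
collisions = {"At Sea": 376, "Restricted Waters": 273, "In Port": 478}


def drawn_height(count, baseline):
    """Height of a bar as drawn when the vertical axis starts at `baseline`."""
    return count - baseline


def apparent_ratio(a, b, baseline):
    """Ratio of the drawn bar heights for locations a and b."""
    return drawn_height(collisions[a], baseline) / drawn_height(collisions[b], baseline)


# Ratio of "In Port" to "Restricted Waters" with an honest axis (starts at 0)
# and with the truncated axis (starts at 250).
true_ratio = apparent_ratio("In Port", "Restricted Waters", baseline=0)
distorted_ratio = apparent_ratio("In Port", "Restricted Waters", baseline=250)

print(f"Axis starting at 0:   bars differ by a factor of {true_ratio:.2f}")
print(f"Axis starting at 250: bars differ by a factor of {distorted_ratio:.2f}")
```

With the axis starting at 0, the “in port” bar is less than twice as tall as the “restricted waters” bar; with the axis starting at 250, it is drawn almost ten times as tall, even though the underlying counts are unchanged.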

The changes in categories indicated by a bar graph can also be emphasized or deemphasized by stretching or shrinking the vertical axis. Another method of achieving visual distortion with bar graphs is by making the width of the bars proportional to height. For example, look at the bar chart in Figure 2.16a, which depicts the percentage of the total number of motor vehicle deaths in a year that occurred on each of four major highways. Now suppose we make both the width and the height grow as the percentage of fatal accidents grows. This change is shown in Figure 2.16b. The reader may tend to equate the area of the bars with the percentage of deaths occurring at each highway. But in fact, the true relative frequency of fatal accidents is proportional only to the height of the bars.

TABLE 2.6 Collisions of Marine Vessels by Location

Location             Number of Ships
At Sea               376
Restricted Waters    273
In Port              478
Total                1,127

Source: The Dock and Harbour Authority, Dec./Jan. 1985–1986.
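To see why growing the width along with the height misleads, compare the ratio of bar heights with the ratio of bar areas. The sketch below uses hypothetical relative frequencies for highways A through D (the values plotted in Figure 2.16 are not reproduced here), and `bar_area` is an illustrative helper.

```python
# Hypothetical relative frequencies for highways A-D (assumed for illustration;
# they are not the values plotted in Figure 2.16).
rel_freq = {"A": 0.15, "B": 0.30, "C": 0.25, "D": 0.30}


def bar_area(height, width_per_unit_height=1.0):
    """Area of a bar whose width is proportional to its height."""
    width = width_per_unit_height * height
    return height * width


# What the data say (heights) versus what the eye tends to compare (areas).
height_ratio = rel_freq["B"] / rel_freq["A"]
area_ratio = bar_area(rel_freq["B"]) / bar_area(rel_freq["A"])

print(f"Height ratio B/A: {height_ratio:.1f}")
print(f"Area ratio B/A:   {area_ratio:.1f}")
```

Because area grows with the square of the height when width is proportional to height, a relative frequency that is twice as large is drawn with about four times the area.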

Although we’ve discussed only a few of the ways that graphs can be used to convey misleading pictures of phenomena, the lesson is clear. Look at all graphical descriptions of data with a critical eye. Particularly, check the axes and the size of the units on each axis. Ignore the visual changes and concentrate on the actual numerical changes indicated by the graph or chart.

FIGURE 2.15 SPSS bar graph of vessel collisions by location, adjusted vertical axis

FIGURE 2.16 Relative frequency of fatal motor vehicle accidents on each of four major highways (A, B, C, D). Panels: a. Bar chart; b. Width of bars grows with height. Vertical axis: relative frequency, 0 to .30.

The information in a data set can also be distorted by using numerical descriptive measures. Consider the data on 45 energy-related accidents analyzed in Examples 2.9 and 2.10 (and saved in the ENERGY file). Suppose you want a single number that best describes the “typical” number of fatalities that occur in such an accident. One choice is the mean number of fatalities. In Example 2.9 we found the mean to be $\bar{y} = 243.3$ fatalities. However, if you examine the data in the ENERGY file, you will find that 34 of the 45 accidents (about 76%) had fatalities below the mean. In other words, the value of 243.3 is not very “typical” of the accidents in the data set. This is because (as we discussed in Section 2.4) the mean is inflated by the extreme values in a data set. Recall (Example 2.10) that one accident had 2500 fatalities and another had 1000 fatalities. These two very atypical values inflate the mean. Figure 2.17 shows how the value of the mean changes as these outliers are removed from the data set. When the most deadly accident (2500 fatalities) is deleted, the mean drops to $\bar{y} = 192$. When both outliers are deleted, the mean drops to $\bar{y} = 173.2$.

A better measure of central tendency for the number of fatalities in the 45 energy-related accidents is the median. In Example 2.10 we found the median to be $m = 114$ fatalities. We know, by definition, that half the accidents have a fatality value below 114 and half above. Consequently, the median is more “typical” of the values in the data set. As you can see from Figure 2.17, the median is $m = 107$ when the largest outlier is deleted, and is $m = 100$ when both outliers are deleted. Thus, the median does not dramatically change as the largest observations in the data set are removed.
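The contrast between the two measures is easy to reproduce. The sketch below does not use the ENERGY file; it assumes a small hypothetical data set with two extreme values (1000 and 2500) chosen to mimic the outliers described above, and relies only on Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical fatality counts (NOT the ENERGY file): mostly moderate values
# plus two extreme accidents that mimic the 1000 and 2500 outliers in the text.
fatalities = [50, 60, 75, 90, 100, 110, 120, 150, 180, 1000, 2500]


def summarize(data):
    """Return (mean, median) for a list of counts."""
    return mean(data), median(data)


ordered = sorted(fatalities)
all_stats = summarize(ordered)       # full data set
drop_one = summarize(ordered[:-1])   # largest value removed
drop_two = summarize(ordered[:-2])   # two largest values removed

for label, (avg, med) in [
    ("all observations", all_stats),
    ("largest removed", drop_one),
    ("two largest removed", drop_two),
]:
    print(f"{label:20s} mean = {avg:7.1f}   median = {med:6.1f}")
```

In this made-up example the mean falls by several hundred as the two extreme values are dropped, while the median moves by only about ten, which is the same pattern the text reports for the energy-related accidents.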

Another distortion of information in a sample occurs when only a measure of central tendency is reported. Both a measure of central tendency and a measure of variability are needed to obtain an accurate mental image of a data set. For example, suppose the Environmental Protection Agency (EPA) wants to rank two car models based on their estimated (mean) EPA city mileage ratings. Assume that model A has a mean EPA mileage rating of 32 miles per gallon and that model B has a mean EPA mileage rating of 30 miles per gallon. Based on the mean, the EPA should rank model A ahead of model B.

However, the EPA did not take into account the variability associated with the mileage ratings. As an extreme example, suppose that the standard deviation for model A is 5 miles per gallon, whereas that for model B is only 1 mile per gallon. If the mileages form a mound-shaped distribution, they might appear as shown in Figure 2.18. Note that the larger amount of variability associated with model A implies that a model A car is more likely to have a low mileage rating than a model B car. If the ranking is based on selecting the model with the lowest chance of a low mileage rating, model B will be ranked ahead of model A.
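The comparison can be made concrete by treating the two mound-shaped distributions as normal curves. This is a sketch under two assumptions that are not in the text: the mileages are exactly normally distributed, and a “low mileage rating” means anything below 27 miles per gallon (a threshold chosen only for illustration). It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Means and standard deviations (miles per gallon) from the example above.
model_a = NormalDist(mu=32, sigma=5)
model_b = NormalDist(mu=30, sigma=1)

# Assumed cutoff for a "low" mileage rating; 27 mpg is illustrative only.
LOW_MPG = 27

# Probability that a randomly chosen car of each model falls below the cutoff.
p_low_a = model_a.cdf(LOW_MPG)
p_low_b = model_b.cdf(LOW_MPG)

print(f"P(model A below {LOW_MPG} mpg) = {p_low_a:.4f}")
print(f"P(model B below {LOW_MPG} mpg) = {p_low_b:.4f}")
```

Under these assumptions, model A has the higher mean but is far more likely than model B to produce a car with a low mileage rating, which is why a ranking based on the chance of low mileage would favor model B.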

FIGURE 2.17 MINITAB descriptive statistics for number of fatalities in energy-related accidents (annotated means: $\bar{y} = 243.3$, $192$, $173.2$; annotated medians: $m = 114$, $107$, $100$)

FIGURE 2.18 Mileage distributions for two car models (horizontal axis: mileage, 0 to 50 miles per gallon, with $\mu_B = 30$ and $\mu_A = 32$ marked; vertical axis: relative frequency; one curve each for model A and model B)

Quick Review

Key Terms

Arithmetic mean
Bar graph
Box plots
Chebyshev’s Rule
Class interval
Category frequency
Category relative frequency
Dot plot
Empirical Rule
Hinges
Histogram
Inner fences
Interquartile range (IQR)
Lower quartile
Mean
Measures of central tendency
Measures of relative standing
Measures of variation or spread
Median
Midquartile
Mode
Mound-shaped distribution
100pth percentile
Outer fences
Outlier
Pareto diagram
Parameter
Population mean
Population standard deviation
Population variance
Percentile
Pie chart
Range
Sample
Sample mean
Skewness
Standard deviation
Statistic
Stem-and-leaf display
Upper quartile
Variance
Whiskers
z-score

Key Formulas

Category relative frequency: $\dfrac{\text{Category frequency}}{n}$

Sample mean: $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \dfrac{\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1}$

Sample standard deviation: $s = \sqrt{s^2}$

Sample z-score: $z = \dfrac{y - \bar{y}}{s}$

Population z-score: $z = \dfrac{y - \mu}{\sigma}$

Interquartile range: $\mathrm{IQR} = Q_U - Q_L$

*Lower inner fence: $Q_L - 1.5(\mathrm{IQR})$

*Upper inner fence: $Q_U + 1.5(\mathrm{IQR})$

*Lower outer fence: $Q_L - 3(\mathrm{IQR})$

*Upper outer fence: $Q_U + 3(\mathrm{IQR})$
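The key formulas translate directly into code. The following sketch applies them to the 20 pipe roughness measurements listed in Exercise 2.59; the function names are illustrative, and the quartiles passed to `fences` in the usage line are placeholder inputs rather than values computed from the data, since quartile conventions differ between software packages.

```python
import math
import statistics

# Pipe roughness measurements (micrometers) from Exercise 2.59.
roughness = [1.72, 2.50, 2.16, 2.13, 1.06, 2.24, 2.31, 2.03, 1.09, 1.40,
             2.57, 2.64, 1.26, 2.05, 1.19, 2.13, 1.27, 1.51, 2.41, 1.95]


def sample_mean(y):
    """Sample mean: sum of the observations divided by n."""
    return sum(y) / len(y)


def sample_variance(y):
    """Definition form: sum of squared deviations divided by n - 1."""
    ybar = sample_mean(y)
    return sum((yi - ybar) ** 2 for yi in y) / (len(y) - 1)


def sample_variance_shortcut(y):
    """Shortcut form: (sum of y^2 - (sum of y)^2 / n) divided by n - 1."""
    n = len(y)
    return (sum(yi ** 2 for yi in y) - sum(y) ** 2 / n) / (n - 1)


def z_score(value, center, spread):
    """Distance of `value` from `center`, in units of `spread`."""
    return (value - center) / spread


def fences(q_l, q_u):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3 * iqr, q_u + 3 * iqr)
    return inner, outer


ybar = sample_mean(roughness)
s2 = sample_variance(roughness)
s = math.sqrt(s2)

print(f"mean = {ybar:.3f}, variance = {s2:.4f}, std. dev. = {s:.4f}")
print(f"shortcut variance = {sample_variance_shortcut(roughness):.4f}")
print(f"library variance  = {statistics.variance(roughness):.4f}")
print(f"z-score of 2.64   = {z_score(2.64, ybar, s):.2f}")
print("fences for placeholder quartiles (2.0, 4.0):", fences(2.0, 4.0))
```

The two variance functions implement the two algebraically equivalent forms of the sample variance formula, so they should agree up to floating-point rounding; the standard library's `statistics.variance` is printed as a cross-check.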


Chapter Summary Notes

• Graphical methods for qualitative data: pie chart, bar graph, and Pareto diagram
• Graphical methods for quantitative data: dot plot, stem-and-leaf display, and histogram
• Numerical measures of central tendency: mean, median, and mode
• Numerical measures of variation: range, variance, and standard deviation
• Sample numerical descriptive measures are called statistics.
• Population numerical descriptive measures are called parameters.
• Rules for determining the percentage of measurements in the interval (mean) ± 2(std. dev.): Chebyshev’s Rule (at least 75%) and Empirical Rule (approximately 95%)
• Measures of relative standing: percentile score and z-score
• Methods for detecting outliers: box plots and z-scores

Supplementary Exercises

2.55 Scrapped tires. The accompanying pie chart describes the fate of the (estimated) 250 million automobile tires that are scrapped in the United States each year.

Pie chart (fate of scrapped tires): Dumped 77.6%; Burned for fuel 10.7%; Recycled into new products 6.7%; Exported 5%

Source: U.S. Environmental Protection Agency and National Solid Waste Management Association.

a. Interpret the pie chart.
b. Convert the pie chart into a relative frequency bar graph.
c. Convert the pie chart into a frequency bar graph.

2.56 Switching off air bags. Driver-side and passenger-side air bags are installed in all new cars to prevent serious or fatal injury in an automobile crash. However, air bags have been found to cause deaths in children and small people or people with handicaps in low-speed crashes. Consequently, in 1998 the federal government began allowing vehicle owners to request installation of an on-off switch for air bags. The table describes the reasons for requesting the installation of passenger-side on-off switches given by car owners in 1998 and 1999.

a. What type of variable, quantitative or qualitative, is summarized in the table? Give the values that the variable could assume.
b. Calculate the relative frequencies for each reason.
c. Display the information in the table in an appropriate graph.
d. What proportion of the car owners who requested on-off air bag switches gave Medical as one of the reasons?

Reason Number of Requests

Infant 1,852

Child 17,148

Medical 8,377

Infant & Medical 44

Child & Medical 903

Infant & Child 1,878

Infant & Child & Medical 135

Total 30,337

Source: National Highway Transportation Safety Administration, Sept. 2000.

2.57 Unsafe Florida roads. In Florida, civil engineers are designing roads with the latest safety-oriented construction methods in response to the fact that more people in Florida are killed by bad roads than by guns. One year, a total of 135 traffic accidents that occurred were attributed to poorly constructed roads. A breakdown of the poor road conditions that caused the accidents is shown in the following table. Construct and interpret a Pareto diagram for the data.

Poor Road Condition Number of Fatalities

Obstructions without warning 7

Road repairs/Under construction 39

Loose surface material 13

Soft or low shoulders 20

Holes, ruts, etc. 8

Standing water 25

Worn road surface 6

Other 17

Total 135

Source: Florida Department of Highway Safety and Motor Vehicles.

BADROADS


EARTHQUAKE

2.58 Earthquake aftershock magnitudes. Refer to the magnitudes (measured on the Richter scale) of 2929 earthquake aftershocks, Exercise 2.13 (p. xxx). The data are saved in the EARTHQUAKE file.

a. Descriptive statistics for the data are shown in the accompanying MINITAB printout. Interpret these statistics.
b. Use a box plot to identify any outliers in the data.

MINITAB output for Exercise 2.58

ROUGHPIPE

2.59 Surface roughness of pipe. Refer to the Anti-corrosion Methods and Materials (Vol. 50, 2003) study of the surface roughness of coated oil field pipes, Exercise 2.16 (p. xxx). The data (in micrometers) are repeated in the table. Give an interval that will likely contain about 95% of all coated pipe roughness measurements.

1.72 2.50 2.16 2.13 1.06 2.24 2.31 2.03 1.09 1.40
2.57 2.64 1.26 2.05 1.19 2.13 1.27 1.51 2.41 1.95

Source: Farshad, F., and Pesacreta, T., “Coated Pipe Interior Surface Roughness as Measured by Three Scanning Probe Instruments.” Anti-corrosion Methods and Materials, Vol. 50, No. 1, 2003 (Table III).

CRASH

2.60 Crash tests on new cars. Each year, the National Highway Traffic Safety Administration (NHTSA) crash tests new car models to determine how well they protect the driver and front-seat passenger in a head-on collision. The NHTSA has developed a “star” scoring system for the frontal crash test, with results ranging from one star (*) to five stars (*****). The more stars in the rating, the better the level of crash protection in a head-on collision. The NHTSA crash test results for 98 cars in a recent model year are stored in the data file named CRASH. The driver-side star ratings for the 98 cars are summarized in the accompanying MINITAB printout. Use the information in the printout to form a pie chart. Interpret the graph.

Tally for Discrete Variables: DRIVSTAR

DRIVSTAR   Count   Percent
2          4       4.08
3          17      17.35
4          59      60.20
5          18      18.37
N =        98

2.61 Crash Tests on New Cars. Refer to Exercise 2.60 and the NHTSA crash test data. One quantitative variable recorded by the NHTSA is driver’s severity of head injury (measured on a scale from 0 to 1500). The mean and standard deviation for the 98 driver head-injury ratings in the CRASH file are displayed in the accompanying MINITAB printout. Use these values to find the z-score for a driver head-injury rating of 408. Interpret the result.

MINITAB output for Exercise 2.61

Descriptive Statistics: DRIVHEAD

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
DRIVHEAD   98   603.7   185.4   216.0     475.0   605.0    724.3   1240.0

TILLRATIO

2.62 Chemical makeup of glacial drifts. Refer to the American Journal of Science (Jan., 2005) study of the chemical makeup of buried glacial drifts (tills), Exercise 2.20 (p. xxx). The data on the Al/Be ratios for a sample of 26 buried till specimens are repeated in the table.

3.76 4.05 3.81 3.23 3.13 3.30 3.21 3.32 4.09 3.90 5.06 3.85 3.88
4.06 4.56 3.60 3.27 4.09 3.38 3.37 2.73 2.95 2.25 2.73 2.55 3.06

Source: Adapted from American Journal of Science, Vol. 305, No. 1, Jan. 2005, p. 16 (Table 2).

a. Compute and interpret three numerical descriptive measures of central tendency for the Al/Be ratios.
b. Compute and interpret three numerical descriptive measures of variation for the Al/Be ratios.
c. Construct a box plot for the data. Do you detect any outliers?

REDDYE

2.63 Red dye in gasoline. Dyes are used in coloration products, such as textiles, paper, leather, and foodstuffs, and are required by law to be in gasoline to indicate the presence of lead. To monitor environmental contamination, analytical methods must be developed to identify and quantify these dyes. In one study, thermospray high-performance liquid chromatography/mass spectrometry was used to characterize dyes in wastewater and gasoline. The accompanying table gives the relative abundance (relative frequency of occurrence) of commercial Diazo Red dye components in gasoline. Describe the relative abundance of red dye compounds with a bar graph. Interpret the graph.

Red Dye Compound Relative Abundance

H .021

CH3 .210

C2H5 .354

C3H7 .072

C7H15 .054

C8H17 .127

C9H19 .118

C10H21 .025

Others .019

Source: Voyksner, R. D., “Characterization of dyes in environmental samples by thermospray high-performance liquid chromatography/mass spectrometry.” Analytical Chemistry, Vol. 57, No. 13, Nov. 1985, p. 2601 (Table 1). Reprinted with permission. Copyright 1985 American Chemical Society.

2.64 Deep-hole drilling. Refer to the Journal of Engineering for Industry (May 1993) study of deep hole drilling described in Exercise 1.13 (p. xxx). An analysis of drill chip congestion was performed using data generated via computer simulation. The simulated distribution of the length (in millimeters) of 50 drill chips is displayed here in a frequency histogram.

Frequency histogram of drill chip lengths (horizontal axis: Chip length (mm), 0 to 200; vertical axis: Frequency, 0 to 16)

Source: Chin, Jih-Hua et al. “The computer simulation and experimental analysis of chip monitoring for deep hole drilling.” Journal of Engineering for Industry, Transactions of the ASME, Vol. 115, May 1993, p. 187 (Figure 12).

a. Convert the frequency histogram into a relative frequency histogram.
b. Based on the graph in part a, would you expect to observe a drill chip with a length of at least 190 mm? Explain.

LUMPYORE

2.65 Lumpy iron ore. Sixty-six bulk specimens of Chilean lumpy iron ore (95% particle size, 150 millimeters) were randomly sampled from a 35,325-long-ton shipload of ore, and the percentage of iron in each ore specimen was determined. The data are shown in the following table.

62.66 61.82 62.24

62.87 63.01 63.43

63.22 63.01 62.87

63.01 62.80 63.64

62.10 62.80 63.92

63.43 63.01 63.71

63.22 62.10 63.64

63.57 63.29 64.06

61.75 63.37 62.73

63.15 61.75 62.52

63.08 63.29 62.10

63.22 62.38 63.29

63.22 62.59 63.01

63.08 63.92 63.36

62.87 63.29 63.08

61.68 63.57 62.03

62.45 62.80 64.34

62.10 62.31 64.06

62.87 63.01 62.87

62.87 62.94 63.50

62.94 63.08 63.78

62.38 63.43 62.10

Source: Sato, T., Ito, K., Chujo, S., and Takahashi, U., “Example of experiments on systematic sampling of iron ore.” Reports of Statistical Application Research, Union of Japanese Scientists and Engineers, Vol. 18, No. 1, 1971.

a. Describe the population from which the sample was selected.
b. Give one possible objective of this sampling procedure.
c. Construct a relative frequency histogram for the data.
d. Calculate $\bar{y}$ and $s$.
e. Find the percentage of the total number ($n = 66$) of observations that lie in the interval $\bar{y} \pm 2s$. Does this percentage agree with the Empirical Rule?
f. Find the 25th, 50th, 75th, and 90th percentiles for the data set. Interpret these values.

2.66 PCB in soil samples. Refer to the Chemosphere (Feb. 1986) study to obtain information on the background levels of the toxic substance polychlorinated biphenyl (PCB) in soil samples in the United Kingdom, Exercise 1.18 (p. xxx). Such information is used as a benchmark against which PCB levels at waste disposal facilities in the United Kingdom can be compared. The accompanying table contains the measured PCB levels of soil samples taken at 14 rural and 15 urban locations in the United Kingdom. (PCB concentration is measured in .0001 gram per kilogram of soil.) From these preliminary results, the researchers reported “a significant difference between (the PCB levels) for rural areas. . . and for urban areas.”

Rural                          Urban
 3.5    1.0    1.6   12.0      24.0   11.0   107.0   18.0
 8.1    5.3   23.0    8.2      29.0   49.0    94.0   12.0
 1.8    9.8    1.5    9.7      16.0   22.0   141.0   18.0
 9.0   15.0                    21.0   13.0    11.0

Source: Badsha, K., and Eduljee, G., “PCB in the U.K. environment—A preliminary survey.” Chemosphere, Vol. 15, No. 2, Feb. 1986, p. 213 (Table 1). Reprinted with permission. Copyright 1986, Pergamon Press, Ltd.

a. Construct a stem-and-leaf display for the PCB levels of rural soil samples.

b. Construct a stem-and-leaf display for the PCB levels of urban soil samples.

c. Combine the data for rural and urban soil samples and construct a stem-and-leaf display. Identify each of the urban PCB levels on the display with a circle. Does the graph support the researchers’ conclusions?

2.67 Unplanned nuclear scrams. Scram is the term used by nuclear engineers to describe a rapid emergency shutdown of a nuclear reactor. The nuclear industry has made a concerted effort to significantly reduce the number of unplanned scrams. The accompanying table gives the number of scrams at each of 56 U.S. nuclear reactor units in a recent year. Would you expect to observe a nuclear reactor in the future with 11 unplanned scrams? Explain.

1 0 3 1 4 2 10 6 5 2 0 3 1 5

4 2 7 12 0 3 8 2 0 9 3 3 4 7

2 4 5 3 2 7 13 4 2 3 3 7 0 9

4 3 5 2 7 8 5 2 4 3 4 0 1 7

2.68 Work measurement data. Industrial engineers periodically conduct “work measurement” analyses to determine the time required to produce a single unit of output. At a large processing plant, the number of total worker-hours required per day to perform a certain task was recorded for 50 days. The data are shown in the table.

128 119 95 97 124 128 142 98 108 120

113 109 124 132 97 138 133 136 120 112

146 128 103 135 114 109 100 111 131 113

124 131 133 131 88 118 116 98 112 138

100 112 111 150 117 122 97 116 92 122

a. Compute the mean, median, and mode of the data set.
b. Find the range, variance, and standard deviation of the data set.
c. Construct the intervals $\bar{y} \pm s$, $\bar{y} \pm 2s$, and $\bar{y} \pm 3s$. Count the number of observations that fall within each interval and find the corresponding proportions. Compare the results to the Empirical Rule. Do you detect any outliers?
d. Construct a box plot for the data. Do you detect any outliers?
e. Find the 70th percentile for the data on total daily worker-hours. Interpret its value.

PCBUK

SCRAMS

WORKHRS

EVOS

2.69 Oil spill impact on seabirds. The Journal of Agricultural, Biological, and Environmental Statistics (Sept. 2000) published a study on the impact of the Exxon Valdez tanker oil spill on the seabird population in Prince William Sound, Alaska. A subset of the data analyzed is stored in the EVOS file. Data were collected on 96 shoreline locations (called transects) of constant width but variable length. For each transect, the number of seabirds found is recorded as well as the length (in kilometers) of the transect and whether or not the transect was in an oiled area. (The first five and last five observations in the EVOS file are listed in the table.)

a. Identify the variables measured as quantitative or qualitative.
b. Identify the experimental unit.
c. Use a pie chart to describe the percentage of transects in oiled and unoiled areas.
d. Use a graphical method to examine the relationship between observed number of seabirds and transect length.
e. Observed seabird density is defined as the observed count divided by the length of the transect. MINITAB descriptive statistics for seabird densities in unoiled and oiled transects are displayed in the accompanying printout. Assess whether the distribution of seabird densities differs for transects in oiled and unoiled areas.
f. For unoiled transects, give an interval of values that is likely to contain at least 75% of the seabird densities.
g. For oiled transects, give an interval of values that is likely to contain at least 75% of the seabird densities.
h. Which type of transect, an oiled or unoiled one, is more likely to have a seabird density of 16? Explain.

(Selected observations)

Transect   Seabirds   Length   Oil
1          0          4.06     No
2          0          6.51     No
3          54         6.76     No
4          0          4.26     No
5          14         3.59     No
92         7          3.40     Yes
93         4          6.67     Yes
94         0          3.29     Yes
95         0          6.22     Yes
96         27         8.94     Yes

Source: McDonald, T. L., Erickson, W. P. and McDonald, L. L., “Analysis of count data from before-after control-impact studies.” Journal of Agricultural, Biological, and Environmental Statistics, Vol. 5, No. 3, Sept. 2000, pp. 277–278 (Table A.1).

Descriptive Statistics: Density

Variable   Oil   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Density    no    36   3.27    6.70    0.000     0.000   0.890    3.87    36.23
           yes   60   3.495   5.968   0.000     0.000   0.700    5.233   32.836

GALAXY2

2.70 Speed of light from galaxies. Refer to The Astronomical Journal study of galaxy velocities, Exercise 2.39 (p. xxx). A second cluster of galaxies, named A1775, is thought to be a double cluster; that is, two clusters of galaxies in close proximity. Fifty-one velocity observations (in kilometers per second, km/s) from cluster A1775 are listed in the table.

22922 20210 21911 19225 18792 21993 23059
20785 22781 23303 22192 19462 19057 23017
20186 23292 19408 24909 19866 22891 23121
19673 23261 22796 22355 19807 23432 22625
22744 22426 19111 18933 22417 19595 23408
22809 19619 22738 18499 19130 23220 22647
22718 22779 19026 22513 19740 22682 19179
19404 22193

Source: Oegerle, W. R., Hill, J. M., and Fitchett, M. J., “Observations of high dispersion clusters of galaxies: Constraints on cold dark matter.” The Astronomical Journal, Vol. 110, No. 1, July 1995, p. 34 (Table 1), p. 37 (Figure 1).

a. Use a graphical method to describe the velocity distribution of galaxy cluster A1775.
b. Examine the graph, part a. Is there evidence to support the double cluster theory? Explain.
c. Calculate numerical descriptive measures (e.g., mean and standard deviation) for galaxy velocities in cluster A1775. Depending on your answer to part b, you may need to calculate two sets of numerical descriptive measures, one for each of the clusters (say, A1775A and A1775B) within the double cluster.
d. Suppose you observe a galaxy velocity of 20,000 km/s. Is this galaxy likely to belong to cluster A1775A or A1775B? Explain.

OILSPILL

2.71 Hull failures of oil tankers. Owing to several major ocean oil spills by tank vessels, Congress passed the 1990 Oil Pollution Act, which requires all tankers to be designed with thicker hulls. Further improvements in the structural design of a tank vessel have been proposed since then, each with the objective of reducing the likelihood of an oil spill and decreasing the amount of outflow in the event of a hull puncture. To aid in this development, Marine Technology (Jan. 1995) reported on the spillage amount (in thousands of metric tons) and cause of puncture for 50 recent major oil spills from tankers and carriers. [Note: Cause of puncture is classified as either collision (C), fire/explosion (FE), hull failure (HF) or grounding (G).] The data are saved in the OILSPILL file.

a. Use a graphical method to describe the cause of oil spillage for the 50 tankers. Does the graph suggest that any one cause is more likely to occur than any other? How is this information of value to the design engineers?
b. Find and interpret descriptive statistics for the 50 spillage amounts. Use this information to form an interval that can be used to predict the spillage amount of the next major oil spill.

2.72 Manual materials handling. Engineers have a term for unaided human acts of lifting, lowering, pushing, pulling, carrying, or holding and releasing an object: manual materials handling activities (MMHA). M. M. Ayoub et al. (1980) have attempted to develop strength and capacity guidelines for MMHA. The authors point out that a clear distinction between strength and capacity must be made: “Strength implies what a person can do in a single attempt, whereas capacity implies what a person can do for an extended period of time. Lifting strength, for example, determines the amount that can be lifted at frequent intervals.” The accompanying table presents a portion of the recommendations of Ayoub et al. for the lifting capacities of males and females. It gives the means and standard deviations of the maximum weight (in kilograms) of a box 30 centimeters wide that can be safely lifted from the floor to knuckle height at two different lift rates: 1 lift per minute and 4 lifts per minute.

Gender   Lifts/Minute   Mean    Standard Deviation
Male     1              30.25   8.56
         4              23.83   6.70
Female   1              19.79   3.11
         4              15.82   3.23

Source: Ayoub, M. M., Mital, A., Bakken, G. M., Asfour, S. S., and Bethea, N. J., “Development of strength capacity norms for manual materials handling activities: The state of the art.” Human Factors, June 1980, Vol. 22, pp. 271–283. Copyright 1980 by the Human Factors Society, Inc. and reproduced by permission.

a. Roughly sketch the relative frequency distribution of maximum recommended weight of lift for each of the four gender/lifts-per-minute combinations. The Empirical Rule will help you do this.
b. Construct the interval $\bar{y} \pm 2s$ for each of the four data sets and give the approximate proportion of measurements that fall within the interval.
c. Assuming the MMHA recommendations of Ayoub et al. are reasonable, would you expect that an average male could safely lift a box (30 centimeters wide) weighing 25 kilograms from the floor to knuckle height at a rate of 4 lifts per minute? An average female? Explain.

2.73 Steel rod quality. In his essay “Making Things Right,” W. Edwards Deming considered the role of statistics in the quality control of industrial products.* In one example, Deming examined the quality control process for a manufacturer of steel rods. Rods produced with diameters smaller than 1 centimeter fit too loosely in their bearings and ultimately must be rejected (thrown out). To determine whether the diameter setting of the machine that produces the rods is correct, 500 rods are selected from the day’s production and their diameters are recorded. The distribution of the 500 diameters for one day’s production is shown in the figure. Note that the symbol LSL in the figure represents the 1-centimeter lower specification limit of the steel rod diameters.

Frequency histogram of the 500 rod diameters (horizontal axis: Diameter (centimeters), .996 to 1.008; vertical axis: Frequency, 0 to 100; LSL marked)

*From Tanur, J., et al., eds. Statistics: A Guide to the Unknown. San Francisco: Holden-Day, 1978. pp. 279–81.

a. What type of data, quantitative or qualitative, does the figure portray?

b. What type of graphical method is being used to describe the data?

c. Use the figure to estimate the proportion of rods with diameters between 1.0025 and 1.0045 centimeters.

d. There has been speculation that some of the inspectors are unaware of the trouble that an undersized rod diameter would cause later in the manufacturing process. Consequently, these inspectors may be passing rods with diameters that were barely below the lower specification limit and recording them in the interval centered at 1.000 centimeter. According to the figure, is there any evidence to support this claim? Explain.

STATISTICS IN ACTION Characteristics of Contaminated Fish in the Tennessee River, Alabama

Recall (Statistics in Action, Chapter 1, p. xx) that the U.S. Army Corps of Engineers collected data on fish contaminated from the toxic discharges of a chemical plant once located on the banks of the Tennessee River in Alabama. Ecologists fear that contaminated fish migrating from the mouth of the river to a nearby reservoir and wildlife refuge could endanger other wildlife that prey on the fish. The variables measured for each of the 144 captured fish are: species (channel catfish, largemouth bass, or smallmouth buffalofish), river/creek where captured (Tennessee River, Flint Creek, Limestone Creek, or Spring Creek), weight (in grams), length (in centimeters), and level of DDT contamination (in parts per million). The data are saved in the DDT file.

One goal of the study is to describe the characteristics of the captured fish. Some key questions to be answered are: Where (i.e., what river or creek) are the different species most likely to be captured? What is the typical weight and length of the fish? What is the level of DDT contamination of the fish? Does the level of contamination vary by species? These questions can be partially answered by applying the descriptive methods of this chapter. Of course, the method used will depend on the type (quantitative or qualitative) of the variable analyzed.

Consider, first, the qualitative variable species. A bar graph for species, produced using SAS, is shown in Figure SIA2.1. You can see that the majority (about 67%) of the captured fish were channel catfish, another 25% were smallmouth buffalofish, and about 8% were largemouth bass. To determine where these species of fish were captured, we examine the MINITAB pie charts in Figure SIA2.2. One pie chart is produced for each of the four river/creek locations. The charts show that the only species captured in the tributary creeks (LC, SC, or FC) was channel catfish. Since these creeks are closest to the reservoir and wildlife reserve, ecologists focused their investigation on wildlife that prey on channel catfish.

FIGURE SIA2.1 SAS horizontal bar graph for species of fish


FIGURE SIA2.2 MINITAB pie charts of species by river

To examine the quantitative variables length, weight, and DDT level, we produced descriptive statistics for each variable by species. These statistics are shown in the SAS printout, Figure SIA2.3. Histograms of these variables for channel catfish are shown in the MINITAB printout, Figure SIA2.4. The histograms reveal mound-shaped, nearly symmetric distributions for the lengths and weights of channel catfish. Thus, we can apply the Empirical Rule to describe these distributions.

FIGURE SIA2.3 SAS descriptive statistics by species of fish

FIGURE SIA2.4 MINITAB histograms for channel catfish


FIGURE SIA2.5 MINITAB summary statistics for DDT levels of channel catfish, outlier deleted

For channel catfish lengths, $\bar{y} = 44.73$ and $s = 4.58$. Therefore, about 95% of the channel catfish lengths fall in the interval $44.73 \pm 2(4.58)$, i.e., between 35.57 and 53.89 centimeters. For channel catfish weights, $\bar{y} = 987.3$ and $s = 262.7$. This implies that about 95% of the channel catfish weights fall in the interval $987.3 \pm 2(262.7)$, i.e., between 461.9 and 1512.7 grams.

The histogram at the bottom of Figure SIA2.4 shows that channel catfish DDT levels are highly skewed. The skewness appears to be caused by a few extremely large DDT values. The SAS printout, Figure SIA2.3, shows that the largest (maximum) DDT level is 1100 ppm. Is this value an outlier? For channel catfish DDT levels, $\bar{y} = 33.3$ and $s = 119.5$. Thus, the z-score for this large DDT value is $z = (1100 - 33.3)/119.5 = 8.93$. Since it is extremely unlikely to find an observation in a data set that is almost 9 standard deviations from the mean, the DDT value is considered a highly suspect outlier. Some research by the U.S. Army Corps of Engineers revealed that this DDT value was correctly measured and recorded but that the fish was one of the few found at the exact location where the manufacturing plant was discharging its toxic waste materials into the water. Consequently, the researchers removed this observation from the data set and reanalyzed the DDT levels of channel catfish.

The MINITAB printout, Figure SIA2.5, gives summary statistics for channel catfish DDT levels when the outlier is deleted. Now, $\bar{y} = 22.1$ and $s = 46.8$. According to Chebyshev's Rule, at least 75% of the DDT levels for channel catfish will lie in the interval $22.1 \pm 2(46.8)$, i.e., between 0 and 115.7 ppm. Also, the SAS histogram for the reduced data set is shown in Figure SIA2.6. The histogram reveals that a large percentage of the DDT levels are above 5 ppm, the maximum level deemed safe by the Environmental Protection Agency. This provided further evidence for the ecologists to focus on wildlife that prey on channel catfish.
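The interval and z-score arithmetic in this discussion can be checked with a few lines of code. The following is a minimal sketch in Python, not the SAS or MINITAB procedure used for the printouts; the function names `two_sd_interval` and `z_score` are hypothetical helpers introduced only for illustration, and the means and standard deviations are the channel catfish values quoted in the text. For the DDT levels, the computed lower limit is negative, so the sketch replaces it with 0, since a contamination level cannot fall below zero.

```python
def two_sd_interval(mean, sd, lower_bound=None):
    """Return the interval mean +/- 2*sd, optionally truncated below."""
    low, high = mean - 2 * sd, mean + 2 * sd
    if lower_bound is not None:
        low = max(low, lower_bound)  # e.g., DDT levels cannot be negative
    return low, high


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


# Summary statistics quoted in the text for channel catfish
lengths = two_sd_interval(44.73, 4.58)                      # Empirical Rule, about 95%
weights = two_sd_interval(987.3, 262.7)                     # Empirical Rule, about 95%
ddt_reduced = two_sd_interval(22.1, 46.8, lower_bound=0.0)  # Chebyshev, at least 75%
z_max = z_score(1100, 33.3, 119.5)                          # largest DDT value, full data set

print("lengths (cm):   %.2f to %.2f" % lengths)
print("weights (g):    %.1f to %.1f" % weights)
print("DDT (ppm):      %.1f to %.1f" % ddt_reduced)
print("z for 1100 ppm: %.2f" % z_max)
```

The same interval function serves both rules because the Empirical Rule and Chebyshev's Rule differ only in the percentage attached to the interval, not in how the interval is computed.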

FIGURE SIA2.6 SAS histogram for DDT levels of channel catfish, outlier deleted
