CLA 1 Paper - Data Analysis & Business Intelligence


Statistical Techniques in Business & Economics

LIND

MARCHAL

WATHEN

Seventeenth Edition


The McGraw-Hill/Irwin Series in Operations and Decision Sciences

SUPPLY CHAIN MANAGEMENT

Benton Purchasing and Supply Chain Management Third Edition

Bowersox, Closs, Cooper, and Bowersox Supply Chain Logistics Management Fourth Edition

Burt, Petcavage, and Pinkerton Supply Management Eighth Edition

Johnson, Leenders, and Flynn Purchasing and Supply Management Fourteenth Edition

Simchi-Levi, Kaminsky, and Simchi-Levi Designing and Managing the Supply Chain: Concepts, Strategies, Case Studies Third Edition

PROJECT MANAGEMENT

Brown and Hyer Managing Projects: A Team-Based Approach First Edition

Larson and Gray Project Management: The Managerial Process Fifth Edition

SERVICE OPERATIONS MANAGEMENT

Fitzsimmons and Fitzsimmons Service Management: Operations, Strategy, Information Technology Eighth Edition

MANAGEMENT SCIENCE

Hillier and Hillier Introduction to Management Science: A Modeling and Case Studies Approach with Spreadsheets Fifth Edition

Stevenson and Ozgur Introduction to Management Science with Spreadsheets First Edition

MANUFACTURING CONTROL SYSTEMS

Jacobs, Berry, Whybark, and Vollmann Manufacturing Planning & Control for Supply Chain Management Sixth Edition

BUSINESS RESEARCH METHODS

Cooper and Schindler Business Research Methods Twelfth Edition

BUSINESS FORECASTING

Wilson, Keating, and John Galt Solutions, Inc. Business Forecasting Sixth Edition

LINEAR STATISTICS AND REGRESSION

Kutner, Nachtsheim, and Neter Applied Linear Regression Models Fourth Edition

BUSINESS SYSTEMS DYNAMICS

Sterman Business Dynamics: Systems Thinking and Modeling for a Complex World First Edition

OPERATIONS MANAGEMENT

Cachon and Terwiesch Matching Supply with Demand: An Introduction to Operations Management Third Edition

Finch Interactive Models for Operations and Supply Chain Management First Edition

Jacobs and Chase Operations and Supply Chain Management Fourteenth Edition

Jacobs and Chase Operations and Supply Chain Management: The Core Third Edition

Jacobs and Whybark Why ERP? A Primer on SAP Implementation First Edition

Schroeder, Goldstein, and Rungtusanatham Operations Management in the Supply Chain: Decisions and Cases Sixth Edition

Stevenson Operations Management Eleventh Edition

Swink, Melnyk, Cooper, and Hartley Managing Operations across the Supply Chain Second Edition

PRODUCT DESIGN

Ulrich and Eppinger Product Design and Development Fifth Edition

BUSINESS MATH

Slater and Wittry Math for Business and Finance: An Algebraic Approach First Edition

Slater and Wittry Practical Business Math Procedures Eleventh Edition

Slater and Wittry Practical Business Math Procedures, Brief Edition Eleventh Edition

BUSINESS STATISTICS

Bowerman, O’Connell, and Murphree Business Statistics in Practice Seventh Edition

Bowerman, O’Connell, Murphree, and Orris Essentials of Business Statistics Fourth Edition

Doane and Seward Applied Statistics in Business and Economics Fourth Edition

Lind, Marchal, and Wathen Basic Statistics for Business and Economics Eighth Edition

Lind, Marchal, and Wathen Statistical Techniques in Business and Economics Seventeenth Edition

Jaggia and Kelly Business Statistics: Communicating with Numbers First Edition

Jaggia and Kelly Essentials of Business Statistics: Communicating with Numbers First Edition

Statistical Techniques in

BUSINESS & ECONOMICS

SEVENTEENTH EDITION

DOUGLAS A. LIND Coastal Carolina University and The University of Toledo

WILLIAM G. MARCHAL The University of Toledo

SAMUEL A. WATHEN Coastal Carolina University

STATISTICAL TECHNIQUES IN BUSINESS & ECONOMICS, SEVENTEENTH EDITION Published by McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. Copyright © 2018 by McGraw-Hill Education. All rights reserved. Printed in the United States of America. Previous editions © 2015, 2012, and 2010. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written consent of McGraw-Hill Education, including, but not limited to, in any network or other electronic storage or transmission, or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside the United States.

This book is printed on acid-free paper.

1 2 3 4 5 6 7 8 9 LWI 21 20 19 18 17 16

ISBN 978-1-259-66636-0 MHID 1-259-66636-0

Chief Product Officer, SVP Products & Markets: G. Scott Virkler
Vice President, General Manager, Products & Markets: Marty Lange
Vice President, Content Design & Delivery: Betsy Whalen
Managing Director: Tim Vertovec
Senior Brand Manager: Charles Synovec
Director, Product Development: Rose Koos
Product Developers: Michele Janicek / Ryan McAndrews
Senior Director, Digital Content Development: Douglas Ruby
Marketing Manager: Trina Maurer
Director, Content Design & Delivery: Linda Avenarius
Program Manager: Mark Christianson
Content Project Managers: Harvey Yep (Core) / Bruce Gin (Assessment)
Buyer: Susan K. Culbertson
Design: Matt Backhaus
Cover Image: © Corbis / Glow Images
Content Licensing Specialists: Melissa Homer (Image) / Beth Thole (Text)
Typeface: 9.5/11 Proxima Nova
Compositor: Aptara®, Inc.
Printer: LSC Communications

All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.

Library of Congress Cataloging-in-Publication Data

Names: Lind, Douglas A., author. | Marchal, William G., author. | Wathen, Samuel Adam. author.
Title: Statistical techniques in business & economics / Douglas A. Lind, Coastal Carolina University and The University of Toledo, William G. Marchal, The University of Toledo, Samuel A. Wathen, Coastal Carolina University.
Other titles: Statistical techniques in business and economics
Description: Seventeenth Edition. | Dubuque, IA : McGraw-Hill Education, [2017] | Revised edition of the authors’ Statistical techniques in business & economics, [2015]
Identifiers: LCCN 2016054310 | ISBN 9781259666360 (alk. paper) | ISBN 1259666360 (alk. paper)
Subjects: LCSH: Social sciences—Statistical methods. | Economics—Statistical methods. | Commercial statistics.
Classification: LCC HA29 .M268 2017 | DDC 519.5—dc23
LC record available at https://lccn.loc.gov/2016054310

The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website does not indicate an endorsement by the authors or McGraw-Hill Education, and McGraw-Hill Education does not guarantee the accuracy of the information presented at these sites.

mheducation.com/highered

DEDICATION

To Jane, my wife and best friend, and our sons, their wives, and our grandchildren: Mike and Sue (Steve and Courtney), Steve and Kathryn (Kennedy, Jake, and Brady), and Mark and Sarah (Jared, Drew, and Nate).

Douglas A. Lind

To Oscar Sambath Marchal, Julian Irving Horowitz, Cecilia Marchal Nicholson and Andrea.

William G. Marchal

To my wonderful family: Barb, Hannah, and Isaac.

Samuel A. Wathen

A NOTE FROM THE AUTHORS

Over the years, we received many compliments on this text and understand that it’s a favorite among students. We accept that as the highest compliment and continue to work very hard to maintain that status.

The objective of Statistical Techniques in Business and Economics is to provide students majoring in management, marketing, finance, accounting, economics, and other fields of business administration with an introductory survey of descriptive and inferential statistics. To illustrate the application of statistics, we use many examples and exercises that focus on business applications, but also relate to the current world of the college student. A previous course in statistics is not necessary, and the mathematical requirement is first-year algebra.

In this text, we show beginning students every step needed to be successful in a basic statistics course. This step-by-step approach enhances performance, accelerates preparedness, and significantly improves motivation. Understanding the concepts, seeing and doing plenty of examples and exercises, and comprehending the application of statistical methods in business and economics are the focus of this book.

The first edition of this text was published in 1967. At that time, locating relevant business data was difficult. That has changed! Today, locating data is not a problem. The number of items you purchase at the grocery store is automatically recorded at the checkout counter. Phone companies track the time of our calls, the length of calls, and the identity of the person called. Credit card companies maintain information on the number, time and date, and amount of our purchases. Medical devices automatically monitor our heart rate, blood pressure, and temperature from remote locations. A large amount of business information is recorded and reported almost instantly. CNN, USA Today, and MSNBC, for example, all have websites that track stock prices in real time.

Today, the practice of data analytics is widely applied to “big data.” The practice of data analytics requires skills and knowledge in several areas. Computer skills are needed to process large volumes of information. Analytical skills are needed to evaluate, summarize, organize, and analyze the information. Critical thinking skills are needed to interpret and communicate the results of processing the information.

Our text supports the development of basic data analytical skills. In this edition, we added a new section at the end of each chapter called Data Analytics. As you work through the text, this section provides the instructor and student with opportunities to apply statistical knowledge and statistical software to explore several business environments. Interpretation of the analytical results is an integral part of these exercises.

A variety of statistical software is available to complement our text. Microsoft Excel includes an add-in with many statistical analyses. Megastat is an add-in available for Microsoft Excel. Minitab and JMP are stand-alone statistical software available to download for either PC or MAC computers. In our text, Microsoft Excel, Minitab, and Megastat are used to illustrate statistical software analyses. When a software application is presented, the software commands for the application are available in Appendix C. We use screen captures within the chapters, so the student becomes familiar with the nature of the software output.

Because of the availability of computers and software, it is no longer necessary to dwell on calculations. We have replaced many of the calculation examples with interpretative ones, to assist the student in understanding and interpreting the statistical results. In addition, we place more emphasis on the conceptual nature of the statistical topics. While making these changes, we still continue to present, as best we can, the key concepts, along with supporting interesting and relevant examples.

WHAT’S NEW IN THE SEVENTEENTH EDITION?

We have made many changes to examples and exercises throughout the text. The section on “Enhancements” to our text details them. The major change to the text is in response to user interest in the area of data analytics. Our approach is to provide instructors and students with the opportunity to combine statistical knowledge, computer and statistical software skills, and interpretative and critical thinking skills. A set of new and revised exercises is included at the end of chapters 1 through 18 in a section titled “Data Analytics.”

In these sections, exercises refer to three data sets. The North Valley Real Estate sales data set lists 105 homes currently on the market. The Lincolnville School District bus data lists information on 80 buses in the school district’s bus fleet. The authors designed these data so that students will be able to use statistical software to explore the data and find realistic relationships in the variables. The Baseball Statistics for the 2016 season is updated from the previous edition.

The intent of the exercises is to provide the basis of a continuing case analysis. We suggest that instructors select one of the data sets and assign the corresponding exercises as each chapter is completed. Instructor feedback regarding student performance is important. Students should retain a copy of each chapter’s results and interpretations to develop a portfolio of discoveries and findings. These will be helpful as students progress through the course and use new statistical techniques to further explore the data. The ideal ending for these continuing data analytics exercises is a comprehensive report based on the analytical findings.

We know that working with a statistics class to develop a very basic competence in data analytics is challenging. Instructors will be teaching statistics. In addition, instructors will be faced with choosing statistical software and supporting students in developing or enhancing their computer skills. Finally, instructors will need to assess student performance based on assignments that include both statistical and written components. Using a mentoring approach may be helpful.

We hope that you and your students find this new feature interesting and engaging.

HOW ARE CHAPTERS ORGANIZED TO ENGAGE STUDENTS AND PROMOTE LEARNING?

Chapter Learning Objectives

Each chapter begins with a set of learning objectives designed to provide focus for the chapter and motivate student learning. These objectives, located in the margins next to the topic, indicate what the student should be able to do after completing each section in the chapter.

Chapter Opening Exercise

A representative exercise opens the chapter and shows how the chapter content can be applied to a real-world situation.

LEARNING OBJECTIVES When you have completed this chapter, you will be able to:

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

LO2-2 Display a frequency table using a bar or pie chart.

LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.

LO2-4 Display a frequency distribution using a histogram or frequency polygon.

MERRILL LYNCH recently completed a study of online investment portfolios for a sample of clients. For the 70 participants in the study, organize these data into a frequency distribution. (See Exercise 43 and LO2-3.)

Chapter 2 Describing Data: FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, AND GRAPHIC PRESENTATION


Introduction to the Topic

Each chapter starts with a review of the important concepts of the previous chapter and provides a link to the material in the current chapter. This step-by-step approach increases comprehension by providing continuity across the concepts.


INTRODUCTION

The United States automobile retailing industry is highly competitive. It is dominated by megadealerships that own and operate 50 or more franchises, employ over 10,000 people, and generate several billion dollars in annual sales. Many of the top dealerships are publicly owned with shares traded on the New York Stock Exchange or NASDAQ. In 2014, the largest megadealership was AutoNation (ticker symbol AN), followed by Penske Auto Group (PAG), Group 1 Automotive, Inc. (ticker symbol GPI), and the privately owned Van Tuyl Group.

These large corporations use statistics and analytics to summarize and analyze data and information to support their decisions. As an example, we will look at the Applewood Auto Group. It owns four dealerships and sells a wide range of vehicles. These include the popular Korean brands Kia and Hyundai, BMW and Volvo sedans and luxury SUVs, and a full line of Ford and Chevrolet cars and trucks.

Ms. Kathryn Ball is a member of the senior management team at Applewood Auto Group, which has its corporate offices adjacent to Kane Motors. She is responsible for tracking and analyzing vehicle sales and the profitability of those vehicles. Kathryn would like to summarize the profit earned on the vehicles sold with tables, charts, and graphs that she would review monthly. She wants to know the profit per vehicle sold, as well as the lowest and highest amount of profit. She is also interested in describing the demographics of the buyers. What are their ages? How many vehicles have they previously purchased from one of the Applewood dealerships? What type of vehicle did they purchase?

The Applewood Auto Group operates four dealerships:

• Tionesta Ford Lincoln sells Ford and Lincoln cars and trucks.
• Olean Automotive Inc. has the Nissan franchise as well as the General Motors brands of Chevrolet, Cadillac, and GMC Trucks.
• Sheffield Motors Inc. sells Buick, GMC trucks, Hyundai, and Kia.
• Kane Motors offers the Chrysler, Dodge, and Jeep line as well as BMW and Volvo.

Every month, Ms. Ball collects data from each of the four dealerships and enters them into an Excel spreadsheet. Last month the Applewood Auto Group sold 180 vehicles at the four dealerships. A copy of the first few observations appears to the left. The variables collected include:

• Age—the age of the buyer at the time of the purchase.
• Profit—the amount earned by the dealership on the sale of each vehicle.
• Location—the dealership where the vehicle was purchased.
• Vehicle type—SUV, sedan, compact, hybrid, or truck.
• Previous—the number of vehicles previously purchased at any of the four Applewood dealerships by the consumer.

The entire data set is available at the McGraw-Hill website (www.mhhe.com/lind17e) and in Appendix A.4 at the end of the text.


CONSTRUCTING FREQUENCY TABLES

Recall from Chapter 1 that techniques used to describe a set of data are called descriptive statistics. Descriptive statistics organize data to show the general pattern of the data, to identify where values tend to concentrate, and to expose extreme or unusual data values. The first technique we discuss is a frequency table.

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

FREQUENCY TABLE A grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
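As a minimal sketch of this definition, the Python snippet below tallies a qualitative variable into frequencies and relative frequencies. The vehicle types are made-up illustrative values, not the Applewood data, and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return {class: (frequency, relative frequency)} for qualitative data."""
    counts = Counter(values)       # each observation lands in exactly one class
    total = sum(counts.values())   # the classes together cover every observation
    return {cls: (n, n / total) for cls, n in counts.items()}

# Hypothetical sample of vehicle types (illustrative only).
sample = ["SUV", "Sedan", "SUV", "Truck", "Sedan", "SUV", "Hybrid", "Compact"]

table = frequency_table(sample)
for cls, (freq, rel) in sorted(table.items()):
    print(f"{cls:<8} {freq:>3} {rel:>6.3f}")
```

Because every observation is counted once and only once, the frequencies sum to the sample size and the relative frequencies sum to 1.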


Example/Solution

After important concepts are introduced, a solved example is given. This example provides a how-to illustration and shows a relevant business application that helps students answer the question, “How can I apply this concept?”

DESCRIBING DATA: DISPLAYING AND EXPLORING DATA

INTRODUCTION

Chapter 2 began our study of descriptive statistics. In order to transform raw or ungrouped data into a meaningful form, we organize the data into a frequency distribution. We present the frequency distribution in graphic form as a histogram or a frequency polygon. This allows us to visualize where the data tend to cluster, the largest and the smallest values, and the general shape of the data.

In Chapter 3, we first computed several measures of location, such as the mean, median, and mode. These measures of location allow us to report a typical value in the set of observations. We also computed several measures of dispersion, such as the range, variance, and standard deviation. These measures of dispersion allow us to describe the variation or the spread in a set of observations.

We continue our study of descriptive statistics in this chapter. We study (1) dot plots, (2) stem-and-leaf displays, (3) percentiles, and (4) box plots. These charts and statistics give us additional insight into where the values are concentrated as well as the general shape of the data. Then we consider bivariate data. In bivariate data, we observe two variables for each individual or observation. Examples include the number of hours a student studied and the points earned on an examination; if a sampled product meets quality specifications and the shift on which it is manufactured; or the amount of electricity used in a month by a homeowner and the mean daily high temperature in the region for the month. These charts and graphs provide useful insights as we use business analytics to enhance our understanding of data.

DOT PLOTS

Recall for the Applewood Auto Group data, we summarized the profit earned on the 180 vehicles sold with a frequency distribution using eight classes. When we organized the data into the eight classes, we lost the exact value of the observations. A dot plot, on the other hand, groups the data as little as possible, and we do not lose the identity of an individual observation. To develop a dot plot, we display a dot for each observation along a horizontal number line indicating the possible values of the data. If there are identical observations or the observations are too close to be shown individually, the dots are “piled” on top of each other. This allows us to see the shape of the distribution, the value about which the data tend to cluster, and the largest and smallest observations. Dot plots are most useful for smaller data sets, whereas histograms tend to be most useful for large data sets. An example will show how to construct and interpret dot plots.

LO4-1 Construct and interpret a dot plot.

EXAMPLE

The service departments at Tionesta Ford Lincoln and Sheffield Motors Inc., two of the four Applewood Auto Group dealerships, were both open 24 days last month. Listed below is the number of vehicles serviced last month at the two dealerships. Construct dot plots and report summary statistics to compare the two dealerships.

Tionesta Ford Lincoln

Monday  Tuesday  Wednesday  Thursday  Friday  Saturday
  23      33        27         28       39       26
  30      32        28         33       35       32
  29      25        36         31       32       27
  35      32        35         37       36       30
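The construction described above can be sketched in Python for the Tionesta Ford Lincoln counts. This is only an illustration: the plot is drawn as text and turned on its side (one row per value instead of a horizontal number line), and the helper name `dot_plot` is hypothetical. The Sheffield Motors data are not reproduced in this excerpt, so only one dealership is shown.

```python
from collections import Counter
from statistics import mean, median

# Vehicles serviced at Tionesta Ford Lincoln, copied from the table above.
tionesta = [23, 33, 27, 28, 39, 26,
            30, 32, 28, 33, 35, 32,
            29, 25, 36, 31, 32, 27,
            35, 32, 35, 37, 36, 30]

def dot_plot(data):
    """Return text lines: one row per possible value, one dot per observation."""
    counts = Counter(data)
    return [f"{v:>3} | " + "." * counts[v]
            for v in range(min(data), max(data) + 1)]

for line in dot_plot(tionesta):
    print(line)

# Summary statistics to report alongside the plot.
print("n =", len(tionesta), "min =", min(tionesta), "max =", max(tionesta))
print("mean =", round(mean(tionesta), 2), "median =", median(tionesta))
```

Identical observations pile up on the same row, so the longest row marks the value around which the data cluster.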


Self-Reviews

Self-Reviews are interspersed throughout each chapter and follow Example/Solution sections. They help students monitor their progress and provide immediate reinforcement for that particular technique. Answers are in Appendix E.


calculate quartiles. Excel 2013 and Excel 2016 offer both methods. The Excel function, Quartile.exc, will result in the same answer as Equation 4–1. The Excel function, Quartile.inc, will result in the Excel Method answers.
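To see how two quartile conventions can disagree, here is a small Python sketch using the standard library's `statistics.quantiles`. It rests on assumptions that this excerpt does not confirm: that Equation 4–1 (not shown here) places the P-th percentile at position (n + 1)P/100, which matches Python's "exclusive" method, and that the Excel Method matches Python's "inclusive" method, which uses position (n − 1)P/100 + 1. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical ordered sample (illustrative only), n = 7.
data = [10, 20, 30, 40, 50, 60, 70]

# "exclusive": position (n + 1) * P / 100.
# Assumed here to match Equation 4-1 and Quartile.exc.
q1_exc, q2_exc, q3_exc = quantiles(data, n=4, method="exclusive")

# "inclusive": position (n - 1) * P / 100 + 1.
# Assumed here to match the Excel Method and Quartile.inc.
q1_inc, q2_inc, q3_inc = quantiles(data, n=4, method="inclusive")

print("exclusive:", q1_exc, q2_exc, q3_exc)
print("inclusive:", q1_inc, q2_inc, q3_inc)
```

Both methods agree on the median for this sample but place the first and third quartiles differently, which is why it helps to state the method when reporting quartiles.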

SELF-REVIEW 4–2

The Quality Control department of Plainsville Peanut Company is responsible for checking the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars produced last hour are:

7.69 7.72 7.80 7.86 7.90 7.94 7.97 8.06 8.09

(a) What is the median weight?
(b) Determine the weights corresponding to the first and third quartiles.

EXERCISES

11. Determine the median and the first and third quartiles in the following data.

46 47 49 49 51 53 54 54 55 55 59

12. Determine the median and the first and third quartiles in the following data.

5.24 6.02 6.67 7.30 7.59 7.99 8.03 8.35 8.81 9.45 9.61 10.37 10.39 11.86 12.22 12.71 13.07 13.59 13.89 15.42

13. The Thomas Supply Company Inc. is a distributor of gas-powered generators. As with any business, the length of time customers take to pay their invoices is important. Listed below, arranged from smallest to largest, is the time, in days, for a sample of The Thomas Supply Company Inc. invoices.

13 13 13 20 26 27 31 34 34 34 35 35 36 37 38 41 41 41 45 47 47 47 50 51 53 54 56 62 67 82

a. Determine the first and third quartiles.
b. Determine the second decile and the eighth decile.
c. Determine the 67th percentile.

14. Kevin Horn is the national sales manager for National Textbooks Inc. He has a sales staff of 40 who visit college professors all over the United States. Each Saturday morning he requires his sales staff to send him a report. This report includes, among other things, the number of professors visited during the previous week. Listed below, ordered from smallest to largest, are the number of visits last week.

38 40 41 45 48 48 50 50 51 51 52 52 53 54 55 55 55 56 56 57 59 59 59 62 62 62 63 64 65 66 66 67 67 69 69 71 77 78 79 79

a. Determine the median number of calls.
b. Determine the first and third quartiles.
c. Determine the first decile and the ninth decile.
d. Determine the 33rd percentile.


Statistics in Action

Statistics in Action articles are scattered throughout the text, usually about two per chapter. They provide unique, interesting applications and historical insights in the field of statistics.


The General Rule of Addition

The outcomes of an experiment may not be mutually exclusive. For example, the Florida Tourist Commission selected a sample of 200 tourists who visited the state during the year. The survey revealed that 120 tourists went to Disney World and 100 went to Busch Gardens near Tampa. What is the probability that a person selected visited either Disney World or Busch Gardens? If the special rule of addition is used, the probability of selecting a tourist who went to Disney World is .60, found by 120/200. Similarly, the probability of a tourist going to Busch Gardens is .50. The sum of these probabilities is 1.10. We know, however, that this probability cannot be greater than 1. The explanation is that many tourists visited both attractions and are being counted twice! A check of the survey responses revealed that 60 out of 200 sampled did, in fact, visit both attractions.

To answer our question, “What is the probability a selected person visited either Disney World or Busch Gardens?” (1) add the probability that a tourist visited Disney World and the probability he or she visited Busch Gardens, and (2) subtract the probability of visiting both. Thus:

P(Disney or Busch) = P(Disney) + P(Busch) − P(both Disney and Busch)
                   = .60 + .50 − .30
                   = .80

When two events both occur, the probability is called a joint probability. The probability (.30) that a tourist visits both attractions is an example of a joint probability.
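The tourist calculation can be checked with a few lines of Python. This is a minimal sketch that works directly from the survey counts given above; the variable names are chosen here for illustration.

```python
from fractions import Fraction

n_tourists = 200
n_disney, n_busch, n_both = 120, 100, 60   # counts reported in the survey

p_disney = Fraction(n_disney, n_tourists)  # .60
p_busch = Fraction(n_busch, n_tourists)    # .50
p_both = Fraction(n_both, n_tourists)      # .30, the joint probability

# Adding the two probabilities alone counts the 60 "both" visitors twice.
naive_sum = p_disney + p_busch

# Subtracting the joint probability removes the double count.
p_either = p_disney + p_busch - p_both

print(float(naive_sum), float(p_either))
```

Using `Fraction` keeps the arithmetic exact, so the comparison with 1 and with .80 does not depend on floating-point rounding.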

The following Venn diagram shows two events that are not mutually exclusive. The two events overlap to illustrate the joint event that some people have visited both attractions.

SELF-REVIEW 5–3

A sample of employees of Worldwide Enterprises is to be surveyed about a new health care plan. The employees are classified as follows:

Classification   Event   Number of Employees
Supervisors        A              120
Maintenance        B               50
Production         C            1,460
Management         D              302
Secretarial        E               68

(a) What is the probability that the first person selected is:
    (i) either in maintenance or a secretary?
    (ii) not in management?
(b) Draw a Venn diagram illustrating your answers to part (a).
(c) Are the events in part (a)(i) complementary or mutually exclusive or both?

STATISTICS IN ACTION

If you wish to get some attention at the next gathering you attend, announce that you believe that at least two people present were born on the same date—that is, the same day of the year but not necessarily the same year. If there are 30 people in the room, the probability of a duplicate is .706. If there are 60 people in the room, the probability is .994 that at least two people share the same birthday. With as few as 23 people the chances are even, that is .50, that at least two people share the same birthday. Hint: To compute this, find the probability everyone was born on a different day and use the complement rule. Try this in your class.
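The hint can be followed in a short Python sketch. It assumes 365 equally likely birthdays and ignores leap years, an assumption the passage does not state explicitly; the function name `p_shared_birthday` is hypothetical.

```python
def p_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and ignores February 29.
    """
    p_all_different = 1.0
    for k in range(people):
        # The (k + 1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different   # complement rule

for n in (23, 30, 60):
    print(n, round(p_shared_birthday(n), 3))
```

Under these assumptions, 23 is the smallest group size for which the probability of a shared birthday exceeds one-half.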


Definitions

Definitions of new terms or terms unique to the study of statistics are set apart from the text and highlighted for easy reference and review. They also appear in the Glossary at the end of the book.

A SURVEY OF PROBABILITY CONCEPTS

[Venn diagram: two overlapping circles labeled P(Disney) = .60 and P(Busch) = .50; the overlap is labeled P(Disney and Busch) = .30]

JOINT PROBABILITY A probability that measures the likelihood two or more events will happen concurrently.

So the general rule of addition, which is used to compute the probability of two events that are not mutually exclusive, is:

GENERAL RULE OF ADDITION P(A or B) = P(A) + P(B) − P(A and B) [5–4]

For the expression P(A or B), the word or suggests that A may occur or B may occur. This also includes the possibility that A and B may occur. This use of or is sometimes called an inclusive or. You could also write P(A or B or both) to emphasize that the union of the events includes the intersection of A and B.

If we compare the general and special rules of addition, the important difference is determining if the events are mutually exclusive. If the events are mutually exclusive, then the joint probability P(A and B) is 0 and we could use the special rule of addition. Otherwise, we must account for the joint probability and use the general rule of addition.

EXAMPLE

What is the probability that a card chosen at random from a standard deck of cards will be either a king or a heart?

SOLUTION

We may be inclined to add the probability of a king and the probability of a heart. But this creates a problem. If we do that, the king of hearts is counted with the kings and also with the hearts. So, if we simply add the probability of a king (there are 4 in a deck of 52 cards) to the probability of a heart (there are 13 in a deck of 52 cards) and report that 17 out of 52 cards meet the requirement, we have counted the king of hearts twice. We need to subtract 1 card from the 17 so the king of hearts is counted only once. Thus, there are 16 cards that are either hearts or kings. So the probability is 16/52 = .3077.

Card             Probability          Explanation
King             P(A) = 4/52          4 kings in a deck of 52 cards
Heart            P(B) = 13/52         13 hearts in a deck of 52 cards
King of Hearts   P(A and B) = 1/52    1 king of hearts in a deck of 52 cards
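The card example can be verified by enumerating a standard 52-card deck, as in the Python sketch below. The rank and suit labels are arbitrary names chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))            # 52 equally likely cards

kings = {card for card in deck if card[0] == "K"}         # event A
hearts = {card for card in deck if card[1] == "hearts"}   # event B

# General rule of addition: P(A or B) = P(A) + P(B) - P(A and B).
n = len(deck)
p_formula = (Fraction(len(kings), n) + Fraction(len(hearts), n)
             - Fraction(len(kings & hearts), n))

# Direct count: the set union includes the king of hearts only once.
p_count = Fraction(len(kings | hearts), n)

print(p_formula, p_count, round(float(p_count), 4))
```

Both routes give 16/52, which agrees with the count of 16 cards in the solution.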


Formulas

Formulas that are used for the first time are boxed and numbered for reference. In addition, a formula card is bound into the back of the text that lists all the key formulas.


16. Two coins are tossed. If A is the event “two heads” and B is the event “two tails,” are A and B mutually exclusive? Are they complements?

17. The probabilities of the events A and B are .20 and .30, respectively. The probability that both A and B occur is .15. What is the probability of either A or B occurring?

18. Let P(X) = .55 and P(Y) = .35. Assume the probability that they both occur is .20. What is the probability of either X or Y occurring?

19. Suppose the two events A and B are mutually exclusive. What is the probability of their joint occurrence?

20. A student is taking two courses, history and math. The probability the student will pass the history course is .60, and the probability of passing the math course is .70. The probability of passing both is .50. What is the probability of passing at least one?

21. The aquarium at Sea Critters Depot contains 140 fish. Eighty of these fish are green swordtails (44 female and 36 male) and 60 are orange swordtails (36 female and 24 male). A fish is randomly captured from the aquarium:

a. What is the probability the selected fish is a green swordtail?
b. What is the probability the selected fish is male?
c. What is the probability the selected fish is a male green swordtail?
d. What is the probability the selected fish is either a male or a green swordtail?

22. A National Park Service survey of visitors to the Rocky Mountain region revealed that 50% visit Yellowstone Park, 40% visit the Tetons, and 35% visit both.

a. What is the probability a vacationer will visit at least one of these attractions?
b. What is the probability .35 called?
c. Are the events mutually exclusive? Explain.

RULES OF MULTIPLICATION TO CALCULATE PROBABILITY

In this section, we discuss the rules for computing the likelihood that two events both happen, or their joint probability. For example, 16% of the 2016 tax returns were prepared by H&R Block and 75% of those returns showed a refund. What is the likelihood a person’s tax form was prepared by H&R Block and the person received a refund? Venn diagrams illustrate this as the intersection of two events. To find the likelihood of two events happening, we use the rules of multiplication. There are two rules of multiplication: the special rule and the general rule.

Special Rule of Multiplication

The special rule of multiplication requires that two events A and B are independent. Two events are independent if the occurrence of one event does not alter the probability of the occurrence of the other event.

LO5-4 Calculate probabilities using the rules of multiplication.

INDEPENDENCE The occurrence of one event has no effect on the probability of the occurrence of another event.

One way to think about independence is to assume that events A and B occur at different times. For example, when event B occurs after event A occurs, does A have any effect on the likelihood that event B occurs? If the answer is no, then A and B are independent events. To illustrate independence, suppose two coins are tossed. The outcome of a coin toss (head or tail) is unaffected by the outcome of any other prior coin toss (head or tail).

For two independent events A and B, the probability that A and B will both occur is found by multiplying the two probabilities. This is the special rule of multiplication and is written symbolically as:

SPECIAL RULE OF MULTIPLICATION P(A and B) = P(A)P(B) [5–5]
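As a quick check of formula [5–5], the two-coin illustration can be worked in Python. This is a sketch that assumes both coins are fair, which the text does not state; the enumeration of outcomes is included only as a cross-check.

```python
from fractions import Fraction
from itertools import product

# Assumption: each coin is fair, so P(head) = 1/2 on every toss.
p_head = Fraction(1, 2)

# Special rule of multiplication: P(A and B) = P(A)P(B).
p_two_heads = p_head * p_head

# Cross-check by listing the four equally likely outcomes of two tosses.
outcomes = list(product("HT", repeat=2))
p_enumerated = Fraction(outcomes.count(("H", "H")), len(outcomes))

print(p_two_heads, p_enumerated)  # 1/4 1/4
```

The two results agree because the tosses are independent; for dependent events the product P(A)P(B) would not match the enumerated probability.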


Exercises

Exercises are included after sections within the chapter and at the end of the chapter. Section exercises cover the material studied in the section. Many exercises have data files available to import into statistical software. They are indicated with the FILE icon. Answers to the odd-numbered exercises are in Appendix D.


INTERPRETATION AND USES OF THE STANDARD DEVIATION

The standard deviation is commonly used as a measure to compare the spread in two or more sets of observations. For example, the standard deviation of the biweekly amounts invested in the Dupree Paint Company profit-sharing plan is computed to be $7.51. Suppose these employees are located in Georgia. If the standard deviation for a group of employees in Texas is $10.47, and the means are about the same, it indicates that the amounts invested by the Georgia employees are not dispersed as much as those in Texas (because $7.51 < $10.47). Since the amounts invested by the Georgia employees are clustered more closely about the mean, the mean for the Georgia employees is a more reliable measure than the mean for the Texas group.

Chebyshev’s Theorem

We have stressed that a small standard deviation for a set of values indicates that these values are located close to the mean. Conversely, a large standard deviation reveals that the observations are widely scattered about the mean. The Russian mathematician P. L. Chebyshev (1821–1894) developed a theorem that allows us to determine the minimum proportion of the values that lie within a specified number of standard deviations of the mean. For example, according to Chebyshev’s theorem, at least three out of every four, or 75%, of the values must lie between the mean plus two standard deviations and the mean minus two standard deviations. This relationship applies regardless of the shape of the distribution. Further, at least eight of nine values, or 88.9%, will lie between plus three standard deviations and minus three standard deviations of the mean. At least 24 of 25 values, or 96%, will lie between plus and minus five standard deviations of the mean.

Chebyshev’s theorem states:

LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule.

STATISTICS IN ACTION

Most colleges report the “average class size.” This information can be misleading because average class size can be found in several ways. If we find the number of students in each class at a particular university, the result is the mean number of students per class. If we compile a list of the class sizes for each student and find the mean class size, we might find the mean to be quite different. One school found the mean number of students in each of its 747 classes to be 40. But when

(continued)

CHEBYSHEV’S THEOREM For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least 1 − 1/k², where k is any value greater than 1.
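The bound can be evaluated for any k greater than 1. The short Python sketch below reproduces the 75%, 88.9%, and 96% figures quoted for k = 2, 3, and 5; the function name is our own, not from the text.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 5):
    print(k, round(chebyshev_lower_bound(k), 3))
# 2 0.75
# 3 0.889
# 5 0.96
```

Because the theorem makes no assumption about the shape of the distribution, these are minimums; an actual data set may have a larger share of values inside the interval.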

For Exercises 47–52, do the following:

a. Compute the sample variance. b. Determine the sample standard deviation.

47. Consider these values a sample: 7, 2, 6, 2, and 3.

48. The following five values are a sample: 11, 6, 10, 6, and 7.

49. Dave’s Automatic Door, referred to in Exercise 37, installs automatic garage door openers. Based on a sample, following are the times, in minutes, required to install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.

50. The sample of eight companies in the aerospace industry, referred to in Exercise 38, was surveyed as to their return on investment last year. The results are 10.6, 12.6, 14.8, 18.2, 12.0, 14.8, 12.2, and 15.6.

51. The Houston, Texas, Motel Owner Association conducted a survey regarding weekday motel rates in the area. Listed below is the room rate for business-class guests for a sample of 10 motels.

$101 $97 $103 $110 $78 $87 $101 $80 $106 $88

52. A consumer watchdog organization is concerned about credit card debt. A survey of 10 young adults with credit card debt of more than $2,000 showed they paid an average of just over $100 per month against their balances. Listed below are the amounts each young adult paid last month.

$110 $126 $103 $93 $99 $113 $87 $101 $109 $100

E X E R C I S E S


Computer Output

The text includes many software examples, using Excel, MegaStat®, and Minitab. The software results are illustrated in the chapters. Instructions for a particular software example are in Appendix C.


E X A M P L E

Table 2–4 on page 26 shows the profit on the sales of 180 vehicles at Applewood Auto Group. Determine the mean and the median selling price.

S O L U T I O N

The mean, median, and modal amounts of profit are reported in the following output (highlighted in the screen shot). (Reminder: The instructions to create the output appear in the Software Commands in Appendix C.) There are 180 vehicles in the study, so using a calculator would be tedious and prone to error.

Software Solution We can use a statistical software package to find many measures of location.

a. What is the arithmetic mean of the Alaska unemployment rates?
b. Find the median and the mode for the unemployment rates.
c. Compute the arithmetic mean and median for just the winter (Dec–Mar) months. Is it much different?

22. Big Orange Trucking is designing an information system for use in “in-cab” communications. It must summarize data from eight sites throughout a region to describe typical conditions. Compute an appropriate measure of central location for the variables wind direction, temperature, and pavement.

City              Wind Direction    Temperature    Pavement
Anniston, AL      West              89             Dry
Atlanta, GA       Northwest         86             Wet
Augusta, GA       Southwest         92             Wet
Birmingham, AL    South             91             Dry
Jackson, MS       Southwest         92             Dry
Meridian, MS      South             92             Trace
Monroe, LA        Southwest         93             Wet
Tuscaloosa, AL    Southwest         93             Trace


HOW DOES THIS TEXT REINFORCE STUDENT LEARNING?

BY CHAPTER

Chapter Summary

Each chapter contains a brief summary of the chapter material, including vocabulary, definitions, and critical formulas.


the number of transmission services, muffler replacements, and oil changes per day at Avellino’s Auto Shop. They follow Poisson distributions with means of 0.7, 2.0, and 6.0, respectively.

In summary, the Poisson distribution is a family of discrete distributions. All that is needed to construct a Poisson probability distribution is the mean number of defects, errors, or other random variable, designated as μ.
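Since the mean μ is the only input, a Poisson probability can be computed in a few lines. The sketch below uses the standard Poisson formula P(x) = μ^x e^(−μ) / x!, which does not appear in this excerpt, and takes μ = 0.7 (the transmission-services mean mentioned above) purely as an illustration.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """P(x) for a Poisson distribution with mean mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

mu = 0.7  # illustrative mean: transmission services per day
for x in range(4):
    print(x, round(poisson_pmf(x, mu), 4))

# The probabilities over all possible counts sum to 1 (checked numerically).
print(round(sum(poisson_pmf(x, mu) for x in range(50)), 6))
```

Changing `mu` to 2.0 or 6.0 gives the other two members of the family mentioned in the passage.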

From actuary tables, Washington Insurance Company determined the likelihood that a man age 25 will die within the next year is .0002. If Washington Insurance sells 4,000 policies to 25-year-old men this year, what is the probability they will pay on exactly one policy?

S E L F - R E V I E W 6–6

31. In a Poisson distribution μ = 0.4. a. What is the probability that x = 0? b. What is the probability that x > 0?

32. In a Poisson distribution μ = 4. a. What is the probability that x = 2? b. What is the probability that x ≤ 2? c. What is the probability that x > 2?

33. Ms. Bergen is a loan officer at Coast Bank and Trust. From her years of experience, she estimates that the probability is .025 that an applicant will not be able to repay his or her installment loan. Last month she made 40 loans.

a. What is the probability that three loans will be defaulted? b. What is the probability that at least three loans will be defaulted?

34. Automobiles arrive at the Elkhart exit of the Indiana Toll Road at the rate of two per minute. The distribution of arrivals approximates a Poisson distribution.

a. What is the probability that no automobiles arrive in a particular minute?
b. What is the probability that at least one automobile arrives during a particular minute?

35. It is estimated that 0.5% of the callers to the Customer Service department of Dell Inc. will receive a busy signal. What is the probability that of today’s 1,200 callers at least 5 received a busy signal?

36. In the past, schools in Los Angeles County have closed an average of 3 days each year for weather emergencies. What is the probability that schools in Los Angeles County will close for 4 days next year?

E X E R C I S E S

C H A P T E R S U M M A R Y

I. A random variable is a numerical value determined by the outcome of an experiment.

II. A probability distribution is a listing of all possible outcomes of an experiment and the probability associated with each outcome.
   A. A discrete probability distribution can assume only certain values. The main features are:
      1. The sum of the probabilities is 1.00.
      2. The probability of a particular outcome is between 0.00 and 1.00.
      3. The outcomes are mutually exclusive.
   B. A continuous distribution can assume an infinite number of values within a specific range.

III. The mean and variance of a probability distribution are computed as follows.
   A. The mean is equal to:
      μ = Σ[xP(x)]   (6–1)
   B. The variance is equal to:
      σ² = Σ[(x − μ)²P(x)]   (6–2)
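Formulas (6–1) and (6–2) translate directly into code. The sketch below applies them to a small hypothetical distribution that is not taken from the text; exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution (illustration only): outcome -> probability.
dist = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
}

# A valid discrete distribution has probabilities that sum to 1.
assert sum(dist.values()) == 1

mean = sum(x * p for x, p in dist.items())                    # formula (6-1)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # formula (6-2)

print(mean, variance)  # 2 1
```

The standard deviation is the square root of the variance, which here is also 1.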


Pronunciation Key

This section lists the mathematical symbol, its meaning, and how to pronounce it. We believe this will help the student retain the meaning of the symbol and generally enhance course communications.


P R O N U N C I A T I O N K E Y

SYMBOL        MEANING                                        PRONUNCIATION
P(A)          Probability of A                               P of A
P(∼A)         Probability of not A                           P of not A
P(A and B)    Probability of A and B                         P of A and B
P(A or B)     Probability of A or B                          P of A or B
P(A | B)      Probability of A given B has happened          P of A given B
nPr           Permutation of n items selected r at a time    Pnr
nCr           Combination of n items selected r at a time    Cnr

C H A P T E R E X E R C I S E S

47. The marketing research department at Pepsico plans to survey teenagers about a newly developed soft drink. Each will be asked to compare it with his or her favorite soft drink. a. What is the experiment? b. What is one possible event?

48. The number of times a particular event occurred in the past is divided by the number of occurrences. What is this approach to probability called?

49. The probability that the cause and the cure for all cancers will be discovered before the year 2020 is .20. What viewpoint of probability does this statement illustrate?

50. Berdine’s Chicken Factory has several stores in the Hilton Head, South Carolina, area. When interviewing applicants for server positions, the owner would like to include information on the amount of tip a server can expect to earn per check (or bill). A study of 500 recent checks indicated the server earned the following amounts in tips per 8-hour shift.

Amount of Tip       Number
$0 up to $20        200
$20 up to $50       100
$50 up to $100       75
$100 up to $200      75
$200 or more         50
Total               500

a. What is the probability of a tip of $200 or more?
b. Are the categories “$0 up to $20,” “$20 up to $50,” and so on considered mutually exclusive?
c. If the probabilities associated with each outcome were totaled, what would that total be?
d. What is the probability of a tip of up to $50?
e. What is the probability of a tip of less than $200?

51. Winning all three “Triple Crown” races is considered the greatest feat of a pedigree racehorse. After a successful Kentucky Derby, Corn on the Cob is a heavy favorite at 2 to 1 odds to win the Preakness Stakes.
a. If he is a 2 to 1 favorite to win the Belmont Stakes as well, what is his probability of winning the Triple Crown?
b. What do his chances for the Preakness Stakes have to be in order for him to be “even money” to earn the Triple Crown?

52. The first card selected from a standard 52-card deck is a king.

a. If it is returned to the deck, what is the probability that a king will be drawn on the second selection?

b. If the king is not replaced, what is the probability that a king will be drawn on the second selection?


Chapter Exercises

Generally, the end-of-chapter exercises are the most challenging and integrate the chapter concepts. The answers and worked-out solutions for all odd-numbered exercises are in Appendix D at the end of the text. Many exercises are noted with a data file icon in the margin. For these exercises, there are data files in Excel format located on the text’s website, www.mhhe.com/Lind17e. These files help students use statistical software to solve the exercises.


The major characteristics of the t distribution are:
   1. It is a continuous distribution.
   2. It is mound-shaped and symmetrical.
   3. It is flatter, or more spread out, than the standard normal distribution.
   4. There is a family of t distributions, depending on the number of degrees of freedom.

V. There are two types of errors that can occur in a test of hypothesis.
   A. A Type I error occurs when a true null hypothesis is rejected.
      1. The probability of making a Type I error is equal to the level of significance.
      2. This probability is designated by the Greek letter α.
   B. A Type II error occurs when a false null hypothesis is not rejected.
      1. The probability of making a Type II error is designated by the Greek letter β.
      2. The likelihood of a Type II error must be calculated comparing the hypothesized distribution to an alternate distribution based on sample results.

P R O N U N C I A T I O N K E Y

SYMBOL    MEANING                          PRONUNCIATION
H0        Null hypothesis                  H sub zero
H1        Alternate hypothesis             H sub one
α/2       Two-tailed significance level    Alpha divided by 2
x̄c        Limit of the sample mean         x bar sub c
μ0        Assumed population mean          mu sub zero

C H A P T E R E X E R C I S E S

25. According to the local union president, the mean gross income of plumbers in the Salt Lake City area follows the normal probability distribution with a mean of $45,000 and a standard deviation of $3,000. A recent investigative reporter for KYAK TV found, for a sample of 120 plumbers, the mean gross income was $45,500. At the .10 significance level, is it reasonable to conclude that the mean income is not equal to $45,000? Determine the p-value.

26. Rutter Nursery Company packages its pine bark mulch in 50-pound bags. From a long history, the production department reports that the distribution of the bag weights follows the normal distribution and the standard deviation of the packaging process is 3 pounds per bag. At the end of each day, Jeff Rutter, the production manager, weighs 10 bags and computes the mean weight of the sample. Below are the weights of 10 bags from today’s production.

45.6 47.7 47.6 46.3 46.2 47.4 49.2 55.8 47.5 48.5

a. Can Mr. Rutter conclude that the mean weight of the bags is less than 50 pounds? Use the .01 significance level.

b. In a brief report, tell why Mr. Rutter can use the z distribution as the test statistic. c. Compute the p-value.

27. A new weight-watching company, Weight Reducers International, advertises that those who join will lose an average of 10 pounds after the first two weeks. The standard deviation is 2.8 pounds. A random sample of 50 people who joined the weight reduction program revealed a mean loss of 9 pounds. At the .05 level of significance, can we conclude that those joining Weight Reducers will lose less than 10 pounds? Determine the p-value.

28. Dole Pineapple Inc. is concerned that the 16-ounce can of sliced pineapple is being overfilled. Assume the standard deviation of the process is .03 ounce. The quality-control department took a random sample of 50 cans and found that the arithmetic mean weight was 16.05 ounces. At the 5% level of significance, can we conclude that the mean weight is greater than 16 ounces? Determine the p-value.


Data Analytics

The goal of the Data Analytics sections is to develop analytical skills. The exercises present a real world context with supporting data. The data sets are printed in Appendix A and available to download from the text’s website www.mhhe.com/Lind17e. Statistical software is required to analyze the data and respond to the exercises. Each data set is used to explore questions and discover findings that relate to a real world context. For each business context, a story is uncovered as students progress from chapters one to seventeen.


68. In establishing warranties on HDTVs, the manufacturer wants to set the limits so that few will need repair at the manufacturer’s expense. On the other hand, the warranty period must be long enough to make the purchase attractive to the buyer. For a new HDTV, the mean number of months until repairs are needed is 36.84 with a standard deviation of 3.34 months. Where should the warranty limits be set so that only 10% of the HDTVs need repairs at the manufacturer’s expense?

69. DeKorte Tele-Marketing Inc. is considering purchasing a machine that randomly selects and automatically dials telephone numbers. DeKorte Tele-Marketing makes most of its calls during the evening, so calls to business phones are wasted. The manufacturer of the machine claims that its programming reduces the calling to business phones to 15% of all calls. To test this claim, the director of purchasing at DeKorte programmed the machine to select a sample of 150 phone numbers. What is the likelihood that more than 30 of the phone numbers selected are those of businesses, assuming the manufacturer’s claim is correct?

70. A carbon monoxide detector in the Wheelock household activates once every 200 days on average. Assume this activation follows the exponential distribution. What is the probability that: a. There will be an alarm within the next 60 days? b. At least 400 days will pass before the next alarm? c. It will be between 150 and 250 days until the next warning? d. Find the median time until the next activation.

71. “Boot time” (the time between the appearance of the Bios screen to the first file that is loaded in Windows) on Eric Mouser’s personal computer follows an exponential distribution with a mean of 27 seconds. What is the probability his “boot” will require: a. Less than 15 seconds? b. More than 60 seconds? c. Between 30 and 45 seconds? d. What is the point below which only 10% of the boots occur?

72. The time between visits to a U.S. emergency room for a member of the general population follows an exponential distribution with a mean of 2.5 years. What proportion of the population: a. Will visit an emergency room within the next 6 months? b. Will not visit the ER over the next 6 years? c. Will visit an ER next year, but not this year? d. Find the first and third quartiles of this distribution.

73. The times between failures on a personal computer follow an exponential distribution with a mean of 300,000 hours. What is the probability of: a. A failure in less than 100,000 hours? b. No failure in the next 500,000 hours? c. The next failure occurring between 200,000 and 350,000 hours? d. What are the mean and standard deviation of the time between failures?

D A T A A N A L Y T I C S

(The data for these exercises are available at the text website: www.mhhe.com/lind17e.)

74. Refer to the North Valley Real Estate data, which report information on homes sold during the last year.

a. The mean selling price (in $ thousands) of the homes was computed earlier to be $357.0, with a standard deviation of $160.7. Use the normal distribution to estimate the percentage of homes selling for more than $500.000. Compare this to the actual results. Is price normally distributed? Try another test. If price is normally distributed, how many homes should have a price greater than the mean? Compare this to the actual number of homes. Construct a frequency distribution of price. What do you observe?

b. The mean days on the market is 30 with a standard deviation of 10 days. Use the normal distribution to estimate the number of homes on the market more than 24 days. Compare this to the actual results. Try another test. If days on the market is normally distributed, how many homes should be on the market more than the mean number of days? Compare this to the actual number of homes. Does the normal


Software Commands

Software examples using Excel, MegaStat®, and Minitab are included throughout the text. The explanations of the computer input commands are placed at the end of the text in Appendix C.


11–2. The Minitab commands for the two-sample t-test on page 368 are:

a. Put the amount absorbed by the Store brand in C1 and the amount absorbed by the Name brand paper towel in C2.

b. From the toolbar, select Stat, Basic Statistics, and then 2-Sample, and click OK.

c. In the next dialog box, select Samples in different columns, select C1 Store for the First column and C2 Name for the Second, click the box next to Assume equal variances, and click OK.

11–3. The Excel commands for the paired t-test on page 373 are:

a. Enter the data into columns B and C (or any other two columns) in the spreadsheet, with the variable names in the first row.

b. Select the Data tab on the top menu. Then, on the far right, select Data Analysis. Select t-Test: Paired Two Sample for Means, and then click OK.

c. In the dialog box, indicate that the range of Variable 1 is from B1 to B11 and Variable 2 from C1 to C11, the Hypothesized Mean Difference is 0, click Labels, Alpha is .05, and the Output Range is E1. Click OK.

CHAPTER 12

12–1. The Excel commands for the test of variances on page 391 are:

a. Enter the data for U.S. 25 in column A and for I-75 in column B. Label the two columns.

b. Select the Data tab on the top menu. Then, on the far right, select Data Analysis. Select F-Test: Two-Sample for Variances, then click OK.

c. The range of the first variable is A1:A8, and B1:B9 for the second. Click on Labels, enter 0.05 for Alpha, select D1 for the Output Range, and click OK.

12–2. The Excel commands for the one-way ANOVA on page 400 are:

a. Key in data into four columns labeled Northern, WTA, Pocono, and Branson.

b. Select the Data tab on the top menu. Then, on the far right, select Data Analysis. Select ANOVA: Single Factor, then click OK.

c. In the subsequent dialog box, make the input range A1:D8, click on Grouped by Columns, click on Labels in first row, the Alpha text box is 0.05, and finally select Output Range as F1 and click OK.

c. In the dialog box, indicate that the range of Variable 1 is from A1 to A6 and Variable 2 from B1 to B7, the Hypothesized Mean Difference is 0, click Labels, Alpha is 0.05, and the Output Range is D1. Click OK.


Answers to Self-Review

The worked-out solutions to the Self-Reviews are provided at the end of the text in Appendix E.

16–7 a.

                    Rank
   x      y      x       y       d        d²
 805     23     5.5      1      4.5     20.25
 777     62     3.0      9     −6.0     36.00
 820     60     8.5      8      0.5      0.25
 682     40     1.0      4     −3.0      9.00
 777     70     3.0     10     −7.0     49.00
 810     28     7.0      2      5.0     25.00
 805     30     5.5      3      2.5      6.25
 840     42    10.0      5      5.0     25.00
 777     55     3.0      7     −4.0     16.00
 820     51     8.5      6      2.5      6.25
                                0      193.00

$$r_s = 1 - \frac{6(193)}{10(99)} = -.170$$

b. H0: ρ = 0; H1: ρ ≠ 0. Reject H0 if t < −2.306 or t > 2.306.

$$t = -.170\sqrt{\frac{10 - 2}{1 - (-0.170)^2}} = -0.488$$

H0 is not rejected. We have not shown a relationship between the two tests.
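The rank-and-difference arithmetic in 16–7 can be reproduced in Python. The sketch below assumes tied values receive the mean of the ranks they occupy, which matches the 5.5, 3.0, and 8.5 ranks in the x column; the helper name `average_ranks` is our own. Because it carries the unrounded r_s into the t statistic, the third decimal of t may differ slightly from the printed −0.488, which uses the rounded −.170.

```python
import math

# x and y scores from the answer to 16-7.
x = [805, 777, 820, 682, 777, 810, 805, 840, 777, 820]
y = [23, 62, 60, 40, 70, 28, 30, 42, 55, 51]

def average_ranks(values):
    """Rank from smallest (1) upward; tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # rank of the first tied position
        count = ordered.count(v)                # how many values are tied
        ranks.append(first + (count - 1) / 2)   # mean of the tied positions
    return ranks

rank_x, rank_y = average_ranks(x), average_ranks(y)
n = len(x)

sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))
t = r_s * math.sqrt((n - 2) / (1 - r_s**2))

print(sum_d2)            # 193.0
print(round(r_s, 3))     # -0.17
print(abs(t) < 2.306)    # True, so H0 is not rejected
```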

CHAPTER 17

17–1 1.

Country          Amount    Index (Base = US)
China             822.7    932.8
Japan             110.7    125.5
United States      88.2    100.0
India              86.5     98.1
Russia             71.5     81.1

China produced 832.8% more steel than the US.

2. a.

Year    Average Hourly Earnings    Index (1995 = Base)
1995    11.65                      100.0
2000    14.02                      120.3
2005    16.13                      138.5
2013    19.97                      171.4
2016    21.37                      183.4

The 2016 average wage increased 83.4% from 1995.

b.

Year    Average Hourly Earnings    Index (1995–2000 = Base)
1995    11.65                       90.8
2000    14.02                      109.2
2005    16.13                      125.7
2013    19.97                      155.6
2016    21.37                      166.5

The 2016 average wage increased 66.5% from the average of 1995 and 2000.

17–2 1. a. P1 = ($85/$75)(100) = 113.3
           P2 = ($45/$40)(100) = 112.5
           P = (113.3 + 112.5)/2 = 112.9

b. P = ($130/$115)(100) = 113.0

c. $$P = \frac{\$85(500) + \$45(1{,}200)}{\$75(500) + \$40(1{,}200)}(100) = \frac{\$96{,}500}{\$85{,}500}(100) = 112.9$$

d. $$P = \frac{\$85(520) + \$45(1{,}300)}{\$75(520) + \$40(1{,}300)}(100) = \frac{\$102{,}700}{\$91{,}000}(100) = 112.9$$

e. $$P = \sqrt{(112.9)(112.9)} = 112.9$$
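Parts c, d, and e of 17–2 are all weighted price indexes, so one helper covers them. In the sketch below the labels Laspeyres (base-period quantities), Paasche (current-period quantities), and Fisher (geometric mean of the two) are our reading of which formula each part uses; the excerpt does not name them, and the function name is hypothetical.

```python
import math

# Prices and quantities read from the worked answer to 17-2.
base_prices = [75, 40]
current_prices = [85, 45]
base_quantities = [500, 1200]       # used in part c
current_quantities = [520, 1300]    # used in part d

def weighted_price_index(prices_now, prices_base, quantities):
    """Ratio of current to base cost of a fixed basket, times 100."""
    numerator = sum(p * q for p, q in zip(prices_now, quantities))
    denominator = sum(p * q for p, q in zip(prices_base, quantities))
    return 100 * numerator / denominator

laspeyres = weighted_price_index(current_prices, base_prices, base_quantities)
paasche = weighted_price_index(current_prices, base_prices, current_quantities)
fisher = math.sqrt(laspeyres * paasche)

print(round(laspeyres, 1), round(paasche, 1), round(fisher, 1))  # 112.9 112.9 112.9
```

The three values agree to one decimal here because the quantities changed by nearly the same proportion for both items.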

17–3 a. $$P = \frac{\$4(9{,}000) + \$5(200) + \$8(5{,}000)}{\$3(10{,}000) + \$1(600) + \$10(3{,}000)}(100) = \frac{\$77{,}000}{\$60{,}600}(100) = 127.1$$

b. The value of sales went up 27.1% from 2001 to 2017

17–4 a. For 2011

Item              Weight
Cotton            ($0.25/$0.20)(100)(.10) =  12.50
Autos             (1,200/1,000)(100)(.30) =  36.00
Money turnover    (90/80)(100)(.60)       =  67.50
Total                                       116.00

For 2016

Item              Weight
Cotton            ($0.50/$0.20)(100)(.10) =  25.00
Autos             (900/1,000)(100)(.30)   =  27.00
Money turnover    (75/80)(100)(.60)       =  56.25
Total                                       108.25

b. Business activity increased 16% from the base period to 2011. It increased 8.25% from the base period to 2016.

17–5 In terms of the base period, Jon’s salary was $14,637 in 2000 and $17,944 in 2016. This indicates that take-home pay increased at a faster rate than the rate of prices paid for food, transportation, etc.

17–6 $0.42, found by ($1.00/238.132)(100). The purchasing power has declined by $0.58.

17–7

Year    IPI       PPI
2007    111.07     92.9
2008    107.12    100.2
2009     94.80     95.3
2010    100.00    100.0
2011    102.93    107.8
2012    105.80    110.1
2013    107.83    110.5
2014    110.98    111.5
2015    111.32    105.8

The Industrial Production Index (IPI) increased 11.32% from 2010 to 2015. The Producer Price Index (PPI) increased 5.8%.

CHAPTER 18

18–1

Year    Number Produced    Moving Average
2011     2
2012     6                 4
2013     4                 5
2014     5                 4
2015     3                 6
2016    10
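The moving averages in 18–1 are consistent with a centered three-year window, so the first and last years have no value. A minimal Python sketch under that assumption (the function name is our own, and it expects an odd window):

```python
years = [2011, 2012, 2013, 2014, 2015, 2016]
produced = [2, 6, 4, 5, 3, 10]

def centered_moving_average(values, window=3):
    """Centered moving average; positions without a full window are None."""
    half = window // 2
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        averages[i] = sum(values[i - half:i + half + 1]) / window
    return averages

for year, count, average in zip(years, produced, centered_moving_average(produced)):
    print(year, count, average)
```

For example, the 2012 entry is (2 + 6 + 4)/3 = 4, matching the table.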


BY SECTION

Section Reviews

After selected groups of chapters (1–4, 5–7, 8 and 9, 10–12, 13 and 14, 15 and 16, and 17 and 18), a Section Review is included. Much like a review before an exam, these include a brief overview of the chapters and problems for review.


D A T A A N A L Y T I C S

44. Refer to the North Valley real estate data recorded on homes sold during the last year. Prepare a report on the selling prices of the homes based on the answers to the following questions.

a. Compute the minimum, maximum, median, and the first and the third quartiles of price. Create a box plot. Comment on the distribution of home prices.

b. Develop a scatter diagram with price on the vertical axis and the size of the home on the horizontal. Is there a relationship between these variables? Is the relationship direct or indirect?

c. For homes without a pool, develop a scatter diagram with price on the vertical axis and the size of the home on the horizontal. Do the same for homes with a pool. How do the relationships between price and size for homes without a pool and homes with a pool compare?

45. Refer to the Baseball 2016 data that report information on the 30 Major League Baseball teams for the 2016 season.

a. In the data set, the year opened, is the first year of operation for that stadium. For each team, use this variable to create a new variable, stadium age, by subtracting the value of the variable, year opened, from the current year. Develop a box plot with the new variable, age. Are there any outliers? If so, which of the stadiums are outliers?

b. Using the variable, salary, create a box plot. Are there any outliers? Compute the quartiles using formula (4–1). Write a brief summary of your analysis.

c. Draw a scatter diagram with the variable, wins, on the vertical axis and salary on the horizontal axis. What are your conclusions?

d. Using the variable, wins, draw a dot plot. What can you conclude from this plot?

46. Refer to the Lincolnville School District bus data.

a. Referring to the maintenance cost variable, develop a box plot. What are the minimum, first quartile, median, third quartile, and maximum values? Are there any outliers?

b. Using the median maintenance cost, develop a contingency table with bus manufacturer as one variable and whether the maintenance cost was above or below the median as the other variable. What are your conclusions?

A REVIEW OF CHAPTERS 1–4

This section is a review of the major concepts and terms introduced in Chapters 1–4. Chapter 1 began by describing the meaning and purpose of statistics. Next we described the different types of variables and the four levels of measurement. Chapter 2 was concerned with describing a set of observations by organizing it into a frequency distribution and then portraying the frequency distribution as a histogram or a frequency polygon. Chapter 3 began by describing measures of location, such as the mean, weighted mean, median, geometric mean, and mode. This chapter also included measures of dispersion, or spread. Discussed in this section were the range, variance, and standard deviation. Chapter 4 included several graphing techniques such as dot plots, box plots, and scatter diagrams. We also discussed the coefficient of skewness, which reports the lack of symmetry in a set of data.

Throughout this section we stressed the importance of statistical software, such as Excel and Minitab. Many computer outputs in these chapters demonstrated how quickly and effectively a large data set can be organized into a frequency distribution, several of the measures of location or measures of variation calculated, and the information presented in graphical form.


Cases

The review also includes continuing cases and several small cases that let students make decisions using tools and techniques from a variety of chapters.

5. Refer to the following diagram.

[Diagram: a plot on a horizontal axis scaled from 0 to 200 in steps of 40, with two points marked by asterisks.]

a. What is the graph called? b. What are the median, and first and third quartile values? c. Is the distribution positively skewed? Tell how you know. d. Are there any outliers? If yes, estimate these values. e. Can you determine the number of observations in the study?


C A S E S

A. Century National Bank

The following case will appear in subsequent review sections. Assume that you work in the Planning Department of the Century National Bank and report to Ms. Lamberg. You will need to do some data analysis and prepare a short written report. Remember, Mr. Selig is the president of the bank, so you will want to ensure that your report is complete and accurate. A copy of the data appears in Appendix A.6.

Century National Bank has offices in several cities in the Midwest and the southeastern part of the United States. Mr. Dan Selig, president and CEO, would like to know the characteristics of his checking account customers. What is the balance of a typical customer? How many other bank services do the checking account customers use? Do the customers use the ATM service and, if so, how often? What about debit cards? Who uses them, and how often are they used?

To better understand the customers, Mr. Selig asked Ms. Wendy Lamberg, director of planning, to select a sample of customers and prepare a report. To begin, she has appointed a team from her staff. You are the head of the team and responsible for preparing the report. You select a random sample of 60 customers. In addition to the balance in each account at the end of last month, you determine (1) the number of ATM (automatic teller machine) transactions in the last month; (2) the number of other bank services (a savings account, a certificate of deposit, etc.) the customer uses; (3) whether the customer has a debit card (this is a bank service in which charges are made directly to the customer’s account); and (4) whether or not interest is paid on the checking account. The sample includes customers from the branches in Cincinnati, Ohio; Atlanta, Georgia; Louisville, Kentucky; and Erie, Pennsylvania.

1. Develop a graph or table that portrays the checking balances. What is the balance of a typical customer? Do many customers have more than $2,000 in their accounts? Does it appear that there is a difference in the distribution of the accounts among the four branches? Around what value do the account balances tend to cluster?

2. Determine the mean and median of the checking account balances. Compare the mean and the median balances for the four branches. Is there a difference among the branches? Be sure to explain the difference between the mean and the median in your report.

3. Determine the range and the standard deviation of the checking account balances. What do the first and third quartiles show? Determine the coefficient of skewness and indicate what it shows. Because Mr. Selig does not deal with statistics daily, include a brief description and interpretation of the standard deviation and other measures.

B. Wildcat Plumbing Supply Inc.: Do We Have Gender Differences?

Wildcat Plumbing Supply has served the plumbing needs of Southwest Arizona for more than 40 years. The company was founded by Mr. Terrence St. Julian and is run today by his son Cory. The company has grown from a handful of employees to more than 500 today. Cory is concerned about several positions within the company where he has men and women doing essentially the same job but at different pay. To investigate, he collected the information below. Suppose you are a student intern in the Accounting Department and have been given the task to write a report summarizing the situation.

Yearly Salary ($000)    Women    Men
Less than 30                2      0
30 up to 40                 3      1
40 up to 50                17      4
50 up to 60                17     24
60 up to 70                 8     21
70 up to 80                 3      7
80 or more                  0      3

To kick off the project, Mr. Cory St. Julian held a meeting with his staff and you were invited. At this meeting, it was suggested that you calculate several measures of location, create charts or draw graphs such as a cumulative frequency distribution, and determine the quartiles for both men and women. Develop the charts and write the report summarizing the yearly salaries of employees at Wildcat Plumbing Supply. Does it appear that there are pay differences based on gender?

Practice Test The Practice Test is intended to give students an idea of content that might appear on a test and how the test might be structured. The Practice Test includes both objective questions and problems covering the material studied in the section.
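For the Wildcat Plumbing Supply salary table in Case B, the cumulative frequency distributions can be built directly from the class counts. The Python sketch below is one possible way to do it, not the book's own solution. The function names are made up for illustration, and the quartile step uses a simple cumulative-count rule that only identifies the class holding each quartile, so a location formula from the text may land slightly differently.

```python
from itertools import accumulate

# Salary classes ($000) and frequencies copied from the Case B table.
classes = ["Less than 30", "30 up to 40", "40 up to 50", "50 up to 60",
           "60 up to 70", "70 up to 80", "80 or more"]
women = [2, 3, 17, 17, 8, 3, 0]
men = [0, 1, 4, 24, 21, 7, 3]


def cumulative(freqs):
    """Return the running totals of the class frequencies."""
    return list(accumulate(freqs))


def class_containing(freqs, fraction):
    """Return the first class whose cumulative count reaches fraction * n.

    Illustrative helper (an assumption, not a formula from the text).
    """
    target = fraction * sum(freqs)
    for label, running in zip(classes, cumulative(freqs)):
        if running >= target:
            return label
    raise ValueError("fraction must be between 0 and 1")


cum_women = cumulative(women)
cum_men = cumulative(men)

# Side-by-side cumulative frequency distributions.
print(f"{'Class':<14}{'Women':>6}{'Men':>6}")
for label, w, m in zip(classes, cum_women, cum_men):
    print(f"{label:<14}{w:>6}{m:>6}")

# Classes that hold the first quartile, median, and third quartile.
for name, freqs in (("Women", women), ("Men", men)):
    q1, q2, q3 = (class_containing(freqs, f) for f in (0.25, 0.50, 0.75))
    print(f"{name}: Q1 in '{q1}', median in '{q2}', Q3 in '{q3}'")
```

Comparing the classes that hold the quartiles for each group is a quick first check before drawing the charts the case asks for.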

C. Kimble Products: Is There a Difference In the Commissions?

At the January national sales meeting, the CEO of Kimble Products was questioned extensively regarding the company policy for paying commissions to its sales representatives. The company sells sporting goods to two major markets. There are 40 sales representatives who call directly on large-volume customers, such as the athletic departments at major colleges and universities and professional sports franchises. There are 30 sales representatives who represent the company to retail stores located in shopping malls and large discounters such as Kmart and Target.

Upon his return to corporate headquarters, the CEO asked the sales manager for a report comparing the commissions earned last year by the two parts of the sales team. The information is reported below. Write a brief report. Would you conclude that there is a difference? Be sure to include information in the report on both the central tendency and dispersion of the two groups.

Commissions Earned by Sales Representatives Calling on Large Retailers ($)

1,116 681 1,294 12 754 1,206 1,448 870 944 1,255 1,213 1,291 719 934 1,313 1,083 899 850 886 1,556 886 1,315 1,858 1,262 1,338 1,066 807 1,244 758 918

Commissions Earned by Sales Representatives Calling on Athletic Departments ($)

354 87 1,676 1,187 69 3,202 680 39 1,683 1,106 883 3,140 299 2,197 175 159 1,105 434 615 149 1,168 278 579 7 357 252 1,602 2,321 4 392 416 427 1,738 526 13 1,604 249 557 635 527
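The measures of central tendency and dispersion requested in the Kimble Products case can be computed with Python's standard `statistics` module. The sketch below is a starting point under stated assumptions, not the book's solution. The `summarize` helper is a hypothetical name, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ slightly from other quartile conventions.

```python
import statistics

# Commissions ($) copied from the two tables in Case C.
retail = [
    1116, 681, 1294, 12, 754, 1206, 1448, 870, 944, 1255,
    1213, 1291, 719, 934, 1313, 1083, 899, 850, 886, 1556,
    886, 1315, 1858, 1262, 1338, 1066, 807, 1244, 758, 918,
]
athletic = [
    354, 87, 1676, 1187, 69, 3202, 680, 39, 1683, 1106,
    883, 3140, 299, 2197, 175, 159, 1105, 434, 615, 149,
    1168, 278, 579, 7, 357, 252, 1602, 2321, 4, 392,
    416, 427, 1738, 526, 13, 1604, 249, 557, 635, 527,
]


def summarize(values):
    """Return basic location and dispersion measures for a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method: exclusive
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


for name, data in (("Large retailers", retail), ("Athletic departments", athletic)):
    print(name)
    for measure, value in summarize(data).items():
        print(f"  {measure:<7}{value:>10.1f}")
```

Printing the two summaries side by side makes it easier to judge whether the groups differ in typical commission, in spread, or in both.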

PRACTICE TEST

There is a practice test at the end of each review section. The tests are in two parts. The first part contains several objective questions, usually in a fill-in-the-blank format. The second part is problems. In most cases, it should take 30 to 45 minutes to complete the test. The problems require a calculator. Check the answers in the Answer Section in the back of the book.

Part 1: Objective

1. The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions is called ______.
2. Methods of organizing, summarizing, and presenting data in an informative way are called ______.
3. The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest are called the ______.
4. List the two types of variables.
5. The number of bedrooms in a house is an example of a ______. (discrete variable, continuous variable, or qualitative variable; pick one)
6. The jersey numbers of Major League Baseball players are an example of what level of measurement?
7. The classification of students by eye color is an example of what level of measurement?
8. The sum of the differences between each value and the mean is always equal to what value?
9. A set of data contained 70 observations. How many classes would the 2^k method suggest to construct a frequency distribution?
10. What percent of the values in a data set are always larger than the median?
11. The square of the standard deviation is the ______.
12. The standard deviation assumes a negative value when ______. (all the values are negative, at least half the values are negative, or never; pick one)
13. Which of the following is least affected by an outlier? (mean, median, or range; pick one)

Part 2: Problems

1. The Russell 2000 index of stock prices increased by the following amounts over the last 3 years.

18% 4% 2%

What is the geometric mean increase for the 3 years?
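A geometric mean rate of increase is found by converting each percent change to a growth factor, multiplying the factors, taking the nth root, and subtracting 1. The Python sketch below applies that procedure to the three increases in Problem 1. The function name is made up for illustration, and the block is a way to check a hand calculation rather than a prescribed method.

```python
import math


def geometric_mean_rate(percent_changes):
    """Return the geometric mean rate of change, in percent per period."""
    # Convert each percent change into a growth factor, e.g. 18% -> 1.18.
    growth_factors = [1 + p / 100 for p in percent_changes]
    # Multiply the factors, take the nth root, then remove the base of 1.
    product = math.prod(growth_factors)
    return (product ** (1 / len(growth_factors)) - 1) * 100


increases = [18, 4, 2]  # yearly percent increases from Problem 1
gm = geometric_mean_rate(increases)
am = sum(increases) / len(increases)

print(f"Geometric mean increase:  {gm:.2f}% per year")
print(f"Arithmetic mean increase: {am:.2f}% per year")
```

The geometric mean comes out at roughly 7.8 percent per year, a little below the 8 percent arithmetic mean. That is expected whenever the yearly changes are not all equal.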


Required=Results

McGraw-Hill Connect® Learn Without Limits

Connect is a teaching and learning platform that is proven to deliver better results for students and instructors.

Connect empowers students by continually adapting to deliver precisely what they need, when they need it, and how they need it, so your class time is more engaging and effective.

Analytics

Connect Insight® Connect Insight is Connect’s new one-of-a-kind visual analytics dashboard, now available for both instructors and students, that provides at-a-glance information regarding student performance, which is immediately actionable. By presenting assignment, assessment, and topical performance results together with a time metric that is easily visible for aggregate or individual results, Connect Insight gives the user the ability to take a just-in-time approach to teaching and learning, which was never before available. Connect Insight presents data that empowers students and helps instructors improve class performance in a way that is efficient and effective.

73% of instructors who use Connect require it; instructor satisfaction increases by 28% when Connect is required.

Students can view their results for any Connect course.

Mobile

Connect’s new, intuitive mobile interface gives students and instructors flexible and convenient, anytime–anywhere access to all components of the Connect platform.


Using Connect improves retention rates by 19.8%, passing rates by 12.7%, and exam scores by 9.1%.

SmartBook® Proven to help students improve grades and study more efficiently, SmartBook contains the same content within the print book, but actively tailors that content to the needs of the individual. SmartBook’s adaptive technology provides precise, personalized instruction on what the student should do next, guiding the student to master and remember key concepts, targeting gaps in knowledge and offering customized feedback, and driving the student toward comprehension and retention of the subject matter. Available on tablets, SmartBook puts learning at the student’s fingertips—anywhere, anytime.

Adaptive

Over 8 billion questions have been answered, making McGraw-Hill Education products more intelligent, reliable, and precise.

THE ADAPTIVE READING EXPERIENCE DESIGNED TO TRANSFORM THE WAY STUDENTS READ

More students earn A’s and B’s when they use McGraw-Hill Education Adaptive products.

www.mheducation.com

INSTRUCTOR LIBRARY The Connect® Business Statistics Instructor Library is your repository for additional resources to improve student engagement in and out of class. You can select and use any asset that enhances your lecture, including:

• Solutions Manual The Solutions Manual, carefully revised by the authors, contains solutions to all basic, intermediate, and challenge problems found at the end of each chapter.

• Test Bank The Test Bank, revised by Wendy Bailey of Troy University, contains hundreds of true/false, multiple choice and short-answer/discussions, updated based on the revisions of the authors. The level of difficulty varies, as indicated by the easy, medium, and difficult labels.

• PowerPoint Presentations Prepared by Stephanie Campbell of Mineral Area College, the presentations contain exhibits, tables, key points, and summaries in a visually stimulating collection of slides.

• Excel Templates There are templates for various end of chapter problems that have been set as Excel spreadsheets—all denoted by an icon. Students can easily download, save the files and use the data to solve end of chapter problems.

MEGASTAT® FOR MICROSOFT EXCEL® MegaStat® by J. B. Orris of Butler University is a full-featured Excel statistical analysis add-in that is available on the MegaStat website at www.mhhe.com/megastat (for purchase). MegaStat works with recent versions of Microsoft Excel® (Windows and Mac OS X). See the website for details on supported versions.

Once installed, MegaStat will always be available on the Excel add-ins ribbon with no expiration date or data limitations. MegaStat performs statistical analyses within an Excel workbook. When a MegaStat menu item is selected, a dialog box pops up for data selection and options. Since MegaStat is an easy-to-use extension of Excel, students can focus on learning statistics without being distracted by the software. Ease-of-use features include Auto Expand for quick data selection and Auto Label detect.

MegaStat does most calculations found in introductory statistics textbooks, such as computing descriptive statistics, creating frequency distributions, and computing probabilities as well as hypothesis testing, ANOVA, chi-square analysis, and regression analysis (simple and multiple). MegaStat output is carefully formatted and appended to an output worksheet.

Video tutorials are included that provide a walkthrough using MegaStat for typical business statistics topics. A context-sensitive help system is built into MegaStat and a User’s Guide is included in PDF format.

MINITAB®/SPSS®/JMP® Minitab® Version 17, SPSS® Student Version 18.0, and JMP® Student Edition Version 8 are software products that are available to help students solve the exercises with data files. Each software product can be packaged with any McGraw-Hill business statistics text.

ADDITIONAL RESOURCES


ACKNOWLEDGMENTS

This edition of Statistical Techniques in Business and Economics is the product of many people: students, colleagues, reviewers, and the staff at McGraw-Hill Education. We thank them all. We wish to express our sincere gratitude to the reviewers:

Stefan Ruediger, Arizona State University
Anthony Clark, St. Louis Community College
Umair Khalil, West Virginia University
Leonie Stone, SUNY Geneseo
Golnaz Taghvatalab, Central Michigan University
John Yarber, Northeast Mississippi Community College
John Beyers, University of Maryland
Mohammad Kazemi, University of North Carolina Charlotte
Anna Terzyan, Loyola Marymount University
Lee O. Cannell, El Paso Community College

Their suggestions and thorough reviews of the previous edition and the manuscript of this edition make this a better text.

Special thanks go to a number of people. Shelly Moore, College of Western Idaho, and John Arcaro, Lakeland Community College, accuracy checked the Connect exercises. Ed Pappanastos, Troy University, built new data sets and revised SmartBook. Rene Ordonez, Southern Oregon University, built the Connect guided examples. Wendy Bailey, Troy University, prepared the test bank. Stephanie Campbell, Mineral Area College, prepared the PowerPoint decks. Vickie Fry, Westmoreland County Community College, provided countless hours of digital accuracy checking and support.

We also wish to thank the staff at McGraw-Hill. This includes Dolly Womack, Senior Brand Manager; Michele Janicek, Product Developer Coordinator; Camille Corum and Ryan McAndrews, Product Developers; Harvey Yep and Bruce Gin, Content Project Managers; and others we do not know personally, but who have made valuable contributions.


ENHANCEMENTS TO STATISTICAL TECHNIQUES IN BUSINESS & ECONOMICS, 17E

MAJOR CHANGES MADE TO INDIVIDUAL CHAPTERS:

CHAPTER 1 What Is Statistics?

• Revised Self-Review 1-2.

• New Section describing Business Analytics and its integration with the text.

• Updated exercises 2, 3, 17, and 19.

• New Data Analytics section with new data and questions.

CHAPTER 2 Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation

• Revised chapter introduction.

• Added more explanation about cumulative relative frequency distributions.

• Updated exercises 47 and 48 using real data.

• New Data Analytics section with new data and questions.

CHAPTER 3 Describing Data: Numerical Measures

• Updated Self-Review 3-2.

• Updated Exercises 16, 18, 73, 77, and 82.

• New Data Analytics section with new data and questions.

CHAPTER 4 Describing Data: Displaying and Exploring Data

• Updated exercise 22 with 2016 New York Yankee player salaries.

• New Data Analytics section with new data and questions.

CHAPTER 5 A Survey of Probability Concepts

• Revised the Example/Solution in the section on Bayes’ Theorem.

• Updated exercises 45 and 58 using real data.

• New Data Analytics section with new data and questions.

CHAPTER 6 Discrete Probability Distributions

• Expanded discussion of random variables.

• Revised the Example/Solution in the section on Poisson distribution.

• Updated exercises 18, 58, and 68.

• New Data Analytics section with new data and questions.

CHAPTER 7 Continuous Probability Distributions

• Revised Self-Review 7-1.

• Revised the Example/Solutions using Uber as the context.

• Updated exercises 19, 22, 28, 36, 47, and 64.

• New Data Analytics section with new data and questions.

CHAPTER 8 Sampling Methods and the Central Limit Theorem

• New Data Analytics section with new data and questions.

CHAPTER 9 Estimation and Confidence Intervals

• New Self-Review 9-3 problem description.

• Updated exercises 5, 6, 12, 14, 23, 24, 33, 41, 43, and 61.

• New Data Analytics section with new data and questions.

CHAPTER 10 One-Sample Tests of Hypothesis

• Revised the Example/Solutions using an airport, cell phone parking lot as the context.

• Revised the section on Type II error to include an additional example.

• New Type II error exercises, 23 and 24.

• Updated exercises 19, 31, 32, and 43.

• New Data Analytics section with new data and questions.

CHAPTER 11 Two-Sample Tests of Hypothesis

• Updated exercises 5, 9, 12, 26, 27, 30, 32, 34, 40, 42, and 46.

• New Data Analytics section with new data and questions.

CHAPTER 12 Analysis of Variance

• Revised Self-Reviews 12-1 and 12-3.

• Updated exercises 10, 21, 24, 33, 38, 42, and 44.

• New Data Analytics section with new data and questions.

CHAPTER 13 Correlation and Linear Regression

• Added new conceptual formula to relate the standard error to the regression ANOVA table.

• Updated exercises 36, 41, 42, 43, and 57.

• New Data Analytics section with new data and questions.

CHAPTER 14 Multiple Regression Analysis

• Updated exercises 19, 21, 23, 24, and 25.

• New Data Analytics section with new data and questions.

CHAPTER 15 Nonparametric Methods: Nominal Level Hypothesis Tests

• Updated the context of Manelli Perfume Company Example/Solution.

• Revised the “Hypothesis Test of Unequal Expected Frequencies” Example/Solution.

• Updated exercises 3, 31, 42, 46, and 61.

• New Data Analytics section with new data and questions.


CHAPTER 16 Nonparametric Methods: Analysis of Ordinal Data

• Revised the “Sign Test” Example/Solution.

• Revised the “Testing a Hypothesis About a Median” Example/Solution.

• Revised the “Wilcoxon Rank-Sum Test for Independent Populations” Example/Solution.

• Revised Self-Reviews 16-3 and 16-6.

• Updated exercise 25.

• New Data Analytics section with new data and questions.

CHAPTER 17 Index Numbers

• Revised Self-Reviews 17-1, 17-2, 17-3, 17-4, 17-5, 17-6, 17-7.

• Updated dates, illustrations, and examples.

• New Data Analytics section with new data and questions.

CHAPTER 18 Time Series and Forecasting

• Updated dates, illustrations, and examples.

• New Data Analytics section with new data and questions.

CHAPTER 19 Statistical Process Control and Quality Management

• Updated 2016 Malcolm Baldrige National Quality Award winners.

• Updated exercises 13, 22, and 25.


BRIEF CONTENTS

1 What is Statistics? 1
2 Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation 18
3 Describing Data: Numerical Measures 51
4 Describing Data: Displaying and Exploring Data 94
Review Section
5 A Survey of Probability Concepts 132
6 Discrete Probability Distributions 175
7 Continuous Probability Distributions 209
Review Section
8 Sampling Methods and the Central Limit Theorem 250
9 Estimation and Confidence Intervals 282
Review Section
10 One-Sample Tests of Hypothesis 318
11 Two-Sample Tests of Hypothesis 353
12 Analysis of Variance 386
Review Section
13 Correlation and Linear Regression 436
14 Multiple Regression Analysis 488
Review Section
15 Nonparametric Methods: Nominal Level Hypothesis Tests 545
16 Nonparametric Methods: Analysis of Ordinal Data 582
Review Section
17 Index Numbers 621
18 Time Series and Forecasting 653
Review Section
19 Statistical Process Control and Quality Management 697
20 An Introduction to Decision Theory 728

Appendixes: Data Sets, Tables, Software Commands, Answers 745

Glossary 847

Index 851

CONTENTS

1 What is Statistics? 1

Introduction 2

Why Study Statistics? 2

What is Meant by Statistics? 3

Types of Statistics 4

Descriptive Statistics 4 Inferential Statistics 5

Types of Variables 6

Levels of Measurement 7

Nominal-Level Data 7 Ordinal-Level Data 8 Interval-Level Data 9 Ratio-Level Data 10

EXERCISES 11

Ethics and Statistics 12

Basic Business Analytics 12

Chapter Summary 13

Chapter Exercises 14

Data Analytics 17

2 Describing Data: FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, AND GRAPHIC PRESENTATION 18

Introduction 19

Constructing Frequency Tables 19

Relative Class Frequencies 20

Graphic Presentation of Qualitative Data 21

EXERCISES 25

Constructing Frequency Distributions 26

Relative Frequency Distribution 30

EXERCISES 31

Graphic Presentation of a Distribution 32

Histogram 32 Frequency Polygon 35

EXERCISES 37

Cumulative Distributions 38

EXERCISES 41

Chapter Summary 42

Chapter Exercises 43

Data Analytics 49

3 Describing Data: NUMERICAL MEASURES 51

Introduction 52

Measures of Location 52

The Population Mean 53 The Sample Mean 54 Properties of the Arithmetic Mean 55

EXERCISES 56

The Median 57 The Mode 59

EXERCISES 61

The Relative Positions of the Mean, Median, and Mode 62

EXERCISES 63

Software Solution 64

The Weighted Mean 65

EXERCISES 66

The Geometric Mean 66

EXERCISES 68

Why Study Dispersion? 69

Range 70 Variance 71

EXERCISES 73

Population Variance 74 Population Standard Deviation 76

EXERCISES 76

Sample Variance and Standard Deviation 77 Software Solution 78

EXERCISES 79

Interpretation and Uses of the Standard Deviation 79

Chebyshev’s Theorem 79 The Empirical Rule 80

A Note from the Authors vi


EXERCISES 81

The Mean and Standard Deviation of Grouped Data 82

Arithmetic Mean of Grouped Data 82 Standard Deviation of Grouped Data 83

EXERCISES 85

Ethics and Reporting Results 86

Chapter Summary 86

Pronunciation Key 88

Chapter Exercises 88

Data Analytics 92

4 Describing Data: DISPLAYING AND EXPLORING DATA 94

Introduction 95

Dot Plots 95

Stem-and-Leaf Displays 96

EXERCISES 101

Measures of Position 103

Quartiles, Deciles, and Percentiles 103

EXERCISES 106

Box Plots 107

EXERCISES 109

Skewness 110

EXERCISES 113

Describing the Relationship between Two Variables 114

Contingency Tables 116

EXERCISES 118

Chapter Summary 119

Pronunciation Key 120

Chapter Exercises 120

Data Analytics 126

Problems 127

Cases 129

Practice Test 130

5 A Survey of Probability Concepts 132

Introduction 133

What is a Probability? 134

Approaches to Assigning Probabilities 136

Classical Probability 136 Empirical Probability 137 Subjective Probability 139

EXERCISES 140

Rules of Addition for Computing Probabilities 141

Special Rule of Addition 141 Complement Rule 143 The General Rule of Addition 144

EXERCISES 146

Rules of Multiplication to Calculate Probability 147

Special Rule of Multiplication 147 General Rule of Multiplication 148

Contingency Tables 150

Tree Diagrams 153

EXERCISES 155

Bayes’ Theorem 157

EXERCISES 161

Principles of Counting 161

The Multiplication Formula 161 The Permutation Formula 163 The Combination Formula 164

EXERCISES 166

Chapter Summary 167

Pronunciation Key 168

Chapter Exercises 168

Data Analytics 173

6 Discrete Probability Distributions 175

Introduction 176

What is a Probability Distribution? 176

Random Variables 178

Discrete Random Variable 179 Continuous Random Variable 179

The Mean, Variance, and Standard Deviation of a Discrete Probability Distribution 180

Mean 180 Variance and Standard Deviation 180

EXERCISES 182

Binomial Probability Distribution 184

How Is a Binomial Probability Computed? 185 Binomial Probability Tables 187

EXERCISES 190

Cumulative Binomial Probability Distributions 191

EXERCISES 193

Hypergeometric Probability Distribution 193


EXERCISES 197

Poisson Probability Distribution 197

EXERCISES 202

Chapter Summary 202

Chapter Exercises 203

Data Analytics 208

7 Continuous Probability Distributions 209

Introduction 210

The Family of Uniform Probability Distributions 210

EXERCISES 213

The Family of Normal Probability Distributions 214

The Standard Normal Probability Distribution 217

Applications of the Standard Normal Distribution 218 The Empirical Rule 218

EXERCISES 220

Finding Areas under the Normal Curve 221

EXERCISES 224

EXERCISES 226

EXERCISES 229

The Normal Approximation to the Binomial 229

Continuity Correction Factor 230 How to Apply the Correction Factor 232

EXERCISES 233

The Family of Exponential Distributions 234

EXERCISES 238

Chapter Summary 239

Chapter Exercises 240

Data Analytics 244

Problems 246

Cases 247

Practice Test 248

8 Sampling Methods and the Central Limit Theorem 250

Introduction 251

Sampling Methods 251

Reasons to Sample 251 Simple Random Sampling 252 Systematic Random Sampling 255 Stratified Random Sampling 255 Cluster Sampling 256

EXERCISES 257

Sampling “Error” 259

Sampling Distribution of the Sample Mean 261

EXERCISES 264

The Central Limit Theorem 265

EXERCISES 271

Using the Sampling Distribution of the Sample Mean 273

EXERCISES 275

Chapter Summary 275

Pronunciation Key 276

Chapter Exercises 276

Data Analytics 281

9 Estimation and Confidence Intervals 282

Introduction 283

Point Estimate for a Population Mean 283

Confidence Intervals for a Population Mean 284

Population Standard Deviation, Known σ 284 A Computer Simulation 289

EXERCISES 291

Population Standard Deviation, σ Unknown 292

EXERCISES 299

A Confidence Interval for a Population Proportion 300

EXERCISES 303

Choosing an Appropriate Sample Size 303

Sample Size to Estimate a Population Mean 304 Sample Size to Estimate a Population Proportion 305

EXERCISES 307

Finite-Population Correction Factor 307

EXERCISES 309

Chapter Summary 310

Chapter Exercises 311

Data Analytics 315

Problems 316

Cases 317

Practice Test 317

10 One-Sample Tests of Hypothesis 318

Introduction 319

What is Hypothesis Testing? 319


Six-Step Procedure for Testing a Hypothesis 320

Step 1: State the Null Hypothesis (H0) and the Alternate Hypothesis (H1) 320 Step 2: Select a Level of Significance 321 Step 3: Select the Test Statistic 323 Step 4: Formulate the Decision Rule 323 Step 5: Make a Decision 324 Step 6: Interpret the Result 324

One-Tailed and Two-Tailed Hypothesis Tests 325

Hypothesis Testing for a Population Mean: Known Population Standard Deviation 327

A Two-Tailed Test 327 A One-Tailed Test 330

p-Value in Hypothesis Testing 331

EXERCISES 333

Hypothesis Testing for a Population Mean: Population Standard Deviation Unknown 334

EXERCISES 339

A Statistical Software Solution 340

EXERCISES 342

Type II Error 343

EXERCISES 346

Chapter Summary 347

Pronunciation Key 348

Chapter Exercises 348

Data Analytics 352

11 Two-Sample Tests of Hypothesis 353

Introduction 354

Two-Sample Tests of Hypothesis: Independent Samples 354

EXERCISES 359

Comparing Population Means with Unknown Population Standard Deviations 360

Two-Sample Pooled Test 360

EXERCISES 364

Unequal Population Standard Deviations 366

EXERCISES 369

Two-Sample Tests of Hypothesis: Dependent Samples 370

Comparing Dependent and Independent Samples 373

EXERCISES 375

Chapter Summary 377

Pronunciation Key 378

Chapter Exercises 378

Data Analytics 385

12 Analysis of Variance 386

Introduction 387

Comparing Two Population Variances 387

The F Distribution 387 Testing a Hypothesis of Equal Population Variances 388

EXERCISES 391

ANOVA: Analysis of Variance 392

ANOVA Assumptions 392 The ANOVA Test 394

EXERCISES 401

Inferences about Pairs of Treatment Means 402

EXERCISES 404

Two-Way Analysis of Variance 406

EXERCISES 411

Two-Way ANOVA with Interaction 412

Interaction Plots 412 Testing for Interaction 413 Hypothesis Tests for Interaction 415

EXERCISES 417

Chapter Summary 418

Pronunciation Key 420

Chapter Exercises 420

Data Analytics 429

Problems 431

Cases 433

Practice Test 434

13 Correlation and Linear Regression 436

Introduction 437

What is Correlation Analysis? 437

The Correlation Coefficient 440

EXERCISES 445

Testing the Significance of the Correlation Coefficient 447

EXERCISES 450

Regression Analysis 451

Least Squares Principle 451 Drawing the Regression Line 454

EXERCISES 457

Testing the Significance of the Slope 459


EXERCISES 461

Evaluating a Regression Equation’s Ability to Predict 462

The Standard Error of Estimate 462 The Coefficient of Determination 463

EXERCISES 464

Relationships among the Correlation Coefficient, the Coefficient of Determination, and the Standard Error of Estimate 464

EXERCISES 466

Interval Estimates of Prediction 467

Assumptions Underlying Linear Regression 467 Constructing Confidence and Prediction Intervals 468

EXERCISES 471

Transforming Data 471

EXERCISES 474

Chapter Summary 475

Pronunciation Key 477

Chapter Exercises 477

Data Analytics 487

14 Multiple Regression Analysis 488

Introduction 489

Multiple Regression Analysis 489

EXERCISES 493

Evaluating a Multiple Regression Equation 495

The ANOVA Table 495 Multiple Standard Error of Estimate 496 Coefficient of Multiple Determination 497 Adjusted Coefficient of Determination 498

EXERCISES 499

Inferences in Multiple Linear Regression 499

Global Test: Testing the Multiple Regression Model 500 Evaluating Individual Regression Coefficients 502

EXERCISES 505

Evaluating the Assumptions of Multiple Regression 506

Linear Relationship 507 Variation in Residuals Same for Large and Small ŷ Values 508 Distribution of Residuals 509 Multicollinearity 509 Independent Observations 511

Qualitative Independent Variables 512

Regression Models with Interaction 515

Stepwise Regression 517

EXERCISES 519

Review of Multiple Regression 521

Chapter Summary 527

Pronunciation Key 528

Chapter Exercises 529

Data Analytics 539

Problems 541

Cases 542

Practice Test 543

15 Nonparametric Methods: NOMINAL LEVEL HYPOTHESIS TESTS 545

Introduction 546

Test a Hypothesis of a Population Proportion 546

EXERCISES 549

Two-Sample Tests about Proportions 550

EXERCISES 554

Goodness-of-Fit Tests: Comparing Observed and Expected Frequency Distributions 555

Hypothesis Test of Equal Expected Frequencies 555

EXERCISES 560

Hypothesis Test of Unequal Expected Frequencies 562

Limitations of Chi-Square 563

EXERCISES 565

Testing the Hypothesis That a Distribution is Normal 566

EXERCISES 569

Contingency Table Analysis 570

EXERCISES 573

Chapter Summary 574

Pronunciation Key 575

Chapter Exercises 576

Data Analytics 581

16 Nonparametric Methods: ANALYSIS OF ORDINAL DATA 582

Introduction 583

The Sign Test 583


EXERCISES 587

Using the Normal Approximation to the Binomial 588

EXERCISES 590

Testing a Hypothesis About a Median 590

EXERCISES 592

Wilcoxon Signed-Rank Test for Dependent Populations 592

EXERCISES 596

Wilcoxon Rank-Sum Test for Independent Populations 597

EXERCISES 601

Kruskal-Wallis Test: Analysis of Variance by Ranks 601

EXERCISES 605

Rank-Order Correlation 607

Testing the Significance of rs 609

EXERCISES 610

Chapter Summary 612

Pronunciation Key 613

Chapter Exercises 613

Data Analytics 616

Problems 618

Cases 619

Practice Test 619

17 Index Numbers 621

Introduction 622

Simple Index Numbers 622

Why Convert Data to Indexes? 625 Construction of Index Numbers 625

EXERCISES 627

Unweighted Indexes 628

Simple Average of the Price Indexes 628 Simple Aggregate Index 629

Weighted Indexes 629

Laspeyres Price Index 629 Paasche Price Index 631 Fisher’s Ideal Index 632

EXERCISES 633

Value Index 634

EXERCISES 635

Special-Purpose Indexes 636

Consumer Price Index 637 Producer Price Index 638 Dow Jones Industrial Average (DJIA) 638

EXERCISES 640

Consumer Price Index 640

Special Uses of the Consumer Price Index 641 Shifting the Base 644

EXERCISES 646

Chapter Summary 647

Chapter Exercises 648

Data Analytics 652

18 Time Series and Forecasting 653

Introduction 654

Components of a Time Series 654

Secular Trend 654 Cyclical Variation 655 Seasonal Variation 656 Irregular Variation 656

A Moving Average 657

Weighted Moving Average 660

EXERCISES 663

Linear Trend 663

Least Squares Method 665

EXERCISES 667

Nonlinear Trends 668

EXERCISES 669

Seasonal Variation 670

Determining a Seasonal Index 671

EXERCISES 676

Deseasonalizing Data 677

Using Deseasonalized Data to Forecast 678

EXERCISES 680

The Durbin-Watson Statistic 680

EXERCISES 686

Chapter Summary 686

Chapter Exercises 686

Data Analytics 693

Problems 695

Practice Test 696

19 Statistical Process Control and Quality Management 697

Introduction 698

A Brief History of Quality Control 698

Six Sigma 700


Sources of Variation 701

Diagnostic Charts 702

Pareto Charts 702 Fishbone Diagrams 704

EXERCISES 705

Purpose and Types of Quality Control Charts 705

Control Charts for Variables 706 Range Charts 709

In-Control and Out-of-Control Situations 711

EXERCISES 712

Attribute Control Charts 713

p-Charts 713 c-Bar Charts 716

EXERCISES 718

Acceptance Sampling 719

EXERCISES 722

Chapter Summary 722

Pronunciation Key 723

Chapter Exercises 724

20 An Introduction to Decision Theory 728

Introduction 729

Elements of a Decision 729

Decision Making Under Conditions of Uncertainty 730

Payoff Table 730 Expected Payoff 731

EXERCISES 732

Opportunity Loss 733

EXERCISES 734

Expected Opportunity Loss 734

EXERCISES 735

Maximin, Maximax, and Minimax Regret Strategies 735

Value of Perfect Information 736

Sensitivity Analysis 737

EXERCISES 738

Decision Trees 739

Chapter Summary 740

Chapter Exercises 741

APPENDIXES 745

Appendix A: Data Sets 746

Appendix B: Tables 756

Appendix C: Software Commands 774

Appendix D: Answers to Odd-Numbered Chapter Exercises 785

Review Exercises 829

Solutions to Practice Tests 831

Appendix E: Answers to Self-Review 834

Glossary 847

Index 851

What is Statistics? 1

BEST BUY sells Fitbit wearable technology products that track a person’s physical activity and sleep quality. The Fitbit technology collects daily information on a person’s number of steps so that a person can track calories consumed. The information can be synced with a cell phone and displayed with a Fitbit app. Assume you know the daily number of Fitbit Flex 2 units sold last month at the Best Buy store in Collegeville, Pennsylvania. Describe a situation where the number of units sold is considered a sample. Illustrate a second situation where the number of units sold is considered a population. (See Exercise 11 and LO1-3.)

LEARNING OBJECTIVES

When you have completed this chapter, you will be able to:

LO1-1 Explain why knowledge of statistics is important.

LO1-2 Define statistics and provide an example of how statistics is applied.

LO1-3 Differentiate between descriptive and inferential statistics.

LO1-4 Classify variables as qualitative or quantitative, and discrete or continuous.

LO1-5 Distinguish between nominal, ordinal, interval, and ratio levels of measurement.

LO1-6 List the values associated with the practice of statistics.

2 CHAPTER 1

INTRODUCTION

Suppose you work for a large company and your supervisor asks you to decide if a new version of a smartphone should be produced and sold. You start by thinking about the product’s innovations and new features. Then, you stop and realize the consequences of the decision. The product will need to make a profit so the pricing and the costs of production and distribution are all very important. The decision to introduce the product is based on many alternatives. So how will you know? Where do you start?

Without long experience in the industry, you must begin to develop the intelligence that will make you an expert. You select three other people to work with and meet with them. The conversation focuses on what you need to know and what information and data you need. In your meeting, many questions are asked. How many competitors are already in the market? How are smartphones priced? What design features do competitors’ products have? What features does the market require? What do customers want in a smartphone? What do customers like about the existing products? The answers will be based on business intelligence consisting of data and information collected through customer surveys, engineering analysis, and market research. In the end, your presentation to support your decision regarding the introduction of a new smartphone is based on the statistics that you use to summarize and organize your data, the statistics that you use to compare the new product to existing products, and the statistics to estimate future sales, costs, and revenues. The statistics will be the focus of the conversation that you will have with your supervisor about this very important decision.

As a decision maker, you will need to acquire and analyze data to support your decisions. The purpose of this text is to develop your knowledge of basic statistical techniques and methods and how to apply them to develop the business and personal intelligence that will help you make decisions.

WHY STUDY STATISTICS?

If you look through your university catalogue, you will find that statistics is required for many college programs. As you investigate a future career in accounting, economics, human resources, finance, business analytics, or other business area, you also will discover that statistics is required as part of these college programs. So why is statistics a requirement in so many disciplines?

A major driver of the requirement for statistics knowledge is the technologies available for capturing data. Examples include the technology that Google uses to track how Internet users access websites. As people use Google to search the Internet, Google records every search and then uses these data to sort and prioritize the results for future Internet searches. One recent estimate indicates that Google processes 20,000 terabytes of information per day. Big-box retailers like Target, Walmart, Kroger, and others scan every purchase and use the data to manage the distribution of products, to make decisions about marketing and sales, and to track daily and even hourly sales. Police departments collect and use data to provide city residents with maps that communicate information about crimes committed and their location. Every organization is collecting and using data to develop knowledge and intelligence that will help people make informed decisions, and to track the implementation of their decisions. The graphic to the left shows the amount of data generated every minute (www.domo.com). A good working knowledge of statistics is useful for summarizing and organizing data to provide information that is useful and supportive of decision making. Statistics is used to make valid comparisons and to predict the outcomes of decisions.

In summary, there are at least three reasons for studying statistics: (1) data are collected everywhere and require statistical knowledge to make the information useful, (2) statistical techniques are used to make professional and personal decisions, and (3) no matter what your career, you will need a knowledge of statistics to understand the world and to be conversant in your career. An understanding of statistics and statistical method will help you make more effective personal and professional decisions.

WHAT IS MEANT BY STATISTICS?

This question can be rephrased in two, subtly different ways: what are statistics and what is statistics? To answer the first question, a statistic is a number used to communicate a piece of information. Examples of statistics are:

• The inflation rate is 2%.

• Your grade point average is 3.5.

• The price of a new Tesla Model S sedan is $79,570.

Each of these statistics is a numerical fact and communicates a very limited piece of information that is not very useful by itself. However, if we recognize that each of these statistics is part of a larger discussion, then the question “what is statistics” is applicable. Statistics is the set of knowledge and skills used to organize, summarize, and analyze data. The results of statistical analysis will start interesting conversations in the search for knowledge and intelligence that will help us make decisions. For example:

• The inflation rate for the calendar year was 0.7%. By applying statistics we could compare this year’s inflation rate to the past observations of inflation. Is it higher, lower, or about the same? Is there a trend of increasing or decreasing inflation? Is there a relationship between interest rates and government bonds?

• Your grade point average (GPA) is 3.5. By collecting data and applying statistics, you can determine the required GPA to be admitted to the Master of Business Administration program at the University of Chicago, Harvard, or the University of Michigan. You can determine the likelihood that you would be admitted to a particular program. You may be interested in interviewing for a management position with Procter & Gamble. What GPA does Procter & Gamble require for college graduates with a bachelor’s degree? Is there a range of acceptable GPAs?

• You are budgeting for a new car. You would like to own an electric car with a small carbon footprint. The price for the Tesla Model S Sedan is $79,570. By collecting additional data and applying statistics, you can analyze the alternatives. For example, another choice is a hybrid car that runs on both gas and electricity such as a 2015 Toyota Prius. It can be purchased for about $28,659. Another hybrid, the Chevrolet Volt, costs $33,995. What are the differences in the cars’ specifications? What additional information can be collected and summarized so that you can make a good purchase decision?

Another example of using statistics to provide information to evaluate decisions is the distribution and market share of Frito-Lay products. Data are collected on each of the Frito-Lay product lines. These data include the market share and the pounds of product sold. Statistics is used to present this information in a bar chart in Chart 1–1. It clearly shows Frito-Lay’s dominance in the potato, corn, and tortilla chip markets. It also shows the absolute measure of pounds of each product line consumed in the United States.

These examples show that statistics is more than the presentation of numerical information. Statistics is about collecting and processing information to create a conversation, to stimulate additional questions, and to provide a basis for making decisions. Specifically, we define statistics as:

STATISTICS The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

STATISTICS IN ACTION

A feature of our textbook is called Statistics in Action. Read each one carefully to get an appreciation of the wide application of statistics in management, economics, nursing, law enforcement, sports, and other disciplines.

• In 2015, Forbes published a list of the richest Americans. William Gates, founder of Microsoft Corporation, is the richest. His net worth is estimated at $76.0 billion. (www.forbes.com)

• In 2015, the four largest privately owned American companies, ranked by revenue, were Cargill, Koch Industries, Dell, and Albertsons. (www.forbes.com)

• In the United States, a typical high school graduate earns $668 per week, a typical college graduate with a bachelor’s degree earns $1,101 per week, and a typical college graduate with a master’s degree earns $1,326 per week. (www.bls.gov/emp/ep_chart_001.htm)


In this book, you will learn the basic techniques and applications of statistics that you can use to support your decisions, both personal and professional. To start, we will differentiate between descriptive and inferential statistics.

TYPES OF STATISTICS

When we use statistics to generate information for decision making from data, we use either descriptive statistics or inferential statistics. Their application depends on the questions asked and the type of data available.

Descriptive Statistics

Masses of unorganized data, such as the census of population, the weekly earnings of thousands of computer programmers, and the individual responses of 2,000 registered voters regarding their choice for president of the United States, are of little value as is. However, descriptive statistics can be used to organize data into a meaningful form. We define descriptive statistics as:


DESCRIPTIVE STATISTICS Methods of organizing, summarizing, and presenting data in an informative way.

The following are examples that apply descriptive statistics to summarize a large amount of data and provide information that is easy to understand.

• There are a total of 46,837 miles of interstate highways in the United States. The interstate system represents only 1% of the nation’s total roads but carries more than 20% of the traffic. The longest is I-90, which stretches from Boston to Seattle, a distance of 3,099 miles. The shortest is I-878 in New York City, which is 0.70 mile in length. Alaska does not have any interstate highways, Texas has the most interstate miles at 3,232, and New York has the most interstate routes with 28.

• The average person spent $133.91 on traditional Valentine’s Day merchandise in 2014. This is an increase of $2.94 from 2013. As in previous years, men spent more than twice the amount women spent on the holiday. The average man spent $108.38 to impress the people in his life while women only spent $48.41.

Statistical methods and techniques to generate descriptive statistics are presented in Chapters 2 and 4. These include organizing and summarizing data with frequency distributions and presenting frequency distributions with charts and graphs. In addition, statistical measures to summarize the characteristics of a distribution are discussed in Chapter 3.

CHART 1–1 Frito-Lay Volume and Share of Major Snack Chip Categories in U.S. Supermarkets (bar chart; axis: Millions of Pounds, 0 to 800; series: Frito-Lay and Rest of Industry; Frito-Lay share by category: Potato Chips 64%, Tortilla Chips 75%, Pretzels 26%, Extruded Snacks 56%, Corn Chips 82%)


Inferential Statistics

Sometimes we must make decisions based on a limited set of data. For example, we would like to know the operating characteristics, such as fuel efficiency measured by miles per gallon, of sport utility vehicles (SUVs) currently in use. If we spent a lot of time, money, and effort, all the owners of SUVs could be surveyed. In this case, our goal would be to survey the population of SUV owners.

POPULATION The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest.

INFERENTIAL STATISTICS The methods used to estimate a property of a population on the basis of a sample.

SAMPLE A portion, or part, of the population of interest.

However, based on inferential statistics, we can survey a limited number of SUV owners and collect a sample from the population.

Samples often are used to obtain reliable estimates of population parameters. (Sampling is discussed in Chapter 8.) In the process, we make trade-offs between the time, money, and effort to collect the data and the error of estimating a population parameter. The process of sampling SUVs is illustrated in the following graphic. In this example, we would like to know the mean or average SUV fuel efficiency. To estimate the mean of the population, six SUVs are sampled and the mean of their MPG is calculated.

Population: all items. Sample: items selected from the population.

So, the sample of six SUVs represents evidence from the population that we use to reach an inference or conclusion about the average MPG for all SUVs. The process of sampling from a population with the objective of estimating properties of a population is called inferential statistics.
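The sampling step described above can be sketched in a few lines of Python. This is only an illustration: the six MPG values are hypothetical, since the text does not list the sampled readings, and the function name is an assumption made for this example.

```python
from statistics import mean

# Hypothetical MPG readings for six sampled SUVs (illustrative values only;
# the text does not give the actual sample).
sample_mpg = [21.5, 24.0, 19.8, 23.1, 22.4, 20.6]


def estimate_population_mean(sample):
    """Return the sample mean, used as a point estimate of the population mean."""
    return mean(sample)


estimate = estimate_population_mean(sample_mpg)
print(f"Estimated mean MPG for all SUVs: {estimate:.2f}")
```

The sample mean is a descriptive statistic of the six vehicles; using it as a statement about all SUVs is the inferential step.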

STATISTICS IN ACTION

Where did statistics get its start? In 1662 John Graunt published an article called “Natural and Political Observations Made upon Bills of Mortality.” The author’s “observations” were the result of a study and analysis of a weekly church publication called “Bill of Mortality,” which listed births, christenings, and deaths and their causes. Graunt realized that the Bills of Mortality represented only a fraction of all births and deaths in London. However, he used the data to reach broad conclusions or inferences about the impact of disease, such as the plague, on the general population. His logic is an example of statistical inference. His analysis and interpretation of the data are thought to mark the start of statistics.


Inferential statistics is widely applied to learn something about a population in business, agriculture, politics, and government, as shown in the following examples:

• Television networks constantly monitor the popularity of their programs by hiring Nielsen and other organizations to sample the preferences of TV viewers. For example, 9.0% of a sample of households with TVs watched The Big Bang Theory during the week of November 2, 2015 (www.nielsen.com). These program ratings are used to make decisions about advertising rates and whether to continue or cancel a program.

• In 2015, a sample of U.S. Internal Revenue Service tax preparation volunteers were tested with three standard tax returns. The sample indicated that tax returns were completed with a 49% accuracy rate. In other words there were errors on about half of the returns. In this example, the statistics are used to make decisions about how to improve the accuracy rate by correcting the most common errors and improving the training of volunteers.

A feature of our text is self-review problems. There are a number of them interspersed throughout each chapter. The first self-review follows. Each self-review tests your comprehension of preceding material. The answer and method of solution are given in Appendix E. You can find the answer to the following self-review in 1–1 in Appendix E. We recommend that you solve each one and then check your answer.

SELF-REVIEW 1–1

The answers are in Appendix E.

The Atlanta-based advertising firm Brandon and Associates asked a sample of 1,960 consumers to try a newly developed chicken dinner by Boston Market. Of the 1,960 sampled, 1,176 said they would purchase the dinner if it is marketed.

(a) Is this an example of descriptive statistics or inferential statistics? Explain.

(b) What could Brandon and Associates report to Boston Market regarding acceptance of the chicken dinner in the population?

TYPES OF VARIABLES

There are two basic types of variables: (1) qualitative and (2) quantitative (see Chart 1–2). When an object or individual is observed and recorded as a nonnumeric characteristic, it is a qualitative variable or an attribute. Examples of qualitative variables are gender, beverage preference, type of vehicle owned, state of birth, and eye color. When a variable is qualitative, we usually count the number of observations for each category and determine what percent are in each category. For example, if we observe the variable eye color, what percent of the population has blue eyes and what percent has brown eyes? If the variable is type of vehicle, what percent of the total number of cars sold last month were SUVs? Qualitative variables are often summarized in charts and bar graphs (Chapter 2).

CHART 1–2 Summary of the Types of Variables

Types of Variables
   Qualitative: Brand of PC; Marital status; Hair color
   Quantitative
      Discrete: Children in a family; Strokes on a golf hole; TV sets owned
      Continuous: Amount of income tax paid; Weight of a student; Yearly rainfall in Tampa, FL

When a variable can be reported numerically, it is called a quantitative variable. Examples of quantitative variables are the balance in your checking account, the num- ber of gigabytes of data used on your cell phone plan last month, the life of a car battery (such as 42 months), and the number of people employed by a company.

Quantitative variables are either discrete or continuous. Discrete variables can assume only certain values, and there are “gaps” between the values. Examples of discrete variables are the number of bedrooms in a house (1, 2, 3, 4, etc.), the number of cars arriving at Exit 25 on I-4 in Florida near Walt Disney World in an hour (326, 421, etc.), and the number of students in each section of a statistics course (25 in section A, 42 in section B, and 18 in section C). We count, for example, the number of cars arriving at Exit 25 on I-4, and we count the number of statistics students in each section. Notice that a home can have 3 or 4 bedrooms, but it cannot have 3.56 bedrooms. Thus, there is a “gap” between possible values. Typically, discrete variables are counted.

Observations of a continuous variable can assume any value within a specific range. Examples of continuous variables are the air pressure in a tire and the weight of a shipment of tomatoes. Other examples are the ounces of raisins in a box of raisin bran cereal and the duration of flights from Orlando to San Diego. Grade point average (GPA) is a continuous variable. We could report the GPA of a particular student as 3.2576952. The usual practice is to round to 3 places—3.258. Typically, continuous variables result from measuring.

LEVELS OF MEASUREMENT

Data can be classified according to levels of measurement. The level of measurement determines how data should be summarized and presented. It also will indicate the type of statistical analysis that can be performed. Here are two examples of the relationship between measurement and how we apply statistics. There are six colors of candies in a bag of M&Ms. Suppose we assign brown a value of 1, yellow 2, blue 3, orange 4, green 5, and red 6. What kind of variable is the color of an M&M? It is a qualitative variable. Suppose someone summarizes M&M color by adding the assigned color values, divides the sum by the number of M&Ms, and reports that the mean color is 3.56. How do we interpret this statistic? You are correct in concluding that it has no meaning as a measure of M&M color. As a qualitative variable, we can only report the count and percentage of each color in a bag of M&Ms. As a second example, in a high school track meet there are eight competitors in the 400-meter run. We report the order of finish and that the mean finish is 4.5. What does the mean finish tell us? Nothing! In both of these instances, we have not used the appropriate statistics for the level of measurement.

There are four levels of measurement: nominal, ordinal, interval, and ratio. The lowest, or the most primitive, measurement is the nominal level. The highest is the ratio level of measurement.

Nominal-Level Data

For the nominal level of measurement, observations of a qualitative variable are measured and recorded as labels or names. The labels or names can only be classified and counted. There is no particular order to the labels.


NOMINAL LEVEL OF MEASUREMENT Data recorded at the nominal level of measurement is represented as labels or names. They have no order. They can only be classified and counted.


The classification of the six colors of M&M milk chocolate candies is an example of the nominal level of measurement. We simply classify the candies by color. There is no natural order. That is, we could report the brown candies first, the orange first, or any of the other colors first. Recording the variable gender is another example of the nominal level of measurement. Suppose we count the number of students entering a football game with a student ID and report how many are men and how many are women. We could report either the men or the women first. For the data measured at the nominal level, we are limited to counting the number in each category of the variable. Often, we convert these counts to percentages. For example, a random sample of M&M candies reports the following percentages for each color:

Color     Percent in a bag
Blue      24%
Green     20%
Orange    16%
Yellow    14%
Red       13%
Brown     13%

To process the data for a variable measured at the nominal level, we often numerically code the labels or names. For example, if we are interested in measuring the home state for students at East Carolina University, we would assign a student’s home state of Alabama a code of 1, Alaska a code of 2, Arizona a 3, and so on. Using this procedure with an alphabetical listing of states, Wisconsin is coded 49 and Wyoming 50. Realize that the number assigned to each state is still a label or name. The reason we assign numerical codes is to facilitate counting the number of students from each state with statistical software. Note that assigning numbers to the states does not give us license to manipulate the codes as numerical information. Specifically, in this example, 1 + 2 = 3 corresponds to Alabama + Alaska = Arizona. Clearly, the nominal level of measurement does not permit any mathematical operation that has any valid interpretation.
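As a minimal sketch of why the codes are useful for counting and nothing else, the Python snippet below tallies students by state code. The list of eight student codes is hypothetical and invented for illustration; only the code assignments for Alabama, Alaska, and Arizona come from the text.

```python
from collections import Counter

# Code-to-label mapping for the first three states, as assigned in the text.
state_names = {1: "Alabama", 2: "Alaska", 3: "Arizona"}

# Hypothetical home-state codes for eight students (illustrative data only).
student_codes = [1, 3, 3, 2, 1, 3, 1, 3]

# Counting is the only valid operation on nominal codes.
counts = Counter(student_codes)
total = len(student_codes)

for code in sorted(counts):
    share = 100 * counts[code] / total
    print(f"{state_names[code]}: {counts[code]} students ({share:.1f}%)")
```

The codes act purely as dictionary keys here. Summing or averaging `student_codes` would run without error, but the result would have no interpretation.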

Ordinal-Level Data

The next higher level of measurement is the ordinal level. For this level of measurement a qualitative variable or attribute is either ranked or rated on a relative scale.

ORDINAL LEVEL OF MEASUREMENT Data recorded at the ordinal level of measurement is based on a relative ranking or rating of items based on a defined attribute or qualitative variable. Variables based on this level of measurement are only ranked or counted.

For example, many businesses make decisions about where to locate their facilities; in other words, where is the best place for their business? Business Facilities (www.businessfacilities.com) publishes a list of the top 10 states for the “best business climate.” The 2016 rankings are shown below. They are based on the evaluation of many different factors, including the cost of labor, business tax climate, quality of life, transportation infrastructure, educated workforce, and economic growth potential.

Best Business Climate

1. Florida
2. Utah
3. Texas
4. Georgia
5. Indiana
6. Tennessee
7. Nebraska
8. North Carolina
9. Virginia
10. Washington

This is an example of an ordinal scale because the states are ranked in order of best to worst business climate. That is, we know the relative order of the states based on the attribute. For example, in 2016 Florida had the best business climate and Utah was second. Indiana was fifth, and that was better than Tennessee but not as good as Georgia. Notice we cannot say that Florida’s business climate is five times better than Indiana’s business climate because the magnitude of the differences between the states is not known. To put it another way, we do not know if the magnitude of the difference between Florida and Utah is the same as between Texas and Georgia.

Another example of the ordinal level measure is based on a scale that measures an attribute. This type of scale is used when students rate instructors on a variety of attributes. One attribute may be: “Overall, how do you rate the quality of instruction in this class?” A student’s response is recorded on a relative scale of inferior, poor, average, good, and superior. An important characteristic of using a relative measurement scale is that we cannot distinguish the magnitude of the differences between groups. We do not know if the difference between “Superior” and “Good” is the same as the difference between “Poor” and “Inferior.”

Table 1–1 lists the frequencies of 60 student ratings of instructional quality for Professor James Brunner in an Introduction to Finance course. The data are summarized based on the order of the scale used to rate the instructor. That is, they are summarized by the number of students who indicated a rating of superior (6), good (26), and so on. We also can convert the frequencies to percentages. About 43.3% (26/60) of the students rated the instructor as good.

TABLE 1–1 Rating of a Finance Professor

Rating      Frequency   Percentage
Superior         6        10.0%
Good            26        43.3%
Average         16        26.7%
Poor             9        15.0%
Inferior         3         5.0%
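The conversion from frequencies to percentages in Table 1–1 can be reproduced with a short Python sketch. The frequencies are taken from the table; the variable names are chosen here for illustration.

```python
# Frequencies of the 60 student ratings from Table 1-1, in scale order.
ratings = {"Superior": 6, "Good": 26, "Average": 16, "Poor": 9, "Inferior": 3}

total = sum(ratings.values())

# Percentage of students giving each rating, rounded to one decimal place.
percentages = {
    label: round(100 * count / total, 1) for label, count in ratings.items()
}

for label, pct in percentages.items():
    print(f"{label:<9}{ratings[label]:>3}{pct:>7.1f}%")
```

Only counts and percentages are computed. The sketch does not assign numbers to the rating labels, because the distances between ordinal categories are unknown.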

Interval-Level Data

The interval level of measurement is the next highest level. It includes all the characteristics of the ordinal level, but, in addition, the difference or interval between values is meaningful.

INTERVAL LEVEL OF MEASUREMENT For data recorded at the interval level of measurement, the interval or the distance between values is meaningful. The interval level of measurement is based on a scale with a known unit of measurement.

The Fahrenheit temperature scale is an example of the interval level of measurement. Suppose the high temperatures on three consecutive winter days in Boston are 28, 31, and 20 degrees Fahrenheit. These temperatures can be easily ranked, but we can also determine the interval or distance between temperatures. This is possible because 1 degree Fahrenheit represents a constant unit of measurement. That is, the distance between 10 and 15 degrees Fahrenheit is 5 degrees, and is the same as the 5-degree distance between 50 and 55 degrees Fahrenheit. It is also important to note that 0 is just a point on the scale. It does not represent the absence of the condition. The measurement of zero degrees Fahrenheit does not represent the absence of heat or cold. But by our own measurement scale, it is cold! A major limitation of a variable measured at the interval level is that we cannot make statements similar to 20 degrees Fahrenheit is twice as warm as 10 degrees Fahrenheit.
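The limitation on ratios can be checked numerically. The sketch below is illustrative only. It brings in the Kelvin scale, which the text does not discuss, as an outside assumption: Kelvin has a true zero, so ratios on it are meaningful, and the standard conversion is K = (F − 32) × 5/9 + 273.15.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15


# Intervals are meaningful on the Fahrenheit scale: both gaps are 5 degrees.
interval_low = 15 - 10
interval_high = 55 - 50

# The ratio of the raw Fahrenheit readings suggests "twice as warm" ...
ratio_fahrenheit = 20 / 10

# ... but on a scale with a true zero the two temperatures are nearly equal.
ratio_kelvin = fahrenheit_to_kelvin(20) / fahrenheit_to_kelvin(10)

print(interval_low == interval_high)
print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}, Kelvin ratio: {ratio_kelvin:.4f}")
```

The Kelvin ratio comes out only slightly above 1, which is why the "twice as warm" statement is not supported by interval-level data.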


Another example of the interval scale of measurement is women’s dress sizes. Listed below is information on several dimensions of a standard U.S. woman’s dress.

Size   Bust (in)   Waist (in)   Hips (in)
  8       32          24           35
 10       34          26           37
 12       36          28           39
 14       38          30           41
 16       40          32           43
 18       42          34           45
 20       44          36           47
 22       46          38           49
 24       48          40           51
 26       50          42           53
 28       52          44           55

Why is the “size” scale an interval measurement? Observe that as the size changes by two units (say from size 10 to size 12 or from size 24 to size 26), each of the measurements increases by 2 inches. To put it another way, the intervals are the same.

There is no natural zero point for dress size. A “size 0” dress does not have “zero” material. Instead, it would have a 24-inch bust, 16-inch waist, and 27-inch hips. Moreover, the ratios are not reasonable. If you divide a size 28 by a size 14, you do not get the same answer as dividing a size 20 by a size 10. Neither ratio is equal to two, as the “size” number would suggest. In short, if the distances between the numbers make sense, but the ratios do not, then you have an interval scale of measurement.
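The equal-interval and unequal-ratio observations can be verified directly from the dress-size table. The following Python sketch uses the bust column only; the helper name `bust` is an assumption made for this illustration.

```python
# Bust, waist, and hip measurements (inches) by dress size, from the table above.
dress_sizes = {
    8: (32, 24, 35), 10: (34, 26, 37), 12: (36, 28, 39), 14: (38, 30, 41),
    16: (40, 32, 43), 18: (42, 34, 45), 20: (44, 36, 47), 22: (46, 38, 49),
    24: (48, 40, 51), 26: (50, 42, 53), 28: (52, 44, 55),
}


def bust(size):
    """Return the bust measurement in inches for a given dress size."""
    return dress_sizes[size][0]


# Equal intervals: a two-unit change in size always adds 2 inches.
step_10_to_12 = bust(12) - bust(10)
step_24_to_26 = bust(26) - bust(24)

# Unequal ratios: doubling the size number does not double the measurement.
ratio_28_to_14 = bust(28) / bust(14)
ratio_20_to_10 = bust(20) / bust(10)

print(step_10_to_12, step_24_to_26)
print(f"{ratio_28_to_14:.3f} vs {ratio_20_to_10:.3f}")
```

Both steps equal 2 inches, while the two ratios differ from each other and from 2. That pattern matches the text's description of an interval scale.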

Ratio-Level Data

Almost all quantitative variables are recorded on the ratio level of measurement. The ratio level is the “highest” level of measurement. It has all the characteristics of the interval level, but, in addition, the 0 point and the ratio between two numbers are both meaningful.

RATIO LEVEL OF MEASUREMENT Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale.

Examples of the ratio scale of measurement include wages, units of production, weight, changes in stock prices, distance between branch offices, and height. Money is also a good illustration. If you have zero dollars, then you have no money, and a wage of $50 per hour is two times the wage of $25 per hour. Weight also is measured at the ratio level of measurement. If a scale is correctly calibrated, then it will read 0 when nothing is on the scale. Further, something that weighs 1 pound is half as heavy as something that weighs 2 pounds.

Table 1–2 illustrates the ratio scale of measurement for the variable, annual income for four father-and-son combinations. Observe that the senior Lahey earns twice as much as his son. In the Rho family, the son makes twice as much as the father.

TABLE 1–2 Father–Son Income Combinations

Name      Father      Son
Lahey     $80,000     $ 40,000
Nale       90,000       30,000
Rho        60,000      120,000
Steele     75,000      130,000
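Because income is measured at the ratio level, dividing one income by another gives an interpretable result. A minimal Python sketch using the Table 1–2 figures follows; the dictionary layout and variable names are chosen here for illustration.

```python
# Annual incomes (father, son) in dollars, from Table 1-2.
incomes = {
    "Lahey": (80_000, 40_000),
    "Nale": (90_000, 30_000),
    "Rho": (60_000, 120_000),
    "Steele": (75_000, 130_000),
}

# Father-to-son income ratio; meaningful because zero dollars means no income.
father_to_son = {name: father / son for name, (father, son) in incomes.items()}

for name, ratio in father_to_son.items():
    print(f"{name}: father earns {ratio:.2f} times the son's income")
```

A ratio of 2.0 for Lahey and 0.5 for Rho corresponds to the two observations made in the text.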


Chart 1–3 summarizes the major characteristics of the various levels of measurement. The level of measurement will determine the type of statistical methods that can be used to analyze a variable. Statistical methods to analyze variables measured on a nominal level are discussed in Chapter 15; methods for ordinal-level variables are discussed in Chapter 16. Statistical methods to analyze variables measured on an interval or ratio level are presented in Chapters 9 through 14.

CHART 1–3 Summary and Examples of the Characteristics for Levels of Measurement

Level      Characteristic                                 Examples
Nominal    Data may only be classified                    Jersey numbers of football players; Make of car
Ordinal    Data are ranked                                Your rank in class; Team standings in the Southeastern Conference
Interval   Meaningful difference between values           Temperature; Dress size
Ratio      Meaningful 0 point and ratio between values    Number of patients seen; Number of sales calls made; Distance to class

SELF-REVIEW 1–2

(a) The mean age of people who listen to talk radio is 42.1 years. What level of measurement is used to assess the variable age?

(b) In a survey of luxury-car owners, 8% of the U.S. population owned luxury cars. In California and Georgia, 14% of people owned luxury cars. Two variables are included in this information. What are they and how are they measured?

EXERCISES

The answers to the odd-numbered exercises are in Appendix D.

1. What is the level of measurement for each of the following variables?
   a. Student IQ ratings.
   b. Distance students travel to class.
   c. The jersey numbers of a sorority soccer team.
   d. A student’s state of birth.
   e. A student’s academic class, that is, freshman, sophomore, junior, or senior.
   f. Number of hours students study per week.

2. Slate is a daily magazine on the Web. Its business activities can be described by a number of variables. What is the level of measurement for each of the following variables?
   a. The number of hits on their website on Saturday between 8:00 am and 9:00 am.
   b. The departments, such as food and drink, politics, foreign policy, sports, etc.
   c. The number of weekly hits on the Sam’s Club ad.
   d. The number of years each employee has been employed with Slate.

3. On the Web, go to your favorite news source and find examples of each type of variable. Write a brief memo that lists the variables and describes them in terms of qualitative or quantitative, discrete or continuous, and the measurement level.

4. For each of the following, determine whether the group is a sample or a population.
   a. The participants in a study of a new cholesterol drug.
   b. The drivers who received a speeding ticket in Kansas City last month.
   c. People on welfare in Cook County (Chicago), Illinois.
   d. The 30 stocks that make up the Dow Jones Industrial Average.

ETHICS AND STATISTICS

LO1-6 List the values associated with the practice of statistics.

Following events such as Wall Street money manager Bernie Madoff’s Ponzi scheme, which swindled billions from investors, and financial misrepresentations by Enron and Tyco, business students need to understand that these events were based on the misrepresentation of business and financial information. In each case, people within each organization reported financial information to investors that indicated the companies were performing much better than they actually were. When the true financial information was reported, the companies were worth much less than advertised. The result was that many investors lost all or nearly all of the money they had invested.

The article “Statistics and Ethics: Some Advice for Young Statisticians,” in The American Statistician 57, no. 1 (2003), offers guidance. The authors advise us to practice statistics with integrity and honesty, and urge us to “do the right thing” when collecting, organizing, summarizing, analyzing, and interpreting numerical information. The real contribution of statistics to society is a moral one. Financial analysts need to provide information that truly reflects a company’s performance so as not to mislead individual investors. Information regarding product defects that may be harmful to people must be analyzed and reported with integrity and honesty. The authors of The American Statistician article further indicate that when we practice statistics, we need to maintain “an independent and principled point-of-view” when analyzing and reporting findings and results.

As you progress through this text, we will highlight ethical issues in the collection, analysis, presentation, and interpretation of statistical information. We also hope that, as you learn about using statistics, you will become a more informed consumer of information. For example, you will question a report based on data that do not fairly represent the population, a report that does not include all relevant statistics, one that includes an incorrect choice of statistical measures, or a presentation that introduces bias in a deliberate attempt to mislead or misrepresent.

BASIC BUSINESS ANALYTICS

A knowledge of statistics is necessary to support the increasing need for companies and organizations to apply business analytics. Business analytics is used to process and analyze data and information to support a story or narrative of a company’s business, such as “what makes us profitable?” or “how will our customers respond to a change in marketing?” In addition to statistics, an ability to use computer software to summarize, organize, analyze, and present the findings of statistical analysis is essential. In this text, we will be using very elementary applications of business analytics using common and available computer software. Throughout our text, we will use Microsoft Excel and, occasionally, Minitab. Universities and colleges usually offer access to Microsoft Excel. Your computer already may be packaged with Microsoft Excel. If not, the Microsoft Office package with Excel often is sold at a reduced academic price through your university or college. In this text, we use Excel for the majority of the applications. We also use an Excel “Add-in” called MegaStat. If your instructor requires this package, it is available at www.mhhe.com/megastat. This add-in gives Excel the capability to produce additional statistical reports. Occasionally, we use Minitab to illustrate an application. See www.minitab.com for further information. Minitab also offers discounted academic pricing. The 2016 version of Microsoft Excel supports the analyses in our text. However, earlier versions of Excel for Apple Mac computers do not have the necessary add-in. If you do not have Excel 2016 and are using an Apple Mac computer with Excel, you can download the free, trial version of Stat Plus at www.analystsoft.com. It is a statistical software package that will integrate with Excel for Mac computers.

The following example shows the application of Excel to perform a statistical summary. It refers to sales information from the Applewood Auto Group, a multi-location car sales and service company. The Applewood data set contains information on 180 vehicle sales. Each sale is described by several variables: the age of the buyer, whether the buyer is a repeat customer, the location of the dealership for the sale, the type of vehicle sold, and the profit for the sale. The following shows Excel’s summary of statistics for the variable profit. The summary of profit shows the mean profit per vehicle was $1,843.17, the median profit was slightly more at $1,882.50, and profit ranged from $294 to $3,292.

Throughout the text, we will motivate the use of computer software to summarize, describe, and present information and data. The applications of Excel are supported by instructions so that you can learn how to apply Excel to do statistical analysis. The instructions are presented in Appendix C of this text. These data and other data sets and files are available on the text’s student website, www.mhhe.com/lind17e.
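The text relies on Excel for these summaries. As a rough alternative sketch, the same four measures (mean, median, minimum, and maximum) can be computed with Python’s standard `statistics` module. The nine values below are the first nine profits listed in Table 2–4, used only as an illustrative subset, so the results will not match the full 180-sale summary quoted above. The function name `summarize` is a hypothetical helper.

```python
import statistics

# First nine profits listed in Table 2-4 (in dollars). This is an illustrative
# subset, not the full 180-sale Applewood data set.
profits = [1387, 2148, 2201, 963, 820, 2230, 3043, 2584, 2370]

def summarize(values):
    """Return the mean, median, minimum, and maximum of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
    }

summary = summarize(profits)
for label, value in summary.items():
    print(f"{label}: {value:,.2f}")
```

Passing all 180 profits to `summarize` instead of the subset would be the closer analogue of the Excel summary described in the text.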

CHAPTER SUMMARY

I. Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

II. There are two types of statistics.
   A. Descriptive statistics are procedures used to organize and summarize data.
   B. Inferential statistics involve taking a sample from a population and making estimates about a population based on the sample results.
      1. A population is an entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest.
      2. A sample is a part of the population.

III. There are two types of variables.
   A. A qualitative variable is nonnumeric.
      1. Usually we are interested in the number or percent of the observations in each category.
      2. Qualitative data are usually summarized in graphs and bar charts.
   B. There are two types of quantitative variables and they are usually reported numerically.
      1. Discrete variables can assume only certain values, and there are usually gaps between values.
      2. A continuous variable can assume any value within a specified range.

IV. There are four levels of measurement.
   A. With the nominal level, the data are sorted into categories with no particular order to the categories.
   B. The ordinal level of measurement presumes that one classification is ranked higher than another.
   C. The interval level of measurement has the ranking characteristic of the ordinal level of measurement plus the characteristic that the distance between values is a constant size.
   D. The ratio level of measurement has all the characteristics of the interval level, plus there is a 0 point and the ratio of two values is meaningful.

CHAPTER EXERCISES

5. Explain the difference between qualitative and quantitative variables. Give an example of qualitative and quantitative variables.

6. Explain the difference between a sample and a population.

7. Explain the difference between a discrete and a continuous variable. Give an example of each not included in the text.

8. For the following situations, would you collect information using a sample or a population? Why?
   a. Statistics 201 is a course taught at a university. Professor Rauch has taught nearly 1,500 students in the course over the past 5 years. You would like to know the average grade for the course.
   b. As part of a research project, you need to report the average profit as a percentage of revenue for the #1-ranked corporation in the Fortune 500 for each of the last 10 years.
   c. You are looking forward to graduation and your first job as a salesperson for one of five large pharmaceutical corporations. Planning for your interviews, you will need to know about each company’s mission, profitability, products, and markets.
   d. You are shopping for a new MP3 music player such as the Apple iPod. The manufacturers advertise the number of music tracks that can be stored in the memory. Usually, the advertisers assume relatively short, popular songs to estimate the number of tracks that can be stored. You, however, like Broadway musical tunes and they are much longer. You would like to estimate how many Broadway tunes will fit on your MP3 player.

9. Exits along interstate highways were formerly numbered successively from the western or southern border of a state. However, the Department of Transportation has recently changed most of them to agree with the numbers on the mile markers along the highway.
   a. What level of measurement were data on the consecutive exit numbers?
   b. What level of measurement are data on the milepost numbers?
   c. Discuss the advantages of the newer system.

10. A poll solicits a large number of college undergraduates for information on the following variables: the name of their cell phone provider (AT&T, Verizon, and so on), the numbers of minutes used last month (200, 400, for example), and their satisfaction with the service (Terrible, Adequate, Excellent, and so forth). What is the level of measurement for each of these three variables?

11. Best Buy sells Fitbit wearable technology products that track a person’s activity. For example, the Fitbit technology collects daily information on a person’s number of steps so that a person can track calories consumed. The information can be synced with a cell phone and displayed with a Fitbit app. Assume you know the daily number of Fitbit Flex 2 units sold last month at the Best Buy store in Collegeville, Pennsylvania. Describe a situation where the number of units sold is considered a sample. Illustrate a second situation where the number of units sold is considered a population.

12. Using the concepts of sample and population, describe how a presidential election is unlike an “exit” poll of the electorate.

13. Place these variables in the following classification tables. For each table, summarize your observations and evaluate if the results are generally true. For example, salary is reported as a continuous quantitative variable. It is also a continuous ratio-scaled variable.
   a. Salary
   b. Gender
   c. Sales volume of MP3 players
   d. Soft drink preference
   e. Temperature
   f. SAT scores
   g. Student rank in class
   h. Rating of a finance professor
   i. Number of home video screens

                 Discrete Variable    Continuous Variable
   Qualitative
   Quantitative                       a. Salary

                 Discrete             Continuous
   Nominal
   Ordinal
   Interval
   Ratio                              a. Salary

14. Using data from such publications as the Statistical Abstract of the United States, Forbes, or any news source, give examples of variables measured with nominal, ordinal, interval, and ratio scales.

15. The Struthers Wells Corporation employs more than 10,000 white-collar workers in its sales offices and manufacturing facilities in the United States, Europe, and Asia. A sample of 300 U.S. workers revealed 120 would accept a transfer to a location outside the United States. On the basis of these findings, write a brief memo to Ms. Wanda Carter, Vice President of Human Services, regarding all white-collar workers in the firm and their willingness to relocate.

16. AVX Home Entertainment, Inc., recently began a “no-hassles” return policy. A sample of 500 customers who recently returned items showed 400 thought the policy was fair, 32 thought it took too long to complete the transaction, and the rest had no opinion. On the basis of this information, make an inference about customer reaction to the new policy.

17. The Wall Street Journal’s website, www.wsj.com, reported the number of cars and light-duty trucks sold through October of 2014 and October of 2015. The top sixteen manufacturers are listed here. Sales data are often reported in this way to compare current sales to last year’s sales.

                                     Year-to-Date Sales
   Manufacturer                      Through October 2015    Through October 2014
   General Motors Corp.                   2,562,840               2,434,707
   Ford Motor Company                     2,178,587               2,065,612
   Toyota Motor Sales USA Inc.            2,071,446               1,975,368
   Chrysler                               1,814,268               1,687,313
   American Honda Motor Co Inc.           1,320,217               1,281,777
   Nissan North America Inc.              1,238,535               1,166,389
   Hyundai Motor America                    638,195                 607,539
   Kia Motors America Inc.                  526,024                 489,711
   Subaru of America Inc.                   480,331                 418,497
   Volkswagen of America Inc.               294,602                 301,187
   Mercedes-Benz                            301,915                 281,728
   BMW of North America Inc.                279,395                 267,193
   Mazda Motor of America Inc.              267,158                 259,751
   Audi of America Inc.                     165,103                 146,133
   Mitsubishi Motors N A, Inc.               80,683                  64,564
   Volvo                                     53,803                  47,823

   a. Using computer software, compare the October 2015 sales to the October 2014 sales for each manufacturer by computing the difference. Make a list of the manufacturers that increased sales compared to 2014; make a list of manufacturers that decreased sales.
   b. Using computer software, compare 2014 sales to 2015 sales for each manufacturer by computing the percentage change in sales. Make a list of the manufacturers in order of increasing percentage changes. Which manufacturers are in the top five in percentage change? Which manufacturers are in the bottom five in percentage change?
   c. Using computer software, first sort the data using the 2015 year-to-date sales. Then, design a bar graph to illustrate the 2014 and 2015 year-to-date sales for the top 12 manufacturers. Also, design a bar graph to illustrate the percentage change for the top 12 manufacturers. Compare these two graphs and prepare brief written comments.

18. The following chart depicts the average amounts spent by consumers on holiday gifts. Write a brief report summarizing the amounts spent during the holidays. Be sure to include the total amount spent and the percent spent by each group.

19. The following chart depicts the earnings in billions of dollars for ExxonMobil for the period 2003 until 2014. Write a brief report discussing the earnings at ExxonMobil during the period. Was one year higher than the others? Did the earnings increase, decrease, or stay the same over the period?

   Year    Earnings ($ billions)
   2003    14.5
   2004    16.7
   2005    24.3
   2006    26.2
   2007    26.5
   2008    45.2
   2009    19.3
   2010    30.5
   2011    41.1
   2012    44.9
   2013    32.6
   2014    32.5

   [Bar chart: ExxonMobil Annual Earnings. Vertical axis: Dollars (billions). Horizontal axis: Year, 2003 through 2014.]

DATA ANALYTICS

20. Refer to the North Valley Real Estate data, which report information on homes sold in the area last year. Consider the following variables: selling price, number of bedrooms, township, and mortgage type.
   a. Which of the variables are qualitative and which are quantitative?
   b. How is each variable measured? Determine the level of measurement for each of the variables.

21. Refer to the Baseball 2016 data, which report information on the 30 Major League Baseball teams for the 2016 season. Consider the following variables: number of wins, payroll, season attendance, whether the team is in the American or National League, and the number of home runs hit.
   a. Which of these variables are quantitative and which are qualitative?
   b. Determine the level of measurement for each of the variables.

22. Refer to the Lincolnville School District bus data, which report information on the school district’s bus fleet.
   a. Which of the variables are qualitative and which are quantitative?
   b. Determine the level of measurement for each variable.

2 Describing Data: FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, AND GRAPHIC PRESENTATION

LEARNING OBJECTIVES

When you have completed this chapter, you will be able to:

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

LO2-2 Display a frequency table using a bar or pie chart.

LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.

LO2-4 Display a frequency distribution using a histogram or frequency polygon.


The Applewood Auto Group operates four dealerships:

• Tionesta Ford Lincoln sells Ford and Lincoln cars and trucks.
• Olean Automotive Inc. has the Nissan franchise as well as the General Motors brands of Chevrolet, Cadillac, and GMC Trucks.
• Sheffield Motors Inc. sells Buick, GMC trucks, Hyundai, and Kia.
• Kane Motors offers the Chrysler, Dodge, and Jeep line as well as BMW and Volvo.

• Age: the age of the buyer at the time of the purchase.
• Profit: the amount earned by the dealership on the sale of each [...] four Applewood dealerships by the consumer.

The entire data set is available at the McGraw-Hill website (www.mhhe.com/lind17e) and in Appendix A.4 at the end of the text.

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

FREQUENCY TABLE A grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.


In Chapter 1, we distinguished between qualitative and quantitative variables. To review, a qualitative variable is nonnumeric, that is, it can only be classified into distinct categories. Examples of qualitative data include political affiliation (Republican, Democrat, Independent, or other), state of birth (Alabama, . . . , Wyoming), and method of payment for a purchase at Barnes & Noble (cash, digital wallet, debit, or credit). On the other hand, quantitative variables are numerical in nature. Examples of quantitative data relating to college students include the price of their textbooks, their age, and the number of credit hours they are registered for this semester.

In the Applewood Auto Group data set, there are five variables for each vehicle sale: age of the buyer, amount of profit, dealer that made the sale, type of vehicle sold, and number of previous purchases by the buyer. The dealer and the type of vehicle are qualitative variables. The amount of profit, the age of the buyer, and the number of previous purchases are quantitative variables.

Suppose Ms. Ball wants to summarize last month’s sales by location. The first step is to sort the vehicles sold last month according to their location and then tally, or count, the number sold at each of the four locations: Tionesta, Olean, Sheffield, or Kane. The four locations are used to develop a frequency table with four mutually exclusive (distinctive) classes. Mutually exclusive classes means that a particular vehicle can be assigned to only one class. In addition, the frequency table must be collectively exhaustive. That is, every vehicle sold last month is accounted for in the table. If every vehicle is included in the frequency table, the table will be collectively exhaustive and the total number of vehicles will be 180. How do we obtain these counts? Excel provides a tool called a Pivot Table that will quickly and accurately establish the four classes and do the counting. The Excel results follow in Table 2–1. The table shows a total of 180 vehicles and, of the 180 vehicles, 52 were sold at Kane Motors.

TABLE 2–1 Frequency Table for Vehicles Sold Last Month at Applewood Auto Group by Location

Location     Number of Cars
Kane               52
Olean              40
Sheffield          45
Tionesta           43
Total             180
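The text uses an Excel Pivot Table for the tally. As a rough sketch of the same idea in Python, `collections.Counter` counts each record exactly once (mutually exclusive classes), and the counts sum to the number of records (collectively exhaustive). The ten sale records below are invented for illustration and are not the Applewood data; `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sale records: one dealership location per vehicle sold.
# These ten records are invented for illustration; the real data set has 180.
sales_locations = [
    "Kane", "Olean", "Kane", "Tionesta", "Sheffield",
    "Olean", "Kane", "Sheffield", "Tionesta", "Kane",
]

def frequency_table(observations):
    """Count how many observations fall in each class."""
    return Counter(observations)

table = frequency_table(sales_locations)

# Print the classes alphabetically, followed by the total.
for location in sorted(table):
    print(f"{location:<10}{table[location]:>4}")
print(f"{'Total':<10}{sum(table.values()):>4}")
```

Checking that the total equals the number of records is a quick way to confirm that no sale was dropped or counted twice.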

Relative Class Frequencies

You can convert class frequencies to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class frequency and the total number of observations. In the vehicle sales example, we may want to know the percentage of total cars sold at each of the four locations. To convert a frequency table to a relative frequency table, each of the class frequencies is divided by the total number of observations. Again, this is easily accomplished using Excel. The fraction of vehicles sold last month at the Kane location is 0.289, found by 52 divided by 180. The relative frequency for each location is shown in Table 2–2.

TABLE 2–2 Relative Frequency Table of Vehicles Sold by Location Last Month at Applewood Auto Group

Location     Number of Cars    Relative Frequency    Found by
Kane               52                .289             52/180
Olean              40                .222             40/180
Sheffield          45                .250             45/180
Tionesta           43                .239             43/180
Total             180               1.000
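The division described above can be sketched in a few lines of Python, using the class frequencies from Table 2–1. The function name `relative_frequencies` is a hypothetical helper introduced for illustration.

```python
# Class frequencies from Table 2-1: vehicles sold last month by location.
vehicles_sold = {"Kane": 52, "Olean": 40, "Sheffield": 45, "Tionesta": 43}

def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies.values())
    return {name: count / total for name, count in frequencies.items()}

relative = relative_frequencies(vehicles_sold)
for location, fraction in relative.items():
    print(f"{location:<10}{vehicles_sold[location]:>4}{fraction:>8.3f}")
```

Rounded to three decimal places, the fractions agree with Table 2–2, and they sum to 1 because every vehicle belongs to exactly one class.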


GRAPHIC PRESENTATION OF QUALITATIVE DATA

The most common graphic form to present a qualitative variable is a bar chart. In most cases, the horizontal axis shows the variable of interest. The vertical axis shows the frequency or fraction of each of the possible outcomes. A distinguishing feature of a bar chart is there is distance or a gap between the bars. That is, because the variable of interest is qualitative, the bars are not adjacent to each other. Thus, a bar chart graphically describes a frequency table using a series of uniformly wide rectangles, where the height of each rectangle is the class frequency.

LO2-2 Display a frequency table using a bar or pie chart.

BAR CHART A graph that shows qualitative classes on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are proportional to the heights of the bars.

PIE CHART A chart that shows the proportion or percentage that each class represents of the total number of frequencies.

We use the Applewood Auto Group data as an example (Chart 2–1). The variables of interest are the location where the vehicle was sold and the number of vehicles sold at each location. We label the horizontal axis with the four locations and scale the vertical axis with the number sold. The variable location is of nominal scale, so the order of the locations on the horizontal axis does not matter. In Chart 2–1, the locations are listed alphabetically. The locations could also be in order of decreasing or increasing frequencies.

The height of the bars, or rectangles, corresponds to the number of vehicles at each location. There were 52 vehicles sold last month at the Kane location, so the height of the Kane bar is 52; the height of the bar for the Olean location is 40.

CHART 2–1 Number of Vehicles Sold by Location

[Bar chart. Vertical axis: Number of Vehicles Sold. Horizontal axis: Location (Kane, Olean, Sheffield, Tionesta).]

Another useful type of chart for depicting qualitative information is a pie chart.

We explain the details of constructing a pie chart using the information in Table 2–3, which shows the frequency and percent of cars sold by the Applewood Auto Group for each vehicle type.


The first step to develop a pie chart is to mark the percentages 0, 5, 10, 15, and so on evenly around the circumference of a circle (see Chart 2–2). To plot the 40% of total sales represented by sedans, draw a line from the center of the circle to 0 and another line from the center of the circle to 40%. The area in this “slice” represents the number of sedans sold as a percentage of the total sales. Next, add the SUV’s percentage of total sales, 30%, to the sedan’s percentage of total sales, 40%. The result is 70%. Draw a line from the center of the circle to 70%, so the area between 40 and 70 shows the sales of SUVs as a percentage of total sales. Continuing, add the 15% of total sales for compact vehicles, which gives us a total of 85%. Draw a line from the center of the circle to 85, so the “slice” between 70% and 85% represents the number of compact vehicles sold as a percentage of the total sales. The remaining 10% for truck sales and 5% for hybrid sales are added to the chart using the same method.

TABLE 2–3 Vehicle Sales by Type at Applewood Auto Group

Vehicle Type    Number Sold    Percent Sold
Sedan                72             40
SUV                  54             30
Compact              27             15
Truck                18             10
Hybrid                9              5
Total               180            100

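CHART 2–2 Pie Chart of Vehicles by Type

[Pie chart. Percentage scale marked around the circumference at 0%, 25%, 50%, and 75%, with slice boundaries at 40%, 70%, 85%, and 95%. Slices: Sedan, SUV, Compact, Truck, Hybrid.]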
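The running totals used to place each slice (40%, 70%, 85%, and so on) can be sketched as a cumulative sum over the percentages in Table 2–3. The central angle is an added convenience that the text does not compute; it follows from the fact that a full circle is 360 degrees, so each percentage point corresponds to 3.6 degrees. The function name `slice_boundaries` is a hypothetical helper.

```python
from itertools import accumulate

# Percent of vehicles sold by type, from Table 2-3.
percent_sold = {"Sedan": 40, "SUV": 30, "Compact": 15, "Truck": 10, "Hybrid": 5}

def slice_boundaries(percentages):
    """Return (label, start %, end %, central angle in degrees) for each slice."""
    ends = list(accumulate(percentages.values()))
    starts = [0] + ends[:-1]
    return [
        (label, start, end, (end - start) * 360 / 100)
        for label, start, end in zip(percentages, starts, ends)
    ]

slices = slice_boundaries(percent_sold)
for label, start, end, angle in slices:
    print(f"{label:<8} {start:>3}% to {end:>3}%  ({angle:.0f} degrees)")
```

The last slice ends at 100%, which confirms that the percentages account for every vehicle sold.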

Because each slice of the pie represents the relative frequency of each vehicle type as a percentage of the total sales, we can easily compare them:

• The largest percentage of sales is for sedans.
• Sedans and SUVs together account for 70% of vehicle sales.
• Hybrids account for 5% of vehicle sales, in spite of being on the market for only a few years.

We can use Excel software to quickly count the number of cars for each vehicle type and create the frequency table, bar chart, and pie chart shown in the following summary. The Excel tool is called a Pivot Table. The instructions to produce these descriptive statistics and charts are given in Appendix C.


Pie and bar charts both serve to illustrate frequency and relative frequency tables. When is a pie chart preferred to a bar chart? In most cases, pie charts are used to show and compare the relative differences in the percentage of observations for each value or class of a qualitative variable. Bar charts are preferred when the goal is to compare the number or frequency of observations for each value or class of a qualitative variable. The following Example/Solution shows another application of bar and pie charts.

EXAMPLE

SkiLodges.com is test marketing its new website and is interested in how easy its website design is to navigate. It randomly selected 200 regular Internet users and asked them to perform a search task on the website. Each person was asked to rate the relative ease of navigation as poor, good, excellent, or awesome. The results are shown in the following table:

Awesome      102
Excellent     58
Good          30
Poor          10

1. What type of measurement scale is used for ease of navigation?
2. Draw a bar chart for the survey results.
3. Draw a pie chart for the survey results.

SOLUTION

The data are measured on an ordinal scale. That is, the scale is ranked in relative ease of navigation when moving from “awesome” to “poor.” The interval between each rating is unknown so it is impossible, for example, to conclude that a rating of good is twice the value of a poor rating.
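The relative frequencies for this example come from dividing each count by the 200 respondents. The sketch below computes them as percentages and prints a rough text bar chart; the text-based bars are an illustrative stand-in for a charting tool, and `percent_of_total` is a hypothetical helper name.

```python
# Survey counts from the SkiLodges.com example, ordered from best to worst rating.
ratings = {"Awesome": 102, "Excellent": 58, "Good": 30, "Poor": 10}

def percent_of_total(counts):
    """Convert class counts to percentages of the total number of responses."""
    total = sum(counts.values())
    return {label: 100 * count / total for label, count in counts.items()}

percentages = percent_of_total(ratings)

# A rough text bar chart: one '#' for every two full percentage points.
for label, pct in percentages.items():
    bar = "#" * int(pct // 2)
    print(f"{label:<10}{pct:>5.0f}%  {bar}")
```

Keeping the categories in their ranked order, rather than sorting them alphabetically, preserves the ordinal nature of the scale.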

We can use a bar chart to graph the data. The vertical scale shows the relative frequency and the horizontal scale shows the values of the ease-of-navigation variable.

[Bar chart: Ease of Navigation of SkiLodges.com website. Vertical axis: Relative Frequency %. Horizontal axis: Ease of Navigation (Awesome, Excellent, Good, Poor).]

A pie chart can also be used to graph these data. The pie chart emphasizes that more than half of the respondents rate the relative ease of using the website awesome.

[Pie chart: Ease of Navigation of SkiLodges.com website. Awesome 51%, Excellent 29%, Good 15%, Poor 5%.]

SELF-REVIEW 2–1

The answers are in Appendix E.

DeCenzo Specialty Food and Beverage Company has been serving a cola drink with an additional flavoring, Cola-Plus, that is very popular among its customers. The company is interested in customer preferences for Cola-Plus versus Coca-Cola, Pepsi, and a lemon-lime beverage. They ask 100 randomly sampled customers to take a taste test and select the beverage they prefer most. The results are shown in the following table:

Beverage      Number
Cola-Plus        40
Coca-Cola        25
Pepsi            20
Lemon-Lime       15
Total           100

(a) Is the data qualitative or quantitative? Why?
(b) What is the table called? What does it show?
(c) Develop a bar chart to depict the information.
(d) Develop a pie chart using the relative frequencies.

EXERCISES

The answers to the odd-numbered exercises are at the end of the book in Appendix D.

1. A pie chart shows the relative market share of cola products. The “slice” for Pepsi-Cola has a central angle of 90 degrees. What is its market share?

2. In a marketing study, 100 consumers were asked to select the best digital music player from the iPod, the iRiver, and the Magic Star MP3. To summarize the consumer responses with a frequency table, how many classes would the frequency table have?

3. A total of 1,000 residents in Minnesota were asked which season they preferred. One hundred liked winter best, 300 liked spring, 400 liked summer, and 200 liked fall. Develop a frequency table and a relative frequency table to summarize this information.

4. Two thousand frequent business travelers are asked which midwestern city they prefer: Indianapolis, Saint Louis, Chicago, or Milwaukee. One hundred liked Indianapolis best, 450 liked Saint Louis, 1,300 liked Chicago, and the remainder preferred Milwaukee. Develop a frequency table and a relative frequency table to summarize this information.

5. Wellstone Inc. produces and markets replacement covers for cell phones in five different colors: bright white, metallic black, magnetic lime, tangerine orange, and fusion red. To estimate the demand for each color, the company set up a kiosk in the Mall of America for several hours and asked randomly selected people which cover color was their favorite. The results follow:

   Bright white        130
   Metallic black      104
   Magnetic lime       325
   Tangerine orange    455
   Fusion red          286

   a. What is the table called?
   b. Draw a bar chart for the table.
   c. Draw a pie chart.
   d. If Wellstone Inc. plans to produce 1 million cell phone covers, how many of each color should it produce?

6. A small business consultant is investigating the performance of several companies. The fourth-quarter sales for last year (in thousands of dollars) for the selected companies were:

   Company                                   Fourth-Quarter Sales ($ thousands)
   Hoden Building Products                        $ 1,645.2
   J & R Printing Inc.                              4,757.0
   Long Bay Concrete Construction                   8,913.0
   Mancell Electric and Plumbing                      627.1
   Maxwell Heating and Air Conditioning            24,612.0
   Mizelle Roofing & Sheet Metals                     191.9

   The consultant wants to include a chart in his report comparing the sales of the six companies. Use a bar chart to compare the fourth-quarter sales of these corporations and write a brief report summarizing the bar chart.


CONSTRUCTING FREQUENCY DISTRIBUTIONS

In Chapter 1 and earlier in this chapter, we distinguished between qualitative and quantitative data. In the previous section, using the Applewood Automotive Group data, we summarized two qualitative variables: the location of the sale and the type of vehicle sold. We created frequency and relative frequency tables and depicted the results in bar and pie charts.

The Applewood Auto Group data also includes several quantitative variables: the age of the buyer, the profit earned on the sale of the vehicle, and the number of previous purchases. Suppose Ms. Ball wants to summarize last month’s sales by profit earned for each vehicle. We can describe profit using a frequency distribution.

LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.

FREQUENCY DISTRIBUTION A grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.

How do we develop a frequency distribution? The following example shows the steps to construct a frequency distribution. Remember, our goal is to construct tables, charts, and graphs that will quickly summarize the data by showing the location, extreme values, and shape of the data’s distribution.

TABLE 2–4 Profit on Vehicles Sold Last Month by the Applewood Auto Group

$1,387  $2,148  $2,201  $  963  $  820  $2,230  $3,043  $2,584  $2,370
 1,754   2,207     996   1,298   1,266   2,341   1,059   2,666   2,637
 1,817   2,252   2,813   1,410   1,741   3,292   1,674   2,991   1,426
 1,040   1,428     323   1,553   1,772   1,108   1,807     934   2,944
 1,273   1,889     352   1,648   1,932   1,295   2,056   2,063   2,147
 1,529   1,166     482   2,071   2,350   1,344   2,236   2,083   1,973
 3,082   1,320   1,144   2,116   2,422   1,906   2,928   2,856   2,502
 1,951   2,265   1,485   1,500   2,446   1,952   1,269   2,989     783
 2,692   1,323   1,509   1,549     369   2,070   1,717     910   1,538
 1,206   1,760   1,638   2,348     978   2,454   1,797   1,536   2,339
 1,342   1,919   1,961   2,498   1,238   1,606   1,955   1,957   2,700
   443   2,357   2,127     294   1,818   1,680   2,199   2,240   2,222
   754   2,866   2,430   1,115   1,824   1,827   2,482   2,695   2,597
 1,621     732   1,704   1,124   1,907   1,915   2,701   1,325   2,742
   870   1,464   1,876   1,532   1,938   2,084   3,210   2,250   1,837
 1,174   1,626   2,010   1,688   1,940   2,639     377   2,279   2,842
 1,412   1,762   2,165   1,822   2,197     842   1,220   2,626   2,434
 1,809   1,915   2,231   1,897   2,646   1,963   1,401   1,501   1,640
 2,415   2,119   2,389   2,445   1,461   2,059   2,175   1,752   1,821
 1,546   1,766     335   2,886   1,731   2,338   1,118   2,058   2,487

Maximum: $3,292   Minimum: $294

EXAMPLE

Ms. Kathryn Ball of the Applewood Auto Group wants to summarize the quantitative variable profit with a frequency distribution and display the distribution with charts and graphs. With this information, Ms. Ball can easily answer the following questions: What is the typical profit on each sale? What is the largest or maximum profit on any sale? What is the smallest or minimum profit on any sale? Around what value do the profits tend to cluster?

SOLUTION

To begin, we need the profits for each of the 180 vehicle sales listed in Table 2–4. This information is called raw or ungrouped data because it is simply a listing of the individual, observed profits. It is possible to search the list and find the smallest or minimum profit ($294) and the largest or maximum profit ($3,292), but that is about all. It is difficult to determine a typical profit or to visualize where the profits tend to cluster. The raw data are more easily interpreted if we summarize the data with a frequency distribution. The steps to create this frequency distribution follow.

Step 1: Decide on the number of classes. A useful recipe to determine the number of classes (k) is the “2 to the k rule.” This guide suggests you select the smallest number (k) for the number of classes such that 2^k (in words, 2 raised to the power of k) is greater than the number of observations (n). In the Applewood Auto Group example, there were 180 vehicles sold. So n = 180. If we try k = 7, which means we would use 7 classes, 2^7 = 128, which is less than 180. Hence, 7 is too few classes. If we let k = 8, then 2^8 = 256, which is greater than 180. So the recommended number of classes is 8.
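The trial-and-error search in Step 1 can be sketched in a few lines of Python. This is only an illustration of the rule as stated above; the function name number_of_classes is our own label and does not come from the text or from any statistical package.

```python
def number_of_classes(n):
    """Return the smallest k such that 2**k is greater than n.

    Illustrative helper for the "2 to the k rule"; the name is hypothetical.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 1
    # Keep increasing k until 2 raised to the power of k exceeds n.
    while 2 ** k <= n:
        k += 1
    return k


# Applewood Auto Group: n = 180 vehicles sold.
print(number_of_classes(180))
```

For n = 180 the loop stops at k = 8, because 2^7 = 128 is not greater than 180 and 2^8 = 256 is. The rule is a guide, so the result is a starting suggestion rather than a requirement.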

Step 2: Determine the class interval. Generally, the class interval is the same for all classes. The classes all taken together must cover at least the distance from the minimum value in the data up to the maximum value. Expressing these words in a formula:

i ≥ (Maximum Value − Minimum Value) / k

where i is the class interval, and k is the number of classes.

For the Applewood Auto Group, the minimum value is $294 and the maximum value is $3,292. If we need 8 classes, the interval should be:

i ≥ (Maximum Value − Minimum Value) / k = ($3,292 − $294) / 8 = $374.75

In practice, this interval size is usually rounded up to some convenient number, such as a multiple of 10 or 100. The value of $400 is a reasonable choice.
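The arithmetic in Step 2 can be checked with a short Python sketch. Both function names are hypothetical, and rounding up to a multiple of 100 is just one way to arrive at a "convenient number"; the text leaves that choice to judgment.

```python
import math


def minimum_class_interval(minimum, maximum, k):
    """Smallest interval i that satisfies i >= (maximum - minimum) / k."""
    return (maximum - minimum) / k


def round_up_to_multiple(value, multiple):
    """Round value up to the next multiple (for example, of 10 or 100)."""
    return math.ceil(value / multiple) * multiple


# Applewood Auto Group: minimum $294, maximum $3,292, 8 classes.
raw_interval = minimum_class_interval(294, 3292, 8)
print(raw_interval)                             # 374.75
print(round_up_to_multiple(raw_interval, 100))  # 400
```

Rounding up rather than to the nearest multiple matters: an interval smaller than $374.75 would leave part of the range uncovered by 8 classes.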

Step 3: Set the individual class limits. State clear class limits so you can put each observation into only one category. This means you must avoid overlapping or unclear class limits. For example, classes such as “$1,300–$1,400” and “$1,400–$1,500” should not be used because it is not clear whether the value $1,400 is in the first or second class. In this text, we will generally use the format $1,300 up to $1,400 and $1,400 up to $1,500 and so on. With this format, it is clear that $1,399 goes into the first class and $1,400 in the second.

Because we always round the class interval up to get a convenient class size, we cover a larger than necessary range. For example, using 8 classes with an interval of $400 in the Applewood Auto Group example results in a range of 8($400) = $3,200. The actual range is $2,998, found by ($3,292 − $294). Comparing that value to $3,200, we have an excess of $202. Because we need to cover only the range (Maximum − Minimum), it is natural to put approximately equal amounts of the excess in each of the two tails. Of course, we also should select convenient class limits. A guideline is to make the lower limit of the first class a multiple of the class interval. Sometimes this is not possible, but the lower limit should at least be rounded. So here are the classes we could use for these data.

Classes

$  200 up to $  600
   600 up to  1,000
 1,000 up to  1,400
 1,400 up to  1,800
 1,800 up to  2,200
 2,200 up to  2,600
 2,600 up to  3,000
 3,000 up to  3,400

Step 4: Tally the vehicle profit into the classes and determine the number of observations in each class. To begin, the profit from the sale of the first vehicle in Table 2–4 is $1,387. It is tallied in the $1,000 up to $1,400 class. The second profit in the first row of Table 2–4 is $2,148. It is tallied in the $1,800 up to $2,200 class. The other profits are tallied in a similar manner. When all the profits are tallied, the table would appear as:

Profit                   Frequency
$  200 up to $  600      |||| |||
   600 up to  1,000      |||| |||| |
 1,000 up to  1,400      |||| |||| |||| |||| |||
 1,400 up to  1,800      |||| |||| |||| |||| |||| |||| |||| |||
 1,800 up to  2,200      |||| |||| |||| |||| |||| |||| |||| |||| ||||
 2,200 up to  2,600      |||| |||| |||| |||| |||| |||| ||
 2,600 up to  3,000      |||| |||| |||| ||||
 3,000 up to  3,400      ||||

The number of observations in each class is called the class frequency. In the $200 up to $600 class there are 8 observations, and in the $600 up to $1,000 class there are 11 observations. Therefore, the class frequency in the first class is 8 and the class frequency in the second class is 11. There are a total of 180 observations in the entire set of data. So the sum of all the frequencies should be equal to 180. The results of the frequency distribution are in Table 2–5.
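Steps 3 and 4 can also be carried out mechanically. The Python sketch below builds the eight classes from a lower limit of $200 and an interval of $400, then tallies the 180 profits of Table 2–4. The function names build_classes and tally are our own, and the "up to" convention is coded as lower ≤ value < upper, so $1,400 falls in the $1,400 up to $1,800 class.

```python
# Profits from Table 2-4 (in dollars), read row by row.
profits = [
    1387, 2148, 2201, 963, 820, 2230, 3043, 2584, 2370,
    1754, 2207, 996, 1298, 1266, 2341, 1059, 2666, 2637,
    1817, 2252, 2813, 1410, 1741, 3292, 1674, 2991, 1426,
    1040, 1428, 323, 1553, 1772, 1108, 1807, 934, 2944,
    1273, 1889, 352, 1648, 1932, 1295, 2056, 2063, 2147,
    1529, 1166, 482, 2071, 2350, 1344, 2236, 2083, 1973,
    3082, 1320, 1144, 2116, 2422, 1906, 2928, 2856, 2502,
    1951, 2265, 1485, 1500, 2446, 1952, 1269, 2989, 783,
    2692, 1323, 1509, 1549, 369, 2070, 1717, 910, 1538,
    1206, 1760, 1638, 2348, 978, 2454, 1797, 1536, 2339,
    1342, 1919, 1961, 2498, 1238, 1606, 1955, 1957, 2700,
    443, 2357, 2127, 294, 1818, 1680, 2199, 2240, 2222,
    754, 2866, 2430, 1115, 1824, 1827, 2482, 2695, 2597,
    1621, 732, 1704, 1124, 1907, 1915, 2701, 1325, 2742,
    870, 1464, 1876, 1532, 1938, 2084, 3210, 2250, 1837,
    1174, 1626, 2010, 1688, 1940, 2639, 377, 2279, 2842,
    1412, 1762, 2165, 1822, 2197, 842, 1220, 2626, 2434,
    1809, 1915, 2231, 1897, 2646, 1963, 1401, 1501, 1640,
    2415, 2119, 2389, 2445, 1461, 2059, 2175, 1752, 1821,
    1546, 1766, 335, 2886, 1731, 2338, 1118, 2058, 2487,
]


def build_classes(lower_limit, interval, k):
    """Return k (lower, upper) pairs; a class covers lower up to, not including, upper."""
    return [(lower_limit + j * interval, lower_limit + (j + 1) * interval)
            for j in range(k)]


def tally(values, classes):
    """Count how many values fall in each class; each value lands in exactly one class."""
    counts = [0] * len(classes)
    for value in values:
        for index, (lower, upper) in enumerate(classes):
            if lower <= value < upper:
                counts[index] += 1
                break
        else:
            raise ValueError(f"{value} falls outside every class")
    return counts


classes = build_classes(200, 400, 8)
frequencies = tally(profits, classes)
for (lower, upper), frequency in zip(classes, frequencies):
    print(f"${lower:,} up to ${upper:,}: {frequency}")
print("Total:", sum(frequencies))
```

Because the classes are mutually exclusive and collectively exhaustive, every profit is counted once and the frequencies sum to the number of observations.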

TABLE 2–5 Frequency Distribution of Profit for Vehicles Sold Last Month at Applewood Auto Group

Profit                   Frequency
$  200 up to $  600          8
   600 up to  1,000         11
 1,000 up to  1,400         23
 1,400 up to  1,800         38
 1,800 up to  2,200         45
 2,200 up to  2,600         32
 2,600 up to  3,000         19
 3,000 up to  3,400          4

Total                      180

Now that we have organized the data into a frequency distribution (see Table 2–5), we can summarize the profits of the vehicles for the Applewood Auto Group. Observe the following:

1. The profits from vehicle sales range between $200 and $3,400.

2. The vehicle profits are classified using a class interval of $400. The class interval is determined by subtracting consecutive lower or upper class limits. For example, the lower limit of the first class is $200, and the lower limit of the second class is $600. The difference is the class interval of $400.

3. The profits are concentrated between $1,000 and $3,000. The profit on 157 vehicles, or 87%, was within this range.

4. For each class, we can determine the typical profit or class midpoint. It is halfway between the lower or upper limits of two consecutive classes. It is computed by adding the lower or upper limits of consecutive classes and dividing by 2. Referring to Table 2–5, the lower class limit of the first class is $200, and the next class limit is $600. The class midpoint is $400, found by ($600 + $200)/2. The midpoint best represents, or is typical of, the profits of the vehicles in that class. Applewood sold 8 vehicles with a typical profit of $400.

5. The largest concentration, or highest frequency, of vehicles sold is in the $1,800 up to $2,200 class. There are 45 vehicles in this class. The class midpoint is $2,000. So we say that the typical profit in the class with the highest frequency is $2,000.

By presenting this information to Ms. Ball, we give her a clear picture of the distribution of the vehicle profits for last month.
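The observations listed above follow directly from the class limits and frequencies in Table 2–5. As a minimal sketch, the Python lines below recompute the class interval, the class midpoints, the share of vehicles with profits from $1,000 up to $3,000, and the class with the highest frequency. The variable names are our own.

```python
# Table 2-5 as (lower limit, upper limit, frequency).
distribution = [
    (200, 600, 8), (600, 1000, 11), (1000, 1400, 23), (1400, 1800, 38),
    (1800, 2200, 45), (2200, 2600, 32), (2600, 3000, 19), (3000, 3400, 4),
]

total = sum(frequency for _, _, frequency in distribution)

# Class interval: difference between consecutive lower class limits.
class_interval = distribution[1][0] - distribution[0][0]

# Class midpoint: halfway between the lower limit and the next class limit.
midpoints = [(lower + upper) / 2 for lower, upper, _ in distribution]

# Vehicles in the classes that lie between $1,000 and $3,000.
concentrated = sum(frequency for lower, upper, frequency in distribution
                   if lower >= 1000 and upper <= 3000)

# Class with the highest frequency and its midpoint.
highest = max(distribution, key=lambda row: row[2])
highest_midpoint = (highest[0] + highest[1]) / 2

print(class_interval)                       # 400
print(midpoints[0])                         # 400.0
print(concentrated, f"{concentrated / total:.0%}")
print(highest, highest_midpoint)
```

The percentage printed on the third line is 157/180 rounded to the nearest whole percent, which is the 87% quoted in observation 3.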

We admit that arranging the information on profits into a frequency distribution does result in the loss of some detailed information. That is, by organizing the data into a frequency distribution, we cannot pinpoint the exact profit on any vehicle, such as $1,387, $2,148, or $2,201. Further, we cannot tell that the actual minimum profit for any vehicle sold is $294 or that the maximum profit was $3,292. However, the lower limit of the first class and the upper limit of the last class convey essentially the same meaning. Likely, Ms. Ball will make the same judgment if she knows the smallest profit is about $200 that she will if she knows the exact profit is $294. The advantages of summarizing the 180 profits into a more understandable and organized form more than offset this disadvantage.

STATISTICS IN ACTION

In 1788, James Madison, John Jay, and Alexander Hamilton anonymously published a series of essays entitled The Federalist. These Federalist papers were an attempt to convince the people of New York that they should ratify the Constitution. In the course of history, the authorship of most of these papers became known, but 12 remained contested. Through the use of statistical analysis, and particularly studying the frequency distributions of various words, we can now conclude that James Madison is the likely author of the 12 papers. In fact, the statistical evidence that Madison is the author is overwhelming.

TABLE 2–6 Adjusted Gross Income for Individuals Filing Income Tax Returns

Adjusted Gross Income                  Number of Returns (in thousands)
No adjusted gross income                     178.2
$        1 up to      5,000                1,204.6
     5,000 up to     10,000                2,595.5
    10,000 up to     15,000                3,142.0
    15,000 up to     20,000                3,191.7
    20,000 up to     25,000                2,501.4
    25,000 up to     30,000                1,901.6
    30,000 up to     40,000                2,502.3
    40,000 up to     50,000                1,426.8
    50,000 up to     75,000                1,476.3
    75,000 up to    100,000                  338.8
   100,000 up to    200,000                  223.3
   200,000 up to    500,000                   55.2
   500,000 up to  1,000,000                   12.0
 1,000,000 up to  2,000,000                    5.1
 2,000,000 up to 10,000,000                    3.4
10,000,000 or more                             0.6

When we summarize raw data with frequency distributions, equal class intervals are preferred. However, in certain situations unequal class intervals may be necessary to avoid a large number of classes with very small frequencies. Such is the case in Table 2–6. The U.S. Internal Revenue Service uses unequal-sized class intervals for adjusted gross income on individual tax returns to summarize the number of individual tax returns. If we use our method to find equal class intervals, the 2^k rule results in 25 classes, and a class interval of $400,000, assuming $0 and $10,000,000 as the minimum and maximum values for adjusted gross income. Using equal class intervals, the first 13 classes in Table 2–6 would be combined into one class of about 99.9% of all tax returns and 24 classes for the 0.1% of the returns with an adjusted gross income above $400,000. Using equal class intervals does not provide a good understanding of the raw data. In this case, good judgment in the use of unequal class intervals, as demonstrated in Table 2–6, is required to show the distribution of the number of tax returns filed, especially for incomes under $500,000.
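The figures quoted for Table 2–6 can be reproduced with a short Python sketch. It assumes, as the text does, $0 and $10,000,000 as the minimum and maximum values; the variable names are our own. The share computed at the end covers the first 13 classes of Table 2–6 (incomes under $500,000), which the text treats as roughly the first equal-width class.

```python
# Number of returns (in thousands) from Table 2-6, in table order.
returns_thousands = [
    178.2, 1204.6, 2595.5, 3142.0, 3191.7, 2501.4, 1901.6, 2502.3, 1426.8,
    1476.3, 338.8, 223.3, 55.2, 12.0, 5.1, 3.4, 0.6,
]

# Total number of returns, converted from thousands.
n = round(sum(returns_thousands) * 1000)

# 2 to the k rule: smallest k with 2**k greater than n.
k = 1
while 2 ** k <= n:
    k += 1

# Equal class interval over the assumed range $0 to $10,000,000.
interval = (10_000_000 - 0) / k

# Share of all returns that fall in the first 13 classes of Table 2-6.
share_first_13 = sum(returns_thousands[:13]) / sum(returns_thousands)

print(n, k, interval)
print(f"{share_first_13:.1%}")
```

With nearly all returns in a single class and the remaining classes almost empty, an equal-interval table would hide the shape of the distribution, which is the point of the passage.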

SELF-REVIEW 2–2

In the first quarter of last year, the 11 members of the sales staff at Master Chemical Company earned the following commissions:

$1,650 $1,475 $1,510 $1,670 $1,595 $1,760 $1,540 $1,495 $1,590 $1,625 $1,510

(a) What are the values such as $1,650 and $1,475 called?
(b) Using $1,400 up to $1,500 as the first class, $1,500 up to $1,600 as the second class, and so forth, organize the quarterly commissions into a frequency distribution.
(c) What are the numbers in the right column of your frequency distribution called?
(d) Describe the distribution of quarterly commissions, based on the frequency distribution. What is the largest concentration of commissions earned? What is the smallest, and the largest? What is the typical amount earned?

Relative Frequency Distribution

It may be desirable, as we did earlier with qualitative data, to convert class frequencies to relative class frequencies to show the proportion of the total number of observations in each class. In our vehicle profits, we may want to know what percentage of the vehicle profits are in the $1,000 up to $1,400 class. To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations. From the distribution of vehicle profits, Table 2–5, the relative frequency for the $1,000 up to $1,400 class is 0.128, found by dividing 23 by 180. That is, profit on 12.8% of the vehicles sold is between $1,000 and $1,400. The relative frequencies for the remaining classes are shown in Table 2–7.

TABLE 2–7 Relative Frequency Distribution of Profit for Vehicles Sold Last Month at Applewood Auto Group

Profit                   Frequency   Relative Frequency   Found by
$  200 up to $  600          8             .044            8/180
   600 up to  1,000         11             .061           11/180
 1,000 up to  1,400         23             .128           23/180
 1,400 up to  1,800         38             .211           38/180
 1,800 up to  2,200         45             .250           45/180
 2,200 up to  2,600         32             .178           32/180
 2,600 up to  3,000         19             .106           19/180
 3,000 up to  3,400          4             .022            4/180

Total                      180            1.000
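The conversion from class frequencies to relative frequencies is a single division per class. A minimal Python sketch, using the frequencies of Table 2–5 and variable names of our own choosing:

```python
# Class frequencies from Table 2-5, lowest class first.
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]
lower_limits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000]
interval = 400

total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [frequency / total for frequency in frequencies]

for lower, frequency, share in zip(lower_limits, frequencies, relative):
    print(f"${lower:,} up to ${lower + interval:,}: {frequency}/{total} = {share:.3f}")
print(f"Total: {sum(relative):.3f}")
```

Rounded to three decimals, the printed values agree with the Relative Frequency column of Table 2–7.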

There are many software packages that perform statistical calculations. Throughout this text, we will show the output from Microsoft Excel, MegaStat (a Microsoft Excel add-in), and Minitab (a statistical software package). Because Excel is most readily available, it is used most frequently.

Within the earlier Graphic Presentation of Qualitative Data section, we used the Pivot Table tool in Excel to create a frequency table. To create the table to the left, we use the same Excel tool to compute frequency and relative frequency distributions for the profit variable in the Applewood Auto Group data. The necessary steps are given in the Software Commands section in Appendix C.

SELF-REVIEW 2–3

Barry Bonds of the San Francisco Giants established a new single-season Major League Baseball home run record by hitting 73 home runs during the 2001 season. Listed below is the sorted distance of each of the 73 home runs.

320 320 347 350 360 360 360 361 365 370
370 375 375 375 375 380 380 380 380 380
380 390 390 391 394 396 400 400 400 400
405 410 410 410 410 410 410 410 410 410
410 410 411 415 415 416 417 417 420 420
420 420 420 420 420 420 429 430 430 430
430 430 435 435 436 440 440 440 440 440
450 480 488

(a) For this data, show that seven classes would be used to create a frequency distribution using the 2^k rule.
(b) Show that a class interval of 30 would summarize the data in seven classes.
(c) Construct frequency and relative frequency distributions for the data with seven classes and a class interval of 30. Start the first class with a lower limit of 300.
(d) How many home runs traveled a distance of 360 up to 390 feet?
(e) What percentage of the home runs traveled a distance of 360 up to 390 feet?
(f) What percentage of the home runs traveled a distance of 390 feet or more?

EXERCISES

This icon indicates that the data are available at the text website: www.mhhe.com/Lind17e. You will be able to download the data directly into Excel or Minitab from this site.

7. A set of data consists of 38 observations. How many classes would you recommend for the frequency distribution?

8. A set of data consists of 45 observations between $0 and $29. What size would you recommend for the class interval?

9. A set of data consists of 230 observations between $235 and $567. What class interval would you recommend?

10. A set of data contains 53 observations. The minimum value is 42 and the maximum value is 129. The data are to be organized into a frequency distribution.
a. How many classes would you suggest?
b. What would you suggest as the lower limit of the first class?

11. Wachesaw Manufacturing Inc. produced the following number of units in the last 16 days.

27 27 27 28 27 25 25 28
26 28 26 28 31 30 26 26

The information is to be organized into a frequency distribution.
a. How many classes would you recommend?
b. What class interval would you suggest?
c. What lower limit would you recommend for the first class?
d. Organize the information into a frequency distribution and determine the relative frequency distribution.
e. Comment on the shape of the distribution.

12. The Quick Change Oil Company has a number of outlets in the metropolitan Seattle area. The daily number of oil changes at the Oak Street outlet in the past 20 days are:

65 98 55 62 79 59 51 90 72 56
70 62 66 80 94 79 63 73 71 85

The data are to be organized into a frequency distribution.
a. How many classes would you recommend?
b. What class interval would you suggest?
c. What lower limit would you recommend for the first class?
d. Organize the number of oil changes into a frequency distribution.
e. Comment on the shape of the frequency distribution. Also determine the relative frequency distribution.

13. The manager of the BiLo Supermarket in Mt. Pleasant, Rhode Island, gathered the following information on the number of times a customer visits the store during a month. The responses of 51 customers were:

5 3 3 1 4 4 5 6 4 2 6 6 6 7 1 1 14
1 2 4 4 4 5 6 3 5 3 4 5 6 8 4 7 6
5 9 11 3 12 4 7 6 5 15 1 1 10 8 9 2 12

a. Starting with 0 as the lower limit of the first class and using a class interval of 3, organize the data into a frequency distribution.
b. Describe the distribution. Where do the data tend to cluster?
c. Convert the distribution to a relative frequency distribution.

14. The food services division of Cedar River Amusement Park Inc. is studying the amount of money spent per day on food and drink by families who visit the amusement park. A sample of 40 families who visited the park yesterday revealed they spent the following amounts:

$77 $18 $63 $84 $38 $54 $50 $59 $54 $56 $36 $26 $50 $34 $44 41 58 58 53 51 62 43 52 53 63 62 62 65 61 52 60 60 45 66 83 71 63 58 61 71

a. Organize the data into a frequency distribution, using seven classes and 15 as the lower limit of the first class. What class interval did you select?
b. Where do the data tend to cluster?
c. Describe the distribution.
d. Determine the relative frequency distribution.

GRAPHIC PRESENTATION OF A DISTRIBUTION

Sales managers, stock analysts, hospital administrators, and other busy executives often need a quick picture of the distributions of sales, stock prices, or hospital costs. These distributions can often be depicted by the use of charts and graphs. Three charts that will help portray a frequency distribution graphically are the histogram, the frequency polygon, and the cumulative frequency polygon.

LO2-4 Display a distribution using a histogram or frequency polygon.

Histogram

A histogram for a frequency distribution based on quantitative data is similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars. However, there is one important difference based on the nature of the data. Quantitative data are usually measured using scales that are continuous, not discrete. Therefore, the horizontal axis represents all possible values, and the bars are drawn adjacent to each other to show the continuous nature of the data.

HISTOGRAM A graph in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.

EXAMPLE

Below is the frequency distribution of the profits on vehicle sales last month at the Applewood Auto Group.

Profit                   Frequency
$  200 up to $  600          8
   600 up to  1,000         11
 1,000 up to  1,400         23
 1,400 up to  1,800         38
 1,800 up to  2,200         45
 2,200 up to  2,600         32
 2,600 up to  3,000         19
 3,000 up to  3,400          4

Total                      180

Construct a histogram. What observations can you reach based on the information presented in the histogram?

SOLUTION

The class frequencies are scaled along the vertical axis (Y-axis) and either the class limits or the class midpoints along the horizontal axis. To illustrate the construction of the histogram, the first three classes are shown in Chart 2–3.

[Chart: vertical axis, Number of Vehicles (class frequency); horizontal axis, Profit $, marked at 200, 600, 1,000, and 1,400]

CHART 2–3 Construction of a Histogram


From Chart 2–3 we note the profit on eight vehicles was $200 up to $600. Therefore, the height of the column for that class is 8. There are 11 vehicle sales where the profit was $600 up to $1,000. So, logically, the height of that column is 11. The height of the bar represents the number of observations in the class.

This procedure is continued for all classes. The complete histogram is shown in Chart 2–4. Note that there is no space between the bars. This is a feature of the histogram. Why is this so? Because the variable profit, plotted on the horizontal axis, is a continuous variable. In a bar chart, the scale of measurement is usually nominal and the vertical bars are separated. This is an important distinction between the histogram and the bar chart.

We can make the following statements using Chart 2–4. They are the same as the observations based on Table 2–5.

1. The profits from vehicle sales range between $200 and $3,400.

2. The vehicle profits are classified using a class interval of $400. The class interval is determined by subtracting consecutive lower or upper class limits. For example, the lower limit of the first class is $200, and the lower limit of the second class is $600. The difference is the class interval of $400.

3. The profits are concentrated between $1,000 and $3,000. The profit on 157 vehicles, or 87%, was within this range.

4. For each class, we can determine the typical profit or class midpoint. It is halfway between the lower or upper limits of two consecutive classes. It is computed by adding the lower or upper limits of consecutive classes and dividing by 2. Referring to Chart 2–4, the lower class limit of the first class is $200, and the next class limit is $600. The class midpoint is $400, found by ($600 + $200)/2. The midpoint best represents, or is typical of, the profits of the vehicles in that class. Applewood sold 8 vehicles with a typical profit of $400.

5. The largest concentration, or highest frequency of vehicles sold, is in the $1,800 up to $2,200 class. There are 45 vehicles in this class. The class midpoint is $2,000. So we say that the typical profit in the class with the highest frequency is $2,000.

Thus, the histogram provides an easily interpreted visual representation of a frequency distribution. We should also point out that we would have made the same observations and the shape of the histogram would have been the same had we used a relative frequency distribution instead of the actual frequencies. That is, if we use the relative frequencies of Table 2–7, the result is a histogram of the same shape as Chart 2–4. The only difference is that the vertical axis would have been reported in percentage of vehicles instead of the number of vehicles. The Excel commands to create Chart 2–4 are given in Appendix C.
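To make the construction concrete, the Python sketch below describes each bar of the histogram by its left edge, right edge, and height, checks that neighboring bars share an edge (so there is no space between them), and prints a rough text version with one # per vehicle. It is only an illustration: the text produces Chart 2–4 with Excel, and the bars here are drawn sideways for convenience.

```python
# Lower class limits and class frequencies from Table 2-5.
lower_limits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]
interval = 400

# One bar per class: (left edge, right edge, height).
bars = [(lower, lower + interval, frequency)
        for lower, frequency in zip(lower_limits, frequencies)]

# In a histogram each bar ends where the next begins.
no_gaps = all(bars[j][1] == bars[j + 1][0] for j in range(len(bars) - 1))
print("Bars adjacent:", no_gaps)

# Rough text rendering: bar length equals the class frequency.
for left, right, height in bars:
    print(f"${left:>5,} up to ${right:>5,} | {'#' * height} {height}")
```

Dividing every height by 180 would relabel the frequency axis in relative terms without changing the relative lengths of the bars, which is why the histogram keeps the same shape.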

[Chart: vertical axis, Frequency; horizontal axis, Profit, with one bar for each class from 200 up to 600 through 3,000 up to 3,400]

CHART 2–4 Histogram of the Profit on 180 Vehicles Sold at the Applewood Auto Group


Frequency Polygon

A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies. The construction of a frequency polygon is illustrated in Chart 2–5. We use the profits from the cars sold last month at the Applewood Auto Group. The midpoint of each class is scaled on the X-axis and the class frequencies on the Y-axis. Recall that the class midpoint is the value at the center of a class and represents the typical values in that class. The class frequency is the number of observations in a particular class. The profit earned on the vehicles sold last month by the Applewood Auto Group is repeated below.

STATISTICS IN ACTION

Florence Nightingale is known as the founder of the nursing profession. However, she also saved many lives by using statistical analysis. When she encountered an unsanitary condition or an undersupplied hospital, she improved the conditions and then used statistical data to document the improvement. Thus, she was able to convince others of the need for medical reform, particularly in the area of sanitation. She developed original graphs to demonstrate that, during the Crimean War, more soldiers died from unsanitary conditions than were killed in combat.

[Chart: vertical axis, Frequency; horizontal axis, Profit $, marked at the midpoints from 0 to 3,600 in steps of 400]

CHART 2–5 Frequency Polygon of Profit on 180 Vehicles Sold at Applewood Auto Group

As noted previously, the $200 up to $600 class is represented by the midpoint $400. To construct a frequency polygon, move horizontally on the graph to the midpoint, $400, and then vertically to 8, the class frequency, and place a dot. The x and the y values of this point are called the coordinates. The coordinates of the next point are x = 800 and y = 11. The process is continued for all classes. Then the points are connected in order. That is, the point representing the lowest class is joined to the one representing the second class and so on. Note in Chart 2–5 that, to complete the frequency polygon, midpoints of $0 and $3,600 are added to the X-axis to “anchor” the polygon at zero frequencies. These two values, $0 and $3,600, were derived by subtracting the class interval of $400 from the lowest midpoint ($400) and by adding $400 to the highest midpoint ($3,200) in the frequency distribution.
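The coordinates described above, including the two anchor points at zero frequency, can be generated with a few lines of Python. The function name polygon_points is hypothetical, and the sketch assumes equal class intervals, as in the Applewood example.

```python
def polygon_points(midpoints, frequencies, interval):
    """Return the (x, y) coordinates of a frequency polygon.

    An anchor point with zero frequency is added one class interval below the
    lowest midpoint and one class interval above the highest midpoint.
    """
    points = list(zip(midpoints, frequencies))
    start = (midpoints[0] - interval, 0)
    end = (midpoints[-1] + interval, 0)
    return [start] + points + [end]


# Class midpoints and frequencies for the Applewood Auto Group profits.
midpoints = [400, 800, 1200, 1600, 2000, 2400, 2800, 3200]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]

points = polygon_points(midpoints, frequencies, 400)
for x, y in points:
    print(f"x = {x:,}, y = {y}")
```

Connecting the points in the order listed gives the polygon of Chart 2–5; the first and last points are the anchors at $0 and $3,600.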

Profit                   Midpoint   Frequency
$  200 up to $  600      $  400          8
   600 up to  1,000         800         11
 1,000 up to  1,400       1,200         23
 1,400 up to  1,800       1,600         38
 1,800 up to  2,200       2,000         45
 2,200 up to  2,600       2,400         32
 2,600 up to  3,000       2,800         19
 3,000 up to  3,400       3,200          4

Total                                  180

Both the histogram and the frequency polygon allow us to get a quick picture of the main characteristics of the data (highs, lows, points of concentration, etc.). Although the two representations are similar in purpose, the histogram has the advantage of depicting each class as a rectangle, with the height of the rectangular bar representing the number in each class. The frequency polygon, in turn, has an advantage over the histogram. It allows us to compare directly two or more frequency distributions. Suppose Ms. Ball wants to compare the profit per vehicle sold at Applewood Auto Group with a similar auto group, Fowler Auto in Grayling, Michigan. To do this, two frequency polygons are constructed, one on top of the other, as in Chart 2–6.

[Chart: two frequency polygons, Fowler Motors and Applewood; vertical axis, Frequency; horizontal axis, Profit $, marked at the midpoints from 0 to 3,600 in steps of 400]

CHART 2–6 Distribution of Profit at Applewood Auto Group and Fowler Motors

Two things are clear from the chart:

• The typical vehicle profit is larger at Fowler Motors—about $2,000 for Applewood and about $2,400 for Fowler.

• There is less variation or dispersion in the profits at Fowler Motors than at Applewood. The lower limit of the first class for Applewood is $0 and the upper limit is $3,600. For Fowler Motors, the lower limit is $800 and the upper limit is the same: $3,600.

The total number of cars sold at the two dealerships is about the same, so a direct comparison is possible. If the difference in the total number of cars sold is large, then converting the frequencies to relative frequencies and then plotting the two distributions would allow a clearer comparison.

SELF-REVIEW 2–4

The annual imports of a selected group of electronic suppliers are shown in the following frequency distribution.

Imports ($ millions)     Number of Suppliers
 2 up to  5                     6
 5 up to  8                    13
 8 up to 11                    20
11 up to 14                    10
14 up to 17                     1

(a) Portray the imports as a histogram.
(b) Portray the imports as a relative frequency polygon.
(c) Summarize the important facets of the distribution (such as classes with the highest and lowest frequencies).


EXERCISES

15. Molly’s Candle Shop has several retail stores in the coastal areas of North and South Carolina. Many of Molly’s customers ask her to ship their purchases. The following chart shows the number of packages shipped per day for the last 100 days. For example, the first class shows that there were 5 days when the number of packages shipped was 0 up to 5.

[Chart: vertical axis, Frequency; horizontal axis, Number of Packages, marked from 0 to 35 in steps of 5]

a. What is this chart called?
b. What is the total number of packages shipped?
c. What is the class interval?
d. What is the number of packages shipped in the 10 up to 15 class?
e. What is the relative frequency of packages shipped in the 10 up to 15 class?
f. What is the midpoint of the 10 up to 15 class?
g. On how many days were there 25 or more packages shipped?

16. The following chart shows the number of patients admitted daily to Memorial Hospital through the emergency room.

[Chart: vertical axis, Frequency; horizontal axis, Number of Patients, marked from 2 to 12 in steps of 2]

a. What is the midpoint of the 2 up to 4 class?
b. How many days were 2 up to 4 patients admitted?
c. What is the class interval?
d. What is this chart called?

17. The following frequency distribution reports the number of frequent flier miles, reported in thousands, for employees of Brumley Statistical Consulting Inc. during the most recent quarter.

Frequent Flier Miles (000)   Number of Employees
 0 up to  3                          5
 3 up to  6                         12
 6 up to  9                         23
 9 up to 12                          8
12 up to 15                          2
Total                               50

a. How many employees were studied?
b. What is the midpoint of the first class?
c. Construct a histogram.
d. A frequency polygon is to be drawn. What are the coordinates of the plot for the first class?
e. Construct a frequency polygon.
f. Interpret the frequent flier miles accumulated using the two charts.

18. A large Internet retailer is studying the lead time (elapsed time between when an order is placed and when it is filled) for a sample of recent orders. The lead times are reported in days.

Lead Time (days)     Frequency
 0 up to  5               6
 5 up to 10               7
10 up to 15              12
15 up to 20               8
20 up to 25               7
Total                    40

a. How many orders were studied?
b. What is the midpoint of the first class?
c. What are the coordinates of the first class for a frequency polygon?
d. Draw a histogram.
e. Draw a frequency polygon.
f. Interpret the lead times using the two charts.

Cumulative Distributions

Consider once again the distribution of the profits on vehicles sold by the Applewood Auto Group. Suppose we were interested in the number of vehicles that sold for a profit of less than $1,400. These values can be approximated by developing a cumulative frequency distribution and portraying it graphically in a cumulative frequency polygon. Or, suppose we were interested in the profit earned on the lowest-selling 40% of the vehicles. These values can be approximated by developing a cumulative relative frequency distribution and portraying it graphically in a cumulative relative frequency polygon.

EXAMPLE

The frequency distribution of the profits earned at Applewood Auto Group is repeated from Table 2–5.

Profit                    Frequency
$  200 up to $  600         8
   600 up to  1,000        11
 1,000 up to  1,400        23
 1,400 up to  1,800        38
 1,800 up to  2,200        45
 2,200 up to  2,600        32
 2,600 up to  3,000        19
 3,000 up to  3,400         4
Total                     180


Construct a cumulative frequency polygon to answer the following question: sixty of the vehicles earned a profit of less than what amount? Construct a cumulative relative frequency polygon to answer this question: seventy-five percent of the vehicles sold earned a profit of less than what amount?

SOLUTION

As the names imply, a cumulative frequency distribution and a cumulative frequency polygon require cumulative frequencies. To construct a cumulative frequency distribution, refer to the preceding table and note that there were eight vehicles in which the profit earned was less than $600. Those 8 vehicles, plus the 11 in the next higher class, for a total of 19, earned a profit of less than $1,000. The cumulative frequency for the next higher class is 42, found by 8 + 11 + 23. This process is continued for all the classes. All the vehicles earned a profit of less than $3,400. (See Table 2–8.)

TABLE 2–8 Cumulative Frequency Distribution for Profit on Vehicles Sold Last Month at Applewood Auto Group

Profit              Cumulative Frequency    Found by
Less than $  600      8                     8
Less than  1,000     19                     8 + 11
Less than  1,400     42                     8 + 11 + 23
Less than  1,800     80                     8 + 11 + 23 + 38
Less than  2,200    125                     8 + 11 + 23 + 38 + 45
Less than  2,600    157                     8 + 11 + 23 + 38 + 45 + 32
Less than  3,000    176                     8 + 11 + 23 + 38 + 45 + 32 + 19
Less than  3,400    180                     8 + 11 + 23 + 38 + 45 + 32 + 19 + 4

TABLE 2–9 Cumulative Relative Frequency Distribution for Profit on Vehicles Sold Last Month at Applewood Auto Group

Profit              Cumulative Frequency    Cumulative Relative Frequency
Less than $  600      8                       8/180 = 0.044 =  4.4%
Less than $1,000     19                      19/180 = 0.106 = 10.6%
Less than $1,400     42                      42/180 = 0.233 = 23.3%
Less than $1,800     80                      80/180 = 0.444 = 44.4%
Less than $2,200    125                     125/180 = 0.694 = 69.4%
Less than $2,600    157                     157/180 = 0.872 = 87.2%
Less than $3,000    176                     176/180 = 0.978 = 97.8%
Less than $3,400    180                     180/180 = 1.000 = 100%

To construct a cumulative relative frequency distribution, we divide the cumulative frequencies by the total number of observations, 180. As shown in Table 2–9, the cumulative relative frequency of the fourth class is 80/180 = 0.444, or about 44%. This means that about 44% of the vehicles sold earned a profit of less than $1,800.
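The running totals in Table 2–8 and the ratios in Table 2–9 can be reproduced with a short Python sketch. The class limits and frequencies come from the Applewood table above; the variable names are only illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Applewood profit table.
upper_limits = [600, 1000, 1400, 1800, 2200, 2600, 3000, 3400]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: each running total divided by the grand total.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for limit, cum, rel in zip(upper_limits, cumulative, cumulative_relative):
    print(f"Less than ${limit:>5,}  {cum:>3}  {rel:.3f}  {rel:.1%}")
```

Each printed row pairs an upper class limit with its cumulative frequency and cumulative relative frequency, in the same order as the rows of the two tables.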

To plot a cumulative frequency distribution, scale the upper limit of each class along the X-axis and the corresponding cumulative frequencies along the Y-axis. To provide additional information, you can label the vertical axis on the right in terms of cumulative relative frequencies. In the Applewood Auto Group, the vertical axis on the left is labeled from 0 to 180 and on the right from 0 to 100%. Note, as an example, that 50% on the right axis should be opposite 90 vehicles on the left axis.

To begin, the first plot is at x = 200 and y = 0. None of the vehicles sold for a profit of less than $200. The profit on 8 vehicles was less than $600, so the next plot is at x = 600 and y = 8. Continuing, the next plot is x = 1,000 and y = 19. There were 19 vehicles that sold for a profit of less than $1,000. The rest of the points are plotted and then the dots connected to form Chart 2–7.

We should point out that the shape of the distribution is the same if we use cumulative relative frequencies instead of the cumulative frequencies. The only difference is that the vertical axis is scaled in percentages. In the following charts, a percentage scale is added to the right side of the graphs to help answer questions about cumulative relative frequencies.

[Chart omitted. Horizontal axis: Profit $, 200 to 3,400 in steps of 400. Left vertical axis: Number of Vehicles Sold, 0 to 180. Right vertical axis: Percent of Vehicles Sold, 0 to 100.]

CHART 2–7 Cumulative Frequency Polygon for Profit on Vehicles Sold Last Month at Applewood Auto Group

Using Chart 2–7 to find the amount of profit on 75% of the cars sold, draw a horizontal line from the 75% mark on the right-hand vertical axis over to the polygon, then drop down to the X-axis and read the amount of profit. The value on the X-axis is about $2,300, so we estimate that 75% of the vehicles sold earned a profit of $2,300 or less for the Applewood group.

To find the highest profit earned on 60 of the 180 vehicles, we use Chart 2–7 to locate the value of 60 on the left-hand vertical axis. Next, we draw a horizontal line from the value of 60 to the polygon and then drop down to the X-axis and read the profit. It is about $1,600, so we estimate that 60 of the vehicles sold for a profit of less than $1,600. We can also make estimates of the percentage of vehicles that sold for less than a particular amount. To explain, suppose we want to estimate the percentage of vehicles that sold for a profit of less than $2,000. We begin by locating the value of $2,000 on the X-axis, move vertically to the polygon, and then horizontally to the vertical axis on the right. The value is about 56%, so we conclude 56% of the vehicles sold for a profit of less than $2,000.
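Reading values off the polygon amounts to linear interpolation between plotted points. The sketch below assumes the points of Chart 2–7 are the class limits paired with the cumulative frequencies of Table 2–8; the helper `interpolate` is a hypothetical name, not something defined in the text.

```python
def interpolate(x, xs, ys):
    """Linearly interpolate y at x along the line segments through (xs[i], ys[i])."""
    points = list(zip(xs, ys))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")

# Points of the cumulative frequency polygon (profit, cumulative frequency).
profits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000, 3400]
cumulative = [0, 8, 19, 42, 80, 125, 157, 176, 180]
total = cumulative[-1]

# Profit below which 75% of the vehicles fall (left/right axis -> X-axis).
profit_75 = interpolate(0.75 * total, cumulative, profits)

# Profit below which 60 vehicles fall.
profit_60 = interpolate(60, cumulative, profits)

# Share of vehicles with a profit below $2,000 (X-axis -> right axis).
share_2000 = interpolate(2000, profits, cumulative) / total

print(round(profit_75), round(profit_60), f"{share_2000:.1%}")
```

The interpolated values (about $2,325, about $1,589, and about 57%) are close to the figures read from the chart by eye; small differences are expected because a graphical reading is only approximate.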


SELF-REVIEW 2–5

A sample of the hourly wages of 15 employees at Home Depot in Brunswick, Georgia, was organized into the following table.

Hourly Wages      Number of Employees
$ 8 up to $10      3
 10 up to  12      7
 12 up to  14      4
 14 up to  16      1

(a) What is the table called?
(b) Develop a cumulative frequency distribution and portray the distribution in a cumulative frequency polygon.
(c) On the basis of the cumulative frequency polygon, how many employees earn less than $11 per hour?

EXERCISES

19. The cumulative frequency and cumulative relative frequency polygon for the distribution of hourly wages of a sample of certified welders in the Atlanta, Georgia, area is shown in the following graph.

[Chart: Frequency (left vertical axis) and Percent (right vertical axis, up to 100) versus Hourly Wage (horizontal axis, 0 to 30 in steps of 5).]

a. How many welders were studied?
b. What is the class interval?
c. About how many welders earn less than $10.00 per hour?
d. About 75% of the welders make less than what amount?
e. Ten of the welders studied made less than what amount?
f. What percent of the welders make less than $20.00 per hour?

20. The cumulative frequency and the cumulative relative frequency polygon for a distribution of selling prices ($000) of houses sold in the Billings, Montana, area is shown in the graph.

[Chart: Frequency (left vertical axis, labeled through 200) and Percent (right vertical axis) versus Selling Price ($000) (horizontal axis, 0 to 350 in steps of 50).]

a. How many homes were studied?
b. What is the class interval?
c. One hundred homes sold for less than what amount?
d. About 75% of the homes sold for less than what amount?
e. Estimate the number of homes in the $150,000 up to $200,000 class.
f. About how many homes sold for less than $225,000?

21. The frequency distribution representing the number of frequent flier miles accumulated by employees at Brumley Statistical Consulting Inc. is repeated from Exercise 17.

Frequent Flier Miles (000)    Frequency
 0 up to  3                    5
 3 up to  6                   12
 6 up to  9                   23
 9 up to 12                    8
12 up to 15                    2
Total                         50

a. How many employees accumulated less than 3,000 miles?
b. Convert the frequency distribution to a cumulative frequency distribution.
c. Portray the cumulative distribution in the form of a cumulative frequency polygon.
d. Based on the cumulative relative frequencies, about 75% of the employees accumulated how many miles or less?

22. The frequency distribution of order lead time of the retailer from Exercise 18 is repeated below.

Lead Time (days)    Frequency
 0 up to  5          6
 5 up to 10          7
10 up to 15         12
15 up to 20          8
20 up to 25          7
Total               40

a. How many orders were filled in less than 10 days? In less than 15 days?
b. Convert the frequency distribution to cumulative frequency and cumulative relative frequency distributions.
c. Develop a cumulative frequency polygon.
d. About 60% of the orders were filled in less than how many days?

CHAPTER SUMMARY

I. A frequency table is a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
II. A relative frequency table shows the fraction of the number of frequencies in each class.
III. A bar chart is a graphic representation of a frequency table.
IV. A pie chart shows the proportion each distinct class represents of the total number of observations.
V. A frequency distribution is a grouping of data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
   A. The steps in constructing a frequency distribution are
      1. Decide on the number of classes.
      2. Determine the class interval.
      3. Set the individual class limits.
      4. Tally the raw data into classes and determine the frequency in each class.
   B. The class frequency is the number of observations in each class.
   C. The class interval is the difference between the limits of two consecutive classes.
   D. The class midpoint is halfway between the limits of consecutive classes.
VI. A relative frequency distribution shows the percent of observations in each class.
VII. There are several methods for graphically portraying a frequency distribution.
   A. A histogram portrays the frequencies in the form of a rectangle or bar for each class. The height of the rectangles is proportional to the class frequencies.
   B. A frequency polygon consists of line segments connecting the points formed by the intersection of the class midpoint and the class frequency.
   C. A graph of a cumulative frequency distribution shows the number of observations less than a given value.
   D. A graph of a cumulative relative frequency distribution shows the percent of observations less than a given value.
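The four steps listed under item V can be sketched in Python. This is a minimal illustration under stated assumptions: step 1 uses the "2 to the k" rule of thumb (the smallest k with 2^k greater than n), which the summary itself does not restate; the data values are hypothetical; and the function names are invented for this sketch.

```python
def suggest_num_classes(n):
    """Step 1 (assumed rule of thumb): smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def frequency_distribution(data, lower_limit, interval, num_classes):
    """Step 4: tally values into classes 'lower up to upper' (upper limit excluded)."""
    counts = [0] * num_classes
    for value in data:
        index = int((value - lower_limit) // interval)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return [
        (lower_limit + i * interval, lower_limit + (i + 1) * interval, counts[i])
        for i in range(num_classes)
    ]

# Hypothetical raw data, used only to illustrate the steps.
data = [12, 15, 21, 22, 27, 30, 31, 33, 38, 41, 44, 49]

k = suggest_num_classes(len(data))               # step 1
minimum_interval = (max(data) - min(data)) / k   # step 2: at least this wide
interval = 10                                    # rounded up to a convenient value
lower_limit = 10                                 # step 3: at or below the minimum

table = frequency_distribution(data, lower_limit, interval, k)
for lower, upper, count in table:
    midpoint = (lower + upper) / 2
    print(f"{lower} up to {upper}: frequency {count}, midpoint {midpoint}")
```

The interval and lower limit are chosen by hand here, because those two steps are matters of judgment rather than fixed formulas.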

CHAPTER EXERCISES

23. Describe the similarities and differences of qualitative and quantitative variables. Be sure to include the following:
a. What level of measurement is required for each variable type?
b. Can both types be used to describe both samples and populations?

24. Describe the similarities and differences between a frequency table and a frequency distribution. Be sure to include which requires qualitative data and which requires quantitative data.

25. Alexandra Damonte will be building a new resort in Myrtle Beach, South Carolina. She must decide how to design the resort based on the type of activities that the resort will offer to its customers. A recent poll of 300 potential customers showed the following results about customers’ preferences for planned resort activities:

Like planned activities            63
Do not like planned activities    135
Not sure                           78
No answer                          24

a. What is the table called?
b. Draw a bar chart to portray the survey results.
c. Draw a pie chart for the survey results.
d. If you are preparing to present the results to Ms. Damonte as part of a report, which graph would you prefer to show? Why?

26. Speedy Swift is a package delivery service that serves the greater Atlanta, Georgia, metropolitan area. To maintain customer loyalty, one of Speedy Swift’s performance objectives is on-time delivery. To monitor its performance, each delivery is measured on the following scale: early (package delivered before the promised time), on-time (package delivered within 5 minutes of the promised time), late (package delivered more than 5 minutes past the promised time), or lost (package never delivered). Speedy Swift’s objective is to deliver 99% of all packages either early or on-time. Speedy collected the following data for last month’s performance:

On-time On-time Early Late On-time On-time On-time On-time Late On-time Early On-time On-time Early On-time On-time On-time On-time On-time On-time Early On-time Early On-time On-time On-time Early On-time On-time On-time Early On-time On-time Late Early Early On-time On-time On-time Early On-time Late Late On-time On-time On-time On-time On-time On-time On-time On-time Late Early On-time Early On-time Lost On-time On-time On-time Early Early On-time On-time Late Early Lost On-time On-time On-time On-time On-time Early On-time Early On-time Early On-time Late On-time On-time Early On-time On-time On-time Late On-time Early On-time On-time On-time On-time On-time On-time On-time Early Early On-time On-time On-time

a. What kind of variable is delivery performance? What scale is used to measure delivery performance?
b. Construct a frequency table for delivery performance for last month.
c. Construct a relative frequency table for delivery performance last month.
d. Construct a bar chart of the frequency table for delivery performance for last month.
e. Construct a pie chart of on-time delivery performance for last month.
f. Write a memo reporting the results of the analyses. Include your tables and graphs with written descriptions of what they show. Conclude with a general statement of last month’s delivery performance as it relates to Speedy Swift’s performance objectives.

27. A data set consists of 83 observations. How many classes would you recommend for a frequency distribution?

28. A data set consists of 145 observations that range from 56 to 490. What size class interval would you recommend?

29. The following is the number of minutes to commute from home to work for a group of 25 automobile executives.

28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42 38 33 28

a. How many classes would you recommend?
b. What class interval would you suggest?
c. What would you recommend as the lower limit of the first class?
d. Organize the data into a frequency distribution.
e. Comment on the shape of the frequency distribution.

30. The following data give the weekly amounts spent on groceries for a sample of 45 households.

$271 $363 $159 $ 76 $227 $337 $295 $319 $250 279 205 279 266 199 177 162 232 303 192 181 321 309 246 278 50 41 335 116 100 151 240 474 297 170 188 320 429 294 570 342 279 235 434 123 325

31. A social scientist is studying the use of iPods by college students. A sample of 45 students revealed they played the following number of songs yesterday.

4 6 8 7 9 6 3 7 7 6 7 1 4 7 7 4 6 4 10 2 4 6 3 4 6 8 4 3 3 6 8 8 4 6 4 6 5 5 9 6 8 8 6 5 10

Organize the information into a frequency distribution.
a. How many classes would you suggest?
b. What is the most suitable class interval?
c. What is the lower limit of the initial class?
d. Create the frequency distribution.
e. Describe the shape of the distribution.

32. David Wise handles his own investment portfolio, and has done so for many years. Listed below is the holding time (recorded to the nearest whole year) between purchase and sale for his collection of 36 stocks.

8 8 6 11 11 9 8 5 11 4 8 5 14 7 12 8 6 11 9 7 9 15 8 8 12 5 9 8 5 9 10 11 3 9 8 6

a. How many classes would you propose?
b. What class interval would you suggest?
c. What quantity would you use for the lower limit of the initial class?
d. Using your responses to parts (a), (b), and (c), create a frequency distribution.
e. Describe the shape of the frequency distribution.

33. You are exploring the music in your iTunes library. The total play counts over the past year for the 27 songs on your “smart playlist” are shown below. Make a frequency distribution of the counts and describe its shape. It is often claimed that a small fraction of a person’s songs will account for most of their total plays. Does this seem to be the case here?

128 56 54 91 190 23 160 298 445 50 578 494 37 677 18 74 70 868 108 71 466 23 84 38 26 814 17

34. The monthly issues of the Journal of Finance are available on the Internet. The table below shows the number of times an issue was downloaded over the last 33 months. Suppose that you wish to summarize the number of downloads with a frequency distribution.

312 2,753 2,595 6,057 7,624 6,624 6,362 6,575 7,760 7,085 7,272 5,967 5,256 6,160 6,238 6,709 7,193 5,631 6,490 6,682 7,829 7,091 6,871 6,230 7,253 5,507 5,676 6,974 6,915 4,999 5,689 6,143 7,086

a. How many classes would you propose?
b. What class interval would you suggest?
c. What quantity would you use for the lower limit of the initial class?
d. Using your responses to parts (a), (b), and (c), create a frequency distribution.
e. Describe the shape of the frequency distribution.

35. The following histogram shows the scores on the first exam for a statistics class.

[Chart: Frequency (vertical axis, 0 to 25 in steps of 5) versus Score (horizontal axis, 50 to 100 in steps of 10).]

a. How many students took the exam?
b. What is the class interval?
c. What is the class midpoint for the first class?
d. How many students earned a score of less than 70?

36. The following chart summarizes the selling price of homes sold last month in the Sarasota, Florida, area.

[Chart: Frequency (left vertical axis, 0 to 250 in steps of 50) and Percent (right vertical axis, up to 100) versus Selling Price ($000) (horizontal axis, 0 to 350 in steps of 50).]

a. What is the chart called?
b. How many homes were sold during the last month?
c. What is the class interval?
d. About 75% of the houses sold for less than what amount?
e. One hundred seventy-five of the homes sold for less than what amount?


37. A chain of sport shops catering to beginning skiers, headquartered in Aspen, Colorado, plans to conduct a study of how much a beginning skier spends on his or her initial purchase of equipment and supplies. Based on these figures, it wants to explore the possibility of offering combinations, such as a pair of boots and a pair of skis, to induce customers to buy more. A sample of 44 cash register receipts revealed these initial purchases:

$140 $ 82 $265 $168 $ 90 $114 $172 $230 $142 86 125 235 212 171 149 156 162 118 139 149 132 105 162 126 216 195 127 161 135 172 220 229 129 87 128 126 175 127 149 126 121 118 172 126

a. Arrive at a suggested class interval.
b. Organize the data into a frequency distribution using a lower limit of $70.
c. Interpret your findings.

38. The numbers of outstanding shares for 24 publicly traded companies are listed in the following table.

Company                          Number of Outstanding Shares (millions)
Southwest Airlines                 738
FirstEnergy                        418
Harley Davidson                    226
Entergy                            178
Chevron                          1,957
Pacific Gas and Electric           430
DuPont                             932
Westinghouse                        22
Eversource                         314
Facebook                         1,067
Google, Inc.                        64
Apple                              941
Costco                             436
Home Depot                       1,495
DTE Energy                         172
Dow Chemical                     1,199
Eastman Kodak                      272
American Electric Power            485
ITT Corporation                     93
Ameren                             243
Virginia Electric and Power        575
Public Service Electric & Gas      506
Consumers Energy                   265
Starbucks                          744

a. Using the number of outstanding shares, summarize the companies with a frequency distribution.
b. Display the frequency distribution with a frequency polygon.
c. Create a cumulative frequency distribution of the outstanding shares.
d. Display the cumulative frequency distribution with a cumulative frequency polygon.
e. Based on the cumulative relative frequency distribution, 75% of the companies have less than “what number” of outstanding shares?
f. Write a brief analysis of this group of companies based on your statistical summaries of “number of outstanding shares.”

39. A recent survey showed that the typical American car owner spends $2,950 per year on operating expenses. Below is a breakdown of the various expenditure items. Draw an appropriate chart to portray the data and summarize your findings in a brief report.

Expenditure Item          Amount
Fuel                      $  603
Interest on car loan         279
Repairs                      930
Insurance and license        646
Depreciation                 492
Total                     $2,950


40. Midland National Bank selected a sample of 40 student checking accounts. Below are their end-of-the-month balances.

$404 $ 74 $234 $149 $279 $215 $123 $ 55 $ 43 $321 87 234 68 489 57 185 141 758 72 863

703 125 350 440 37 252 27 521 302 127 968 712 503 489 327 608 358 425 303 203

a. Tally the data into a frequency distribution using $100 as a class interval and $0 as the starting point.
b. Draw a cumulative frequency polygon.
c. The bank considers any student with an ending balance of $400 or more a “preferred customer.” Estimate the percentage of preferred customers.
d. The bank is also considering a service charge to the lowest 10% of the ending balances. What would you recommend as the cutoff point between those who have to pay a service charge and those who do not?

41. Residents of the state of South Carolina earned a total of $69.5 billion in adjusted gross income. Seventy-three percent of the total was in wages and salaries; 11% in dividends, interest, and capital gains; 8% in IRAs and taxable pensions; 3% in business income pensions; 2% in Social Security; and the remaining 3% from other sources. Develop a pie chart depicting the breakdown of adjusted gross income. Write a paragraph summarizing the information.

42. A recent study of home technologies reported the number of hours of personal computer usage per week for a sample of 60 persons. Excluded from the study were people who worked out of their home and used the computer as a part of their work.

9.3 5.3 6.3 8.8 6.5 0.6 5.2 6.6 9.3 4.3 6.3 2.1 2.7 0.4 3.7 3.3 1.1 2.7 6.7 6.5 4.3 9.7 7.7 5.2 1.7 8.5 4.2 5.5 5.1 5.6 5.4 4.8 2.1 10.1 1.3 5.6 2.4 2.4 4.7 1.7 2.0 6.7 1.1 6.7 2.2 2.6 9.8 6.4 4.9 5.2 4.5 9.3 7.9 4.6 4.3 4.5 9.2 8.5 6.0 8.1

a. Organize the data into a frequency distribution. How many classes would you suggest? What value would you suggest for a class interval?
b. Draw a histogram. Describe your result.

43. Merrill Lynch recently completed a study regarding the size of online investment portfolios (stocks, bonds, mutual funds, and certificates of deposit) for a sample of clients in the 40 up to 50 years old age group. Listed following is the value of all the investments in thousands of dollars for the 70 participants in the study.

$669.9 $ 7.5 $ 77.2 $ 7.5 $125.7 $516.9 $ 219.9 $645.2 301.9 235.4 716.4 145.3 26.6 187.2 315.5 89.2 136.4 616.9 440.6 408.2 34.4 296.1 185.4 526.3 380.7 3.3 363.2 51.9 52.2 107.5 82.9 63.0 228.6 308.7 126.7 430.3 82.0 227.0 321.1 403.4 39.5 124.3 118.1 23.9 352.8 156.7 276.3 23.5 31.3 301.2 35.7 154.9 174.3 100.6 236.7 171.9 221.1 43.4 212.3 243.3 315.4 5.9 1,002.2 171.7 295.7 437.0 87.8 302.1 268.1 899.5

a. Organize the data into a frequency distribution. How many classes would you suggest? What value would you suggest for a class interval?
b. Draw a histogram. Financial experts suggest that this age group of people have at least five times their salary saved. As a benchmark, assume an investment portfolio of $500,000 would support retirement in 10–15 years. In writing, summarize your results.


44. A total of 5.9% of the prime-time viewing audience watched shows on ABC, 7.6% watched shows on CBS, 5.5% on Fox, 6.0% on NBC, 2.0% on Warner Brothers, and 2.2% on UPN. A total of 70.8% of the audience watched shows on other cable networks, such as CNN and ESPN. You can find the latest information on TV viewing from the following website: http://www.nielsen.com/us/en/top10s.html/. Develop a pie chart or a bar chart to depict this information. Write a paragraph summarizing your findings.

45. Refer to the following chart:

[Chart: Contact for Job Placement at Wake Forest University]

Networking and Connections    70%
On-Campus Recruiting          10%
Job Posting Websites          20%

a. What is the name given to this type of chart?
b. Suppose that 1,000 graduates will start a new job shortly after graduation. Estimate the number of graduates whose first contact for employment occurred through networking and other connections.
c. Would it be reasonable to conclude that about 90% of job placements were made through networking, connections, and job posting websites? Cite evidence.

46. The following chart depicts the annual revenues, by type of tax, for the state of Georgia.

[Chart: Annual Revenue, State of Georgia]

Sales        44.54%
Income       43.34%
Corporate     8.31%
License       2.9%
Other         0.9%

a. What percentage of the state revenue is accounted for by sales tax and individual income tax?
b. Which category will generate more revenue: corporate taxes or license fees?
c. The total annual revenue for the state of Georgia is $6.3 billion. Estimate the amount of revenue in billions of dollars for sales taxes and for individual taxes.


47. In 2014, the United States exported a total of $376 billion worth of products to Canada. The five largest categories were:

Product                 Amount ($ billions)
Vehicles                $63.3
Machinery                59.7
Electrical machinery     36.6
Mineral fuel and oil     24.8
Plastic                  17.0

a. Use a software package to develop a bar chart.
b. What percentage of the United States’ total exports to Canada is represented by the two categories “Machinery” and “Electrical Machinery”?
c. What percentage of the top five exported products do “Machinery” and “Electrical Machinery” represent?

48. In the United States, the industrial revolution of the early 20th century changed farming by making it more efficient. For example, in 1910 U.S. farms used 24.2 million horses and mules and only about 1,000 tractors. By 1960, 4.6 million tractors were used and only 3.2 million horses and mules. An outcome of making farming more efficient is the reduction of the number of farms from over 6 million in 1920 to about 2.2 million farms today. Listed below is the number of farms, in thousands, for each of the 50 states. Summarize the data and write a paragraph that describes your findings.

50 12 5 28 59 19 35 22 80 5 8 48 3 75 25 77 46 68 10 69 77 25 13 20 35 6 52 61 36 38 88 1 75 246 59 50 44 98 74 2 32 42 7 31 28 9 8 44 25 37

49. One of the most popular candies in the United States is M&M’s produced by the Mars Company. In the beginning M&M’s were all brown. Now they are produced in red, green, blue, orange, brown, and yellow. Recently, the purchase of a 14-ounce bag of M&M’s Plain had 444 candies with the following breakdown by color: 130 brown, 98 yellow, 96 red, 35 orange, 52 blue, and 33 green. Develop a chart depicting this information and write a paragraph summarizing the results.

50. The number of families who used the Minneapolis YWCA day care service was recorded during a 30-day period. The results are as follows:

31 49 19 62 24 45 23 51 55 60 40 35 54 26 57 37 43 65 18 41 50 56 4 54 39 52 35 51 63 42

a. Construct a cumulative frequency distribution.
b. Sketch a graph of the cumulative frequency polygon.
c. How many days saw fewer than 30 families utilize the day care center?
d. Based on cumulative relative frequencies, how busy were the highest 80% of the days?

DATA ANALYTICS

51. Refer to the North Valley Real Estate data that reports information on homes sold during the last year. For the variable price, select an appropriate class interval and organize the selling prices into a frequency distribution. Write a brief report summarizing your findings. Be sure to answer the following questions in your report.
a. Around what values of price do the data tend to cluster?
b. Based on the frequency distribution, what is the typical selling price in the first class? What is the typical selling price in the last class?
c. Draw a cumulative relative frequency distribution. Using this distribution, fifty percent of the homes sold for what price or less? Estimate the lower price of the top ten percent of homes sold. About what percent of the homes sold for less than $300,000?
d. Refer to the variable bedrooms. Draw a bar chart showing the number of homes sold with 2, 3, 4 or more bedrooms. Write a description of the distribution.

52. Refer to the Baseball 2016 data that report information on the 30 Major League Baseball teams for the 2016 season. Create a frequency distribution for the Team Salary variable and answer the following questions.
a. What is the typical salary for a team? What is the range of the salaries?
b. Comment on the shape of the distribution. Does it appear that any of the teams have a salary that is out of line with the others?
c. Draw a cumulative relative frequency distribution of team salary. Using this distribution, forty percent of the teams have a salary of less than what amount? About how many teams have a total salary of more than $220 million?

53. Refer to the Lincolnville School District bus data. Select the variable referring to the number of miles traveled since the last maintenance, and then organize these data into a frequency distribution.
a. What is a typical amount of miles traveled? What is the range?
b. Comment on the shape of the distribution. Are there any outliers in terms of miles driven?
c. Draw a cumulative relative frequency distribution. Forty percent of the buses were driven fewer than how many miles? How many buses were driven less than 10,500 miles?
d. Refer to the variables regarding the bus manufacturer and the bus capacity. Draw a pie chart of each variable and write a description of your results.

Describing Data: NUMERICAL MEASURES 3

LEARNING OBJECTIVES

When you have completed this chapter, you will be able to:

LO3-1 Compute and interpret the mean, the median, and the mode.

LO3-2 Compute a weighted mean.

LO3-3 Compute and interpret the geometric mean.

LO3-4 Compute and interpret the range, variance, and standard deviation.

LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule.

LO3-6 Compute the mean and standard deviation of grouped data.

THE KENTUCKY DERBY is held the first Saturday in May at Churchill Downs in Louisville, Kentucky. The race track is one and one-quarter miles. The table in Exercise 82 shows the winners since 1990, their margin of victory, the winning time, and the payoff on a $2 bet. Determine the mean and median for the variables winning time and payoff on a $2 bet. (See Exercise 82 and LO3-1.)

52 CHAPTER 3

INTRODUCTION

Chapter 2 began our study of descriptive statistics. To summarize raw data into a meaningful form, we organized qualitative data into a frequency table and portrayed the results in a bar chart. In a similar fashion, we organized quantitative data into a frequency distribution and portrayed the results in a histogram. We also looked at other graphical techniques such as pie charts to portray qualitative data and frequency polygons to portray quantitative data.

This chapter is concerned with two numerical ways of describing quantitative variables, namely, measures of location and measures of dispersion. Measures of location are often referred to as averages. The purpose of a measure of location is to pinpoint the center of a distribution of data. An average is a measure of location that shows the central value of the data. Averages appear daily on TV, on various websites, in the newspaper, and in other journals. Here are some examples:

• The average U.S. home changes ownership every 11.8 years.

• An American receives an average of 568 pieces of mail per year.

• The average American home has more TV sets than people. There are 2.73 TV sets and 2.55 people in the typical home.

• The average American couple spends $20,398 for their wedding, while their budget is 50% less. This does not include the cost of a honeymoon or engagement ring.

• The average price of a theater ticket in the United States is $8.31, according to the National Association of Theater Owners.

If we consider only measures of location in a set of data, or if we compare several sets of data using central values, we may draw an erroneous conclusion. In addition to measures of location, we should consider the dispersion (often called the variation or the spread) in the data. As an illustration, suppose the average annual income of executives for Internet-related companies is $80,000, and the average income for executives in pharmaceutical firms is also $80,000. If we looked only at the average incomes, we might wrongly conclude that the distributions of the two salaries are the same. However, we need to examine the dispersion or spread of the distributions of salary. A look at the salary ranges indicates that this conclusion of equal distributions is not correct. The salaries for the executives in the Internet firms range from $70,000 to $90,000, but salaries for the marketing executives in pharmaceuticals range from $40,000 to $120,000. Thus, we conclude that although the average salaries are the same for the two industries, there is much more spread or dispersion in salaries for the pharmaceutical executives. To describe the dispersion, we will consider the range, the variance, and the standard deviation.

MEASURES OF LOCATION

We begin by discussing measures of location. There is not just one measure of location; in fact, there are many. We will consider five: the arithmetic mean, the median, the mode, the weighted mean, and the geometric mean. The arithmetic mean is the most widely used and widely reported measure of location. We study the mean as both a population parameter and a sample statistic.

LO3-1 Compute and interpret the mean, the median, and the mode.

STATISTICS IN ACTION

Did you ever meet the “average” American man? Well, his name is Robert (that is the nominal level of measurement) and he is 31 years old (that is the ratio level), is 69.5 inches tall (again the ratio level of measurement), weighs 172 pounds, wears a size 9½ shoe, has a 34-inch waist, and wears a size 40 suit. In addition, the average man eats 4 pounds of potato chips, watches 1,456 hours of TV, and eats 26 pounds of bananas each year, and also sleeps 7.7 hours per night. The average American woman is 5′ 4″ tall and weighs 140 pounds, while the average American model is 5′ 11″ tall and weighs 117 pounds. On any given day, almost half of the women in the United States are on a diet. Idolized in the 1950s, Marilyn Monroe would be considered overweight by today’s standards. She fluctuated between a size 14 and a size 18 dress, and was a healthy and attractive woman.


The Population Mean

Many studies involve all the values in a population. For example, there are 12 sales associates employed at the Reynolds Road Carpet Outlet. The mean amount of commission they earned last month was $1,345. This is a population value because we considered the commission of all the sales associates. Other examples of a population mean would be:

• The mean closing price for Johnson & Johnson stock for the last 5 days is $95.47.

• The mean number of hours of overtime worked last week by the six welders in the welding department of Butts Welding Inc. is 6.45 hours.

• Caryn Tirsch began a website last month devoted to organic gardening. The mean number of hits on her site for the 31 days in July was 84.36.

For raw data, that is, data that have not been grouped in a frequency distribution, the population mean is the sum of all the values in the population divided by the number of values in the population. To find the population mean, we use the following formula.

Population mean = (Sum of all the values in the population) / (Number of values in the population)

Instead of writing out in words the full directions for computing the population mean (or any other measure), it is more convenient to use the shorthand symbols of mathematics. The mean of the population using mathematical symbols is:

POPULATION MEAN   \[ \mu = \frac{\sum x}{N} \]   (3–1)

where:
μ represents the population mean. It is the Greek lowercase letter “mu.”
N is the number of values in the population.
x represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
Σx is the sum of the x values in the population.

Any measurable characteristic of a population is called a parameter. The mean of a population is an example of a parameter.

PARAMETER A characteristic of a population.

EXAMPLE

There are 42 exits on I-75 through the state of Kentucky. Listed below are the distances between exits (in miles).

11 4 10 4 9 3 8 10 3 14 1 10 3 5 2 2 5 6 1 2 2 3 7 1 3 7 8 10 1 4 7 5 2 2 5 1 1 3 3 1 2 1

Why is this information a population? What is the mean number of miles between exits?

SOLUTION

This is a population because we are considering all the exits on I-75 in Kentucky. We add the distances between each of the 42 exits. The total distance is 192 miles. To find the arithmetic mean, we divide this total by 42. So the arithmetic mean is 4.57 miles, found by 192/42. From formula (3–1):

\[ \mu = \frac{\sum x}{N} = \frac{11 + 4 + 10 + \cdots + 1}{42} = \frac{192}{42} = 4.57 \]

How do we interpret the value of 4.57? It is the typical number of miles between exits. Because we considered all the exits on I-75 in Kentucky, this value is a population parameter.
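
As a quick numerical check of formula (3–1), the following minimal Python sketch recomputes the mean of the 42 distances listed in the example. The variable names are illustrative choices and do not come from the text.

```python
# Distances (in miles) between the 42 exits on I-75 in Kentucky, from the example.
distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]

N = len(distances)      # number of values in the population
total = sum(distances)  # sum of all the x values
mu = total / N          # population mean, formula (3-1)

print(N, total, round(mu, 2))  # 42 192 4.57
```

Because every exit is included, the result is a parameter rather than an estimate.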

The Sample Mean

As explained in Chapter 1, we often select a sample from the population to estimate a specific characteristic of the population. Smucker’s quality assurance department needs to be assured that each jar labeled as containing 12 ounces of orange marmalade actually contains that amount. It would be very expensive and time-consuming to check the weight of each jar. Therefore, a sample of 20 jars is selected, the mean of the sample is determined, and that value is used to estimate the amount in each jar.

For raw data, that is, ungrouped data, the mean is the sum of all the sampled values divided by the total number of sampled values. To find the mean for a sample:

Sample mean = (Sum of all the values in the sample) / (Number of values in the sample)

The mean of a sample and the mean of a population are computed in the same way, but the shorthand notation used is different. The formula for the mean of a sample is:

SAMPLE MEAN   \[ \bar{x} = \frac{\sum x}{n} \]   (3–2)

where:
x̄ represents the sample mean. It is read “x bar.”
n is the number of values in the sample.
x represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
Σx is the sum of the x values in the sample.

The mean of a sample, or any other measure based on sample data, is called a statistic. If the mean weight of a sample of 10 jars of Smucker’s orange marmalade is 11.5 ounces, this is an example of a statistic.

STATISTIC A characteristic of a sample.

EXAMPLE

Verizon is studying the number of monthly minutes used by clients in a particular cell phone rate plan. A random sample of 12 clients showed the following number of minutes used last month.

90 77 94 89 119 112 91 110 92 100 113 83

What is the arithmetic mean number of minutes used last month?

SOLUTION

Using formula (3–2), the sample mean is:

Sample mean = (Sum of all values in the sample) / (Number of values in the sample)

\[ \bar{x} = \frac{\sum x}{n} = \frac{90 + 77 + \cdots + 83}{12} = \frac{1{,}170}{12} = 97.5 \]

The arithmetic mean number of minutes used last month by the sample of cell phone users is 97.5 minutes.

Properties of the Arithmetic Mean

The arithmetic mean is a widely used measure of location. It has several important properties:

1. To compute a mean, the data must be measured at the interval or ratio level. Recall from Chapter 1 that ratio-level data include such data as ages, incomes, and weights, with the distance between numbers being constant.

2. All the values are included in computing the mean.

3. The mean is unique. That is, there is only one mean in a set of data. Later in the chapter, we will discover a measure of location that might appear twice, or more than twice, in a set of data.

4. The sum of the deviations of each value from the mean is zero. Expressed symbolically:

\[ \sum (x - \bar{x}) = 0 \]

As an example, the mean of 3, 8, and 4 is 5. Then:

\[ \sum (x - \bar{x}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0 \]

Thus, we can consider the mean as a balance point for a set of data. To illustrate, we have a long board with the numbers 1, 2, 3, . . . , 9 evenly spaced on it. Suppose three bars of equal weight were placed on the board at numbers 3, 4, and 8, and the balance point was set at 5, the mean of the three numbers. We would find that the board is balanced perfectly! The deviations below the mean (−3) are equal to the deviations above the mean (+3). Shown schematically:

[Schematic: a board marked 1 through 9, with weights at 3, 4, and 8 and the balance point at x̄ = 5]
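
The sample mean of the Verizon data and the zero-sum property of deviations can both be checked in a few lines. This is only a sketch in Python; the variable names are our own and not part of the text.

```python
# Minutes used last month by the sample of 12 Verizon clients.
minutes = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]
x_bar = sum(minutes) / len(minutes)  # sample mean, formula (3-2)

# Property 4: the deviations from the mean sum to zero.
values = [3, 8, 4]
mean_values = sum(values) / len(values)
deviations = [x - mean_values for x in values]

print(x_bar)            # 97.5
print(deviations)       # [-2.0, 3.0, -1.0]
print(sum(deviations))  # 0.0
```

The negative deviations (−2 and −1) offset the single positive deviation (+3), which is why the mean acts as the balance point.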

The mean does have a weakness. Recall that the mean uses the value of every item in a sample, or population, in its computation. If one or two of these values are either extremely large or extremely small compared to the majority of data, the mean might not be an appropriate average to represent the data. For example, suppose the annual incomes of a sample of financial planners at Merrill Lynch are $62,900, $61,600, $62,500, $60,800, and $1,200,000. The mean income is $289,560. Obviously, it is not representative of this group because all but one financial planner has an income in the $60,000 to $63,000 range. One income ($1.2 million) is unduly affecting the mean.
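
To see how strongly one value can pull the mean, the short Python sketch below computes the mean of the five incomes and then, purely for illustration, the mean of the four incomes that remain when the $1.2 million value is set aside. The $1,000,000 cutoff is an assumption made for this sketch, not a rule from the text.

```python
# Annual incomes of the sample of financial planners.
incomes = [62_900, 61_600, 62_500, 60_800, 1_200_000]

mean_all = sum(incomes) / len(incomes)

# Illustrative cutoff (an assumption): set aside incomes of $1,000,000 or more.
typical = [x for x in incomes if x < 1_000_000]
mean_typical = sum(typical) / len(typical)

print(mean_all)      # 289560.0
print(mean_typical)  # 61950.0
```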

SELF-REVIEW 3–1

1. The annual incomes of a sample of middle-management employees at Westinghouse are $62,900, $69,100, $58,300, and $76,800.
(a) Give the formula for the sample mean.
(b) Find the sample mean.
(c) Is the mean you computed in (b) a statistic or a parameter? Why?
(d) What is your best estimate of the population mean?

2. The six students in Computer Science 411 are a population. Their final course grades are 92, 96, 61, 86, 79, and 84.
(a) Give the formula for the population mean.
(b) Compute the mean course grade.
(c) Is the mean you computed in (b) a statistic or a parameter? Why?

EXERCISES

The answers to the odd-numbered exercises are in Appendix D.

1. Compute the mean of the following population values: 6, 3, 5, 7, 6.

2. Compute the mean of the following population values: 7, 5, 7, 3, 7, 4.

3. a. Compute the mean of the following sample values: 5, 9, 4, 10.
b. Show that Σ(x − x̄) = 0.

4. a. Compute the mean of the following sample values: 1.3, 7.0, 3.6, 4.1, 5.0.
b. Show that Σ(x − x̄) = 0.

5. Compute the mean of the following sample values: 16.25, 12.91, 14.58.

6. Suppose you go to the grocery store and spend $61.85 for the purchase of 14 items. What is the mean price per item?

For Exercises 7–10, (a) compute the arithmetic mean and (b) indicate whether it is a statistic or a parameter.

7. There are 10 salespeople employed by Midtown Ford. The number of new cars sold last month by the respective salespeople were: 15, 23, 4, 19, 18, 10, 10, 8, 28, 19.

8. A mail-order company counted the number of incoming calls per day to the company’s toll-free number during the first 7 days in May: 14, 24, 19, 31, 36, 26, 17.

9. The Cambridge Power and Light Company selected a random sample of 20 residential customers. Following are the amounts, to the nearest dollar, the customers were charged for electrical service last month:

54 48 58 50 25 47 75 46 60 70 67 68 39 35 56 66 33 62 65 67

10. A Human Resources manager at Metal Technologies studied the overtime hours of welders. A sample of 15 welders showed the following number of overtime hours worked last month.

13 13 12 15 7 15 5 12 6 7 12 10 9 13 12

11. AAA Heating and Air Conditioning completed 30 jobs last month with a mean revenue of $5,430 per job. The president wants to know the total revenue for the month. Based on the limited information, can you compute the total revenue? What is it?

12. A large pharmaceutical company hires business administration graduates to sell its products. The company is growing rapidly and dedicates only 1 day of sales training for new salespeople. The company’s goal for new salespeople is $10,000 per month. The goal is based on the current mean sales for the entire company, which is $10,000 per month. After reviewing the retention rates of new employees, the company finds that only 1 in 10 new employees stays longer than 3 months. Comment on using the current mean sales per month as a sales goal for new employees. Why do new employees leave the company?

The Median

We have stressed that, for data containing one or two very large or very small values, the arithmetic mean may not be representative. The center for such data is better described by a measure of location called the median.

To illustrate the need for a measure of location other than the arithmetic mean, suppose you are seeking to buy a condominium in Palm Aire. Your real estate agent says that the typical price of the units currently available is $110,000. Would you still want to look? If you had budgeted your maximum purchase price at $75,000, you might think they are out of your price range. However, checking the prices of the individual units might change your mind. They are $60,000, $65,000, $70,000, and $80,000, and a superdeluxe penthouse costs $275,000. The arithmetic mean price is $110,000, as the real estate agent reported, but one price ($275,000) is pulling the arithmetic mean upward, causing it to be an unrepresentative average. It does seem that a price around $70,000 is a more typical or representative average, and it is. In cases such as this, the median provides a more valid measure of location.

MEDIAN The midpoint of the values after they have been ordered from the minimum to the maximum values.

The median price of the units available is $70,000. To determine this, we order the prices from the minimum value ($60,000) to the maximum value ($275,000) and select the middle value ($70,000). For the median, the data must be at least an ordinal level of measurement.

Prices Ordered from                Prices Ordered from
Minimum to Maximum                 Maximum to Minimum

$ 60,000                           $275,000
  65,000                             80,000
  70,000        ← Median →           70,000
  80,000                             65,000
 275,000                             60,000

Note that there is the same number of prices below the median of $70,000 as above it. The median is, therefore, unaffected by extremely low or high prices. Had the highest price been $90,000, or $300,000, or even $1 million, the median price would still be $70,000. Likewise, had the lowest price been $20,000 or $50,000, the median price would still be $70,000.

In the previous illustration, there are an odd number of observations (five). How is the median determined for an even number of observations? As before, the observations are ordered. Then, by convention, to obtain a unique value, we calculate the mean of the two middle observations. So for an even number of observations, the median may not be one of the given values.

EXAMPLE

Facebook is a popular social networking website. Users can add friends and send them messages, and update their personal profiles to notify friends about themselves and their activities. A sample of 10 adults revealed they spent the following number of hours last month using Facebook.

3 5 7 5 9 1 3 9 17 10

Find the median number of hours.

SOLUTION

Note that the number of adults sampled is even (10). The first step, as before, is to order the hours using Facebook from the minimum value to the maximum value. Then identify the two middle times. The arithmetic mean of the two middle observations gives us the median hours. Arranging the values from minimum to maximum:

1 3 3 5 5 7 9 9 10 17

The median is found by averaging the two middle values. The middle values are 5 hours and 7 hours, and the mean of these two values is 6. We conclude that the typical adult Facebook user spends 6 hours per month at the website. Notice that the median is not one of the values. Also, half of the times are below the median and half are above it.
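
The odd and even cases can be compared side by side with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not from the text.

```python
from statistics import mean, median

# Odd number of observations: the five Palm Aire condominium prices.
condo_prices = [60_000, 65_000, 70_000, 80_000, 275_000]

# Even number of observations: hours on Facebook for the sample of 10 adults.
facebook_hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

print(mean(condo_prices))      # 110000, pulled upward by the penthouse
print(median(condo_prices))    # 70000, the middle ordered value
print(sorted(facebook_hours))  # [1, 3, 3, 5, 5, 7, 9, 9, 10, 17]
print(median(facebook_hours))  # 6.0, the mean of the two middle values 5 and 7
```

Note that `median` sorts the data internally, so the values do not need to be ordered beforehand.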


The major properties of the median are:

1. It is not affected by extremely large or small values. Therefore, the median is a valuable measure of location when such values do occur.

2. It can be computed for ordinal-level data or higher. Recall from Chapter 1 that ordinal-level data can be ranked from low to high.

The Mode

The mode is another measure of location.

MODE The value of the observation that appears most frequently.

The mode is especially useful in summarizing nominal-level data. As an example of its use for nominal-level data, a company has developed five bath oils. The bar chart in Chart 3–1 shows the results of a marketing survey designed to find which bath oil consumers prefer. The largest number of respondents favored Lamoure, as evidenced by the highest bar. Thus, Lamoure is the mode.

CHART 3–1 Number of Respondents Favoring Various Bath Oils

[Bar chart. Vertical axis: Number of Responses (scale 100 to 400). Horizontal axis: Bath oil (Amor, Lamoure, Soothing, Smell Nice, Far Out). The highest bar is labeled “Mode.”]

EXAMPLE

Recall the data regarding the distance in miles between exits on I-75 in Kentucky. The information is repeated below.

11 4 10 4 9 3 8 10 3 14 1 10 3 5 2 2 5 6 1 2 2 3 7 1 3 7 8 10 1 4 7 5 2 2 5 1 1 3 3 1 2 1

What is the modal distance?

SOLUTION

The first step is to organize the distances into a frequency table. This will help us determine the distance that occurs most frequently.

Distance in Miles between Exits    Frequency
 1                                  8
 2                                  7
 3                                  7
 4                                  3
 5                                  4
 6                                  1
 7                                  3
 8                                  2
 9                                  1
10                                  4
11                                  1
14                                  1
Total                              42

The distance that occurs most often is 1 mile. This happens eight times; that is, there are eight exits that are 1 mile apart. So the modal distance between exits is 1 mile.

Which of the three measures of location (mean, median, or mode) best represents the central location of these data? Is the mode the best measure of location to represent the Kentucky data? No. The mode assumes only the nominal scale of measurement and the variable miles is measured using the ratio scale. We calculated the mean to be 4.57 miles. See page 54. Is the mean the best measure of location to represent these data? Probably not. There are several cases in which the distance between exits is large. These values are affecting the mean, making it too large and not representative of the distances between exits. What about the median? The median distance is 3 miles. That is, half of the distances between exits are 3 miles or less. In this case, the median of 3 miles between exits is probably a more representative measure of the distance between exits.

In summary, we can determine the mode for all levels of data: nominal, ordinal, interval, and ratio. The mode also has the advantage of not being affected by extremely high or low values.

The mode does have disadvantages, however, that cause it to be used less frequently than the mean or median. For many sets of data, there is no mode because no value appears more than once. For example, there is no mode for this set of price data because every value occurs once: $19, $21, $23, $20, and $18. Conversely, for some data sets there is more than one mode. Suppose the ages of the individuals in a stock investment club are 22, 26, 27, 27, 31, 35, and 35. Both the ages 27 and 35 are modes. Thus, this grouping of ages is referred to as bimodal (having two modes). One would question the use of two modes to represent the location of this set of age data.
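
The frequency table, the modal distance, and the bimodal age example can be reproduced with Python's standard library. This is a sketch only; the names are our own. Note one difference in convention: `statistics.multimode` returns every value when all values occur once, whereas the text describes such data as having no mode.

```python
from collections import Counter
from statistics import median, multimode

distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]

# Frequency table: distance -> number of occurrences.
freq = Counter(distances)
modal_distance, count = freq.most_common(1)[0]

print(sorted(freq.items()))   # pairs such as (1, 8), (2, 7), (3, 7), ...
print(modal_distance, count)  # 1 8
print(median(distances))      # 3.0

# Ages in the stock investment club: two values tie for most frequent.
ages = [22, 26, 27, 27, 31, 35, 35]
print(multimode(ages))        # [27, 35]
```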

SELF-REVIEW 3–2

1. A sample of single persons in Towson, Texas, receiving Social Security payments revealed these monthly benefits: $852, $598, $580, $1,374, $960, $878, and $1,130.
(a) What is the median monthly benefit?
(b) How many observations are below the median? Above it?

2. The number of work stoppages in the United States over the last 10 years are 22, 20, 21, 15, 5, 11, 19, 19, 15, and 11.
(a) What is the median number of stoppages?
(b) How many observations are below the median? Above it?
(c) What is the modal number of work stoppages?


EXERCISES

13. What would you report as the modal value for a set of observations if there were a total of:
a. 10 observations and no two values were the same?
b. 6 observations and they were all the same?
c. 6 observations and the values were 1, 2, 3, 3, 4, and 4?

For Exercises 14–16, determine the (a) mean, (b) median, and (c) mode.

14. The following is the number of oil changes for the last 7 days at the Jiffy Lube located at the corner of Elm Street and Pennsylvania Avenue.

41 15 39 54 31 15 33

15. The following is the percent change in net income from last year to this year for a sample of 12 construction companies in Denver.

5 1 −10 −6 5 12 7 8 6 5 −1 11

16. The following are the ages of the 10 people in the Java Coffee Shop at the Southwyck Shopping Mall at 10 a.m.

21 41 20 23 24 33 37 42 23 29

17. Several indicators of long-term economic growth in the United States and their annual percent change are listed below.

Economic Indicator        Percent Change    Economic Indicator              Percent Change
Inflation                 4.5%              Real GNP                        2.9%
Exports                   4.7               Investment (residential)        3.6
Imports                   2.3               Investment (nonresidential)     2.1
Real disposable income    2.9               Productivity (total)            1.4
Consumption               2.7               Productivity (manufacturing)    5.2

a. What is the median percent change?
b. What is the modal percent change?

18. Sally Reynolds sells real estate along the coastal area of Northern California. Below are her total annual commissions between 2005 and 2015. Find the mean, median, and mode of the commissions she earned for the 11 years.

Year    Amount (thousands)
2005    292.16
2006    233.80
2007    206.97
2008    202.67
2009    164.69
2010    206.53
2011    237.51
2012    225.57
2013    255.33
2014    248.14
2015    269.11

19. The accounting firm of Rowatti and Koppel specializes in income tax returns for self-employed professionals, such as physicians, dentists, architects, and lawyers. The firm employs 11 accountants who prepare the returns. For last year, the number of returns prepared by each accountant was:

58 75 31 58 46 65 60 71 45 58 80

Find the mean, median, and mode for the number of returns prepared by each accountant. If you could report only one, which measure of location would you recommend reporting?

20. The demand for the video games provided by Mid-Tech Video Games Inc. has exploded in the last several years. Hence, the owner needs to hire several new technical people to keep up with the demand. Mid-Tech gives each applicant a special test that Dr. McGraw, the designer of the test, believes is closely related to the ability to create video games. For the general population, the mean on this test is 100. Below are the scores on this test for the applicants.

95 105 120 81 90 115 99 100 130 10

The president is interested in the overall quality of the job applicants based on this test. Compute the mean and the median scores for the 10 applicants. What would you report to the president? Does it seem that the applicants are better than the general population?

The Relative Positions of the Mean, Median, and Mode

Refer to the histogram in Chart 3–2. It is a symmetric distribution, which is also mound-shaped. This distribution has the same shape on either side of the center. If the histogram were folded in half, the two halves would be identical. For any symmetric, mound-shaped distribution, the mode, median, and mean are located at the center and are equal. They are all equal to 30 years in Chart 3–2. We should point out that there are symmetric distributions that are not mound-shaped.

CHART 3–2 A Symmetric Distribution

[Histogram. Horizontal axis: Age. Vertical axis: Frequency. Mean = 30, Median = 30, Mode = 30.]

The number of years corresponding to the highest point of the curve is the mode (30 years). Because the distribution is symmetrical, the median corresponds to the point where the distribution is cut in half (30 years). Also, because the arithmetic mean is the balance point of a distribution (as shown in the Properties of the Arithmetic Mean section on page 56), and the distribution is symmetric, the arithmetic mean is 30. Logically, any of the three measures would be appropriate to represent the distribution’s center.

If a distribution is nonsymmetrical, or skewed, the relationship among the three measures changes. In a positively skewed distribution, such as the distribution of weekly income in Chart 3–3, the arithmetic mean is the largest of the three measures. Why? Because the mean is influenced more than the median or mode by a few extremely high values. The median is generally the next largest measure in a positively skewed frequency distribution. The mode is the smallest of the three measures.

If the distribution is highly skewed, the mean would not be a good measure to use. The median and mode would be more representative.


Conversely, if a distribution is negatively skewed, such as the distribution of tensile strength in Chart 3–4, the mean is the lowest of the three measures. The mean is, of course, influenced by a few extremely low observations. The median is greater than the arithmetic mean, and the modal value is the largest of the three measures. Again, if the distribution is highly skewed, the mean should not be used to represent the data.

CHART 3–3 A Positively Skewed Distribution

[Horizontal axis: Weekly Income. Vertical axis: Frequency. Mode = 25, Median = 29, Mean = 60.]

CHART 3–4 A Negatively Skewed Distribution

[Horizontal axis: Tensile Strength. Vertical axis: Frequency. Mean = 45, Median = 76, Mode = 80.]
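
The ordering of the three measures under skewness can be illustrated with a small made-up data set. The values below are hypothetical and are not the data behind Charts 3–3 and 3–4; they are chosen only so that a single large value creates a long right tail, and the mirrored copy creates a long left tail.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data (not from the text): one large value on the right.
right_skewed = [20, 25, 25, 25, 30, 30, 40, 60, 200]

# Mirror the data around 110 to obtain a negatively skewed version.
left_skewed = [220 - x for x in right_skewed]

print(mode(right_skewed), median(right_skewed), round(mean(right_skewed), 2))
# 25 30 50.56  -> mode < median < mean

print(mode(left_skewed), median(left_skewed), round(mean(left_skewed), 2))
# 195 190 169.44  -> mean < median < mode
```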

SELF-REVIEW 3–3

The weekly sales from a sample of Hi-Tec electronic supply stores were organized into a frequency distribution. The mean of weekly sales was computed to be $105,900, the median $105,000, and the mode $104,500.
(a) Sketch the sales in the form of a smoothed frequency polygon. Note the location of the mean, median, and mode on the X-axis.
(b) Is the distribution symmetrical, positively skewed, or negatively skewed? Explain.

EXERCISES

21. The unemployment rate in the state of Alaska by month is given in the table below:

Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
8.7   8.8   8.7   7.8   7.3   7.8   6.6   6.5   6.5   6.8   7.3   7.6

Is it much different?

22. Big Orange Trucking is designing an information system for use in “in-cab”

City Wind Direction Temperature Pavement

Software Solution

We can use a statistical software package to find many measures of location.

EXAMPLE

Table 2–4 on page 26 shows the profit on the sales of 180 vehicles at Applewood Auto Group. Determine the mean and the median profit.

SOLUTION

The mean profit is $1,843.17 and the median is $1,882.50. These two values are less than $40 apart, so either value is reasonable. We can also see from the Excel output that there were 180 vehicles sold and their total profit was $331,770.00. We will describe the meaning of standard error, standard deviation, and other measures reported on the output later in this chapter and in later chapters.

What can we conclude? The typical profit on a vehicle is about $1,850. Management at Applewood might use this value for revenue projections. For example, if the dealership could increase the number of vehicles sold in a month from 180 to 200, this would result in an additional estimated $37,000 of revenue, found by 20($1,850).

THE WEIGHTED MEAN

LO3-2 Compute a weighted mean.

The weighted mean is a convenient way to compute the arithmetic mean when there are several observations of the same value. To explain, suppose the nearby Wendy’s Restaurant sold medium, large, and Biggie-sized soft drinks for $1.84, $2.07, and $2.40, respectively. Of the last 10 drinks sold, 3 were medium, 4 were large, and 3 were Biggie-sized. To find the mean price of the last 10 drinks sold, we could use formula (3–2).

\[ \bar{x} = \frac{\$1.84 + \$1.84 + \$1.84 + \$2.07 + \$2.07 + \$2.07 + \$2.07 + \$2.40 + \$2.40 + \$2.40}{10} = \frac{\$21.00}{10} = \$2.10 \]

The mean selling price of the last 10 drinks is $2.10. An easier way to find the mean selling price is to determine the weighted mean. That is, we multiply each observation by the number of times it occurs. We will refer to the weighted mean as x̄w. This is read “x bar sub w.”

\[ \bar{x}_w = \frac{3(\$1.84) + 4(\$2.07) + 3(\$2.40)}{10} = \frac{\$21.00}{10} = \$2.10 \]

In this case, the weights are frequency counts. However, any measure of importance could be used as a weight. In general, the weighted mean of a set of numbers designated x1, x2, x3, . . . , xn with the corresponding weights w1, w2, w3, . . . , wn is computed by:

WEIGHTED MEAN   \[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} \]   (3–3)

This may be shortened to:

\[ \bar{x}_w = \frac{\sum (wx)}{\sum w} \]

Note that the denominator of a weighted mean is always the sum of the weights.
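
Formula (3–3) translates directly into a short function. The sketch below is one possible Python version; the function name `weighted_mean` and its input checks are our own choices, and it is applied to the Wendy's soft drink prices from the text.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w), following formula (3-3)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [1.84, 2.07, 2.40]  # medium, large, Biggie-sized
counts = [3, 4, 3]           # number of each size among the last 10 drinks

print(round(weighted_mean(prices, counts), 2))  # 2.1
```

The result is rounded because decimal prices such as 2.07 are not stored exactly in binary floating point.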

EXAMPLE

The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid the 26 employees?

SOLUTION

To find the mean hourly rate, we multiply each of the hourly rates by the number of employees earning that rate. From formula (3–3), the mean hourly rate is

\[ \bar{x}_w = \frac{14(\$16.50) + 10(\$19.00) + 2(\$25.00)}{14 + 10 + 2} = \frac{\$471.00}{26} = \$18.1154 \]

The weighted mean hourly wage is rounded to $18.12.
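
One way to convince yourself that the weighted mean is just the ordinary mean in compact form is to expand the Carter Construction data into one wage per employee and compare the two results. The Python sketch below does this; the variable names are illustrative.

```python
from statistics import mean

rates = [16.50, 19.00, 25.00]  # hourly rates
employees = [14, 10, 2]        # number of employees paid at each rate

# Weighted mean: sum of (count * rate) divided by the total count.
weighted = sum(n * r for r, n in zip(rates, employees)) / sum(employees)

# The same data written out as 26 individual wages.
all_wages = [r for r, n in zip(rates, employees) for _ in range(n)]

print(len(all_wages))             # 26
print(round(weighted, 4))         # 18.1154
print(round(mean(all_wages), 4))  # 18.1154
```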

SELF-REVIEW 3–4

Springers sold 95 Antonelli men’s suits for the regular price of $400. For the spring sale, the suits were reduced to $200 and 126 were sold. At the final clearance, the price was reduced to $100 and the remaining 79 suits were sold.
(a) What was the weighted mean price of an Antonelli suit?
(b) Springers paid $200 a suit for the 300 suits. Comment on the store’s profit per suit if a salesperson receives a $25 commission for each one sold.

EXERCISES

23. In June, an investor purchased 300 shares of Oracle (an information technology company) stock at $20 per share. In August, she purchased an additional 400 shares at $25 per share. In November, she purchased an additional 400 shares, but the stock declined to $23 per share. What is the weighted mean price per share?

24. The Bookstall Inc. is a specialty bookstore concentrating on used books sold via the Internet. Paperbacks are $1.00 each, and hardcover books are $3.50. Of the 50 books sold last Tuesday morning, 40 were paperback and the rest were hardcover. What was the weighted mean price of a book?

25. The Loris Healthcare System employs 200 persons on the nursing staff. Fifty are nurse’s aides, 50 are practical nurses, and 100 are registered nurses. Nurse’s aides receive $8 an hour, practical nurses $15 an hour, and registered nurses $24 an hour. What is the weighted mean hourly wage?

26. Andrews and Associates specialize in corporate law. They charge $100 an hour for researching a case, $75 an hour for consultations, and $200 an hour for writing a brief. Last week one of the associates spent 10 hours consulting with her client, 10 hours researching the case, and 20 hours writing the brief. What was the weighted mean hourly charge for her legal services?

THE GEOMETRIC MEAN

LO3-3 Compute and interpret the geometric mean.

The geometric mean is useful in finding the average change of percentages, ratios, indexes, or growth rates over time. It has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the gross domestic product, which compound or build on each other. The geometric mean of a set of n positive numbers is defined as the nth root of the product of n values. The formula for the geometric mean is written:

GEOMETRIC MEAN   \[ GM = \sqrt[n]{(x_1)(x_2) \cdots (x_n)} \]   (3–4)

The geometric mean will always be less than or equal to (never more than) the arithmetic mean. Also, all the data values must be positive.

As an example of the geometric mean, suppose you receive a 5% increase in salary this year and a 15% increase next year. The average annual percent increase is 9.886%, not 10.0%. Why is this so? We begin by calculating the geometric mean. Recall, for example, that a 5% increase in salary is 105%. We will write it as 1.05.

$$GM = \sqrt{(1.05)(1.15)} = 1.09886$$

This can be verified by assuming that your monthly earning was $3,000 to start and you received two increases of 5% and 15%.

Raise 1 = $3,000(.05) = $150.00
Raise 2 = $3,150(.15) = 472.50
Total $622.50

Your total salary increase is $622.50. This is equivalent to:

$3,000.00(.09886) = $296.59
$3,296.59(.09886) = 325.91
Total $622.50
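The check above can be reproduced with a short Python sketch. The helper name `geometric_mean` is an assumption made for this illustration, and `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(factors):
    """nth root of the product of n positive values (formula 3-4)."""
    if any(f <= 0 for f in factors):
        raise ValueError("all values must be positive")
    return math.prod(factors) ** (1 / len(factors))


growth_factors = [1.05, 1.15]          # 5% and 15% raises written as 1.05 and 1.15
gm = geometric_mean(growth_factors)    # average growth factor per year

start = 3000.00                        # monthly earnings before the raises
actual_end = start * 1.05 * 1.15       # apply the two actual raises
gm_end = start * gm ** 2               # apply the average rate twice

print(round(gm, 5))                    # 1.09886
print(round(actual_end - start, 2))    # 622.5
print(round(gm_end - start, 2))        # 622.5
```

Applying the geometric mean rate twice gives the same ending salary as the two actual raises, which is why 9.886% rather than 10.0% is the average annual increase.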

The following example shows the geometric mean of several percentages.

E X A M P L E

The return on investment earned by Atkins Construction Company for four successive years was 30%, 20%, −40%, and 200%. What is the geometric mean rate of return on investment?

S O L U T I O N

The number 1.3 represents the 30% return on investment, which is the “original” investment of 1.0 plus the “return” of 0.3. The number 0.6 represents the loss of 40%, which is the original investment of 1.0 less the loss of 0.4. This calculation assumes the total return each period is reinvested or becomes the base for the next period. In other words, the base for the second period is 1.3 and the base for the third period is (1.3)(1.2) and so forth.

Then the geometric mean rate of return is 29.4%, found by

$$GM = \sqrt[n]{(x_1)(x_2) \cdots (x_n)} = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = \sqrt[4]{2.808} = 1.294$$

The geometric mean is the fourth root of 2.808. So, the average rate of return (compound annual growth rate) is 29.4%.

Notice also that if you compute the arithmetic mean [(30 + 20 − 40 + 200)/4 = 52.5], you would have a much larger number, which would overstate the true rate of return!
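As a rough check of the Atkins Construction figures, the sketch below uses `geometric_mean` and `mean` from Python's standard `statistics` module (available in Python 3.8 and later). The variable names are illustrative assumptions.

```python
from statistics import geometric_mean, mean

returns_pct = [30, 20, -40, 200]                  # yearly returns from the example
factors = [1 + r / 100 for r in returns_pct]      # 1.3, 1.2, 0.6, 3.0

gm_rate = (geometric_mean(factors) - 1) * 100     # compound average, in percent
am_rate = mean(returns_pct)                       # simple average, in percent

print(round(gm_rate, 1))   # 29.4
print(am_rate)             # 52.5
```

The returns must be converted to growth factors first, because the geometric mean is defined only for positive values and a return of −40% is not positive.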

A second application of the geometric mean is to find an average percentage change over a period of time. For example, if you earned $45,000 in 2004 and $100,000 in 2016, what is your annual rate of increase over the period? It is 6.88%. The rate of increase is determined from the following formula.

RATE OF INCREASE OVER TIME $GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1$ (3–5)

In formula 3-5 above, n is the number of periods. An example will show the details of finding the average annual percent increase.


E X A M P L E

During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada, was one of the fastest-growing cities in the United States. The population increased from 258,295 in 1990 to 613,599 in 2014. This is an increase of 355,304 people, or a 137.56% increase over the period. The population has more than doubled. What is the average annual percent increase?

S O L U T I O N

There are 24 years between 1990 and 2014, so n = 24. Then the geometric mean formula (3–5) as applied to this problem is:

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1.0 = \sqrt[24]{\frac{613{,}599}{258{,}295}} - 1.0 = 1.0367 - 1.0 = .0367$$

To summarize, the steps to compute the geometric mean are:

1. Divide the value at the end of the period by the value at the beginning of the period.

2. Find the nth root of the ratio, where n is the number of periods.

3. Subtract one.

The value of .0367 indicates that the average annual growth over the period was 3.67%. To put it another way, the population of Las Vegas increased at a rate of 3.67% per year from 1990 to 2014.
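The three steps can be written as a small Python function. This is a sketch under the assumption that both values are positive; the name `annual_rate_of_increase` is hypothetical.

```python
def annual_rate_of_increase(start_value, end_value, periods):
    """Formula (3-5): nth root of (end / start), minus 1."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and number of periods must be positive")
    ratio = end_value / start_value          # step 1: divide end by start
    nth_root = ratio ** (1 / periods)        # step 2: take the nth root
    return nth_root - 1                      # step 3: subtract one


las_vegas = annual_rate_of_increase(258_295, 613_599, 2014 - 1990)
salary = annual_rate_of_increase(45_000, 100_000, 2016 - 2004)

print(f"{las_vegas:.2%}")  # 3.67%
print(f"{salary:.2%}")     # 6.88%
```

Note that n counts periods (24 years between 1990 and 2014), not the number of data points.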

S E L F - R E V I E W 3–5

1. The percent increases in sales for the last 4 years at Combs Cosmetics were 4.91, 5.75, 8.12, and 21.60.
(a) Find the geometric mean percent increase.
(b) Find the arithmetic mean percent increase.
(c) Is the arithmetic mean equal to or greater than the geometric mean?

2. Production of Cablos trucks increased from 23,000 units in 1996 to 120,520 in 2016. Find the geometric mean annual percent increase.

E X E R C I S E S

27. Compute the geometric mean of the following monthly percent increases: 8, 12, 14, 26, and 5.

28. Compute the geometric mean of the following weekly percent increases: 2, 8, 6, 4, 10, 6, 8, and 4.

29. Listed below is the percent increase in sales for the MG Corporation over the last 5 years. Determine the geometric mean percent increase in sales over the period.

9.4 13.8 11.7 11.9 14.7

30. In 2001, a total of 40,244,000 taxpayers in the United States filed their individual tax returns electronically. By the year 2015, the number increased to 128,653,000. What is the geometric mean annual increase for the period?

31. The Consumer Price Index is reported monthly by the U.S. Bureau of Labor Statistics. It reports the change in prices for a market basket of goods from one period to another. The index for 2000 was 172.2. By 2015, it increased to 236.525. What was the geometric mean annual increase for the period?

32. JetBlue Airways is an American low-cost airline headquartered in New York City. Its main base is John F. Kennedy International Airport. JetBlue’s revenue in 2002 was $635.2 million. By 2014, revenue had increased to $5,817.0 million. What was the geometric mean annual increase for the period?

WHY STUDY DISPERSION?

A measure of location, such as the mean, median, or mode, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data. For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth. Is the maximum depth of the river 3.25 feet and the minimum 2.75 feet? If that is the case, you would probably agree to cross. What if you learned the river depth ranged from 0.50 foot to 5.5 feet? Your decision would probably be not to cross. Before making a decision about crossing the river, you want information on both the typical depth and the dispersion in the depth of the river.

A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data. Conversely, a large measure of dispersion indicates that the mean is not reliable. Refer to Chart 3–5. The 100 employees of Hammond Iron Works Inc., a steel fabricating company, are organized into a histogram based on the number of years of employment with the company. The mean is 4.9 years, but the spread of the data is from 6 months to 16.8 years. The mean of 4.9 years is not very representative of all the employees.

LO3-4 Compute and interpret the range, variance, and standard deviation.

STATISTICS IN ACTION

The U.S. Postal Service has tried to become more “user friendly” in the last several years. A recent survey showed that customers were interested in more consistency in the time it takes to make a delivery. Under the old conditions, a local letter might take only one day to deliver, or it might take several. “Just tell me how many days ahead I need to mail the birthday card to Mom so it gets there on her birthday, not early, not late,” was a common complaint. The level of consistency is measured by the standard deviation of the delivery times.

CHART 3–5 Histogram of Years of Employment at Hammond Iron Works Inc. (horizontal axis: Years; vertical axis: Employees)

33. In 2000, there were 720,000 cell phone subscribers worldwide. By 2015, the number of cell phone subscribers increased to 752,000,000. What is the geometric mean annual increase for the period?

34. The information below shows the cost for a year of college in public and private colleges in 2002–03 and 2015–16. What is the geometric mean annual increase for the period for the two types of colleges? Compare the rates of increase.

Type of College    2002–03    2015–16
Public             $ 4,960    $23,893
Private             18,056     32,405

A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions. Suppose, for example, that the new Vision Quest LCD computer monitor is assembled in Baton Rouge and also in Tucson. The arithmetic mean hourly output in both the Baton Rouge plant and the Tucson plant is 50. Based on the two means, you might conclude that the distributions of the hourly outputs are identical. Production records for 9 hours at the two plants, however, reveal that this conclusion is not correct (see Chart 3–6). Baton Rouge production varies from 48 to 52 assemblies per hour. Production at the Tucson plant is more erratic, ranging from 40 to 60 per hour. Therefore, the hourly output for Baton Rouge is clustered near the mean of 50; the hourly output for Tucson is more dispersed.

CHART 3–6 Hourly Production of Computer Monitors at the Baton Rouge and Tucson Plants (dot plots for Baton Rouge and Tucson; horizontal axis: Hourly Production, 40 to 60)

We will consider several measures of dispersion. The range is based on the maximum and minimum values in the data set; that is, only two values are considered. The variance and the standard deviation use all the values in a data set and are based on deviations from the arithmetic mean.

Range

The simplest measure of dispersion is the range. It is the difference between the maximum and minimum values in a data set. In the form of an equation:

RANGE Range = Maximum value − Minimum value (3–6)

The range is widely used in production management and control applications because it is very easy to calculate and understand.

E X A M P L E

Refer to Chart 3–6 above. Find the range in the number of computer monitors produced per hour for the Baton Rouge and the Tucson plants. Interpret the two ranges.

S O L U T I O N

The range of the hourly production of computer monitors at the Baton Rouge plant is 4, found by the difference between the maximum hourly production of 52 and the minimum of 48. The range in the hourly production for the Tucson plant is 20 computer monitors, found by 60 − 40. We therefore conclude that (1) there is less dispersion in the hourly production in the Baton Rouge plant than in the Tucson plant because the range of 4 computer monitors is less than a range of 20 computer monitors and (2) the production is clustered more closely around the mean of 50 at the Baton Rouge plant than at the Tucson plant (because a range of 4 is less than a range of 20). Thus, the mean production in the Baton Rouge plant (50 computer monitors) is a more representative measure of location than the mean of 50 computer monitors for the Tucson plant.

Variance

A limitation of the range is that it is based on only two values, the maximum and the minimum; it does not take into consideration all of the values. The variance does. It measures the mean amount by which the values in a population, or sample, vary from their mean. In terms of a definition:

VARIANCE The arithmetic mean of the squared deviations from the mean.

The following example illustrates how the variance is used to measure dispersion.

E X A M P L E

The chart below shows the number of cappuccinos sold at the Starbucks in the Orange County airport and the Ontario, California, airport between 4 and 5 p.m. for a sample of 5 days last month.

Determine the mean, median, range, and variance for each location. Comment on the similarities and differences in these measures.

S O L U T I O N

The mean, median, and range for each of the airport locations are reported as part of an Excel spreadsheet.


Notice that all three of the measures are exactly the same. Does this indicate that there is no difference in the two sets of data? We get a clearer picture if we calculate the variance. First, for Orange County:

$$\text{Variance} = \frac{\Sigma(x - \mu)^2}{N} = \frac{(-30)^2 + (-10)^2 + 0^2 + 10^2 + 30^2}{5} = \frac{2{,}000}{5} = 400$$

The variance is 400. That is, the average squared deviation from the mean is 400. The following shows the detail of determining the variance for the number of cappuccinos sold at the Ontario Airport.

$$\text{Variance} = \frac{\Sigma(x - \mu)^2}{N} = \frac{(-30)^2 + (-5)^2 + 0^2 + 5^2 + 30^2}{5} = \frac{1{,}850}{5} = 370$$

So the mean, median, and range of the cappuccinos sold are the same at the two airports, but the variances are different. The variance at Orange County is 400, but it is 370 at Ontario.
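The comparison can be sketched in Python with the standard `statistics` module. The daily sales figures below are an assumption: they are reconstructed by adding the deviations shown in the two variance calculations to the stated mean of 50.

```python
from statistics import mean, median, pvariance

MEAN_SALES = 50  # mean reported in the text for both airports

# Deviations from the mean, as they appear in the two variance calculations.
deviations = {
    "Orange County": [-30, -10, 0, 10, 30],
    "Ontario": [-30, -5, 0, 5, 30],
}

summary = {}
for airport, devs in deviations.items():
    sales = [MEAN_SALES + d for d in devs]   # reconstructed daily sales
    summary[airport] = {
        "mean": mean(sales),
        "median": median(sales),
        "range": max(sales) - min(sales),
        "variance": pvariance(sales),        # divides by N, as in the example
    }
    print(airport, summary[airport])
```

Only the variance entry differs between the two airports (400 versus 370), which matches the conclusion in the text.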

Let’s interpret and compare the results of our measures for the two Starbucks airport locations. The mean and median of the two locations are exactly the same, 50 cappuccinos sold. These measures of location suggest the two distributions are the same. The range for both locations is also the same, 60. However, recall that the range provides limited information about the dispersion because it is based on only two values, the minimum and maximum.

The variances are not the same for the two airports. The variance is based on the differences between each observation and the arithmetic mean. It shows the closeness or clustering of the data relative to the mean or center of the distribution. Compare the variance for Orange County of 400 to the variance for Ontario of 370. Based on the variance, we conclude that the dispersion for the sales distribution of the Ontario Starbucks is more concentrated—that is, nearer the mean of 50—than for the Orange County location.

The variance has an important advantage over the range. It uses all the values in the computation. Recall that the range uses only the highest and the lowest values.

S E L F - R E V I E W 3–6

The weights of containers being shipped to Ireland are (in thousands of pounds):

95 103 105 110 104 105 112 90

(a) What is the range of the weights?
(b) Compute the arithmetic mean weight.
(c) Compute the variance of the weights.

For Exercises 35–38, calculate the (a) range, (b) arithmetic mean, (c) variance, and (d) interpret the statistics.

35. During last weekend’s sale, there were five customer service representatives on duty at the Electronic Super Store. The numbers of HDTVs these representatives sold were 5, 8, 4, 10, and 3.

36. The Department of Statistics at Western State University offers eight sections of basic statistics. Following are the numbers of students enrolled in these sections: 34, 46, 52, 29, 41, 38, 36, and 28.

37. Dave’s Automatic Door installs automatic garage door openers. The following list indicates the number of minutes needed to install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.

38. All eight companies in the aerospace industry were surveyed as to their return on investment last year. The results are: 10.6%, 12.6%, 14.8%, 18.2%, 12.0%, 14.8%, 12.2%, and 15.6%.

39. Ten young adults living in California rated the taste of a newly developed sushi pizza topped with tuna, rice, and kelp on a scale of 1 to 50, with 1 indicating they did not like the taste and 50 that they did. The ratings were:

34 39 40 46 33 31 34 14 15 45

In a parallel study, 10 young adults in Iowa rated the taste of the same pizza. The ratings were:

28 25 35 16 25 29 24 26 17 20

As a market researcher, compare the potential for sushi pizza in the two markets.

40. The personnel files of all eight employees at the Pawnee location of Acme Carpet Cleaners Inc. revealed that during the last 6-month period they lost the following number of days due to illness:

2 0 6 3 10 4 1 2

All eight employees during the same period at the Chickpee location of Acme Carpets revealed they lost the following number of days due to illness:

2 0 1 0 5 0 1 0

As the director of human resources, compare the two locations. What would you recommend?

E X E R C I S E S

Population Variance

In the previous example, we developed the concept of variance as a measure of dispersion. Similar to the mean, we can calculate the variance of a population or the variance of a sample. The formula to compute the population variance is:

POPULATION VARIANCE $\sigma^2 = \frac{\Sigma(x - \mu)^2}{N}$ (3–7)

where:

$\sigma^2$ is the population variance (σ is the lowercase Greek letter sigma). It is read as “sigma squared.”

x is the value of a particular observation in the population.

μ is the arithmetic mean of the population.

N is the number of observations in the population.

The process for computing the variance is implied by the formula.

1. Begin by finding the mean.

2. Find the difference between each observation and the mean, and square that difference.

3. Sum all the squared differences.

4. Divide the sum of the squared differences by the number of items in the population.

So the population variance is the mean of the squared difference between each value and the mean. For populations whose values are near the mean, the variance will be small. For populations whose values are dispersed from the mean, the population variance will be large.

The variance overcomes the weakness of the range by using all the values in the population, whereas the range uses only the maximum and minimum values. We overcome the issue where Σ(x − μ) = 0 by squaring the differences. Squaring the differences will always result in nonnegative values. The following is another example that illustrates the calculation and interpretation of the variance.

E X A M P L E

The number of traffic citations issued last year by month in Beaufort County, South Carolina, is reported below.

Citations by Month

Month        Citations
January      19
February     17
March        22
April        18
May          28
June         34
July         45
August       39
September    38
October      44
November     34
December     10

Determine the population variance.

S O L U T I O N

Because we are studying all the citations for a year, the data comprise a population. To determine the population variance, we use formula (3–7). The table below details the calculations.

Month        Citations (x)    x − μ    (x − μ)²
January      19               −10      100
February     17               −12      144
March        22               −7       49
April        18               −11      121
May          28               −1       1
June         34               5        25
July         45               16       256
August       39               10       100
September    38               9        81
October      44               15       225
November     34               5        25
December     10               −19      361
Total        348              0        1,488

1. We begin by determining the arithmetic mean of the population. The total number of citations issued for the year is 348, so the mean number issued per month is 29.

$$\mu = \frac{\Sigma x}{N} = \frac{19 + 17 + \cdots + 10}{12} = \frac{348}{12} = 29$$

2. Next we find the difference between each observation and the mean. This is shown in the third column of the table. Recall on page 55 in this chapter, the Verizon example showed that the sum of the differences between each value and the mean is 0. This principle is repeated here. The sum of the differences between the mean and the number of citations each month is 0.

3. The next step is to square the difference for each month. That is shown in the fourth column of the table. All the squared differences will be positive. Note that squaring a negative value, or multiplying a negative value by itself, always results in a positive value.

4. The squared differences are totaled. The total of the fourth column is 1,488. That is the term $\Sigma(x - \mu)^2$.

5. Finally, we divide the squared differences by N, the number of observations in the population.

$$\sigma^2 = \frac{\Sigma(x - \mu)^2}{N} = \frac{1{,}488}{12} = 124$$

So, the population variance for the number of citations is 124.
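The five steps of the solution can be traced in a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative.

```python
citations = [19, 17, 22, 18, 28, 34, 45, 39, 38, 44, 34, 10]  # January to December

n = len(citations)
mu = sum(citations) / n                             # step 1: population mean
deviations = [x - mu for x in citations]            # step 2: differences from the mean
squared_devs = [d ** 2 for d in deviations]         # step 3: square each difference
total = sum(squared_devs)                           # step 4: total the squares
sigma_sq = total / n                                # step 5: divide by N

print(mu, total, sigma_sq)  # 29.0 1488.0 124.0
```

The deviations in step 2 sum to zero, which is why they are squared before being totaled.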

Like the range, the variance can be used to compare the dispersion in two or more sets of observations. For example, the variance for the number of citations issued in Beaufort County was just computed to be 124. If the variance in the number of citations issued in Marlboro County, South Carolina, is 342.9, we conclude that (1) there is less dispersion in the distribution of the number of citations issued in Beaufort County than in Marlboro County (because 124 is less than 342.9) and (2) the number of citations in Beaufort County is more closely clustered around the mean of 29 than for the number of citations issued in Marlboro County. Thus the mean number of citations issued in Beaufort County is a more representative measure of location than the mean number of citations in Marlboro County.


Population Standard Deviation

When we compute the variance, it is important to understand the unit of measure and what happens when the differences in the numerator are squared. That is, in the previous example, the number of monthly citations is the variable. When we calculate the variance, the unit of measure for the variance is citations squared. Using “squared citations” as a unit of measure is cumbersome.

There is a way out of this difficulty. By taking the square root of the population variance, we can transform it to the same unit of measurement used for the original data. The square root of 124 citations squared is 11.14 citations. The units are now simply citations. The square root of the population variance is the population standard deviation.

POPULATION STANDARD DEVIATION $\sigma = \sqrt{\frac{\Sigma(x - \mu)^2}{N}}$ (3–8)
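For the citation data, formula (3–8) can be checked with `pvariance` and `pstdev` from Python's standard `statistics` module, both of which divide by N. The snippet is an illustrative sketch rather than part of the original example.

```python
from statistics import pstdev, pvariance

citations = [19, 17, 22, 18, 28, 34, 45, 39, 38, 44, 34, 10]  # January to December

sigma_sq = pvariance(citations)   # population variance, in citations squared
sigma = pstdev(citations)         # its square root, in citations

print(sigma_sq)
print(round(sigma, 2))  # 11.14
```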

S E L F - R E V I E W 3–7

The Philadelphia office of PricewaterhouseCoopers hired five accounting trainees this year. Their monthly starting salaries were $3,536; $3,173; $3,448; $3,121; and $3,622.
(a) Compute the population mean.
(b) Compute the population variance.
(c) Compute the population standard deviation.
(d) The Pittsburgh office hired six trainees. Their mean monthly salary was $3,550, and the standard deviation was $250. Compare the two groups.

E X E R C I S E S

41. Consider these five values a population: 8, 3, 7, 3, and 4.
a. Determine the mean of the population.
b. Determine the variance.

42. Consider these six values a population: 13, 3, 8, 10, 8, and 6.
a. Determine the mean of the population.
b. Determine the variance.

43. The annual report of Dennis Industries cited these primary earnings per common share for the past 5 years: $2.68, $1.03, $2.26, $4.30, and $3.58. If we assume these are population values, what is:
a. The arithmetic mean primary earnings per share of common stock?
b. The variance?

44. Referring to Exercise 43, the annual report of Dennis Industries also gave these returns on stockholder equity for the same 5-year period (in percent): 13.2, 5.0, 10.2, 17.5, and 12.9.
a. What is the arithmetic mean return?
b. What is the variance?

45. Plywood Inc. reported these returns on stockholder equity for the past 5 years: 4.3, 4.9, 7.2, 6.7, and 11.6. Consider these as population values.
a. Compute the range, the arithmetic mean, the variance, and the standard deviation.
b. Compare the return on stockholder equity for Plywood Inc. with that for Dennis Industries cited in Exercise 44.

46. The annual incomes of the five vice presidents of TMV Industries are $125,000; $128,000; $122,000; $133,000; and $140,000. Consider this a population.
a. What is the range?
b. What is the arithmetic mean income?
c. What is the population variance? The standard deviation?
d. The annual incomes of officers of another firm similar to TMV Industries were also studied. The mean was $129,000 and the standard deviation $8,612. Compare the means and dispersions in the two firms.

Sample Variance and Standard Deviation

The formula for the population mean is μ = Σx/N. We just changed the symbols for the sample mean; that is, $\bar{x} = \Sigma x / n$. Unfortunately, the conversion from the population variance to the sample variance is not as direct. It requires a change in the denominator. Instead of substituting n (number in the sample) for N (number in the population), the denominator is n − 1. Thus the formula for the sample variance is:

SAMPLE VARIANCE $s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1}$ (3–9)

where:

$s^2$ is the sample variance.

x is the value of each observation in the sample.

$\bar{x}$ is the mean of the sample.

n is the number of observations in the sample.

Why is this change made in the denominator? Although the use of n is logical since $\bar{x}$ is used to estimate μ, it tends to underestimate the population variance, $\sigma^2$. The use of (n − 1) in the denominator provides the appropriate correction for this tendency. Because the primary use of sample statistics like $s^2$ is to estimate population parameters like $\sigma^2$, (n − 1) is preferred to n in defining the sample variance. We will also use this convention when computing the sample standard deviation.
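The underestimation can be illustrated with a small exhaustive check in Python. Everything in this sketch is an assumption made for illustration: the three-value population is hypothetical, and the samples are drawn with replacement so that every ordered sample of size 2 is equally likely.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]     # hypothetical values, chosen only for illustration
n = 2                      # sample size

N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


samples = list(product(population, repeat=n))            # all 9 ordered samples
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma_sq, avg_div_n, avg_div_n_minus_1)  # 8/3 4/3 8/3
```

Averaged over all possible samples, dividing by n gives 4/3, which is below the population variance of 8/3, while dividing by n − 1 recovers 8/3 exactly in this setup.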

E X A M P L E

The hourly wages for a sample of part-time employees at Home Depot are $12, $20, $16, $18, and $19. What is the sample variance?

S O L U T I O N

The sample variance is computed by using formula (3–9).

$$\bar{x} = \frac{\Sigma x}{n} = \frac{\$85}{5} = \$17$$

Hourly Wage (x)    $x - \bar{x}$    $(x - \bar{x})^2$
$12                −$5              25
 20                 3               9
 16                −1               1
 18                 1               1
 19                 2               4
$85                 0               40

$$s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1} = \frac{40}{5 - 1} = 10 \text{ in dollars squared}$$


The sample standard deviation is used as an estimator of the population standard deviation. As noted previously, the population standard deviation is the square root of the population variance. Likewise, the sample standard deviation is the square root of the sample variance. The sample standard deviation is determined by:

SAMPLE STANDARD DEVIATION $s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}}$ (3–10)

E X A M P L E

The sample variance in the previous example involving hourly wages was computed to be 10. What is the sample standard deviation?

S O L U T I O N

The sample standard deviation is $3.16, found by √10. Note again that the sample variance is in terms of dollars squared, but taking the square root of 10 gives us $3.16, which is in the same units (dollars) as the original data.
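The Home Depot figures can be reproduced with Python's standard `statistics` module, where `variance` and `stdev` use the n − 1 denominator and `pvariance` uses N. The snippet is a sketch for comparison, not the software the text refers to.

```python
from statistics import mean, pvariance, stdev, variance

wages = [12, 20, 16, 18, 19]     # hourly wages in dollars, from the example

x_bar = mean(wages)              # sample mean
s_sq = variance(wages)           # sample variance: divides by n - 1
s = stdev(wages)                 # sample standard deviation
pop_var = pvariance(wages)       # for contrast: divides by n

print(f"{x_bar:.2f} {s_sq:.2f} {s:.2f} {pop_var:.2f}")  # 17.00 10.00 3.16 8.00
```

Treating the same five wages as a population would give a variance of 8 instead of 10, so it matters which function is called.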

Software Solution

On page 64, we used Excel to determine the mean, median, and mode of profit for the Applewood Auto Group data. You also will note that it lists the sample variance and sample standard deviation. Excel, like most other statistical software, assumes the data are from a sample.

S E L F - R E V I E W 3–8

The years of service for a sample of seven employees at a State Farm Insurance claims office in Cleveland, Ohio, are 4, 2, 5, 4, 5, 2, and 6. What is the sample variance? Compute the sample standard deviation.


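Chebyshev’s theorem states: for any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than 1.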

LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule.

STATISTICS IN ACTION

(continued)

For Exercises 47–52, do the following:

a. Compute the sample variance.
b. Determine the sample standard deviation.

door openers. Based on a sample, following are the times, in minutes, required to install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.

51. The Houston, Texas, Motel Owner Association conducted a survey regarding weekday motel rates in the area. Listed below is the room rate for business-class guests for a sample of 10 motels.

$101 $97 $103 $110 $78 $87 $101 $80 $106 $88

$110 $126 $103 $93 $99 $113 $87 $101 $109 $100

E X E R C I S E S


it found the mean from a list of the class sizes of each student, it was 147. Why the disparity? Because there are few students in the small classes and a larger number of students in the larger classes, which has the effect of increasing the mean class size when it is calculated this way. A school could reduce this mean class size for each student by reducing the number of students in each class. That is, cut out the large freshman lecture classes.

(continued from p. 79)

EMPIRICAL RULE For a symmetrical, bell-shaped frequency distribution, approximately 68% of the observations will lie within plus and minus one standard deviation of the mean; about 95% of the observations will lie within plus and minus two standard deviations of the mean; and practically all (99.7%) will lie within plus and minus three standard deviations of the mean.

E X A M P L E

Dupree Paint Company employees contribute a mean of $51.54 to the company’s profit-sharing plan every two weeks. The standard deviation of biweekly contributions is $7.51. At least what percent of the contributions lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean, that is between $25.26 and $77.83?

S O L U T I O N

About 92%, found by

$$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$$
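The bound used in this solution, 1 − 1/k², can be wrapped in a short Python helper. The function name `chebyshev_lower_bound` is hypothetical, and the restriction to k greater than 1 is the usual condition under which the bound is informative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


mean_contribution = 51.54   # dollars, from the Dupree Paint example
std_dev = 7.51              # dollars
k = 3.5

low = mean_contribution - k * std_dev    # lower end of the interval
high = mean_contribution + k * std_dev   # upper end of the interval
bound = chebyshev_lower_bound(k)

print(f"at least {bound:.0%} of contributions lie between ${low:.3f} and ${high:.3f}")
```

Because the bound holds for any distribution shape, it is a minimum; the actual share within the interval may be higher.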

The Empirical Rule

Chebyshev’s theorem applies to any set of values; that is, the distribution of values can have any shape. However, for a symmetrical, bell-shaped distribution such as the one in Chart 3–7, we can be more precise in explaining the dispersion about the mean. These relationships involving the standard deviation and the mean are described by the Empirical Rule, sometimes called the Normal Rule.

These relationships are portrayed graphically in Chart 3–7 for a bell-shaped distribution with a mean of 100 and a standard deviation of 10.

CHART 3–7 A Symmetrical, Bell-Shaped Curve Showing the Relationships between the Standard Deviation and the Percentage of Observations (horizontal axis from 70 to 130 in steps of 10; bands marked 68%, 95%, and 99.7%)

Applying the Empirical Rule, if a distribution is symmetrical and bell-shaped, practically all of the observations lie between the mean plus and minus three standard deviations. Thus, if $\bar{x} = 100$ and s = 10, practically all the observations lie between 100 + 3(10) and 100 − 3(10), or 70 and 130. The estimated range is therefore 60, found by 130 − 70.


Conversely, if we know that the range is 60 and the distribution is bell-shaped, we can approximate the standard deviation by dividing the range by 6. For this illustration: range ÷ 6 = 60 ÷ 6 = 10, the standard deviation.

E X A M P L E

A sample of the rental rates at University Park Apartments approximates a symmetrical, bell-shaped distribution. The sample mean is $500; the standard deviation is $20. Using the Empirical Rule, answer these questions:

1. About 68% of the monthly rentals are between what two amounts?
2. About 95% of the monthly rentals are between what two amounts?
3. Almost all of the monthly rentals are between what two amounts?

S O L U T I O N

1. About 68% are between $480 and $520, found by $\bar{x} \pm 1s$ = $500 ± 1($20).
2. About 95% are between $460 and $540, found by $\bar{x} \pm 2s$ = $500 ± 2($20).
3. Almost all (99.7%) are between $440 and $560, found by $\bar{x} \pm 3s$ = $500 ± 3($20).
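The three intervals follow one pattern, which the Python sketch below makes explicit. The names `EMPIRICAL_RULE` and `empirical_intervals` are assumptions for this illustration, and the percentages are the approximate values quoted in the rule.

```python
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}  # approximate shares from the rule


def empirical_intervals(x_bar, s):
    """Return {k: (low, high, share)} for k = 1, 2, 3 standard deviations."""
    return {
        k: (x_bar - k * s, x_bar + k * s, share)
        for k, share in EMPIRICAL_RULE.items()
    }


intervals = empirical_intervals(500, 20)   # University Park Apartments rents
for k, (low, high, share) in intervals.items():
    print(f"about {share:.1%} of rents fall between ${low} and ${high}")
```

These intervals are only appropriate when the distribution is roughly symmetrical and bell-shaped; otherwise Chebyshev’s theorem is the safer tool.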

S E L F - R E V I E W 3–9

The Pitney Pipe Company is one of several domestic manufacturers of PVC pipe. The quality control department sampled 600 10-foot lengths. At a point 1 foot from the end of the pipe, they measured the outside diameter. The mean was 14.0 inches and the standard deviation 0.1 inch.
(a) If we do not know the shape of the distribution of outside pipe diameters, at least what percent of the observations will be between 13.85 inches and 14.15 inches?
(b) If we assume that the distribution of diameters is symmetrical and bell-shaped, about 95% of the observations will be between what two values?

E X E R C I S E S

53. According to Chebyshev’s theorem, at least what percent of any set of observations will be within 1.8 standard deviations of the mean?

54. The mean income of a group of sample observations is $500; the standard deviation is $40. According to Chebyshev’s theorem, at least what percent of the incomes will lie between $400 and $600?

55. The distribution of the weights of a sample of 1,400 cargo containers is symmetric and bell-shaped. According to the Empirical Rule, what percent of the weights will lie:
a. Between $\bar{x} - 2s$ and $\bar{x} + 2s$?
b. Between $\bar{x}$ and $\bar{x} + 2s$? Above $\bar{x} + 2s$?

56. The following graph portrays the distribution of the number of spicy chicken sandwiches sold at a nearby Wendy’s for the last 141 days. The mean number of sandwiches sold per day is 91.9 and the standard deviation is 4.67.

(Graph: distribution of daily sales; horizontal axis: Sales, with 90 and 100 marked)

If we use the Empirical Rule, sales will be between what two values on 68% of the days? Sales will be between what two values on 95% of the days?

THE MEAN AND STANDARD DEVIATION OF GROUPED DATA

In most instances, measures of location, such as the mean, and measures of dispersion, such as the standard deviation, are determined by using the individual values. Statistical software packages make it easy to calculate these values, even for large data sets. However, sometimes we are given only the frequency distribution and wish to estimate the mean or standard deviation. In the following discussion, we show how we can estimate the mean and standard deviation from data organized into a frequency distribution. We should stress that a mean or a standard deviation from grouped data is an estimate of the corresponding actual values.

LO3-6 Compute the mean and standard deviation of grouped data.

Arithmetic Mean of Grouped Data

To approximate the arithmetic mean of data organized into a frequency distribution, we begin by assuming the observations in each class are represented by the midpoint of the class. The mean of a sample of data organized in a frequency distribution is computed by:

ARITHMETIC MEAN OF GROUPED DATA

$$\bar{x} = \frac{\Sigma fM}{n}$$ (3–11)

where:

x̄ is the sample mean.
M is the midpoint of each class.
f is the frequency in each class.
fM is the frequency in each class times the midpoint of the class.
ΣfM is the sum of these products.
n is the total number of frequencies.

STATISTICS IN ACTION

During the 2016 Major League Baseball season, DJ LeMahieu of the Colorado Rockies had the highest batting average at .348. Tony Gwynn hit .394 in the strike-shortened season of 1994, and Ted Williams hit .406 in 1941. No one has hit over .400 since 1941. The mean batting average has remained constant at about .260 for more than 100 years, but the standard deviation declined from .049 to .031. This indicates less dispersion in the batting averages today and helps explain the lack of any .400 hitters in recent times.

E X A M P L E

The computations for the arithmetic mean of data grouped into a frequency distribution will be shown based on the Applewood Auto Group profit data. Recall in Chapter 2, in Table 2–7 on page 30, we constructed a frequency distribution for the vehicle profit. The information is repeated below. Determine the arithmetic mean profit per vehicle.

| Profit | Frequency |
|---|---|
| $ 200 up to $ 600 | 8 |
| 600 up to 1,000 | 11 |
| 1,000 up to 1,400 | 23 |
| 1,400 up to 1,800 | 38 |
| 1,800 up to 2,200 | 45 |
| 2,200 up to 2,600 | 32 |
| 2,600 up to 3,000 | 19 |
| 3,000 up to 3,400 | 4 |
| Total | 180 |

S O L U T I O N

The mean vehicle profit can be estimated from data grouped into a frequency distribution. To find the estimated mean, assume the midpoint of each class is representative of the data values in that class. Recall that the midpoint of a class is halfway between the lower class limits of two consecutive classes. To find the midpoint of a particular class, we add the lower limits of two consecutive classes and divide by 2. Hence, the midpoint of the first class is $400, found by ($200 + $600)/2. We assume the value of $400 is representative of the eight values in that class. To put it another way, we assume the sum of the eight values in this class is $3,200, found by 8($400). We continue the process of multiplying the class midpoint by the class frequency for each class and then sum these products. The results are summarized in Table 3–1.

TABLE 3–1 Profit on 180 Vehicles Sold Last Month at Applewood Auto Group

| Profit | Frequency (f) | Midpoint (M) | fM |
|---|---|---|---|
| $ 200 up to $ 600 | 8 | $ 400 | $ 3,200 |
| 600 up to 1,000 | 11 | 800 | 8,800 |
| 1,000 up to 1,400 | 23 | 1,200 | 27,600 |
| 1,400 up to 1,800 | 38 | 1,600 | 60,800 |
| 1,800 up to 2,200 | 45 | 2,000 | 90,000 |
| 2,200 up to 2,600 | 32 | 2,400 | 76,800 |
| 2,600 up to 3,000 | 19 | 2,800 | 53,200 |
| 3,000 up to 3,400 | 4 | 3,200 | 12,800 |
| Total | 180 | | $333,200 |

Solving for the arithmetic mean using formula (3–11), we get:

$$\bar{x} = \frac{\Sigma fM}{n} = \frac{\$333{,}200}{180} = \$1{,}851.11$$

We conclude that the mean profit per vehicle is about $1,851.

Standard Deviation of Grouped Data

To calculate the standard deviation of data grouped into a frequency distribution, we need to adjust formula (3–10) slightly. We weight each of the squared differences by the number of frequencies in each class. The formula is:

STANDARD DEVIATION, GROUPED DATA

$$s = \sqrt{\frac{\Sigma f(M - \bar{x})^2}{n - 1}}$$ (3–12)

where:

s is the sample standard deviation.
M is the midpoint of the class.
f is the class frequency.
n is the number of observations in the sample.
x̄ is the sample mean.

E X A M P L E

Refer to the frequency distribution for the Applewood Auto Group profit data reported in Table 3–1. Compute the standard deviation of the vehicle profits.

S O L U T I O N

Following the same practice used earlier for computing the mean of data grouped into a frequency distribution, f is the class frequency, M the class midpoint, and n the number of observations.

| Profit | Frequency (f) | Midpoint (M) | fM | (M − x̄) | (M − x̄)² | f(M − x̄)² |
|---|---|---|---|---|---|---|
| $ 200 up to $ 600 | 8 | 400 | 3,200 | −1,451 | 2,105,401 | 16,843,208 |
| 600 up to 1,000 | 11 | 800 | 8,800 | −1,051 | 1,104,601 | 12,150,611 |
| 1,000 up to 1,400 | 23 | 1,200 | 27,600 | −651 | 423,801 | 9,747,423 |
| 1,400 up to 1,800 | 38 | 1,600 | 60,800 | −251 | 63,001 | 2,394,038 |
| 1,800 up to 2,200 | 45 | 2,000 | 90,000 | 149 | 22,201 | 999,045 |
| 2,200 up to 2,600 | 32 | 2,400 | 76,800 | 549 | 301,401 | 9,644,832 |
| 2,600 up to 3,000 | 19 | 2,800 | 53,200 | 949 | 900,601 | 17,111,419 |
| 3,000 up to 3,400 | 4 | 3,200 | 12,800 | 1,349 | 1,819,801 | 7,279,204 |
| Total | 180 | | 333,200 | | | 76,169,780 |

To find the standard deviation:

Step 1: Subtract the mean from the class midpoint. That is, find (M − x̄) = ($400 − $1,851 = −$1,451) for the first class, for the second class ($800 − $1,851 = −$1,051), and so on.

Step 2: Square the difference between the class midpoint and the mean. For the first class, it would be ($400 − $1,851)² = 2,105,401, for the second class ($800 − $1,851)² = 1,104,601, and so on.

Step 3: Multiply the squared difference between the class midpoint and the mean by the class frequency. For the first class, the value is 8($400 − $1,851)² = 16,843,208; for the second, 11($800 − $1,851)² = 12,150,611, and so on.

Step 4: Sum the f(M − x̄)². The total is 76,169,780. To find the standard deviation, we insert these values in formula (3–12).

$$s = \sqrt{\frac{\Sigma f(M - \bar{x})^2}{n - 1}} = \sqrt{\frac{76{,}169{,}780}{180 - 1}} = 652.33$$

The mean and the standard deviation calculated from the data grouped into a frequency distribution are usually close to the values calculated from raw data. The grouped data result in some loss of information. For the vehicle profit example, the mean profit reported in the Excel output on page 64 is $1,843.17 and the standard deviation is $643.63. The respective values estimated from data grouped into a frequency distribution are $1,851.11 and $652.33. The difference in the means is $7.94, or about 0.4%. The standard deviations differ by $8.70, or 1.4%. Based on the percentage difference, the estimates are very close to the actual values.
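The grouped-data calculations in formulas (3–11) and (3–12) can be reproduced in a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function name `grouped_mean_and_std` and the tuple layout of `applewood_classes` are illustrative assumptions. Unlike the hand calculation, it does not round the mean to $1,851 before subtracting, so the sum of squared deviations differs slightly from the table total, yet the standard deviation still rounds to 652.33.

```python
from math import sqrt

# Applewood Auto Group profit classes: (lower limit, upper limit, frequency)
applewood_classes = [
    (200, 600, 8), (600, 1000, 11), (1000, 1400, 23), (1400, 1800, 38),
    (1800, 2200, 45), (2200, 2600, 32), (2600, 3000, 19), (3000, 3400, 4),
]


def grouped_mean_and_std(classes):
    """Estimate the sample mean and standard deviation from a frequency distribution."""
    n = sum(f for _, _, f in classes)                            # total number of frequencies
    rows = [((lo + hi) / 2, f) for lo, hi, f in classes]         # (midpoint M, frequency f)
    mean = sum(f * m for m, f in rows) / n                       # formula (3-11)
    squared_dev = sum(f * (m - mean) ** 2 for m, f in rows)      # sum of f(M - mean)^2
    return mean, sqrt(squared_dev / (n - 1))                     # formula (3-12)


mean, std_dev = grouped_mean_and_std(applewood_classes)
print(f"Estimated mean: {mean:.2f}, estimated standard deviation: {std_dev:.2f}")
```

The same function can be applied to any frequency distribution with known class limits, since the only inputs are the limits and the class frequencies.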

S E L F - R E V I E W 3–10

The net incomes of a sample of twenty container shipping companies were organized into the following table:

| Net Income ($ millions) | Number of Companies |
|---|---|
| 2 up to 6 | 1 |
| 6 up to 10 | 4 |
| 10 up to 14 | 10 |
| 14 up to 18 | 3 |
| 18 up to 22 | 2 |

(a) What is the table called?
(b) Based on the distribution, what is the estimate of the arithmetic mean net income?
(c) Based on the distribution, what is the estimate of the standard deviation?

E X E R C I S E S

57. When we compute the mean of a frequency distribution, why do we refer to this as an estimated mean?

58. Estimate the mean and the standard deviation of the following frequency distribution showing the number of times students eat at campus dining places in a month.

| Class | Frequency |
|---|---|
| 0 up to 5 | 2 |
| 5 up to 10 | 7 |
| 10 up to 15 | 12 |
| 15 up to 20 | 6 |
| 20 up to 25 | 3 |

59. Estimate the mean and the standard deviation of the following frequency distribution showing the ages of the first 60 people in line on Black Friday at a retail store.

| Class | Frequency |
|---|---|
| 20 up to 30 | 7 |
| 30 up to 40 | 12 |
| 40 up to 50 | 21 |
| 50 up to 60 | 18 |
| 60 up to 70 | 12 |

60. SCCoast, an Internet provider in the Southeast, developed the following frequency distribution on the age of Internet users. Estimate the mean and the standard deviation.

| Age (years) | Frequency |
|---|---|
| 10 up to 20 | 3 |
| 20 up to 30 | 7 |
| 30 up to 40 | 18 |
| 40 up to 50 | 20 |
| 50 up to 60 | 12 |

61. The IRS was interested in the number of individual tax forms prepared by small accounting firms. The IRS randomly sampled 50 public accounting firms with 10 or fewer employees in the Dallas–Fort Worth area. The following frequency table reports the results of the study. Estimate the mean and the standard deviation.

| Number of Clients | Frequency |
|---|---|
| 20 up to 30 | 1 |
| 30 up to 40 | 15 |
| 40 up to 50 | 22 |
| 50 up to 60 | 8 |
| 60 up to 70 | 4 |

62. Advertising expenses are a significant component of the cost of goods sold. Listed below is a frequency distribution showing the advertising expenditures for 60 manufacturing companies located in the Southwest. Estimate the mean and the standard deviation of advertising expenses.

| Advertising Expenditure ($ millions) | Number of Companies |
|---|---|
| 25 up to 35 | 5 |
| 35 up to 45 | 10 |
| 45 up to 55 | 21 |
| 55 up to 65 | 16 |
| 65 up to 75 | 8 |
| Total | 60 |

ETHICS AND REPORTING RESULTS

In Chapter 1, we discussed the ethical and unbiased reporting of statistical results. While you are learning about how to organize, summarize, and interpret data using statistics, it is also important to understand statistics so that you can be an intelligent consumer of information.

In this chapter, we learned how to compute numerical descriptive statistics. Specifically, we showed how to compute and interpret measures of location for a data set: the mean, median, and mode. We also discussed the advantages and disadvantages for each statistic. For example, if a real estate developer tells a client that the average home in a particular subdivision sold for $150,000, we assume that $150,000 is a representative selling price for all the homes. But suppose that the client also asks what the median sales price is, and the median is $60,000. Why was the developer only reporting the mean price? This information is extremely important to a person’s decision making when buying a home. Knowing the advantages and disadvantages of the mean, median, and mode is important as we report statistics and as we use statistical information to make decisions.

We also learned how to compute measures of dispersion: range, variance, and standard deviation. Each of these statistics also has advantages and disadvantages. Remember that the range provides information about the overall spread of a distribution. However, it does not provide any information about how the data are clustered or concentrated around the center of the distribution. As we learn more about statistics, we need to remember that when we use statistics we must maintain an independent and principled point of view. Any statistical report requires objective and honest communication of the results.

C H A P T E R S U M M A R Y

I. A measure of location is a value used to describe the central tendency of a set of data.
  A. The arithmetic mean is the most widely reported measure of location.
    1. It is calculated by adding the values of the observations and dividing by the total number of observations.
      a. The formula for the population mean of ungrouped or raw data is

$$\mu = \frac{\Sigma x}{N}$$ (3–1)

      b. The formula for the sample mean is

$$\bar{x} = \frac{\Sigma x}{n}$$ (3–2)

      c. The formula for the sample mean of data in a frequency distribution is

$$\bar{x} = \frac{\Sigma fM}{n}$$ (3–11)

    2. The major characteristics of the arithmetic mean are:
      a. At least the interval scale of measurement is required.
      b. All the data values are used in the calculation.
      c. A set of data has only one mean. That is, it is unique.
      d. The sum of the deviations from the mean equals 0.

  B. The median is the value in the middle of a set of ordered data.
    1. To find the median, sort the observations from minimum to maximum and identify the middle value.
    2. The major characteristics of the median are:
      a. At least the ordinal scale of measurement is required.
      b. It is not influenced by extreme values.
      c. Fifty percent of the observations are larger than the median.
      d. It is unique to a set of data.
  C. The mode is the value that occurs most often in a set of data.
    1. The mode can be found for nominal-level data.
    2. A set of data can have more than one mode.
  D. The weighted mean is found by multiplying each observation by its corresponding weight.
    1. The formula for determining the weighted mean is

$$\bar{x}_w = \frac{w_1x_1 + w_2x_2 + w_3x_3 + \cdots + w_nx_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$ (3–3)

  E. The geometric mean is the nth root of the product of n positive values.
    1. The formula for the geometric mean is

$$GM = \sqrt[n]{(x_1)(x_2)(x_3)\cdots(x_n)}$$ (3–4)

    2. The geometric mean is also used to find the rate of change from one period to another.

$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$ (3–5)

    3. The geometric mean is always equal to or less than the arithmetic mean.

II. The dispersion is the variation or spread in a set of data.
  A. The range is the difference between the maximum and minimum values in a set of data.
    1. The formula for the range is

Range = Maximum value − Minimum value (3–6)

    2. The major characteristics of the range are:
      a. Only two values are used in its calculation.
      b. It is influenced by extreme values.
      c. It is easy to compute and to understand.
  B. The variance is the mean of the squared deviations from the arithmetic mean.
    1. The formula for the population variance is

$$\sigma^2 = \frac{\Sigma(x - \mu)^2}{N}$$ (3–7)

    2. The formula for the sample variance is

$$s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1}$$ (3–9)

    3. The major characteristics of the variance are:
      a. All observations are used in the calculation.
      b. The units are somewhat difficult to work with; they are the original units squared.
  C. The standard deviation is the square root of the variance.
    1. The major characteristics of the standard deviation are:
      a. It is in the same units as the original data.
      b. It is the square root of the average squared distance from the mean.
      c. It cannot be negative.
      d. It is the most widely reported measure of dispersion.
    2. The formula for the sample standard deviation is

$$s = \sqrt{\frac{\Sigma(x - \bar{x})^2}{n - 1}}$$ (3–10)

    3. The formula for the standard deviation of grouped data is

$$s = \sqrt{\frac{\Sigma f(M - \bar{x})^2}{n - 1}}$$ (3–12)

III. We use the standard deviation to describe a frequency distribution by applying Chebyshev’s theorem or the Empirical Rule.
  A. Chebyshev’s theorem states that regardless of the shape of the distribution, at least 1 − 1/k² of the observations will be within k standard deviations of the mean, where k is greater than 1.
  B. The Empirical Rule states that for a bell-shaped distribution about 68% of the values will be within one standard deviation of the mean, 95% within two, and virtually all within three.
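To see how the two rules in part III compare, Chebyshev's lower bound can be evaluated next to the Empirical Rule percentages. This is a minimal sketch in Python, not part of the original text; the function name `chebyshev_min_share` and the dictionary `EMPIRICAL_RULE` are illustrative assumptions, and the 99.7% figure is the "almost all" value quoted in the earlier Empirical Rule example.

```python
def chebyshev_min_share(k):
    """Minimum share of observations within k standard deviations of the mean (any shape)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k greater than 1")
    return 1 - 1 / k ** 2


# Approximate shares for a symmetrical, bell-shaped distribution
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_min_share(k):.1%} (Chebyshev), "
          f"about {EMPIRICAL_RULE[k]:.1%} (Empirical Rule)")
```

Chebyshev's bound is lower because it makes no assumption about the shape of the distribution; the Empirical Rule gives tighter figures but applies only to bell-shaped data.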

P R O N U N C I A T I O N K E Y

| SYMBOL | MEANING | PRONUNCIATION |
|---|---|---|
| μ | Population mean | mu |
| Σ | Operation of adding | sigma |
| Σx | Adding a group of values | sigma x |
| x̄ | Sample mean | x bar |
| x̄w | Weighted mean | x bar sub w |
| GM | Geometric mean | G M |
| ΣfM | Adding the product of the frequencies and the class midpoints | sigma f M |
| σ² | Population variance | sigma squared |
| σ | Population standard deviation | sigma |

C H A P T E R E X E R C I S E S

63. The accounting firm of Crawford and Associates has five senior partners. Yesterday the senior partners saw six, four, three, seven, and five clients, respectively. a. Compute the mean and median number of clients seen by the partners. b. Is the mean a sample mean or a population mean? c. Verify that Σ(x − μ) = 0.

64. Owens Orchards sells apples in a large bag by weight. A sample of seven bags contained the following numbers of apples: 23, 19, 26, 17, 21, 24, 22. a. Compute the mean and median number of apples in a bag. b. Verify that Σ(x − x̄) = 0.

65. A sample of households that subscribe to United Bell Phone Company for landline phone service revealed the following number of calls received per household last week. Determine the mean and the median number of calls received.

52 43 30 38 30 42 12 46 39 37 34 46 32 18 41 5

66. The Citizens Banking Company is studying the number of times the ATM located in a Loblaws Supermarket at the foot of Market Street is used per day. Following are the number of times the machine was used daily over each of the last 30 days. Determine the mean number of times the machine was used per day.

83 64 84 76 84 54 75 59 70 61 63 80 84 73 68 52 65 90 52 77 95 36 78 61 59 84 95 47 87 60


67. A recent study of the laundry habits of Americans included the time in minutes of the wash cycle. A sample of 40 observations follows. Determine the mean and the median of a typical wash cycle.

35 37 28 37 33 38 37 32 28 29 39 33 32 37 33 35 36 44 36 34 40 38 46 39 37 39 34 39 31 33 37 35 39 38 37 32 43 31 31 35

68. Trudy Green works for the True-Green Lawn Company. Her job is to solicit lawn-care business via the telephone. Listed below is the number of appointments she made in each of the last 25 hours of calling. What is the arithmetic mean number of appointments she made per hour? What is the median number of appointments per hour? Write a brief report summarizing the findings.

9 5 2 6 5 6 4 4 7 2 3 6 3 4 4 7 8 4 4 5 5 4 8 3 3

69. The Split-A-Rail Fence Company sells three types of fence to homeowners in suburban Seattle, Washington. Grade A costs $5.00 per running foot to install, Grade B costs $6.50 per running foot, and Grade C, the premium quality, costs $8.00 per running foot. Yesterday, Split-A-Rail installed 270 feet of Grade A, 300 feet of Grade B, and 100 feet of Grade C. What was the mean cost per foot of fence installed?

70. Rolland Poust is a sophomore in the College of Business at Scandia Tech. Last semester he took courses in statistics and accounting, 3 hours each, and earned an A in both. He earned a B in a 5-hour history course and a B in a 2-hour history of jazz course. In addition, he took a 1-hour course dealing with the rules of basketball so he could get his license to officiate high school basketball games. He got an A in this course. What was his GPA for the semester? Assume that he receives 4 points for an A, 3 for a B, and so on. What measure of central tendency did you calculate? What method did you use?

71. The table below shows the percent of the labor force that is unemployed and the size of the labor force for three counties in northwest Ohio. Jon Elsas is the Regional Director of Economic Development. He must present a report to several companies that are considering locating in northwest Ohio. What would be an appropriate unemployment rate to show for the entire region?

| County | Percent Unemployed | Size of Workforce |
|---|---|---|
| Wood | 4.5 | 15,300 |
| Ottawa | 3.0 | 10,400 |
| Lucas | 10.2 | 150,600 |

72. The American Diabetes Association recommends a blood glucose reading of less than 130 for those with Type 2 diabetes. Blood glucose measures the amount of sugar in the blood. Below are the readings for February for a person recently diagnosed with Type 2 diabetes.

112 122 116 103 112 96 115 98 106 111 106 124 116 127 116 108 112 112 121 115 124 116 107 118 123 109 109 106

a. What is the arithmetic mean glucose reading? b. What is the median glucose reading? c. What is the modal glucose reading?

73. The first Super Bowl was played in 1967. The cost for a 30-second commercial was $42,000. The cost of a 30-second commercial for Super Bowl 50 was $4.6 million. What was the geometric mean rate of increase for the 50-year period?


74. A recent article suggested that, if you earn $25,000 a year today and the inflation rate continues at 3% per year, you’ll need to make $33,598 in 10 years to have the same buying power. You would need to make $44,771 if the inflation rate jumped to 6%. Confirm that these statements are accurate by finding the geometric mean rate of increase.

75. The ages of a sample of Canadian tourists flying from Toronto to Hong Kong were 32, 21, 60, 47, 54, 17, 72, 55, 33, and 41. a. Compute the range. b. Compute the standard deviation.

76. The weights (in pounds) of a sample of five boxes being sent by UPS are 12, 6, 7, 3, and 10. a. Compute the range. b. Compute the standard deviation.

77. The enrollments of the 13 public universities in the state of Ohio are listed below.

| College | Enrollment |
|---|---|
| University of Akron | 26,106 |
| Bowling Green State University | 18,864 |
| Central State University | 1,718 |
| University of Cincinnati | 44,354 |
| Cleveland State University | 17,194 |
| Kent State University | 41,444 |
| Miami University | 23,902 |
| Ohio State University | 62,278 |
| Ohio University | 36,493 |
| Shawnee State University | 4,230 |
| University of Toledo | 20,595 |
| Wright State University | 17,460 |
| Youngstown State University | 12,512 |

a. Is this a sample or a population? b. What is the mean enrollment? c. What is the median enrollment? d. What is the range of the enrollments? e. Compute the standard deviation.

78. Health issues are a concern of managers, especially as they evaluate the cost of medical insurance. A recent survey of 150 executives at Elvers Industries, a large insurance and financial firm located in the Southwest, reported the number of pounds by which the executives were overweight. Compute the mean and the standard deviation.

| Pounds Overweight | Frequency |
|---|---|
| 0 up to 6 | 14 |
| 6 up to 12 | 42 |
| 12 up to 18 | 58 |
| 18 up to 24 | 28 |
| 24 up to 30 | 8 |

79. The Apollo space program lasted from 1967 until 1972 and included 13 missions. The missions lasted from as little as 7 hours to as long as 301 hours. The duration of each flight is listed below.

9 195 241 301 216 260 7 244 192 147 10 295 142

a. Explain why the flight times are a population. b. Find the mean and median of the flight times. c. Find the range and the standard deviation of the flight times.


80. Creek Ratz is a very popular restaurant located along the coast of northern Florida. They serve a variety of steak and seafood dinners. During the summer beach season, they do not take reservations or accept “call ahead” seating. Management of the restaurant is concerned with the time a patron must wait before being seated for dinner. Listed below is the wait time, in minutes, for the 25 tables seated last Saturday night.

28 39 23 67 37 28 56 40 28 50 51 45 44 65 61 27 24 61 34 44 64 25 24 27 29

a. Explain why the times are a population. b. Find the mean and median of the times. c. Find the range and the standard deviation of the times.

81. A sample of 25 undergraduates reported the following dollar amounts of entertainment expenses last year:

684 710 688 711 722 698 723 743 738 722 696 721 685 763 681 731 736 771 693 701 737 717 752 710 697

a. Find the mean, median, and mode of this information. b. What are the range and standard deviation? c. Use the Empirical Rule to establish an interval that includes about 95% of the observations.

82. The Kentucky Derby is held the first Saturday in May at Churchill Downs in Louisville, Kentucky. The race track is one and one-quarter miles. The following table shows the winners since 1990, their margin of victory, the winning time, and the payoff on a $2 bet.

| Year | Winner | Winning Margin (lengths) | Winning Time (minutes) | Payoff on a $2 Win Bet |
|---|---|---|---|---|
| 1990 | Unbridled | 3.5 | 2.03333 | 10.80 |
| 1991 | Strike the Gold | 1.75 | 2.05000 | 4.80 |
| 1992 | Lil E. Tee | 1 | 2.05000 | 16.80 |
| 1993 | Sea Hero | 2.5 | 2.04000 | 12.90 |
| 1994 | Go For Gin | 2 | 2.06000 | 9.10 |
| 1995 | Thunder Gulch | 2.25 | 2.02000 | 24.50 |
| 1996 | Grindstone | nose | 2.01667 | 5.90 |
| 1997 | Silver Charm | head | 2.04000 | 4.00 |
| 1998 | Real Quiet | 0.5 | 2.03667 | 8.40 |
| 1999 | Charismatic | neck | 2.05333 | 31.30 |
| 2000 | Fusaichi Pegasus | 1.5 | 2.02000 | 2.30 |
| 2001 | Monarchos | 4.75 | 1.99950 | 10.50 |
| 2002 | War Emblem | 4 | 2.01883 | 20.50 |
| 2003 | Funny Cide | 1.75 | 2.01983 | 12.80 |
| 2004 | Smarty Jones | 2.75 | 2.06767 | 4.10 |
| 2005 | Giacomo | 0.5 | 2.04583 | 50.30 |
| 2006 | Barbaro | 6.5 | 2.02267 | 6.10 |
| 2007 | Street Sense | 2.25 | 2.03617 | 4.90 |
| 2008 | Big Brown | 4.75 | 2.03033 | 6.80 |
| 2009 | Mine That Bird | 6.75 | 2.04433 | 103.20 |
| 2010 | Super Saver | 2.50 | 2.07417 | 18.00 |
| 2011 | Animal Kingdom | 2.75 | 2.034 | 43.80 |
| 2012 | I’ll Have Another | 1.5 | 2.03050 | 32.60 |
| 2013 | Orb | 2.5 | 2.04817 | 12.80 |
| 2014 | California Chrome | 1.75 | 2.0610 | 7.00 |
| 2015 | American Pharaoh | 1.00 | 2.05033 | 7.80 |

a. Determine the mean and median for the variables winning time and payoff on a $2 bet. b. Determine the range and standard deviation of the variables winning time and payoff on a $2 bet. c. Refer to the variable winning margin. What is the level of measurement? What measure of location would be most appropriate?

83. The manager of the local Walmart Supercenter is studying the number of items purchased by customers in the evening hours. Listed below is the number of items for a sample of 30 customers.

15 8 6 9 9 4 18 10 10 12 12 4 7 8 12 10 10 11 9 13 5 6 11 14 5 6 6 5 13 5

a. Find the mean and the median of the number of items. b. Find the range and the standard deviation of the number of items. c. Organize the number of items into a frequency distribution. You may want to review the guidelines in Chapter 2 for establishing the class interval and the number of classes. d. Find the mean and the standard deviation of the data organized into a frequency distribution. Compare these values with those computed in part (a). Why are they different?

84. The following frequency distribution reports the electricity cost for a sample of 50 two-bedroom apartments in Albuquerque, New Mexico, during the month of May last year.

| Electricity Cost | Frequency |
|---|---|
| $ 80 up to $100 | 3 |
| 100 up to 120 | 8 |
| 120 up to 140 | 12 |
| 140 up to 160 | 16 |
| 160 up to 180 | 7 |
| 180 up to 200 | 4 |
| Total | 50 |

a. Estimate the mean cost. b. Estimate the standard deviation. c. Use the Empirical Rule to estimate the proportion of costs within two standard deviations of the mean. What are these limits?

85. Bidwell Electronics Inc. recently surveyed a sample of employees to determine how far they lived from corporate headquarters. The results are shown below. Compute the mean and the standard deviation.

| Distance (miles) | Frequency | M |
|---|---|---|
| 0 up to 5 | 4 | 2.5 |
| 5 up to 10 | 15 | 7.5 |
| 10 up to 15 | 27 | 12.5 |
| 15 up to 20 | 18 | 17.5 |
| 20 up to 25 | 6 | 22.5 |

D A T A A N A L Y T I C S

86. Refer to the North Valley Real Estate data and prepare a report on the sales prices of the homes. Be sure to answer the following questions in your report.
  a. Around what values of price do the data tend to cluster? What is the mean sales price? What is the median sales price? Is one measure more representative of the typical sales prices than the others?
  b. What is the range of sales prices? What is the standard deviation? About 95% of the sales prices are between what two values? Is the standard deviation a useful statistic for describing the dispersion of sales price?
  c. Repeat (a) and (b) using FICO score.

87. Refer to the Baseball 2016 data, which report information on the 30 Major League Baseball teams for the 2016 season. Refer to the variable team salary.
  a. Prepare a report on the team salaries. Be sure to answer the following questions in your report.
    1. Around what values do the data tend to cluster? Specifically what is the mean team salary? What is the median team salary? Is one measure more representative of the typical team salary than the others?
    2. What is the range of the team salaries? What is the standard deviation? About 95% of the salaries are between what two values?
  b. Refer to the information on the average salary for each year. In 2000 the average player salary was $1.99 million. By 2016 the average player salary had increased to $4.40 million. What was the rate of increase over the period?

88. Refer to the Lincolnville School District bus data. Prepare a report on the maintenance cost for last month. Be sure to answer the following questions in your report.
  a. Around what values do the data tend to cluster? Specifically what was the mean maintenance cost last month? What is the median cost? Is one measure more representative of the typical cost than the others?
  b. What is the range of maintenance costs? What is the standard deviation? About 95% of the maintenance costs are between what two values?

LEARNING OBJECTIVES When you have completed this chapter, you will be able to:

LO4-1 Construct and interpret a dot plot.

LO4-2 Construct and describe a stem-and-leaf display.

LO4-3 Identify and compute measures of position.

LO4-4 Construct and analyze a box plot.

LO4-5 Compute and interpret the coefficient of skewness.

LO4-6 Create and interpret a scatter diagram.

LO4-7 Develop and explain a contingency table.

MCGIVERN JEWELERS recently posted an advertisement on a social media site reporting the shape, size, price, and cut grade for 33 of its diamonds in stock. Develop a box plot of the variable price and comment on the result. (See Exercise 37 and LO4-4.)

4 Describing Data: DISPLAYING AND EXPLORING DATA


LO4-1 Construct and interpret a dot plot.

E X A M P L E

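Tionesta Ford Lincoln

| Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|
| 23 | 33 | 27 | 28 | 39 | 26 |
| 30 | 32 | 28 | 33 | 35 | 32 |
| 29 | 25 | 36 | 31 | 32 | 27 |
| 35 | 32 | 35 | 37 | 36 | 30 |

Sheffield Motors Inc.

| Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|
| 31 | 35 | 44 | 36 | 34 | 37 |
| 30 | 37 | 43 | 31 | 40 | 31 |
| 32 | 44 | 36 | 34 | 43 | 36 |
| 26 | 38 | 37 | 30 | 42 | 33 |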

S O L U T I O N

The Minitab system provides a dot plot and outputs the mean, median, maximum, and minimum values, and the standard deviation for the number of cars serviced at each dealership over the last 24 working days.

The dot plots, shown in the center of the output, graphically illustrate the distribu- tions for each dealership. The plots show the difference in the location and dis- persion of the observations. By looking at the dot plots, we can see that the number of vehicles serviced at the Sheffield dealership is more widely dispersed and has a larger mean than at the Tionesta dealership. Several other features of the number of vehicles serviced are:

• Tionesta serviced the fewest cars in any day, 23.
• Sheffield serviced 26 cars during its slowest day, which is 4 fewer cars than on the next lowest day.
• Tionesta serviced exactly 32 cars on four different days.
• The numbers of cars serviced cluster around 36 for Sheffield and 32 for Tionesta.

From the descriptive statistics, we see Sheffield serviced a mean of 35.83 vehicles per day. Tionesta serviced a mean of 31.292 vehicles per day during the same period. So Sheffield typically services 4.54 more vehicles per day. There is also more dispersion, or variation, in the daily number of vehicles serviced at Sheffield than at Tionesta. How do we know this? The standard deviation is larger at Shef- field (4.96 vehicles per day) than at Tionesta (4.112 cars per day).
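The descriptive statistics quoted above come from Minitab, but they can be reproduced from the two data tables. The following is a minimal sketch in Python using only the standard library, not part of the original text; the function name `dot_plot_rows` and the text-based layout (one row per value, one asterisk per day) are illustrative assumptions and not the Minitab output format.

```python
from collections import Counter
from statistics import mean, stdev

# Number of vehicles serviced over the last 24 working days
tionesta = [23, 33, 27, 28, 39, 26, 30, 32, 28, 33, 35, 32,
            29, 25, 36, 31, 32, 27, 35, 32, 35, 37, 36, 30]
sheffield = [31, 35, 44, 36, 34, 37, 30, 37, 43, 31, 40, 31,
             32, 44, 36, 34, 43, 36, 26, 38, 37, 30, 42, 33]


def dot_plot_rows(values):
    """Build a text dot plot: one row per possible value, one '*' per observation."""
    counts = Counter(values)
    return [f"{v:>3} {'*' * counts[v]}" for v in range(min(values), max(values) + 1)]


for name, data in (("Tionesta", tionesta), ("Sheffield", sheffield)):
    print(f"{name}: mean = {mean(data):.3f}, standard deviation = {stdev(data):.3f}")
    print("\n".join(dot_plot_rows(data)))
```

Here `stdev` computes the sample standard deviation (dividing by n − 1), which matches the values reported in the text.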

LO4-2 Construct and describe a stem-and-leaf display.

STEM-AND-LEAF DISPLAYS

In Chapter 2, we showed how to organize data into a frequency distribution so we could summarize the raw data into a meaningful form. The major advantage to organizing the data into a frequency distribution is we get a quick visual picture of the shape of the distribution without doing any further calculation. To put it another way, we can see where the data are concentrated and also determine whether there are any extremely large or small values. There are two disadvantages, however, to organizing the data into a frequency distribution: (1) we lose the exact identity of each value and (2) we are not sure how the values within each class are distributed. To explain, the Theater of the Republic in Erie, Pennsylvania, books live theater and musical performances. The theater’s capacity is 160 seats. Last year, among the forty-five performances, there were eight different plays and twelve different bands. The following frequency distribution shows that eighty up to ninety people attended two of the forty-five performances; there were seven performances where ninety up to one hundred people attended. However, is the attendance within this class clustered about 90, spread evenly throughout the class, or clustered near 99? We cannot tell.

Attendance        Frequency

 80 up to  90         2
 90 up to 100         7
100 up to 110         6
110 up to 120         9
120 up to 130         8
130 up to 140         7
140 up to 150         3
150 up to 160         3

Total                45

One technique used to display quantitative information in a condensed form and provide more information than the frequency distribution is the stem-and-leaf display. An advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation. In the above example, we would not know the identity of the values in the 90 up to 100 class. To illustrate the construction of a stem-and-leaf display using the number of people attending each performance, suppose the seven observations in the 90 up to 100 class are 96, 94, 93, 94, 95, 96, and 97. The stem value is the leading digit or digits, in this case 9. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right.

The values in the 90 up to 100 class would appear as follows:

9 ∣ 6 4 3 4 5 6 7

It is also customary to sort the values within each stem from smallest to largest. Thus, the second row of the stem-and-leaf display would appear as follows:

9 ∣ 3 4 4 5 6 6 7

With the stem-and-leaf display, we can quickly observe that 94 people attended two performances and the number attending ranged from 93 to 97. A stem-and-leaf display is similar to a frequency distribution with more information, that is, the identity of the observations is preserved.

STEM-AND-LEAF DISPLAY A statistical technique to present a set of data. Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.


The following example explains the details of developing a stem-and-leaf display.

E X A M P L E

Listed in Table 4–1 is the number of people attending each of the 45 performances at the Theater of the Republic last year. Organize the data into a stem-and-leaf display. Around what values does attendance tend to cluster? What is the smallest attendance? The largest attendance?

S O L U T I O N

From the data in Table 4–1, we note that the smallest attendance is 88. So we will make the first stem value 8. The largest attendance is 156, so we will have the stem values begin at 8 and continue to 15. The first number in Table 4–1 is 96, which has a stem value of 9 and a leaf value of 6. Moving across the top row, the second value is 93 and the third is 88. After the first 3 data values are considered, the chart is as follows.

Stem ∣ Leaf

   8 ∣ 8
   9 ∣ 6 3
  10 ∣
  11 ∣
  12 ∣
  13 ∣
  14 ∣
  15 ∣

Organizing all the data, the stem-and-leaf chart looks as follows.

Stem ∣ Leaf

   8 ∣ 8 9
   9 ∣ 6 3 5 6 4 4 7
  10 ∣ 8 7 3 4 6 3
  11 ∣ 7 3 2 7 2 1 9 8 3
  12 ∣ 7 5 7 0 5 5 0 4
  13 ∣ 9 5 2 9 4 6 8
  14 ∣ 8 2 3
  15 ∣ 6 5 5

The usual procedure is to sort the leaf values from the smallest to largest. The last line, the row referring to the values in the 150s, would appear as:

15 ∣ 5 5 6

TABLE 4–1 Number of People Attending Each of the 45 Performances at the Theater of the Republic

 96   93   88  117  127   95  113   96  108
 94  148  156  139  142   94  107  125  155
155  103  112  127  117  120  112  135  132
111  125  104  106  139  134  119   97   89
118  136  125  143  120  103  113  124  138


The final table would appear as follows, where we have sorted all of the leaf values.

Stem ∣ Leaf

   8 ∣ 8 9
   9 ∣ 3 4 4 5 6 6 7
  10 ∣ 3 3 4 6 7 8
  11 ∣ 1 2 2 3 3 7 7 8 9
  12 ∣ 0 0 4 5 5 5 7 7
  13 ∣ 2 4 5 6 8 9 9
  14 ∣ 2 3 8
  15 ∣ 5 5 6

You can draw several conclusions from the stem-and-leaf display. First, the minimum number of people attending is 88 and the maximum is 156. There were two performances with fewer than 90 people attending, and three performances with 150 or more. You can observe, for example, that for the three performances with more than 150 people attending, the actual attendances were 155, 155, and 156. The concentration of attendance is between 110 and 130. There were nine performances with attendance between 110 and 119 and eight performances between 120 and 129. We can also tell that within the 120 to 129 group the actual attendances were spread evenly throughout the class. That is, 120 people attended two performances, 124 people attended one performance, 125 people attended three performances, and 127 people attended two performances.
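The construction steps in this example can be sketched in a few lines of Python. This is only an illustration, assuming whole-number data where the units digit is the leaf; the function name `stem_and_leaf` is hypothetical and not part of the text or of Minitab.

```python
from collections import defaultdict

# Attendance data from Table 4-1 (45 performances), read row by row.
attendance = [
    96, 93, 88, 117, 127, 95, 113, 96, 108,
    94, 148, 156, 139, 142, 94, 107, 125, 155,
    155, 103, 112, 127, 117, 120, 112, 135, 132,
    111, 125, 104, 106, 139, 134, 119, 97, 89,
    118, 136, 125, 143, 120, 103, 113, 124, 138,
]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}; the leaf is the units digit (an assumption)."""
    rows = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # 117 -> stem 11, leaf 7
        rows[stem].append(leaf)
    # Walk every stem from smallest to largest so empty stems stay visible.
    return {stem: sorted(rows[stem]) for stem in range(min(rows), max(rows) + 1)}


display = stem_and_leaf(attendance)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Counting the leaves in a row gives the class frequency, so the row for stem 11 shows both how many performances fell between 110 and 119 and what the individual attendances were.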

We also can generate this information on the Minitab software system. We have named the variable Attendance. The Minitab output is below. You can find the Minitab commands that will produce this output in Appendix C.

The Minitab solution provides some additional information regarding cumulative totals. In the column to the left of the stem values are numbers such as 2, 9, 15, and so on. The number 9 indicates there are 9 observations that have occurred before the value of 100. The number 15 indicates that 15 observations have occurred prior to 110. About halfway down the column the number 9 appears in parentheses. The parentheses indicate that the middle value or median appears in that row and there are nine values in this group. In this case, we describe the middle value as the value below which half of the observations occur. There are a total of 45 observations, so the middle value, if the data were arranged from smallest to largest, would be the 23rd observation; its value is 118. After the median, the values begin to decline. These values represent the “more than” cumulative totals. There are 21 observations of 120 or more, 13 of 130 or more, and so on.
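The cumulative column described here can be reproduced with a short sketch. It is an approximation of the idea, not Minitab's actual routine: it assumes an odd number of observations so that the median falls in exactly one row, and the name `depth_column` is made up for illustration.

```python
# Leaf counts per stem, read off the sorted stem-and-leaf display.
counts = {8: 2, 9: 7, 10: 6, 11: 9, 12: 8, 13: 7, 14: 3, 15: 3}


def depth_column(counts):
    """Cumulative counts from each end; the median row shows its own count in parentheses."""
    n = sum(counts.values())
    median_rank = (n + 1) / 2  # the 23rd value when n = 45
    depths = []
    below = 0  # observations in rows above the current one
    for stem in sorted(counts):
        up_to = below + counts[stem]
        if below < median_rank <= up_to:
            depths.append(f"({counts[stem]})")   # row holding the median
        elif up_to < median_rank:
            depths.append(str(up_to))            # "less than" cumulative total
        else:
            depths.append(str(n - below))        # "more than" cumulative total
        below = up_to
    return depths


print(depth_column(counts))
```

For the attendance data the column reads 2, 9, 15, (9), 21, 13, 6, 3, which matches the totals quoted in the paragraph above.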


Which is the better choice, a dot plot or a stem-and-leaf chart? This is really a matter of personal choice and convenience. For presenting data, especially with a large number of observations, you will find dot plots are more frequently used. You will see dot plots in analytical literature, marketing reports, and occasionally in annual reports. If you are doing a quick analysis for yourself, stem-and-leaf tallies are handy and easy, particularly on a smaller set of data.

S E L F - R E V I E W 4–1

1. The number of employees at each of the 142 Home Depot stores in the Southeast region is shown in the following dot plot.

[Dot plot: Number of employees; horizontal axis marked from 80 to 104 in steps of 4]

(a) What are the maximum and minimum numbers of employees per store? (b) How many stores employ 91 people? (c) Around what values does the number of employees per store tend to cluster?

2. The rate of return for 21 stocks is:

8.3 9.6 9.5 9.1 8.8 11.2 7.7 10.1 9.9 10.8 10.2 8.0 8.4 8.1 11.6 9.6 8.8 8.0 10.4 9.8 9.2

Organize this information into a stem-and-leaf display. (a) How many rates are less than 9.0? (b) List the rates in the 10.0 up to 11.0 category. (c) What is the median? (d) What are the maximum and the minimum rates of return?

E X E R C I S E S

1. Describe the differences between a histogram and a dot plot. When might a dot plot be better than a histogram?

2. Describe the differences between a histogram and a stem-and-leaf display.

3. Consider the following chart.

[Dot plot: horizontal axis marked from 1 to 7]

a. What is this chart called? b. How many observations are in the study? c. What are the maximum and the minimum values? d. Around what values do the observations tend to cluster?

4. The following chart reports the number of cell phones sold at a big-box retail store for the last 26 days.

[Dot plot: number of cell phones sold per day]

a. What are the maximum and the minimum numbers of cell phones sold in a day? b. What is a typical number of cell phones sold?

5. The first row of a stem-and-leaf chart appears as follows: 62 | 1 3 3 7 9. Assume whole number values.

a. What is the “possible range” of the values in this row? b. How many data values are in this row? c. List the actual values in this row of data.

6. The third row of a stem-and-leaf chart appears as follows: 21 | 0 1 3 5 7 9. Assume whole number values.

a. What is the “possible range” of the values in this row? b. How many data values are in this row? c. List the actual values in this row of data.

7. The following stem-and-leaf chart shows the number of units produced per day in a factory.

Stem | Leaf
   3 | 8
   4 |
   5 | 6
   6 | 0133559
   7 | 0236778
   8 | 59
   9 | 00156
  10 | 36

a. How many days were studied? b. How many observations are in the first class? c. What are the minimum value and the maximum value? d. List the actual values in the fourth row. e. List the actual values in the second row. f. How many values are less than 70? g. How many values are 80 or more? h. What is the median? i. How many values are between 60 and 89, inclusive?

8. The following stem-and-leaf chart reports the number of prescriptions filled per day at the pharmacy on the corner of Fourth and Main Streets.

Stem | Leaf
  12 | 689
  13 | 123
  14 | 6889
  15 | 589
  16 | 35
  17 | 24568
  18 | 268
  19 | 13456
  20 | 034679
  21 | 2239
  22 | 789
  23 | 00179
  24 | 8
  25 | 13
  26 |
  27 | 0

a. How many days were studied? b. How many observations are in the last class? c. What are the maximum and the minimum values in the entire set of data? d. List the actual values in the fourth row. e. List the actual values in the next to the last row. f. On how many days were less than 160 prescriptions filled? g. On how many days were 220 or more prescriptions filled? h. What is the middle value? i. How many days did the number of filled prescriptions range between 170 and 210?

9. A survey of the number of phone calls made by a sample of 16 Verizon subscribers last week revealed the following information. Develop a stem-and-leaf chart. How many calls did a typical subscriber make? What were the maximum and the minimum number of calls made?

52 43 30 38 30 42 12 46 39 37 34 46 32 18 41 5

10. Aloha Banking Co. is studying ATM use in suburban Honolulu. Yesterday, for a sample of 30 ATMs, the bank counted the number of times each machine was used. The data are presented in the table. Develop a stem-and-leaf chart to summarize the data. What were the typical, minimum, and maximum number of times each ATM was used?

83 64 84 76 84 54 75 59 70 61 63 80 84 73 68 52 65 90 52 77 95 36 78 61 59 84 95 47 87 60


MEASURES OF POSITION

LO4-3 Identify and compute measures of position.

The standard deviation is the most widely used measure of dispersion. However, there are other ways of describing the variation or spread in a set of data. One method is to determine the location of values that divide a set of observations into equal parts. These measures include quartiles, deciles, and percentiles.

Quartiles divide a set of observations into four equal parts. To explain further, think of any set of values arranged from the minimum to the maximum. In Chapter 3, we called the middle value of a set of data arranged from the minimum to the maximum the median. That is, 50% of the observations are larger than the median and 50% are smaller. The median is a measure of location because it pinpoints the center of the data. In a similar fashion, quartiles divide a set of observations into four equal parts. The first quartile, usually labeled Q1, is the value below which 25% of the observations occur, and the third quartile, usually labeled Q3, is the value below which 75% of the observations occur.

Similarly, deciles divide a set of observations into 10 equal parts and percentiles into 100 equal parts. So if you found that your GPA was in the 8th decile at your university, you could conclude that 80% of the students had a GPA lower than yours and 20% had a higher GPA. If your GPA was in the 92nd percentile, then 92% of students had a GPA less than your GPA and only 8% of students had a GPA greater than your GPA. Percentile scores are frequently used to report results on such national standardized tests as the SAT, ACT, GMAT (used to judge entry into many master of business administration programs), and LSAT (used to judge entry into law school).

Quartiles, Deciles, and Percentiles

To formalize the computational procedure, let \(L_p\) refer to the location of a desired percentile. So if we want to find the 92nd percentile we would use \(L_{92}\), and if we wanted the median, the 50th percentile, then \(L_{50}\). For a number of observations, n, the location of the Pth percentile can be found using the formula:

LOCATION OF A PERCENTILE    \[ L_p = (n + 1)\frac{P}{100} \]    [4–1]

An example will help to explain further.

E X A M P L E

Morgan Stanley is an investment company with offices located throughout the United States. Listed below are the commissions earned last month by a sample of 15 brokers at the Morgan Stanley office in Oakland, California.

$2,038 $1,758 $1,721 $1,637 $2,097 $2,047 $2,205 $1,787 $2,287 1,940 2,311 2,054 2,406 1,471 1,460

Locate the median, the first quartile, and the third quartile for the commissions earned.

S O L U T I O N

The first step is to sort the data from the smallest commission to the largest.

$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038 2,047 2,054 2,097 2,205 2,287 2,311 2,406


The median value is the observation in the center and is the same as the 50th percentile, so P equals 50. So the median or \(L_{50}\) is located at (n + 1)(50/100), where n is the number of observations. In this case, that is position number 8, found by (15 + 1)(50/100). The eighth commission in the ordered array is $2,038. So we conclude this is the median and that half the brokers earned commissions more than $2,038 and half earned less than $2,038. The result using formula (4–1) to find the median is the same as the method presented in Chapter 3.

Recall the definition of a quartile. Quartiles divide a set of observations into four equal parts. Hence 25% of the observations will be less than the first quartile. Seventy-five percent of the observations will be less than the third quartile. To locate the first quartile, we use formula (4–1), where n = 15 and P = 25:

\[ L_{25} = (n + 1)\frac{P}{100} = (15 + 1)\frac{25}{100} = 4 \]

and to locate the third quartile, n = 15 and P = 75:

\[ L_{75} = (n + 1)\frac{P}{100} = (15 + 1)\frac{75}{100} = 12 \]

Therefore, the first and third quartile values are located at positions 4 and 12, respectively. The fourth value in the ordered array is $1,721 and the twelfth is $2,205. These are the first and third quartiles.

In the above example, the location formula yielded a whole number. That is, we wanted to find the first quartile and there were 15 observations, so the location formula indicated we should find the fourth ordered value. What if there were 20 observations in the sample, that is n = 20, and we wanted to locate the first quartile? From the location formula (4–1):

\[ L_{25} = (n + 1)\frac{P}{100} = (20 + 1)\frac{25}{100} = 5.25 \]

We would locate the fifth value in the ordered array and then move .25 of the distance between the fifth and sixth values and report that as the first quartile. Like the median, the quartile does not need to be one of the actual values in the data set.

To explain further, suppose a data set contained the six values 91, 75, 61, 101, 43, and 104. We want to locate the first quartile. We order the values from the minimum to the maximum: 43, 61, 75, 91, 101, and 104. The first quartile is located at

\[ L_{25} = (n + 1)\frac{P}{100} = (6 + 1)\frac{25}{100} = 1.75 \]

The position formula tells us that the first quartile is located between the first and the second values and it is .75 of the distance between the first and the second values. The first value is 43 and the second is 61. So the distance between these two values is 18. To locate the first quartile, we need to move .75 of the distance between the first and second values, so .75(18) = 13.5. To complete the procedure, we add 13.5 to the first value, 43, and report that the first quartile is 56.5.

We can extend the idea to include both deciles and percentiles. To locate the 23rd percentile in a sample of 80 observations, we would look for the 18.63 position.

\[ L_{23} = (n + 1)\frac{P}{100} = (80 + 1)\frac{23}{100} = 18.63 \]

To find the value corresponding to the 23rd percentile, we would locate the 18th value and the 19th value and determine the distance between the two values. Next, we would multiply this difference by 0.63 and add the result to the smaller value. The result would be the 23rd percentile.
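The location formula (4–1) and the interpolation step can be combined into one small function. The sketch below is an illustration only: the name `percentile` is hypothetical, and returning the smallest or largest value when the location falls outside the data is an assumption the text does not address.

```python
def percentile(values, p):
    """Value at the P-th percentile using the location (n + 1) * P / 100."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100
    whole = int(location)        # 1-based position of the lower neighbor
    fraction = location - whole  # share of the distance to the next value
    if whole < 1:                # assumption: clamp to the smallest value
        return ordered[0]
    if whole >= n:               # assumption: clamp to the largest value
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


# Morgan Stanley commissions from the example (15 brokers).
commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787, 2287,
               1940, 2311, 2054, 2406, 1471, 1460]

print(percentile(commissions, 25))   # first quartile, position 4
print(percentile(commissions, 50))   # median, position 8
print(percentile(commissions, 75))   # third quartile, position 12
print(percentile([91, 75, 61, 101, 43, 104], 25))  # position 1.75
```

With the commission data the function returns the fourth, eighth, and twelfth ordered values ($1,721, $2,038, and $2,205), and for the six-value data set it returns 56.5, the same results worked out by hand.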

Statistical software is very helpful when describing and summarizing data. Excel, Minitab, and MegaStat, a statistical analysis Excel add-in, all provide summary statistics that include quartiles. For example, the Minitab summary of the Morgan Stanley commission data, shown below, includes the first and third quartiles, and other statistics. Based on the reported quartiles, 25% of the commissions earned were less than $1,721 and 75% were less than $2,205. These are the same values we calculated using formula (4–1).

There are ways other than formula (4–1) to locate quartile values. For example, another method uses 0.25n + 0.75 to locate the position of the first quartile and 0.75n + 0.25 to locate the position of the third quartile. We will call this the Excel Method. In the Morgan Stanley data, this method would place the first quartile at position 4.5 (.25 × 15 + .75) and the third quartile at position 11.5 (.75 × 15 + .25). The first quartile would be interpolated at 0.5, that is, one-half the distance between the fourth- and the fifth-ranked values. Based on this method, the first quartile is $1,739.50, found by ($1,721 + 0.5[$1,758 − $1,721]). The third quartile, at position 11.5, would be $2,151, or one-half the distance between the eleventh- and the twelfth-ranked values, found by ($2,097 + 0.5[$2,205 − $2,097]). Excel, as shown in the Morgan Stanley and Applewood examples, can compute quartiles using either of the two methods. Please note the text uses formula (4–1) to calculate quartiles.
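The alternate positions can be checked the same way. The following sketch assumes only the two position rules quoted above (0.25n + 0.75 and 0.75n + 0.25); the function name `quartile_alternate` is illustrative and is not an Excel function.

```python
def quartile_alternate(values, which):
    """First (which=1) or third (which=3) quartile using the alternate positions."""
    if which not in (1, 3):
        raise ValueError("which must be 1 or 3")
    ordered = sorted(values)
    n = len(ordered)
    location = 0.25 * n + 0.75 if which == 1 else 0.75 * n + 0.25
    whole = int(location)          # 1-based position of the lower neighbor
    fraction = location - whole    # share of the distance to the next value
    lower = ordered[whole - 1]
    upper = ordered[min(whole, n - 1)]  # guard against running past the end
    return lower + fraction * (upper - lower)


commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787, 2287,
               1940, 2311, 2054, 2406, 1471, 1460]

print(quartile_alternate(commissions, 1))  # position 4.5
print(quartile_alternate(commissions, 3))  # position 11.5
```

The results are 1739.5 and 2151, the alternate-method values given in the text. Python's standard library offers both conventions as well: `statistics.quantiles(data, n=4, method="exclusive")` uses the (n + 1)P/100 positions of formula (4–1), and `method="inclusive"` uses the alternate positions.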

Is the difference between the two methods important? No. Usually it is just a nuisance. In general, both methods calculate values that will support the statement that approximately 25% of the values are less than the value of the first quartile, and approximately 75% of the data values are less than the value of the third quartile. When the sample is large, the difference in the results from the two methods is small. For example, in the Applewood Auto Group data there are 180 vehicles. The quartiles computed using both methods are shown in the accompanying table. Based on the variable profit, 45 of the 180 values (25%) are less than both values of the first quartile, and 135 of the 180 values (75%) are less than both values of the third quartile.

When using Excel, be careful to understand the method used to

STATISTICS IN ACTION

John W. Tukey (1915–2000) received a PhD in mathematics from Princeton in 1939. However, when he joined the Fire Control Research Office during World War II, his interest in abstract mathematics shifted to applied statistics. He developed effective numerical and graphical methods for studying patterns in data. Among the graphics he developed are the stem-and-leaf diagram and the box-and-whisker plot or box plot. From 1960 to 1980, Tukey headed the statistical division of NBC’s election night vote projection team. He became renowned in 1960 for preventing an early call of victory for Richard Nixon in the presidential election won by John F. Kennedy.

Morgan Stanley Commissions

Method              Quartile 1   Quartile 3
Equation 4-1          1721         2205
Alternate Method      1739.5       2151

Applewood (variable: Profit)

Method              Quartile 1   Quartile 3
Equation 4-1          1415.5       2275.5
Alternate Method      1422.5       2268.5


S E L F - R E V I E W 4–2

The Quality Control department of Plainsville Peanut Company is responsible for checking the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars produced last hour are:

7.69 7.72 7.8 7.86 7.90 7.94 7.97 8.06 8.09

(a) What is the median weight? (b) Determine the weights corresponding to the first and third quartiles.

E X E R C I S E S

11. Determine the median and the first and third quartiles in the following data.

46 47 49 49 51 53 54 54 55 55 59

12. Determine the median and the first and third quartiles in the following data.

5.24 6.02 6.67 7.30 7.59 7.99 8.03 8.35 8.81 9.45 9.61 10.37 10.39 11.86 12.22 12.71 13.07 13.59 13.89 15.42

13 13 13 20 26 27 31 34 34 34 35 35 36 37 38 41 41 41 45 47 47 47 50 51 53 54 56 62 67 82

a. Determine the first and third quartiles. b. Determine the second decile and the eighth decile. c. Determine the 67th percentile.

38 40 41 45 48 48 50 50 51 51 52 52 53 54 55 55 55 56 56 57 59 59 59 62 62 62 63 64 65 66 66 67 67 69 69 71 77 78 79 79

a. Determine the median number of calls. b. Determine the first and third quartiles. c. Determine the first decile and the ninth decile. d. Determine the 33rd percentile.


BOX PLOTS

LO4-4 Construct and analyze a box plot.

A box plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the median, Q3 (the third quartile), and the maximum value. An example will help to explain.

E X A M P L E

Alexander’s Pizza offers free delivery of its pizza within 15 miles. Alex, the owner, wants some information on the time it takes for delivery. How long does a typical delivery take? Within what range of times will most deliveries be completed? For a sample of 20 deliveries, he determined the following information:

Minimum value = 13 minutes

Q1 = 15 minutes

Median = 18 minutes

Q3 = 22 minutes

Maximum value = 30 minutes

Develop a box plot for the delivery times. What conclusions can you make about the delivery times?

S O L U T I O N

The first step in drawing a box plot is to create an appropriate scale along the horizontal axis. Next, we draw a box that starts at Q1 (15 minutes) and ends at Q3 (22 minutes). Inside the box we place a vertical line to represent the median (18 minutes). Finally, we extend horizontal lines from the box out to the minimum value (13 minutes) and the maximum value (30 minutes). These horizontal lines outside of the box are sometimes called “whiskers” because they look a bit like a cat’s whiskers.

[Box plot of delivery times: horizontal axis in minutes, marked from 12 to 32 in steps of 2; labels identify the minimum value, Q1, the median, and the maximum value]

The box plot also shows the interquartile range of delivery times between Q1 and Q3. The interquartile range is 7 minutes and indicates that 50% of the deliveries are between 15 and 22 minutes.

The box plot also reveals that the distribution of delivery times is positively skewed. In Chapter 3, we defined skewness as the lack of symmetry in a set of data. How do we know this distribution is positively skewed? In this case, there are actually two pieces of information that suggest this. First, the dashed line to the right of the box from 22 minutes (Q3) to the maximum time of 30 minutes is longer than the dashed line from the left of 15 minutes (Q1) to the minimum value of 13 minutes. To put it another way, the 25% of the data larger than the third quartile is more spread out than the 25% less than the first quartile. A second indication of positive skewness is that the median is not in the center of the box. The distance from the first quartile to the median is smaller than the distance from the median to the third quartile. We know that the number of delivery times between 15 minutes and 18 minutes is the same as the number of delivery times between 18 minutes and 22 minutes.
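The two checks for skewness described above (comparing the whiskers and comparing the two halves of the box) can be expressed as a small sketch. The rule used here is only a rough heuristic built from those two comparisons; the function name and the wording of the returned labels are assumptions made for illustration.

```python
def describe_box_plot(minimum, q1, median, q3, maximum):
    """Summarize spread and shape from a five-number summary (rough heuristic)."""
    left_whisker, right_whisker = q1 - minimum, maximum - q3
    left_box, right_box = median - q1, q3 - median
    if right_whisker > left_whisker and right_box > left_box:
        shape = "positively skewed"
    elif right_whisker < left_whisker and right_box < left_box:
        shape = "negatively skewed"
    else:
        shape = "roughly symmetric or mixed"
    return {
        "iqr": q3 - q1,                              # interquartile range
        "whiskers": (left_whisker, right_whisker),   # left and right lengths
        "box_halves": (left_box, right_box),         # Q1-to-median, median-to-Q3
        "shape": shape,
    }


# Alexander's Pizza delivery times, in minutes.
summary = describe_box_plot(13, 15, 18, 22, 30)
print(summary)
```

For the delivery data the interquartile range is 7 minutes, the right whisker (8 minutes) is longer than the left (2 minutes), and the right half of the box (4 minutes) is wider than the left (3 minutes), so both comparisons point to positive skewness.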

E X A M P L E

Refer to the Applewood Auto Group data. Develop a box plot for the variable age of the buyer. What can we conclude about the distribution of the age of the buyer?

S O L U T I O N

Minitab was used to develop the following chart and summary statistics.

The median age of the purchaser is 46 years, 25% of the purchasers are less than 40 years of age, and 25% are more than 52.75 years of age. Based on the summary information and the box plot, we conclude:

• Fifty percent of the purchasers are between the ages of 40 and 52.75 years.
• The distribution of ages is fairly symmetric. There are two reasons for this conclusion. The length of the whisker above 52.75 years (Q3) is about the same length as the whisker below 40 years (Q1). Also, the area in the box between 40 years and the median of 46 years is about the same as the area between the median and 52.75.

There are three asterisks (*) above 70 years. What do they indicate? In a box plot, an asterisk identifies an outlier. An outlier is a value that is inconsistent with the rest of the data. It is defined as a value that is more than 1.5 times the interquartile range smaller than Q1 or larger than Q3. In this example, an outlier would be a value larger than 71.875 years, found by:

Outlier > Q3 + 1.5(Q3 − Q1) = 52.75 + 1.5(52.75 − 40) = 71.875

An outlier would also be a value less than 20.875 years.

Outlier < Q1 − 1.5(Q3 − Q1) = 40 − 1.5(52.75 − 40) = 20.875
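The two outlier limits can be computed directly from the quartiles. This is a minimal sketch of the 1.5 × interquartile range rule as stated in the text; the function name `outlier_limits` and the short list of sample ages are illustrative, with 72, 72, and 73 being the three oldest purchasers the text reports.

```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower, upper) limits: Q1 - k * IQR and Q3 + k * IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Applewood buyer ages: Q1 = 40 and Q3 = 52.75 from the Minitab summary.
lower, upper = outlier_limits(40, 52.75)
print(lower, upper)

# A few ages for illustration; 72, 72, and 73 are the oldest purchasers.
sample_ages = [21, 46, 52, 72, 72, 73]
outliers = [age for age in sample_ages if age < lower or age > upper]
print(outliers)
```

The limits come out to 20.875 and 71.875 years, so the three purchasers aged 72 or more are flagged while a 21-year-old buyer is not.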

From the box plot, we conclude there are three purchasers 72 years of age or older and none less than 21 years of age. Technical note: In some cases, a single asterisk may represent more than one observation because of the limitations of the software and space available. It is a good idea to check the actual data. In this instance, there are three purchasers 72 years old or older; two are 72 and one is 73.

S E L F - R E V I E W 4–3

The following box plot shows the assets in millions of dollars for credit unions in Seattle, Washington.

[Box plot: assets in millions of dollars; horizontal axis marked from 0 to 100 in steps of 10]

What are the smallest and largest values, the first and third quartiles, and the median? Would you agree that the distribution is symmetrical? Are there any outliers?

E X E R C I S E S

15. The box plot below shows the amount spent for books and supplies per year by students at four-year public colleges.

[Box plot: amount spent; horizontal axis marked from $0 to $1,750 in steps of $350]

a. Estimate the median amount spent. b. Estimate the first and third quartiles for the amount spent. c. Estimate the interquartile range for the amount spent. d. Beyond what point is a value considered an outlier? e. Identify any outliers and estimate their value. f. Is the distribution symmetrical or positively or negatively skewed?

16. The box plot shows the undergraduate in-state tuition per credit hour at four-year public colleges.

[Box plot: tuition per credit hour; horizontal axis marked from $0 to $1,500 in steps of $300]

a. Estimate the median. b. Estimate the first and third quartiles. c. Determine the interquartile range. d. Beyond what point is a value considered an outlier? e. Identify any outliers and estimate their value. f. Is the distribution symmetrical or positively or negatively skewed?

17. In a study of the gasoline mileage of model year 2016 automobiles, the mean miles per gallon was 27.5 and the median was 26.8. The smallest value in the study was 12.70 miles per gallon, and the largest was 50.20. The first and third quartiles were 17.95 and 35.45 miles per gallon, respectively. Develop a box plot and comment on the distribution. Is it a symmetric distribution?

SKEWNESS In Chapter 3, we described measures of central location for a distribution of data by re- porting the mean, median, and mode. We also described measures that show the amount of spread or variation in a distribution, such as the range and the standard deviation.

Another characteristic of a distribution is the shape. There are four shapes com- monly observed: symmetric, positively skewed, negatively skewed, and bimodal. In a symmetric distribution the mean and median are equal and the data values are evenly spread around these values. The shape of the distribution below the mean and median is a mirror image of distribution above the mean and median. A distribution of values is skewed to the right or positively skewed if there is a single peak, but the values extend much farther to the right of the peak than to the left of the peak. In this case, the mean is larger than the median. In a negatively skewed distribution there is a single peak, but the observations extend farther to the left, in the negative direction, than to the right. In a negatively skewed distribution, the mean is smaller than the median. Positively skewed distributions are more common. Salaries often follow this pattern. Think of the salaries of those employed in a small company of about 100 people. The president and a few top executives would have very large salaries relative to the other workers and hence the distribution of salaries would exhibit positive skewness. A bimodal distribu- tion will have two or more peaks. This is often the case when the values are from two or more populations. This information is summarized in Chart 4–1.

LO4-5 Compute and interpret the coefficient of skewness.

M ed

ia n

M ea

Fr eq

ue nc

Fr eq

ue nc

Fr eq

ue nc

Fr eq

ue nc

Years

Ages

Symmetric

Monthly Salaries

Positively Skewed

$3,000 $4,000

M ed

ia n

M ea

Median Mean

Test Scores

Negatively Skewed

75 80 Score

Mean

Outside Diameter

Bimodal

.98 1.04 Inches$

CHART 4–1 Shapes of Frequency Polygons

There are several formulas in the statistical literature used to calculate skewness. The simplest, developed by Professor Karl Pearson (1857–1936), is based on the differ- ence between the mean and the median.

18. A sample of 28 time shares in the Orlando, Florida, area revealed the follow- ing daily charges for a one-bedroom suite. For convenience, the data are ordered from smallest to largest. Construct a box plot to represent the data. Comment on the distribution. Be sure to identify the first and third quartiles and the median.

$116 $121 $157 $192 $207 $209 $209 229 232 236 236 239 243 246 260 264 276 281 283 289 296 307 309 312 317 324 341 353

DESCRIBING DATA: DISPLAYING AND EXPLORING DATA 111

Using this relationship, the coefficient of skewness can range from −3 up to 3. A value near −3, such as −2.57, indicates considerable negative skewness. A value such as 1.63 indicates moderate positive skewness. A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and there is no skewness present.

In this text, we present output from Minitab and Excel. Both of these software pack- ages compute a value for the coefficient of skewness based on the cubed deviations from the mean. The formula is:

SOFTWARE COEFFICIENT OF SKEWNESS

sk = n

(n − 1) (n − 2)[ ∑(

x − x s )

] [4–3]

Formula (4–3) offers an insight into skewness. The right-hand side of the formula is the difference between each value and the mean, divided by the standard deviation. That is the portion (x − x )/s of the formula. This idea is called standardizing. We will discuss the idea of standardizing a value in more detail in Chapter 7 when we describe the normal probability distribution. At this point, observe that the result is to report the difference between each value and the mean in units of the standard deviation. If this difference is positive, the particular value is larger than the mean; if the value is nega- tive, the standardized quantity is smaller than the mean. When we cube these values, we retain the information on the direction of the difference. Recall that in the formula for the standard deviation [see formula (3–10)] we squared the difference between each value and the mean, so that the result was all nonnegative values.

If the set of data values under consideration is symmetric, when we cube the stan- dardized values and sum over all the values, the result would be near zero. If there are several large values, clearly separate from the others, the sum of the cubed differences would be a large positive value. If there are several small values clearly separate from the others, the sum of the cubed differences will be negative.

An example will illustrate the idea of skewness.

PEARSON’S COEFFICIENT OF SKEWNESS

\[
sk = \frac{3(\bar{x} - \text{Median})}{s} \tag{4–2}
\]

STATISTICS IN ACTION

The late Stephen Jay Gould (1941–2002) was a professor of zoology and professor of geology at Harvard University. In 1982, he was diagnosed with cancer and had an expected survival time of 8 months. Never one to be discouraged, he researched the matter and found that the distribution of survival time is dramatically skewed to the right. Not only do 50% of similar cancer patients survive more than 8 months, but the survival time could be years rather than months! In fact, Dr. Gould lived another 20 years. Based on his experience, he wrote a widely published essay titled “The Median Is Not the Message.”

E X A M P L E

Following are the earnings per share for a sample of 15 software companies for the year 2016. The earnings per share are arranged from smallest to largest.

$0.09   $0.13   $0.41   $0.51   $1.12   $1.20   $1.49   $3.18
 3.50    6.36    7.83    8.92   10.13   12.99   16.40

Compute the mean, median, and standard deviation. Find the coefficient of skewness using Pearson’s estimate and the software methods. What is your conclusion regarding the shape of the distribution?

S O L U T I O N

These are sample data, so we use formula (3–2) to determine the mean:

\[
\bar{x} = \frac{\sum x}{n} = \frac{\$74.26}{15} = \$4.95
\]


The median is the middle value in a set of data, arranged from smallest to largest. In this case, there is an odd number of observations, so the middle value is the median. It is $3.18.

We use formula (3–10) on page 78 to determine the sample standard deviation.

\[
s = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^{2} + \cdots + (\$16.40 - \$4.95)^{2}}{15 - 1}} = \$5.22
\]

Pearson’s coefficient of skewness is 1.017, found by

\[
sk = \frac{3(\bar{x} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017
\]

This indicates there is moderate positive skewness in the earnings per share data. We obtain a similar, but not exactly the same, value from the software method.
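As a check on the hand calculation, here is a minimal Python sketch of formula (4–2) applied to the 15 earnings-per-share values. The function name `pearson_skewness` is an assumption of this sketch, not something defined in the text; it relies only on the standard-library `statistics` module.

```python
from statistics import mean, median, stdev

# Earnings per share for the 15 software companies in the example.
eps = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
       3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]


def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)


print(f"mean   = {mean(eps):.2f}")
print(f"median = {median(eps):.2f}")
print(f"s      = {stdev(eps):.2f}")
print(f"sk     = {pearson_skewness(eps):.3f}")
```

Because the code does not round the mean and standard deviation before dividing, the last digit of the result may differ slightly from a calculation that uses the rounded values $4.95 and $5.22.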

The details of the calculations are shown in Table 4–2. To begin, we find the difference between each earnings per share value and the mean and divide this result by the standard deviation. We have referred to this as standardizing. Next, we cube, that is, raise to the third power, the result of the first step. Finally, we sum the cubed values. The details for the first company, that is, the company with an earnings per share of $0.09, are:

\[
\left( \frac{x - \bar{x}}{s} \right)^{3} = \left( \frac{0.09 - 4.95}{5.22} \right)^{3} = (-0.9310)^{3} = -0.8070
\]

When we sum the 15 cubed values, the result is 11.8274. That is, the term Σ[(x − x̄)/s]³ = 11.8274. To find the coefficient of skewness, we use formula (4–3), with n = 15.

\[
sk = \frac{n}{(n-1)(n-2)} \left[ \sum \left( \frac{x - \bar{x}}{s} \right)^{3} \right] = \frac{15}{(15-1)(15-2)} (11.8274) = 0.975
\]
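The same steps (standardize, cube, sum, then scale by n/[(n − 1)(n − 2)]) can be sketched in Python. This is an illustrative implementation of formula (4–3) under the assumption that s is the sample standard deviation; the function name `software_skewness` is hypothetical, and the sketch makes no claim about how Minitab or Excel carry out the computation internally.

```python
from statistics import mean, stdev

# Earnings per share for the 15 software companies in the example.
eps = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
       3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]


def software_skewness(values):
    """Formula (4-3): n / ((n - 1)(n - 2)) times the sum of cubed standardized values."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    cubed_sum = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum


print(f"sk = {software_skewness(eps):.3f}")
```

The result should be close to the 0.975 obtained by hand. A small difference in the third decimal place is possible because the hand calculation rounds the mean to $4.95 and the standard deviation to $5.22 before standardizing.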

TABLE 4–2 Calculation of the Coefficient of Skewness

Earnings per Share    (x − x̄)/s    [(x − x̄)/s]³
        0.09           −0.9310        −0.8070
        0.13           −0.9234        −0.7873
        0.41           −0.8697        −0.6579
        0.51           −0.8506        −0.6154
        1.12           −0.7337        −0.3950
        1.20           −0.7184        −0.3708
        1.49           −0.6628        −0.2912
        3.18           −0.3391        −0.0390
        3.50           −0.2778        −0.0214
        6.36            0.2701         0.0197
        7.83            0.5517         0.1679
        8.92            0.7605         0.4399
       10.13            0.9923         0.9772
       12.99            1.5402         3.6539
       16.40            2.1935        10.5537
                        Total         11.8274

We conclude that the earnings per share values are somewhat positively skewed. The following Minitab summary reports the descriptive measures, such as the mean, median, and standard deviation of the earnings per share data. Also included are the coefficient of skewness and a histogram with a bell-shaped curve superimposed.

S E L F - R E V I E W 4–4

A sample of five data entry clerks employed in the Horry County Tax Office revised the following number of tax records last hour: 73, 98, 60, 92, and 84.
(a) Find the mean, median, and the standard deviation.
(b) Compute the coefficient of skewness using Pearson’s method.
(c) Calculate the coefficient of skewness using the software method.
(d) What is your conclusion regarding the skewness of the data?

E X E R C I S E S

For Exercises 19–22:

a. Determine the mean, median, and the standard deviation.
b. Determine the coefficient of skewness using Pearson’s method.
c. Determine the coefficient of skewness using the software method.

19. The following values are the starting salaries, in $000, for a sample of five accounting graduates who accepted positions in public accounting last year.

36.0   26.0   33.0   28.0   31.0

20. Listed below are the salaries, in $000, for a sample of 15 chief financial officers in the electronics industry.

$516.0   $548.0   $566.0   $534.0   $586.0   $529.0
 546.0    523.0    538.0    523.0    551.0    552.0
 486.0    558.0    574.0


21. Listed below are the commissions earned ($000) last year by the 15 sales representatives at Furniture Patch Inc.

$ 3.9   $ 5.7   $ 7.3   $10.6   $13.0   $13.6   $15.1   $15.8   $17.1
 17.4    17.6    22.3    38.6    43.2    87.7

22. Listed below are the salaries for the 2016 New York Yankees Major League Baseball team.

Player             Salary         Player             Salary
CC Sabathia        $25,000,000    Dustin Ackley      $3,200,000
Mark Teixeira       23,125,000    Martin Prado        3,000,000
Masahiro Tanaka     22,000,000    Didi Gregorius      2,425,000
Jacoby Ellsbury     21,142,857    Aaron Hicks           574,000
Alex Rodriguez      21,000,000    Austin Romine         556,000
Brian McCann        17,000,000    Chasen Shreve         533,400
Carlos Beltran      15,000,000    Greg Bird             525,300
Brett Gardner       13,500,000    Luis Severino         521,300
Chase Headley       13,000,000    Bryan Mitchell        516,650
Aroldis Chapman     11,325,000    Kirby Yates           511,900
Andrew Miller        9,000,000    Mason Williams        509,700
Starlin Castro       7,857,143    Ronald Torreyes       508,600
Nathan Eovaldi       5,600,000    John Barbato          507,500
Michael Pineda       4,300,000    Dellin Betances       507,500
Ivan Nova            4,100,000    Luis Cessa            507,500

DESCRIBING THE RELATIONSHIP BETWEEN TWO VARIABLES

LO4-6 Create and interpret a scatter diagram.

In Chapter 2 and the first section of this chapter, we presented graphical techniques to summarize the distribution of a single variable. We used a histogram in Chapter 2 to summarize the profit on vehicles sold by the Applewood Auto Group. Earlier in this chapter, we used dot plots and stem-and-leaf displays to visually summarize a set of data. Because we are studying a single variable, we refer to this as univariate data.

There are situations where we wish to study and visually portray the relationship between two variables. When we study the relationship between two variables, we refer to the data as bivariate. Data analysts frequently wish to understand the relationship between two variables. Here are some examples:

• Tybo and Associates is a law firm that advertises extensively on local TV. The partners are considering increasing their advertising budget. Before doing so, they would like to know the relationship between the amount spent per month on advertising and the total amount of billings for that month. To put it another way, will increasing the amount spent on advertising result in an increase in billings?

• Coastal Realty is studying the selling prices of homes. What variables seem to be related to the selling price of homes? For example, do larger homes sell for more than smaller ones? Probably. So Coastal might study the relationship between the area in square feet and the selling price.

• Dr. Stephen Givens is an expert in human development. He is studying the relationship between the height of fathers and the height of their sons. That is, do tall fathers tend to have tall children? Would you expect LeBron James, the 6′8″, 250 pound professional basketball player, to have relatively tall sons?

One graphical technique we use to show the relationship between variables is called a scatter diagram.

To draw a scatter diagram, we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis). Usually one variable depends to some degree on the other. In the third example above, the height of the son depends on the height of the father. So we scale the height of the father on the horizontal axis and that of the son on the vertical axis.

We can use statistical software, such as Excel, to perform the plotting function for us. Caution: You should always be careful of the scale. By changing the scale of either the vertical or the horizontal axis, you can affect the apparent visual strength of the relationship.
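To make the axis convention concrete, the sketch below draws a rough text-only scatter diagram with one variable scaled along the horizontal axis and the other along the vertical axis. It is an illustration under stated assumptions, not a substitute for Excel or other plotting software: the function name `ascii_scatter` and the bus age and cost values are hypothetical, and the code assumes each variable takes at least two distinct values.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatter diagram: xs on the horizontal axis, ys on the vertical axis."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Rescale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical ages (years) and annual maintenance costs ($) for six buses.
ages = [1, 2, 3, 4, 5, 6]
costs = [2000, 3500, 4200, 6100, 7800, 9500]

print(ascii_scatter(ages, costs))
```

Changing `width` or `height` stretches or compresses the same points, which is one way to see how the choice of scale affects the apparent strength of a relationship.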

Following are three scatter diagrams (Chart 4–2). The one on the left shows a rather strong positive relationship between the age in years and the maintenance cost last year for a sample of 10 buses owned by the city of Cleveland, Ohio. Note that as the age of the bus increases, the yearly maintenance cost also increases. The example in the center, for a sample of 20 vehicles, shows a rather strong indirect relationship between the odometer reading and the auction price. That is, as the number of miles driven increases, the auction price decreases. The example on the right depicts the relationship between the height and yearly salary for a sample of 15 shift supervisors. This graph indicates there is little relationship between their height and yearly salary.

[Chart 4–2 panels: Age of Buses and Maintenance Cost (annual cost, $0 to $10,000, versus age in years); Auction Price versus Odometer (auction price, $12,000 to $24,000, versus odometer reading, 10,000 to 50,000); Height versus Salary (salary in $000, 90 to 125, versus height in inches).]

CHART 4–2 Three Examples of Scatter Diagrams.

E X A M P L E

In the introduction to Chapter 2, we presented data from the Applewood Auto Group. We gathered information concerning several variables, including the profit earned from the sale of 180 vehicles sold last month. In addition to the amount of profit on each sale, one of the other variables is the age of the purchaser. Is there a relationship between the profit earned on a vehicle sale and the age of the purchaser? Would it be reasonable to conclude that more profit is made on vehicles purchased by older buyers?

S O L U T I O N

We can investigate the relationship between vehicle profit and the age of the buyer with a scatter diagram. We scale age on the horizontal, or X-axis, and the profit on the vertical, or Y-axis. We assume profit depends on the age of the purchaser. As people age, they earn more income and purchase more expensive cars which, in turn, produce higher profits. We use Excel to develop the scatter diagram. The Excel commands are in Appendix C.

[Scatter diagram: Profit and Age of Buyer at Applewood Auto Group. Vertical axis: Profit per Vehicle ($), $0 to $3,500. Horizontal axis: Age (Years), 0 to 80.]

The scatter diagram shows a rather weak positive relationship between the two variables. It does not appear there is much relationship between the vehicle profit and the age of the buyer. In Chapter 13, we will study the relationship between variables more extensively, even calculating several numerical measures to express the relationship between variables.

In the preceding example, there is a weak positive, or direct, relationship between the variables. There are, however, many instances where there is a relationship between the variables, but that relationship is inverse or negative. For example:

• The value of a vehicle and the number of miles driven. As the number of miles increases, the value of the vehicle decreases.

• The premium for auto insurance and the age of the driver. Auto rates tend to be the highest for younger drivers and less for older drivers.

• For many law enforcement personnel, as the number of years on the job increases, the number of traffic citations decreases. This may be because personnel become more liberal in their interpretations or they may be in supervisor positions and not in a position to issue as many citations. But in any event, as age increases, the number of citations decreases.

CONTINGENCY TABLES

LO4-7 Develop and explain a contingency table.

A scatter diagram requires that both of the variables be at least interval scale. In the Applewood Auto Group example, both age and vehicle profit are ratio scale variables. Height is also ratio scale as used in the discussion of the relationship between the height of fathers and the height of their sons. What if we wish to study the relationship between two variables when one or both are nominal or ordinal scale? In this case, we tally the results in a contingency table.

A contingency table is a cross-tabulation that simultaneously summarizes two variables of interest. For example:

• Students at a university are classified by gender and class (freshman, sophomore, junior, or senior).

• A product is classified as acceptable or unacceptable and by the shift (day, afternoon, or night) on which it is manufactured.

• A voter in a school bond referendum is classified as to party affiliation (Democrat, Republican, other) and the number of children that voter has attending school in the district (0, 1, 2, etc.).

CONTINGENCY TABLE A table used to classify observations according to two identifiable characteristics.

E X A M P L E

There are four dealerships in the Applewood Auto Group. Suppose we want to compare the profit earned on each vehicle sold by the particular dealership. To put it another way, is there a relationship between the amount of profit earned and the dealership?

S O L U T I O N

In a contingency table, both variables only need to be nominal or ordinal. In this example, the variable dealership is a nominal variable and the variable profit is a ratio variable. To convert profit to an ordinal variable, we classify the variable profit into two categories, those cases where the profit earned is more than the median and those cases where it is less. On page 64, we calculated the median profit for all sales last month at Applewood Auto Group to be $1,882.50.
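The classification step can be sketched in Python: split a ratio-scale variable at its median, then tally the observations by the two resulting characteristics. The six sales records, the function name `cross_tabulate`, and the rule for a profit exactly equal to the median are all assumptions made for this illustration; they are not the Applewood data.

```python
from collections import Counter
from statistics import median

# Hypothetical (dealership, profit) records, invented for this sketch.
sales = [
    ("Kane", 1500), ("Kane", 2200),
    ("Olean", 1900), ("Olean", 900),
    ("Sheffield", 2500), ("Tionesta", 3100),
]


def cross_tabulate(records):
    """Count records by (dealership, Above/Below the overall median profit)."""
    med = median(profit for _, profit in records)
    # Assumption: a profit exactly equal to the median is counted as "Below".
    return Counter(
        (dealer, "Above" if profit > med else "Below")
        for dealer, profit in records
    )


table = cross_tabulate(sales)
for (dealer, group), count in sorted(table.items()):
    print(f"{dealer:<10} {group:<6} {count}")
```

Each printed line is one cell of the contingency table; cells with no observations are simply absent from the counter and read as zero.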

Contingency Table Showing the Relationship between Profit and Dealership

Above/Below
Median Profit    Kane   Olean   Sheffield   Tionesta   Total
Above              25      20          19         26      90
Below              27      20          26         17      90
Total              52      40          45         43     180

By organizing the information into a contingency table, we can compare the profit at the four dealerships. We observe the following:

• From the Total column on the right, 90 of the 180 cars sold had a profit above the median and half below. From the definition of the median, this is expected.

• For the Kane dealership, 25 out of the 52, or 48%, of the cars sold were sold for a profit more than the median.

• The percentages of profits above the median for the other dealerships are 50% for Olean, 42% for Sheffield, and 60% for Tionesta.
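The percentages in these observations follow directly from the cell counts. The short Python sketch below recomputes them from the contingency table; the dictionary layout and the function name `percent_above` are choices made for this illustration.

```python
# Cell counts taken from the contingency table of profit by dealership.
counts = {
    "Kane": {"Above": 25, "Below": 27},
    "Olean": {"Above": 20, "Below": 20},
    "Sheffield": {"Above": 19, "Below": 26},
    "Tionesta": {"Above": 26, "Below": 17},
}


def percent_above(table):
    """Percent of each dealership's sales with profit above the median."""
    return {
        dealer: 100 * cell["Above"] / (cell["Above"] + cell["Below"])
        for dealer, cell in table.items()
    }


for dealer, pct in percent_above(counts).items():
    print(f"{dealer:<10} {pct:.0f}%")

total_above = sum(cell["Above"] for cell in counts.values())
total_below = sum(cell["Below"] for cell in counts.values())
print("Above:", total_above, "Below:", total_below)
```

Dividing each "Above" count by the dealership total, rather than by the grand total of 180, is what makes the dealerships comparable even though they sold different numbers of vehicles.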

We will return to the study of contingency tables in Chapter 5 during the study of probability and in Chapter 15 during the study of nonparametric methods of analysis.


S E L F - R E V I E W 4–5

The rock group Blue String Beans is touring the United States. The following chart shows the relationship between concert seating capacity and revenue in $000 for a sample of concerts.

[Chart: Amount ($000) on the vertical axis versus Seating Capacity on the horizontal axis, 5800 to 7300.]

(a) What is the diagram called?
(b) How many concerts were studied?
(c) Estimate the revenue for the concert with the largest seating capacity.
(d) How would you characterize the relationship between revenue and seating capacity? Is it strong or weak, direct or inverse?

E X E R C I S E S

23. Develop a scatter diagram for the following sample data. How would you describe the relationship between the values?

x-Value   y-Value   x-Value   y-Value
   10        6         11        6
    8        2         10        5
    9        6          7        2
   11        5          7        3
   13        7         11        7

24. Silver Springs Moving and Storage Inc. is studying the relationship between the number of rooms in a move and the number of labor hours required for the move. As part of the analysis, the CFO of Silver Springs developed the following scatter diagram.

[Scatter diagram: Hours on the vertical axis versus Rooms on the horizontal axis.]

a. How many moves are in the sample?
b. Does it appear that more labor hours are required as the number of rooms increases, or do labor hours decrease as the number of rooms increases?

25. The Director of Planning for Devine Dining Inc. wishes to study the relationship between the gender of a guest and whether the guest orders dessert. To investigate the relationship, the manager collected the following information on 200 recent customers.

                       Gender
Dessert Ordered    Male   Female   Total
Yes                  32       15      47
No                   68       85     153
Total               100      100     200

a. What is the level of measurement of the two variables?
b. What is the above table called?
c. Does the evidence in the table suggest men are more likely to order dessert than women? Explain why.

26. Ski Resorts of Vermont Inc. is considering a merger with Gulf Shores Beach Resorts Inc. of Alabama. The board of directors surveyed 50 stockholders concerning their position on the merger. The results are reported below.

                                   Opinion
Number of Shares Held    Favor   Oppose   Undecided   Total
Under 200                    8        6           2      16
200 up to 1,000              6        8           1      15
Over 1,000                   6       12           1      19
Total                       20       26           4      50

a. What level of measurement is used in this table?
b. What is this table called?
c. What group seems most strongly opposed to the merger?

C H A P T E R S U M M A R Y

I. A dot plot shows the range of values on the horizontal axis and the number of observations for each value on the vertical axis.
   A. Dot plots report the details of each observation.
   B. They are useful for comparing two or more data sets.

II. A stem-and-leaf display is an alternative to a histogram.
   A. The leading digit is the stem and the trailing digit the leaf.
   B. The advantages of a stem-and-leaf display over a histogram include:
      1. The identity of each observation is not lost.
      2. The digits themselves give a picture of the distribution.
      3. The cumulative frequencies are also shown.

III. Measures of location also describe the shape of a set of observations.
   A. Quartiles divide a set of observations into four equal parts.
      1. Twenty-five percent of the observations are less than the first quartile, 50% are less than the second quartile, and 75% are less than the third quartile.
      2. The interquartile range is the difference between the third quartile and the first quartile.
   B. Deciles divide a set of observations into 10 equal parts and percentiles into 100 equal parts.

IV. A box plot is a graphic display of a set of data.
   A. A box is drawn enclosing the regions between the first quartile and the third quartile.
      1. A line is drawn inside the box at the median value.
      2. Dotted line segments are drawn from the third quartile to the largest value to show the highest 25% of the values and from the first quartile to the smallest value to show the lowest 25% of the values.
   B. A box plot is based on five statistics: the maximum and minimum values, the first and third quartiles, and the median.

V. The coefficient of skewness is a measure of the symmetry of a distribution.
   A. There are two formulas for the coefficient of skewness.
      1. The formula developed by Pearson is:

\[
sk = \frac{3(\bar{x} - \text{Median})}{s} \tag{4–2}
\]

      2. The coefficient of skewness computed by statistical software is:

\[
sk = \frac{n}{(n-1)(n-2)} \left[ \sum \left( \frac{x - \bar{x}}{s} \right)^{3} \right] \tag{4–3}
\]

VI. A scatter diagram is a graphic tool to portray the relationship between two variables.
   A. Both variables are measured with interval or ratio scales.
   B. If the scatter of points moves from the lower left to the upper right, the variables under consideration are directly or positively related.
   C. If the scatter of points moves from the upper left to the lower right, the variables are inversely or negatively related.

VII. A contingency table is used to classify nominal-scale observations according to two characteristics.

P R O N U N C I A T I O N K E Y

SYMBOL MEANING PRONUNCIATION

Lp Location of percentile L sub p

Q1 First quartile Q sub 1

Q3 Third quartile Q sub 3

C H A P T E R E X E R C I S E S

27. A sample of students attending Southeast Florida University is asked the number of social activities in which they participated last week. The chart below was prepared from the sample data.

[Chart: horizontal axis labeled Activities.]

a. What is the name given to this chart?
b. How many students were in the study?
c. How many students reported attending no social activities?

28. Doctor’s Care is a walk-in clinic, with locations in Georgetown, Moncks Corner, and Aynor, at which patients may receive treatment for minor injuries, colds, and flu, as well as physical examinations. The following charts report the number of patients treated in each of the three locations last month.

[Charts: Patients on the horizontal axis, 10 to 50, shown separately for each Location: Georgetown, Moncks Corner, and Aynor.]

Describe the number of patients served at the three locations each day. What are the maximum and minimum numbers of patients served at each of the locations?

29. Below is the number of customers who visited Smith’s True-Value hardware store in Bellville, Ohio, over the last twenty-three days. Make a stem-and-leaf display of this variable.

46 52 46 40 42 46 40 37 46 40 52 32 37 32 52 40 32 52 40 52 46 46 52

30. The top 25 companies (by market capitalization) operating in the Washington, DC, area along with the year they were founded and the number of employees are given below. Make a stem-and-leaf display of each of these variables and write a short description of your findings.

Company Name                            Year Founded   Employees
AES Corp.                                   1981          30,000
American Capital Ltd.                       1986             484
AvalonBay Communities Inc.                  1978           1,767
Capital One Financial Corp.                 1995          31,800
Constellation Energy Group Inc.             1816           9,736
Coventry Health Care Inc.                   1986          10,250
Danaher Corp.                               1984          45,000
Dominion Resources Inc.                     1909          17,500
Fannie Mae                                  1938           6,450
Freddie Mac                                 1970           5,533
Gannett Co.                                 1906          49,675
General Dynamics Corp.                      1952          81,000
Genworth Financial Inc.                     2004           7,200
Harman International Industries Inc.        1980          11,246
Host Hotels & Resorts Inc.                  1927             229
Legg Mason                                  1899           3,800
Lockheed Martin Corp.                       1995         140,000
Marriott International Inc.                 1927         151,000
MedImmune LLC                               1988           2,516
NII Holdings Inc.                           1996           7,748
Norfolk Southern Corp.                      1982          30,594
Pepco Holdings Inc.                         1896           5,057
Sallie Mae                                  1972          11,456
T. Rowe Price Group Inc.                    1937           4,605
The Washington Post Co.                     1877          17,100

31. In recent years, due to low interest rates, many homeowners refinanced their home mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and Loan. Below is the amount refinanced for 20 loans she processed last week. The data are reported in thousands of dollars and arranged from smallest to largest.

59.2   59.5   61.6   65.5   66.6   72.9   74.8   77.3    79.2    83.7
85.6   85.8   86.6   87.0   87.1   90.2   93.3   98.6   100.2   100.7

a. Find the median, first quartile, and third quartile.
b. Find the 26th and 83rd percentiles.
c. Draw a box plot of the data.

32. A study is made by the recording industry in the United States of the number of music CDs owned by 25 senior citizens and 30 young adults. The information is reported below.

Seniors

28 35 41 48 52 81 97 98 98 99 118 132 133 140 145 147 153 158 162 174 177 180 180 187 188

Young Adults

81 107 113 147 147 175 183 192 202 209 233 251 254 266 283 284 284 316 372 401 417 423 490 500 507 518 550 557 590 594

a. Find the median and the first and third quartiles for the number of CDs owned by senior citizens. Develop a box plot for the information.

b. Find the median and the first and third quartiles for the number of CDs owned by young adults. Develop a box plot for the information.

c. Compare the number of CDs owned by the two groups.

33. The corporate headquarters of Bank.com, an on-line banking company, is located in downtown Philadelphia. The director of human resources is making a study of the time it takes employees to get to work. The city is planning to offer incentives to each downtown employer if they will encourage their employees to use public transportation. Below is a listing of the time to get to work this morning according to whether the employee used public transportation or drove a car.

Public Transportation

23 25 25 30 31 31 32 33 35 36 37 42

Private

32 32 33 34 37 37 38 38 38 39 40 44

a. Find the median and the first and third quartiles for the time it took employees using public transportation. Develop a box plot for the information.

b. Find the median and the first and third quartiles for the time it took employees who drove their own vehicle. Develop a box plot for the information.

c. Compare the times of the two groups.

34. The following box plot shows the number of daily newspapers published in each state and the District of Columbia. Write a brief report summarizing the number published. Be sure to include information on the values of the first and third quartiles, the median, and whether there is any skewness. If there are any outliers, estimate their value.

[Box plot: Number of Newspapers; horizontal axis from 0 to 100; outliers marked with asterisks (****).]

35. Walter Gogel Company is an industrial supplier of fasteners, tools, and springs. The amounts of its invoices vary widely, from less than $20.00 to more than $400.00. During the month of January the company sent out 80 invoices. Here is a box plot of these invoices. Write a brief report summarizing the invoice amounts. Be sure to include information on the values of the first and third quartiles, the median, and whether there is any skewness. If there are any outliers, approximate the value of these invoices.

[Box plot: Invoice Amount; horizontal axis from 0 to 250.]

36. The American Society of PeriAnesthesia Nurses (ASPAN; www.aspan.org) is a national organization serving nurses practicing in ambulatory surgery, preanesthesia, and postanesthesia care. The organization consists of the 40 components listed below.

State/Region                                     Membership
Alabama                                                  95
Arizona                                                 399
Maryland, Delaware, DC                                  531
Connecticut                                             239
Florida                                                 631
Georgia                                                 384
Hawaii                                                   73
Illinois                                                562
Indiana                                                 270
Iowa                                                    117
Kentucky                                                197
Louisiana                                               258
Michigan                                                411
Massachusetts                                           480
Maine                                                    97
Minnesota, Dakotas                                      289
Missouri, Kansas                                        282
Mississippi                                              90
Nebraska                                                115
North Carolina                                          542
Nevada                                                  106
New Jersey, Bermuda                                     517
Alaska, Idaho, Montana, Oregon, Washington              708
New York                                                891
Ohio                                                    708
Oklahoma                                                171
Arkansas                                                 68
California                                            1,165
New Mexico                                               79
Pennsylvania                                            575
Rhode Island                                             53
Colorado                                                409
South Carolina                                          237
Texas                                                 1,026
Tennessee                                               167
Utah                                                     67
Virginia                                                414
Vermont, New Hampshire                                  144
Wisconsin                                               311
West Virginia                                            62

Use statistical software to answer the following questions.
a. Find the mean, median, and standard deviation of the number of members per component.
b. Find the coefficient of skewness, using the software. What do you conclude about the shape of the distribution of component size?
c. Compute the first and third quartiles using formula (4–1).
d. Develop a box plot. Are there any outliers? Which components are outliers? What are the limits for outliers?

37. McGivern Jewelers is located in the Levis Square Mall just south of Toledo, Ohio. Recently it posted an advertisement on a social media site reporting the shape, size, price, and cut grade for 33 of its diamonds currently in stock. The information is reported below.

Shape       Size (carats)    Price      Cut Grade
Princess        5.03         $44,312    Ideal cut
Round           2.35          20,413    Premium cut
Round           2.03          13,080    Ideal cut
Round           1.56          13,925    Ideal cut
Round           1.21           7,382    Ultra ideal cut
Round           1.21           5,154    Average cut
Round           1.19           5,339    Premium cut
Emerald         1.16           5,161    Ideal cut
Round           1.08           8,775    Ultra ideal cut
Round           1.02           4,282    Premium cut
Round           1.02           6,943    Ideal cut
Marquise        1.01           7,038    Good cut
Princess        1.00           4,868    Premium cut
Round           0.91           5,106    Premium cut
Round           0.90           3,921    Good cut
Round           0.90           3,733    Premium cut
Round           0.84           2,621    Premium cut
Round           0.77           2,828    Ultra ideal cut
Oval            0.76           3,808    Premium cut
Princess        0.71           2,327    Premium cut
Marquise        0.71           2,732    Good cut
Round           0.70           1,915    Premium cut
Round           0.66           1,885    Premium cut
Round           0.62           1,397    Good cut
Round           0.52           2,555    Premium cut
Princess        0.51           1,337    Ideal cut
Round           0.51           1,558    Premium cut
Round           0.45           1,191    Premium cut
Princess        0.44           1,319    Average cut
Marquise        0.44           1,319    Premium cut
Round           0.40           1,133    Premium cut
Round           0.35           1,354    Good cut
Round           0.32             896    Premium cut

a. Develop a box plot of the variable price and comment on the result. Are there any outliers? What is the median price? What are the values of the first and the third quartiles?

b. Develop a box plot of the variable size and comment on the result. Are there any outliers? What is the median size? What are the values of the first and the third quartiles?

c. Develop a scatter diagram between the variables price and size. Be sure to put price on the vertical axis and size on the horizontal axis. Does there seem to be an association between the two variables? Is the association direct or indirect? Does any point seem to be different from the others?

d. Develop a contingency table for the variables shape and cut grade. What is the most common cut grade? What is the most common shape? What is the most common combination of cut grade and shape?

38. Listed below is the amount of commissions earned last month for the eight members of the sales staff at Best Electronics. Calculate the coefficient of skewness using both methods. Hint: Use of a spreadsheet will expedite the calculations.

980.9 1,036.5 1,099.5 1,153.9 1,409.0 1,456.4 1,718.4 1,721.2

39. Listed below is the number of car thefts in a large city over the last week. Calculate the coefficient of skewness using both methods. Hint: Use of a spreadsheet will expedite the calculations.

3 12 13 7 8 3 8


40. The manager of Information Services at Wilkin Investigations, a private investigation firm, is studying the relationship between the age (in months) of a combination printer, copier, and fax machine and its monthly maintenance cost. For a sample of 15 machines, the manager developed the following chart. What can the manager conclude about the relationship between the variables?

[Scatter diagram: Monthly Maintenance Cost ($100 to $130) on the vertical axis versus Months (34 to 49) on the horizontal axis.]

41. An auto insurance company reported the following information regarding the age of a driver and the number of accidents reported last year. Develop a scatter diagram for the data and write a brief summary.

Age   Accidents   Age   Accidents
16        4       23        0
24        2       27        1
18        5       32        1
17        4       22        3

42. Wendy’s offers eight different condiments (mustard, catsup, onion, mayonnaise, pickle, lettuce, tomato, and relish) on hamburgers. A store manager collected the following information on the number of condiments ordered and the age group of the customer. What can you conclude regarding the information? Who tends to order the most or least number of condiments?

                                        Age
Number of Condiments   Under 18   18 up to 40   40 up to 60   60 or older
0                            12            18            24            52
1                            21            76            50            30
2                            39            52            40            12
3 or more                    71            87            47            28

43. Here is a table showing the number of employed and unemployed workers 20 years or older by gender in the United States.

           Number of Workers (000)
Gender    Employed    Unemployed
Men         70,415         4,209
Women       61,402         3,314

a. How many workers were studied?
b. What percent of the workers were unemployed?
c. Compare the percent unemployed for the men and the women.

126 A REVIEW OF CHAPTERS 1–4

D A T A A N A L Y T I C S

price. Create a box plot. Comment on the distribution of home prices.
b. Develop a scatter diagram with price on the vertical axis and the size of the home on the horizontal. Is there a relationship between these variables? Is the relationship direct or indirect?

b. Using the variable, salary, create a box plot. Are there any outliers? Compute the quartiles using formula (4–1). Write a brief summary of your analysis.
c. Draw a scatter diagram with the variable, wins, on the vertical axis and salary on the horizontal axis. What are your conclusions?
d. Using the variable, wins, draw a dot plot. What can you conclude from this plot?

46. Refer to the Lincolnville School District bus data.
a. Referring to the maintenance cost variable, develop a box plot. What are the minimum, first quartile, median, third quartile, and maximum values? Are there any outliers?


124 14 150 289 52 156 203 82 27 248 39 52 103 58 136 249 110 298 251 157 186 107 142 185 75 202 119 219 156 78 116 152 206 117 52 299 58 153 219 148 145 187 165 147 158 146 185 186 149 140

Use a statistical software package such as Excel or Minitab to help answer the following questions.
a. Determine the mean, median, and standard deviation.
b. Determine the first and third quartiles.
c. Develop a box plot. Are there any outliers? Do the amounts follow a symmetric distribution or are they skewed? Justify your answer.
d. Organize the distribution of funds into a frequency distribution.
e. Write a brief summary of the results in parts a to d.

2. Listed below are the 45 U.S. presidents and their age as they began their terms in office.

Number    Name               Age
1         Washington         57
2         J. Adams           61
3         Jefferson          57
4         Madison            57
5         Monroe             58
6         J. Q. Adams        57
7         Jackson            61
8         Van Buren          54
9         W. H. Harrison     68
10        Tyler              51
11        Polk               49
12        Taylor             64
13        Fillmore           50
14        Pierce             48
15        Buchanan           65
16        Lincoln            52
17        A. Johnson         56
18        Grant              46
19        Hayes              54
20        Garfield           49
21        Arthur             50
22        Cleveland          47
23        B. Harrison        55
24        Cleveland          55
25        McKinley           54
26        T. Roosevelt       42
27        Taft               51
28        Wilson             56
29        Harding            55
30        Coolidge           51
31        Hoover             54
32        F. D. Roosevelt    51
33        Truman             60
34        Eisenhower         62
35        Kennedy            43
36        L. B. Johnson      55
37        Nixon              56
38        Ford               61
39        Carter             52
40        Reagan             69
41        G. H. W. Bush      64
42        Clinton            46
43        G. W. Bush         54
44        Obama              47
45        Trump              70

bution or are they skewed? Justify your answer. d. Organize the distribution of ages into a frequency distribution. e. Write a brief summary of the results in parts a to d.

PROBLEMS

1. The duration in minutes of a sample of 50 power outages last year in the state of South Carolina is listed below.


3. Listed below is the 2014 median household income for the 50 states and the District of Columbia. https://www.census.gov/hhes/www/income/data/historical/household/

State             Amount
Alabama           42,278
Alaska            67,629
Arizona           49,254
Arkansas          44,922
California        60,487
Colorado          60,940
Connecticut       70,161
Delaware          57,522
D.C.              68,277
Florida           46,140
Georgia           49,555
Hawaii            71,223
Idaho             53,438
Illinois          54,916
Indiana           48,060
Iowa              57,810
Kansas            53,444
Kentucky          42,786
Louisiana         42,406
Maine             51,710
Maryland          76,165
Massachusetts     63,151
Michigan          52,005
Minnesota         67,244
Mississippi       35,521
Missouri          56,630
Montana           51,102
Nebraska          56,870
Nevada            49,875
New Hampshire     73,397
New Jersey        65,243
New Mexico        46,686
New York          54,310
North Carolina    46,784
North Dakota      60,730
Ohio              49,644
Oklahoma          47,199
Oregon            58,875
Pennsylvania      55,173
Rhode Island      58,633
South Carolina    44,929
South Dakota      53,053
Tennessee         43,716
Texas             53,875
Utah              63,383
Vermont           60,708
Virginia          66,155
Washington        59,068
West Virginia     39,552
Wisconsin         58,080
Wyoming           55,690

bution or are they skewed? Justify your answer. d. Organize the distribution of funds into a frequency distribution. e. Write a brief summary of the results in parts a to d.

4. A sample of 12 homes sold last week in St. Paul, Minnesota, revealed the following information. Draw a scatter diagram. Can we conclude that, as the size of the home (reported below in thousands of square feet) increases, the selling price (reported in $ thousands) also increases?

Home Size                     Selling Price    Home Size                     Selling Price
(thousands of square feet)    ($ thousands)    (thousands of square feet)    ($ thousands)
1.4                           100              1.3                           110
1.3                           110              0.8                           85
1.2                           105              1.2                           105
1.1                           120              0.9                           75
1.4                           80               1.1                           70
1.0                           105              1.1                           95

5. Refer to the following diagram.

[Diagram: horizontal axis scaled from 0 to 200 in steps of 40, with two points marked by asterisks.]


CASES

B. Wildcat Plumbing Supply Inc.: Do We Have Gender Differences?

Yearly Salary ($000)    Women    Men
Less than 30            2        0
30 up to 40             3        1
40 up to 50             17       4
50 up to 60             17       24
60 up to 70             8        21
70 up to 80             3        7
80 or more              0        3

To kick off the project, Mr. Cory St. Julian held a meeting with his staff and you were invited. At this meeting, it was suggested that you calculate several measures of


C. Kimble Products: Is There a Difference In the Commissions?

Commissions Earned by Sales Representatives Calling on Large Retailers ($)

1,116 681 1,294 12 754 1,206 1,448 870 944 1,255 1,213 1,291 719 934 1,313 1,083 899 850 886 1,556 886 1,315 1,858 1,262 1,338 1,066 807 1,244 758 918

Commissions Earned by Sales Representatives Calling on Athletic Departments ($)

354 87 1,676 1,187 69 3,202 680 39 1,683 1,106 883 3,140 299 2,197 175 159 1,105 434 615 149 1,168 278 579 7 357 252 1,602 2,321 4 392 416 427 1,738 526 13 1,604 249 557 635 527

PRACTICE TEST

Part 1: Objective

1. The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions is called ______.
2. Methods of organizing, summarizing, and presenting data in an informative way are called ______.
3. The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest are called the ______.
4. List the two types of variables.
5. The number of bedrooms in a house is an example of a ______. (discrete variable, continuous variable, qualitative variable; pick one)
6. The jersey numbers of Major League Baseball players are an example of what level of

at least half the values are negative, or never; pick one.)
13. Which of the following is least affected by an outlier? (mean, median, or range; pick one)

Part 2: Problems

1. The Russell 2000 index of stock prices increased by the following amounts over the last 3 years.

18% 4% 2%

What is the geometric mean increase for the 3 years?

2. The information below refers to the selling prices ($000) of homes sold in Warren, Pennsylvania, during 2016.

Selling Price ($000)    Frequency
120.0 up to 150.0       4
150.0 up to 180.0       18
180.0 up to 210.0       30
210.0 up to 240.0       20
240.0 up to 270.0       17
270.0 up to 300.0       10
300.0 up to 330.0       6

a. What is the class interval?
b. How many homes were sold in 2016?
c. How many homes sold for less than $210,000?
d. What is the relative frequency of the 210 up to 240 class?
e. What is the midpoint of the 150 up to 180 class?
f. The selling prices range between what two amounts?

3. A sample of eight college students revealed they owned the following number of CDs.

52 76 64 79 80 74 66 69

a. What is the mean number of CDs owned?
b. What is the median number of CDs owned?
c. What is the 40th percentile?
d. What is the range of the number of CDs owned?
e. What is the standard deviation of the number of CDs owned?

4. An investor purchased 200 shares of the Blair Company for $36 each in July of 2013, 300 shares at $40 each in September 2015, and 500 shares at $50 each in January 2016. What is the investor’s weighted mean price per share?

5. During the 50th Super Bowl, 30 million pounds of snack food were eaten. The chart below depicts this information.

[Chart: Snack Nuts 8%; Potato Chips 37%; Tortilla Chips 28%; Pretzels 14%; Popcorn 13%]

a. What is the name given to this graph?
b. Estimate, in millions of pounds, the amount of potato chips eaten during the game.
c. Estimate the relationship of potato chips to popcorn. (twice as much, half as much, three times, none of these; pick one)
d. What percent of the total do potato chips and tortilla chips comprise?


LEARNING OBJECTIVES When you have completed this chapter, you will be able to:

LO5-1 Define the terms probability, experiment, event, and outcome.

LO5-2 Assign probabilities using a classical, empirical, or subjective approach.

LO5-3 Calculate probabilities using the rules of addition.

LO5-4 Calculate probabilities using the rules of multiplication.

LO5-5 Compute probabilities using a contingency table.

LO5-6 Calculate probabilities using Bayes’ theorem.

LO5-7 Determine the number of outcomes using principles of counting.

RECENT SURVEYS indicate 60% of tourists to China visited the Forbidden City, the Temple of Heaven, the Great Wall, and other historical sites in or near Beijing. Forty percent visited Xi’an and its magnificent terra-cotta soldiers, horses, and chariots, which lay buried for over 2,000 years. Thirty percent of the tourists went to both Beijing and Xi’an. What is the probability that a tourist visited at least one of these places? (See Exercise 76 and LO5-3.)

5 A Survey of Probability Concepts

A SURVEY OF PROBABILITY CONCEPTS 133

INTRODUCTION

The emphasis in Chapters 2, 3, and 4 is on descriptive statistics. In Chapter 2, we organize the profits on 180 vehicles sold by the Applewood Auto Group into a frequency distribution. This frequency distribution shows the smallest and the largest profits and where the largest concentration of data occurs. In Chapter 3, we use numerical measures of location and dispersion to locate a typical profit on vehicle sales and to examine the variation in the profit of a sale. We describe the variation in the profits with such measures of dispersion as the range and the standard deviation. In Chapter 4, we develop charts and graphs, such as a scatter diagram or a dot plot, to further describe the data graphically.

Descriptive statistics is concerned with summarizing data collected from past events. We now turn to the second facet of statistics, namely, computing the chance that something will occur in the future. This facet of statistics is called statistical inference or inferential statistics.

Seldom does a decision maker have complete information to make a decision. For example:

• Toys and Things, a toy and puzzle manufacturer, recently developed a new game based on sports trivia. It wants to know whether sports buffs will purchase the game. “Slam Dunk” and “Home Run” are two of the names under consideration. To investigate, the president of Toys and Things decided to hire a market research firm. The firm selected a sample of 800 consumers from the population and asked each respondent for a reaction to the new game and its proposed titles. Using the sample results, the company can estimate the proportion of the population that will purchase the game.

• The quality assurance department of a U.S. Steel mill must assure management that the quarter-inch wire being produced has an acceptable tensile strength. Clearly, not all the wire produced can be tested for tensile strength because testing requires the wire to be stretched until it breaks, thus destroying it. So a random sample of 10 pieces is selected and tested. Based on the test results, all the wire produced is deemed to be either acceptable or unacceptable.

• Other questions involving uncertainty are: Should the daytime drama Days of Our Lives be discontinued immediately? Will a newly developed mint-flavored cereal be profitable if marketed? Will Charles Linden be elected to county auditor in Batavia County?

Statistical inference deals with conclusions about a population based on a sample taken from that population. (The populations for the preceding illustrations are all consumers who like sports trivia games, all the quarter-inch steel wire produced, all television viewers who watch soaps, all who purchase breakfast cereal, and so on.)

Because there is uncertainty in decision making, it is important that all the known risks involved be scientifically evaluated. Helpful in this evaluation is probability theory, often referred to as the science of uncertainty. Probability theory allows the decision maker to analyze the risks and minimize the gamble inherent, for example, in marketing a new product or accepting an incoming shipment possibly containing defective parts.

Because probability concepts are so important in the field of statistical inference (to be discussed starting with Chapter 8), this chapter introduces the basic language of probability, including such terms as experiment, event, subjective probability, and addition and multiplication rules.

STATISTICS IN ACTION

Government statistics show there are about 1.7 automobile-caused fatalities for every 100,000,000 vehicle-miles. If you drive 1 mile to the store to buy your lottery ticket and then return home, you have driven 2 miles. Thus the probability that you will join this statistical group on your next 2-mile round trip is 2 × 1.7/100,000,000 = 0.000000034. This can also be stated as “One in 29,411,765.” Thus, if you drive to the store to buy your Powerball ticket, your chance of being killed (or killing someone else) is more than 4 times greater than the chance that you will win the Powerball Jackpot, one chance in 120,526,770. http://www.durangobill.com/PowerballOdds.html
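The arithmetic in this sidebar can be checked with a short Python sketch. The constant and function names are illustrative only, and the calculation assumes, as the sidebar does, that the risk grows in direct proportion to the miles driven.

```python
# Figures quoted in the sidebar; the names below are illustrative assumptions.
FATALITIES_PER_VEHICLE_MILE = 1.7 / 100_000_000
POWERBALL_JACKPOT_ONE_IN = 120_526_770


def trip_fatality_probability(miles: float) -> float:
    """Chance of a fatality on a trip, assuming risk is proportional to miles."""
    return miles * FATALITIES_PER_VEHICLE_MILE


p_trip = trip_fatality_probability(2)        # 2-mile round trip to the store
one_in = 1 / p_trip                          # the same chance stated as "one in N"
ratio = p_trip * POWERBALL_JACKPOT_ONE_IN    # trip risk relative to the jackpot chance

print(f"P(fatality on the trip) = {p_trip:.9f}")
print(f"Stated as odds: one in {one_in:,.0f}")
print(f"Trip risk is about {ratio:.1f} times the jackpot chance")
```

The ratio is simply the trip probability divided by the jackpot probability of 1/120,526,770, which is why it is computed as a product.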

134 CHAPTER 5

WHAT IS A PROBABILITY?

No doubt you are familiar with terms such as probability, chance, and likelihood. They are often used interchangeably. The weather forecaster announces that there is a 70% chance of rain for Super Bowl Sunday. Based on a survey of consumers who tested a newly developed toothpaste with a banana flavor, the probability is .03 that, if marketed, it will be a financial success. (This means that the chance of the banana-flavor toothpaste being accepted by the public is rather remote.) What is a probability? In general, it is a numerical value that describes the chance that something will happen.

LO5-1 Define the terms probability, experiment, event, and outcome.

PROBABILITY A value between zero and one, inclusive, describing the relative possibility (chance or likelihood) an event will occur.

A probability is frequently expressed as a decimal, such as .70, .27, or .50, or a percent such as 70%, 27%, or 50%. It also may be reported as a fraction such as 7/10, 27/100, or 1/2. It can assume any number from 0 to 1, inclusive. Expressed as a percentage, the range is between 0% and 100%, inclusive. If a company has only five sales regions, and each region’s name or number is written on a slip of paper and the slips put in a hat, the probability of selecting one of the five regions is 1. The probability of selecting from the hat a slip of paper that reads “Pittsburgh Steelers” is 0. Thus, the probability of 1 represents something that is certain to happen, and the probability of 0 represents something that cannot happen.

The closer a probability is to 0, the more improbable it is the event will happen. The closer the probability is to 1, the more likely it will happen. The relationship is shown in the following diagram along with a few of our personal beliefs. You might, however, select a different probability for Slo Poke’s chances to win the Kentucky Derby or for an increase in federal taxes.

[Diagram: a probability scale running from 0.00 (cannot happen) to 1.00 (sure to happen) in steps of 0.10, with these events placed along it: probability our sun will disappear this year; chance Slo Poke will win the Kentucky Derby; chance of a head in a single toss of a coin; chance of an increase in federal taxes; chance of rain in Florida this year.]

Sometimes, the likelihood of an event is expressed using the term odds. To explain, someone says the odds are “five to two” that an event will occur. This means that in a total of seven trials (5 + 2), the event will occur five times and not occur two times. Using odds, we can compute the probability that the event occurs as 5/(5 + 2) or 5/7. So, if the odds in favor of an event are x to y, the probability of the event is x/(x + y).
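The conversion from odds to probability, x/(x + y), can be sketched in a few lines of Python. The function name is hypothetical, and exact fractions are used so that 5/7 is not rounded.

```python
from fractions import Fraction


def probability_from_odds(x: int, y: int) -> Fraction:
    """Convert odds of 'x to y' in favor of an event into a probability x/(x + y)."""
    if x < 0 or y < 0 or x + y == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(x, x + y)


p = probability_from_odds(5, 2)  # odds of "five to two"
print(p, float(p))
```

Odds of "one to one" give a probability of 1/2, which matches the everyday notion of an even chance.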

Three key words are used in the study of probability: experiment, outcome, and event. These terms are used in our everyday language, but in statistics they have specific meanings.

EXPERIMENT A process that leads to the occurrence of one and only one of several possible results.


This definition is more general than the one used in the physical sciences, where we picture someone manipulating test tubes or microscopes. In reference to probability, an experiment has two or more possible results, and it is uncertain which will occur.

OUTCOME A particular result of an experiment.

EVENT A collection of one or more outcomes of an experiment.

For example, the tossing of a coin is an experiment. You are unsure of the outcome. When a coin is tossed, one particular outcome is a “head.” The alternative outcome is a “tail.” Similarly, asking 500 college students if they would travel more than 100 miles to attend a Mumford and Sons concert is an experiment. In this experiment, one possible outcome is that 273 students indicate they would travel more than 100 miles to attend the concert. Another outcome is that 317 students would attend the concert. Still another outcome is that 423 students indicate they would attend the concert. When one or more of the experiment’s outcomes are observed, we call this an event.

Examples to clarify the definitions of the terms experiment, outcome, and event are presented in the following figure.

In the die-rolling experiment, there are six possible outcomes, but there are many possible events. When counting the number of members of the board of directors for Fortune 500 companies over 60 years of age, the number of possible outcomes can be anywhere from zero to the total number of members. There are an even larger number of possible events in this experiment.

Experiment: Roll a die
  All possible outcomes: Observe a 1; Observe a 2; Observe a 3; Observe a 4; Observe a 5; Observe a 6
  Some possible events: Observe an even number; Observe a number greater than 4; Observe a number 3 or less

Experiment: Count the number of members of the board of directors for Fortune 500 companies who are over 60 years of age
  All possible outcomes: None is over 60; One is over 60; Two are over 60; ...; 29 are over 60; ...; 48 are over 60; ...
  Some possible events: More than 13 are over 60; Fewer than 20 are over 60
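One way to make the distinction between outcomes and events concrete is to treat each event as a set of outcomes. The Python sketch below does this for the die-rolling experiment. The variable names are illustrative, and the count of events assumes that every non-empty collection of outcomes qualifies as an event, in line with the definition given earlier.

```python
# The six possible outcomes of rolling a die.
outcomes = frozenset(range(1, 7))

# Each event is a collection (set) of one or more outcomes.
events = {
    "an even number": {o for o in outcomes if o % 2 == 0},
    "a number greater than 4": {o for o in outcomes if o > 4},
    "a number 3 or less": {o for o in outcomes if o <= 3},
}


def event_occurs(event: set, observed_outcome: int) -> bool:
    """An event occurs when the observed outcome is one of its members."""
    return observed_outcome in event


# Every non-empty subset of the outcomes is a possible event.
number_of_possible_events = 2 ** len(outcomes) - 1

for name, members in events.items():
    print(f"{name}: {sorted(members)}")
print("Possible events:", number_of_possible_events)
```

With only six outcomes there are already dozens of possible events, which is the sense in which the text says there are "many possible events."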


SELF-REVIEW 5–1

Video Games Inc. recently developed a new video game. Its playability is to be tested by 80 veteran game players.
(a) What is the experiment?
(b) What is one possible outcome?
(c) Suppose 65 of the 80 players testing the new game said they liked it. Is 65 a probability?
(d) The probability that the new game will be a success is computed to be −1.0. Comment.
(e) Specify one possible event.

APPROACHES TO ASSIGNING PROBABILITIES

LO5-2 Assign probabilities using a classical, empirical, or subjective approach.

In this section, we describe three ways to assign a probability to an event: classical, empirical, and subjective. The classical and empirical methods are objective and are based on information and data. The subjective method is based on a person’s belief or estimate of an event’s likelihood.

Classical Probability

Classical probability is based on the assumption that the outcomes of an experiment are equally likely. Using the classical viewpoint, the probability of an event happening is computed by dividing the number of favorable outcomes by the number of possible outcomes:

CLASSICAL PROBABILITY
$$\text{Probability of an event} = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$$ [5–1]

EXAMPLE

Consider an experiment of rolling a six-sided die. What is the probability of the event “an even number of spots appear face up”?

SOLUTION

The possible outcomes are: a one-spot, a two-spot, a three-spot, a four-spot, a five-spot, and a six-spot.

There are three “favorable” outcomes (a two, a four, and a six) in the collection of six equally likely possible outcomes. Therefore:

$$\text{Probability of an even number} = \frac{3}{6} = .5$$

where 3 is the number of favorable outcomes and 6 is the total number of possible outcomes.
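The classical formula can be expressed as a small Python function. This is a sketch that assumes the outcomes are equally likely, as the classical approach requires; the function name is hypothetical.

```python
from fractions import Fraction


def classical_probability(event: set, possible_outcomes: set) -> Fraction:
    """Favorable outcomes divided by all possible, equally likely outcomes."""
    favorable = event & possible_outcomes  # ignore anything outside the experiment
    return Fraction(len(favorable), len(possible_outcomes))


die = {1, 2, 3, 4, 5, 6}
even_number = {2, 4, 6}

p_even = classical_probability(even_number, die)
print(p_even, float(p_even))
```

Using `Fraction` keeps the result as 1/2 exactly; converting to `float` gives the decimal form used in the solution.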

The mutually exclusive concept appeared earlier in our study of frequency distributions in Chapter 2. Recall that we create classes so that a particular value is included in only one of the classes and there is no overlap between classes. Thus, only one of several events can occur at a particular time.


MUTUALLY EXCLUSIVE The occurrence of one event means that none of the other events can occur at the same time.

COLLECTIVELY EXHAUSTIVE At least one of the events must occur when an experiment is conducted.

EMPIRICAL PROBABILITY The probability of an event happening is the fraction of the time similar events happened in the past.

LAW OF LARGE NUMBERS Over a large number of trials, the empirical probability of an event will approach its true probability.

The variable “gender” presents mutually exclusive outcomes, male and female. An employee selected at random is either male or female but cannot be both. A manufactured part is acceptable or unacceptable. The part cannot be both acceptable and unacceptable at the same time. In a sample of manufactured parts, the event of selecting an unacceptable part and the event of selecting an acceptable part are mutually exclusive.

If an experiment has a set of events that includes every possible outcome, such as the events “an even number” and “an odd number” in the die-tossing experiment, then the set of events is collectively exhaustive. For the die-tossing experiment, every outcome will be either even or odd. So the set is collectively exhaustive.

If the set of events is collectively exhaustive and the events are mutually exclusive, the sum of the probabilities is 1. Historically, the classical approach to probability was developed and applied in the 17th and 18th centuries to games of chance, such as cards and dice. It is unnecessary to do an experiment to determine the probability of an event occurring using the classical approach because the total number of outcomes is known before the experiment. The flip of a coin has two possible outcomes; the roll of a die has six possible outcomes. We can logically arrive at the probability of getting a tail on the toss of one coin or three heads on the toss of three coins.
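The two properties, and the fact that the probabilities then sum to 1, can be verified for the die-tossing experiment with a short Python sketch. The helper names are illustrative, and events are again represented as sets of equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
odd = {1, 3, 5}


def mutually_exclusive(*events: set) -> bool:
    """True when no two events share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))


def collectively_exhaustive(possible_outcomes: set, *events: set) -> bool:
    """True when the events together cover every possible outcome."""
    return set().union(*events) >= possible_outcomes


def probability(event: set) -> Fraction:
    """Classical probability for equally likely die outcomes."""
    return Fraction(len(event), len(die))


total = probability(even) + probability(odd)
print(mutually_exclusive(even, odd), collectively_exhaustive(die, even, odd), total)
```

If either property fails, the sum need not equal 1. For example, the events "4 or larger" and "an even number" overlap, so adding their probabilities would count the outcomes 4 and 6 twice.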

The classical approach to probability can also be applied to lotteries. In South Carolina, one of the games of the Education Lottery is “Pick 3.” A person buys a lottery ticket and selects three numbers between 0 and 9. Once per week, the three numbers are randomly selected from a machine that tumbles three containers each with balls numbered 0 through 9. One way to win is to match the numbers and the order of the numbers. Given that 1,000 possible outcomes exist (000 through 999), the probability of winning with any three-digit number is 0.001, or 1 in 1,000.
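The count of 1,000 outcomes can be confirmed by listing every ordered triple of digits. The Python sketch below uses the standard library's `itertools.product`; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

digits = range(10)  # balls numbered 0 through 9 in each container

# Every ordered selection of three digits, from (0, 0, 0) to (9, 9, 9).
pick3_outcomes = list(product(digits, repeat=3))

# Each outcome is equally likely, so one exact match has probability 1/1,000.
p_exact_match = Fraction(1, len(pick3_outcomes))

print(len(pick3_outcomes), p_exact_match, float(p_exact_match))
```

Because order matters and digits may repeat, the count is 10 × 10 × 10.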

Empirical Probability

Empirical or relative frequency is the second type of objective probability. It is based on the number of times an event occurs as a proportion of a known number of trials.

The formula to determine an empirical probability is:

$$\text{Empirical probability} = \frac{\text{Number of times the event occurs}}{\text{Total number of observations}}$$

The empirical approach to probability is based on what is called the law of large numbers. The key to establishing probabilities empirically is that more observations will provide a more accurate estimate of the probability.

To explain the law of large numbers, suppose we toss a fair coin. The result of each toss is either a head or a tail. With just one toss of the coin the empirical probability for heads is either zero or one. If we toss the coin a great number of times, the probability of the outcome of heads will approach .5. The following table reports the results of seven different experiments of flipping a fair coin 1, 10, 50, 100, 500, 1,000, and 10,000 times and then computing the relative frequency of heads. Note as we increase the number of trials, the empirical probability of a head appearing approaches .5, which is its value based on the classical approach to probability.

Number of Trials    Number of Heads    Relative Frequency of Heads
1                   0                  .00
10                  3                  .30
50                  26                 .52
100                 52                 .52
500                 236                .472
1,000               494                .494
10,000              5,027              .5027

What have we demonstrated? Based on the classical definition of probability, the likelihood of obtaining a head in a single toss of a fair coin is .5. Based on the empirical or relative frequency approach to probability, the probability of the event happening approaches the same value based on the classical definition of probability.
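A similar experiment can be run by simulation. The Python sketch below flips a simulated fair coin for the same numbers of trials and reports the relative frequency of heads. The seed value is arbitrary, and the simulated counts will differ from those in the table because each run of an experiment produces its own results.

```python
import random


def relative_frequency_of_heads(trials: int, rng: random.Random) -> float:
    """Flip a simulated fair coin `trials` times and return heads / trials."""
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


rng = random.Random(2024)  # arbitrary seed so the run can be repeated
for trials in (1, 10, 50, 100, 500, 1_000, 10_000):
    frequency = relative_frequency_of_heads(trials, rng)
    print(f"{trials:>6} trials: relative frequency of heads = {frequency:.4f}")
```

With small numbers of trials the relative frequency can be far from .5. As the number of trials grows, it tends to settle near .5, which is the pattern the law of large numbers describes.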

This reasoning allows us to use the empirical or relative frequency approach to finding a probability. Here are some examples.

• Last semester, 80 students registered for Business Statistics 101 at Scandia University. Twelve students earned an A. Based on this information and the empirical approach to assigning a probability, we estimate the likelihood a student at Scandia will earn an A is .15.

• Stephen Curry of the Golden State Warriors made 363 out of 400 free throw attempts during the 2015–16 NBA season. Based on the empirical approach to probability, the likelihood of him making his next free throw attempt is .908.

Life insurance companies rely on past data to determine the acceptability of an applicant as well as the premium to be charged. Mortality tables list the likelihood a person of a particular age will die within the upcoming year. For example, the likelihood a 20-year-old female will die within the next year is .00105.
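The grade and free throw estimates above both apply the empirical formula to past counts. A minimal Python sketch, with a hypothetical function name, is:

```python
def empirical_probability(times_event_occurred: int, total_observations: int) -> float:
    """Number of times the event occurred divided by the total number of observations."""
    if total_observations <= 0:
        raise ValueError("at least one observation is required")
    if not 0 <= times_event_occurred <= total_observations:
        raise ValueError("occurrences must be between 0 and the number of observations")
    return times_event_occurred / total_observations


p_grade_a = empirical_probability(12, 80)        # students earning an A
p_free_throw = empirical_probability(363, 400)   # free throws made

print(f"P(A in Business Statistics 101) = {p_grade_a:.2f}")
print(f"P(next free throw is made) = {p_free_throw:.4f}")
```

The free throw ratio is .9075 before rounding, which the text reports as .908.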

The empirical concept is illustrated with the following example.

EXAMPLE

On February 1, 2003, the Space Shuttle Columbia exploded. This was the second disaster in 113 space missions for NASA. On the basis of this information, what is the probability that a future mission is successfully completed?

SOLUTION

We use letters or numbers to simplify the equations. P stands for probability and A represents the event of a successful mission. In this case, P(A) stands for the probability a future mission is successfully completed.

$$\text{Probability of a successful flight} = \frac{\text{Number of successful flights}}{\text{Total number of flights}}$$

$$P(A) = \frac{111}{113} = .98$$

We can use this as an estimate of probability. In other words, based on past experience, the probability is .98 that a future space shuttle mission will be safely completed.


Subjective Probability

If there is little or no experience or information on which to base a probability, it is estimated subjectively. Essentially, this means an individual evaluates the available opinions and information and then estimates or assigns the probability. This probability is called a subjective probability.

SUBJECTIVE CONCEPT OF PROBABILITY The likelihood (probability) of a particular event happening that is assigned by an individual based on whatever information is available.

Illustrations of subjective probability are:

1. Estimating the likelihood the New England Patriots will play in the Super Bowl next year.

2. Estimating the likelihood you are involved in an automobile accident during the next 12 months.

3. Estimating the likelihood the U.S. budget deficit will be reduced by half in the next 10 years.

The types of probability are summarized in Chart 5–1. A probability statement always assigns a likelihood to an event that has not yet occurred. There is, of course, considerable latitude in the degree of uncertainty that surrounds this probability, based primarily on the knowledge possessed by the individual concerning the underlying process. The individual possesses a great deal of knowledge about the toss of a die and can state that the probability that a one-spot will appear face up on the toss of a true die is one-sixth. But we know very little concerning the acceptance in the marketplace of a new and untested product. For example, even though a market research director tests a newly developed product in 40 retail stores and states that there is a 70% chance that the product will have sales of more than 1 million units, she has limited knowledge of how consumers will react when it is marketed nationally. In both cases (the case of the person rolling a die and the testing of a new product), the individual is assigning a probability value to an event of interest, and a difference exists only in the predictor’s confidence in the precision of the estimate. However, regardless of the viewpoint, the same laws of probability (presented in the following sections) will be applied.

CHART 5–1 Summary of Approaches to Probability

Approaches to Probability
  Objective
    Classical Probability: Based on equally likely outcomes
    Empirical Probability: Based on relative frequencies
  Subjective: Based on available information


SELF-REVIEW 5–2

1. One card will be randomly selected from a standard 52-card deck. What is the probability the card will be a queen? Which approach to probability did you use to answer this question?

2. The Center for Child Care reports on 539 children and the marital status of their parents. There are 333 married, 182 divorced, and 24 widowed parents. What is the probability a particular child chosen at random will have a parent who is divorced? Which approach did you use?

3. What is the probability you will save one million dollars by the time you retire? Which approach to probability did you use to answer this question?

EXERCISES

1. Some people are in favor of reducing federal taxes to increase consumer spending and others are against it. Two persons are selected and their opinions are recorded. Assuming no one is undecided, list the possible outcomes.

2. A quality control inspector selects a part to be tested. The part is then declared acceptable, repairable, or scrapped. Then another part is tested. List the possible outcomes of this experiment regarding two parts.

3. A survey of 34 students at the Wall College of Business showed the following majors:

Accounting    10
Finance       5
Economics     3
Management    6
Marketing     10

From the 34 students, suppose you randomly select a student.
a. What is the probability he or she is a management major?
b. Which concept of probability did you use to make this estimate?

4. A large company must hire a new president. The Board of Directors prepares a list of five candidates, all of whom are equally qualified. Two of these candidates are members of a minority group. To avoid bias in the selection of the candidate, the company decides to select the president by lottery.
a. What is the probability one of the minority candidates is hired?
b. Which concept of probability did you use to make this estimate?

5. In each of the following cases, indicate whether classical, empirical, or subjective probability is used.
a. A baseball player gets a hit in 30 out of 100 times at bat. The probability is .3 that he gets a hit in his next at bat.
b. A seven-member committee of students is formed to study environmental issues. What is the likelihood that any one of the seven is randomly chosen as the spokesperson?
c. You purchase a ticket for the Lotto Canada lottery. Over 5 million tickets were sold. What is the likelihood you will win the $1 million jackpot?
d. The probability of an earthquake in northern California in the next 10 years above 5.0 on the Richter Scale is .80.

6. A firm will promote two employees out of a group of six men and three women.
a. List all possible outcomes.
b. What probability concept would be used to assign probabilities to the outcomes?

7. A sample of 40 oil industry executives was selected to test a questionnaire. One question about environmental issues required a yes or no answer.
a. What is the experiment?
b. List one possible event.
c. Ten of the 40 executives responded yes. Based on these sample responses, what is the probability that an oil industry executive will respond yes?
d. What concept of probability does this illustrate?
e. Are each of the possible outcomes equally likely and mutually exclusive?

A SURVEY OF PROBABILITY CONCEPTS 141

RULES OF ADDITION FOR COMPUTING PROBABILITIES There are two rules of addition, the special rule of addition and the general rule of addi- tion. We begin with the special rule of addition.

Special Rule of Addition When we use the special rule of addition, the events must be mutually exclusive. Recall that mutually exclusive means that when one event occurs, none of the other events can occur at the same time. An illustration of mutually exclusive events in the die-tossing experiment is the events “a number 4 or larger” and “a number 2 or smaller.” If the outcome is in the first group {4, 5, and 6}, then it cannot also be in the second group {1 and 2}. Another illustration is a product coming off the assembly line cannot be defective and satisfactory at the same time.

If two events A and B are mutually exclusive, the special rule of addition states that the probability of one or the other event’s occurring equals the sum of their probabilities. This rule is expressed in the following formula:

LO5-3 Calculate probabilities using the rules of addition.

8. A sample of 2,000 licensed drivers revealed the following number of speeding violations.

Number of Violations    Number of Drivers
0                       1,910
1                          46
2                          18
3                          12
4                           9
5 or more                   5
Total                   2,000

a. What is the experiment?

b. List one possible event.

c. What is the probability that a particular driver had exactly two speeding violations?

d. What concept of probability does this illustrate?

9. Bank of America customers select their own three-digit personal identification number (PIN) for use at ATMs.

a. Think of this as an experiment and list four possible outcomes.

b. What is the probability that a customer will pick 259 as their PIN?

c. Which concept of probability did you use to answer (b)?

10. An investor buys 100 shares of AT&T stock and records its price change daily.

a. List several possible events for this experiment.

b. Which concept of probability did you use in (a)?

SPECIAL RULE OF ADDITION P(A or B) = P(A) + P(B) [5–2]

For three mutually exclusive events designated A, B, and C, the rule is written:

P(A or B or C) = P(A) + P(B) + P(C)

An example will show the details.


English logician J. Venn (1834–1923) developed a diagram to portray graphically the outcome of an experiment. The mutually exclusive concept and various other rules for combining probabilities can be illustrated using this device. To construct a Venn diagram, a space is first enclosed representing the total of all possible outcomes. This space is usually in the form of a rectangle. An event is then represented by a circular area that is drawn inside the rectangle proportional to the probability of the event. The following Venn diagram represents the mutually exclusive concept. There is no overlapping of events, meaning that the events are mutually exclusive. In the following Venn diagram, assume the events A, B, and C are about equally likely.

E X A M P L E

A machine fills plastic bags with a mixture of beans, broccoli, and other vegetables. Most of the bags contain the correct weight, but because of the variation in the size of the beans and other vegetables, a package might be underweight or overweight. A check of 4,000 packages filled in the past month revealed:

Weight          Event    Number of Packages    Probability of Occurrence
Underweight     A          100                 .025  (= 100/4,000)
Satisfactory    B        3,600                 .900
Overweight      C          300                 .075
Total                    4,000                1.000

What is the probability that a particular package will be either underweight or overweight?

S O L U T I O N

The outcome “underweight” is the event A. The outcome “overweight” is the event C. Applying the special rule of addition:

P(A or C) = P(A) + P(C) = .025 + .075 = .10

Note that the events are mutually exclusive, meaning that a package of mixed vegetables cannot be underweight, satisfactory, and overweight at the same time. They are also collectively exhaustive; that is, a selected package must be either underweight, satisfactory, or overweight.
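The arithmetic in this solution can be checked with a short Python sketch. It is only an illustration under stated assumptions: the names `counts`, `prob`, and `p_a_or_c` are hypothetical, and exact fractions are used instead of decimals to avoid rounding error.

```python
from fractions import Fraction

# Package counts from the example (4,000 packages checked).
counts = {"underweight": 100, "satisfactory": 3600, "overweight": 300}
total = sum(counts.values())

# Empirical probability of each mutually exclusive event.
prob = {event: Fraction(n, total) for event, n in counts.items()}

# Special rule of addition: P(A or C) = P(A) + P(C).
p_a_or_c = prob["underweight"] + prob["overweight"]
print(float(p_a_or_c))  # 0.1

# Collectively exhaustive events: the probabilities sum to 1.
assert sum(prob.values()) == 1
```

Adding the probabilities is valid here only because no package can fall into two weight categories at once.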

[Venn diagram: three non-overlapping circles inside a rectangle, labeled Event A, Event B, and Event C]


Complement Rule

The probability that a bag of mixed vegetables selected is underweight, P(A), plus the probability that it is not an underweight bag, written P(∼A) and read “not A,” must logically equal 1. This is written:

P(A) + P(∼A) = 1

This can be revised to read:

COMPLEMENT RULE P(A) = 1 − P(∼A) [5–3]

This is the complement rule. It is used to determine the probability of an event occurring by subtracting the probability of the event not occurring from 1. This rule is useful because sometimes it is easier to calculate the probability of an event happening by determining the probability of it not happening and subtracting the result from 1. Notice that the events A and ∼A are mutually exclusive and collectively exhaustive. Therefore, the probabilities of A and ∼A sum to 1. A Venn diagram illustrating the complement rule is shown as:

[Venn diagram: a circle labeled Event A inside a rectangle; the region outside the circle is ∼A]

E X A M P L E

Referring to the previous example/solution, the probability a bag of mixed vegetables is underweight is .025 and the probability of an overweight bag is .075. Use the complement rule to show the probability of a satisfactory bag is .900. Show the solution using a Venn diagram.

S O L U T I O N

The probability the bag is unsatisfactory equals the probability the bag is overweight plus the probability it is underweight. That is, P(A or C) = P(A) + P(C) = .025 + .075 = .100. The bag is satisfactory if it is not underweight or overweight, so P(B) = 1 − [P(A) + P(C)] = 1 − [.025 + .075] = .900. The Venn diagram portraying this situation is:
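As a rough check of the complement rule, the same computation can be sketched in Python. The variable names are hypothetical, and the probabilities are entered as exact fractions so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

p_underweight = Fraction(25, 1000)  # P(A) = .025
p_overweight = Fraction(75, 1000)   # P(C) = .075

# A and C are mutually exclusive, so P(A or C) = P(A) + P(C).
p_unsatisfactory = p_underweight + p_overweight

# Complement rule: P(B) = 1 - P(A or C).
p_satisfactory = 1 - p_unsatisfactory
print(float(p_satisfactory))  # 0.9
```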

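[Venn diagram: circle A with probability .025 and circle C with probability .075, not overlapping; the remaining region, not (A or C), has probability .90]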


P(Disney or Busch) = P(Disney) + P(Busch) − P(both Disney and Busch) = .60 + .50 − .30 = .80

When two events both occur, the probability is called a joint probability. The probability (.30) that a tourist visits both attractions is an example of a joint probability.
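The role of the joint probability in the calculation above can be illustrated with a small Python sketch. The variable names are hypothetical, and exact fractions stand in for the decimals .60, .50, and .30.

```python
from fractions import Fraction

p_disney = Fraction(60, 100)  # P(Disney) = .60
p_busch = Fraction(50, 100)   # P(Busch) = .50
p_both = Fraction(30, 100)    # joint probability P(Disney and Busch) = .30

# Simply adding the two probabilities counts tourists who visited both twice.
naive_sum = p_disney + p_busch
print(float(naive_sum))  # 1.1, which cannot be a probability

# Subtracting the joint probability removes the double count.
p_either = p_disney + p_busch - p_both
print(float(p_either))  # 0.8
```

The naive sum exceeding 1 is a quick signal that the two events are not mutually exclusive.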

The following Venn diagram shows two events that are not mutually exclusive. The two events overlap to illustrate the joint event that some people have visited both attractions.

A sample of employees of Worldwide Enterprises is to be surveyed about a new health care plan. The employees are classified as follows:

Classification    Event    Number of Employees
Supervisors       A          120
Maintenance       B           50
Production        C        1,460
Management        D          302
Secretarial       E           68

S E L F - R E V I E W 5–3



[Venn diagram: two overlapping circles, P(Disney) = .60 and P(Busch) = .50; the overlap is P(Disney and Busch) = .30]

JOINT PROBABILITY A probability that measures the likelihood two or more events will happen concurrently.

So the general rule of addition, which is used to compute the probability of two events that are not mutually exclusive, is:

GENERAL RULE OF ADDITION P(A or B) = P(A) + P(B) − P(A and B) [5–4]

E X A M P L E

What is the probability that a card chosen at random from a standard deck of cards will be either a king or a heart?

S O L U T I O N

Card              Probability            Explanation
King              P(A) = 4/52            4 kings in a deck of 52 cards
Heart             P(B) = 13/52           13 hearts in a deck of 52 cards
King of Hearts    P(A and B) = 1/52      1 king of hearts in a deck of 52 cards


From formula (5–4):

P(A or B) = P(A) + P(B) − P(A and B) = 4/52 + 13/52 − 1/52 = 16/52, or .3077
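This result can be confirmed by enumerating the deck directly. The following Python sketch is one possible check, not part of the text's method; the card representation as (rank, suit) pairs and all names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# A standard deck: every (rank, suit) pair, 52 cards in all.
deck = set(product(ranks, suits))

kings = {card for card in deck if card[0] == "K"}        # event A
hearts = {card for card in deck if card[1] == "hearts"}  # event B

# Direct count: cards that are a king or a heart (set union).
p_union = Fraction(len(kings | hearts), len(deck))

# General rule of addition: P(A) + P(B) - P(A and B).
p_rule = (Fraction(len(kings), len(deck))
          + Fraction(len(hearts), len(deck))
          - Fraction(len(kings & hearts), len(deck)))

assert p_union == p_rule
print(p_union, round(float(p_union), 4))  # 4/13 0.3077
```

The fraction 16/52 reduces to 4/13; the set union counts the king of hearts once, which is exactly what subtracting P(A and B) accomplishes in the formula.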

A Venn diagram portrays these outcomes, which are not mutually exclusive.

[Venn diagram: two overlapping circles, Kings (A) and Hearts (B); the overlap, labeled Both (A and B), is the king of hearts]

Routine physical examinations are conducted annually as part of a health service program for General Concrete Inc. employees. It was discovered that 8% of the employees need corrective shoes, 15% need major dental work, and 3% need both corrective shoes and major dental work.

(a) What is the probability that an employee selected at random will need either corrective shoes or major dental work?

(b) Show this situation in the form of a Venn diagram.

S E L F - R E V I E W 5–4

11. The events A and B are mutually exclusive. Suppose P(A) = .30 and P(B) = .20. What is the probability of either A or B occurring? What is the probability that neither A nor B will happen?

12. The events X and Y are mutually exclusive. Suppose P(X) = .05 and P(Y) = .02. What is the probability of either X or Y occurring? What is the probability that neither X nor Y will happen?

13. A study of 200 advertising firms revealed their income after taxes:

Income after Taxes            Number of Firms
Under $1 million              102
$1 million to $20 million      61
$20 million or more            37

a. What is the probability an advertising firm selected at random has under $1 million in income after taxes?

b. What is the probability an advertising firm selected at random has either an income between $1 million and $20 million, or an income of $20 million or more? What rule of probability was applied?

14. The chair of the board of directors says, “There is a 50% chance this company will earn a profit, a 30% chance it will break even, and a 20% chance it will lose money next quarter.”

a. Use an addition rule to find the probability the company will not lose money next quarter.

b. Use the complement rule to find the probability it will not lose money next quarter.

15. Suppose the probability you will get an A in this class is .25 and the probability you will get a B is .50. What is the probability your grade will be above a C?