Week 3


Part Five: Sampling and Fieldwork

Chapter 17

Determination of Sample Size: A Review of Statistical Theory

AT-A-GLANCE

I. Introduction

A. Descriptive and inferential statistics

B. Sample statistics and population parameters

II. Making Data Usable

A. Frequency distributions

B. Proportions

C. Measures of central tendency

· The mean

· The median

· The mode

D. Measures of dispersion

· The range

· Why use the standard deviation?

· Variance

· Standard deviation

III. The Normal Distribution

IV. Population Distribution, Sample Distribution, and Sampling Distribution

V. Central-Limit Theorem

VI. Estimation of Parameters

A. Point estimates

B. Confidence intervals

VII. Sample Size

A. Random error and sample size

B. Factors in determining sample size for questions involving means

C. Estimating sample size for questions involving means

D. The influence of population size on sample size

E. Factors in determining sample size for proportions

F. Calculating sample size for sample proportions

G. Determining sample size on the basis of judgment

H. Determining sample size for stratified and other probability samples

I. Determining level of precision after data collection

VIII. A Reminder About Statistics

LEARNING OUTCOMES

1. Understand basic statistical terminology

2. Interpret frequency distributions, proportions, and measures of central tendency and dispersion

3. Distinguish among population, sample, and sampling distributions

4. Explain the central-limit theorem

5. Summarize the use of confidence interval estimates

6. Discuss major issues in specifying sample size

CHAPTER VIGNETTE: Federal Reserve Finds Cards Are Replacing Cash

Payment options have gone high tech, and today’s spenders are more likely to pay with a debit or credit card or through a variety of methods for electronic transfer of funds. Researchers at the Federal Reserve conducted surveys of depository institutions (i.e., banks, savings and loans, credit unions), asking them to report the number of each type of payment the institutions processed. The Fed carefully designed the sample, including the number of institutions to contact. The total number of institutions was known (14,117), and the researchers had to select enough from this population to be confident that the answers would be representative of transactions nationwide. A stratified random sample was used so that each type of institution would be included. They determined that 2,700 institutions would need to be sampled so that they could say that the results, with 95 percent confidence, were accurate to within ±5 percent of the responses. In all, 1,500 institutions responded, and their responses confirmed earlier analysis showing that the number of checks is declining while the number of electronic payments is increasing.

SURVEY THIS!

Students are asked to consider the question on the survey that asks if the respondent is employed:

1. What percentage of respondents do you think will answer “yes”?

2. Based on your estimate, how many respondents would you need to be 95 percent confident your responses are ±5 percent of the population proportion?

3. Look at the data from your class. At the 95 percent confidence level, how precise is the measure regarding employment status?

Consider the question that asks how “interesting” or “boring” they feel their life is:

4. Review the “rule of thumb” provided in the chapter regarding estimating the value of the standard deviation of a scale. How many respondents would you need to be 95 percent confident your responses are ±5 percent of the population proportion?

5. What if you want to be 99 percent confident—what would the sample size have to be?

6. Look at the data from your class. At the 95 percent confidence level, how precise is this measure?

RESEARCH SNAPSHOTS

· The Well-Chosen Average

The average pay of the people who work in an establishment may mean something or it may not. If the average is a median that means half the employees make more than that and half make less. But if it is a mean, you may be getting nothing more than the average of one large income (i.e., the proprietor’s) and the salaries of a crew of underpaid workers. A simple example is given where the “average wage” is $57,000. However, the mode, which is the most common and occurs most frequently, is $20,000, and the median is $30,000, which means that half the people earn more than this and half earn less. Do politicians use statistics to lie or do the statistics lie when they claim that the “rich do not pay taxes”? Two websites are given to look at the tax figures.

· Sampling the World

The Gallup organization’s WorldView generates a comprehensive snapshot of the opinions of people from across the globe. The company collects data from over 150 countries on various topics using representative samples from 98 percent of the world’s adult population, with samples generally consisting of 1,000 individuals. Gallup’s WorldView program provides extremely valuable information on poorer and more rural populations. To accurately reflect the world’s sentiments and opinions, you must have an accurate sample of the world’s population.

· Target and Wal-Mart Shoppers Really Are Different

Scarborough Research sampled over 200,000 adults to compare consumers who shop exclusively at either Target or Wal-Mart. The largest share (40 percent) named both stores when asked to identify the stores at which they had shopped during the preceding three months. However, 31 percent shopped at Wal-Mart but not Target, and 12 percent shopped at Target but not Wal-Mart. Target-only shoppers were younger and more likely to have a high household income, were more likely to shop at more upscale stores, and to visit many different stores than the Wal-Mart-only shoppers. The Wal-Mart shoppers were more likely to shop at discounters (i.e., Dollar General and Kmart) and were more likely to be at least 50 years old. Given a U.S. adult population of approximately 220 million, do you think the sample size was adequate to make these comparisons?

OUTLINE

I. INTRODUCTION

· Descriptive and Inferential Statistics

· There are two applications of statistics: (1) to describe characteristics of the population or sample (descriptive statistics) and (2) to generalize from the sample to the population (inferential statistics).

· Sample Statistics and Population Parameters

· Sample statistics are variables in the sample or measures computed from the sample data.

· Population parameters are variables or measured characteristics of the population.

· We will generally use Greek lowercase letters to denote population parameters (e.g., $\mu$ or $\sigma$) and English letters to denote sample statistics (e.g., X or S).

II. MAKING THE DATA USABLE

· To make data usable, this information must be organized and summarized.

· Methods for doing this include:

· frequency distributions

· proportions

· measures of central tendency and dispersion

· Frequency Distributions

· Constructing a frequency table or frequency distribution is one of the most common means of summarizing a set of data.

· The frequency of a value is the number of times a particular value of a variable occurs.

· Exhibit 17.1 represents a frequency distribution of respondents’ answers to a question asking how much customers had deposited in the savings and loan.

· It is also quite simple to construct a distribution of relative frequency, or a percentage distribution, which is developed by dividing the frequency of each value by the total number of observations and multiplying the result by 100 (see Exhibit 17.2).

· Probability is the long-run relative frequency with which an event will occur.

· Inferential statistics uses the concept of a probability distribution, which is conceptually the same as a percentage distribution except that the data are converted into probabilities (see Exhibit 17.3).
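The frequency and percentage distributions described above can be sketched in a few lines of Python. This is only an illustration: it borrows the 13 speed limits from review questions 2 and 3 later in this chapter rather than the savings and loan data in Exhibit 17.1, and the function name `percentage_distribution` is a hypothetical helper, not something from the text.

```python
from collections import Counter

# Speed limits (mph) for 13 countries, reconstructed from the frequency
# table in review question 3 of this chapter.
speeds = [87, 81, 75, 75, 75, 70, 62, 62, 62, 62, 62, 56, 56]


def percentage_distribution(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)          # frequency of each distinct value
    total = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        frequency = counts[value]
        cumulative += frequency
        rows.append((value, frequency,
                     round(100 * frequency / total, 2),
                     round(100 * cumulative / total, 2)))
    return rows


table = percentage_distribution(speeds)
for value, frequency, percent, cum_percent in table:
    print(f"{value:>5} {frequency:>4} {percent:>7.2f} {cum_percent:>7.2f}")
```

Dividing each percentage by 100 gives the corresponding relative frequency, which is the form a probability distribution uses.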

· Proportions

· A proportion indicates the percentage of population elements that successfully meet some standard on the particular characteristic.

· May be expressed as a percentage, a fraction, or a decimal number.

· Measures of Central Tendency

· There are three ways to measure the central tendency, and each has a different meaning.

· Mean

· The mean is simply the arithmetic average, and it is a common measure of central tendency.

· It is the sum of all the observations divided by the number of observations.

· Often we will not have enough data to calculate the population mean, $\mu$, so we will calculate a sample mean, $\bar{X}$ (read as “X bar”).

· Median

· The median is the midpoint of the distribution, or the 50th percentile.

· In other words, the median is the value below which half the values in the sample fall.

· Mode

· The mode is the measure of central tendency that merely identifies the value that occurs most often.

· Determined by listing each possible value and noting the number of times each value occurs.

· Measures of Dispersion

· Accurate analysis of data also requires knowing the tendency of observations to depart from the central tendency.

· Thus, another way to summarize the data is to calculate the dispersion of the data, or how the observations vary from the mean.

· There are several measures of dispersion discussed below.

· Range

· The simplest measure of dispersion.

· It is the distance between the smallest and largest values of a frequency distribution.

· Does not take into account all the observations; it merely tells us about the extreme values of the distribution.

· While we do not expect all observations to be exactly like the mean, in a skinny distribution they will be a short distance from the mean, while in a fat distribution they will be spread out (see Exhibit 17.6).

· The interquartile range is the range encompassing the middle 50 percent of the observations (i.e., the range between the bottom quartile and the top quartile).

· Why Use the Standard Deviation?

· It is perhaps the most valuable index of spread, or dispersion.

· Learning about the standard deviation will be easier if we present several other measures of dispersion that may be used. Each of these has certain limitations that the standard deviation does not.

· Deviation—a method of calculating how far any observation is from the mean.

· Average deviation – determined by calculating the deviation score of each observation value (i.e., its difference from the mean) and summing up each score; then we divide by the sample size (n).

· While this means of calculating a measure of spread seems of interest, it is never used because the positive deviation scores are always canceled out by the negative deviation scores, thus leaving an average deviation value of zero.

· One might correct the disadvantage of the average deviation by computing the absolute values of the deviations. We could ignore all the positive and negative signs and utilize only the absolute values of each deviation to give us the mean absolute deviation, but there are some technical mathematical problems that make the mean absolute deviation less valuable than some other measures.

· Variance

· This is another means of eliminating the sign problem caused by the negative deviations canceling out the positive deviations.

· Useful for describing the sample variability.

· This procedure is to square the deviation scores and divide by the number of observations. (In actual fact n - 1 rather than n is used in most pragmatic business research problems.)

· A very good index of the degree of dispersion.

· The variance, $S^2$, will be equal to zero if, and only if, each and every observation in the distribution is the same as the mean.

· Standard Deviation

· The variance does have one major drawback—it reflects a unit of measurement that has been squared.

· Because of this, statisticians have taken the square root of the variance.

· The square root of the variance for a distribution is called the standard deviation.

· Exhibit 17.7 illustrates the calculation of a standard deviation.

· S is the symbol for the sample standard deviation, while $\sigma$ is the symbol for the population standard deviation.

III. THE NORMAL DISTRIBUTION

· One of the most common probability distributions in statistics is the normal distribution, commonly represented by the normal curve.

· It is bell shaped, and almost all (more than 99 percent) of its values are within ±3 standard deviations of its mean.

· The standardized normal distribution is a specific normal curve that has several characteristics:

1. It is symmetrical about its mean.

2. The mode identifies its highest point, which is also the mean and the median, and the vertical line about which the curve is symmetrical.

3. The normal curve has an infinite number of cases (it is a continuous distribution), and the total area under the curve (the total probability) equals 1.0.

4. The standardized normal distribution has a mean of 0 and a standard deviation of 1.

· Exhibit 17.9 illustrates these properties, and Exhibit 17.10 is a summary version of the typical standardized normal table found at the end of most statistics textbooks.

· The standardized normal distribution is a purely theoretical probability distribution, but it is the most useful distribution in inferential statistics.

· The standardized normal distribution is extremely valuable because we can translate or transform any normal variable, X, into the standardized value, Z.

· This has many pragmatic implications for the business researcher.

· The standardized normal table in the back of most statistics and research books allows us to evaluate the probability of the occurrence of certain events without any difficulty.

· Computing the standardized value, Z, of any measurement expressed in original units is simple:

· Subtract the mean from the value to be transformed, and divide by the standard deviation (all expressed in original units).

· In the formula, note that $\sigma$, the population standard deviation, is used for calculation:

$$Z = \frac{X - \mu}{\sigma}$$

where $\mu$ = hypothesized or expected value of the mean

· Example: Suppose that a toy manufacturer has experienced mean sales, $\mu$, of 9,000 units and a standard deviation, $\sigma$, of 500 units during the month of September. The production manager wishes to know if wholesalers will demand between 7,500 and 9,625 units during the month of September in the upcoming year. Because there are no tables in the back of our textbook showing the distribution for a mean of 9,000 and a standard deviation of 500, we must transform our distribution of toy sales, X, into the standardized form utilizing our simple formula.

$$Z = \frac{7{,}500 - 9{,}000}{500} = -3.00$$

$$Z = \frac{9{,}625 - 9{,}000}{500} = 1.25$$

When Z = -3.00, the area under the curve between Z and the mean (probability) equals .499

When Z = 1.25, the area under the curve between the mean and Z (probability) equals .394

Thus, the total area under the curve between the two values is .499 + .394 = .893

The area under the curve portraying this computation is the shaded area in Exhibit 17.12. Thus, the production manager knows there is a .893 probability that sales will be between 7,500 and 9,625.
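The table lookup in the toy example can be checked with a short script. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of the printed standardized normal table; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 9000, 500           # mean and standard deviation of September sales
low, high = 7500, 9625          # demand range the production manager asks about

z_low = (low - mu) / sigma      # standardized value for 7,500 units
z_high = (high - mu) / sigma    # standardized value for 9,625 units

standard_normal = NormalDist()  # mean 0, standard deviation 1
# Area under the standardized normal curve between the two Z values.
probability = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(z_low, z_high, round(probability, 3))
```

The cumulative areas differ from the table's "mean to Z" areas by a constant .5 on each side, so the difference between them gives the same total.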

IV. POPULATION DISTRIBUTION, SAMPLE DISTRIBUTION, AND SAMPLING DISTRIBUTION

· Three additional types of distribution must be defined:

1. population distribution

2. sample distribution

3. sampling distribution

· A frequency distribution of the population elements is called a population distribution.

· The population distribution has its mean and standard deviation represented by the Greek letters $\mu$ and $\sigma$.

· A frequency distribution of a sample is called the sample distribution.

· The sample mean is designated $\bar{X}$, and the sample standard deviation is designated S.

· However, we must now introduce another distribution: the sampling distribution of the sample mean.

· A sampling distribution is a theoretical probability distribution that shows the functional relation between the possible values of some summary characteristic of n cases drawn at random and the probability (density) associated with each value over all possible samples of size n from a particular population.

· The sampling distribution’s mean is called the expected value of the statistic.

· The expected value of the mean of the sampling distribution is equal to $\mu$.

· The standard deviation of the sampling distribution is called the standard error of the mean ($S_{\bar{X}}$) and is approximately equal to $\sigma / \sqrt{n}$.

· Exhibit 17.13 shows the relationship among a population distribution, the sample distribution, and three sampling distributions of varying sample size.

· As sample size increases, the spread of the sample mean around $\mu$ decreases (i.e., larger samples will have a skinnier sampling distribution).

V. CENTRAL-LIMIT THEOREM

· The central-limit theorem states: As the sample size, n, increases, the distribution of the mean, $\bar{X}$, of a random sample taken from practically any population approaches a normal distribution (with a mean, $\mu$, and a standard deviation, $\sigma / \sqrt{n}$).

· The central-limit theorem works regardless of the shape of the original population distribution (see Exhibit 17.14).

· This theoretical knowledge about distributions can be used to solve two very practical business research problems:

1. estimating parameters

2. determining sample size
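A small simulation can make the central-limit theorem concrete. The sketch below is an assumed illustration, not an example from the chapter: it draws repeated samples from a strongly skewed (exponential) population with mean 1 and standard deviation 1, then compares the spread of the sample means with $\sigma / \sqrt{n}$. The seed, the number of replications, and the helper name `draw_sample_mean` are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(17)  # fixed seed so the sketch is repeatable


def draw_sample_mean(n):
    """Mean of one random sample of size n from an exponential population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n


spread = {}   # observed standard deviation of the sample means, by n
center = {}   # observed mean of the sample means, by n
for n in (5, 25, 100):
    sample_means = [draw_sample_mean(n) for _ in range(4000)]
    center[n] = mean(sample_means)
    spread[n] = pstdev(sample_means)
    # Columns: n, mean of sample means, observed spread, sigma / sqrt(n)
    print(n, round(center[n], 3), round(spread[n], 3), round(1 / n ** 0.5, 3))
```

Even though the population is far from normal, the sample means cluster around the population mean, and their spread tracks $\sigma / \sqrt{n}$ as n grows.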

VI. ESTIMATION OF PARAMETERS

· Point Estimates

· Our goal in using statistics is to make an estimate about the population parameters.

· The population mean, $\mu$, and standard deviation, $\sigma$, are constants, but in most instances of business research they are unknown.

· To estimate the population values, we are required to sample.

· Example: To estimate the average number of days per week that people play racquetball, we may take a sample of 300 racquetball players throughout the area where our researcher is thinking of building club facilities. If the sample mean, $\bar{X}$, equals 2.6 days per week, we may use this figure as a point estimate.

· This single value, 2.6, is the best estimate of the population mean.

· However, we would be extremely lucky if the sample estimate were exactly the same as the population value.

· A less risky alternative would be to calculate a confidence interval.

· Confidence Intervals

· A confidence interval estimate is based on the knowledge that $\mu = \bar{X} \pm$ a small sampling error.

· After calculating an interval estimate, we can determine how probable it is that the population mean will fall within this range of statistical values.

· In the racquetball example the researcher, after setting up a confidence interval, would be able to make a statement such as “with 95 percent confidence, I think that the average number of days played per week is between 2.3 and 2.9.”

· The researcher has a certain confidence that the interval contains the true value of the population mean.

· The crux of the problem for the researcher is to determine how much random sampling error to tolerate.

· The confidence level is a percentage or decimal that indicates the long-run probability that the results will be correct.

· Traditionally, researchers have utilized the 95 percent confidence level.

· The confidence interval gives the estimated value of the population parameter, plus or minus an estimate of the error:

$$\mu = \bar{X} \pm E$$

where E, the small sampling error, is

$$E = Z_{c.l.} S_{\bar{X}}$$

$Z_{c.l.}$ = the value of Z, our standardized normal variable, at a specified confidence level.

$S_{\bar{X}}$ = the standard error of the mean.

· The following is a step-by-step procedure for calculating confidence intervals:

1. Calculate $\bar{X}$ from the sample.

2. Assuming $\sigma$ is unknown, estimate the population standard deviation by finding S, the sample standard deviation.

3. Estimate the standard error of the mean, using the following formula:

$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

4. Determine the Z-value associated with the confidence level desired. The confidence level should be divided by 2 to determine what percentage of the area under the curve must be included on each side of the mean.

5. Calculate the confidence interval.
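The five steps can be sketched as a short function. The input values below come from review question 10 later in this chapter ($\bar{X}$ = 500, S = 100, n = 25); the function name and the use of `NormalDist.inv_cdf` in place of a printed Z table are assumptions made for illustration. Following the chapter, the sketch uses a Z-value even though the sample is small.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, s, n, confidence=0.95):
    """Return (lower, upper) bounds for the population mean."""
    standard_error = s / sqrt(n)                # step 3: S / sqrt(n)
    # Step 4: half of the confidence level lies on each side of the mean,
    # so find the Z-value with 0.5 + confidence / 2 of the area below it.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    error = z * standard_error                  # E = Z * standard error
    return x_bar - error, x_bar + error         # step 5


lower, upper = confidence_interval(x_bar=500, s=100, n=25)
print(round(lower, 1), round(upper, 1))
```

Raising `confidence` to 0.99 widens the interval, because a larger Z-value multiplies the same standard error.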

· Sample statistics, such as the sample means, $\bar{X}$s, can provide good estimates of population parameters such as $\mu$.

· There will be a random sampling error, which is the difference between the survey results and the results of surveying the entire population.

VII. SAMPLE SIZE

· Random Error and Sample Size

· Random sampling error varies with samples of different sizes.

· Increasing the sample size decreases the width of the confidence interval at a given confidence level.

· When the standard deviation of the population is unknown, a confidence interval is calculated by using the following formula:

$$\bar{X} \pm Z \frac{S}{\sqrt{n}}$$

· Observe that the equation for the plus or minus error factor in the confidence interval includes n, the sample size.

· If n increases, E is reduced (see Exhibit 17.18).

· Increases in sample size reduce sampling error at a decreasing rate.

· More technically, random sampling error is inversely proportional to the square root of n.

· Thus, the main issue becomes one of determining the optimal sample size.
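To see the diminishing returns from larger samples, the error term $E = Z S / \sqrt{n}$ can be evaluated for several sample sizes. This is a minimal sketch: the standard deviation of 100 is an assumed value chosen only for illustration, and `sampling_error` is a hypothetical helper name.

```python
from math import sqrt

Z = 1.96   # Z-value for the 95 percent confidence level
S = 100    # assumed sample standard deviation, for illustration only


def sampling_error(n, z=Z, s=S):
    """Plus-or-minus error factor E = Z * S / sqrt(n)."""
    return z * s / sqrt(n)


for n in (25, 50, 100, 400):
    print(n, round(sampling_error(n), 2))
```

Because E depends on the square root of n, quadrupling the sample size only halves the error, and doubling it shrinks the error by a factor of $1/\sqrt{2}$ (about 29 percent).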

· Factors in Determining Sample Size for Questions Involving Means

· Three factors are required to specify sample size:

1. the variance, or heterogeneity, of the population

2. the magnitude of acceptable error

3. the confidence level

· The variance, or heterogeneity, of the population characteristic in statistical terms refers to the standard deviation of the population parameter.

· Only a small sample is required if the population is homogeneous.

· As heterogeneity increases, so must sample size.

· The magnitude of error, or the confidence interval, is defined in statistical terms as E, and indicates how precise the estimate must be.

· It indicates a certain precision level.

· From a managerial perspective, the importance of the decision in terms of profitability will influence the researcher’s specifications of the range of error.

· The third factor of concern is the confidence level.

· We will typically use the 95 percent confidence level.

· This, however, is an arbitrary decision based on convention.

· Estimating Sample Size for Questions Involving Means

· The researcher must follow three steps:

1. Estimate the standard deviation of the population.

2. Make a judgment about the desired magnitude of error.

3. Determine a confidence level.

· The only problem is estimating the standard deviation of the population.

· Ideally, similar studies conducted in the past will be used as a basis for judging the standard deviation.

· In practice, researchers without prior information conduct a pilot study to estimate the population parameters so that another, larger sample, of the appropriate sample size, may be drawn.

· This procedure is called sequential sampling, because researchers take an initial look at the pilot study results before deciding on a larger sample to provide more precise information.

· A rule of thumb for estimating the value of the standard deviation is to expect it to be one-sixth of the range.

· In a general sense, doubling sample size will reduce error by only approximately one-quarter.
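Once the three judgments are made, the sample size follows from $n = (ZS/E)^2$, the formula used in the answers to review questions 11 and 12. The sketch below applies it to the lipstick figures from question 11 (S = $29); the function name is hypothetical, and the Z-values 1.96 and 2.57 are the table values the chapter uses.

```python
from math import ceil


def sample_size_for_mean(z, s, e):
    """n = (Z * S / E) ** 2, rounded up to the next whole respondent."""
    return ceil((z * s / e) ** 2)


print(sample_size_for_mean(z=1.96, s=29, e=2))   # 95 percent confidence, E = $2
print(sample_size_for_mean(z=1.96, s=29, e=4))   # 95 percent confidence, E = $4
print(sample_size_for_mean(z=2.57, s=29, e=2))   # 99 percent confidence, E = $2
```

Rounding up rather than to the nearest integer is a design choice here: it guarantees the error does not exceed E.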

· The Influence of Population Size on Sample Size

· In most cases the size of the population does not have a major effect on the sample size.

· The variance of the population has the largest effect on sample size.

· However, a finite correction factor may be needed to adjust the sample size if that size is more than 5 percent of a finite population.

· If the sample is large relative to the population, the above procedures may overestimate sample size, and there may be a need to adjust sample size.

· Factors in Determining Sample Size for Proportions

· When the question involves the estimation of a proportion, the researcher requires some knowledge of the logic for determining a confidence interval around a sample proportion (p) as an estimate of the population proportion ($\pi$).

· For a confidence interval to be constructed around the sample proportion (p), an estimate of the standard error of the proportion ($S_p$) must be calculated and a confidence coefficient specified.

· The plus or minus estimate of the population proportion is:

$$\text{Confidence interval} = p \pm Z_{c.l.} S_p$$

where

$$S_p = \sqrt{\frac{pq}{n}}$$

p = proportion of successes

q = 1 - p, or proportion of failures

· To determine sample size for a proportion, the researcher must make a judgment about the confidence level and the maximum allowance for random sampling error.

· Furthermore, the size of the proportion influences sampling error, so an estimate of the expected proportion of successes must be made based on intuition or prior information. The formula is:

$$n = \frac{Z_{c.l.}^2 \, pq}{E^2}$$
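The proportion formula can be sketched the same way. The inputs below are assumptions for illustration: a 95 percent confidence level (Z = 1.96), an allowable error of ±5 percentage points, and two guessed values of p. The function name is hypothetical.

```python
from math import ceil


def sample_size_for_proportion(z, p, e):
    """n = Z^2 * p * q / E^2, with q = 1 - p, rounded up."""
    q = 1 - p
    return ceil(z ** 2 * p * q / e ** 2)


print(sample_size_for_proportion(z=1.96, p=0.5, e=0.05))
print(sample_size_for_proportion(z=1.96, p=0.3, e=0.05))
```

The product pq is largest when p = 0.5, so using 0.5 gives the most conservative (largest) sample size when no prior estimate of the proportion is available.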

· Calculating Sample Size for Sample Proportions

· In practice, a number of tables have been constructed for determining sample size.

· Exhibit 17.20 illustrates a sample size table for problems that involve sample proportions (p).

· Determining Sample Size on the Basis of Judgment

· Sample size may also be determined on the basis of managerial judgments.

· Using a sample size similar to those used in previous studies provides the inexperienced researcher with a comparison of other researchers’ judgments.

· Another judgmental factor is the selection of the appropriate item, question, or characteristics to be used for the sample size calculations.

· Often the item that will produce the largest sample size will be used to determine the ultimate sample size.

· However, the cost of data collection becomes a major consideration, and judgment must be exercised regarding the importance of such information.

· Another consideration stems from most researchers’ need to analyze the various subgroups within the sample.

· There is a judgmental rule of thumb for selecting minimum subgroup sample size:

· Each subgroup to be separately analyzed should have at least 100 units in each category of the major breakdowns.

· With this procedure, the total sample size is computed by totaling the sample size necessary for these subgroups.

· Determining Sample Size for Stratified and Other Probability Samples

· Stratified sampling involves drawing separate samples within the subgroups to make the sample more efficient.

· With a stratified sample, the sample variances are expected to differ by strata.

· This makes the determination of sample size more complex.

· Increased complexity may also characterize the determination of sample size for cluster sampling and other probability sampling methods.

· The formulas are beyond the scope of this book.

· Determining Level of Precision after Data Collection

· After we have collected the data, we also want to determine our level of precision, given the size of the sample, the variance, and the confidence level.

· Rather than solving for n in the sample size equation, we solve for $E^2$, and then take the square root of this to determine our level of precision.
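Solving the sample size equations for E can be sketched as follows. The inputs are illustrative assumptions: the first call reuses the lipstick figures (S = $29) with a realized sample of 202, and the second assumes a sample proportion of 0.5 with 385 respondents. Both function names are hypothetical.

```python
from math import sqrt


def precision_for_mean(z, s, n):
    """E = sqrt(Z^2 * S^2 / n) for a question involving a mean."""
    return sqrt(z ** 2 * s ** 2 / n)


def precision_for_proportion(z, p, n):
    """E = sqrt(Z^2 * p * q / n) for a question involving a proportion."""
    return sqrt(z ** 2 * p * (1 - p) / n)


print(round(precision_for_mean(z=1.96, s=29, n=202), 2))
print(round(precision_for_proportion(z=1.96, p=0.5, n=385), 4))
```

If fewer usable responses come back than planned, rerunning these with the realized n shows how much precision was lost.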

VIII. A REMINDER ABOUT STATISTICS

· Learning the terms and symbols defined in this chapter will provide you with the basics of the language of statisticians and researchers.

QUESTIONS FOR REVIEW AND CRITICAL THINKING/ANSWERS

1. What is the difference between descriptive and inferential statistics?

Information is needed to describe phenomena. When data have been collected, tabulation and summarization make the data more manageable and meaningful. If this is our goal, we are concerned with descriptive statistics. Inferential statistics are used to make an inference about a population from a sample. Generalizing from a sample to a population is a more complex process than description.

2. Speed limits for 13 countries in miles per hour are given (see the text). What is the mean, median and mode for this data? Feel free to use your computer (statistical software or spreadsheet) to get the answer.

The mean equals 68.08.

The median equals 62.

The mode equals 62.

3. Prepare a frequency distribution for the data in question 2.

Frequency distribution of speed limits

Highway miles/hour    Number
87                    1
81                    1
75                    3
70                    1
62                    5
56                    2
Total                 13

Variable = SPEED; country speed limit - MPH

Value    Frequency    Cum. Freq    Percent    Cum. Pct.
56       2            2            15.38      15.38
62       5            7            38.46      53.85
70       1            8            7.69       61.54
75       3            11           23.08      84.62
81       1            12           7.69       92.31
87       1            13           7.69       100.00

Variable = SPEED; country speed limit - MPH

56|**

62|*****

70|*

75|***

81|*

87|*

+————+————+————+————+————+

0 10 20 30 40 50

4. Why is the standard deviation rather than the average deviation typically used?

The average deviation always equals zero. Hence, the average deviation must be manipulated so that we have a quantitative index reflecting the distribution’s spread or variability. By squaring the deviation, the problem of the deviation’s average always being zero is eliminated. The square root is taken to eliminate the drawback of having the measure of dispersion in square units rather than the original measure of units.

5. Calculate the sample standard deviation for the data in question 2.

The standard deviation equals 9.78.

$\bar{X}$ = 68.08

X      (X - $\bar{X}$)    (X - $\bar{X}$)$^2$
87     18.9231            358.083
81     12.9231            167.006
75     6.9231             47.929
75     6.9231             47.929
75     6.9231             47.929
70     1.9231             3.698
62     -6.0769            36.928
62     -6.0769            36.928
62     -6.0769            36.928
62     -6.0769            36.928
62     -6.0769            36.928
56     -12.0769           145.851
56     -12.0769           145.851
Sum    885                1148.9231

Standard deviation = $\sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{1148.92}{13 - 1}} = \sqrt{95.74} = 9.78468$

Univariate Summary Statistics

Data Set = INTERNATIONAL SPEED LIMITS - MILES PER HOUR

SPEED country speed limit - MPH

Variable    N     Mean       Std Dev    Std Error    Sum
SPEED       13    68.0769    9.7849     2.7138       885.0000
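For instructors who prefer a script to a spreadsheet, the summary statistics above can be reproduced with Python's standard `statistics` module. The list of speeds is reconstructed from the frequency distribution in question 3; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, median, mode, stdev

# 13 country speed limits (mph), from the frequency distribution in question 3.
speeds = [87, 81, 75, 75, 75, 70, 62, 62, 62, 62, 62, 56, 56]

x_bar = mean(speeds)
s = stdev(speeds)                        # sample standard deviation (n - 1)
standard_error = s / sqrt(len(speeds))   # standard error of the mean

print(round(x_bar, 4), median(speeds), mode(speeds))
print(round(s, 4), round(standard_error, 4))
```

Note that `stdev` divides by n - 1, matching the hand calculation; `pstdev` would divide by n and give a slightly smaller value.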

6. Draw three distributions that have the same mean value but different standard deviation values. Draw three distributions that have the same standard deviation but different mean values.

image40.jpg

image41.jpg

7. A manufacturer of MP3 players surveyed one hundred retail stores in each of the firm’s sales regions. An analyst noticed that in the South Atlantic region the average retail price was $165 (mean) and the standard deviation was $30. However, in the Mid-Atlantic region the mean price was $170, with a standard deviation of $15. What do these statistics tell us about these two sales regions?

Although the mean of the South Atlantic region is slightly lower than the mean of the Mid-Atlantic region ($165 versus $170), the standard deviation in the South Atlantic region is considerably larger. Interpretation of this suggests that there is greater variability in the average retail price in the South Atlantic region. Simply stated, the distribution is much “fatter” than the “skinny” distribution in the Mid-Atlantic region.

8. What is the sampling distribution? How does it differ from the sample distribution?

The sampling distribution is a theoretical probability distribution that shows the functional relation between the possible values of some summary characteristic (such as the mean) of n cases drawn at random and the probability (density) associated with each value over all possible samples of size n from a particular population. It is a distribution of $\bar{X}$s. The sample distribution is the frequency distribution of X from the actual sample drawn.

9. What would happen to the sampling distribution of the mean if we increased the sample size from 5 to 25?

The sampling distribution would have less dispersion. As sample size increases, the spread of the sample means around $\mu$ decreases. Thus, with a larger sample we will have a skinnier sampling distribution.

10. Suppose a fast-food restaurant wishes to estimate average sales volume for a new menu item. The restaurant has analyzed the sales of the item at a similar outlet and observed the following results:

$\bar{X} = 500$ (mean daily sales)

S = 100 (standard deviation of sample)

n = 25 (sample size)

The restaurant manager wants to know into what range the mean daily sales should fall 95 percent of the time. Perform this calculation.

$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$

$(1 - \alpha) = .95$ = confidence coefficient

$\alpha = .05$

$Z_{c.l.} = 1.96$ (from standard normal table)

Upper bound = $\bar{X} + Z_{c.l.} S_{\bar{X}} = 500 + 1.96(20) = 539.2$

Lower bound = $\bar{X} - Z_{c.l.} S_{\bar{X}} = 500 - 1.96(20) = 460.8$

Thus, 95 percent of the time the mean daily sales volume should fall between 460.8 and 539.2.
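For students who prefer to check the arithmetic by computer, the calculation can be sketched as follows. The function name `mean_confidence_interval` is a hypothetical helper introduced for illustration; the inputs are the values given in the question.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, z):
    """Return (lower, upper) = sample_mean -/+ z * (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)  # S divided by the square root of n
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


lower, upper = mean_confidence_interval(sample_mean=500, sample_sd=100, n=25, z=1.96)
print(f"{lower:.1f} to {upper:.1f}")  # 460.8 to 539.2
```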

11. In our example of research on lipstick where E = $2 and S = $29, what sample size would we require if we desired a 99 percent confidence level? What about if we keep the 95% confidence level, but decide that our acceptable error is $4?

$n = \left( \frac{ZS}{E} \right)^2$

E = 2

S = 29

Z = 2.57 for a 99% confidence interval

$n = \left[ \frac{(2.57)(29)}{2} \right]^2$

$n = \left[ \frac{74.53}{2} \right]^2$

$n = [37.265]^2$

$n = 1388.68$

$n = 1389$ (rounded up)

If the confidence level is kept at 95 percent, as in the example given on p. 433, and E = 4, the required sample size is 202, because $n = \left[ \frac{(1.96)(29)}{4} \right]^2 = [14.21]^2 = 201.9$. This is illustrated in the chapter.
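Both parts of this answer can be verified with a few lines of code. This is a minimal sketch of the formula above; `sample_size_for_mean` is a hypothetical helper name, and rounding up to the next whole respondent is assumed.

```python
import math


def sample_size_for_mean(z, s, e):
    """Return n = (z * s / e)^2, rounded up to the next whole respondent."""
    return math.ceil((z * s / e) ** 2)


print(sample_size_for_mean(z=2.57, s=29, e=2))  # 99 percent confidence, E = $2 -> 1389
print(sample_size_for_mean(z=1.96, s=29, e=4))  # 95 percent confidence, E = $4 -> 202
```

Because E appears in the denominator and the ratio is squared, doubling the acceptable error cuts the required sample to roughly one quarter.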

12. Suppose you are planning to sample cat owners to determine the average number of cans of cat food they purchase monthly. The following standards have been set: a confidence level of 99 percent and an error of less than five units. Past research has indicated that the standard deviation should be 6 units. What is the required sample size?

$n = \left( \frac{ZS}{E} \right)^2$

E = 5

S = 6

Z = 2.57 at 99% confidence

$n = \left[ \frac{(2.57)(6)}{5} \right]^2 = \left[ \frac{15.42}{5} \right]^2 = [3.084]^2 = 9.5$

$n = 9.5$, or 10 respondents after rounding up.

Note: Students should be asked why the sample size is so low when the confidence level is 99 percent. Of course, it is because such a large magnitude of error is tolerable. Reducing E to 1 increases the sample size to 238.

$n = \left[ \frac{(2.57)(6)}{1} \right]^2 = \left[ \frac{15.42}{1} \right]^2 = 237.7$

13. In a survey of 500 people, 60 percent responded with agreement to an attitude question. Calculate a confidence interval at 95 percent to get an interval estimate for a proportion.

p = .6

n = 500

Z = 1.96 at 95% confidence

$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(.6)(.4)}{500}} = \sqrt{\frac{.24}{500}} = \sqrt{.00048} = .0219$

Confidence interval = $p \pm Z_{c.l.} S_p$

$.6 \pm (1.96)(.0219)$

$.6 \pm .042924$

Confidence interval = .557076 to .642924
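The same interval can be computed directly. The sketch below uses a hypothetical helper, `proportion_confidence_interval`, and carries the standard error at full precision, so its bounds differ from the hand calculation in the fifth decimal place (the hand calculation rounds $S_p$ to .0219 first).

```python
import math


def proportion_confidence_interval(p, n, z):
    """Return (lower, upper) = p -/+ z * sqrt(p * q / n), where q = 1 - p."""
    s_p = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    margin = z * s_p
    return p - margin, p + margin


lower, upper = proportion_confidence_interval(p=0.6, n=500, z=1.96)
print(f"{lower:.4f} to {upper:.4f}")
```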

14. What is a standardized normal curve?

One of the most common probability distributions in statistics is the normal distribution, also called the normal curve. This mathematical and theoretical distribution describes the expected distribution of sample means and many other chance occurrences. The normal curve is bell shaped, and almost all (more than 99 percent) of its values are within ±3 standard deviations of its mean. The standardized normal distribution is a specific normal curve that has several characteristics:

1. It is symmetrical about its mean.

2. The mean identifies the normal curve’s highest point (the mode) and the vertical line about which this normal curve is symmetrical.

3. The normal curve has an infinite number of cases (it is a continuous distribution), and the total area under the curve, which represents probability, equals 1.0.

4. The standardized normal distribution has a mean of 0 and a standard deviation of 1.

The standardized normal distribution is a purely theoretical probability distribution, but it is the most useful distribution in inferential statistics. It is extremely valuable because we can translate, or transform, any normal variable, X, into the standardized value, Z. The standardized normal table in the back of most statistics and research books allows us to evaluate the probability of the occurrence of many events without any difficulty.
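The transformation referred to above is $Z = (X - \mu) / \sigma$. As a rough illustration, the sketch below standardizes a value from a hypothetical normal variable (mean 500, standard deviation 100, both assumed for the example) and then uses Python's built-in `statistics.NormalDist` in place of the printed table.

```python
from statistics import NormalDist


def standardize(x, mean, sd):
    """Return Z = (x - mean) / sd."""
    return (x - mean) / sd


standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

# Hypothetical normal variable with mean 500 and standard deviation 100.
z = standardize(x=696, mean=500, sd=100)
print(z)                                 # 1.96
print(round(standard_normal.cdf(z), 3))  # 0.975, the area to the left of Z = 1.96
```

An area of .975 to the left of Z = 1.96 leaves .025 in the upper tail, which is why Z = 1.96 is the value used for a two-sided 95 percent confidence level.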

15. A researcher expects the population proportion of Cubs fans in Chicago to be 80 percent. The researcher wishes to have an error of less than 5 percent and to be 95 percent confident of an estimate to be made from a mail survey. What sample size is required?

Z = 1.96

E = 0.05

p = 0.80

q = 0.20

$n = \frac{Z^2 pq}{E^2} = \frac{(1.96)^2 (.80)(.20)}{(.05)^2} = \frac{(3.8416)(.16)}{.0025} = \frac{.6146}{.0025} = 245.84$

n = 245.84 by calculation, which rounds up to a required sample of 246.
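A short script can confirm the calculation. This is a sketch under the assumption that the result is rounded up to the next whole respondent; `sample_size_for_proportion` is a hypothetical helper name.

```python
import math


def sample_size_for_proportion(z, p, e):
    """Return n = z^2 * p * q / e^2 with q = 1 - p, rounded up to a whole respondent."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)


print(sample_size_for_proportion(z=1.96, p=0.80, e=0.05))  # 246
```

Students can also try p = 0.50, which gives the largest possible value of pq and therefore the most conservative sample size for a given Z and E.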

16. [Ethics Question] Using the formula in this chapter, a researcher determines that at the 95 percent confidence level, a sample of 2,500 is required to satisfy a client’s requirements. The researcher actually uses a sample of 1,200, however, because the client has specified a budget cap for the survey. What are the ethical considerations in this situation?

It is common for business research to be conducted with sample sizes that are less than ideal. After all, the tolerance for error is in part a subjective judgment. However, there are many research projects that must have very precise and reliable estimates. The question in this situation is one of being open with the client. If the reduced sample size and its effect on precision were discussed with the client, there is no problem. However, if the researcher finds that the budget will not pay for the appropriate sample size and uses the smaller sample without informing the client, then the researcher’s ethics are questionable.

17. [Internet Question] Go to http://www.dartmouth.edu/~chance/ to visit the Chance course. The Chance course is an innovative program to creatively teach introductory materials about probability and statistics. The Chance course is designed to enhance quantitative literacy. Numerous videos can be played online.

No student response is required for this question.

18. [Internet Question] Go to http://www.researchinfo.com. Click on “Market Research Calculators” and select the sample size calculator. How large a sample is needed to make an inference about the U.S. population within ±5 percent? How large a sample is needed to make an inference about the population of Norway within ±5 percent? Remember, the population statistics can be found in the CIA World Factbook on-line. Comment.

The Sample Size Calculator can be used to help find the sample size required for this question. At the 95% confidence level with a 5 percent confidence interval, the sample size for both the U.S. and Norway is 384, and at the 99% confidence level it is 666. From the CIA World Factbook on-line, the July 2011 population estimate for the U.S. was 313,232,044 and 4,691,849 for Norway. Recall that in most cases the size of the population does not have a major effect on the sample size.
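To show why the two countries give the same answer, the sketch below applies a standard finite population correction to the proportion formula. It is an illustration only: p = 0.5 (the most conservative value) and the correction formula are assumptions about how such calculators typically work, not a description of the researchinfo.com tool, and the function names are hypothetical.

```python
def initial_sample_size(z, p, e):
    """Sample size for a proportion, ignoring population size: z^2 * p * q / e^2."""
    return z ** 2 * p * (1 - p) / e ** 2


def adjust_for_population(n0, population_size):
    """Finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population_size)


n0 = initial_sample_size(z=1.96, p=0.5, e=0.05)  # 384.16 before any correction

for country, population in [("United States", 313_232_044), ("Norway", 4_691_849)]:
    n = adjust_for_population(n0, population)
    print(country, round(n))  # 384 for both countries
```

The correction term (n0 - 1) / N is tiny whenever the population is in the millions, so the adjusted sample size is essentially unchanged.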

19. [Internet Question] A random number generator and other statistical information can be found at http://www.random.org. Flip some virtual coins. Perform 20 flips with an Aurelian coin. Perform 20 flips with a Constatius coin. Construct a frequency table for each set of results. What conclusion might you draw? Would the result change if you flipped the coins 200 times or 2,000 times?

Students’ responses to the first part will vary. For example, my 20 flips with the Aurelian coin resulted in “heads” 9 times and “tails” 11 times, and the results were 8 and 12, respectively, for the Constatius coin. Hopefully, students will not do this exercise by flipping the coins 200 times or 2,000 times. Rather, they should understand that the mean of the sampling distribution will approach the population mean. Mathematically, this is the assertion of the central-limit theorem, which states: As the sample size increases, the distribution of the mean of a random sample taken from practically any population approaches a normal distribution.
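Students who want to see the effect of 200 or 2,000 flips without doing them by hand can simulate a fair coin. The sketch below is an assumption-laden stand-in for the random.org coin flipper: it uses Python's pseudo-random generator with fixed seeds, and `flip_coins` is a hypothetical helper name.

```python
import random
from collections import Counter


def flip_coins(n_flips, seed):
    """Return a frequency table (Counter) of heads and tails for n_flips fair flips."""
    rng = random.Random(seed)
    return Counter(rng.choice(["heads", "tails"]) for _ in range(n_flips))


for n_flips in (20, 200, 2000):
    table = flip_coins(n_flips, seed=n_flips)
    share_heads = table["heads"] / n_flips
    print(f"{n_flips:>5} flips: {dict(table)}  share of heads = {share_heads:.3f}")
```

With only 20 flips the share of heads can stray well away from .5, while with 2,000 flips it should sit close to .5.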

RESEARCH ACTIVITIES

1. [Internet Question] Use an online library service to find basic business research studies that report a “response rate” or number of respondents compared to number of contacts. You may wish to consult journals like the Journal of Business Research or the Journal of Marketing or the Journal of Management. Find at least 25 such studies. What is the average response rate across all of these studies? Do there appear to be any trends or factors that are associated with lower response rates? Write a brief report on your findings.

Students’ responses will vary on this exercise.

CASE 17.1 Pointsec Mobile Technologies

Objectives: To enable students to think critically about sample selection and size.

Summary: Many business people carry mobile devices such as laptop computers and PDAs, often containing valuable data related to their jobs. Pointsec provides security systems to protect such data. Pointsec decided to gather information on the prevalence of such devices left behind in taxis. The company surveyed taxi drivers by calling major taxi companies in nine cities around the world. Each company put the interviewers in touch with about 100 taxi drivers, who were asked how many devices had been left in their cabs over the preceding six months. The researchers came up with city-by-city numbers. For example, 3.42 cell phones per cab yielded over 85,000 cell phones left behind in Chicago and over 63,000 in London, which was a 71 percent increase over a six-month period four years earlier.

Questions

1. Discuss why the sampling method and sample size make these results questionable, even though the numbers were reported as if they were precise.

The drivers surveyed were not selected randomly. The researchers used a judgment sample to generate this data. Moreover, the sample consisted of 100 taxi drivers from around the world. The researchers did not appear to consider the three factors in determining sample size for questions involving means: (1) the variance, or heterogeneity, of the population; (2) the magnitude of acceptable error; and (3) the confidence level. On the first factor, one might argue that a taxi driver is a taxi driver no matter what city in the world he or she is driving in. However, the population of taxi drivers around the world is most likely heterogeneous, and as heterogeneity increases, so must sample size.

2. The simple survey method described in the case may have been sufficient as a way to draw attention to the issue of data security. However, if the company were using data on lost mobile devices to predict demand for a product, accuracy might be more significant. Imagine that you have been asked to collect data on mobile devices left in cabs, and you wish to be able to report results with a 95 percent confidence level. How can you improve the sample design and select an appropriate sample size?

The researcher must follow three steps: (1) estimate the standard deviation of the population, (2) make a judgment about the allowable magnitude of error, and (3) determine a confidence level. One problem is estimating the standard deviation of the population. Ideally, similar studies conducted in the past will give a basis for judging the standard deviation, and it appears that another study was done four years earlier which might be helpful. If not, a pilot study can be conducted to estimate the population parameters so that another, larger sample, of the appropriate size, may be drawn.


© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

