Rch 5301


RCH 5301, Research Design and Methods 1

Course Learning Outcomes for Unit II

At the end of this unit, you should be able to:

4. Apply research methods within a research design.
4.1 Explain the relationship of population, sampling frame, sampling design, and generalization.

Required Unit Resources

Chapter 2: Review of Literature (ULO 4.1)

Chapter 8: Quantitative Methods (ULO 4.1)
Read the following sections from this chapter:

• Introduction
• Putting Quantitative Research in Context
• Quantitative Designs

Video: Discovering Math: Advanced—Statistics and Data Analysis: Part 2 (ULO 4.1)
Watch the following segments to learn more about data analysis. The transcript for this video can be found by clicking on “Transcript” in the gray bar to the right of the video in the Films on Demand database.

• “The Statistics of Sampling, or By Any Means” (Segment 9) (2 minutes)
• “Sampling Distributions—Elephant Ages” (Segment 10) (2 minutes)
• “Central Limit Theorem—Giraffe Graphs” (Segment 11) (2 minutes)
• “Confidence Intervals—Running Cheetahs” (Segment 12) (3 minutes)
• “Sampling Techniques” (Segment 13) (2 minutes)
• “Uncertainty in Statistical Inferences” (Segment 17) (2 minutes)
• “Sampling Error and Confidence Interval—High School Completion” (Segment 18) (2 minutes)
• “Means—Average Salaries and Education” (Segment 20) (4 minutes)

Video: Confidence Intervals: Against All Odds—Inside Statistics (ULO 4.1)
This video touches on many of this unit’s concepts, such as target population, random samples, confidence intervals, and solving business problems. The transcript for this video can be found by clicking on “Transcript” in the gray bar to the right of the video in the Films on Demand database. (10 minutes)

Unit Lesson

Lesson: Sampling and Statistical Inference: Leave the Crystal Ball at Home (ULO 4.1)

Populations, Samples, and Statistical Inference

Positivist researchers share the epistemological assumption that knowledge is achieved through observation and measurement of phenomena using a quantitative research strategy. Measuring and quantifying phenomena can be valuable as an end in itself, as in descriptive statistics; however, the true advantage of quantitative data is that it can be used for statistical inference. In quantitative research, statistical inference means drawing conclusions about a population of interest by analyzing sample data selected from that population. Making inferences about a population is also known as generalizing results.

UNIT II STUDY GUIDE Sampling, Confidence Intervals, and Inferential Statistics


(Adapted from Zaman, n.d.)

A population can include virtually anything that can be counted and measured, including human participants, objects, and events. There is no limit to the industries that can apply inferential statistics: business and finance, marketing, law, medicine, health and safety, psychology and sociology, manufacturing, logistics, technology, education, energy, and many others. Likewise, the applications of inferential statistics are endless. For example, sampling can be used to make inferences about the average job satisfaction level of a population of employees, the percentage of defects in a population of manufactured units, the average achievement test scores in a population of elementary school students, and the relationship between accident victim fatality rates and years of experience of first responders.

Two of the primary research methods used to facilitate statistical inference in business and social science research are sampling and surveying. While immersed in this content on sampling, keep in mind that surveying is the most ubiquitous quantitative method used to collect sample data today. Surveying and sampling are used because, in many cases, it is simply not feasible from a cost or time perspective to gather data from every member of the population of interest. Gathering data from every member of the population is called a census. Sample data is an efficient and cost-effective alternative for making inferences about population means, proportions, and relationships between variables. An example of a population mean is the average antibody response to a new medicine. An example of a population proportion is the percentage of workers who participate in employer-sponsored retirement savings plans. An example of a relationship between variables is the correlation between reported cases of cannabis psychosis and the age of the user.

This discussion will focus on the use of samples and confidence intervals to make inferences about population means.
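The contrast between a census and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100,000 job satisfaction scores, its mean of 3.5, its spread of 0.8, and the sample size of 1,000 are all hypothetical values chosen for the example, not figures from the unit readings.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: job satisfaction scores for 100,000 employees.
population = [rng.gauss(3.5, 0.8) for _ in range(100_000)]

# A census measures every member of the population.
census_mean = statistics.fmean(population)

# A simple random sample measures only a small subset of the population.
sample = rng.sample(population, 1_000)
sample_mean = statistics.fmean(sample)

print(f"Census mean (N={len(population):,}): {census_mean:.3f}")
print(f"Sample mean (n={len(sample):,}): {sample_mean:.3f}")
```

The two printed means will usually be close but not identical, which is the gap that the following discussion of sampling error addresses.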

Sampling Error: Random Error Versus Systematic Error

While sampling can be an efficient and cost-effective alternative to a census, it is not perfect. A sample mean is unlikely to equal the population mean exactly. The sample mean could be very close to the population mean or not close at all. The difference between the true population mean and the sample mean is due to sampling error. Sampling error takes two forms: random error and systematic error.


Random error, as its name suggests, occurs randomly. The sample mean may not truly represent the population mean due to factors like natural variations and individual differences in the cases selected for the sample or simply due to pure chance. Random error is unpredictable, so the measurements in the sample scatter around the true population mean in both directions, above and below, as shown in the graphic below. Random error can be mitigated, but not eliminated completely, by increasing sample size (Zikmund et al., 2013).

Systematic error is generally non-random and arises from deficiencies and errors in the process of sampling rather than from the sample itself. For example, systematic error can occur through researcher errors, such as poor survey construction, analytical errors, and bias. As a result, systematic error is predictable, and each measurement will be shifted in the same direction relative to the true population mean, either above or below, as shown in the graphic. Systematic error is controllable and, therefore, can be substantially reduced by improving processes, such as recalibrating measurement equipment and pilot testing surveys, but it cannot be mitigated by increasing sample size (Zikmund et al., 2013).

(Adapted from The Office of Research Integrity, n.d.)

This discussion of sampling error highlights the imperfection and limitations of sampling. While systematic error can be substantially reduced by a conscientious researcher, random error will always present a challenge to estimating population means. Fortunately, sample size and the central limit theorem (discussed later in this lesson) can help the researcher reduce and account for random error.
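The difference between the two kinds of error can be illustrated with a small simulation. The sketch below is not taken from the readings: the true mean of 4.2, the spread of 1.0, and the constant bias of 0.5 (standing in for something like a miscalibrated instrument) are hypothetical values chosen only to show the pattern.

```python
import random
import statistics

TRUE_MEAN = 4.2  # hypothetical true population mean
SPREAD = 1.0     # hypothetical population standard deviation
BIAS = 0.5       # hypothetical constant offset caused by a flawed process


def sample_mean_error(n, bias, rng):
    """Return (sample mean - true mean) for n simulated measurements."""
    values = [rng.gauss(TRUE_MEAN, SPREAD) + bias for _ in range(n)]
    return statistics.fmean(values) - TRUE_MEAN


rng = random.Random(42)
for n in (10, 100, 10_000):
    random_only = sample_mean_error(n, 0.0, rng)
    with_bias = sample_mean_error(n, BIAS, rng)
    print(f"n={n:>6}  random error only: {random_only:+.3f}  "
          f"with systematic error: {with_bias:+.3f}")
```

As n grows, the error from random variation alone tends toward zero, while the error in the biased measurements settles near the size of the bias. A larger sample does not repair a flawed process.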

Statistical Inference Using Confidence Intervals

Although sampling error will always be a factor in using samples to draw conclusions about population means, the power of statistical inference is in its ability to estimate the true values of population means, even in the face of random error. One of the most common methods used to make inferences about population means is confidence intervals. A confidence interval uses a sample to create a range of mean values in which the population mean is likely to fall. For example, a sample of emergency calls into a 911 call center may provide an average length of 4.2 minutes per call. The following confidence intervals can be calculated from this sample:

• 68.27% C.I. = 3.8 minutes to 4.6 minutes
• 95% C.I. = 3.416 minutes to 4.984 minutes
• 95.45% C.I. = 3.4 minutes to 5.0 minutes
• 99.73% C.I. = 3.0 minutes to 5.4 minutes


The first confidence interval indicates, with 68.27% confidence, that the population mean lies between 3.8 and 4.6 minutes. The last confidence interval indicates, with 99.73% confidence, that the population mean lies between 3.0 and 5.4 minutes. This is useful information, but it is also helpful to have a basic understanding of how these statistics were derived. The first point to understand is that a confidence level corresponds to an area of probability under a normal distribution. The confidence levels above were chosen for a very specific reason. A confidence level of 68.27% corresponds to ±1 standard deviation, a confidence level of 95.45% corresponds to ±2 standard deviations, and a confidence level of 99.73% corresponds to ±3 standard deviations. This is represented in the distribution graphic below.

(Weaver, 2016)
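The percentages attached to ±1, ±2, and ±3 standard deviations can be checked directly. The sketch below uses the standard relationship between the normal distribution and the error function in Python's math module; the helper name area_within is made up for this illustration.

```python
import math


def area_within(k):
    """Share of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"+/-{k} standard deviations: {area_within(k):.2%}")
```

Rounded to two decimals, the areas for 1, 2, and 3 standard deviations come out to 68.27%, 95.45%, and 99.73%, and 1.96 standard deviations cover 95.00%.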

Researchers customarily use the 95% confidence interval when estimating population means from sample data. A lower confidence level, such as 68.27%, gives a tighter range in which the population mean is estimated to fall, but the confidence is not sufficiently large. A higher confidence level, such as 99.73%, gives more confidence, but the range becomes so wide that it is less useful in predicting the population parameter (Cooper & Schindler, 2014).

The second aspect of confidence intervals that is important to understand involves two quantities: the standard error of the mean (often referred to as the standard error or SE) and the Z-score. The standard error of the mean can be thought of as a standard deviation for the sample mean; more accurately, SE is the standard deviation of the sampling distribution, which will be discussed below. As a side note, the population standard deviation (σ) pertains to population data and is called a parameter. The standard error of the mean is computed from sample data and is called a statistic. Parameters describe population data, while statistics describe sample data. The SE is estimated from the sample standard deviation (s). Statisticians worked out how to estimate the SE from a single sample; otherwise, it would be necessary to draw a large number of individual samples to create a sampling distribution to calculate the SE. This is discussed further in the section below on the Central Limit Theorem. It is not necessary to know how SE is calculated; it is more important for practitioners to understand how to use this statistic than how to calculate it (Field, 2005; Zikmund et al., 2013).

The Z-score, or standard score, is also integral to calculating confidence intervals. A Z-score expresses how far a single data value lies from the mean (x̄), measured in standard deviations. A Z-score of 1 indicates that a single data value is one standard deviation from x̄. The Z-score will be positive if the single data value is above x̄, and it will be negative if it lies below x̄. Z-scores can be found by referring to a Z-score table or using a Z-score calculator. Both can be found on the internet. In the example below, a 95% confidence level equates to a Z-score of 1.96, or 1.96 standard deviations. As mentioned above, researchers usually use a 95% confidence interval, so a Z-score of 1.96 is the value typically used. The other examples were included to provide an


understanding of the area under a normal curve that is included for the respective confidence intervals (see bell curve above) and to show the application of the appropriate Z-score based on the confidence interval used (Brase & Brase, 2013).

The following shows how the standard error of the mean and Z-scores were used in the 911 call center scenario mentioned above.

Confidence Interval: x̄ − (Z × SE) ≤ µ ≤ x̄ + (Z × SE)

x̄ = Sample Mean = 4.2 minutes
Z = Z-score (number of standard deviations)
SE = Standard Error of the Mean = 0.4 minutes
µ = Population Mean

68.27% C.I. = 3.8 minutes to 4.6 minutes
4.2 − (1 × 0.4) = 3.8 minutes ≤ µ ≤ 4.2 + (1 × 0.4) = 4.6 minutes

95% C.I. = 3.416 minutes to 4.984 minutes
4.2 − (1.96 × 0.4) = 3.416 minutes ≤ µ ≤ 4.2 + (1.96 × 0.4) = 4.984 minutes

95.45% C.I. = 3.4 minutes to 5.0 minutes
4.2 − (2 × 0.4) = 3.4 minutes ≤ µ ≤ 4.2 + (2 × 0.4) = 5.0 minutes

99.73% C.I. = 3.0 minutes to 5.4 minutes
4.2 − (3 × 0.4) = 3.0 minutes ≤ µ ≤ 4.2 + (3 × 0.4) = 5.4 minutes

Precise confidence levels and Z-scores (standard deviations) were used in the calculations above. For example, a 95% confidence interval equates to a Z-score of 1.96 standard deviations. This means that ±1.96 standard deviations from the mean represent 95% of the area under a normal curve. Compare this to the 95.45% confidence interval, which equates to a Z-score of 2 standard deviations, representing 95.45% of the area under a normal curve. While the Z-scores for the 95% and 95.45% confidence intervals are very close, researchers should use the precise Z-score of 1.96, rather than 2, when calculating a 95% confidence interval. This warning is provided to prevent confusion, since standard deviations and intervals under the normal curve are often simplified in usage. This simplification is referred to as the empirical rule, the 68-95-99.7 rule, or the three-sigma rule. For example, 1.96 standard deviations are often rounded to two standard deviations to represent 95% of the area under a normal curve. This is fine for quick forecasting, but precision is recommended for high-stakes calculations (Brase & Brase, 2013; Salkind, 2009).

The last point that is important to understand is that the width of a confidence interval depends on both sample size and the dispersion of the sample’s data values around the mean. The precision of a confidence interval may be increased by increasing the size of the sample. As Bell et al. (2022) explain, there is an inverse relationship between sample size and sampling error. As sample size increases, the sampling error will decrease, and the confidence interval will become tighter and more precise. Gains in precision are clearly evident up to sample sizes of around 1,000 cases. After this, precision gains become much less noticeable.
It should also be noted that it is the absolute sample size, rather than the sample size relative to the population, that is important, regardless of the size of the population of interest. This means that a researcher should not be concerned about taking a certain percentage of the population as a sample. There is little additional benefit from samples larger than 1,000, regardless of whether the population size is 1 million or 100 million.

The larger the dispersion of the data values around the sample mean, the less confidence there is that the sample mean represents the population mean. For example, assume two samples each had a sample mean of 4.2 minutes of average wait time. Sample 1 has a range of values from 0.25 minutes to 15 minutes, while Sample 2 has a range of values from 4 minutes to 5 minutes. There would be more confidence that Sample 2’s mean is representative of the population mean due to the smaller dispersion of data values, which is reflected in a smaller standard error.
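The 911 call center calculations, and the effect of sample size on precision, can be reproduced with a short sketch. Two things here go beyond the lesson and are labeled as such: the usual estimate SE = s / √n, which the lesson says practitioners do not need to compute by hand, and a sample standard deviation of 2.0 minutes, which is a hypothetical value used only to show how the margin shrinks as n grows. The function names are likewise made up for this example.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided Z-score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def confidence_interval(sample_mean, se, z):
    """Return (lower, upper) bounds: sample mean minus/plus Z times SE."""
    margin = z * se
    return sample_mean - margin, sample_mean + margin


def standard_error(sample_sd, n):
    """Usual estimate of SE: sample standard deviation divided by sqrt(n)."""
    return sample_sd / math.sqrt(n)


# 911 call center example: sample mean 4.2 minutes, SE 0.4 minutes.
for z in (1, 1.96, 2, 3):
    low, high = confidence_interval(4.2, 0.4, z)
    print(f"Z = {z}: {low:.3f} to {high:.3f} minutes")

# The Z-score for a 95% confidence level, computed rather than looked up.
print(f"Z for 95% confidence: {z_for_confidence(0.95):.2f}")

# Hypothetical sample standard deviation of 2.0 minutes at several sample sizes.
for n in (100, 1_000, 10_000):
    margin = 1.96 * standard_error(2.0, n)
    print(f"n = {n:>6}: 95% margin of error = +/-{margin:.3f} minutes")
```

Because the margin of error shrinks with the square root of n, each tenfold increase in sample size cuts the margin by only about a factor of three, which is consistent with the diminishing gains in precision described above. Population size does not appear anywhere in the calculation.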


Figure 1 (Adapted from The University of Florida, n.d.)

Central Limit Theorem

In the discussion above about sampling error, it was noted that sampling is not perfect. If Sample 1 is drawn from the population, its mean is unlikely to equal the population mean exactly. Additionally, if Sample 2 is drawn from the same population, its mean would likely differ from both the population mean and the Sample 1 mean. If additional samples are drawn from the same population, it is very likely that the means from each sample would vary. This raises the question: how well does a single sample represent the population?

One way to address this conundrum would be to continue to take many individual samples from the population. In theory, hundreds or thousands of samples would be taken from the population. For each sample, a sample mean (x̄) would be calculated, resulting in a collection of many sample means. Remarkably, plotting the sample means in a histogram would result in an approximately normal, bell-shaped curve, called a sampling distribution, even if the population data is not normally distributed. This is beneficial since the distribution of the population is often unknown or may be known to be irregular. The central limit theorem says that for nearly any population distribution, regardless of its shape, the sampling distribution of the mean will be approximately bell-shaped as long as the sample size is sufficiently large. A common rule of thumb is that a sample size of n > 30 is sufficient for the central limit theorem to apply. If the population data is close to being normally distributed, then smaller sample sizes will result in normal distributions of sample means. If, however, the population distribution is far from normal, then larger sample sizes will be needed to ensure a normal distribution of sample means (Norusis, 2008).

In addition to the sampling distribution exhibiting normality, the average of the sample means would be a very close estimate of the population mean, and individual sample means cluster more tightly around it as the sample size is increased. Fortunately, as mentioned above, the standard deviation of a single sample can be used to estimate the standard error of the mean. This removes the need to take many samples from the same population to create a sampling distribution. A small standard error indicates that the sample mean is likely to be a close estimate of the population mean. Conversely, a large standard error indicates that the sample mean may not be a close estimate of the population mean. In short, a small standard error is desirable; a large standard error is not (Norusis, 2008).
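The repeated-sampling thought experiment can be run as a simulation. Everything specific in the sketch below is an assumption made for illustration: the skewed (exponential) population with a mean of about 4.2, the population size of 50,000, the sample size of 50, and the 2,000 repeated samples. The last step uses the usual estimate SE = s / √n from a single sample.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical, strongly skewed population (exponential, mean about 4.2).
population = [rng.expovariate(1 / 4.2) for _ in range(50_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# Draw many samples of size n and record each sample mean.
n = 50
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

center = statistics.fmean(means)  # average of the sample means
spread = statistics.stdev(means)  # standard deviation of the sample means

# Share of sample means within one standard deviation of their center;
# roughly 68% would be expected if the sampling distribution is bell-shaped.
share = sum(abs(m - center) <= spread for m in means) / len(means)

# Standard error estimated from one single sample: s / sqrt(n).
one_sample = rng.sample(population, n)
se_estimate = statistics.stdev(one_sample) / math.sqrt(n)

print(f"Population mean: {pop_mean:.3f}   average of sample means: {center:.3f}")
print(f"SD of sample means: {spread:.3f}   population SD / sqrt(n): {pop_sd / math.sqrt(n):.3f}")
print(f"Share of sample means within 1 SD of their center: {share:.1%}")
print(f"SE estimated from a single sample: {se_estimate:.3f}")
```

Even though the population is far from normal, the sample means center on the population mean and spread out by roughly the population standard deviation divided by √n. The single-sample estimate of the SE lands in the same neighborhood, although it varies from one sample to the next.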


In Closing

This unit presented the basics of sampling, including populations, samples, statistical inference and generalization, sampling error, and confidence intervals. One simple but important takeaway from this unit should be that as sample size increases, the standard error of the mean decreases, resulting in the sample mean becoming a better estimate of the population mean. Additionally, it is important to remember that what matters is absolute rather than relative sample size, and after n = 1,000, the benefits of a larger sample begin to diminish. Therefore, a sample of n = 1,000 would be a reasonable sample size whether the population is 1 million or 50 million.

Unit III will explore the topics of structured interviews, surveys, and questionnaires. These are the data collection methods that work together with sampling. Later units will return to the idea of using inferential statistics for hypothesis testing.

References

Annenberg Learner. (2013). Confidence intervals: Against all odds—Inside statistics [Video]. Films on Demand. https://libraryresources.columbiasouthern.edu/login?auth=CAS&url=https://fod.infobase.com/PortalPlaylists.aspx?wID=273866&xtid=111543

Bell, E., Bryman, A., & Harley, B. (2022). Business research methods (6th ed.). Oxford University Press. https://online.vitalsource.com/#/books/9780192640505

Brase, C. H., & Brase, C. P. (2013). Understanding basic statistics (6th ed.). Cengage Learning.

Cooper, D. R., & Schindler, P. S. (2014). Business research methods (12th ed.). McGraw-Hill.

Creswell, J. W., & Creswell, J. D. (2022). Research design: Qualitative, quantitative, and mixed methods approaches (6th ed.). SAGE. https://online.vitalsource.com/#/books/9781071817964

Discovery Education. (2007). Discovering math: Advanced—Statistics and data analysis: Part 2 [Video]. Films on Demand. https://libraryresources.columbiasouthern.edu/login?auth=CAS&url=https://fod.infobase.com/PortalPlaylists.aspx?wID=273866&xtid=117915

Field, A. (2005). Discovering stats using SPSS (2nd ed.). SAGE.

Norusis, M. J. (2008). SPSS 16.0 guide to data analysis. Prentice Hall.

The Office of Research Integrity. (n.d.). Module 4: Method of information collection – Section 2:3 [Graphic]. https://ori.hhs.gov/node/1228/printable/print

Salkind, N. J. (2009). Exploring research (7th ed.). Pearson Education.

The University of Florida. (n.d.). Estimation [Graphic]. https://bolt.mph.ufl.edu/6050-6052/unit-4/module-11/

Weaver, P. (2016, July 19). Can you use standard deviation in project management? [Chart]. Project Manager. https://projectmanager.com.au/can-you-use-standard-deviation-in-project-management/

Zaman, E. (n.d.). What is the difference between population and sample? [Graphic]. BBA Lectures. https://www.bbalectures.com/what-is-the-difference-between-population-and-sample/

Zikmund, W. G., Babin, B. J., Carr, J. C., & Griffin, M. (2013). Business research methods (9th ed.). Cengage Learning.


Research Design and Methods RCH 5301

Unit II DB

Describe a quantitative research project you would be interested in starting now or in the near future. Explain the problem and why this would be important for you to study. Why is a quantitative methodology an appropriate research strategy for your project? Then, select a peer post and evaluate the appropriateness of using a quantitative research strategy for their proposed study. Discuss challenges they might encounter using a quantitative methodology.

It should be at least 300 words in length.

It should include at least one APA-formatted scholarly, professional, or textbook reference with accompanying in-text citation to support any paraphrased, summarized, or quoted material.