Skill Builder 10: Sampling Distributions

Key Characteristics of a Sampling Distribution

Introduction

Recall that a goal of quantitative research is to make generalizations about a population using the information that has been obtained from a sample. In other words, researchers use sample statistics to draw conclusions about population parameters.

For example, Afrobarometer (http://www.afrobarometer.org/) is a network of researchers who collect data from over 35 countries in Africa using public attitude surveys on democracy, governance, economic conditions, and related issues. In the Afrobarometer surveys, researchers work to make generalizations about people in a specific country in Africa, or all of Africa, based on a subset of the population. They use statistics to estimate parameters (e.g., the average response to an item on a survey) for certain African countries or for Africa as a whole. To use another example, let’s say an operations manager is concerned about the life of all of the light bulbs being produced at her facility. She views all of the potential items a process may produce as a population (i.e., the population of light bulbs) and may use statistical analysis on a subset of light bulbs (i.e., a sample) to draw conclusions about all of the light bulbs.

Distribution of All Possible Values

A critical theoretical characteristic of a sample is that the elements it includes are chosen from the population by the luck of the draw. Different samples from the same population are therefore expected to have different elements and will, consequently, generate different statistics (e.g., different sample means). This means that the statistics that you obtain based on your data are likely to differ from the actual population parameter; that is, you will have sampling error in your study.

For instance, if you have collected data on a sample of 50 college students and have obtained a mean university satisfaction score for the students, the mean that you observe in your sample is likely to differ from the actual, true population mean. Thus, even though you will typically draw just one sample (size n) when conducting your research, you need to know about the distribution of all possible values for the statistic. Knowing this can help you to understand how close your observed sample statistic is to the “true” population parameter and can help you to make inferences from your sample to the population. 

Sampling Distribution

A sampling distribution tells you about the distribution of values for a statistic. That is, if you draw many, many random samples of size n from the population, the distribution of the values for the statistic is the sampling distribution. Recall that simple random sampling means that each unit in the population has an equal chance of being selected into the sample. This type of sampling allows us to create sampling distributions and to understand how our sample might differ from the population. (Even though samples are rarely truly random, statistical models are typically based on the assumption that a random sample has been collected.) 
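To make the idea of repeated sampling concrete, the following Python sketch simulates it. The population of 10,000 satisfaction scores, the sample size of 50, the number of repetitions, and the function name simulate_sample_means are all assumptions invented for illustration; they do not come from a real data set.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples simple random samples of size n and return their means."""
    rng = random.Random(seed)
    # rng.sample draws without replacement, so every unit in the population
    # has an equal chance of being selected into each sample.
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

# Hypothetical population: 10,000 satisfaction scores on a 1-10 scale.
population_rng = random.Random(42)
population = [population_rng.randint(1, 10) for _ in range(10_000)]

# The collection of these means approximates the sampling distribution.
sample_means = simulate_sample_means(population, n=50, num_samples=2_000)

print("population mean:     ", round(statistics.fmean(population), 3))
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("range of sample means:", round(min(sample_means), 3), "to", round(max(sample_means), 3))
```

Each individual sample mean differs a little from the population mean, which is the sampling error described earlier, while the means taken together cluster around the population mean.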

All statistics (e.g., standard deviations, variances, correlation coefficients) have sampling distributions. Nevertheless, the one most often studied is the sampling distribution of the sample mean, X̄. The mean is used most often because researchers frequently want to know, “What is the value of a variable for a typical case?” Statisticians have developed many models using the mean.

The aim of this Skill Builder is to show how sample information is related to the population. In particular, the relationship between the sample mean, X̄, and the population mean, µ, will be examined. Note that this Skill Builder discusses examples in which it is presumed that the population is quite large. An assumption has also been made that the value of σ, the population standard deviation, is known exactly. (In practice, σ is often estimated and is not known exactly.)

Sample and Population Distributions

There are three distributions you will need to understand to make statistical inferences.

1. Distribution of a Population

2. Distribution of a Sample

3. Sampling Distribution

Distribution of a Population

The distribution of a population is the distribution of all values for all elements of the population. For example, suppose you want to study a characteristic of a population and have identified a variable X that allows you to measure the characteristic. The variable X will be a measure of some characteristic of the entities in the population and will have a distribution that depends upon the frequency of the various values in the population. Usually, you will not be able to see all of the elements in the population but will use sample information to make inferences about the population.

Distribution of a Sample

The second distribution you will need to understand is the distribution of a sample. In contrast to the population, the sample will consist of actual observations based on the data that you collect. You can easily make a table or histogram to display the distribution of X for the sample. Theoretically, there will be many possible samples with n observations, but usually, you will collect only one sample. Because of your interest in making a statement about the population mean, µ, you will often report the sample mean, X̄, for your sample.

Sampling Distribution for X̄

The third distribution you need to understand is a sampling distribution. While you will typically collect only one sample with n observations or elements, what would happen if all possible samples of size n were selected from the population? A sampling distribution is a distribution of a sample statistic. For example, a sampling distribution could be generated by drawing all possible samples of size n from the population, calculating the mean for each of those samples, and then plotting the frequency for each mean value that we obtained. For some students, the idea of repeated sampling is difficult because they only plan to select one sample. Nevertheless, you need to consider what would happen if you could perform repeated sampling.
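For a very small population, the "all possible samples" idea can be carried out exactly. The sketch below uses a made-up population of five values and samples of size n = 2 (both are assumptions chosen only to keep the enumeration short) and tabulates the frequency of each sample mean.

```python
from collections import Counter
from itertools import combinations
from statistics import fmean

population = [2, 4, 6, 8, 10]  # hypothetical population of N = 5 values
n = 2                          # hypothetical sample size

# Every possible sample of size n, drawn without replacement and ignoring order.
all_samples = list(combinations(population, n))

# One mean per sample: each sample contributes a single value of the statistic.
sample_means = [fmean(sample) for sample in all_samples]

# The sampling distribution: how often each mean value occurs.
sampling_distribution = Counter(sample_means)
for mean_value in sorted(sampling_distribution):
    print(mean_value, sampling_distribution[mean_value])

print("population mean:     ", fmean(population))
print("mean of sample means:", fmean(sample_means))
```

There are 10 possible samples here, and the mean of their 10 sample means equals the population mean of 6. With realistic population sizes the number of possible samples is far too large to list, which is why the sampling distribution is normally treated as theoretical.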

The following link shows an illustration of repeated sampling that may help you see that different samples result in different values for a statistic: http://onlinestatbook.com/stat_sim/sampling_dist/

Variables: X and X̄

There are two variables to be concerned with, X and X̄ (the sample mean), when making a statement about the population parameter using sample data. Think of X as the variable representing a real-world construct or measurement, such as IQ or responses to a question on a survey, and recall that X̄ is the sample mean. In the process of making a statement about the population mean, µ, you need to be concerned with both X and X̄.

The variable X has a distribution in the population: recall the population distribution of IQ or the distribution of responses to an item on a survey questionnaire. Moreover, for each sample that you could draw from the population, you can construct a distribution for X.

Most importantly, though, the variable X̄ has a theoretical distribution that is almost never observed in actual research but can often be determined theoretically. Each sample that could be drawn from the population will provide only one value to be used in determining the distribution of X̄.

Sampling Distributions for Sample Means

In general, as stated above, a sampling distribution is the theoretical distribution for a sample statistic based on simple random samples of size n. In particular, sampling distributions for sample means have extremely useful relationships to the population from which the samples could be drawn.

1. The mean of the sampling distribution of sample means, often denoted µ_X̄, has the same value as the population mean, µ, of the variable X.

2. The standard deviation of the sampling distribution of sample means, denoted σ_X̄, is equal to the population standard deviation, σ, divided by the square root of n. The formula is σ_X̄ = σ/√n.

3. If the population has a normal distribution, the sampling distribution of sample means also has a normal distribution.

4. Regardless of the distribution of the population, if the sample size is relatively large, the sampling distribution of sample means is close to normal. One rule of thumb is that "large" means n is greater than 30, but more generally, the larger the sample, the closer the distribution of sample means is to a normal distribution.

The last point summarizes the Central Limit Theorem, one of the most important concepts in all of statistics. The Central Limit Theorem states that, regardless of the distribution of the population, if the sample size is relatively large (a rule of thumb is n > 30), the sampling distribution of sample means is close to normal. (It should not be confused with the Law of Large Numbers, a separate result stating that the sample mean gets closer to the population mean as the sample size grows.)
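The relationships listed above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be an exponential distribution with mean 1 and standard deviation 1 (a strongly skewed shape chosen for illustration), observations are generated directly from that distribution to stand in for a very large population, and n = 40 and 5,000 repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 1.0, 1.0   # assumed population: exponential(rate=1), mean 1, sd 1
n = 40                 # assumed sample size (above the n > 30 rule of thumb)
num_samples = 5_000    # assumed number of repeated samples

# Draw repeated samples of size n and keep the mean of each one.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

theoretical_se = sigma / math.sqrt(n)            # sigma divided by sqrt(n)
observed_se = statistics.pstdev(sample_means)    # sd of the simulated means

# For a normal distribution, about 68% of values lie within one
# standard deviation of the mean; compare with the simulated share.
within_one_se = sum(abs(m - mu) <= theoretical_se for m in sample_means) / num_samples

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("theoretical sd of sample means:", round(theoretical_se, 3))
print("observed sd of sample means:   ", round(observed_se, 3))
print("share within one sd of mu:     ", round(within_one_se, 3))
```

Even though the assumed population is far from normal, the simulated sample means should center near µ, have a standard deviation near σ/√n, and fall within one standard deviation of µ roughly as often as a normal distribution predicts.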

Probability and Sampling Error

Characteristics of a Sampling Distribution

Even though you will rarely know the exact value of µ, with large random samples you will have a good idea of the size of the sampling error. As discussed earlier, sampling error is the absolute value of a statistic minus the parameter being estimated. For now, focus on the equation in Figure 1. Theoretically, this equation tells you how far the sample mean is from the population mean, but you will generally be unable to evaluate the equation unless you do a complete census of the population.

This information about the sampling distribution helps researchers to determine how accurate their sample statistics are as estimators of population parameters; that is, this information gives researchers an idea of the degree of sampling error in their study and can tell them something about the probability of obtaining certain values for sample statistics (e.g., sample means).

Using probabilities based on the standard normal curve, then, we can examine how likely it is to obtain sample means within a specified range of the population mean and can assess how accurate our sample mean is compared to the actual population mean.

Notice, also, that there are two horizontal scales. One is associated with the standard normal distribution, with values called Z values. While they theoretically range from −∞ to +∞, the values shown range from −3 to +3.
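As a minimal sketch of how Z values translate into probabilities about the sample mean, the following Python function computes the chance that a sample mean lands within a chosen distance d of the population mean. It assumes the sampling distribution is normal and that σ is known, as this Skill Builder does. The function name prob_within and the numbers used (σ = 15, n = 36, d = 5) are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(d, sigma, n):
    """Probability that the sample mean falls within d of the population mean,
    assuming a normal sampling distribution with known sigma."""
    standard_error = sigma / sqrt(n)   # sd of the sampling distribution
    z = d / standard_error             # distance d expressed as a Z value
    standard_normal = NormalDist()     # mean 0, standard deviation 1
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

# Hypothetical inputs: sigma = 15, n = 36, d = 5, so the standard error is 2.5
# and d corresponds to Z = 2.
p = prob_within(d=5, sigma=15, n=36)
print(round(p, 4))

# A larger sample shrinks the standard error, so the same d covers more Z units.
print(round(prob_within(d=5, sigma=15, n=144), 4))
```

With these assumed inputs, d corresponds to Z = 2, so the probability is about 0.95. Increasing n raises the probability because the sampling distribution becomes narrower around µ.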