
Estimating Population Parameters

NOTATIONS: Words in orange represent glossary terms. You can locate the Glossary in Appendix 1.

x̄ (x-bar) is the symbol used to represent the sample mean.

This symbol may appear as x¯¯ or X¯¯ throughout the text; however, the correct representation of this statistic is x̄.

Introduction

Often, researchers want to estimate the value of a population parameter using information they have obtained from their sample. For example, in the political mass media, you will often see estimates of the proportion of voters who plan to vote for a certain candidate based on data collected from samples.

Point Estimates

One way to estimate a population parameter is to use a point estimate. In a point estimate, we estimate the unknown parameter of interest using a single value (hence the name "point estimation"). The point estimate is typically the value of our sample statistic and represents the best guess about the population parameter. As the following scenario illustrates, this form of inference is quite intuitive.

Scenario: Studying the IQ levels of Students at Smart University (SU)

Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating μ: the mean IQ level of all the students at SU.

Point estimation is simple and intuitive, but also a bit problematic. With truly random samples, the sample statistic is the point estimate and the best guess of the population parameter’s value. Because of the luck of the draw, however, different random samples will be composed of different elements of the population, so the various samples will yield different values of the sample statistic.

The bottom line for point estimates is that with random samples, you can be fairly certain the estimate is close to the population parameter, but the value of the sample statistic provides no indication of how close the sample value is to the population parameter. While point estimates are useful as best guesses, they do not tell us about how far off they may be.

In other words, when we estimate, for instance, μ by the sample mean x̄, we are almost guaranteed to make some kind of error. Even though we know that the values of x̄ fall around μ, it is very unlikely that the value of x̄ will fall exactly at μ. The difference between x̄ and μ is called sampling error.
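The idea of sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population mean (115), standard deviation (15), and sample size (100) are hypothetical values chosen for the demonstration. In real research μ is unknown, which is exactly why the sampling error cannot be observed directly.

```python
import random
import statistics

# Hypothetical population values (assumptions for this demonstration only;
# in practice mu is unknown and the sampling error cannot be computed).
MU = 115.0      # assumed true mean IQ of all students
SIGMA = 15.0    # assumed population standard deviation
N = 100         # assumed sample size

def draw_sample_mean(rng, n=N):
    """Draw one random sample of size n and return its sample mean."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.fmean(sample)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_means = [draw_sample_mean(rng) for _ in range(5)]
sampling_errors = [x_bar - MU for x_bar in sample_means]

# Each random sample gives a different x-bar, hence a different error.
for x_bar, err in zip(sample_means, sampling_errors):
    print(f"x-bar = {x_bar:7.2f}   sampling error = {err:+.2f}")
```

Each run of five samples produces five different sample means, all near the assumed μ but none exactly equal to it.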

Interval Estimates

Given that such errors are a fact of life for point estimates (by the mere fact that we are basing our estimate on one sample that is a small fraction of the population), these estimates are in themselves of limited usefulness, unless we are able to quantify the extent of the estimation error. Interval estimates address this issue. The idea behind interval estimation is, therefore, to enhance the simple point estimates by supplying information about the size of the error attached. For an interval estimate, we try to determine the possible range of values that are likely to include the population parameter (the mean, for example). In particular, we often use confidence intervals in the social sciences to specify the likelihood that the population parameter is contained within a specified range. An example of a statement you might make that is based on a confidence interval is, “I am 95% confident that the population proportion of voters in favor of my candidate is between .51 and .59.” Confidence intervals can be calculated for a variety of parameters, but in this Skill Builder, we will primarily focus on calculating confidence intervals for means.

Consider the scenario "Studying the IQ Levels of Students at Smart University (SU)" that we discussed in the point estimation section.

Interval estimation takes point estimation a step further and says something like:

I am 95% confident that by using the point estimate x̄ = 115 to estimate μ, I am off by no more than 3 IQ points. In other words, I am 95% confident that μ is within 3 of 115, or between 112 (115 - 3) and 118 (115 + 3).

Yet another way to say the same thing is:

I am 95% confident that μ is somewhere in (or covered by) the interval (112,118).

Note that while point estimation provides just one number as an estimate for μ (115), interval estimation provides a whole interval of "plausible values" for μ (between 112 and 118), and also attaches the level of our confidence that this interval indeed includes the value of μ to our estimation (in our example, 95% confidence). The interval (112,118) is therefore called "a 95% confidence interval for μ."

Calculating a Confidence Interval

The Confidence Interval

In general, a confidence interval is calculated as follows:  point estimate ± margin of error.  That is, we take the point estimate and subtract the margin of error to get the lower value for the confidence interval, and we take the point estimate and add the margin of error to get the upper value for the confidence interval. 

Example: 95% Confidence Interval

So if you are calculating a 95% confidence interval for a mean, for example, you will be able to make a statement that you are 95% confident that the population mean is contained in the interval shown in Figure 1.

Remember that the point estimate is the value for the sample statistic that is being used to estimate the parameter. The margin of error is the largest amount by which we expect the estimate to differ from the true population parameter at the chosen level of confidence. For instance, for a 95% confidence interval for the mean, we are 95% confident that our estimate will not depart from the true population mean by more than m, the margin of error.

Margin of Error

The margin of error is calculated like this:

margin of error = critical statistic value × standard error of the estimator.

The critical statistic value that goes into the equation depends on the level of confidence that you choose (e.g., 95%) and on which particular statistic is being used in the equation (e.g., z or t). The standard error of the estimator is the standard deviation of the sampling distribution of the estimator, often the standard error of the mean.
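As a rough illustration of the margin-of-error equation, the sketch below computes a z-based margin of error for a mean. It assumes the population standard deviation is known (σ = 15, as in the IQ example), and it assumes a sample size of n = 100, which is a hypothetical value not stated in the text. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(confidence, sigma, n):
    """Critical value times the standard error of the mean (sigma / sqrt(n))."""
    return z_critical(confidence) * sigma / sqrt(n)

# sigma = 15 comes from the IQ example; n = 100 is an assumed sample size.
m = margin_of_error(0.95, sigma=15, n=100)
x_bar = 115  # point estimate from the IQ example

print(f"margin of error = {m:.2f}")
print(f"95% CI = ({x_bar - m:.2f}, {x_bar + m:.2f})")
```

Under these assumptions the margin of error comes out at roughly 3 IQ points, in line with the interval (112, 118) used in the example.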

Calculating the Confidence Interval for a Mean

Now that we’ve discussed the general form of confidence intervals, let’s take a look at the specific equation used to calculate the confidence interval for a mean as seen in Figure 3.  
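The figure itself is not reproduced in this text. As a reference sketch, the usual z-based form of the interval, which applies when the population standard deviation σ is known, follows the general pattern of point estimate ± critical value × standard error. The symbols z* (the critical value for the chosen level of confidence) and n (the sample size) are notation assumed here.

```latex
\bar{x} \;\pm\; z^{*} \cdot \frac{\sigma}{\sqrt{n}}
```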

Level of Confidence

The Confidence Level

So far, we have focused on 95% confidence intervals.  This is the most commonly used level of confidence, but it is possible to calculate intervals using different levels of confidence, such as 90% or 99%.

Let’s think some more about what our level of confidence indicates.   

· A 95% level of confidence says that in the long run, about 95% of the intervals you construct will contain the parameter being estimated.

· You may correctly notice that researchers typically do not collect many, many samples; they typically collect one sample. A second way of viewing the level of confidence is therefore to think in terms of subjective probability, much as a gambler bets on a sporting event or a meteorologist thinks about the chance of rain tomorrow.

· From the gambler’s perspective, a 95% confidence interval says you are more confident that the interval contains the parameter than you are with a 90% confidence interval.

· Conversely, you will have less confidence that the 95% interval contains the parameter than you will with a 99% interval.
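The long-run reading of the confidence level can be checked with a simulation. The sketch below is a demonstration under assumed values, not a description of any real study: it invents a population (mean 115, standard deviation 15), draws many samples of an assumed size 100, builds a 95% z-interval from each, and counts how often the interval contains the mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Hypothetical population and design values (assumptions for illustration).
MU, SIGMA, N = 115.0, 15.0, 100
CONFIDENCE = 0.95
TRIALS = 2000

# The margin of error is the same for every sample because sigma is known.
z = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
margin = z * SIGMA / sqrt(N)

rng = random.Random(0)  # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    # Count the interval as a "hit" if it contains the true mean.
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of {TRIALS} intervals contained mu")
```

The proportion of intervals that capture μ should land close to 95%, although any single interval either contains μ or does not.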

Scenario: Effectiveness of Teaching Financial Literacy

Recall our IQ example from Module 1 to see how choosing a level of confidence affects the width of the confidence interval.  The IQ level of students at a particular university has an unknown mean (μ) and a known standard deviation σ=15.

The 99% confidence interval is the widest of the three, and the 90% confidence interval is the narrowest. This is not very surprising, given that in the 99% interval we multiply the standard error of the mean by 2.576, in the 95% interval by 2, and in the 90% interval by only 1.645. Beyond this numerical explanation, there is a very clear intuitive explanation and an important implication of this result.

Let's start with an intuitive explanation. The more certain I want to be that the interval contains the value of μ, the more plausible values the interval needs to include in order to account for that extra certainty.

· I am 95% certain that the value of μ is one of the values in the interval (112,118). In order to be 99% certain that one of the values in the interval is the value of μ, I need to include more values and thus provide a wider confidence interval.

· In our example, the wider 99% confidence interval (111, 119) gives us a less precise estimate of the value of μ than the narrower 90% confidence interval (112.5, 117.5), because the smaller interval "narrows in" on the plausible values of μ.

The important practical implication here is that researchers must decide whether they prefer to state their results with a higher level of confidence or produce a more precise interval. In other words, there is a trade-off between the level of confidence and the precision with which the parameter is estimated.

The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval). If we would like to estimate μ with more precision (i.e., a narrower confidence interval), we will need to sacrifice and report an interval with a lower level of confidence.
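The trade-off between confidence and precision can be checked numerically. The sketch below uses x̄ = 115 and σ = 15 from the IQ example, together with an assumed sample size of n = 100 (a hypothetical value not given in the text), and compares the interval widths at the three levels of confidence.

```python
from math import sqrt
from statistics import NormalDist

# x-bar and sigma come from the IQ example; N = 100 is an assumed sample size.
X_BAR, SIGMA, N = 115, 15, 100

def z_interval(confidence):
    """Return (lower, upper) for a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * SIGMA / sqrt(N)
    return X_BAR - margin, X_BAR + margin

widths = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = z_interval(level)
    widths[level] = upper - lower
    print(f"{level:.0%}: ({lower:.1f}, {upper:.1f})  width = {upper - lower:.1f}")
```

Under these assumptions the intervals come out at approximately (112.5, 117.5), (112, 118), and (111, 119), and the width grows as the level of confidence rises.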

Sample Size


In estimating µ, the second element of the margin of error, besides z, is the standard deviation of the sampling distribution of x̄: σ/√n (i.e., the standard error of the mean). Pay particular attention to the fact that the standard error of the mean depends upon n, the size of the random sample being used to calculate a value for x̄.

Figures 2-5 below show the sampling distribution for x̄ for various sample sizes. In each case, the population is normal and has a mean of zero and a standard deviation of 1.0.

Notice that with increasing sample size, the sampling distribution becomes more tightly “packed” around the population parameter (μ, which equals zero in these examples).

Each of these three distributions is normal with a mean of zero, but the standard errors of the mean vary across the distributions.
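The dependence of the standard error on n can be made concrete with a few lines of arithmetic. The sketch below uses the population standard deviation of 1.0 from the figures. The sample sizes are hypothetical, since the sizes used in the figures are not listed in this text.

```python
from math import sqrt

SIGMA = 1.0  # population standard deviation used in the figures

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)

# Hypothetical sample sizes chosen for illustration only.
sample_sizes = (1, 4, 25, 100)
errors = [standard_error(SIGMA, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(f"n = {n:>3}   standard error = {se:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, so each further gain in precision costs more observations than the last.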