Data Collection Plan Homework

© 2012 International Institute for Learning, Inc.

DS-624 Quality Management

© 2014 International Institute for Learning, Inc.

IIL-LSSGB

Learning Objectives

Articulate why sampling is important

Describe sampling advantages, limitations, and biases

List different types of sampling strategies

Understand how to calculate sample size

Measure Phase

COPQ

Data Collection Plan

MSA

Capability

Define: We have a problem

Measure: How bad is it?

Analyze: Find the Root Cause

Improve: Fix it, eliminate the Root Cause

Control: Make it stay fixed

Measure Phase: Data Collection Plan

Sampling or not Sampling

Lean Six Sigma Green Belt

Data Collection

Scatter plots provide a visual display of the relationship between two variables, showing how one variable increases or decreases as another variable increases or decreases. If changes in one variable are linked to changes in another, they are said to be correlated. Correlation may provide some insight into a possible cause and effect relationship between two variables, although correlation alone does not prove a causal relationship. It is the foundation for a more complete correlation analysis.

What is Sampling?

Sampling is the process of identifying a smaller group that will be used to represent the whole

Entire Population

Sample

Most data collection requires sampling. Why? As the slide says:

Impractical or expensive to collect all of the data

It is important that you understand how to accurately pull sample data from a larger population.

Sampling: Advantages and Limitations?

Advantages:

Collecting all the data is often impractical or expensive

Valid conclusions can be drawn from a relatively small amount of carefully collected data

The sample must be “representative” of the total population

Limitations:

Statistics based on a sample may be more uncertain

Two types of errors may occur:

Sampling Bias

Random Error

Sampling Bias

A systematic bias in the way the sample is collected that influences the results.

Sampling bias will not be identified by advanced statistical tools

The most common types of sampling bias are:

Convenience sampling – collecting data at times or in places where it is easiest to collect

Judgment sampling – making educated guesses about which items or people should be sampled

They must be avoided by choosing a valid sampling strategy
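
The effect of convenience sampling can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the invoice population, the peak hours, and the processing times are all hypothetical values chosen only to make the bias visible.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 invoices per hour from 08:00 to 16:00.
# Assumption: processing takes longer during the 09:00-12:00 peak.
PEAK_HOURS = {9, 10, 11, 12}
population = []
for hour in range(8, 17):
    base_minutes = 30 if hour in PEAK_HOURS else 15
    for _ in range(100):
        population.append((hour, base_minutes + rng.uniform(-2, 2)))

population_mean = mean(minutes for _, minutes in population)

# Convenience sample: only the quiet 16:00 hour, because it is easy to observe.
convenience = [minutes for hour, minutes in population if hour == 16]
convenience_mean = mean(convenience)

# Random sample: every invoice has the same chance of being selected.
random_sample = rng.sample(population, 50)
random_mean = mean(minutes for _, minutes in random_sample)

print(f"population mean:         {population_mean:.1f} min")
print(f"convenience sample mean: {convenience_mean:.1f} min")
print(f"random sample mean:      {random_mean:.1f} min")
```

In this made-up population the convenience sample understates the true average because it never sees the peak hours, and no amount of later analysis on that sample would reveal the gap.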

Valid Sampling Strategies

Random sampling

Systematic sampling

Stratified sampling

A combination of the above

These are the main valid sampling strategies. You’ll want to determine which one works best for your project and the data you need to collect. The best strategy is random sampling, but since this is not always possible, you might go with systematic sampling, stratified sampling, or a combination of those.

Random and Systematic Sampling

Random sampling:

Every item in a population or process has an equal chance of being measured

Usually involves assigning computer-generated random numbers

Is the best strategy, but not always possible.

Systematic or sequential sampling:

Taking data at certain intervals

For example, every 5 minutes or every 10th claim
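
The two strategies above can be sketched in a few lines of Python. This is only an illustration: the function names and the list of claim IDs are hypothetical, and the standard `random` module stands in for whatever random-number tool a project actually uses.

```python
import random


def random_sample(items, n, seed=None):
    """Simple random sample: every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(items, n)


def systematic_sample(items, interval, start=0):
    """Systematic sample: take every `interval`-th item, beginning at index `start`."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return items[start::interval]


# Hypothetical claim IDs, claim-001 through claim-100.
claims = [f"claim-{i:03d}" for i in range(1, 101)]

picked_at_random = random_sample(claims, 10, seed=7)
every_tenth = systematic_sample(claims, 10, start=9)  # the 10th, 20th, ... claim

print(picked_at_random)
print(every_tenth)
```

Note that a systematic sample can still be biased if the sampling interval happens to line up with a cycle in the process, for instance if every 10th claim always comes from the same source.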

Stratified Sampling

Stratified sampling

The population is divided into different layers by a stratification factor

The overall result can be biased if the proportion of samples does not reflect the relative frequencies in the population.

Customer Type 1 (small)

Customer Type 2 (large)

Take a sample from each layer

Can be used to:

Assure that every layer has enough data

Focus on specific layers and reduce overall sample size
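
A stratified sample with proportional allocation might be drawn as in the sketch below. It is an assumption-laden illustration rather than a prescribed method: the function name, the customer records, and the 800/200 split between small and large customers are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, total_n, seed=None):
    """Draw a random sample from each layer, sized in proportion to the layer."""
    rng = random.Random(seed)

    # Divide the population into layers using the stratification factor.
    layers = defaultdict(list)
    for record in records:
        layers[key(record)].append(record)

    sample = {}
    for name, members in layers.items():
        # Proportional allocation, with at least one item per layer.
        # Because of rounding, the layer sizes may not add up to total_n exactly.
        n = max(1, round(total_n * len(members) / len(records)))
        sample[name] = rng.sample(members, min(n, len(members)))
    return sample


# Hypothetical population: 800 small customers and 200 large customers.
customers = [("small", i) for i in range(800)] + [("large", i) for i in range(200)]

sample = stratified_sample(customers, key=lambda c: c[0], total_n=50, seed=1)
for layer, members in sample.items():
    print(layer, len(members))
```

Keeping the per-layer sample sizes proportional to the layer sizes is what prevents the overall result from being skewed toward one customer type. If a layer is deliberately over-sampled to get enough data, the overall figures need to be re-weighted accordingly.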

Sampling Discussion

We will randomly review 50 tech support issues from the last 6 months to measure cycle time and accuracy of support.

To calculate the defective invoices processed in our company, we’ll watch the accounting staff process invoices between 4-5pm for the next two weeks, so we don’t bother them during their peak hours.

To investigate the reasons for rejected customer applications, we’ll look at data collected 3 years prior that represents a random sample of 300 customer applications from that year.

To calculate the average weight of male employees in our company, we’ll weigh all men who happen to be working out in the company gym at 6:30 am for one week.

Read the four scenarios and note why each one will or will not produce a good sample. Then we’ll discuss as a class.

Sample Size

Factors in Sample Size Selection

Data type: Continuous or Discrete

Objectives: What you will do with results

Familiarity: How much you already know about the items to be sampled

Certainty: How much risk you’re willing to take in your conclusions being wrong (this calculation is taken care of for this level of sampling)

This list indicates some of the critical issues that will impact your sampling plans.

FAMILIARITY is important because of a challenging Catch-22: to accurately determine the right sample size, you need to KNOW something about the data already!

Because of this, sampling decisions often start with guesswork and progress to more reliable, data-driven sampling plans as your familiarity with the data increases.

Sample Size

Continuous data:

Need to know s (Standard Deviation)

Need to know d (Desired Precision)

“Pre-sample” at least 30 units and calculate s (watch out for biases)

Discrete data:

Need to know P (Proportion)

Need to know d (Desired Precision)

“Pre-sample” until you have captured at least 5 items you’re looking for (e.g., defects*), then calculate P

Continuous data requires an estimate of the “standard deviation” of the variable being measured, denoted s.

Discrete data requires an estimate of P, the proportion of the population that contains the characteristic in question.

As noted here, you can either guesstimate the value of s or P, or run a small initial sample to obtain an initial value of s or P for determining your sample size. After you’ve collected data, calculate the standard deviation or proportion from the actual data to see how close your guesstimate was, and then refine your sampling plan.

* The assumption is that what you are looking for occurs less than 50% of the time.

If the sample size based on the newly estimated variation increases by more than about 10%, you should probably re-sample to ensure your data is truly representative.

On the other hand, you may find that with your initial guesstimates you have actually collected more than enough data. But that’s okay, more data can’t hurt you!

If you’re really unsure, you can always default to P = 0.5, which is the most conservative case because it gives the largest required sample size.
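
The slides name the inputs (s, P, d) but do not state the formulas. The sketch below uses the common normal-approximation formulas at roughly 95% confidence, n = (z·s/d)² for continuous data and n = (z/d)²·P·(1 − P) for discrete data, with z = 1.96. These formulas, the function names, and the example values are assumptions of this sketch, not taken from the course material, and some course materials round z to 2.

```python
import math

Z_95 = 1.96  # z-value for roughly 95% confidence (an assumption of this sketch)


def sample_size_continuous(s, d, z=Z_95):
    """Sample size for continuous data: n = (z * s / d)^2, rounded up."""
    return math.ceil((z * s / d) ** 2)


def sample_size_discrete(p, d, z=Z_95):
    """Sample size for discrete data: n = (z / d)^2 * p * (1 - p), rounded up."""
    return math.ceil((z / d) ** 2 * p * (1 - p))


# Hypothetical continuous case: s = 10 minutes, desired precision d = 2 minutes.
print(sample_size_continuous(s=10, d=2))      # 97

# Hypothetical discrete case: P = 0.10 defective, desired precision d = 0.05.
print(sample_size_discrete(p=0.10, d=0.05))   # 139

# Most conservative discrete case: P = 0.5 maximizes P * (1 - P).
print(sample_size_discrete(p=0.5, d=0.05))    # 385
```

The last two results show why P = 0.5 is the safe default: for the same precision it asks for more units than any other value of P.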

Learning Objectives

Articulate why sampling is important

Describe sampling advantages, limitations, and biases

List different types of sampling strategies

Understand how to calculate sample size

Thank you!
