Six Sigma Green Belt Book 6 | Module 6

©2020 Bisk Education, Inc. and Villanova University. All rights reserved.

No part of this document may be reproduced in any form or by any electronic or mechanical means,

including information storage and retrieval systems, without written permission from the copyright owner.

Company, product, and service names used herein may be trademarks of their respective owners and

are used in an editorial fashion with no intention of infringement of the respective owner’s trademark

rights. The information in this study guide is distributed on an “as is” basis, without warranty. Neither

the copyright owner nor the author(s) shall have any liability to any person or entity with respect to any

actual or alleged damage caused by the information contained herein.

Permission to print this document is limited to one copy per student.

Six Sigma Green Belt | 3

Table of Contents

Module 6

Introduction
Objectives
Assignment Checklist
Introduction to the Analyze Phase
Root Cause Analysis
Multi-Vari Charts
Tollgate Review: Potential Root Causes
How to Narrow Root Causes
Tollgate Review: Narrow Root Causes
Z-Scores
Hypothesis Testing: Key Steps
Hypothesis Testing: Type I and II Errors
One-Sample t Test
Two-Sample t Test
Paired t Test
Rational Subgroups
Constructing Control Charts
Tollgate Review: Critical Root Causes
Conclusion to the Analyze Phase

Module 6 Introduction

The Analyze phase of DMAIC will be the focus of Week 6. You will be introduced to multi-vari charts and discuss the importance of root cause analysis. You will be taught how to narrow root causes, how to calculate Z-scores, and the key steps of hypothesis testing. This week also covers t-tests, rational subgroups, and the construction of control charts. Throughout the week, you will also receive insight into the tollgate review meetings conducted during this phase.

Objectives

• Create a stem and leaf plot.
• Calculate measures of central tendency with downtime data.
• Calculate measures of dispersion with downtime data.

Assignment Checklist

☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________
☐ ____________________________________

Introduction to the Analyze Phase

The Analyze Phase is where the potential causes of problems are identified, current processes are analyzed, relationships between inputs, processes, and outputs are identified, and data analysis is carried out.

The Tollgates of Analyze

The three tollgates of the Analyze Phase are:

1. Potential root causes.
2. Narrow root causes.
3. Critical root causes.

Tools helpful in identifying all possible root causes include:

• Swim Lane Diagrams.
• Cause and Effect Diagrams.
• Pareto Charts.
• Brainstorming.

Data-driven exercises are used to narrow down that list:

• Calculating Z-values (to further define relevance of an issue).
• Statistical significance of an issue by:
- T tests.
- F tests.
- Chi Square analysis.

Critical root causes should be approximately 1–3 issues that are the focus of the Improve Phase.

Root Cause Analysis

Introduction

Root cause analysis is a tool to understand the underlying causes of a problem, not just a single cause.

• The leaves are the symptom of the problem, or the weed; they are above the surface and very obvious.
• The root is the underlying cause; it is below the surface and is not obvious.

Variation

Variation is the number-one thing to eliminate when brainstorming and fixing a problem. Special cause variation must be reduced with root cause analysis (RCA).

• The problem must be understood thoroughly in order to ensure that a permanent fix is put in place instead of a quick fix.
• RCA is the process of finding and eliminating the cause, which prevents the problem from returning.

Short-Term Fixes

RCAs seek the true cause of the problem, but most people engage in short-term fixes. Examples of short-term fixes include:

• Guards put in place in manufacturing lines.
• Buckets put in place to eliminate waste.

Additionally:

• The problem still occurs with short-term fixes because measures are only put in place to try to eliminate the waste.
• Prevention of the problem does not occur.
• Even though assumptions are utilized for RCAs, they should be backed up with documentation and data.

Functions and Variables

• The formula y = f(x) relates to independent and dependent variables and, by extension, Lean and continuous improvement.
• y is the dependent variable.
• x is the independent variable.
• f is the function.
• x goes into the function and comes out as y.

Figure 1

Using the Formula

• Y = f(x) implies causation; that is, Y depends on x.
• This is different from correlation, in which two values simply move together:
- When one value gets larger, the other gets larger.
- When one value gets smaller, the other gets smaller.
• Remember, correlation is not causation (shark attack example).

Taguchi’s Signal-to-Noise Ratio

Taguchi created the concept of signal-to-noise ratio, which is about the vital few vs. the trivial many.

• The vital few are the signals; they are things that cause an output.
- Xs, or independent variables, are the vital few.
• The trivial many are the noise; they are the additional factors surrounding the signals.

Figure 2

Earlier, when we discussed root cause, we said RCA is the process of finding and eliminating the cause, which prevents the problem from returning. With Lean Six Sigma, we are actually interested in finding out which possible root causes create defects, and then proving them with statistics. We are not trying to fix a problem that happened for one cause; we are trying to prevent problems that could occur through more than one or even many causes.

Tying It to Lean

• In Lean, we want to ensure we are focused on fixing the problems with the signals, not the noise.
• Instead of eliminating the waste to create the flow, people have gotten good at performing the waste.
• The Lean system asks us to identify what waste needs to be removed and to spend our time focused on being really good at value-added activities.
• The principle of Kaizen is small, incremental, continuous improvements every day.
- We want those incremental improvements to be taking place on our signals and not on the rest of the noise that is around them.

Conclusion

We discussed root cause as a tool, explored the meaning of functions and variables, and tied Lean into our understanding of root cause.

Multi-Vari Charts

Here is an example of a multi-vari chart.

Figure 3

In this example, we are measuring a part. The part has to be the same thickness throughout, and it is a couple of feet long. So we measure the thickness in five places, evenly distributed across the part. We then mark the highest and lowest values on the first green line and connect the two points with a line. The length of that line is then judged relative to the specs drawn on the chart.

• On this chart, the red line going across shows that most of our variation is within-part variation.
• So, we know that the problem is not part-to-part variation.
• The problem is that the length of that line consumes almost the whole spec, so there is something in the process that is eating up almost the whole spec.

Figure 4

• Within all five of those days, the lengths of the lines are about the same, which gives us an indication of the within-piece variation.
• The length of those lines shows that the within-piece variation is very small, while the piece-to-piece variation is jumping all over the place.
• The piece-to-piece values within the same day are about the same, but day to day, something is changing.

Figure 5

• There is also time-to-time variation.
• This is a simple tool, yet it is very powerful for seeing variation graphically.

Conclusion

We learned how to make multi-vari charts and saw the power they have to visualize different sources of variation, such as within-piece variation and time-to-time variation.

Tollgate Review: Potential Root Causes

At the potential root causes tollgate review meeting, the sponsor will want the following information:

• What are some of the potential reasons for gaps in performance?
• How did the team identify those?

The list of potential root causes might be an overwhelmingly large list. If that is the case, it is helpful to group them into meaningful categories. However, make sure the full list is available to the sponsor and team members, since you are probably going to use it again later on in the project. And as always, send the full list to the sponsor a few days before the review meeting so that they have time to review it.

Always keep in mind that the goal of the potential root causes tollgate is not to eliminate any root causes. For this tollgate, you just need to come up with a lot of potential root causes, some of which must be reasonable. If they are not reasonable or if the list is too small, you can always go and find more and add them as necessary.

How to Narrow Root Causes

Affinitize the List
• Many causes may be similar.
• Aggregate the list so only unique items appear.

Keep a Record
• Allow team to rule out.
• Do not permanently delete any ideas.
• It may be necessary to revisit the list at a later time.

Pareto Diagram
• Rank issues to see which are occurring most often.
• Simple but powerful.

Scatter Plot
• Scatter plot can show a relationship between variables.

Vetting
• Even after eliminating potential root causes from a list, a vetting process should be conducted to verify selections.

Figure 6

What Comes Next

In another lecture, we will learn about hypothesis testing and some very powerful statistical tools that will allow us to identify if our selections are truly root causes and how much of an impact they are having on the process.

• When we generated potential root causes, it could have been purely by brainstorming; hopefully, however, there was some data involved.
• The next tollgate is where we are really going to base causes on data.

Tollgate Review: Narrow Root Causes

At the narrow root causes tollgate review meeting, the sponsor will want the following information:

• An official list of the narrowed root causes.
- Keep this list, because it will be used later on.
• How the team identified the narrowed root causes.
• What tools the team used, e.g., statistical tools, graphical tools, etc.
- Data should drive the decision to eliminate potential root causes.
• Any new information or updates that might change the scope of the project, the project charter, etc.

Z-Scores

Introduction

In statistics, a z-score (z-value) is a statistic (a measure from a sample) that is hypothesized to have been generated by a normal distribution with a mean of zero and a standard deviation of one. The z-score can be thought of as a measure of relative location, since it measures the distance between a specific value of interest and the mean in standard deviation units. Since the z-score is based on the normal probability distribution, let’s review some of its characteristics.

Characteristics of the Normal Probability Distribution

In general, if we know the data are normally distributed, by using the mean and the standard deviation we may make some statements about the spread of the data. This result is sometimes referred to as the Empirical Rule. Note that the area under the curve of a normal probability distribution, like that of all valid probability distributions, sums to one.

• The area between plus and minus one standard deviation contains about 68 percent of the data.
• The area between plus and minus two standard deviations contains about 95 percent of the data.
• The area between plus and minus three standard deviations contains about 99.7 percent of the data.
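The Empirical Rule percentages above can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the helper names `normal_cdf` and `coverage` are hypothetical and chosen for illustration.

```python
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

# Print the share of data within 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within +/-{k} standard deviations: {coverage(k):.4f}")
```

The three printed shares should round to roughly 68, 95, and 99.7 percent.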

Figure 7

Since the normal probability distribution is differentiated by the mean and the standard deviation, there are many different normal distributions (all with this basic shape). In statistics, we refer to this as a “family” of normal distributions. Additionally, we can make the following observations:

• The highest point on the curve is the mean.
• A normal distribution is symmetric about its mean.
• The standard deviation determines how “flat” and “wide” the curve is.
• Since the normal distribution is symmetric about its mean, there is 50% of the data below the mean and 50% of the data above the mean.

Working with Z-Scores

Again, one way of thinking about z-scores is this: given a normal distribution, a z-score is a measure of relative location in that its value indicates the number of standard deviations a specific data value is above or below the average.

Example: SAT (a standardized test used for college admissions in the U.S.) scores are normally distributed (that is, when the individual scores are plotted, they form the familiar “bell curve”) with a range from 400 to 1600. Suppose someone takes a version of the SAT in which the average score was 1010 and the standard deviation was 175. Now, suppose their score is 1170. So, the question might be: relative to the other people who took this version of the SAT, how well did this person do?

Computing a z-score: to compute a z-score, we take the data value of interest, subtract the average, and then divide by the standard deviation. Typically (though not always), we are working with samples, so we will consider 175 a sample standard deviation.
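The computation just described (value minus average, divided by standard deviation) can be sketched as follows for the SAT example. The function name `z_score` is a hypothetical helper, not something taken from the course materials.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / std_dev

# SAT example from the text: score 1170, average 1010, standard deviation 175.
z = z_score(1170, 1010, 175)
print(round(z, 2))
```

Rounded to two decimal places, as a z-Table requires, the result is 0.91.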

So, their score is 0.91 standard deviations above the average score of 1010.

This usually leads to another question: what percentile is associated with a z-score of 0.91? To answer this question, we need to make use of the properties of the normal probability distribution.

Recall that a probability is computed as the area under the curve of a probability distribution. Also, recall there are different types of probability distributions. The normal probability distribution is a continuous probability distribution, so computing the area under the curve for specific values requires some sophisticated mathematics. Don’t panic! Fortunately, a “short-cut” has been devised, so we don’t have to know the mathematics behind the calculations. Statisticians have computed the probabilities for z-values and summarized the result in a z-Table (standard normal distribution: a normal distribution where the mean is zero and the standard deviation is one). You have z-Tables in your reference materials.

Let’s continue with our example. Now, however, we want to more fully interpret the z-score of 0.91 in order to better understand the person’s SAT score of 1170.

Example: what can we say about a z-score of 0.91 from a normal distribution with a mean of 1010 and a standard deviation of 175? Well, it informs us that the person’s score is 0.91 standard deviations above the average score of 1010. But, to answer this question more completely, we turn to the table of probabilities for z-scores found in a z-Table (probabilities for a standard normal distribution, one where the mean is zero and the standard deviation is one).

z      0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
+0.0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
+0.1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
+0.2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
+0.3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
+0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
+0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
+0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
+0.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
+0.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
+0.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
+1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
+1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
+1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
+1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
+1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319

Figure 8

This z-Table is known as a cumulative z-Table (note that I have “snipped” just a portion of the full table). That is, it provides us the probabilities of being less than a given z-value. Note that to use this table, z-scores must be rounded to two decimal places. As you see, the whole number and tenths decimal places are in the rows of the first column (z) and the hundredths decimal places (0.00 to 0.09) go across the other columns. So, according to the table, the probability of observing a z-score of 0.91 or less is about 0.8186. Another way to state this is that an SAT score of 1170 for this version of the test represents about the 82nd percentile (about 82% scored less than 1170 and about 18% scored more than 1170).
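The table lookup described above can also be reproduced in code. The sketch below assumes only Python's standard library and uses the error function to obtain the cumulative probability; `normal_cdf` is a hypothetical helper name.

```python
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z-score of 0.91 from the SAT example.
p_below = normal_cdf(0.91)   # share scoring below 1170
p_above = 1.0 - p_below      # share scoring above 1170
print(f"below: {p_below:.4f}, above: {p_above:.4f}")
```

The first value should agree with the 0.8186 read from the z-Table, and the second with the roughly 18% who scored higher.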

Figure 9

Example: suppose we want to know the probability of scoring between 1070 and 1270 on this version of the SAT. We can follow the same process as above, but we must calculate two z-scores (one for 1070 and one for 1270).

Therefore, the probability that a score is between 1070 and 1270 is approximately the same as the probability that a z-score is between 0.34 and 1.49, which is 0.9319 – 0.6331 or 0.2988.
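The two z-scores and the resulting probability can be sketched as follows. This is an illustration only, assuming Python's standard library; the z-scores are rounded to two decimals to mirror the table lookup, and the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, std_dev = 1010, 175

# Round to two decimals, as required when reading a z-Table.
z_low = round((1070 - mean) / std_dev, 2)    # 0.34
z_high = round((1270 - mean) / std_dev, 2)   # 1.49

# P(z_low < Z < z_high) is the difference of the two cumulative probabilities.
p_between = normal_cdf(z_high) - normal_cdf(z_low)
print(z_low, z_high, round(p_between, 4))
```

Skipping the rounding step gives a slightly different probability, which is why the text describes the table-based answer as approximate.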

Figure 10

Keep the following in mind when you are working with z-scores and z-Tables.

• There are multiple versions of a z-Table, but they are equivalent to each other. That is, some z-Tables provide the probability a z-value is between 0 and z instead of the probability a z-value is less than a specific value (which is a cumulative z-Table like the one we used above).
• Since a z-Table contains probabilities for the standard normal distribution, the basic rules of probability apply. For example, the probability a z-value is less than a given value and the probability a z-value is greater than that same value must sum to one. We were using this rule when we pointed out that about 82% of the people taking the SAT in the example scored less than 1170 and this meant that about 18% scored more than 1170.
• Using z-score equivalents of the data (that is, converting our data values to z-values) allows us to compare normal distributions with different averages and different standard deviations. This is why using z-scores is often referred to as “standardizing” the values.

Hypothesis Testing: Key Steps

Introduction

A hypothesis test is a statistical procedure for deciding between two hypotheses using a “test statistic” obtained from sample data. The hypothesis test makes use of a null hypothesis (often indicated by Ho) and an alternative hypothesis (often indicated by Ha). The null and alternative hypotheses are written such that both cannot be true. Sample data is then collected, and a test statistic is computed from the sample data. To arrive at a conclusion, one of two approaches is used. Either the test statistic is compared to the appropriate critical value (a value obtained from a reference distribution based on the level of risk chosen, or alpha value), or a p-value (the probability, assuming the null hypothesis is true, of obtaining a test statistic that is as extreme or more extreme) is computed from the test statistic and the decision is based on that p-value.

Null vs. Alternative Hypothesis

• Alternative hypothesis: what you are testing for or what you think may happen.
• Null hypothesis: what you would expect to happen due to chance alone if nothing out of the ordinary really influenced the process.
• When you are testing, if you reject the null hypothesis, you are forced to accept the alternative hypothesis.

Key Steps to Hypothesis Testing

1. Define the situation or problem you want to examine.
2. State your null and alternative hypotheses.
3. State the level of significance.
4. Select the proper tool for analysis.
5. Collect data using that tool.
6. Calculate your test statistic.
7. Look up your critical value.
8. Compare your test statistic to your critical value or compute and interpret the p-value.
9. Make a conclusion based on your results.

Example: you want to find out if improvements you have made to a process have reduced cycle time. In that scenario, what are your null and alternative hypotheses? Your alternative hypothesis is that the new cycle time is less than the old cycle time, meaning there has been an improvement. The null hypothesis is that there has not been an improvement.

How do you write that in terms of math language? If you think that the cycle time has actually improved, then that means the cycle time for the new process is less than the old, because improvement for cycle times means shorter. So, the alternative hypothesis, which is labeled as Ha, is that the new cycle time is less than the old cycle time. Therefore, the null hypothesis, which is labeled Ho, is that the new cycle time is not less than the old cycle time, meaning that it is either greater than or equal to the old cycle time. So, notice the distinction between the two. The null hypothesis is that it has not improved, while the alternative hypothesis is that it has improved.

Hypothesis Testing: Type I and II Errors

Again, a hypothesis test is a statistical procedure that leads to a decision. Since we are making a decision or drawing a conclusion, there are two types of mistakes (errors) we can make.

• Type I Error: we reject the null hypothesis when it is true. This is known as an alpha (α) error. The alpha error (or significance level) is the risk we specify in the tail (or rejection region).
• Type II Error: we fail to reject the null hypothesis when it is false. This is known as a beta (β) error. A type II error can occur due to chance alone or just bad luck. However, most of the time, we get a type II error, failing to reject the null even though it is false, because we made the significance level too small.

So, the moral here is that you have a risk either way. You risk setting your significance level too large (inviting a Type I error), and you risk setting it too small (inviting a Type II error).
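The link between the significance level and the Type I error rate can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the course: it uses a one-tailed z test with a known standard deviation of 1 (simpler than the t tests that follow), the standard one-tailed 5% critical value of 1.645, and a hypothetical function name `type_i_error_rate`. The null hypothesis is true in every simulated trial, so each rejection is a Type I error.

```python
import random
import statistics

def type_i_error_rate(trials=20000, n=25, z_critical=1.645, seed=1):
    """Fraction of trials that wrongly reject a TRUE null hypothesis (mean = 0)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where the null hypothesis holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = (sample mean - 0) / (sigma / sqrt(n)), with sigma assumed to be 1.
        z = statistics.fmean(sample) * n ** 0.5
        if z > z_critical:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen alpha of 0.05; raising `z_critical` (a smaller alpha) lowers it, at the cost of making real effects harder to detect.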

One-Sample t Test

Introduction

A one-sample t Test is a hypothesis test about a population mean in which the test statistic is based on the t distribution. The form of the t distribution was developed by W. S. Gosset in 1908 while he worked as a statistician and chemist for Guinness Brewery. Gosset developed some of the early methods for analyzing experimental data from small samples. His work influenced many statisticians in the 20th century, including W. Edwards Deming. Guinness required employees to publish under a pseudonym, and he chose “Student.” The distribution therefore became known as Student’s t; today, it is often referred to simply as the t distribution.

The t-distribution is a probability distribution used to estimate the population mean, or to test a claim about the population mean, in either of two situations: the population distribution is approximately normal and the population variance or standard deviation (either one) is unknown, or the population distribution is approximately normal and the sample size is less than 30. Whether or not we know the population standard deviation, if the sample size is less than 30 and we want to do a test of means, we use the Student t-distribution. We see a lot of the Student t-distribution, because using a sample size less than 30 is very common due to the expense and time required to collect data for larger samples.

The t-distribution looks like a standard normal or Z-distribution and follows a bell-shaped curve. However, the tails of the t-distribution contain a little more of the data and sit higher than those of a standard normal curve, whose tails drop lower before leveling off.

The One-sample t Test

The context for a one-sample t test about a population mean is as follows. Suppose we have an estimate of the historical average of a process, but we want to test whether or not we should continue to consider the historical value valid. We collect some data and we use that data to construct a hypothesis test about the population mean using the t distribution.

Example: historically, the average length of a particular part has been 10cm. We are concerned this average length might have increased, so we obtain a random sample of 9 pieces of production. The sample average is 10.23cm and the sample standard deviation is 0.30cm. Since we are concerned the average may have increased, we choose to conduct a one-tailed hypothesis test with null and alternative hypotheses as follows. The null hypothesis represents the claim that the average hasn’t changed, while the alternative states it has increased from the historical average of 10cm.

Ho: µ = 10cm (note: more technically, we should say µ ≤ 10cm.)

Ha: µ > 10cm

So, to conduct the hypothesis test, we must (1) specify our null and alternative hypotheses, (2) choose our level of significance, (3) compute our test statistic, (4) compare our test statistic to the critical value or compute the p-value, and (5) reach a conclusion.

We have the null and alternative hypotheses. Suppose we choose a level of significance of 5% (that is, alpha = 0.05). Next, we compute the test statistic.

The test statistic using the t distribution is computed as follows:

To summarize, our sample average is 10.23, our sample standard deviation is 0.30, and our sample size is 9. Our historical value is 10.

So, our sample average of 10.23cm is actually 2.30 standard errors above our historical average of 10cm.
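The arithmetic behind the 2.30 figure can be sketched in Python. This is a minimal illustration, assuming the usual one-sample form in which the difference between the sample mean and the historical mean is divided by the standard error, s / sqrt(n); the function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_std, n):
    """t = (sample mean - hypothesized mean) / (sample std / sqrt(n))."""
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

# Part-length example: mean 10.23cm, s = 0.30cm, n = 9, historical mean 10cm.
t_stat = one_sample_t(10.23, 10.0, 0.30, 9)
print(round(t_stat, 2))
```

Here the standard error is 0.30 / 3 = 0.10, so the 0.23cm difference works out to 2.30 standard errors.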

Now, with a significance level of 0.05, is the value of the test statistic large enough for us to reject Ho? To make this decision, we need to find the critical value of t (from the t Table in your GB Textbook) given an alpha value of 0.05, a one-tailed test, and a sample size of 9.

One-Tailed  10%     5%      2.5%    1%      0.5%    0.25%   0.1%    0.05%
Two-Tailed  20%     10%     5%      2%      1%      0.5%    0.2%    0.1%
df
1           3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2           1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3           1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4           1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5           1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6           1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7           1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8           1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9           1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781

Figure 11

The t distribution has a parameter known as “degrees of freedom” that we must include. For a one-sample t test, the degrees of freedom are n – 1; so, in this example, the degrees of freedom (df) are 9 – 1 = 8. To find the critical value, we move down the first column until we get to row 8 and then we move across until we find the column for a one-tailed probability of 5% (0.05). Note the critical value of t is 1.860. Since the test statistic (2.30) is greater than the critical value (1.860), we reject Ho and say we are 95% confident the mean has increased.

So, what about the p-value? Remember, the p-value is the probability of obtaining a value as extreme or more extreme (that is, as large or larger, in this case) than our test statistic, assuming the null hypothesis is true. Using the t Table, we can only estimate the p-value. Moving across row 8, we note there is a critical value of 2.306, which is very close to our test statistic of 2.30. Moving up that column to the one-tailed probability, we see there is about a 0.025 (2.5%) probability of being greater than 2.306, which is a close approximation to the p-value in this example. The p-value is sometimes referred to as the observed level of significance.

Finally, note that if we had chosen 2.5% (0.025) as our level of significance, we would not have rejected Ho, since 2.30 is not greater than 2.306.
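The table can only bracket the p-value, but it can be approximated numerically. The sketch below is one possible approach and not the method used in the text: it integrates the t distribution's density with Simpson's rule using only Python's standard library. The function names `t_pdf` and `upper_tail_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Probability density of the t distribution with `df` degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=2000):
    """One-tailed p-value P(T > t_stat) for t_stat >= 0 (steps must be even)."""
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3      # Simpson's rule on [0, t_stat]
    return 0.5 - area_zero_to_t         # symmetry: half the area lies above 0

p_value = upper_tail_p(2.30, 8)
print(f"one-tailed p-value: {p_value:.4f}")
print("reject Ho at alpha = 0.05:", p_value < 0.05)
print("reject Ho at alpha = 0.025:", p_value < 0.025)
```

The result sits just above 0.025, which matches the table-based reasoning: Ho is rejected at the 5% level but not at the 2.5% level.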

Two-Sample t Test

Introduction

A two-sample t Test is a hypothesis test about the difference in two population means in which the test statistic is based on the t distribution.

The Two-sample t Test

You can use the two-sample t test in several places of the DMAIC model. Suppose you have two suppliers for a critical part and you are concerned that the two suppliers are not delivering the same average value for a critical-to-quality characteristic. To test the equality of the population means for the two suppliers, we would obtain random samples from each supplier and conduct a two-sample t test.

Example: two machines (A and B) are utilized in processing steel shafts for small electric motors. The operator is concerned the machines are not comparable when it comes to the outside diameter of a specific location on the shafts. The operator decides to use a two-sample t test to test for the equality of the average outside diameters. The operator takes a random sample from each machine. The measurements are in millimeters.

Machine A  10.43  10.84  10.29  10.50  10.44  10.68  10.64  10.64  10.02  10.56  10.33  10.67
Machine B  10.18  10.14  10.02  10.28  10.25  10.20  10.31  10.11  10.12  10.25  10.26  10.22

Figure 12

First, we need to compute the sample mean and the sample standard deviation for each machine.

           Mean     Std. Dev.
Machine A  10.503   0.2198
Machine B  10.195   0.0843

Figure 13

As with any hypothesis test, we need to define our null and alternative hypotheses and we need to choose a value for alpha. Suppose we choose an alpha of 5% (0.05) and we decide on a two-tailed test. Then, our hypotheses are:

Ho: µA = µB

Ha: µA ≠ µB

Basically, we are testing the claim that the averages of the population outside diameters are equal versus they are not equal. Since the alternative hypothesis allows for a difference in either direction, this is a two-tailed test.

Next, we need to compute the test statistic. The test statistic (t) in this case is:
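
One common form of the two-sample t statistic, consistent with the note that follows about the denominator, is sketched here. Whether the example uses exactly this form is an assumption; the symbols are the sample means, sample standard deviations, and sample sizes for Machines A and B.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
  = \frac{10.503 - 10.195}{\sqrt{\dfrac{0.2198^2}{12} + \dfrac{0.0843^2}{12}}}
  \approx \frac{0.308}{0.068}
  \approx 4.5
```

With these rounded inputs the result lands at roughly 4.5; the exact second decimal depends on how the intermediate values are rounded.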

Note: depending on the situation, there are other ways of estimating the combined standard deviations (that is, the denominator in the test statistic formula).

So, our test statistic is 4.55, which is quite large.

Our chosen alpha value is 0.05, so we need to find the critical value from a t Table. To do that, we will need the degrees of freedom. In this two-sample case, the degrees of freedom can be estimated using:

df = nA + nB – 2 = 12 + 12 – 2 = 22

One-Tailed   10%     5%      2.5%    1%      0.5%    0.25%   0.1%    0.05%
Two-Tailed   20%     10%     5%      2%      1%      0.5%    0.2%    0.1%
df
1            3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2            1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3            1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4            1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5            1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6            1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7            1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8            1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9            1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10           1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11           1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12           1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13           1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14           1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15           1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16           1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17           1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18           1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19           1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20           1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
21           1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
22           1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792

Figure 14

The critical value for this two-tailed test with alpha equal to 0.05 is 2.074 (go down the degrees of freedom rows to 22 and then across to the “Two-Tailed, 5%” column). Since our test statistic (4.55) is greater than our critical value (2.074), we reject Ho and conclude, in line with the operator’s suspicion, that the two machines are not producing equal average outside diameters.
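
The hand calculation for Machines A and B can be sketched in a few lines of Python using only the standard library. The function name two_sample_t is hypothetical, and the unpooled standard error in the denominator is an assumption about the form of the test statistic; the critical value 2.074 is simply copied from the t table. Computed from the unrounded data, the statistic comes out near 4.54 rather than the hand-rounded 4.55, and it leads to the same decision.

```python
import math
import statistics

# Outside-diameter measurements in millimeters (Figure 12).
machine_a = [10.43, 10.84, 10.29, 10.50, 10.44, 10.68,
             10.64, 10.64, 10.02, 10.56, 10.33, 10.67]
machine_b = [10.18, 10.14, 10.02, 10.28, 10.25, 10.20,
             10.31, 10.11, 10.12, 10.25, 10.26, 10.22]


def two_sample_t(sample_a, sample_b):
    """Return (t, df) for two independent samples.

    Assumption: the denominator is the unpooled standard error and the
    degrees of freedom are nA + nB - 2, mirroring the hand calculation.
    Statistical packages may use other estimates.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    return (mean_a - mean_b) / standard_error, n_a + n_b - 2


t_stat, df = two_sample_t(machine_a, machine_b)
T_CRITICAL = 2.074  # t table: df = 22, two-tailed, alpha = 0.05
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, reject Ho: {reject_h0}")
```

Taking the absolute value of t before comparing it with the critical value is what makes the comparison two-tailed: a large difference in either direction leads to rejecting Ho.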

The t-test is a tool that indicates whether two means are statistically and reliably different and is effective when using samples with fewer than 30 data points. The real issue, however, is what question you are trying to ask of the process. Own the tool; don’t let the tool own you. Identify and utilize whatever tool helps you accomplish your goal, no matter what stage of the process you are in. For instance, in addition to identifying critical root causes, the t-test can help select solutions. For example, if we want to know which training method is best, we could train two groups differently and compare their average levels of performance.

Paired t Test

Introduction

The basic t Test requires us to have independent random samples. However, there is a situation in which you take two different measurements from one sample. This situation often appears to be one where you have two independent samples, but you do not. Clearly, taking two measurements from each item in the same sample does not meet the requirement that we have two independent samples. This situation calls for the Paired t Test.

The Paired t Test

The Paired t Test also constructs a test statistic that is based on the t distribution. However, as the name suggests, the data are really “paired” data and not two independent random samples. An example should clarify.

Example: suppose we have two instruments that we are using to measure the hardness of a surface (like a Vickers test for the hardness of steel). Further, suppose we have become concerned the two instruments aren’t providing similar results. We will call them Instrument 1 and 2. We decide to take a random sample of 11 pieces of the material (each piece large enough to obtain two hardness measures). Note that we will have 22 measures, but we do not have two independent samples of eleven each. Instead, we have one random sample of eleven in which we obtain two measures from each piece. We will use alpha = 0.05.

Sample 1 2 3 4 5 6 7 8 9 10 11

Instrument 1 164 174 173 168 160 165 166 165 172 171 168

Instrument 2 166 172 176 166 162 161 166 164 169 167 164

Figure 15

The Paired t Test uses a clever approach. We create a new variable (call it Difference) which is the result of subtracting the Instrument 2 measure from the Instrument 1 measure.

Sample 1 2 3 4 5 6 7 8 9 10 11

Instrument 1 164 174 173 168 160 165 166 165 172 171 168

Instrument 2 166 172 176 166 162 161 166 164 169 167 164

Difference –2 2 –3 2 –2 4 0 1 3 4 4

Figure 16

The idea is as follows: if there is no difference between the two instruments, then the “difference” should be normally distributed about zero. If it isn’t, then there is a difference (accounting for sampling variability) and the two instruments are not yielding similar results.

To conduct the Paired t Test, we compute the average and the standard deviation of the “difference” measure. We obtain an average of 1.18 and a standard deviation of 2.601. Our hypotheses are:

Ho: µ = 0

Ha: µ ≠ 0

The test statistic using the t distribution is computed as follows:
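
A standard way to write the paired t statistic is shown here, where d-bar is the average of the differences, s_d is their standard deviation, and n is the number of pairs. This notation is an assumption; the numbers are the ones computed above.

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}
  = \frac{1.18}{2.601 / \sqrt{11}}
  \approx \frac{1.18}{0.784}
  \approx 1.50
```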

So, our test statistic is equal to 1.50 and we now compare it to the critical value of t (from the t Table) with n – 1 or 11 – 1 = 10 degrees of freedom and an alpha of 0.05.

One-Tailed   10%     5%      2.5%    1%      0.5%    0.25%   0.1%    0.05%
Two-Tailed   20%     10%     5%      2%      1%      0.5%    0.2%    0.1%
df
1            3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2            1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3            1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4            1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5            1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6            1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7            1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8            1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9            1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10           1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11           1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437

Figure 17

Note this is a two-tailed test (Ha: µ ≠ 0), so we go down the df column to 10 degrees of freedom and move across to the “Two-Tailed, 5%” column. Our critical value is 2.228. Our test statistic (1.50) is not greater than our critical value (2.228), so we cannot reject Ho. Therefore, we do not conclude the two instruments are producing different results on average.
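
The paired calculation can be sketched the same way in Python with the standard library. The variable names are hypothetical, and the critical value 2.228 is copied from the t table rather than computed. Because the sketch does not round the average difference to 1.18 first, its statistic is about 1.51 instead of 1.50; the conclusion is unchanged.

```python
import math
import statistics

# Hardness readings for the same 11 pieces on each instrument (Figure 15).
instrument_1 = [164, 174, 173, 168, 160, 165, 166, 165, 172, 171, 168]
instrument_2 = [166, 172, 176, 166, 162, 161, 166, 164, 169, 167, 164]

# One difference per piece: Instrument 1 minus Instrument 2 (Figure 16).
differences = [first - second for first, second in zip(instrument_1, instrument_2)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)

# Test Ho: mean difference = 0 against Ha: mean difference != 0.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1
T_CRITICAL = 2.228  # t table: df = 10, two-tailed, alpha = 0.05
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.3f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```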

Rational Subgroups

Introduction

The first question to consider when developing a control chart is: why do we need subgroups? Or, maybe, when should we consider using subgroups? When and how to use subgroups represents the approach we choose to employ regarding “background noise” or common variation in our processes.

Of course, if you have chosen to use an Individuals and Moving Range Control Chart, then you have decided to use the individual values as your “subgroups” of one. So, the differences in the individual values essentially function as your background noise. Walter Shewhart (the inventor of the control chart) developed an approach that allows us to filter the background noise in data so that we might detect a signal in the data (see Understanding Variation by Donald Wheeler for a nice nontechnical discussion of Shewhart’s approach). However, let’s suppose you have decided to use an Average and Range Chart. With this choice, you have decided to use a control chart to monitor your process that makes use of subgroups of two or more.

What Does the “Rational” Mean in Rational Subgroups?

By choosing to use a control chart with subgroups (like the Average and Range Control Chart), you are effectively choosing to collect data under the same basic conditions for the samples within the subgroups. As such, the subgroups represent an estimate of the background noise or background variation in the process. Therefore, you must select subgroups in such a way that it is valid for us to assume the subgroups do indeed capture and represent the “common” variation in the process.

But what makes the subgrouping approach rational? Rational subgroups are the sample groups that make sense when plotted together on a control chart. That is, the variation within the subgroups and the variation between the subgroups must be logically consistent. For an Average and Range Control Chart, the variation within the subgroups is used to estimate the control limits for the averages (that is, for the variation between subgroups). For example, suppose we have two processes utilizing very different technologies that produce the same component. Would it make sense to use a subgroup of size n = 2 (one from each process) and place the data on an Average and Range Chart? No. The data within the subgroups should be subject to a similar causal system, but in this case, they represent two very different methods of production. The variation within the subgroups does not provide a reasonable approach to measuring the variation from subgroup to subgroup. Therefore, the subgrouping approach isn’t rational: it will not facilitate our ability to detect changes subgroup to subgroup. Thus, as Shewhart indicated, we want to:

1. Maximize the between-subgroup variation; and

2. Minimize the within-subgroup variation.

A “Rational Subgroup” is Not a Random Sample

The concept of a rational subgroup is often confused with the statistical concept of a random sample. They are two very different ideas. As stated previously, we basically must understand our process clearly enough to “design” a rational subgroup so that it makes sense and facilitates our ability to detect changes in an ongoing process.

The phrase “random sample” is used in statistics to indicate a specific way in which a sample is selected. For the most basic type of random sample (a simple random sample), the sample is selected in such a way that each element of the population has an equal chance of being chosen.

For example, suppose an auditor is given the task of auditing two of five plants. The auditor might be sensitive to the possibility of being accused of bias, so the auditor decides to “randomly” select the two plants to audit. From the perspective of statistics, how might this be accomplished? Let’s denote the plants as A, B, C, D, and E.

Our first inclination might be to place five slips of paper in a hat (one for each plant) and then draw one slip and then another. Is this a simple random sample? No. Note that when the first slip is drawn, the five plants have the same probability of being selected (1/5 or 0.20). But when the second slip is drawn, there are only four choices left, so the probability is 1/4 or 0.25. This isn’t the same probability as the first, so the requirement that each element have the same probability of selection is violated.

So, how do we select a random sample of two plants out of five? First, we need to calculate the number of ways in which two things can be selected from five things. This is known as a combination. We can compute it as follows. Recall that 5! = (5)(4)(3)(2)(1) = 120, 2! = (2)(1) = 2, and 3! = (3)(2)(1) = 6.
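
Using the factorials just listed, the number of combinations of two things chosen from five can be written with the standard combination formula (the binomial notation is an assumption about how the guide presents it):

```latex
\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = \frac{5!}{2!\,3!} = \frac{120}{(2)(6)} = 10
```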

So, there are 10 ways in which to choose two things from five things. We can write them out in this case.

(A,B) (A,C) (A,D) (A,E)

(B,C) (B,D) (B,E)

(C,D) (C,E)

(D,E)

Now, we can write these pairs on 10 slips of paper (or number them from 1 to 10) and randomly select one of them using a random number generator. More work? Yes, but now we are able to claim we have a simple random sample of two items from five. Again, this is very different from the notion of a rational subgroup.
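
Listing the ten pairs and drawing one of them at random can be sketched in Python as follows. The variable names and the seed value are hypothetical; the seed is there only so the illustration gives the same pair each time it is run.

```python
import itertools
import math
import random

plants = ["A", "B", "C", "D", "E"]

# Number of ways to choose 2 plants from 5: 5! / (2! * 3!).
n_pairs = math.comb(len(plants), 2)

# Write out every pair, in the same order as the list in the text.
pairs = list(itertools.combinations(plants, 2))

# Pick one of the pairs so that each pair is equally likely.
rng = random.Random(2020)  # hypothetical seed, for a repeatable illustration
chosen_pair = rng.choice(pairs)

print(n_pairs, "possible pairs:", pairs)
print("Plants to audit:", chosen_pair)
```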

Constructing Control Charts

When we think of control charts, what might come to mind are interesting plots of data that somehow might help us keep control of our process.

But the real reason we do them is a bit more profound. You see, every time we take a measurement from our process (whether it is manufacturing, healthcare, finance, IT, government, or something else) we expect to get a different result every time we measure. It is just the nature of random data.

How do we know whether the variation in our process is normal and expected, or whether something unusual is happening that ought to cause us to step in and take action?

Walter Shewhart invented control charting in the 1920s at Bell Labs to address this difficult question, in a time when there were no computers and decisions had to be made on the fly while making telephones from dozens or hundreds of parts. So, you can see that control charting gets to the profound nature of process variation, and it isn’t just a handy way to plot the data.

In this section, we are going to go through the six steps of control charting, and then go through some examples.

There are six steps to constructing control charts:

1. Gather the data in the order of production

2. Calculate the subgroup averages and ranges

3. Calculate the control limits

4. Plot the control limits

5. Plot the points

6. Act on the control chart

Let us look at each step in more detail, including two rules for constructing control charts.

As you develop your charts, you should consider Shewhart’s eight rules that help to determine if the variation you are seeing on your control charts is due to common or random cause variation or due to special cause variation. We are going to have a look at these also.

Step 1 - Gather and Record Data in the Order of Production

If the data is not in order, then it will not make sense.

That sounds so simple, but it is very important. Think about cutting a movie strip into pieces, shuffling them, and putting them back together. The movie won’t make sense.

Here, we see data being gathered out of a process five at a time at some time interval that makes sense to the team.

Figure 18

Keep good notes. Here in our example, our notes are machine codes. Do not worry about these codes; they are simply there to identify different machines.

Step 2 - Calculate Subgroup Averages and Ranges

To calculate subgroup averages, add the values in each subgroup and divide by the subgroup size. To calculate subgroup ranges, first find the highest and lowest values in each group. In our example, the highest and lowest values are in blue. Then subtract the lowest value from the highest to find the range.
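
As a small illustration of this step, the sketch below computes the average and range for each subgroup in Python. The three subgroups of five are made-up values, not the data from the example, and the function names are hypothetical.

```python
# Hypothetical subgroups of five measurements each, kept in production order.
subgroups = [
    [22, 24, 23, 24, 22],
    [23, 23, 26, 22, 24],
    [21, 25, 22, 24, 23],
]


def subgroup_average(values):
    """Average (X-bar) of one subgroup."""
    return sum(values) / len(values)


def subgroup_range(values):
    """Range (R) of one subgroup: highest value minus lowest value."""
    return max(values) - min(values)


averages = [subgroup_average(group) for group in subgroups]
ranges = [subgroup_range(group) for group in subgroups]

for number, (average, spread) in enumerate(zip(averages, ranges), start=1):
    print(f"Subgroup {number}: X-bar = {average:.1f}, R = {spread}")
```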

Figure 19

Step 3 - Calculate Control Limits

• First, you need to calculate the grand average, or the average of the averages; this is also known as the X-double bar.

• To calculate the X-double bar, first add all of the averages together. In our example, the total of our averages (X-bars) is 275.8. Then divide that number by the number of subgroups; in our example, we have 12 subgroups. Our X-double bar is 23 because 275.8 ÷ 12 is about 22.98, which rounds to 23.

• We can do the same with ranges. We have a range for each sample group, and we can average those to make a center line for a graph showing dispersion over time.

• Next, we calculate the R-bar the same way as the X-double bar, but with our ranges; add the ranges together and divide by the number of subgroups.

• In our example, our ranges totaled 46, and we have 12 subgroups. Our R-bar is 3.83 because 46 ÷ 12 is about 3.83.

• Now, we calculate the upper control limit for the top chart, which monitors the centering; this is also known as an X-bar chart. The upper control limit is calculated by multiplying A2 and R-bar and then adding the X-double bar.

• A2 values are constants that depend on the subgroup size. We look them up in a table of Shewhart constants. For example, for n = 3, A2 = 1.02; for n = 4, A2 = 0.73; for n = 5, A2 = 0.58.

• Since we have a subgroup size of 5, our A2 is 0.58. For our example, our upper control limit is about 25.2 because A2 x R-bar = 0.58 x 3.83 = 2.22 and 23 + 2.22 is 25.22.

• The lower control limit for the X-bar chart is calculated the same way except you subtract from the X-double bar instead of adding. So, for our example, our lower control limit would be 20.78 because 0.58 x 3.83 is 2.22 and 23 – 2.22 is 20.78.

Figure 20

• Since your upper control limit is 25.2 and your lower control limit is 20.78, about 99.73 percent of your subgroup averages should fall within that range as long as the process stays stable.

• Now, we calculate the upper control limit for the lower chart, which monitors dispersion; this lower chart is also known as a range chart. For the range chart, instead of using A2 values, we are going to use D4 values. These D4 values are constants that depend on your subgroup size.

- For example, for n = 3, D4 = 2.57; for n = 4, D4 = 2.28; for n = 5, D4 = 2.11.

• As we have a subgroup size of five, the D4 value is 2.11.

• Multiply the D4 value and the R-bar to get the upper control limit: 2.11 x 3.83 gives you an upper control limit of about 8.1.

• The lower control limit for the range chart is D3 times R-bar. But for n = 5, D3 = 0, so the LCL = 0.

Wow, there were several parts to this step of calculating control limits. But notice it was all just arithmetic. Shewhart invented these methods of SPC to be done without the need for a computer, and with a little practice, they can be done quickly.
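
The arithmetic in Step 3 can be sketched in Python as follows. The function name xbar_r_limits and the dictionary layout are hypothetical; the constants are the rounded A2 and D4 values quoted in this step, with D3 taken as 0 for subgroup sizes 3 to 5. Because the sketch does not round the grand average to 23 before computing the limits, its results can differ from the hand calculation in the second decimal place (for example, a lower limit of about 20.76 rather than 20.78).

```python
# Rounded Shewhart constants by subgroup size n.
A2 = {3: 1.02, 4: 0.73, 5: 0.58}
D4 = {3: 2.57, 4: 2.28, 5: 2.11}
D3 = {3: 0.0, 4: 0.0, 5: 0.0}  # zero for these small subgroup sizes


def xbar_r_limits(sum_of_averages, sum_of_ranges, n_subgroups, subgroup_size):
    """Return center lines and control limits for an X-bar and R chart pair."""
    x_double_bar = sum_of_averages / n_subgroups  # grand average
    r_bar = sum_of_ranges / n_subgroups           # average range
    a2, d3, d4 = A2[subgroup_size], D3[subgroup_size], D4[subgroup_size]
    return {
        "x_double_bar": x_double_bar,
        "r_bar": r_bar,
        "xbar_ucl": x_double_bar + a2 * r_bar,
        "xbar_lcl": x_double_bar - a2 * r_bar,
        "r_ucl": d4 * r_bar,
        "r_lcl": d3 * r_bar,
    }


# Totals from the example: averages sum to 275.8, ranges sum to 46,
# with 12 subgroups of size 5.
limits = xbar_r_limits(275.8, 46, n_subgroups=12, subgroup_size=5)

for name, value in limits.items():
    print(f"{name}: {value:.2f}")
```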

Step 4 – Plot the Control Limits

• There are two parts to a control chart:

1. The upper part monitors centering.

2. The lower part monitors dispersion.

• Here is an example of what the X-bar chart might look like after you plot your upper and lower control limits.

Figure 21

• And here is an example of what the range chart might look like after you plot your upper and lower control limits.

Figure 22

Step 5 – Plot the Points

• Here is what your upper chart will look like: the one based on the sample averages. Notice there seems to be a lot of variation in this process. We will learn how to tell if it is normal, predictable variation or whether something special is going on that needs our attention.

• Here is what your lower chart will look like: the one based on sample ranges. There are also rules we will learn to apply to tell if the dispersion or variation within each sample group is predictable (normal), or whether something special is going on that needs our attention.

Figure 23

• There is no math involved with it; you just read the chart and act on what it is telling you. Look for shifts and trends in the upper chart. A shift is a series of points above or below the average. A trend is a series of points moving in an upward or downward direction. These shifts and trends do not happen by chance; they occur because something in the process has changed.
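
Reading a chart for shifts and trends can also be sketched in code. The text does not say how long a series has to be before it counts as a shift or a trend, and conventions vary, so the thresholds of 8 and 6 points below are assumptions, as are the function names and the sample averages.

```python
def longest_run_one_side(points, center):
    """Longest run of consecutive points strictly on one side of the center line."""
    longest = current = 0
    previous_side = 0
    for value in points:
        side = (value > center) - (value < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0
        previous_side = side
        longest = max(longest, current)
    return longest


def longest_trend(points):
    """Number of points in the longest steadily rising or steadily falling stretch."""
    if not points:
        return 0
    longest = current = 1
    previous_direction = 0
    for earlier, later in zip(points, points[1:]):
        direction = (later > earlier) - (later < earlier)
        if direction != 0 and direction == previous_direction:
            current += 1
        elif direction != 0:
            current = 2  # a new stretch starts with these two points
        else:
            current = 1
        previous_direction = direction
        longest = max(longest, current)
    return longest


SHIFT_LENGTH = 8  # assumed threshold; conventions vary
TREND_LENGTH = 6  # assumed threshold; conventions vary

center_line = 23.0
# Hypothetical subgroup averages in production order.
subgroup_averages = [22.4, 23.6, 22.8, 23.2, 23.4, 23.5,
                     23.9, 24.1, 24.4, 24.6, 24.9, 23.1]

shift_detected = longest_run_one_side(subgroup_averages, center_line) >= SHIFT_LENGTH
trend_detected = longest_trend(subgroup_averages) >= TREND_LENGTH
print("Shift:", shift_detected, "Trend:", trend_detected)
```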

Step 6 – Act on the Chart

• In the range chart, look for discrimination. Discrimination is having at least six possible units of measure between zero and the upper control limit.

Figure 24

• In this example, our data precision is to the ones place, so we can fit 8 intervals of 1 between our lower control limit of zero and our upper control limit of 8. With adequate discrimination, your chart may look something like this.
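
The discrimination check amounts to counting how many measurement increments fit below the range chart's upper control limit. A minimal sketch follows; the function names are hypothetical, and the default of six steps comes from the rule stated above.

```python
import math


def discrimination_steps(range_ucl, measurement_increment):
    """Whole measurement increments that fit between zero and the range chart UCL."""
    return math.floor(range_ucl / measurement_increment)


def has_adequate_discrimination(range_ucl, measurement_increment, minimum_steps=6):
    """True when at least `minimum_steps` increments fit below the UCL."""
    return discrimination_steps(range_ucl, measurement_increment) >= minimum_steps


# Example from the text: range UCL of about 8.1, data recorded to the ones place.
print(discrimination_steps(8.1, 1))         # 8 intervals of 1
print(has_adequate_discrimination(8.1, 1))  # adequate

# The same chart with data recorded only to the nearest 2 units.
print(has_adequate_discrimination(8.1, 2))  # only 4 steps, not adequate
```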

Figure 25

There are a few things to keep in mind about control charting.

First, never put specification limits on control charts. A control chart shows us whether the variation in our process is common cause variation, which is natural and predictable, or whether the process is out of control.

• On the other hand, specification limits are, well, artificial. They are numbers we make up and impose on the process based on the voice of the customer. So a process may be in control, but if the tolerances are too tight, it will make many defects anyway.

• If the tolerances are loose, even a process that is out of control may run defect-free or close to it. Control limits go on control charts; customer specifications are for use with Pp/Ppk and Cp/Cpk.

The second thing to keep in mind about control charting is that you should never construct a control chart from inspection records. Why? Because inspection records are what happened in the past; they are not things you can change now. If you can’t take immediate action based on the chart, then don’t bother.

Finally, while we have looked at X-bar and R charts here (which are great for sample groups from continuous data), there are other charts depending on the need. You may learn much more about these in the Black Belt course.

• One other chart for variables data is an X-mR chart, sometimes called an I-mR chart. It is for charting individual pieces of continuous data and the difference between consecutive individual points (the moving range).

• There are also four charts for attribute (rather than variable) data. They are based on counts and proportions, and are the p, np, c, and u charts. Again, you may learn about these at some point in your Lean Six Sigma learning journey.

Conclusion

So, in this section, we learned the six steps of control charting and worked through an example of how to construct the X-bar and R chart pair. We also learned the differences between control and specification limits, the need to act immediately on control charts rather than rely on inspection records, and that there are other types of control charts depending on the purpose.

Tollgate Review: Critical Root Causes

You will need to provide the following four items for this tollgate review meeting:

1. A short list of critical root causes: the critical few from the trivial many.

- These are the root causes that will be carried forward into the Improve phase.

- The team believes that if these root causes are eliminated, that will close the performance gaps.

2. How the team identified these critical root causes.

- Use data-driven tools to make this final selection of critical root causes.

3. An estimation of how much gap in performance can be closed by eliminating each of these root causes.

4. A hard copy of the official list of critical root causes.

Conclusion to the Analyze Phase

Tollgates of Analyze

1. Possible root causes.

- Identify using idea generation tools.

- Ideal to generate a large number.

- Keep in case you want to revisit.

2. Narrow root causes.

- Use team decision tools or graphical tools (e.g., Pareto chart).

- Reduce possible causes to big issues.

- Again, keep in case you want to revisit.

3. Critical root causes.

- Identify the significant few causes and verify that they are the source of the problem: not just symptoms, but actual causes.

- Use statistical or data-driven tools.

Closing the Gap

Possible reasons that you haven’t identified enough causes to close the gap:

• Didn’t generate enough ideas.

• Narrowed too harshly.

• Ruled something out a little too early.

• Target too high.

• Maybe the data was wrong.

Possible next steps if you haven’t closed the gap:

• Revisit the Measure phase.

• Expand the scope.
