Summary of Information

gysgtclarke
week2readings.ppt

Lesson II

Getting Started with Stats

Why Might You Be Involved in Doing Statistical Research?

  • Manager as research-based decision maker
  • Subordinate employee as researcher
  • Manager as research services buyer/evaluator
  • Manager as evaluator of secondary data sources
  • Research specialist

Research Defined

A systematic inquiry aimed at providing information to solve managerial problems

  • Beginning researchers should understand that research is a process of reasoning with facts
  • I like to “let the data speak to me.”
  • What did you find from your articles for tonight?
  • There is much you can do, but what SHOULD you do?

What is the decision dilemma we face?

What can research tell us/accomplish?

How do we define “best”?

  • What type of study do we need to do?

Reporting

Descriptive

Explanatory

Predictive

Research Defined continued…

  • Basic vs. applied research

Basic – aims to discover new knowledge in a more general sense; scientists

Applied – an effort to solve an immediate problem; to make a particular management decision

  • Primary vs. Secondary research
  • Initial research vs. Problem solving
  • Survey vs. Experimental research

Research Defined continued…

The Role of Statistics in Research

  • “Statistical thinking is necessary…for effective decision making in various facets of business”
  • The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions
  • Can be used to capture a population’s characteristics & make inferences from a sample’s characteristics
  • Many firms have data they don’t mine for regular decision making insights.

Some Examples

  • You work for Books R Us and have been asked to look at their sales data to see how to boost sales.
  • What questions do we need to ask?

To what purpose will the research be put

How do I shape my research to provide valid, quantifiable results?

If I do an opinion survey, how many people do I need to ask?

How do I craft my questions?

How have similar studies been conducted in the past?

Questions to Ponder…

  • What types of research does your organization use?
  • How can research help your organization save money?
  • Do you feel that the amount of research being conducted has increased or decreased over the past 10 years?
  • How has the Internet changed the quality or quantity of research?

Creating a Research Plan and Design

Time Series Data for the Economy – in class exercise.

Research Designs

  • The general research process contains three major stages:

Exploration of the situation

Collection of data

Analysis and interpretation of the results

  • What questions do the data prompt you to ask?
  • Research study idea: What events triggered changes in key indicators (e.g., oil prices)? How do variables interrelate (e.g., inflation, personal taxes & GDP)?

Research Designs

  • The essentials of a research design

What data do we need?

Is the data available?

Do we need to “massage” it?

Is it longitudinal or cross-sectional (which was the Econ data and which was the Gas price data)?

  • Quantitative vs. qualitative data

Research Methods

Exploratory Studies

  • Useful when we lack a clear idea of the problem
  • Saves time and money
  • Relies more heavily on qualitative techniques
  • An exploratory study is finished when we have achieved the following:

Established the major dimensions of the task

Defined a set of investigative questions to guide a detailed research design

Developed several hypotheses about possible causes of the management dilemma

  • May use a focus group.

Case Studies

  • Much of what you’ve done so far at UOP has been of this nature.
  • Can be qualitative or quantitative.
  • Will tend to use primary data, but can also use secondary data

What kind of data is the Gas Station data?

What kind of data are the Econ stats?

You might need to interview people – what questions do you ask & how do you ask them?

Descriptive Studies

  • More formalized study with clearly stated hypothesis or investigative questions
  • Descriptions of phenomena associated with a subject population - subjects
  • Discovery of associations among different variables – correlational study
  • How is this different from a causational study?

Need to ask: Do the variables interact with or affect each other?

In other words, can the “dependent variable” affect the “independent variable”?

Causal Studies

  • How one variable affects, or is responsible for, changes in another variable
  • Possible variable relationships

Symmetrical – no direct link, but fluctuate together

Reciprocal – when 2 variables mutually influence or reinforce each other

Asymmetrical – changes in one variable are responsible for changes in another

Questions to Ponder…

  • When is it appropriate to use exploratory research?
  • Descriptive research is usually used in the marketing and sales business functions. In what other areas might descriptive research be used?
  • When conducting causal research, how can researchers keep variables constant throughout the entire research period?
  • What factors should be considered when conducting a longitudinal study?
  • Can decision-making be accomplished by using only cross-sectional research?

Theory Building

Theory

  • Theory – set of systematically interrelated concepts, definitions, and propositions that are advanced to explain and predict facts
  • Our ability to make rational decisions is measured by the degree to which we combine fact and theory, each of which is necessary for the other to be of value
  • For our purposes, it is helpful to do a literature search of studies that are similar in nature to see what “guiding principles” we can glean – we will not be looking to expand the academic literature by coming up with a new theory.

Reasoning

Deductive reasoning – a form of inference in which the conclusion must necessarily follow from the reasons given; the reasons imply the conclusion and represent a proof

  • For a deduction to be correct, it must be both true and valid:

Premises (reasons) given for the conclusion must agree with the real world (true)

The conclusion must necessarily follow from the premises (valid)

  • Use Research to avoid Thoughtless Thinking!

Inductive vs. Deductive Thinking

Induction vs. Deduction Cont.

Deduction: If A = B, and B = C then A = C

The conclusion is contained in the premise.

If we deny the premise, we deny the conclusion.

Induction: A conclusion drawn from one or more facts.

The conclusion explains the facts.

The facts support the conclusion

What Causes Societal Dysfunction?

“Data Correlations show that in almost all regards, the highly secular democracies consistently enjoy low rates of societal dysfunction, while pro-religious and anti-evolution America performs poorly.” http://moses.creighton.edu/JRS/2005/2005-11.html

“Since 1962-63 … the judicial rejection of natural law and the embracing of relativism, the United States has become number one in the world in violent crime, divorce, and illegal drug use.” (Barton, David: The Myth of Separation, p. 217, © 1992)

What causes societal dysfunction in the U.S.? What is your theory?

Hypothesis Testing

  • Hypothesis - A tentative explanation for an observation, phenomenon, or scientific problem that can be tested by further investigation
  • Role of the hypothesis

Guide the study

Identifies relevant factors

Leads to data collection

Provides a framework in which to organize conclusions

Refining the Research Problem

Operational Definitions

  • Requires the use of concepts, constructs, and definitions; building blocks of theory
  • Concept – a generally accepted collection of meanings or characteristics associated with certain events, objects, conditions, situations, and behaviors
  • The success of research hinges on:

How clearly we conceptualize and how well others understand the concepts we use

The challenge is to develop concepts that others will clearly understand

Concepts and Constructs

Constructs

  • Is an image or idea specifically invented for a given research and/or theory building purpose
  • Operational definitions – stated in terms of specific testing or measurement criteria
  • Variables – used as a synonym for construct; a symbol to which we assign numerals and values

Benchmarking

  • A search for best practices that leads to superior performance; measurement
  • Applied to many areas

goods and services

business processes

performance measures

  • Key steps in benchmarking

planning, analysis, integration, action, and maturity

  • Types of benchmarking

internal, competitive, functional, and generic

Let’s Develop a Bus. Research Problem: Sell More Phone/Data Lines

  • Construct

Market share of lines

Install timeframe

Sales activity level

Pricing

Customer service quality

  • Benchmarks

Market breakeven %

% of orders installed in X time frame

10 new appoint/wk.

10% below SBC Price

2 hr. response time

Learning Objectives

  • What is the Difference between Descriptive and Inferential Statistics?
  • What is a Binomial Distribution and how does it relate to Stats?
  • What are the Variance and Standard Deviation?
  • An Introduction to the Concept of a Z Distribution Table

Statistics

  • The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions
  • Descriptive stats – methods for organizing, summarizing, and presenting data in an informative way
  • Inferential stats – methods used to determine something about a population, based on sample

Descriptive Statistics

  • With Descriptive Statistics you are simply describing “what is” or “what the data show.”
  • For example, “60% of people surveyed said they prefer Coke”. Or “Both Social Security and Defense Spending make up 21% of the federal budget.”

Example of Gas Price Data

Gas Station Name | Day | Date | Time of Day | Location | Gas Price
Gas America | Monday | 3/8/2010 | Morning | 21st and Franklin Rd. | $2.58
BP Gas Station | Tuesday | 3/9/2010 | Afternoon | 21st and Post Rd. | $2.75
BP Gas Station | Sunday | 3/7/2010 | Morning | 21st and Post Rd. | $2.59
Admiral Gas Station | Tuesday | 3/9/2010 | Afternoon | E. 21st Street. | $2.59
Marathon Gas Station | Monday | 3/8/2010 | Evening | 21st and Mithoeffer Rd. | $2.69
Shell Gas Station | Tuesday | 3/9/2010 | Afternoon | 21st and Post Rd. | $2.75
Circle K | Tuesday | 3/9/2010 | Afternoon | 96th & Meridean | $2.75
Meijer Gas Station | Saturday | 3/6/2010 | Morning | Rockville & Raceway | $2.62
Speedway Gas Station | Tuesday | 3/9/2010 | Afternoon | Rockville & Girlschool | $2.69
Speedway Gas Station | Tuesday | 3/9/2010 | Morning | St. Rd. 32 | $2.67

Each row is a “case”. Variables are across the top. Which one is a quantitative measure? Which is qualitative? See p. 4 and 5
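One way to see the quantitative/qualitative split is to load a few columns of the table above into Python and summarize each one in the way its type allows. This is only a sketch: the variable names are made up for illustration, and only the data values come from the table.

```python
from collections import Counter
from statistics import mean, median

# (station, day, price) for each case in the gas price table.
# The tuple layout and names are illustrative assumptions.
cases = [
    ("Gas America", "Monday", 2.58),
    ("BP Gas Station", "Tuesday", 2.75),
    ("BP Gas Station", "Sunday", 2.59),
    ("Admiral Gas Station", "Tuesday", 2.59),
    ("Marathon Gas Station", "Monday", 2.69),
    ("Shell Gas Station", "Tuesday", 2.75),
    ("Circle K", "Tuesday", 2.75),
    ("Meijer Gas Station", "Saturday", 2.62),
    ("Speedway Gas Station", "Tuesday", 2.69),
    ("Speedway Gas Station", "Tuesday", 2.67),
]

# Quantitative variable: arithmetic such as averaging is meaningful.
prices = [price for _, _, price in cases]
mean_price = round(mean(prices), 3)
median_price = round(median(prices), 3)

# Qualitative variable: we can only count how often each category occurs.
day_counts = Counter(day for _, day, _ in cases)

print("mean price:", mean_price)
print("median price:", median_price)
print("cases per day:", dict(day_counts))
```

Price supports a mean and a median, while day of the week only supports counts per category.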

Inferential Statistics

  • With Inferential Statistics you are asking the data to “speak to you” to infer from the sample data certain characteristics about the population we are studying.
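  • For example, the population variance formula measures the variation or dispersion of data above and below the population mean.

Variance of a Population:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

where µ is the population mean and N is the number of observations in the population.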

How Long does it Take to Catch a Fish?

Let’s take 4 samples on different parts of the lake to estimate “The Truth” (the population mean µ):

30 seconds, 30 min., 1 hour, 2 hours

The Impact of Statistics

  • Positive

Data translation into useful information

Answers questions of uncertainty

Constructive hedging

  • Negative

Misleading

Abusive

Preemptive bias

The Impact of Statistics continued…

  • Personal

“You might not want to use statistics, but statistics are being used on you”

Buyer behavior

Product availability

Wage determination

Actuarial tables’ influence on insurance products

Queuing theory

Variables

Types of Variables

  • Categorical: Places an individual into one of several groups or categories

Attribute – gender, age, ethnicity, education level

  • Quantitative: Takes Numerical values, allowing adding and calculating averages

Continuous – can take on fractional values; infinite number of values between units on the scale; always an approximation; height

Types of Variables

  • Variable – trait, attribute that can take on different values at different times
  • Constant variable – doesn’t change
  • Qualitative – based on qualities that can be classified but not measured; differences of type or kind; gender
  • Quantitative – measurable differences in amount
  • Discrete – no possible values b/n adjacent units on the scale (whole #’s); dichotomous; marital status

Flip a coin 10 times: each flip is either heads or tails (1 or 0)

Results in a Binomial Distribution

Analogy: Product either perfect or imperfect.

We can run statistical tests to predict the probability of H or T, Good or Bad
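The coin-flip idea can be sketched with the standard binomial probability formula. This assumes a fair coin (p = 0.5) and independent flips; the function name is ours, not something from the course materials.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p (standard binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Ten flips of a fair coin; p = 0.5 is an assumption (heads = 1, tails = 0).
n, p = 10, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_five_heads = binomial_pmf(5, n, p)
total_probability = sum(distribution)

print("P(exactly 5 heads) =", p_five_heads)
print("Sum over all outcomes =", total_probability)
```

The same function covers the perfect/imperfect product analogy: replace p with the assumed share of good products.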

Types of Variables continued…

  • Control – a variable the researcher holds constant; most important in a research study and often the most difficult to manage
  • Independent – the researcher systematically manipulates it; the experimental treatment; manipulated to observe its effect on the dependent variable
  • Dependent – the researcher measures it to observe the effects of the independent variable
  • Intervening or modifying – originates within subject; psychological or emotional reaction; can cause errors in study; fear, anxiety, anger, etc. Need to control by getting to know people, developing rapport
  • Extraneous or confounding – Appears to be related, but in fact is not related. Can lead to false conclusions.

Make a List of Variables for Gas Price Data…

  • Make a list of the variables for your team project.
  • What type of variable are they?
  • How will you collect the data?
  • What do you know about the data source? – Would someone dispute this data, recommend other data to consider, or a different research design to analyze it?

What Type of Data?

Levels of Measurement

  • Measurement – procedure for assigning a value (numbers) to an observation (variable) according to certain rules
  • Nominal – categories are mutually exclusive

Male vs. Female, Yes vs. No

  • Ordinal – categories have a logical order

Likert Scale, 1 – 5, Strongly Agree to Strongly Disagree (is Bob a good prof.?)

  • Interval – equal distance between categories

Time, temp.

  • Ratio – True Zero point exists

Height, weight, distance

Ratio Analysis

  • Handout of moving average of GDP.
  • Analyze the Books R Us data – 3 week moving average. (This is in the “Books R US Raw Data” Excel file.)
  • How do we put this into a research question that allows us to collect & analyze data?
  • How will we measure the data we collect?
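A 3-week moving average can be sketched in a few lines. The sales figures below are hypothetical placeholders, not values from the Books R Us file, and the function name is our own.

```python
def moving_average(values, window=3):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical weekly sales figures, for illustration only.
weekly_sales = [120, 135, 150, 110, 160, 170]
smoothed = moving_average(weekly_sales, window=3)

print(smoothed)
```

Six weeks of data give four averaged points, because the first average needs three weeks of history.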

Role of Statistics

The Language of Statistics

  • Distribution
  • Bar Graph
  • Histogram
  • Pareto Chart
  • Time Plot
  • Cross-sectional data
  • Quartile
  • Box Plot
  • Variance
  • Coefficient of Variation
  • Resistant Measure
  • Degrees of freedom

Definitions in Statistics

  • Population
  • Sample
  • Standard deviation
  • Normal Distribution
  • Statistic versus parameter
  • Random
  • Mean
  • Median
  • Mode
  • Right Skew
  • Left Skew
  • Stemplot

Symbols

  • Σ (Uppercase Sigma) = Summation
  • µ (Mu) = Population mean
  • σ (Lowercase Sigma) = Standard deviation
  • π (Pi) = Probability of success in a binomial trial
  • ε (Epsilon) = Maximum allowable error
  • χ² (Chi Square) = Nonparametric hypothesis test
  • ! = Factorial
  • H0 = Null hypothesis
  • H1 = Alternate hypothesis

Measures of Central Tendency

  • Central Tendency – the tendency of a set of data to center around certain numerical values
  • Mean – computed by summing all the observations in the sample and dividing the sum by the number of observations; considers the magnitude of each observation
  • Median – the observation that divides the ordered distribution into two equal halves; the middle observation in a distribution
  • Mode – the observation that occurs most frequently; if all the values are different, there is no mode

Means, Medians, and Modes

Age

  • 25
  • 28
  • 30
  • 30
  • 33
  • 34

Mean = 30

Median = 30

Mode = 30

[Figure: frequency distribution of the ages with the mean, median, and mode marked]

$$\mu = \frac{\sum X}{N} = \frac{25 + 28 + 30 + 30 + 33 + 34}{6} = 30$$
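The three measures for the age data can be checked with Python's standard `statistics` module. This is a quick sketch, not part of the original exercise.

```python
from statistics import mean, median, mode

# Ages from the example above.
ages = [25, 28, 30, 30, 33, 34]

age_mean = mean(ages)      # sum of observations divided by their count
age_median = median(ages)  # middle of the ordered values (average of the two middle ones here)
age_mode = mode(ages)      # most frequent value

print(age_mean, age_median, age_mode)
```

All three come out at 30 for this data set, which is what we would expect from a roughly symmetrical distribution.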

Measures of Central Tendency

  • The idea is that the sum of the differences between any given observed value & the mean = 0
  • Σ(X – µ) = 0. What’s an “X”?
  • What do we do if they all add up to 0?

Calculate Deviation Scores

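[Figure: number line of the ages 25, 28, 30, 33, 34 showing deviations of -5, -2, +3, and +4 from the mean of 30]

x | x - µ | Deviation
25 | (25-30) | -5
28 | (28-30) | -2
30 | (30-30) | 0
30 | (30-30) | 0
33 | (33-30) | 3
34 | (34-30) | 4
Sum = 180; 180/6 = 30 | | 0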

Population Variance & Population Standard Deviation

  • Population variance – can be used to compare dispersion in 2 or more sets of observations

For the students’ ages, 1 standard deviation is 3.0 years from the mean of 30 years.

  • Population Standard deviation – square root of the variance (the more alike the observations are, the more reliable they are)
  • The value σ = 3.0 indicates that, on average, observations fall 3.0 units above or below the mean

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{9} = 3$$

Measures of Dispersion

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{54}{6} = 9 \text{ (variance)}$$

The variance of 9 is the measure of variability that indicates how much all of the scores in a distribution typically deviate or vary from the mean, in squared units

  • Population variance is also known as the ‘mean squared deviation’ (mean of the squared deviations from the mean)

x | x - µ | (x - µ)²
25 | -5 | 25
28 | -2 | 4
30 | 0 | 0
30 | 0 | 0
33 | 3 | 9
34 | 4 | 16
Sum: 180 | 0 | 54
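The table above can be reproduced step by step. The sketch below follows the same columns (deviation, squared deviation) and then cross-checks the result against `statistics.pvariance`, which treats the data as a whole population.

```python
from math import sqrt
from statistics import pvariance

ages = [25, 28, 30, 30, 33, 34]

N = len(ages)
mu = sum(ages) / N                      # population mean

deviations = [x - mu for x in ages]     # x - mu column
squared = [d ** 2 for d in deviations]  # (x - mu)^2 column

sum_of_deviations = sum(deviations)     # always 0 around the mean
sum_of_squares = sum(squared)
variance = sum_of_squares / N           # divide by N for a population
sigma = sqrt(variance)

print(sum_of_deviations, sum_of_squares, variance, sigma)
print(pvariance(ages))                  # library cross-check
```

Squaring is what keeps the positive and negative deviations from cancelling out to zero.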

Sample Variance

Properties of the “s”

Gives a measure of dispersion relative to the mean

Sensitive to each score in the distribution

If a score is moved closer to the mean, then the standard deviation will decrease, if the score shifts away from mean, the standard deviation increases

Tends to underestimate the population variance when the sum of squares is divided by n, so provide an appropriate correction by subtracting 1 from the total observations (n - 1)

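Sample Variance Example

Sample variance formula:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{54}{5 - 1} = 13.5$$

Variance: 13.5 years squared

Standard Deviation: $\sqrt{13.5} \approx 3.67$ yrs

x | x - x̄ | (x - x̄)² | x²
25 | -5 | 25 | 625
28 | -2 | 4 | 784
30 | 0 | 0 | 900
33 | 3 | 9 | 1089
34 | 4 | 16 | 1156
Sum: 150 | 0 | 54 | 4554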
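The sample calculation can be sketched the same way, this time dividing by n - 1. The x² column in the table also allows the computational shortcut formula, shown here as a second route to the same answer. Variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

sample = [25, 28, 30, 33, 34]
n = len(sample)
x_bar = sum(sample) / n

# Definitional route: squared deviations from the sample mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)
s2 = sum_sq_dev / (n - 1)
s = sqrt(s2)

# Shortcut route using the x^2 column: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
sum_x = sum(sample)
sum_x2 = sum(x ** 2 for x in sample)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(s2, round(s, 2), s2_shortcut)
print(variance(sample), round(stdev(sample), 2))  # library cross-check
```

Both routes give 13.5, and the square root rounds to 3.67.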

Let’s Analyze Class Ages

Mean and Standard Deviation for 5 Learning Teams

Team1 | Team2 | Team3 | Team4 | Team5
24 | 35 | 31 | 35 | 44
31 | 34 | 24 | 44 | 27
30 | 24 | 27 | 31 | 25
47 | 22 | 24 | 42 | 27
29 |  | 24 | 21 | 21
 |  |  | 25 |

Calculate means and Standard Deviations.

Create a bar chart

Create a histogram

Create a Pareto Chart

Bar Chart

[Bar chart of mean age by team: Team1 32.2, Team2 28.75, Team3 26, Team4 33, Team5 28.8]

Class ages

Mean and Standard Deviation for 5 Learning Teams

Row | Team1 | Team2 | Team3 | Team4 | Team5 | Class Average
Age | 24 | 35 | 31 | 35 | 44 |
Age | 31 | 34 | 24 | 44 | 27 |
Age | 30 | 24 | 27 | 31 | 25 |
Age | 47 | 22 | 24 | 42 | 27 |
Age | 29 |  | 24 | 21 | 21 |
Age |  |  |  | 25 |  |
Mean | 32.2 | 28.75 | 26 | 33 | 28.8 | 29.92
Sample Stand Deviation | 8.7006 | 6.7020 | 3.0822 | 9.1433 | 8.8431 | 7.4239
Coefficient of Var | 0.2702 | 0.2331 | 0.1185 | 0.2771 | 0.3071 | 0.2481

Note that you go to "Insert" then "Function" then "Average" to get the formula for the mean.
For the team standard deviations, do the same thing and then use the STDEV function (standard deviation for a sample).
For the class total standard deviation, use the STDEVP function, which is for a population.
Now, try doing the Skewness calculation on your own in the yellow highlighted area.

Means in team order: Team1 32.2, Team2 28.75, Team3 26, Team4 33, Team5 28.8
Means sorted for the Pareto chart: Team 4 33, Team 1 32.2, Team 5 28.8, Team 2 28.75, Team 3 26
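For anyone working outside Excel, the same spreadsheet steps can be sketched in Python. Here `stdev` plays the role of STDEV (sample) and `pstdev` the role of STDEVP (population). The assignment of ages to teams is inferred from the table layout and checked against the team means, so treat it as an assumption.

```python
from statistics import mean, pstdev, stdev

# Ages by learning team (assignment inferred from the table layout).
teams = {
    "Team1": [24, 31, 30, 47, 29],
    "Team2": [35, 34, 24, 22],
    "Team3": [31, 24, 27, 24, 24],
    "Team4": [35, 44, 31, 42, 21, 25],
    "Team5": [44, 27, 25, 27, 21],
}

summary = {}
for name, ages in teams.items():
    m = mean(ages)
    s = stdev(ages)                  # sample standard deviation, like STDEV
    summary[name] = {"mean": m, "sd": s, "cv": s / m}

# Whole class treated as a population, like STDEVP.
all_ages = [age for ages in teams.values() for age in ages]
class_mean = mean(all_ages)
class_sd = pstdev(all_ages)
class_cv = class_sd / class_mean

# A Pareto chart plots the bars from largest to smallest.
pareto_order = sorted(summary, key=lambda t: summary[t]["mean"], reverse=True)

for name in pareto_order:
    row = summary[name]
    print(name, round(row["mean"], 2), round(row["sd"], 4), round(row["cv"], 4))
print("Class", round(class_mean, 2), round(class_sd, 4), round(class_cv, 4))
```

Sorting the team means before plotting is the only difference between the bar chart and the Pareto chart in this exercise.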

Histogram

[Histogram chart of mean age by team: Team1 32.2, Team2 28.75, Team3 26, Team4 33, Team5 28.8]

Pareto Chart

[Pareto chart of mean age by team, sorted from highest to lowest: Team 4 33, Team 1 32.2, Team 5 28.8, Team 2 28.75, Team 3 26]

Coefficient of Variation (CV)

  • The ratio of the standard deviation to the absolute value of the mean, expressed as a percentage
  • Useful when: data are different units or the data are in the same units, but the means are far apart
  • $CV = \frac{s}{\bar{X}} (100)$, where multiplying by 100 converts the decimal to a %

Normal Probability Distribution

Probability of 4 Heads in a Row

  • Mutually exclusive – Must assume outcomes (H or T) are mutually exclusive.
  • Collectively exhaustive – at least one of the events must occur when an experiment is conducted
  • If the outcomes meet these two tests, then the sum of the probabilities equals 1.
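The probability of 4 heads in a row can be worked out two ways: multiply the single-flip probabilities, or list every possible sequence and count. The sketch below does both, assuming a fair coin and independent flips.

```python
from itertools import product

p_heads = 0.5                      # fair coin (assumption)
p_four_heads = p_heads ** 4        # independent flips multiply

# Enumerate all sequences of 4 flips. The outcomes are mutually exclusive
# and collectively exhaustive, so their probabilities sum to 1.
sequences = list(product("HT", repeat=4))
favorable = [seq for seq in sequences if all(flip == "H" for flip in seq)]
p_by_counting = len(favorable) / len(sequences)
total_probability = len(sequences) * (0.5 ** 4)

print(p_four_heads, p_by_counting, total_probability)
```

Only 1 of the 16 equally likely sequences is all heads, so both routes give 1/16.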

Three Main Properties of a Normal Distribution

  • Bell-shaped curve – extending infinitely in both directions; symmetrical about the mean µ
  • All normal distributions have a particular internal distribution for the area under the curve; the relative area between any 2 designated points is always the same

The area under the curve between 2 points can be interpreted as the relative frequency (P) of the values included between those points

  • Theoretical distribution defined by 2 parameters: the mean µ and the standard deviation σ

Characteristics of a
Normal Distribution

Area Under the Normal Curve

  • All normal curves are symmetrical and have an area of 1.0
  • Use a ‘standardized unit’ to compare with various normal curves; one that has a mean of 0 and a standard deviation of 1
  • By standardizing a normal distribution, we can report the distance from the mean in units of the standard deviation.
  • It’s called a Z Statistic

Z Score – Standardized Score

  • Z score – the distance between a selected value, X, and the mean, µ, divided by the standard deviation, σ: $z = \frac{X - \mu}{\sigma}$

Z Score

  • X is the value of any observation or measurement
  • µ is the mean of the distribution
  • σ is the standard deviation of the distribution
  • By determining the z value, we can find the area or the probability under any normal curve.
  • This is the probability that an observation falls between the mean (z = 0) and the z score (the number of standard deviations from the mean)
  • The % under the curve, the likelihood of the observation, can be approximated by using the Empirical Rule (68-95-99.7 Rule)
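As a rough illustration, the z score and the matching area under the curve can be computed with `statistics.NormalDist` instead of a printed Z table. The sketch reuses µ = 30 and σ = 3 from the age example. The observation X = 33 is picked only for illustration, and treating ages as normally distributed is an assumption.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mu) / sigma


mu, sigma = 30, 3                  # from the age example
z = z_score(33, mu, sigma)         # X = 33 is an illustrative choice

standard_normal = NormalDist()     # mean 0, standard deviation 1

# Area between the mean (z = 0) and the z score, as a Z table would report it.
area_mean_to_z = standard_normal.cdf(z) - 0.5

# Empirical Rule check: area within 1, 2, and 3 standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

print("z =", z)
print("area from mean to z =", round(area_mean_to_z, 4))
print({k: round(v, 4) for k, v in within.items()})
```

The three areas come out close to 68%, 95%, and 99.7%, which is where the Empirical Rule gets its name.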

The Lemonade Stand Game

Profits = Total Revenue – Total Costs
http://www.coolmath-games.com/lemonade/

Lemonade Stand Game

  • We had to buy cups, lemons, sugar, and ice
  • We had to forecast sales based on the weather.
  • Did you notice that you could get 12 cups to a pitcher?
  • More if you used lots of ice

Some Variables from the
Lemonade Stand Game

  • Cups Sold per day
  • Potential Customers Per Day
  • Total Revenue Per Day
  • Total Costs Per Day
  • Net Revenue Per Day
  • Running Profit/Loss

Pricing for Lemonade Stand Game

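Item | Quantity | Price
Cups | 25 | $0.80
Cups | 50 | $1.57
Cups | 100 | $3.13
Lemons | 10 | $0.69
Lemons | 30 | $2.02
Lemons | 75 | $4.28
Cups of Sugar | 8 | $0.71
Cups of Sugar | 20 | $1.78
Cups of Sugar | 48 | $3.48
Ice Cubes | 100 | $0.78
Ice Cubes | 250 | $2.18
Ice Cubes | 500 | $3.80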

At the start of the game, you’re given $20. You have to decide what price to charge your customers per cup of lemonade. The default price to start is $.25, but you can change that. You then have to plan on the number of POTENTIAL customers you might have, and purchase supplies to meet your forecasted demand. Of course, the actual number of customers is probably going to be less. Therefore, the question is: Will you make enough money to cover the cost of your supplies and make a profit, or will you lose money and have to take out a loan to purchase more supplies for the next day?

Assignment for Next Meeting

  • Look at the pricing for lemons, ice, cups, and sugar.

Remember that a pitcher will serve 12 cups if there is no ice, and about 20 cups if there are 3 ice cubes per cup.

  • Can we afford our choice of lemons per pitcher, sugar per pitcher, and ice per cup? Can we make money at the price per cup that we decide to charge?

  • It will help if you do the following and SHOW YOUR WORK.

1. Calculate the number of cups/pitcher you can serve. ROUND UP to the nearest whole number.

2. Calculate the number of lemons, cups of sugar, and cups you’ll need to buy.

3. Look at the quantity pricing for each item and add up the cost to purchase your supplies.

4. Add up your costs. Are they less than $10?

5. Calculate your ACTUAL number of customers (the percent of potential customers) and multiply by your price per cup. Did you make money?
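The five steps above can be sketched as a small calculation. The bundle prices come from the pricing table, and 20 cups per pitcher is the figure given for 3 ice cubes per cup. Everything else is a hypothetical assumption for illustration: the recipe (4 lemons and 4 cups of sugar per pitcher), 40 potential customers, 60% of them actually buying, and the rule of buying one bundle per item.

```python
import math

# Bundle prices from the pricing table: (quantity, price in dollars).
PRICES = {
    "cups": [(25, 0.80), (50, 1.57), (100, 3.13)],
    "lemons": [(10, 0.69), (30, 2.02), (75, 4.28)],
    "sugar": [(8, 0.71), (20, 1.78), (48, 3.48)],
    "ice": [(100, 0.78), (250, 2.18), (500, 3.80)],
}


def cheapest_single_bundle(item, needed):
    """Cheapest listed bundle covering `needed` units.
    Simplification (assumption): buy exactly one bundle per item."""
    options = [(qty, price) for qty, price in PRICES[item] if qty >= needed]
    if not options:
        raise ValueError(f"no single bundle of {item} covers {needed} units")
    return min(options, key=lambda option: option[1])


def plan_day(potential_customers, cups_per_pitcher,
             lemons_per_pitcher, sugar_per_pitcher, ice_per_cup):
    """Steps 1-4: pitchers needed, supplies needed, and total supply cost."""
    pitchers = math.ceil(potential_customers / cups_per_pitcher)
    needs = {
        "cups": potential_customers,
        "lemons": pitchers * lemons_per_pitcher,
        "sugar": pitchers * sugar_per_pitcher,
        "ice": potential_customers * ice_per_cup,
    }
    cost = sum(cheapest_single_bundle(item, qty)[1] for item, qty in needs.items())
    return needs, round(cost, 2)


# Hypothetical plan: every number below except cups_per_pitcher is assumed.
needs, total_cost = plan_day(
    potential_customers=40, cups_per_pitcher=20,
    lemons_per_pitcher=4, sugar_per_pitcher=4, ice_per_cup=3,
)

# Step 5: actual customers are a share of potential customers.
actual_customers = round(40 * 0.60)
revenue = actual_customers * 0.25          # default price of $.25 per cup
profit = round(revenue - total_cost, 2)    # Profits = Total Revenue - Total Costs

print(needs, total_cost, revenue, profit)
```

Changing the price per cup or the share of actual customers shows how quickly the plan moves between profit and loss.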