Discussion 2

Chapter3.ReasoningfromSampletoPopulation.pptx

Reasoning from Sample to Population

Chapter 3

© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education

Learning Objectives

Calculate standard summary statistics for a given data sample.

Explain the reasoning inherent in a confidence level.

Construct a confidence interval.

Explain the reasoning inherent in a hypothesis test.

Execute a hypothesis test.

Outline the roles of deductive and inductive reasoning in making active predictions.


Population parameter: a numerical expression that summarizes some feature of the population

Objective degree of support using inductive and deductive reasoning

Construction of a confidence interval

Hypothesis testing

Distributions and Sample Statistics


Random variable

A variable that can take on multiple values, with any given realization of the variable being due to chance (or randomness)

Deterministic variable

A variable whose value can be predicted with certainty

Distributions of Random Variables


Random Variables

Distributions of Random Variables

DISCRETE

COUNTABLE NUMBER OF VALUES (e.g., 5, 9, 19, 27…)

CONTINUOUS

UNCOUNTABLY INFINITE NUMBER OF VALUES (e.g., ALL THE NUMBERS, TO ANY DECIMAL PLACE, BETWEEN 0 AND 1)


Distributions of Random Variables

The probabilities of individual outcomes for a discrete random variable are represented by a probability function

EXAMPLE

OF 10 PEOPLE:

3 OF THEM ARE 25 YEARS OLD,

4 OF THEM ARE 30 YEARS OLD,

2 OF THEM ARE 40 YEARS OLD,

1 OF THEM IS 45 YEARS OLD.

PROBABILITY THAT A SINGLE DRAW WILL BE:

25 YEARS OLD IS 3/10 = 0.3,

30 YEARS OLD IS 4/10 = 0.4,

40 YEARS OLD IS 2/10 = 0.2,

45 YEARS OLD IS 1/10 = 0.1.

DISCRETE RANDOM VARIABLE

POPULATION
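The age example can be reproduced with a short Python sketch. The list `ages` simply encodes the 10 people from the slide; the helper name `probability_function` is an illustrative choice, not something taken from the textbook.

```python
from collections import Counter
from fractions import Fraction

# Ages of the 10 people in the example population.
ages = [25] * 3 + [30] * 4 + [40] * 2 + [45] * 1

def probability_function(population):
    """Map each distinct value to its probability under a single random draw."""
    counts = Counter(population)
    total = len(population)
    return {value: Fraction(count, total) for value, count in counts.items()}

pmf = probability_function(ages)
for age, prob in sorted(pmf.items()):
    # Fractions are shown in lowest terms alongside the decimal value.
    print(f"P(Age = {age}) = {prob} = {float(prob):.1f}")
```

Using `Fraction` keeps the probabilities exact, so they sum to exactly 1.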


Graphical Representation for a Discrete Random Variable (Age)


Distributions of Random Variables

The probabilities of individual outcomes for a continuous random variable are represented by a probability density function (pdf)

A special type of continuous random variable is called a normal random variable, which has a “bell-shaped” pdf


Graphical Representation for a Normal Random Variable


Distributions and Sample Statistics

For a normal random variable, and any other continuous random variable, the pdf allows us to calculate the probabilities that the random variable falls in various ranges.

The probability that a random variable falls between two numbers A and B is the area under the pdf curve between A and B
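As a rough illustration of the area-under-the-curve idea, the sketch below uses Python's standard-library `statistics.NormalDist`. The area between A and B equals the difference of the cumulative distribution function at the two points. The mean 0 and standard deviation 1 are assumed values chosen only for the example.

```python
from statistics import NormalDist

def prob_between(dist, a, b):
    """P(a < X < b): area under the pdf between a and b, computed from the cdf."""
    return dist.cdf(b) - dist.cdf(a)

# Assumed example: a normal random variable with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
area = prob_between(standard_normal, -1.96, 1.96)
print(round(area, 4))
```

For this standard normal variable, the area between -1.96 and 1.96 is approximately 0.95.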


Probability of a Random Variable Falling Between Two Numbers


Distributions of Random Variables

Expected Value or Population Mean

The summation of each possible realization of Xi multiplied by the probability of that realization.

Variance

A common measure for the spread of the distribution; defined by $E[(X_i - E[X_i])^2]$.

Standard Deviation

The square root of the variance.
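These three definitions can be checked against the earlier age example (ages 25, 30, 40, and 45 with probabilities 0.3, 0.4, 0.2, and 0.1). The following is a minimal sketch; the function names are illustrative.

```python
from math import sqrt

# Probability function from the age example: value -> probability.
age_probs = {25: 0.3, 30: 0.4, 40: 0.2, 45: 0.1}

def expected_value(probs):
    """Sum of each possible realization times its probability."""
    return sum(x * p for x, p in probs.items())

def variance(probs):
    """E[(X - E[X])^2]: probability-weighted squared distance from the mean."""
    mean = expected_value(probs)
    return sum(p * (x - mean) ** 2 for x, p in probs.items())

mean_age = expected_value(age_probs)
var_age = variance(age_probs)
sd_age = sqrt(var_age)
print(round(mean_age, 2), round(var_age, 2), round(sd_age, 2))
```

For this distribution the expected age works out to 32 and the variance to 46, so the standard deviation is a little under 7 years.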


Data Samples and Sample Statistics

Sample Size of N

A collection of N realizations of Xi: {X1, X2, …, XN}

Sample Statistics

Single measures of some feature of a data sample

Sample Mean

A common measure of the center of a sample


Sample Variance

Common measure of the spread of a sample

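For a sample of size N for random variable Xi, the sample variance is: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$, where $\bar{X}$ is the sample mean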

Sample Standard Deviation

The square root of the sample variance

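For a sample of size N for random variable Xi, the sample standard deviation is: $S = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2}$, where $\bar{X}$ is the sample mean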

Data and Sample Statistics
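A small sketch of these sample statistics using Python's standard `statistics` module. The eight ages are made-up values for illustration, and the sketch assumes the usual N − 1 divisor for the sample variance, which is what `statistics.variance` uses.

```python
from statistics import mean, stdev, variance

# Hypothetical sample of N = 8 observed ages (made-up values).
sample = [25, 30, 30, 45, 40, 25, 30, 40]

sample_mean = mean(sample)          # sum of the observations divided by N
sample_variance = variance(sample)  # squared deviations summed, divided by N - 1
sample_sd = stdev(sample)           # square root of the sample variance

# The same variance written out by hand, to make the formula explicit.
n = len(sample)
manual_variance = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)

print(sample_mean, round(sample_variance, 3), round(sample_sd, 3))
```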


Confidence Interval

Suppose a firm wants to know the average age of its customers

It collects data from 872 of its customers, so its sample size is N = 872

Age_i = a random variable defined as the age of a single customer

age_i = the observed age of customer i in the sample


Confidence Interval

Estimator

A calculation using sample data that is used to provide information about a population parameter

Random sample

A sample where every member of the population has an equal chance of being selected


Confidence Interval

Deductive argument: If we have a random sample, the sample mean is a “reasonable guess” for the population mean

Inductive argument: Then the population mean is the same as the sample mean

How sure are we that the population mean in our example is the same as the sample mean?

Confidence interval: a range of values such that there is a specified probability that it contains a population parameter


Confidence Interval

How do we build confidence intervals and determine their objective degree of support?

Independent

The distribution of one random variable does not depend on the realization of another

Independent and identically distributed (i.i.d.)

The distribution of one random variable does not depend on the realization of another, and each has an identical distribution.


Confidence Interval

Unbiased estimator

An estimator whose mean is equal to the population parameter it is used to estimate

Population standard deviation

The square root of the population variance

Population variance

The variance of a random variable over the entire population


Data and Sample Statistics

In order to construct a confidence interval for the population mean and know its objective degree of support, we must know something about its standard deviation and its type of distribution

The assumption that a data sample is a random sample implies the standard deviation of the sample mean is $\sigma/\sqrt{N}$, where $\sigma$ is the population standard deviation and N is the sample size

The spread of the sample mean gets smaller as the sample size increases
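To see how quickly the spread shrinks, the sketch below evaluates the standard formula for the standard deviation of the sample mean under random sampling, σ/√N, at a few sample sizes. The value σ = 12 is an assumed population standard deviation chosen only for illustration.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean for a random sample of size n."""
    return sigma / sqrt(n)

# Assumed population standard deviation (illustrative value only).
sigma = 12.0
sizes = (30, 120, 480, 872)
spreads = [sd_of_sample_mean(sigma, n) for n in sizes]

for n, spread in zip(sizes, spreads):
    print(f"N = {n:4d}: sd of sample mean = {spread:.3f}")
```

Because N enters through a square root, quadrupling the sample size halves the spread of the sample mean.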


Data and Sample Statistics

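Assuming a random sample with reasonably large N (> 30) implies that the sample mean is approximately normally distributed with mean µ and standard deviation $\sigma/\sqrt{N}$

This can be written as: $\bar{X} \sim N(\mu, \sigma/\sqrt{N})$, where the second argument is the standard deviation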


Probability Sample Mean within 1.96 Standard Deviations of Population Mean
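One way to turn the 1.96 result into an interval is sketched below. It assumes the sample standard deviation S is used in place of the unknown population standard deviation, which is a common large-sample approximation. The data in `ages` are hypothetical, generated by an arbitrary formula just to have more than 30 observations.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval: sample mean +/- z * S / sqrt(N)."""
    n = len(sample)
    center = mean(sample)
    std_err = stdev(sample) / sqrt(n)  # estimated sd of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return center - z * std_err, center + z * std_err

# Hypothetical ages for 40 customers (not real data).
ages = [20 + (7 * i) % 41 for i in range(40)]

low, high = confidence_interval(ages)
print(f"95% confidence interval for the mean age: ({low:.2f}, {high:.2f})")
```

Passing `confidence=0.90` or `confidence=0.99` gives a narrower or wider interval, respectively, from the same sample.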


Hypothesis Testing

A hypothesis test is the process of using sample data to assess the credibility of a hypothesis about a population

Making an assessment

Reject the hypothesis

Fail to reject the hypothesis


Hypothesis Testing

Null hypothesis

The hypothesis to be tested using a data sample

Written as H0: µ = K, where K is the hypothesized value for the population mean

The objective is to determine whether the null hypothesis is credible given the data we observe.

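If a sample of size N is a random sample, N is “large” (>30) and µ = K, then $\bar{X} \sim N(K, \sigma/\sqrt{N})$, where the second argument is the standard deviation of the sample mean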


Probability Sample Mean within 1.96 Standard Deviations of Hypothesized Population Mean


Hypothesis Testing

Steps in Hypothesis Testing:

State the null hypothesis

Collect the data sample and calculate the sample mean

Decide whether or not to reject the deduced distribution for the sample mean

Degree of support

Measure how many standard deviations the sample mean is from the hypothesized population mean

$Z = \frac{\bar{X} - K}{\sigma/\sqrt{N}}$


Hypothesis Testing

To calculate Z, take the difference between the sample mean and the hypothesized population mean ($\bar{X} - K$)

Then take that difference and divide it by the standard deviation of the sample mean ($\sigma/\sqrt{N}$)

The t-stat is the difference between the sample mean and the hypothesized population mean ($\bar{X} - K$) divided by the estimated standard deviation of the sample mean ($S/\sqrt{N}$, where S is the sample standard deviation), or $t = \frac{\bar{X} - K}{S/\sqrt{N}}$
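The calculation can be sketched in a few lines of Python. This assumes the t-stat divides by S/√N, the estimated standard deviation of the sample mean. All numbers in the example call (a sample mean of 43.6, a hypothesized mean K = 45, S = 12, N = 872) are hypothetical.

```python
from math import sqrt

def t_stat(sample_mean, k, sample_sd, n):
    """Number of estimated standard deviations the sample mean lies from k."""
    return (sample_mean - k) / (sample_sd / sqrt(n))

# Hypothetical inputs: H0 says the population mean age is K = 45.
t = t_stat(sample_mean=43.6, k=45.0, sample_sd=12.0, n=872)
print(round(t, 2))
```

A negative value means the sample mean lies below the hypothesized population mean; the magnitude is what matters for a two-sided test.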


Hypothesis Testing

Test statistic

Any single value derived from a sample that can be used to perform a hypothesis test

p-value

The probability of attaining a test statistic at least as extreme as the one that was observed, assuming the null hypothesis is true


Graphical Illustration of a P-Value


Hypothesis Testing

The t-stat is an observed value from a t-distribution, a distribution that resembles a normal distribution and is centered at zero

In Excel, a p-value can be calculated using the formula:

2 × (1 − NORM.S.DIST(|t|, TRUE)), where |t| is the absolute value of the observed t-stat

If the observed t-stat is very unlikely (has a low p-value), then reject this distribution and vice versa

If the p-value is less than the cutoff, reject, and fail to reject otherwise
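The same two-sided p-value calculation and decision rule can be written in Python with the standard library's `NormalDist`, as a rough equivalent of the spreadsheet approach. The function names and the default cutoff of 0.05 are illustrative choices.

```python
from statistics import NormalDist

def two_sided_p_value(t):
    """Probability of a statistic at least as extreme as t, using the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

def decide(p_value, cutoff=0.05):
    """Reject when the p-value is below the cutoff; otherwise fail to reject."""
    return "reject" if p_value < cutoff else "fail to reject"

for t in (0.5, 1.96, 3.0):  # hypothetical observed t-stats
    p = two_sided_p_value(t)
    print(f"t = {t}: p-value = {p:.4f}, decision: {decide(p)}")
```

Taking the absolute value of t makes the function symmetric, so a t-stat of -3.0 yields the same p-value as 3.0.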


Hypothesis Testing

Cutoffs using p-values directly correspond to the degrees of support you chose for your inductive argument

If your chosen degree of support is D%, then the cutoff is

(100 − D)%, or 1 − D/100

With a cutoff of 0.05, for example, you will incorrectly reject 5% of the time when the deduced distribution for the sample mean is correct, because in that case you will still observe a p-value less than 0.05 in 5% of samples
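A small simulation can make this error rate concrete. The sketch below repeatedly draws samples from a population whose mean really is K, so every rejection is incorrect by construction, and counts how often the p-value falls below 0.05. The population values (K = 40, σ = 10), the sample size, the number of trials, and the seed are all arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def p_value_for_sample(sample, k):
    """Two-sided p-value for H0: population mean = k (normal approximation)."""
    t = (mean(sample) - k) / (stdev(sample) / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(t)))

def false_rejection_rate(k=40.0, sigma=10.0, n=50, trials=2000, cutoff=0.05, seed=0):
    """Share of samples that reject H0 even though H0 is true by construction."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(k, sigma) for _ in range(n)]
        if p_value_for_sample(sample, k) < cutoff:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(f"Share of incorrect rejections: {rate:.3f}")
```

The share should land near the cutoff, though not exactly on it, since the number of trials is finite and the normal approximation is not exact at N = 50.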


Hypothesis Testing

The standard degrees of confidence are 90%, 95%, and 99%; the corresponding cutoffs using p-values are 0.10, 0.05, and 0.01

Reject the distribution if the p-value is less than 0.10; fail to reject otherwise. This generates a degree of support of 90%.

Reject the distribution if the p-value is less than 0.05; fail to reject otherwise. This generates a degree of support of 95%.

Reject the distribution if the p-value is less than 0.01; fail to reject otherwise. This generates a degree of support of 99%.
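The three rules differ only in the cutoff, so they can be expressed as one function of the degree of support. The sketch below applies them to a hypothetical p-value of 0.03; the function names are illustrative.

```python
def cutoff_for(degree_of_support_pct):
    """Convert a degree of support D (in percent) to a p-value cutoff, 1 - D/100."""
    # Rounding removes floating-point noise such as 0.050000000000000044.
    return round(1 - degree_of_support_pct / 100, 10)

def decide(p_value, degree_of_support_pct):
    """Reject if the p-value is below the cutoff; fail to reject otherwise."""
    cutoff = cutoff_for(degree_of_support_pct)
    return "reject" if p_value < cutoff else "fail to reject"

p_value = 0.03  # hypothetical p-value from some test
decisions = {d: decide(p_value, d) for d in (90, 95, 99)}
for d, decision in decisions.items():
    print(f"{d}% degree of support (cutoff {cutoff_for(d)}): {decision}")
```

The same p-value leads to a rejection at 90% and 95% but not at 99%, which shows that a higher degree of support demands stronger evidence before rejecting.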


The Interplay Between Deductive and Inductive Reasoning in Active Predictions

The underlying reasoning for active predictions:

Forming the prediction uses deductive reasoning

Assume the causal relationship, which then implies the prediction

Estimating the causal relationship uses deductive and inductive reasoning

Deductive reasoning: Make assumptions that imply causality between X and Y and the distribution of an estimator for the magnitude of this causality in the population

Inductive reasoning: Using an observed data sample, build a confidence interval and/or determine whether to reject a null hypothesis for the magnitude of the population-level causality

