One Sample T test and Paired T test

Learning objectives

Understand the t distribution and its degrees of freedom

Apply descriptive analysis to a continuous outcome

Conduct a one-sample t test and a paired t test

Interpret the results of a one-sample t test and a paired t test and draw inferences

t distribution and degrees of freedom

Sampling distribution of the standardized sample mean when the standard deviation is estimated from the sample

Similar in shape to a normal distribution

Mean and median at the center, at the peak

Thicker tails than the normal distribution

As the sample size increases, the t distribution gets closer to the normal distribution

Degrees of freedom: the number of independent pieces of information available to estimate parameters and variability

http://www.countbio.com/web_pages/left_object/R_for_biology/R_biostatistics_part-1/t_distribution.html

When the statistic is a sample mean and the population standard deviation is estimated by the sample standard deviation, the standardized statistic follows a t distribution.

The t distribution is very similar to the normal distribution in terms of shape.

The mean is in the center, but the tails are thicker than those of the normal distribution.

The t distribution is also characterized by its degrees of freedom.

Degrees of freedom are the independent pieces of information available to estimate parameters and variability.

For example, suppose n subjects are recruited in a random sample. These subjects are independent, so n pieces of independent information are available.

Once the mean has been estimated, n-1 pieces of independent information, or degrees of freedom, are left over to estimate the variance.
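The idea that estimating the mean uses up one piece of information can be illustrated with a small sketch. The five values below are hypothetical and chosen only for illustration. Once the mean is known, the deviations from it must sum to zero, so the last deviation is fixed by the other n-1.

```python
# Hypothetical sample of n = 5 measurements (illustrative values only).
sample = [4.0, 7.0, 6.0, 3.0, 10.0]
n = len(sample)
mean = sum(sample) / n

# Deviations from the sample mean always sum to zero.
deviations = [x - mean for x in sample]

# So the last deviation can be recovered from the first n - 1 of them:
# only n - 1 deviations are free to vary.
last_from_others = -sum(deviations[:-1])

# The sample variance therefore divides by n - 1, not n.
variance = sum(d ** 2 for d in deviations) / (n - 1)

print("sum of deviations:", sum(deviations))
print("last deviation:", deviations[-1], "recovered:", last_from_others)
print("sample variance:", variance)
```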

One Sample T test: Historical control

Hypothesis

Research Hypothesis: the true mean is not equal to the historical control

Null hypothesis: the true mean is equal to the historical control

If we have a continuous variable and only one group of subjects, and we want to compare the estimated population mean to a fixed number (a historical control), we can use a one-sample t test.

For example, if we know the estimated average weight of Americans 10 years ago, we may want to see whether the current average weight of Americans is different from or equal to that estimate.

In that case we can use a one-sample t test.

A hypothesis test involving the mean from only one group is called a one-sample t test.

In a hypothesis test, we assume the null hypothesis is true; that is, we treat the null parameter as the true parameter.

When the hypothesis involves the mean, the null parameter is at the center of the sampling distribution.

Statistics that are close to the center of the sampling distribution are common; they happen often.

Statistics that are far away from the center (the null parameter) are rare.

If the statistic is common, it does not provide evidence against the null parameter.

If the statistic is rare, it does provide evidence against the null parameter.

Recall that we use the p value, the probability (proportion) of values at least as extreme as the observed statistic when the null hypothesis is true, to measure whether the statistic is common or rare.

Two sided (Tailed) Test, Critical value, Significance level

(Source: https://statisticsbyjim.com/hypothesis-testing/one-tailed-two-tailed-hypothesis-tests/)

Most hypothesis tests are two-sided. This means we test only whether the mean is not equal to a given number; we are not testing specifically whether it is greater or smaller.

This is also called a two-tailed test.

In our hypothesis from last slide:

Research Hypothesis: the true mean is not equal to the historical control

Null hypothesis: the true mean is equal to the historical control

When we perform a two-sided test, we split the significance level between both tails.

Last class, we talked about the significance level. We defined the significance level as the chance of a type I error, and we usually use 0.05.

In a two-sided test, we split 0.05 between the two tails, so there is 0.025 in each tail (figure above, on the left).

The t values that cut off the significance level (0.025 in each tail) are called critical values.

For t distributions, the critical values depend on the degrees of freedom.

In each hypothesis test, we compare the absolute value of the test statistic to the critical value to see whether the p value is less than, equal to, or greater than 0.05.

Of course, we can also calculate the exact p value with an online t calculator or a SAS function.

In one-sided (right-sided) hypothesis testing (the figure on the right), an example hypothesis is as follows:

Research Hypothesis: the true mean is greater than the historical control

Null hypothesis: the true mean is less than or equal to the historical control

The left-sided hypothesis test is similar:

Research Hypothesis: the true mean is less than the historical control

Null hypothesis: the true mean is equal to or greater than the historical control
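To make the link between critical values, tail areas and p values concrete, here is a minimal sketch that integrates the t density numerically using only the Python standard library. The helper names `t_pdf` and `upper_tail` are made up for this illustration. The critical value 2.093 is the usual t-table entry for 19 degrees of freedom, and 0.22 with 19 degrees of freedom is just an illustrative test statistic. In practice, SAS or another statistical package would be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the distribution lies above 0; subtract the area on [0, t].
    return 0.5 - total * h / 3

df = 19
critical = 2.093                      # two-sided 5% critical value, 19 df
one_tail = upper_tail(critical, df)   # area in the right tail only
two_tail = 2 * one_tail               # area in both tails together

# Two-sided p value for an illustrative test statistic t = 0.22
p_value = 2 * upper_tail(0.22, df)

print(round(one_tail, 3), round(two_tail, 3), round(p_value, 3))
```

A right-sided test would use `upper_tail(t, df)` on its own as the p value, while the two-sided test doubles it.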

Test statistic and degrees of freedom for the one-sample t test

Hypothesis:

Research Hypothesis: the true mean is not equal to the historical control

Null hypothesis: the true mean is equal to the historical control

Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$

$\bar{x}$ = the sample mean

$\mu_0$ = the hypothesized population mean (the historical control)

s = the standard deviation of the sample

n = the sample size

Degrees of freedom: n-1

One-sample confidence intervals (CI) for the mean

Steps to construct the confidence interval

1. Calculate or identify the sample mean ($\bar{x}$)

2. Calculate or identify the standard error (SE)

SE = $s / \sqrt{n}$; s = sample standard deviation; n = sample size

3. Decide the confidence level

4. Construct the confidence interval based on the confidence level

90% CI: $\bar{x} \pm$ (t value × SE)

95% CI: $\bar{x} \pm$ (t value × SE)

99% CI: $\bar{x} \pm$ (t value × SE)

The t value is determined by the confidence level and the degrees of freedom
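The four steps can be sketched in a few lines of Python. The summary statistics below are hypothetical, and the t values are the standard two-sided table entries for 19 degrees of freedom. The sketch shows that the same SE gives a wider interval as the confidence level goes up.

```python
import math

# Hypothetical summary statistics (illustrative values only).
sample_mean = 50.0
sample_sd = 8.0
n = 20

# Step 2: standard error of the mean
se = sample_sd / math.sqrt(n)

# Step 3: t values for n - 1 = 19 degrees of freedom (standard t table)
t_values = {"90%": 1.729, "95%": 2.093, "99%": 2.861}

# Step 4: mean plus or minus (t value * SE) for each confidence level
intervals = {}
for level, t_value in t_values.items():
    margin = t_value * se
    intervals[level] = (sample_mean - margin, sample_mean + margin)
    low, high = intervals[level]
    print(level, "CI:", round(low, 2), "to", round(high, 2))
```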

Example of one sample T test

A random sample of 20 subjects' hemoglobin levels (g/dL) was drawn from the NHANES 15-16 cycle:

15.2, 11.5, 10.8, 14.6, 17.5, 12.4, 12.8, 13.7, 16.1, 9.1

13.4, 16.4, 12.9, 12.5, 13.7, 11.9, 16, 14.4 , 12.8, 14.2

Conduct a one-sample t test to investigate whether the average hemoglobin level is equal to the historical control of 13.5 g/dL

$\bar{x}$ = (15.2 + 11.5 + 10.8 + 14.6 + 17.5 + 12.4 + 12.8 + 13.7 + 16.1 + 9.1 + 13.4 + 16.4 + 12.9 + 12.5 + 13.7 + 11.9 + 16 + 14.4 + 12.8 + 14.2) / 20 = 13.60 g/dL

Compared with the historical control of 13.5 g/dL, the sample mean is slightly higher.

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ = ((15.2-13.60)^2 + (11.5-13.60)^2 + … + (14.4-13.60)^2 + (12.8-13.60)^2 + (14.2-13.60)^2) / (20-1) = 4.18

$s = \sqrt{s^2} = \sqrt{4.18}$ = 2.05

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ = (13.60 - 13.5) / (2.05 / $\sqrt{20}$) = 0.22

We use 0.05 as the significance level.

We can use SAS or an online t calculator to find the critical value.

The critical value for the t distribution with 19 degrees of freedom is 2.09.

Compare the t value (0.22) to the critical value (2.09). The t value is less than the critical value, so the p value should be greater than 0.05.

Using SAS, we find that the p value is 0.828.

SAS code :

data t2 ;

x= (1-CDF('T', 0.22, 19))*2;

run;

So we fail to reject the null hypothesis, and we conclude that there is no evidence that the average hemoglobin level differs from the historical control.

We can also construct the 95% CI for the mean:

1) The mean is 13.60

2) SE = $s / \sqrt{n}$ = 0.46

3) t value for the 95% confidence level = critical value = 2.09

data T1 ;

x=TINV(0.975,19);

run;

4) So the 95% CI is 13.60 ± 2.09 × 0.46 = 13.60 ± 0.96, or about (12.64 g/dL, 14.56 g/dL)

We can see that the historical control of 13.5 falls inside the 95% CI, so the CI is another way to check whether the result is statistically significant.

If the CI covers the historical value, the result is not significant; if it does not cover it, the result is significant.
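As a cross-check of the hand calculation, the same steps can be run on the 20 hemoglobin values in a short Python sketch using only the standard library. The critical value 2.093 is the t-table entry for 19 degrees of freedom. Because the script does not round intermediate values, its results can differ slightly in the second decimal from figures worked out by hand.

```python
import math

# The 20 hemoglobin values (g/dL) from the example
hemoglobin = [15.2, 11.5, 10.8, 14.6, 17.5, 12.4, 12.8, 13.7, 16.1, 9.1,
              13.4, 16.4, 12.9, 12.5, 13.7, 11.9, 16, 14.4, 12.8, 14.2]
mu0 = 13.5           # historical control
t_critical = 2.093   # two-sided 5% critical value for 19 df (t table)

n = len(hemoglobin)
mean = sum(hemoglobin) / n
variance = sum((x - mean) ** 2 for x in hemoglobin) / (n - 1)
sd = math.sqrt(variance)
se = sd / math.sqrt(n)

# One-sample t statistic with n - 1 degrees of freedom
t_stat = (mean - mu0) / se

# 95% confidence interval for the mean
ci_low = mean - t_critical * se
ci_high = mean + t_critical * se

# Two-sided decision at the 0.05 level
reject_null = abs(t_stat) > t_critical

print("mean:", round(mean, 2), "variance:", round(variance, 2))
print("t:", round(t_stat, 2), "df:", n - 1)
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
print("reject null:", reject_null)
```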

Example of the average hemoglobin level (g/dL) in NHANES 15-16 cycle

Conduct a one-sample t test to investigate whether the average hemoglobin level is equal to the historical control of 10.5 g/dL

[Figure annotations: the estimated mean; 95% CI of the mean]

First, the point estimate is $\bar{x}$ = 13.63 g/dL

Compared with the historical control of 10.5 g/dL, the sample mean is higher.

s = 1.49

$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ = (13.63 - 10.5) / ($s / \sqrt{n}$) = 189.65. We also check whether the distribution of the data looks like a t distribution or a normal distribution.

We use 0.05 as the significance level.

We can use SAS or an online t calculator to find the critical value.

data T1 ;

y=TINV(0.975,8116);

run;

The critical value for the t distribution with 8117 - 1 = 8116 degrees of freedom is 1.96.

Compare the t value (189.65) to the critical value (1.96). The t value is greater than the critical value, so the p value should be less than 0.05.

Using SAS, we find that the p value is 0 (note: it is not actually 0, just extremely small).

SAS code :

data t2 ;

x= (1-CDF('T', 189.65, 8116))*2;

run;

So we reject the null hypothesis, and we conclude that the average hemoglobin level is different from the historical control.

We can also construct the 95% CI for the mean:

1) The mean is 13.63

2) SE = $s / \sqrt{n}$ = 1.49 / $\sqrt{8117}$ = 0.0165

3) t value for the 95% confidence level = critical value = 1.96

data T1 ;

x=TINV(0.975,8116);

run;

4) So the 95% CI is 13.63 ± 1.96 × 0.0165 = (13.60 g/dL, 13.66 g/dL)

We can see that the historical control of 10.5 does not fall inside the 95% CI, so the CI is another way to check whether the result is statistically significant.

If the CI covers the historical value, the result is not significant; if it does not cover it, the result is significant.

So we know the result is significant, since the CI does not cover the historical value.

Conclusion: the p value from the t test is less than 0.0001, so we reject the null hypothesis and conclude that the true mean is not equal to 10.5 g/dL.

The average hemoglobin level is estimated to be 13.63 g/dL, with a 95% CI of (13.60 g/dL, 13.66 g/dL).
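When only summary statistics are available, the same test can be sketched directly from the mean, standard deviation and sample size. The Python sketch below uses the rounded values quoted above (13.63, 1.49, n = 8117), so its t statistic can differ a little from one computed by software on the unrounded data.

```python
import math

# Rounded summary statistics quoted in the example
sample_mean = 13.63   # g/dL
sample_sd = 1.49      # g/dL
n = 8117
mu0 = 10.5            # historical control, g/dL
t_critical = 1.96     # two-sided 5% critical value for 8116 df

se = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / se

# 95% confidence interval for the mean
ci_low = sample_mean - t_critical * se
ci_high = sample_mean + t_critical * se

# The CI check: does the interval cover the historical control?
covers_control = ci_low <= mu0 <= ci_high

print("SE:", round(se, 4), "t:", round(t_stat, 2))
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
print("CI covers historical control:", covers_control)
```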

Paired T test

Research questions that involve a change or a difference within subjects

For example:

1) Pre and post weight in a weight loss program

2) Before and after hemoglobin values in a pregnancy study

First, compute the paired difference, which is the difference between a pair of measurements.

Then test whether the mean difference is equal to 0

A paired t test is used when each pair (pre and post measurements) is taken on the same subject.

The goal of a paired difference analysis is to determine whether the mean difference is 0

Hypothesis for paired T test

Research Hypothesis: the true mean of the paired difference is not 0

Null hypothesis: the true mean of the paired difference is 0.

t test statistic: $t = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$

$\bar{d}$ = the sample mean of the paired differences

$s_d$ = the sample standard deviation of the paired differences

n = the sample size (number of pairs)

Example of Paired T test

Ten people participated in a weight loss program.

Their weight in kg was measured before (pre) and after (post) the program. The data are listed below.

Id          1   2   3   4   5   6   7    8    9    10
Pre         55  72  80  65  90  56  100  150  132  170
Post        53  60  75  63  85  53  90   100  120  140
Pre - Post  2   12  5   2   5   3   10   50   12   30

First, calculate each paired difference as the pre value minus the post value.

$\bar{d}$ = (2 + 12 + 5 + 2 + 5 + 3 + 10 + 50 + 12 + 30) / 10 = 13.10 kg

Compared with the null value of 0, the sample mean difference is well above 0.

$s_d^2 = \dfrac{\sum (d_i - \bar{d})^2}{n - 1}$ = ((2-13.10)^2 + (12-13.10)^2 + … + (12-13.10)^2 + (30-13.10)^2) / (10-1) = 237.66

$s_d = \sqrt{s_d^2} = \sqrt{237.66}$ = 15.4161

$t = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$ = (13.10 - 0) / (15.4161 / $\sqrt{10}$) = 2.69

We use 0.05 as the significance level.

We can use SAS or an online t calculator to find the critical value.

The critical value for the t distribution with 9 degrees of freedom is 2.26.

data T1 ;

y=TINV(0.975,9);

run;

Compare the t value (2.69) to the critical value (2.26). The t value is greater than the critical value, so the p value should be less than 0.05.

Using SAS, we find that the p value is 0.0248.

SAS code :

data t2 ;

x= (1-CDF('T', 2.69, 9))*2;

run;

So we reject the null hypothesis, and we conclude that the average weight difference is different from 0.

We can also construct the 95% CI for the mean difference:

1) The mean difference is 13.10

2) SE = $s_d / \sqrt{n}$ = 15.4161 / $\sqrt{10}$ = 4.87

3) t value for the 95% confidence level = critical value = 2.26

data T1 ;

x=TINV(0.975,9);

run;

4) So the 95% CI is 13.10 ± 2.26 × 4.87, which gives (2.07 kg, 24.13 kg) when unrounded values are used

Conclusion: the p value from the t test is 0.0248, so we reject the null hypothesis and conclude that the true mean difference is not equal to 0.

The average weight loss is estimated to be 13.10 kg, with a 95% CI of (2.07 kg, 24.13 kg).
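The paired analysis is a one-sample t test on the differences, which the following Python sketch makes explicit using the ten pre and post weights from the table. The critical value 2.262 is the t-table entry for 9 degrees of freedom; the variable names are chosen only for this illustration.

```python
import math

# Pre and post weights (kg) for the 10 participants
pre = [55, 72, 80, 65, 90, 56, 100, 150, 132, 170]
post = [53, 60, 75, 63, 85, 53, 90, 100, 120, 140]
t_critical = 2.262   # two-sided 5% critical value for 9 df (t table)

# Step 1: paired differences (pre minus post)
diffs = [a - b for a, b in zip(pre, post)]
n = len(diffs)

# Step 2: mean and standard deviation of the differences
mean_diff = sum(diffs) / n
sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))

# Step 3: t statistic for the null hypothesis "mean difference = 0"
se = sd_diff / math.sqrt(n)
t_stat = (mean_diff - 0) / se

# Step 4: 95% confidence interval for the mean difference
ci_low = mean_diff - t_critical * se
ci_high = mean_diff + t_critical * se

print("differences:", diffs)
print("mean:", round(mean_diff, 2), "sd:", round(sd_diff, 4))
print("t:", round(t_stat, 2), "df:", n - 1)
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
```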
