Reflection Paper
A Practical Approach to Analyzing Healthcare Data, Fourth Edition: Chapter 5, Analyzing Continuous Variables
Susan White, PhD, RHIA, CHDA
ahima.org
© 2019 AHIMA
Learning Objectives
Compare and contrast commonly used measures of central tendency
Compare and contrast commonly used measures of variation or spread of values
Illustrate appropriate inferential statistics to use for continuous data
Continuous Variables
Data elements that represent naturally numeric values and can, in principle, take an infinite number of values
Interval (no true zero)
Ratio (true zero)
Healthcare Examples
Length of stay
Charge
Systolic blood pressure
Age
Time to code records
Descriptive Statistics: Measures of Central Tendency
Mean
Arithmetic average
Sum of values divided by the number of values
Median
Middle value
If there is an even number of values, the average of the two middle values
Less influenced by extreme values or outliers than the mean
Mode
Most frequent value
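A quick sketch of why the median is less influenced by extreme values than the mean, using Python's standard `statistics` module. The charge values below are hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical charges (dollars); the second list replaces the largest charge with an extreme value.
charges = [4200, 5100, 5300, 6000, 6400]
charges_outlier = [4200, 5100, 5300, 6000, 95000]

for label, data in [("typical", charges), ("with outlier", charges_outlier)]:
    print(f"{label}: mean = {mean(data):,.0f}, median = {median(data):,.0f}")

# With an even number of values, median() averages the two middle values (4 and 6).
print(median([2, 4, 6, 8]))
```

The single extreme value pulls the mean from 5,400 to 23,120, while the median stays at 5,300.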
Descriptive Statistics: Measures of Variation
Range
Maximum value minus minimum value
Interquartile range
Difference between the third and first quartiles
Variance
Average squared deviation from the mean (the sample variance divides the sum of squared deviations by n – 1)
Unit of measure is “squared units”
Standard deviation
Square root of the variance
Unit of measure is the same as that of the original data
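The measures of variation above can be computed with Python's `statistics` module. This sketch uses the LOS sample from this chapter's examples and contrasts the sample variance (divides by n – 1) with the population variance (divides by n). Quartile conventions differ between packages and hand methods; `quantiles()` defaults to its "exclusive" method, so other software may report a slightly different IQR.

```python
from statistics import pvariance, quantiles, stdev, variance

los = [2, 4, 6, 3, 1, 2, 5]  # LOS sample (days) used in this chapter's examples

# quantiles(..., n=4) returns the three cut points [Q1, Q2, Q3];
# the default "exclusive" method is only one of several quartile conventions.
q1, q2, q3 = quantiles(los, n=4)
iqr = q3 - q1                  # interquartile range (days)

s2 = variance(los)             # sample variance, divides by n - 1 (days squared)
sigma2 = pvariance(los)        # population variance, divides by n (days squared)
s = stdev(los)                 # sample standard deviation (days)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"sample variance = {s2:.2f}, population variance = {sigma2:.2f}, s = {s:.2f}")
```

Note that the variance is reported in days squared, while the standard deviation is back in days.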
Descriptive Statistics: Example
Calculate the mean, median and mode of the following sample length of stay data:
2, 4, 6, 3, 1, 2, 5
Mean: $\bar{x} = \frac{2 + 4 + 6 + 3 + 1 + 2 + 5}{7} = \frac{23}{7} \approx 3.3$
Median:
Sort values: 1, 2, 2, 3, 4, 5, 6
Median or middle value = 3
Mode
2, since it is the most frequent value
Note: The mode is rarely used for continuous variables that have many unique values and is presented here for demonstration purposes.
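The hand calculations above can be checked with Python's standard `statistics` module; this is a minimal sketch using the same seven LOS values.

```python
from statistics import mean, median, mode

los = [2, 4, 6, 3, 1, 2, 5]  # sample LOS data (days) from the example

los_mean = mean(los)      # 23 / 7, about 3.29 (reported as 3.3)
los_median = median(los)  # middle of the sorted values 1, 2, 2, 3, 4, 5, 6
los_mode = mode(los)      # most frequent value

print(f"mean = {los_mean:.1f}, median = {los_median}, mode = {los_mode}")
# mean = 3.3, median = 3, mode = 2
```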
Descriptive Statistics: Example
Calculate the range, variance and standard deviation of the following sample length of stay data:
2, 4, 6, 3, 1, 2, 5
Range = 6 – 1 = 5
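Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{19.43}{6} \approx 3.2$
Standard deviation: $s = \sqrt{s^2} = \sqrt{3.24} \approx 1.8$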
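The same steps can be written out explicitly (sum of squared deviations, divide by n – 1, take the square root). This sketch deliberately avoids library helpers so each intermediate value matches the hand calculation.

```python
import math

los = [2, 4, 6, 3, 1, 2, 5]  # sample LOS data (days)
n = len(los)

value_range = max(los) - min(los)               # 6 - 1 = 5
x_bar = sum(los) / n                            # 23 / 7
squared_devs = [(x - x_bar) ** 2 for x in los]  # squared deviation of each value from the mean
ss = sum(squared_devs)                          # sum of squared deviations, about 19.43
s2 = ss / (n - 1)                               # sample variance, about 3.24 (days squared)
s = math.sqrt(s2)                               # standard deviation, about 1.80 (days)

print(f"range = {value_range}, SS = {ss:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
# range = 5, SS = 19.43, s^2 = 3.24, s = 1.80
```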
Review: Hypothesis Testing Steps
Determine the null and alternative hypotheses
Set the acceptable type I error or alpha level
Select the appropriate test statistic
Compare the test statistic to a critical value based on the alpha level and the distribution of the test statistic
Reject the null hypothesis if the test statistic is more extreme than the critical value. If not, do not reject the null hypothesis.
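Step 5 of the procedure can be written as a small helper. `reject_null` is a hypothetical function (not from the text or any library); its `alternative` labels are loosely modeled on the `alternative` argument of R's `t.test()`. The critical value is passed as a positive table value, and for a two-sided test it should be the value for α/2.

```python
def reject_null(test_stat: float, critical_value: float, alternative: str = "two-sided") -> bool:
    """Return True if the test statistic is more extreme than the critical value.

    alternative: "greater" (Ha: parameter > null value),
                 "less" (Ha: parameter < null value), or "two-sided".
    """
    if alternative == "greater":
        return test_stat > critical_value
    if alternative == "less":
        return test_stat < -critical_value
    if alternative == "two-sided":
        return abs(test_stat) > critical_value
    raise ValueError(f"unknown alternative: {alternative!r}")


# Usage: a one-sided upper test with t = 0.44 and critical value 1.943
print(reject_null(0.44, 1.943, "greater"))  # False
```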
Inferential Statistics: One-Sample t-test
One-sample t-test
Used to test whether a population mean is different from a standard or benchmark value
Test statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, where $\mu_0$ is the null hypothesis (benchmark) value
Compare to a t-distribution to determine the critical value
May be one-sided or two-sided
Anatomy of test statistic:
Numerator: distance from sample mean to null hypothesis value
Denominator: standard error of the sample mean (SEM)
Inferential Statistics: One-Sample t-test – Example
Suppose the researcher who collected the length of stay (LOS) data in the previous examples would like to determine whether the population mean LOS is longer than a standard of 3 days.
Step 1: Determine the null and alternative hypotheses
Ho: µ ≤ 3
Ha: µ > 3
Step 2: Set the acceptable type I error rate (also called the alpha level).
Set α = 0.05
Step 3: Select the appropriate test statistic: one-sample t-test
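Inferential Statistics: One-Sample t-test – Example
Step 3 (cont.)
Recall from previous slides:
$\bar{x} = 3.3$
s = 1.8
n = 7
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{3.3 - 3}{1.8 / \sqrt{7}} \approx 0.44$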
Step 4: Compare test statistic to critical value.
The critical value for the t-test statistic comes from the t-distribution with n – 1 degrees of freedom
The t-distribution is symmetric around zero, much like the standard normal (bell curve); its spread depends on the degrees of freedom, with heavier tails when there are fewer degrees of freedom (see Figure 5.1 in the text)
Inferential Statistics: One-Sample t-test – Example
Step 4 (cont.): t = 0.44; df = n – 1 = 7 – 1 = 6; for a one-sided test at α = 0.05, the critical value = 1.943
Step 5: Reject the null hypothesis if the test statistic is more extreme than the critical value. Because 0.44 is not greater than 1.943, do not reject the null hypothesis: there is not sufficient evidence to conclude that the population mean LOS is longer than the 3-day standard.
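The example can be reproduced in a few lines. `one_sample_t` is a hypothetical helper name; the critical value 1.943 is the t-table entry quoted above (one-sided, α = 0.05, df = 6) rather than something computed here, and the inputs are the rounded summary values from the slides.

```python
import math


def one_sample_t(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n))."""
    sem = s / math.sqrt(n)  # standard error of the sample mean
    return (x_bar - mu0) / sem


t = one_sample_t(x_bar=3.3, mu0=3, s=1.8, n=7)
df = 7 - 1
critical = 1.943  # one-sided, alpha = 0.05, df = 6 (t-table value)

print(f"t = {t:.2f}, df = {df}, reject H0: {t > critical}")
# t = 0.44, df = 6, reject H0: False
```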
Inferential Statistics: Confidence Interval for Population Mean
Recall that a confidence interval is a range of values that is likely to cover the true population value with a pre-defined probability or level of confidence
A (1 – α)×100% confidence interval for the population mean is centered at the sample mean, and its width depends on the confidence level and the standard error of the mean: $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$
A higher level of confidence requires a wider interval
A larger sample size results in a narrower interval
The width of the confidence interval measures the precision of the sample mean as an estimate of the population mean
A narrower interval is more precise
Inferential Statistics: Confidence Interval for Population Mean
Formulate a 95% confidence interval for the LOS data presented in the previous example:
$\bar{x} = 3.3$
s = 1.8
n = 7
95% CI, so α = 0.05; α/2 = 0.025
df = 6
Critical value (Table 5.1) = 2.447
95% CI: $3.3 \pm 2.447 \cdot \frac{1.8}{\sqrt{7}} = 3.3 \pm 1.7 = (1.6,\ 5.0)$
We are 95% confident that the interval from 1.6 to 5.0 days includes the true population mean LOS.
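A minimal sketch of the interval calculation, which also shows that a higher confidence level gives a wider interval. `t_interval` is a hypothetical helper; both critical values are standard t-table entries for df = 6 (2.447 for α/2 = 0.025, 3.707 for α/2 = 0.005), hard-coded rather than computed.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval x_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Rounded LOS summary values from the example
ci95 = t_interval(3.3, 1.8, 7, 2.447)  # 95% CI, alpha/2 = 0.025, df = 6
ci99 = t_interval(3.3, 1.8, 7, 3.707)  # 99% CI, alpha/2 = 0.005, df = 6

print("95% CI: ({:.1f}, {:.1f})".format(*ci95))  # 95% CI: (1.6, 5.0)
print("99% CI: ({:.1f}, {:.1f})".format(*ci99))  # 99% CI: (0.8, 5.8)
```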
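Inferential Statistics: Paired t-test
Paired t-test
Used to compare pre/post measurements on the same subjects, or matched pairs
Test statistic: $t = \frac{\bar{d} - D_0}{s_d / \sqrt{n}}$
where d = the difference between the pre/post values or the pairs, $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, n is the number of pairs, and $D_0$ is the null hypothesis value
Compare to a t-distribution with n – 1 degrees of freedom to determine the critical value
May be one-sided or two-sided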
Anatomy of test statistic:
Numerator: distance from sample mean difference to null hypothesis value (usually zero)
Denominator: standard error of the sample mean difference (SEM)
Inferential Statistics: Paired t-test – Example
The transition from ICD-9 to ICD-10 is predicted to increase the time required to code medical records. A pilot study was conducted using a random sample of 10 records to determine whether the time required was significantly different. Each record was coded in both coding systems by one coder. The values are recorded in the table.
Step 1: Determine the null and alternative hypotheses:
Ho: D = 0
Ha: D ≠ 0
Step 2: Set the alpha level: 0.01
| ID | ICD-9 Time | ICD-10 Time | d (ICD-10 – ICD-9) |
|---|---|---|---|
| 1 | 10 | 15 | 5 |
| 2 | 11 | 12 | 1 |
| 3 | 15 | 10 | -5 |
| 4 | 30 | 36 | 6 |
| 5 | 5 | 7 | 2 |
| 6 | 10 | 13 | 3 |
| 7 | 8 | 5 | -3 |
| 8 | 11 | 19 | 8 |
| 9 | 21 | 19 | -2 |
| 10 | 18 | 23 | 5 |
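Inferential Statistics: Paired t-test – Example
Step 3: Select the appropriate test statistic: paired t-test, $t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$
Step 4: Compare the test statistic to the critical value
$\bar{d} = 2.0$, $s_d = 4.24$, n = 10, so $t = \frac{2.0}{4.24 / \sqrt{10}} \approx 1.49$
Compare to the t-distribution with df = 9 and α/2 = 0.005 (two-sided test): critical value = 3.25
1.49 is not greater than 3.25
Step 5: Do not reject Ho; at α = 0.01 there is not sufficient evidence that the time required to code differs between ICD-9 and ICD-10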
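The paired calculation can be sketched directly from the table: compute each difference, then apply the one-sample formula to the differences. The critical value 3.25 is the t-table entry quoted above (df = 9, α/2 = 0.005), hard-coded rather than computed.

```python
import math
from statistics import mean, stdev

icd9 = [10, 11, 15, 30, 5, 10, 8, 11, 21, 18]    # ICD-9 coding times from the table
icd10 = [15, 12, 10, 36, 7, 13, 5, 19, 19, 23]   # ICD-10 coding times from the table

d = [b - a for a, b in zip(icd9, icd10)]  # d = ICD-10 time - ICD-9 time
n = len(d)
d_bar = mean(d)                           # mean difference
s_d = stdev(d)                            # standard deviation of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))    # H0: D = 0
critical = 3.25                           # two-sided, alpha = 0.01, df = 9 (t-table value)

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {abs(t) > critical}")
# d_bar = 2.0, s_d = 4.24, t = 1.49, reject H0: False
```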
Inferential Statistics: Two-Sample t-test
Used to test whether two population means are different
The test statistic is more complex:
The denominator is the standard error pooled across the two samples
Use statistical software to calculate it
Compare to a t-distribution to determine the critical value
May be one-sided or two-sided
Anatomy of test statistic:
Numerator: distance between the two sample means
Denominator: pooled standard error of the difference between the two sample means
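A minimal sketch of the pooled (equal-variance) two-sample t statistic. `pooled_t` is a hypothetical helper and the data are made up for illustration (not the Chapter 5 CHF charges). Note that R's `t.test()` uses Welch's unequal-variance version by default; `var.equal = TRUE` requests the pooled version sketched here.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t statistic using a pooled (equal-variance) standard error.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    # Pooled standard error of the difference between the two sample means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


# Hypothetical data for illustration only
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")  # t = -2.00, df = 8
```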
Inferential Statistics: Two-Sample t-test – Example
An analyst wanted to find out whether the charges for CHF patients admitted through the emergency department (ED) are different from those for patients admitted through other sources. The data for the sample may be found in the Chapter 5 Data file. The summary statistics from the samples appear in the table below:
Step 1: State hypotheses:
Ho: µ1= µ2
Ha: µ1 ≠ µ2
Step 2: Set the alpha level: α = 0.05
Step 3: Determine the test statistic: two-sample t-test
Inferential Statistics: Two-Sample t-test – Results from R
Step 4: Compare the test statistic to the critical value
t = 2.2363 with a p-value = 0.03189
Step 5: Reject the null hypothesis if the test statistic is more extreme than the critical value, or equivalently if the p-value is less than alpha
Reject the null hypothesis (p = 0.03 < 0.05) and conclude that mean charges for CHF patients admitted through the emergency department differ from those for patients admitted through other sources
Inferential Statistics: ANOVA
Used to test whether more than two population means are different
Test statistic: F-test
Best to use software to compute
Compare to an F-distribution to determine critical value
Anatomy of test statistic:
Numerator: variance between comparison groups
Denominator: variance within comparison groups
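Inferential Statistics: ANOVA
| Source | Sum of Squares | Degrees of Freedom | Mean Squares | Test statistic: F |
|---|---|---|---|---|
| Between groups | SSB | k – 1 | MSB = SSB / (k – 1) | F = MSB / MSW |
| Within groups | SSW | N – k | MSW = SSW / (N – k) | |
| Total | SST = SSB + SSW | N – 1 | | |
where k = number of groups and N = total number of observations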
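A minimal sketch of how the sums of squares, degrees of freedom, mean squares, and F statistic of a one-way ANOVA are computed. `one_way_anova` is a hypothetical helper and the three groups are made-up values (not the MS-DRG data); in practice, statistical software also reports the p-value from the F-distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA for a list of samples; returns (SSB, SSW, df_between, df_within, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n_total = len(groups), len(all_values)

    # Between groups: squared distance of each group mean from the grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared deviation of each value from its own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ssb / df_between) / (ssw / df_within)  # F = MSB / MSW
    return ssb, ssw, df_between, df_within, f_stat


# Hypothetical three-group example
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (54, 6, 2, 6, 27.0)
```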
Inferential Statistics: ANOVA – Example
The Medicare severity-adjusted diagnosis-related group (MS-DRG) system is designed so that the level of resources required to treat a patient, as measured by charges per patient, differs across the no complication or comorbidity (no CC/MCC), complication or comorbidity (CC), and major complication or comorbidity (MCC) levels within an MS-DRG family. An analyst was asked to test whether that relationship holds at her facility. A sample of 80 cases was selected from the three congestive heart failure MS-DRGs: 291 (MCC), 292 (CC), and 293 (no CC or MCC). Because three populations of patients are compared, the analyst used R to generate the summary statistics and the ANOVA table below.
Inferential Statistics: ANOVA – Example
Step 1: State the hypotheses
Ho: µ291 = µ292 = µ293
Ha: At least two of the population means are unequal
Step 2: Set the acceptable type I error level: α = 0.05
Step 3: Determine the appropriate test statistic: F-test
Step 4: Compare the p-value to α = 0.05
Step 5: Reject Ho since p < 0.0001 < 0.05, and conclude that mean charges differ for at least two of the three CHF MS-DRGs