Gem24
Biostatistics

DH 242 Dental Public Health

RESEARCH

SCIENTIFIC METHOD: a series of logical steps starting with the formulation of a problem

Formulation of a problem (question)

Formulation of a hypothesis (a proposed answer to the question)

Collecting data (existing data as well as data you gather yourself)

Analysis and interpretation of the results

Presentation of the results

Formulation of a conclusion (relationship of results to hypothesis)

Data

pieces of information

e.g., numbers, collected from measurements and counts obtained during the course of a research study

Collecting Data

Review relevant available literature.

Design the research: determine how the study will be conducted and which data collection methods will be used.

Instruments and examiners for data collection must be calibrated so that the data are both valid and reliable.

Validity: the extent to which the data collected measure what they were intended to measure. Reliability: the consistency and stability of the data. The data are reliable if the examiners are calibrated and can reproduce the results.

STATISTICS AND BIOSTATISTICS

Statistics is the science of making statements about an entire population from a limited sample of that population. It involves analyzing data and drawing conclusions, taking variation and uncertainty into account.

Biostatistics is simply the application of these methods in biologically relevant areas. The appropriate use and interpretation of biostatistical measures and tests are essential to every stage of a dental public health initiative. To define a problem in a community, you first must quantify it using descriptive statistics and measures of disease.

Biostatistics

The use of data analysis and interpretation in health care research

Most often computer programs are used to do all of the computations.

Data Analysis: Two Steps

The first is to calculate descriptive statistics, the characteristics of the data found within the sample of individuals in whom the study was conducted.

The second step is to calculate inferential statistics. The purpose of generating inferential statistics is to determine whether the results found in the sample may be a result of chance or, assuming no other threats to validity, whether we can generalize our results to the general population of interest.

Biostatistics – Data Analysis

Involves the application of statistical tests to the data in order to organize, describe, summarize, and analyze it to answer a research question or test a hypothesis

Explains results, requires that critical thinking be used to explain the meaning and application of the findings, identifies possible factors that could have influenced the results, and draws inferences to the population.

Use of Biostatistics in Dental Hygiene

Used to demonstrate response to dental hygiene therapy

Tests products and treatment regimens used in dental hygiene therapy

Determines the needs of target populations

Evaluates oral health treatment

Prevention of dental disease

Education programs

Variety of other purposes in relation to oral health care

Dental Hygienists Should Understand the Research Process, Including

Data analysis

Interpretation

Critical analysis of results

Dental Hygienists Should Understand the Research Process In Order To:

Understand the epidemiology of disease.

Practice therapies.

Implement programs.

Practice evidence-based dentistry.

Causes of Invalid Research

Insufficient number of subjects

Study duration that is too short

Incorrect measurement instruments

Incorrect procedures

Incorrect statistical tests are used to analyze data

Categorizing Data

Quantitative Data

Qualitative Data

Continuous Variable

Discrete Variable

Categorical Variable

Dichotomous Variable

Nominal Scale

Ordinal Scale

Interval Scale

Ratio Scale

Quantitative Data

Represented by numbers

Expressed as counts, percentages, and means

Qualitative Data

Information that reflects the quality or nature of variables that cannot be expressed numerically

Expressed as outcomes, or states, and can be counted for reporting

Variables can be rank ordered

Types of Data

Ways to categorize data

Continuous (Data) Variable

Not limited to distinct and separate units; can take any value within a range

Expressed as a large or infinite number of measures along a continuum

Can be expressed in fractions or decimals and is considered quantitative

Can be converted into nominal or ordinal scales

Discrete (Data) Variable

Made up of distinct and separate units or categories, but is counted only in whole numbers

Also quantitative because it is represented numerically

Can be converted to nominal or ordinal scales

Categorical (Data) Variable

A variable that has no numeric representation

Dichotomous (Data) Variable

Categorical variable that places subjects into only two groups

Categorical and dichotomous variables are qualitative in nature

Scales of Measurement

Nominal Scale

Organizes data into mutually exclusive categories

Categories have no rank order or value

No numeric relationship between the different classifications

Ordinal Scale

Organizes data into mutually exclusive categories that are rank ordered based on criterion

Difference in ranks is not equal in value

Interval Scale

Characteristics of the ordinal scale plus it has equal distance between any two adjacent units of measurement

No meaningful zero point

Ratio Scale

Scale of measurement that contains all the characteristics of the preceding scales

Has an absolute zero point

Scales of Measurement

Different scales of measurement are used for discrete and continuous data.

Discrete: use nominal and ordinal

Continuous: use interval and ratio

Degrees of Freedom

Also known as “df”

Refers to the number of values or observations that are free to vary when computing a statistic

Typically equals the number of measurements taken, minus one for each group sampled

The number is necessary to interpret inferential statistical tests.

It is based on sample size, so the larger the df, the easier it is to obtain a statistically significant result.

DATA ANALYSIS AND PRESENTATION OF RESULTS

Statistical analysis makes an assumption about a population.

Two types:

Descriptive Statistics

Consists of the procedures that are used to summarize, organize, and describe quantitative data

Described with the use of tables and graphs

Inferential Statistics

Used to make inferences or generalizations about a population based on data taken from a sample of that population

“Making statistical decisions”

Seek to generalize from the sample studied to the actual population

May be either parametric or nonparametric statistical techniques

Based on the assumption that the sample is randomly selected

STATISTICAL SIGNIFICANCE

Indicates whether the results found in an analysis of data are unlikely to have occurred by chance alone and are therefore more plausibly attributable to the independent variable.

May be influenced by a sample size that is too small (fewer than 30): a small sample does not provide enough information to make generalizations about the population.

Confidence Intervals

A confidence interval is a range of values, calculated from sample data, that is used to infer the true value of an unknown population parameter.

Typically, 95% and 99% confidence intervals are used.

The use of a 95% confidence interval is acceptable in oral health research.
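As a rough illustration of the idea, the sketch below computes a confidence interval for a mean using a normal approximation (mean plus or minus z times the standard error). The function name and the probing-depth values are hypothetical, invented for this example; for small samples a t-based multiplier would be more appropriate than z.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) bounds for the population mean (normal approximation)."""
    n = len(values)
    sample_mean = statistics.mean(values)
    # Standard error of the mean: sample SD divided by the square root of n.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # z is about 1.96 for 95% confidence and about 2.58 for 99% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical probing depths in millimetres (illustration only, not real data).
depths = [2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 2.5, 3.0, 3.5, 3.0] * 3

lower_95, upper_95 = mean_confidence_interval(depths, 0.95)
lower_99, upper_99 = mean_confidence_interval(depths, 0.99)
print(f"95% CI: {lower_95:.2f} to {upper_95:.2f}")
print(f"99% CI: {lower_99:.2f} to {upper_99:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence requires a wider range.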

HYPOTHESIS TESTING

The second approach to statistical inference is hypothesis testing.

The goal of hypothesis testing is to judge the evidence for a hypothesis.

Hypothesis testing can be divided into four discrete steps: (i) formally stating the null and alternative hypotheses; (ii) choosing an appropriate statistical test; (iii) conducting the statistical test to obtain a p-value; and (iv) comparing the p-value against a fixed cutoff for statistical significance, α (alpha).

Typically, this value is set to 0.05. Essential to the concept of hypothesis testing is the p-value. The objective of hypothesis testing is to formally weigh the evidence against a null hypothesis.

P-value

The p-value is a probability value.

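It represents the probability of obtaining findings at least as extreme as those observed in the study if the null hypothesis were true, that is, if only chance were at work.

The significance level commonly accepted in oral health research is 0.05: a p-value equal to or smaller than 0.05 is considered statistically significant.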

If larger than 0.05, the results are said to be not statistically significant.
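To make the comparison against 0.05 concrete, here is a minimal sketch that converts a z test statistic into a two-sided p-value. It assumes the test statistic follows a standard normal distribution when the null hypothesis is true; the function name and the z values are chosen only for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z if the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # conventional cutoff for statistical significance
for z in (1.0, 1.96, 3.0):
    p = two_sided_p_value(z)
    verdict = "statistically significant" if p <= alpha else "not statistically significant"
    print(f"z = {z}: p = {p:.3f} ({verdict})")
```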

Power Analysis

Used to determine how many subjects are needed for a research study to detect a statistically significant effect if one exists

Calculated by using a statistical formula

“Power of a study” refers to its ability to detect relationships among variables

Is directly related to the sample size and the precision in planning and conducting the study

Importance of statistical significance: the greater the significance, the more confidently statistical inferences can be made regarding the study population
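One commonly used formula for the number of subjects needed per group when comparing two means is n = 2 × ((z for alpha + z for power) × SD / difference)², using a normal approximation. The sketch below applies it; the function name and the input values are hypothetical, and statistical software may use more exact t-based methods.

```python
import math
from statistics import NormalDist


def subjects_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a difference between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # desired power (1 - beta)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)  # round up to a whole subject


# Hypothetical planning values: detect a 0.5-unit difference when the SD is 1.0.
print(subjects_per_group(difference=0.5, sd=1.0))
# A larger expected difference needs fewer subjects.
print(subjects_per_group(difference=1.0, sd=1.0))
```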

Formulation of a Conclusion/Relationship of Results to the Hypothesis

Determination of whether the study shows significance

The researcher will either accept or reject the null hypothesis. He/she may make an error in this conclusion. There are two types of errors.

Type I error (alpha, α): occurs when the researcher rejects the null hypothesis when it is actually true; the researcher states that a relationship exists between the variables when there is none.

Type II error (beta, β): occurs when the researcher accepts the null hypothesis when it is actually false; the researcher states that no relationship exists between the variables when one actually does.

Null Hypothesis

Is an initial negative statement of belief about the value of a population parameter

Null hypothesis is accepted unless the statistical test indicates it should be rejected

Example:

Two groups do not differ on a variable

Research Hypothesis

Called the alternative or positive hypothesis

Is the logical opposite of the null hypothesis and can indicate a direction of difference

Example:

One brand of sealants does differ from another brand of sealants

Statistical Decision

Made about the null hypothesis based on the results of inferential statistics

Decision to reject or accept the null hypothesis is based on probability at a significance level

Type I Error

A type I error is also called an alpha error.

It occurs when the null hypothesis is rejected and is actually true so it should have been accepted.

The probability of committing a type I error is equal to the alpha level.

Researchers can control a type I error by setting the alpha level low.

This type of error can be very costly.
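The link between the alpha level and type I errors can be checked with a small simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis is true (mean 0, SD 1) and counts how often a z-test wrongly rejects it. All names and settings (number of trials, sample size, random seed) are assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Share of studies that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(trials):
        # The null hypothesis is true here: the population mean really is 0.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1 / math.sqrt(n))  # known population SD of 1
        if abs(z) > critical_z:
            false_positives += 1  # rejected a true null: a type I error
    return false_positives / trials


rate = type_one_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen alpha of 0.05, and lowering alpha lowers it.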

Type II Error

A type II error is also called a beta error.

It occurs when the null hypothesis is accepted, but it is actually false, so it should have been rejected.

The exact probability of committing a type II error is generally unknown.

Type II errors are caused by using too small a sample, unreliable measuring devices, or imprecise research methods.

Parametric Inferential Statistics

Used for hypothesis testing when the data meet certain assumptions

Data must be classified as continuous (ratio and interval data, and ordinal data that are treated as continuous)

Types of parametric statistics:

Student t-test

Analysis of variance (ANOVA)

Student t-test

Used to compare two mean scores to determine if there is a statistically significant difference

Two types of t-tests:

T-test for independent samples (nonpaired t-test)

T-test for correlated samples (t-test for paired samples)
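The two kinds of t-test can be sketched as follows. The independent-samples version shown here uses the pooled-variance form, which assumes similar variances in both groups; the paired version works on the differences within each pair. Function names and the plaque scores are hypothetical. The code returns only the t statistic and the degrees of freedom, which would then be compared with a t table or passed to statistical software to obtain a p-value.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # one degree of freedom lost per group
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df


def paired_t(before, after):
    """t statistic and degrees of freedom for paired (correlated) measurements."""
    differences = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical plaque scores (illustration only).
brand_a = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5]
brand_b = [1.6, 1.9, 1.5, 2.0, 1.8, 1.7]
print(independent_t(brand_a, brand_b))

baseline = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5]
follow_up = [1.8, 2.0, 1.9, 2.2, 2.0, 2.1]
print(paired_t(baseline, follow_up))
```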

ANOVA

Used to determine if statistically significant differences occur when comparing more than two mean scores

Data are presented in complex tables

Nonparametric Inferential Statistics

Most useful for data measured at the nominal or ordinal scale, which are qualitative

Nonparametric tests involve fewer assumptions about the population.

Sample size may be small; variables are discrete

Chi-Squared Test: most commonly used; used to analyze questionnaire data and to determine whether a relationship exists between two variables
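As an illustration of how the chi-squared statistic is obtained, the sketch below compares observed counts in a table with the counts expected if the two variables were unrelated. The function name and the survey counts are hypothetical. For a 2 × 2 table (1 degree of freedom) the critical value at the 0.05 level is 3.841.

```python
def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical questionnaire counts: rows = flosses daily (yes, no),
# columns = gingival bleeding (no, yes).
survey = [[30, 20],
          [20, 30]]
statistic, df = chi_squared(survey)
print(statistic, df)
print("significant at 0.05" if statistic > 3.841 else "not significant at 0.05")
```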

Measures of Central Tendency

Mean: average; used with ratio or interval data, and with ordinal data that are treated as continuous

Median: midpoint of the data; used with ratio, interval, or ordinal data

Mode: value that occurs most often; used with all types of data

MEASURES OF CENTRAL TENDENCY

MEAN

The average of the group; sum of all the values divided by the number (n) of items

Disadvantage: extreme scores may distort the true average or representation

MEDIAN

The exact middle score in an ordered distribution of scores; the point above and below which 50% of the scores lie

Disadvantage: may not reflect a true midpoint if scores are not evenly distributed

MODE

The score that appears most frequently in a distribution of scores; may be unimodal, bimodal, multimodal, or no mode
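The three measures can be computed with Python's standard statistics module. The scores below are hypothetical and include one extreme value to show how it pulls the mean away from the median.

```python
import statistics

# Hypothetical scores with one extreme value (20).
scores = [1, 2, 2, 3, 3, 3, 4, 20]

mean_score = statistics.mean(scores)      # sum of values divided by n
median_score = statistics.median(scores)  # middle of the ordered scores
modes = statistics.multimode(scores)      # every most-frequent value

print("mean:", mean_score)
print("median:", median_score)
print("mode(s):", modes)
```

Here the extreme score raises the mean above the median, while the median and mode are unaffected by it.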

Measures of Dispersion

Communicates how much variation is present in a group of data

“Measure of variability”

Describes distribution of data within a research study

Range

Variance

Standard deviation

Measures of Dispersion – Range

Determined by subtracting the lowest score from the highest score

Simplest and least helpful measurement

Usually reported with the median

Measures of Dispersion – Variance and Standard Deviation

Variance represents the average squared distance of each score from the mean

Standard deviation (SD) is the square root of the variance, expressed in the same units as the scores

The most common and useful measures of dispersion

Usually reported with the mean to calculate data intervals

Value of the variance or the SD in relation to the mean depicts the distribution of scores
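A short sketch of the three measures of dispersion, using hypothetical scores. It uses the sample formulas (dividing by n minus 1), which is an assumption about the intended calculation; statistics.pvariance and statistics.pstdev give the population versions.

```python
import statistics

# Hypothetical scores (illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest score minus lowest score
variance = statistics.variance(scores)   # sample variance (divides by n - 1)
sd = statistics.stdev(scores)            # square root of the sample variance

print("range:", score_range)
print("variance:", round(variance, 3))
print("SD:", round(sd, 3))
```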

The Normal Distribution

Forms the theoretical foundation for comparisons and making statistical decisions

A symmetrical, unimodal, bell-shaped curve

Many random variables tend to be approximately normally distributed

Mean, median, and mode equal in value

The Normal Distribution – Empirical Rule

Provides an estimation of the spread of data given the mean and the standard deviation of a data set that follows a normal distribution

Approximately 68% of data fall within one SD of the mean, 95% within two SD, and 99.7% within three SD

FIGURE 18-1 Standard Normal Distribution

FIGURE 18-2 Empirical Rule of the Normal Distribution Source: Darby ML, Bowen DM. Research Methods for Oral Health Professionals: An Introduction. St. Louis, MO: C. V. Mosby, 1980. Reprint. Pocatello, ID: McCann; 1993.
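The percentages in the empirical rule can be reproduced from the normal distribution itself. This minimal sketch uses Python's NormalDist; the function name is hypothetical.

```python
from statistics import NormalDist


def share_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The result does not depend on which mean and SD are chosen, so the rule applies to any normally distributed data set.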

The Normal Distribution - Central Limit Theorem

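The normal distribution is the foundation of the central limit theorem, which states that the distribution of sample means approaches a normal distribution as the sample size increases.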

Less sampling error will occur with a larger sample, and a sample size of 30 or more will estimate the population mean with reasonable accuracy.

The Normal Distribution – Standard Error of the Mean

The standard deviation of the sample means

Indicates that each sample mean is likely to vary somewhat from the population mean

A larger sample size reduces the standard error, which equals the SD divided by the square root of the sample size
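The effect of sample size on the standard error of the mean (SD divided by the square root of n) can be seen in a few lines. The function name and the SD of 2.0 are hypothetical values chosen for illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


# Hypothetical SD of 2.0 with increasing sample sizes.
for n in (25, 100, 400):
    print(n, standard_error(2.0, n))
```

Because of the square root, the sample size must be quadrupled to cut the standard error in half.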

Skewed Distribution

When a distribution of scores is asymmetrical, the curve is said to be distorted or skewed.

Skewing is caused by a few extreme scores in the distribution.

It can be identified by comparing the mean and median of the distribution.

Positively or negatively skewed

Median and mode more accurately represent central tendency in a skewed distribution

May result from using small or homogeneous samples, or failing to use random sampling or random assignment techniques

FIGURE 18-3 Skewed Distributions
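Comparing the mean and the median, as described above, can be sketched as a simple check. This is only a rule of thumb (it can mislead for unusual distributions), and the function name and scores are hypothetical.

```python
import statistics


def skew_direction(scores):
    """Rough skew check: compare the mean with the median."""
    mean_score = statistics.mean(scores)
    median_score = statistics.median(scores)
    if mean_score > median_score:
        return "positive"  # a few extreme high scores pull the mean upward
    if mean_score < median_score:
        return "negative"  # a few extreme low scores pull the mean downward
    return "none detected"


print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))
print(skew_direction([1, 17, 18, 18, 19, 19, 19, 20]))
print(skew_direction([1, 2, 3, 4, 5]))
```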

Advantages of Graphing Data

Effective and economical communication of data

Easier and quicker understanding and interpretation of data

The ability to compare multiple distributions visually

Frequency Distribution Tables

Frequency distribution tables are used to present data in a way that shows the number of times each score occurs in the group of scores.

Distribution tables can be either grouped or ungrouped.

Data can be displayed in a graph

Facilitates our understanding and interpretation of data

Data presented should be understandable even without written explanation
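An ungrouped frequency distribution table can be built by counting how often each score occurs. The sketch below also adds percent and cumulative frequency columns; the function name and the scores are hypothetical.

```python
from collections import Counter


def frequency_table(scores):
    """Rows of (score, frequency, percent, cumulative frequency), ordered by score."""
    counts = Counter(scores)
    total = len(scores)
    cumulative = 0
    rows = []
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / total, cumulative))
    return rows


# Hypothetical number of decayed teeth per child (illustration only).
decayed_teeth = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]

print("score  freq  percent  cum.freq")
for score, freq, percent, cum in frequency_table(decayed_teeth):
    print(f"{score:>5}  {freq:>4}  {percent:>6.1f}%  {cum:>8}")
```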

Types of Graphs

Bar graph

Histogram

Frequency polygon

Polygon

Scattergram

Pie chart

Bar Graph

Used to represent categorical data

Spaces separate bars to emphasize the discrete nature of the variable

Length of the bar corresponds with the frequency of the value

Cluster bar graph can also be created

FIGURE 18-4 Bar Graph of Reasons for Missed Clinic Appointments

FIGURE 18-5 Cluster Bar Graph

Histogram

Similar to a bar graph but the bars appear side by side (touching)

Used for interval or ratio variables

Used to represent grouped and ungrouped frequencies

Used for ordinal data that is treated as continuous data

FIGURE 18-6 Histogram

Frequency Polygon

A line graph that represents frequency data that are continuous in nature

Drawn by connecting midpoints of the bars of a histogram, then extending the line at both ends to imaginary midpoints at the right and left of the histogram

Used to represent grouped or ungrouped frequencies

Can also represent frequency, percent, cumulative frequency, or cumulative percent

FIGURE 18-7 Frequency Polygon Comparing Two Distributions

Polygon

Line graph

Used to plot a variable over time

FIGURE 18-8 Polygon

Scattergram

Shows the relationship between two variables

Shows how the level of one variable varies as the level of the other variable changes

FIGURE 18-9 Scattergrams Demonstrating Relationships of Data

Pie Chart

Represents parts of a whole

More acceptable with lay audiences than in scientific or technical publications and presentations

Percentage represented by each part of the pie should be labeled for clarity

Correlation

Correlation studies relationships between variables.

The term means relationship or association between variables that can be measured mathematically.

The sign (+/-) indicates the direction of the relationship.

“r” signifies the correlation coefficient.

Value of “r” communicates the direction and strength of the association

Correlation does not equal causality

Provides much of the evidence in oral epidemiology

Establishes risk

Correlation Techniques

Pearson product-moment correlation coefficient

Spearman rank-order correlation Coefficient

Regression analysis

Multiple regression analysis

Pearson Product-Moment Correlation Coefficient

Most common correlation coefficient

Used when both variables are continuous (interval or ratio scaled) and have a linear relationship
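The Pearson coefficient can be computed directly from its definition: the co-variation of the two variables divided by the product of their individual variations. The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data: daily sugary drinks and number of decayed surfaces.
drinks = [0, 1, 2, 3, 4, 5]
decayed = [1, 1, 3, 2, 5, 6]
print(round(pearson_r(drinks, decayed), 3))
```

A value near +1 or -1 indicates a strong linear relationship; the sign gives its direction.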

Spearman Rank-Order Correlation Coefficient

Also called “Spearman rho”

Used to correlate two ordinal variables

Regression Analysis

Can be used to quantify the relationship of two variables

Expresses the functional relationship between the variables

Used to predict the score of one variable based on the score of another

Example:

National board scores
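Prediction from one variable can be sketched with a least-squares line. The example assumes, purely for illustration, that course GPA is used to predict a national board score; the function names and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x on the fitted line."""
    return intercept + slope * x


# Hypothetical course GPAs and board scores (illustration only).
gpas = [2.0, 2.5, 3.0, 3.5, 4.0]
board_scores = [70, 75, 80, 85, 90]

slope, intercept = fit_line(gpas, board_scores)
print("slope:", slope, "intercept:", intercept)
print("predicted score for GPA 3.2:", predict(3.2, slope, intercept))
```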

Multiple Regression Analysis

Provides a mathematical model that gives the strength or ability of two or more variables to predict another variable

Examples:

SAT scores

GPA strength

CORRELATION

A statistical method to determine certain relationships between variables

Results of a correlation show either a negative or a positive relationship

i.e., if positive, as the value of one variable increases, the other also increases.

Perfect positive correlation has a value of +1.0.

Negative correlation shows an inverse relationship between variables.

Perfect negative correlation is shown by -1.0.

No relationship at all is shown by 0.0.

The closer the relationship is to +1.0 or –1.0, the stronger the correlation

Sources:

Mason, Jill. Concepts in Dental Public Health. Lippincott Williams & Wilkins, 2010.

Nathe, Christine Nielsen. Dental Public Health and Research: Contemporary Practice for the Dental Hygienist. Upper Saddle River, NJ: Pearson, 2011.
