MHS506 4 CASE


Module 4 - Background

Univariate Vs. Bivariate Analyses and Regression

Required Reading

Barrat, H. & Kirwan, M. (2009) Confounding, interactions, methods for assessment of effect modification. Health Knowledge. Retrieved from http://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/confounding-interactions-methods

Collier, W. Independent & dependent variables. University of North Carolina at Pembroke. Retrieved from http://www.uncp.edu/home/collierw/ivdv.htm

DeLong, E., Li, L., & Cook, A. (2014). Pair matching vs. stratification in cluster-randomized trial. NIH Collaboratory.

LaMorte, W.W. & Sullivan, L. (2016). Confounding and effect measure modification. Retrieved from http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_Confounding-EM/BS704-EP713_Confounding-EM5.html

Lowry, R. (2016). Simple logistic regression. VassarStats: Website for Statistical Computation. Retrieved from http://www.vassarstats.net/logreg1.html

Ludford, P.J. Linear regression. University of Minnesota, College of Science and Engineering. Retrieved from http://www-users.cs.umn.edu/~ludford/Stat_Guide/Linear_Regression.htm

McDonald, J.H. (2014). Logistic regression. In Handbook of Biological Statistics. Retrieved from http://www.biostathandbook.com/simplelogistic.html

National Science Digital Library's Computation Science Education Research Desk. (2016). Univariate data and bivariate data. Retrieved from http://www.shodor.org/interactivate/discussions/UnivariateBivariate/

National Science Digital Library's Computation Science Education Research Desk. (2016). Graphing and interpreting bivariate data. Retrieved from http://www.shodor.org/interactivate/discussions/GraphingData/

Penn State. (2016). STAT507 Epidemiological Research Methods: 3.5 - Bias, Confounding, and Effect Modification. Retrieved from https://onlinecourses.science.psu.edu/stat507/node/34

Wunsch, G. (2007). Confounding and control. Demographic Research 16(4). Retrieved from http://www.demographic-research.org/Volumes/Vol16/4/16-4.pdf

Optional Resources

Purdue Online Writing Lab. (2018). General format. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/general_format.html

Purdue Online Writing Lab. (2018). In-text citations: The basics. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/in_text_citations_the_basics.html

Purdue Online Writing Lab. (2018). Reference list: Basic rules. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/reference_list_basic_rules.html

Module 4 - Home

Univariate Vs. Bivariate Analyses and Regression

Modular Learning Outcomes

Upon successful completion of this module, the student will be able to satisfy the following outcomes:

· Case

· Distinguish between univariate and multivariate analysis.

· Distinguish between dependent and independent variables.

· Distinguish between logistic and linear regression.

· SLP

· Interpret the results of a regression analysis, both linear and logistic.

· Discuss the concept of confounding and note potential confounders in a hypothetical study.

· Assess the merits of matching on confounders versus adjusting for confounders by including them in a regression model.

· Discussion

· Identify confounders for known diseases.

Module Overview

Univariate versus Multivariate Analysis

Univariate analysis, as the term is used in this module, looks at how one potential risk factor relates to an outcome. Because it involves two variables (the risk factor and the outcome), it is also called bivariate analysis. It often examines whether a potential risk factor or background characteristic (e.g., smoking, gender, exercise) is associated with an outcome or disease (e.g., lung cancer, breast cancer, diabetes). The analysis involves only the disease (or outcome) and one potential risk factor (or exposure). Multivariate analysis, on the other hand, examines more than one potential risk factor at the same time and their potential association with the disease or outcome. For instance, one could examine smoking, gender, age, obesity, and diabetes together for a potential association with cardiovascular disease.

Dependent versus Independent Variables

In these cases, the outcome or disease status is the dependent variable, whereas any potential exposure or risk factor is an independent variable. Multivariate analysis most often looks at one dependent variable (disease or outcome status) and more than one independent variable (e.g., gender, race, income, medical history, etc.).

Confounder

A confounder is a variable that is associated with the disease (or outcome), is associated with the risk factor (or exposure), and changes the apparent relationship between the exposure and the outcome. For instance, suppose obesity is a potential risk factor for diabetes. Now consider a third variable: a family history of diabetes, which is also a potential risk factor for diabetes and is related to obesity. If accounting for family history of diabetes changes the observed relationship between obesity and diabetes, then family history of diabetes is a confounder in this situation.
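The obesity and family history example can be made concrete with a small calculation. The sketch below uses counts invented purely for illustration (they are not from any study) and compares the crude odds ratio for obesity and diabetes with the odds ratios within each family history stratum. The Mantel-Haenszel pooled odds ratio is one common way to summarize the stratum-specific results; the function and variable names are assumptions made for this example.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Each stratum is a 2x2 table given as:
# (obese with diabetes, obese without diabetes,
#  non-obese with diabetes, non-obese without diabetes)
strata = {
    "family history": (60, 40, 15, 20),
    "no family history": (10, 40, 20, 160),
}


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across a collection of 2x2 tables."""
    tables = list(tables)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Crude table: add the cells together, ignoring family history.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

# Stratum-specific and pooled (adjusted) odds ratios.
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
adjusted_or = mantel_haenszel_or(strata.values())

print("Crude OR:", round(crude_or, 2))
for name, value in stratum_ors.items():
    print(f"OR within '{name}':", round(value, 2))
print("Mantel-Haenszel adjusted OR:", round(adjusted_or, 2))
```

With these made-up counts, the crude odds ratio is larger than the odds ratio within either stratum, which is the pattern expected when family history confounds the obesity and diabetes relationship.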

Logistic and Linear Regression

Unlike univariate analysis, regression models allow researchers to examine more than one independent variable at a time against a dependent variable. This means that confounders or demographic variables may be included alongside the exposure and outcome variables to adjust for bias that may arise from background characteristics (e.g., differences by gender, race, or income). The choice of model depends on the outcome variable: logistic regression is used for binary outcomes (e.g., disease status of "yes" or "no," mortality), whereas linear regression is used for continuous outcomes (e.g., blood pressure, bone mineral density, fasting blood glucose).

Logistic and linear models can be interpreted as follows:

Lung Cancer = 4.5 + 2.4 (smoking) + 1.7 (gender) + 2.3 (age) + 0.7 (race), p<0.05

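Reading the value 2.4 for smoking as an odds ratio: after controlling for gender, age, and race, those with a history of smoking have 2.4 times the odds of lung cancer compared with those who do not smoke (p<0.05). Strictly speaking, a fitted logistic regression equation predicts the log-odds of the outcome, and each coefficient is exponentiated to obtain an odds ratio, so the equation above is best read as a simplified summary of the odds ratios rather than as the fitted equation itself. In this statement, lung cancer is the dependent variable, history of smoking is the independent variable of interest (the exposure), and gender, age, and race are the potential confounders. This is a logistic regression model, where the dependent variable is binary: lung cancer versus no lung cancer.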
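The link between a logistic regression coefficient and an odds ratio can be sketched in a few lines. The example below assumes that 2.4 is the odds ratio for smoking, so the corresponding coefficient on the log-odds scale is ln(2.4). The intercept is a hypothetical value chosen only so that probabilities can be computed, and the other variables are held fixed and omitted.

```python
import math


def logistic(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(probability):
    """Convert a probability to odds."""
    return probability / (1.0 - probability)


ODDS_RATIO_SMOKING = 2.4                      # value taken from the text
beta_smoking = math.log(ODDS_RATIO_SMOKING)   # coefficient on the log-odds scale
intercept = -3.0                              # hypothetical baseline log-odds

# Predicted probability of lung cancer, other variables held fixed.
p_nonsmoker = logistic(intercept)
p_smoker = logistic(intercept + beta_smoking)

# The ratio of odds recovers exp(beta), whatever the intercept is.
recovered_odds_ratio = odds(p_smoker) / odds(p_nonsmoker)

# The ratio of probabilities (risk ratio) is a different quantity.
risk_ratio = p_smoker / p_nonsmoker

print("Coefficient (log-odds):", round(beta_smoking, 3))
print("Recovered odds ratio:", round(recovered_odds_ratio, 2))
print("Risk ratio at this baseline:", round(risk_ratio, 2))
```

The odds ratio stays at 2.4 for any intercept, while the risk ratio depends on the baseline probability. This is why logistic regression results are stated in terms of odds rather than as "times more likely."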

BMI (1 unit) = 3.9 + 3.4 (high fasting glucose) + 1.5 (gender) + 1.3 (age) + 2.7 (race), p<0.05

After controlling for gender, age, and race, those with a high fasting glucose level have, on average, a BMI that is 3.4 units higher than those with a lower fasting glucose level (p<0.05). In a linear regression model, the dependent variable is continuous, and each coefficient is the expected change in the dependent variable, in its own units, for a one-unit change in that independent variable. The dependent variable here is body mass index (BMI), the independent variable of interest is fasting glucose level (high versus low), and the potential confounders are gender, age, and race.
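The linear model can be evaluated directly to show what "holding the other variables fixed" means. This is a minimal sketch using the coefficients from the BMI equation in the text. It assumes, for illustration only, that every predictor is coded 0 or 1 (for example, age as an older versus younger indicator); the text does not state the actual coding.

```python
# Coefficients copied from the BMI equation in the text.
COEFFICIENTS = {
    "intercept": 3.9,
    "high_glucose": 3.4,
    "gender": 1.5,
    "age": 1.3,
    "race": 2.7,
}


def model_value(high_glucose, gender, age, race):
    """Evaluate the linear equation for one person.

    All inputs are assumed to be 0/1 indicators (a hypothetical coding).
    """
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["high_glucose"] * high_glucose
        + COEFFICIENTS["gender"] * gender
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["race"] * race
    )


# Two people who differ only in fasting glucose status.
with_high_glucose = model_value(high_glucose=1, gender=1, age=1, race=0)
with_low_glucose = model_value(high_glucose=0, gender=1, age=1, race=0)

difference = with_high_glucose - with_low_glucose
print("Difference in model value:", round(difference, 1))
```

Whatever values are chosen for gender, age, and race, the difference between the two predictions equals the fasting glucose coefficient, as long as those values are the same for both people.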