MHS506 4 CASE
Module 4 - Background
Univariate Vs. Bivariate Analyses and Regression
Required Reading
Barrat, H. & Kirwan, M. (2009). Confounding, interactions, methods for assessment of effect modification. Health Knowledge. Retrieved from http://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/confounding-interactions-methods
Collier, W. Independent & dependent variables. University of North Carolina at Pembroke. Retrieved from http://www.uncp.edu/home/collierw/ivdv.htm
DeLong, E., Li, L., & Cook, A. (2014). Pairing matching vs. stratification in cluster-randomized trial. NIH Collaboratory.
LaMorte, W.W. & Sullivan, L. (2016). Confounding and effect measure modification. Retrieved from http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704-EP713_Confounding-EM/BS704-EP713_Confounding-EM5.html
Lowry, R. (2016). Simple logistic regression. VassarStats: Website for Statistical Computation. Retrieved from http://www.vassarstats.net/logreg1.html
Ludford, P.J. Linear regression. University of Minnesota, College of Science and Engineering. Retrieved from http://www-users.cs.umn.edu/~ludford/Stat_Guide/Linear_Regression.htm
McDonald, J.H. (2014). Logistic regression. In Handbook of Biological Statistics. Retrieved from http://www.biostathandbook.com/simplelogistic.html
National Science Digital Library's Computation Science Education Research Desk. (2016). Univariate data and bivariate data. Retrieved from http://www.shodor.org/interactivate/discussions/UnivariateBivariate/
National Science Digital Library's Computation Science Education Research Desk. (2016). Graphing and interpreting bivariate data. Retrieved from http://www.shodor.org/interactivate/discussions/GraphingData/
Penn State. (2016). STAT507 Epidemiological Research Methods: 3.5 - Bias, Confounding, and Effect Modification. Retrieved from https://onlinecourses.science.psu.edu/stat507/node/34
Wunsch, G. (2007). Confounding and control. Demographic Research 16(4). Retrieved from http://www.demographic-research.org/Volumes/Vol16/4/16-4.pdf
Optional Resources
Purdue Online Writing Lab. (2018). General format. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/general_format.html
Purdue Online Writing Lab. (2018). In-text citations: The basics. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/in_text_citations_the_basics.html
Purdue Online Writing Lab. (2018). Reference list: Basic rules. Retrieved from https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/reference_list_basic_rules.html
Module 4 - Home
Univariate Vs. Bivariate Analyses and Regression
Modular Learning Outcomes
Upon successful completion of this module, the student will be able to satisfy the following outcomes:
· Case
   · Distinguish between univariate and multivariate analysis.
   · Distinguish between dependent and independent variables.
   · Distinguish between logistic and linear regression.
· SLP
   · Interpret the results of a regression analysis, both linear and logistic.
   · Discuss the concept of confounding and note potential confounders in a hypothetical study.
   · Assess the merits of matching on confounders versus adjusting for confounders by including them in a regression model.
· Discussion
   · Identify confounders for known diseases.
Module Overview
Univariate versus Multivariate Analysis
Univariate analysis, as the term is used in this module, looks at how a single potential risk factor relates to an outcome. (Strictly speaking, an analysis of two variables at once is a bivariate analysis, and a univariate analysis describes one variable on its own, such as the mean age of a sample.) It often examines whether there is an association between a potential risk factor or background characteristic (e.g., smoking, gender, exercise) and an outcome or disease (e.g., lung cancer, breast cancer, diabetes). The analysis involves only the disease (or outcome) and one potential risk factor (or exposure). Multivariate analysis, on the other hand, examines more than one potential risk factor at the same time, along with their potential association with the disease or outcome. For instance, one could examine smoking, gender, age, obesity, and diabetes together for an association with cardiovascular disease.
Dependent versus Independent Variables
In these cases, the outcome or disease status is the dependent variable, whereas any potential exposure or risk factor is an independent variable. Multivariate analysis most often looks at one dependent variable (disease or outcome status) and more than one independent variable (e.g., gender, race, income, medical history, etc.).
Confounder
A confounder is a variable that is associated with the disease (or outcome), is related to the risk factor (or exposure), and distorts the observed relationship between the exposure and the outcome when it is not accounted for. For instance, let's say that obesity is a potential risk factor for diabetes. Then consider a third variable: a family history of diabetes, which is also a potential risk factor for diabetes and is related to obesity. If accounting for this third variable (family history of diabetes) changes the relationship between obesity and diabetes, then family history of diabetes is a confounder in this situation.
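One way to see confounding in numbers is to compare the odds ratio from all subjects pooled together (the crude odds ratio) with the odds ratios within each level of the third variable. The sketch below does this for the obesity and diabetes example. The counts are entirely hypothetical and were chosen only so that the arithmetic is easy to follow; the function and variable names are likewise illustrative.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from the four cells of a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical counts, one 2x2 table per level of the third variable.
# Order: (obese with diabetes, obese without, non-obese with, non-obese without)
strata = {
    "family history": (60, 40, 12, 16),
    "no family history": (5, 45, 10, 180),
}

# Pool the strata cell by cell to get the crude (unadjusted) table.
crude_counts = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_counts)

# Odds ratio within each stratum of family history.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print("crude OR:", round(crude_or, 2))
for name, value in stratum_ors.items():
    print(name, "OR:", round(value, 2))
```

With these made-up counts, the odds ratio is 2.0 within each family-history group but about 6.8 when the groups are pooled. That gap between the crude and the stratum-specific estimates is the pattern a confounder produces.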
Logistic and Linear Regression
Unlike univariate analysis, regression models allow researchers to examine more than one independent variable at a time against a dependent variable. This means that confounders or demographic variables may be studied alongside the exposure and outcome variables to adjust for any potential bias that may arise due to background characteristics (e.g., differences by gender, race, or income). The choice of model depends on the outcome variable: logistic regression is used for binary outcomes (e.g., disease status of "yes" or "no," died versus survived), whereas linear regression is used for continuous outcomes (e.g., blood pressure, bone mass density, fasting blood glucose).
Logistic and linear regression models can be interpreted as follows:
Lung Cancer = 4.5 + 2.4 (smoking) + 1.7 (gender) + 2.3 (age) + 0.7 (race), p<0.05
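After controlling for gender, age, and race, the odds of lung cancer are 2.4 times higher in those with a history of smoking than in those who do not smoke (p<0.05). This reading treats the value attached to each variable as an adjusted odds ratio; statistical software usually reports logistic regression coefficients on the log-odds scale, in which case the odds ratio is the exponential of the coefficient. In this statement, lung cancer is the dependent variable, history of smoking is the independent variable of interest (the exposure), and gender, age, and race are the potential confounders. This is a logistic regression model, where the dependent variable is binary: lung cancer versus no lung cancer.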
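When reading software output for a logistic model, it helps to know how a coefficient on the log-odds scale relates to an odds ratio such as the 2.4 quoted for smoking. The following is a minimal sketch of that conversion. The function names are illustrative, and it assumes the standard logistic model in which the log-odds of the outcome is a linear function of the independent variables.

```python
import math

def odds_ratio_from_coefficient(beta):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)

def coefficient_from_odds_ratio(odds_ratio):
    """Convert an odds ratio back into a log-odds coefficient."""
    return math.log(odds_ratio)

def predicted_probability(log_odds):
    """Convert a log-odds value (the linear predictor) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# An odds ratio of 2.4 corresponds to a coefficient of ln(2.4) on the log-odds scale.
beta_smoking = coefficient_from_odds_ratio(2.4)
print(round(beta_smoking, 3))
print(round(odds_ratio_from_coefficient(beta_smoking), 1))
```

An odds ratio above 1 (a positive coefficient) indicates higher odds of the outcome in the exposed group, and an odds ratio below 1 (a negative coefficient) indicates lower odds.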
BMI = 3.9 + 3.4 (high fasting glucose) + 1.5 (gender) + 1.3 (age) + 2.7 (race), p<0.05
After controlling for gender, age, and race, those with a high fasting glucose level have a BMI that is, on average, 3.4 units higher than those with a lower fasting glucose level (p<0.05). In a linear regression model, the dependent variable is continuous and each coefficient is read in the units of the dependent variable: it is the average difference in the outcome for a one-unit difference in that independent variable, holding the others constant. The dependent variable here is body mass index (BMI), the independent variable of interest is fasting glucose level (high versus low), and the potential confounders are gender, age, and race.
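The role of a single coefficient in the linear model can be checked by evaluating the equation for two people who differ only in fasting glucose status. The sketch below uses the coefficients from the BMI equation above. The 0/1 coding of every variable is an assumption made for illustration, since the module does not say how gender, age, or race are coded, so the predicted values themselves should not be read as realistic BMIs.

```python
# Coefficients copied from the BMI model in the text.
INTERCEPT = 3.9
COEFFICIENTS = {"high_glucose": 3.4, "gender": 1.5, "age": 1.3, "race": 2.7}

def predicted_bmi(person):
    """Evaluate the linear model for one person.

    `person` maps each variable name to its coded value
    (0 or 1 here, which is an assumed coding).
    """
    return INTERCEPT + sum(COEFFICIENTS[name] * person[name] for name in COEFFICIENTS)

# Two hypothetical people with identical covariates, differing only in glucose status.
low_glucose = {"high_glucose": 0, "gender": 1, "age": 1, "race": 0}
high_glucose = dict(low_glucose, high_glucose=1)

difference = predicted_bmi(high_glucose) - predicted_bmi(low_glucose)
print(round(difference, 1))  # 3.4
```

Whatever values are chosen for the other variables, the difference between the two predictions equals the fasting glucose coefficient. This is what "after controlling for gender, age, and race" means in a linear model.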