STATISTICS DATA BASE Final Project

The impact of weight, gender, marital status, age, and cholesterol levels on systolic blood pressure

Name

Course

Instructor

Data source

This project proposes to use secondary data obtained from the Hawkes Learning platform, available free online at https://mailchi.mp/hawkeslearning/new-data-sets-available-for-download-465529?e=b1324c3209. The data set consists of a sample of one hundred patients (n = 100) from a hospital, with seven recorded variables: patient identification number, gender, marital status, age (in years), weight (in pounds), total cholesterol level (in mg/dL), and systolic blood pressure (in mmHg). Systolic blood pressure will serve as the response variable, while gender, marital status, age, cholesterol level, and weight will serve as the explanatory variables; the identification number is not used as a predictor.

Objectives

The project aims to use the sampled patients' age, weight, gender, marital status, and cholesterol levels to determine the effect each has on their systolic blood pressure. Additionally, the project seeks to identify whether there are statistically significant associations between the response variable (systolic blood pressure) and each explanatory variable: gender, age (in years), weight (in pounds), total cholesterol level, and marital status. The project's working hypothesis is that each predictor has a non-zero effect on the outcome variable.

Statistical methods

Exploratory data analysis

Exploratory data analysis will be carried out to identify underlying patterns among the variables in the data set. Descriptive statistics will include measures of central tendency (mean, median) and dispersion (standard deviation, range) for the numeric variables (age, weight, cholesterol level, and systolic blood pressure), whereas frequency tables will summarize the categorical variables (gender and marital status). Charts will be used to visualize how systolic blood pressure varies with each candidate predictor.
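As a minimal sketch of these summaries, assuming the variables have been exported to plain Python lists (the function names `describe_numeric` and `frequency_table` are hypothetical, not part of the SPSS workflow), the standard-library `statistics` module covers the numeric measures and `collections.Counter` covers the frequency tables:

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Central tendency and dispersion for one numeric variable (e.g. age, weight)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range
    }


def frequency_table(labels):
    """Counts and percentages for one categorical variable (e.g. gender)."""
    counts = Counter(labels)
    total = len(labels)
    # Most frequent category first, as (count, percent) pairs
    return {level: (count, 100 * count / total) for level, count in counts.most_common()}
```

SPSS produces equivalent output through its descriptive and frequency procedures; the sketch only makes the calculations explicit.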

Correlation analysis

Correlation analysis will be employed to examine the associations between the explanatory variables and the response variable, that is, whether each pair is positively or inversely related and whether the association is strong, medium, or weak.
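A minimal sketch of the correlation step, assuming Pearson's r and the conventional (Cohen-style) cut-offs of 0.1, 0.3, and 0.5 for weak, medium, and strong associations; both the cut-offs and the function names are assumptions, not part of the proposal. Pearson's r suits the numeric predictors (age, weight, cholesterol level) and a 0/1-coded gender indicator, where it equals the point-biserial correlation; a multi-category variable such as marital status needs a different measure.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two numeric variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


def describe_association(r, weak=0.1, medium=0.3, strong=0.5):
    """Label direction and strength of r; the cut-offs are an assumed convention."""
    direction = "positive" if r > 0 else "inverse" if r < 0 else "none"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= medium:
        strength = "medium"
    elif size >= weak:
        strength = "weak"
    else:
        strength = "negligible"
    return direction, strength
```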

Regression modeling

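Multiple regression modeling will be employed to analyze the effect of gender, age, weight, marital status, and cholesterol levels on systolic blood pressure. The multiple regression model allows the effects of several predictors on the outcome variable to be examined simultaneously by expressing the outcome as a linear combination of the predictors plus an error term. The multiple regression model employed in this project takes the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$$

where

$Y$ denotes the dependent (outcome or response) variable, which in this case is the patient's systolic blood pressure (mmHg).

$X_1, \dots, X_5$ denote the predictors gender, age (years), weight (pounds), marital status, and total cholesterol (mg/dL), respectively. Gender and marital status are categorical and enter the model as 0/1 indicator (dummy) variables; if marital status has more than two categories, $X_4$ is replaced by one indicator per non-reference category.

$\beta_0$ denotes the expected systolic blood pressure when all predictors equal zero, that is, the intercept term of the model.

$\beta_1, \dots, \beta_5$ are the coefficients of the predictors; each gives the expected change in systolic blood pressure for a one-unit change in its predictor, holding the other predictors constant.

$\varepsilon$ denotes the error term of the model.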
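SPSS will carry out the estimation, but a minimal pure-Python sketch may help show what ordinary least squares does with this model: it builds a design matrix with a leading column of 1s for the intercept, dummy-codes a categorical predictor against a reference level, and solves the normal equations (XᵀX)b = Xᵀy. The helper names (`dummy_code`, `solve`, `fit_ols`) are hypothetical, the choice of reference level is an assumption, and the sketch returns point estimates only (no standard errors or p-values).

```python
def dummy_code(labels, reference):
    """0/1 indicator columns for every category except the reference category."""
    levels = sorted(set(labels) - {reference})
    return {lvl: [1.0 if v == lvl else 0.0 for v in labels] for lvl in levels}


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk + e."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]  # 1s for the intercept
    p = len(X[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    return solve(xtx, xty)
```

For this project, the columns passed to `fit_ols` would be the gender indicator, age, weight, the marital-status indicator(s), and cholesterol level, with systolic blood pressure as `y`. The normal-equations route is chosen here for transparency; numerical libraries commonly use a QR decomposition instead for better numerical stability.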

Software and statistical significance

This project will use SPSS to fit the models and estimate the effects of the predictors on the response variable. Results with p-values below 0.05 will be considered statistically significant (significance level α = 0.05).
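The 0.05 rule applies, among other places, to the coefficient tests that SPSS reports for a multiple regression (the t and Sig. columns of its coefficients table). A sketch of the standard two-sided t test for the coefficient β_j of the j-th predictor column is given below, where k is the number of predictor columns actually entered (including any dummy columns), so with n = 100 patients the residual degrees of freedom are 100 - k - 1. This is the textbook form of the test, not a description of SPSS internals.

```latex
\begin{aligned}
&H_0\colon \beta_j = 0 \quad \text{versus} \quad H_1\colon \beta_j \neq 0, \qquad j = 1, \dots, k,\\
&t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \qquad \text{df} = n - k - 1,\\
&p_j = 2\,P\bigl(T_{n-k-1} \ge |t_j|\bigr), \qquad \text{reject } H_0 \text{ if } p_j < 0.05.
\end{aligned}
```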
