statistics project
Jane Q. Public Semester Project
MTH 245-FXX Fall 2021
Section 1: Visual Data Assessment
The BMI histogram is skewed to the right. There is one peak in the area of class 26.00-30.49. There are no gaps in this histogram.
The LDL histogram appears to be symmetric. There is one peak, located in the 95-114 class. There are no gaps in the histogram.
Section 2: Descriptive Statistics
        Mean      Range     Variance    Standard Deviation
BMI     28.441    29.620    54.672      7.394
LDL     112.7     170.0     1,366.0     37.0
The regular BMI boxplot appears to be right-skewed.
The regular LDL boxplot appears to be symmetric.
Five-Number Summary
        Minimum   Q1        Median    Q3        Maximum
BMI     17.62     23.60     26.505    32.205    47.24
LDL     39        87.5      110.5     135       209
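As a quick consistency check on the two tables above, the range should equal the maximum minus the minimum, and the standard deviation should equal the square root of the variance. A minimal Python sketch, using only the rounded values reported here (the dictionary and function names are illustrative and not part of the original analysis):

```python
import math

# Rounded summary values as reported in the tables above.
summary = {
    "BMI": {"min": 17.62, "max": 47.24, "variance": 54.672},
    "LDL": {"min": 39.0, "max": 209.0, "variance": 1366.0},
}

def range_and_sd(stats):
    """Return (range, standard deviation) from min, max and variance."""
    data_range = stats["max"] - stats["min"]
    sd = math.sqrt(stats["variance"])
    return data_range, sd

for name, stats in summary.items():
    data_range, sd = range_and_sd(stats)
    print(f"{name}: range = {data_range:.3f}, SD = {sd:.3f}")
```

After rounding, both results agree with the reported range and standard deviation for each variable.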
The modified BMI boxplot shows one potential outlier with value 47.24.
The modified LDL boxplot shows one potential outlier with value 209.
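The outlier flags can be checked against the five-number summary. The sketch below assumes the modified boxplots use the conventional 1.5 × IQR fences; the report does not state the rule, so the multiplier and the function name are assumptions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and extremes from the five-number summary above.
five_number = {
    "BMI": {"min": 17.62, "q1": 23.60, "q3": 32.205, "max": 47.24},
    "LDL": {"min": 39.0, "q1": 87.5, "q3": 135.0, "max": 209.0},
}

for name, s in five_number.items():
    lower, upper = iqr_fences(s["q1"], s["q3"])
    print(f"{name}: fences = ({lower:.4f}, {upper:.4f}), "
          f"max beyond upper fence: {s['max'] > upper}, "
          f"min beyond lower fence: {s['min'] < lower}")
```

Under this assumption, both maxima (47.24 and 209) fall above their upper fences and neither minimum falls below its lower fence. The summary alone cannot confirm that each boxplot has exactly one outlier; that requires the raw data.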
Section 3: Confidence Intervals
BMI 95% confidence interval: (26.076, 30.805) – t-distribution
LDL 95% confidence interval: (100.9, 124.5) – t-distribution
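A t-interval for a mean has the form x̄ ± t*·s/√n. The sample size is not stated in this report, so the sketch below assumes n = 40 (an inferred value, chosen because it is consistent with the reported LDL interval) and uses the two-sided 95% critical value t* ≈ 2.0227 for 39 degrees of freedom. Both values are assumptions, and the function name is illustrative.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# ASSUMPTIONS: n = 40 is not given in the report; t_crit is the
# two-sided 95% critical value for 39 degrees of freedom.
N_ASSUMED = 40
T_CRIT_ASSUMED = 2.0227

# LDL mean and standard deviation from Section 2.
lower, upper = t_interval(112.7, 37.0, N_ASSUMED, T_CRIT_ASSUMED)
print(f"LDL 95% CI: ({lower:.1f}, {upper:.1f})")
```

Under these assumptions the result rounds to the reported LDL interval of (100.9, 124.5).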
Section 4: Hypothesis Test
For a hypothesis test using the p-value method at α = 0.05, the hypotheses are as follows:
H0: µ = 26.00
HA: µ < 26.00
The t distribution was used to perform the test, which resulted in a p-value of 0.978. Since p-value > α, we fail to reject H0 and conclude that there is insufficient evidence to reject the claim that the mean BMI measurement is 26.00 or greater.
Section 5: Correlation/Regression Analysis
Least-Squares Regression Equation: ŷ = 103.003 + 0.341x, where x is BMI and ŷ is the predicted LDL.
Scatterplot with Least-Squares Line.
Coefficient of Determination. The coefficient of determination is 0.005, which means that only 0.5% of the variability in LDL is explained by its relationship to BMI. This suggests a poor fit of the model to the data.
Influential Points. No data point has a Cook's distance greater than 0.5, so there are no influential points.
Formal Hypothesis Test for Correlation. The p-value equals 0.676, which is greater than α = 0.05. Therefore, we conclude that there is insufficient evidence that a linear correlation exists between BMI and LDL.
LINE Criteria Assessment. The estimated regression fails two out of the four LINE criteria.
Linear Relationship: FAIL.
− There is insufficient evidence of correlation between LDL and BMI.
− The scatterplot shows no identifiable pattern.
− The graph of the estimated regression line is nearly horizontal, suggesting an overall lack of fit.
Independence: PASS. Assumed to be independent.
Normal Distribution: PASS. Visually, the residuals appear to be normally distributed:
The histogram appears to be roughly symmetric. The dots follow the trend line closely except at the upper end.
A Shapiro-Wilk goodness-of-fit test resulted in a p-value of 0.699, which is greater than α = 0.05, indicating that there is insufficient evidence that the residuals are not normally distributed. This result is consistent with the visual assessment above.
Equal Variances: FAIL. The residuals appear not to have equal variances for all predictor values:
The pattern is funnel-shaped, narrowing as BMI increases.
Overall Validity Assessment. The estimated regression line is not a valid model because there is insufficient evidence of correlation between the variables, and because it fails two out of the four LINE criteria.
Estimates of LDL when BMI = 30.00.
− Estimate of a New Response: Since the regression model is not valid, we cannot use it to estimate LDL values for individual BMI values. Therefore, y_new = 112.7, the sample mean of the LDL column.
− 95% Prediction Interval of a New Response: Not applicable – an invalid model cannot be used to construct a prediction interval.
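The fallback used for the point estimate can be sketched as a small decision rule: use the fitted line only when the model is judged valid, and otherwise return the sample mean of the response. The function name and the `model_valid` flag are illustrative; the coefficients and the mean are the values reported in this project.

```python
def estimate_new_response(x, intercept, slope, y_mean, model_valid):
    """Point estimate of a new response at predictor value x.

    Uses the least-squares line only if the model is judged valid;
    otherwise falls back to the sample mean of the response.
    """
    if model_valid:
        return intercept + slope * x
    return y_mean

# Values reported above: y-hat = 103.003 + 0.341x, mean LDL = 112.7.
estimate = estimate_new_response(
    x=30.00, intercept=103.003, slope=0.341, y_mean=112.7, model_valid=False
)
print(f"Estimated LDL at BMI = 30.00: {estimate}")
```

For comparison, the line itself would give 103.003 + 0.341(30.00) = 113.233, only about half a unit above the sample mean, which reflects how nearly horizontal the fitted line is.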