Math Project


Math 140 Final Exam COC Spring 2022

150 Points

Question 1 (40 points) Match the following vocabulary words in the table below with the corresponding definitions.

Placebo Effect; Placebo; Mean Average; Standard Deviation
Slope; Histogram of the Residuals; Correlation Coefficient (r); Contingency Table
Conditional Percentage (Conditional Proportion); Explanatory Variable; Scatterplot; R-squared
Response Variable; Sampling Variability; Significance Level; Type II Error
Interquartile Range (IQR); Hypothesis Test; Bias; Bootstrap Distribution
Standard Deviation of the Residual Errors; Quantitative Data; Confidence Interval; Categorical Data
Critical Value; Regression; Regression Line; Census
Outliers; Random; Variability; Normal Data
Type I Error; Residual; Correlation; Beta Level
Marginal Percentage (Marginal Proportion); Residual Plot; P-value; Joint Percentage (Joint Proportion)

a. Also called the Alpha Level. If the P-value is lower than this number, then the sample data

significantly disagrees with the null hypothesis and is unlikely to have happened by random

chance. This is also the probability of making a type 1 error.

b. When data does not reflect the population.

c. The vertical distance between the regression line and a point in the scatterplot.

d. A fake medicine or fake treatment used to control the placebo effect.

e. When biased sample data leads you to support the alternative hypothesis when the alternative

hypothesis is actually wrong in the population.

f. A graph for visualizing the relationship between two quantitative ordered pair variables. The

ordered pairs (𝑥, 𝑦) are plotted on the rectangular coordinate system.

g. Data in the form of numbers that measure or count something. They usually have units and

taking an average makes sense.

h. Also called the line of best fit or the line of least squares. This line minimizes the vertical

distances between it and all the points in the scatterplot.

i. Data that is bell shaped, symmetric and unimodal. Also referred to as data that has a normal

distribution.

j. The distance spanned by the middle 50% of the numbers in a data set. Calculated by subtracting the 1st quartile from the 3rd quartile (Q3 − Q1). The measure of typical spread for a data set that is not normal.

k. The probability of getting the sample data or more extreme because of sampling variability (by

random chance) if the null hypothesis is true.

l. Unusual values in the data set.

m. Data in the form of labels that tell us something about the people or objects in the data set.

n. Another name for the y-variable or dependent variable in a correlation study.

o. A single percentage or proportion without any conditions. In a contingency table, this can be found with numbers in the margins.

p. Also called the spread. A measure of how spread out a data set is. A large spread tells us that the data is less consistent and more difficult to predict. A small spread tells us that the data is more consistent and easier to predict.

q. When biased sample data leads you to fail to reject the null hypothesis when the null hypothesis is actually wrong in the population.

r. Putting many bootstrap statistics on the same graph in order to simulate the sampling variability

in a population, calculate standard error, and create a confidence interval. The center of the

bootstrap distribution is the original real sample statistic.

s. Also called a two-way table. This table summarizes the counts when comparing two different

categorical data sets each with two or more variables.

t. The probability of making a type 2 error.

u. The amount of increase or decrease in the y-variable for every one-unit increase in the x-variable.

v. Random sample values and sample statistics are usually different from each other and usually different from the population parameter.

w. A statistic that measures how far points in a scatterplot are from the regression line on average

and measures the average amount of prediction error.

x. The balancing point for distances in a data set. The average for a data set that is normal.

y. A graph that pairs the residuals with the x values. This graph should be evenly spread out and

not fan shaped.

z. A percentage or proportion involving two variables being true about the person or object, without any condition. There are generally two types (AND, OR).

cc. When everyone in the population has a chance to be included in the sample.

dd. Statistical analysis that determines if there is a relationship between two different quantitative

variables.

ee. The capacity of the human brain to manifest physical responses based on the person believing

something is true.

ff. Statistical analysis that involves finding the line or model that best fits a quantitative relationship,

using the model to make predictions, and analyzing error in those predictions.

gg. A number we compare our test statistic to in order to determine significance. In a sampling

distribution or a theoretical distribution approximating the sampling distribution, the critical value

shows us where the tail or tails are. The test statistic must fall in the tail to be significant.

hh. A statistic between −1 and +1 that measures the strength and direction of linear relationships

between two quantitative variables.

ii. The average or typical distance that points in a data set are from the mean. The measure of

typical spread (typical variability) for a data set that is normal.

jj. Another name for the x-variable or independent variable in a correlation study.

kk. The percentage or proportion calculated from a particular group or if a particular condition was true. These are very important when studying categorical relationships.

ll. Collecting data from everyone in a population.

mm. A graph showing the shape of the residuals. This graph should be nearly normal and centered

close to zero.

nn. Also called the coefficient of determination. This statistic measures the percent of variability in

the y-variable that can be explained by the linear relationship with the x-variable.

oo. Two numbers that we think a population parameter is in between. Can be calculated by either a

bootstrap distribution or by adding and subtracting the sample statistic and the margin of error.

pp. A procedure for testing a claim about a population.
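
Several of the regression definitions above (slope, residual, correlation coefficient, R-squared, and standard deviation of the residual errors) can be illustrated with one short computation. The sketch below is only an illustration: the five (x, y) pairs and the helper name `describe_regression` are made up for this example and do not come from the exam, and the standard deviation of the residual errors is assumed to use the common n − 2 divisor.

```python
import math

def describe_regression(xs, ys):
    """Least-squares summary for paired quantitative data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                      # change in y per one-unit increase in x
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient, between -1 and +1
    # Residual: vertical distance between each point and the regression line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r ** 2,
        "residuals": residuals,
        # Assumption: n - 2 divisor for the standard deviation of the residual errors.
        "residual_sd": math.sqrt(sse / (n - 2)),
    }

# Hypothetical data, not taken from the exam.
summary = describe_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(summary["slope"], summary["r_squared"], summary["residual_sd"])
```

For a least-squares line the residuals sum to zero, which is why a histogram of the residuals is expected to be centered close to zero.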

Question 2 (20 Points) 2-Population Mean Confidence Interval

Directions: Use the following Statcato and StatKey printouts and answer the following questions.

a) What is the confidence level for this confidence interval?

b) Show that the data meets the assumptions for inference with two population means.

c) Are the groups independent or matched pair?

d) Write the lower and upper values for the confidence interval.

e) Does the confidence interval indicate that the mean or percentage from population 1 is higher,

lower, or not significantly different from population 2? Explain how you know. If the mean or

percentage from population 1 is higher than population 2, then how much higher could it be?

f) Write the two-population confidence interval sentence explaining this confidence interval.

The Scenario: Random samples of 400 males and 400 females were taken regarding their hours of exercise. Construct

a 2-population mean confidence interval which studies the difference in hours of exercise between

males and females. The dataset is taken from StatKey. The bootstrap dotplot was made in StatKey.
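
As a rough illustration of how a bootstrap interval for the difference between two means can be built, here is a minimal percentile-bootstrap sketch. Everything specific in it is an assumption: the exercise-hours values are invented (they are not the StatKey dataset), the 95% level is a placeholder (the actual level must be read from the printout), and `bootstrap_diff_ci` is a hypothetical helper, not StatKey's procedure.

```python
import random
import statistics

def bootstrap_diff_ci(sample1, sample2, level=0.95, reps=2000, seed=1):
    """Percentile bootstrap interval for mean(sample1) - mean(sample2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(reps):
        # Resample each independent group with replacement, keeping its size.
        boot1 = rng.choices(sample1, k=len(sample1))
        boot2 = rng.choices(sample2, k=len(sample2))
        diffs.append(statistics.mean(boot1) - statistics.mean(boot2))
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[round(tail * reps)]
    upper = diffs[round((1 - tail) * reps) - 1]
    return lower, upper

# Hypothetical weekly exercise hours; NOT the StatKey dataset.
males = [3, 5, 8, 10, 12, 4, 6, 9, 7, 11, 2, 14]
females = [2, 4, 6, 8, 5, 3, 7, 9, 1, 10, 6, 4]

observed_diff = statistics.mean(males) - statistics.mean(females)
lower, upper = bootstrap_diff_ci(males, females)
print(f"observed difference: {observed_diff:.2f}")
print(f"bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

If an interval for (population 1 − population 2) contains zero, the two means are not significantly different. If both endpoints are positive, population 1 is higher, and the endpoints show by how much.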

Question 3 (10 points) Use the following printout to analyze the variance and standard deviation confidence interval.

N  | Variance | Standard Deviation | 99% Confidence Interval - Variance | 99% Confidence Interval - Standard Deviation
78 | 32.952   | 3.869              | (23.586, 67.564)                   | (2.940, 5.006)

The Scenario: A random sample of 78 females was taken. The researcher wanted to study the scores for the GRE

exam (an exam taken to apply to several graduate degree programs). We are given the 99% confidence

interval for variance and the 99% confidence interval for standard deviation in the above printout.

a) Assume the sample data is normal. State the other two assumptions for this problem. Also justify

your answers (i.e. just writing the other two assumptions have been met will be treated as an

incomplete answer and points will be deducted).

b) Write down a sentence to explain the population variance confidence interval.

c) Write down a sentence to explain the population standard deviation confidence interval.

Question 4 (35 Points) Chi-Squared Categorical Association Test

Directions: For each of the following problems, use the printout provided to answer the following

questions.

a) Write the null and alternative hypotheses. Make sure to label which one is the claim.

b) Check the assumptions for the categorical association test.

c) What is the Chi-squared test statistic? Write a sentence to explain the test statistic.

d) Does the test statistic fall in the tail determined by the critical value?

e) Does the sample data significantly disagree with the null hypothesis? Explain your answer.

f) Are the observed counts significantly different from the expected counts? Explain your answer.

g) What is the P-value? Write a sentence to explain the P-value.

h) Compare the P-value to the significance level. Should we reject the null hypothesis or fail to reject the

null hypothesis? Explain your answer.

i) If the null hypothesis was true, could the sample data or more extreme have occurred by sampling

variability or is it unlikely to be sampling variability? Explain your answer.

j) Write a conclusion for the test addressing evidence and the claim. Explain your conclusion in non-technical language.

k) Are the categories related or not? Explain your answer.

The Scenario: A random sample of male college students were asked their major. Later, a random sample of female

college students were asked their major. The goal of the study was to show that gender is not related to

major. Use a 5% significance level and the printouts below to answer the questions given above.

The Chi-Squared Test Contingency Table

Each cell shows the observed count, the expected count in parentheses, and the test statistic contribution in brackets.

             | Business              | English             | History             | Music               | Biology             | Math                | Row Total
Female       | 89.0 (97.30) [0.71]   | 71.0 (62.45) [1.17] | 62.0 (58.58) [0.20] | 48.0 (48.89) [0.02] | 56.0 (57.12) [0.02] | 9.0 (10.65) [0.26]  | 335.0
Male         | 112.0 (103.70) [0.67] | 58.0 (66.55) [1.10] | 59.0 (62.42) [0.19] | 53.0 (52.11) [0.02] | 62.0 (60.88) [0.02] | 13.0 (11.35) [0.24] | 357.0
Column Total | 201.0                 | 129.0               | 121.0               | 101.0               | 118.0               | 22.0                | 692.0

Significance Level | Degrees of Freedom | Chi-Squared Test Statistic | Critical Value | P-Value
0.05               | 5                  | 4.6014                     | 11.0705        | 0.4664
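
The expected counts and the Chi-squared test statistic in the printout can be rebuilt from the observed counts alone. The following is a minimal sketch in plain Python. It assumes the usual independence formula (row total × column total ÷ grand total) for expected counts. The critical value and P-value are copied from the printout, not computed, and the variable names are illustrative.

```python
# Observed counts from the contingency table (rows: Female, Male).
observed = [
    [89, 71, 62, 48, 56, 9],
    [112, 58, 59, 53, 62, 13],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count if the categories are not related:
# (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Each cell contributes (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_squared = sum(sum(row) for row in contributions)
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

# Values copied from the printout rather than computed here.
critical_value = 11.0705
p_value = 0.4664
significance_level = 0.05

in_tail = chi_squared > critical_value        # does the statistic fall in the tail?
reject_null = p_value < significance_level    # is the P-value below alpha?
print(round(chi_squared, 4), degrees_of_freedom, in_tail, reject_null)
```

The sum of the twelve cell contributions reproduces the printed test statistic to the precision shown, and the degrees of freedom come from (rows − 1) × (columns − 1).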

Question 5 (35 Points) Correlation Hypothesis Test

Directions: For each of the following problems, use the information provided to answer the following

questions. Assume the assumptions have all been met.

a) Write the null and alternative hypotheses for the correlation test. Address the quantitative relationship and label which is the claim.

b) Write a sentence to explain the strength and direction based on the correlation coefficient (r = +0.321).

c) Write a sentence to explain the given sample slope (𝑏₁ = +0.028).

d) Compare the T-test statistic (0.289405) to the given critical value (0.2). Does the test statistic fall in a

tail determined by the critical value?

e) Does the sample data significantly disagree with the null hypothesis? Explain your answer.

f) Compute r-squared. Write the sentence to explain r-squared.

g) Write a sentence explaining the P-value (0.006749).

h) Compare the P-value to the significance level. Could the sample data or more extreme occur by

sampling variability if the null hypothesis was true or is it unlikely? Explain your answer.

i) Should we reject the null hypothesis or fail to reject the null hypothesis? Explain your answer.

j) Write a conclusion for the test addressing the evidence and the claim.

The Scenario: Use a 5% significance level to test the claim that there is a linear relationship between the number of

hurricanes and the year. The dataset and printouts were created in StatKey.
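
For part f, r-squared is the square of the given correlation coefficient. The short sketch below shows that arithmetic along with the P-value comparison from part h. It uses only numbers stated in the question, and the variable names are illustrative.

```python
r = 0.321                   # correlation coefficient given in part b
p_value = 0.006749          # P-value given in part g
significance_level = 0.05   # 5% significance level from the scenario

r_squared = r ** 2
percent_explained = 100 * r_squared

# A P-value below the significance level means the sample data is
# unlikely to be the result of sampling variability alone.
p_value_is_low = p_value < significance_level

print(f"r-squared = {r_squared:.6f} ({percent_explained:.1f}% of the variability)")
print(f"P-value below significance level: {p_value_is_low}")
```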

Question 6 (10 points)

a) State in one sentence the most difficult topic you encountered in this course.

b) In two to three sentences, describe why this topic was the most difficult for you.