New urgent
data-CI.xlsx
data1
| https://www.surveymonkey.com/r/JHSXPTF | ||
| 1. Create two frequency tables based on two separate questions from your survey. | ||
| 1. What is your Age? | ||
| Age | Frequency | Percentage |
| 18-24 | 3 | 9.09% |
| 25-34 | 7 | 21.21% |
| 35-44 | 10 | 30.30% |
| 45-54 | 4 | 12.12% |
| 55-64 | 5 | 15.15% |
| 65+ | 4 | 12.12% |
| 2. How many hours spent in the office in a day? | ||
| Time spent | Frequency | Percentage |
| 8-9 hours | 6 | 19.35% |
| 9-10 hours | 9 | 29.03% |
| Less than 8 hours | 8 | 25.81% |
| More than 10 Hours | 8 | 25.81% |
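The percentages in the two frequency tables are each count divided by the table total (33 responses for age, 31 for hours). A minimal Python sketch of that arithmetic follows; the function name `frequency_table` and the dictionary layout are illustrative choices, not part of the original workbook.

```python
def frequency_table(counts):
    """Return (label, count, percent) rows; percent is rounded to 2 decimals."""
    total = sum(counts.values())
    return [(label, n, round(100 * n / total, 2)) for label, n in counts.items()]


# Counts copied from the two frequency tables above.
age_counts = {"18-24": 3, "25-34": 7, "35-44": 10,
              "45-54": 4, "55-64": 5, "65+": 4}
hours_counts = {"8-9 hours": 6, "9-10 hours": 9,
                "Less than 8 hours": 8, "More than 10 Hours": 8}

age_table = frequency_table(age_counts)
hours_table = frequency_table(hours_counts)

for label, n, pct in age_table + hours_table:
    print(f"{label:<20} {n:>3} {pct:>6.2f}%")
```

Rounding is applied only when the percentage is reported, so the rows of each table sum to 100% up to rounding error.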
data2
| Gender (F=1, M=2) | Age |
| 1 | 18 |
| 1 | 23 |
| 1 | 24 |
| 1 | 25 |
| 1 | 27 |
| 1 | 28 |
| 1 | 27 |
| 2 | 28 |
| 1 | 27 |
| 1 | 32 |
| 1 | 35 |
| 1 | 35 |
| 2 | 36 |
| 2 | 38 |
| 1 | 41 |
| 2 | 42 |
| 1 | 43 |
| 1 | 43 |
| 2 | 44 |
| 1 | 44 |
| 2 | 45 |
| 2 | 45 |
| 2 | 47 |
| 1 | 50 |
| 2 | 55 |
| 1 | 57 |
| 2 | 57 |
| 1 | 58 |
| 1 | 64 |
| 2 | 65 |
| 2 | 67 |
| 1 | 72 |
| 1 | 73 |
DS.xlsx
MeanMedianMode
| Descriptive Statistics | |||
| SYM506 | |||
| Mean, Median, and Mode of Variables | |||
| Variable | Mean | Median | Mode |
| Gender | 1.36 | 1 | 1 |
| Age | 42.88 | 43 | 27 |
| Gender (F=1, M=2) | Age | ||
| 1 | 18 | Gender: The mean of gender is 1.36 on a coding in which 1 = female and 2 = male. Because the variable is dichotomous, the mean equals 1 plus the proportion of males, so a value closer to 1 than to 2 indicates that females outnumber males in the sample. The median is 1 for the same reason: more than half of the respondents are coded 1. The mode is also 1, because the value 1 (female) occurs most often. Age: The mean age of the sample is 42.88 years, the median is 43 years, and the mode is 27, because three employees are 27 years old. | |
| 1 | 23 | ||
| 1 | 24 | ||
| 1 | 25 | ||
| 1 | 27 | ||
| 1 | 28 | ||
| 1 | 27 | ||
| 2 | 28 | ||
| 1 | 27 | ||
| 1 | 32 | ||
| 1 | 35 | ||
| 1 | 35 | ||
| 2 | 36 | ||
| 2 | 38 | ||
| 1 | 41 | ||
| 2 | 42 | ||
| 1 | 43 | ||
| 1 | 43 | ||
| 2 | 44 | ||
| 1 | 44 | ||
| 2 | 45 | ||
| 2 | 45 | ||
| 2 | 47 | ||
| 1 | 50 | ||
| 2 | 55 | ||
| 1 | 57 | ||
| 2 | 57 | ||
| 1 | 58 | ||
| 1 | 64 | ||
| 2 | 65 | ||
| 2 | 67 | ||
| 1 | 72 | ||
| 1 | 73 |
Variance
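| Variable | Variance | Gender: The variance of gender is 0.24. For a variable coded 1 and 2 this value reflects only how the sample splits between the two categories; it is not a spread in any physical unit. Age: The sample variance of age is 229.17 (in squared years). It is the sum of the squared deviations of the ages from the mean of 42.88, divided by n − 1, and it measures how widely the ages are spread around the mean. | |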
| Gender | 0.24 | ||
| Age | 229.17 | ||
| Gender (F=1, M=2) | Age | ||
| 1 | 18 | ||
| 1 | 23 | ||
| 1 | 24 | ||
| 1 | 25 | ||
| 1 | 27 | ||
| 1 | 28 | ||
| 1 | 27 | ||
| 2 | 28 | ||
| 1 | 27 | ||
| 1 | 32 | ||
| 1 | 35 | ||
| 1 | 35 | ||
| 2 | 36 | ||
| 2 | 38 | ||
| 1 | 41 | ||
| 2 | 42 | ||
| 1 | 43 | ||
| 1 | 43 | ||
| 2 | 44 | ||
| 1 | 44 | ||
| 2 | 45 | ||
| 2 | 45 | ||
| 2 | 47 | ||
| 1 | 50 | ||
| 2 | 55 | ||
| 1 | 57 | ||
| 2 | 57 | ||
| 1 | 58 | ||
| 1 | 64 | ||
| 2 | 65 | ||
| 2 | 67 | ||
| 1 | 72 | ||
| 1 | 73 | ||
| Age | Frequency | ||
| 18-24 | 3 | ||
| 25-34 | 7 | ||
| 35-44 | 10 | ||
| 45-54 | 4 | ||
| 55-64 | 5 | ||
| 65+ | 4 | ||
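As a check on the variance figures, the sketch below recomputes them with the standard `statistics` module. It assumes the sheet used the sample formula (divisor n − 1), which is what reproduces 229.17 and 0.24; the population formula (divisor n) is shown alongside for comparison. Variable names are illustrative.

```python
from statistics import pvariance, variance

gender = [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 2, 1,
          1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1]
age = [18, 23, 24, 25, 27, 28, 27, 28, 27, 32, 35, 35, 36, 38, 41, 42, 43,
       43, 44, 44, 45, 45, 47, 50, 55, 57, 57, 58, 64, 65, 67, 72, 73]

# Sample variance: sum of squared deviations divided by n - 1.
sample_var_age = variance(age)
sample_var_gender = variance(gender)

# Population variance: the same sum divided by n.
pop_var_age = pvariance(age)

print(f"age, sample variance:     {sample_var_age:.2f}")
print(f"age, population variance: {pop_var_age:.2f}")
print(f"gender, sample variance:  {sample_var_gender:.2f}")
```

With 33 observations the two divisors give noticeably different results for age, so it is worth stating which one a reported figure uses.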
[Charts: "Employee Age", frequency by age group (18-24, 25-34, 35-44, 45-54, 55-64, 65+), plotted from the table above]
Standard Deviation
| Variable | Standard Deviation | |||||
| Gender | 0.48 | |||||
| Age | 14.91 | Ideal Age Range: | 27.97 | to | 57.79 | |
| Gender (F=1, M=2) | Age | Gender: The standard deviation of gender is 0.48, the typical distance of the 1/2 codes from their mean of 1.36. Age: The standard deviation of age is 14.91 years. Taking the mean age of 42.88 and subtracting and adding one standard deviation gives a range of 27.97 to 57.79 years (about 28 to 58), which can serve as the typical age range of the respondents. Note that 14.91 is the population standard deviation (divisor n); the square root of the sample variance of 229.17 is 15.14. | ||||
| 1 | 18 | |||||
| 1 | 23 | |||||
| 1 | 24 | |||||
| 1 | 25 | |||||
| 1 | 27 | |||||
| 1 | 28 | |||||
| 1 | 27 | |||||
| 2 | 28 | |||||
| 1 | 27 | |||||
| 1 | 32 | |||||
| 1 | 35 | |||||
| 1 | 35 | |||||
| 2 | 36 | |||||
| 2 | 38 | |||||
| 1 | 41 | |||||
| 2 | 42 | |||||
| 1 | 43 | |||||
| 1 | 43 | |||||
| 2 | 44 | |||||
| 1 | 44 | |||||
| 2 | 45 | |||||
| 2 | 45 | |||||
| 2 | 47 | |||||
| 1 | 50 | |||||
| 2 | 55 | |||||
| 1 | 57 | |||||
| 2 | 57 | |||||
| 1 | 58 | |||||
| 1 | 64 | |||||
| 2 | 65 | |||||
| 2 | 67 | |||||
| 1 | 72 | |||||
| 1 | 73 |
Probability
| Gender (F=1, M=2) | Age | Variable | Mean | SD | |
| 1 | 18 | 0.0303030303 | Gender | 1.36 | 0.48 |
| 1 | 23 | 0.0303030303 | Age | 42.88 | 14.91 |
| 1 | 24 | 0.0303030303 | |||
| 1 | 25 | 0.0303030303 | |||
| 1 | 27 | 0.0909090909 | |||
| 1 | 28 | 0.0606060606 | Age Probability | ||
| 1 | 27 | 0.0909090909 | Younger than 43 years old | 0.4764677739 | |
| 2 | 28 | 0.0606060606 | Older than 43 years old | 0.5235322261 | |
| 1 | 27 | 0.0909090909 | |||
| 1 | 32 | 0.0303030303 | Gender Probability | ||
| 1 | 35 | 0.0606060606 | Male | 0.5833333333 | |
| 1 | 35 | 0.0606060606 | Female | 0.4166666667 | |
| 2 | 36 | 0.0303030303 | |||
| 2 | 38 | 0.0303030303 | Gender: The probability that a member of the sample was male was 58%, so the probability that the member was female was 42%. Age: The probability that a member of the sample was younger than 43 years old was 48%, while the probability that an individual was older than 43 years old was 52%. | ||
| 1 | 41 | 0.0303030303 | |||
| 2 | 42 | 0.0303030303 | |||
| 1 | 43 | 0.0606060606 | |||
| 1 | 43 | 0.0606060606 | |||
| 2 | 44 | 0.0606060606 | |||
| 1 | 44 | 0.0606060606 | |||
| 2 | 45 | 0.0606060606 | |||
| 2 | 45 | 0.0606060606 | |||
| 2 | 47 | 0.0303030303 | |||
| 1 | 50 | 0.0303030303 | |||
| 2 | 55 | 0.0303030303 | |||
| 1 | 57 | 0.0606060606 | |||
| 2 | 57 | 0.0606060606 | |||
| 1 | 58 | 0.0303030303 | |||
| 1 | 64 | 0.0303030303 | |||
| 2 | 65 | 0.0303030303 | |||
| 2 | 67 | 0.0303030303 | |||
| 1 | 72 | 0.0303030303 | |||
| 1 | 73 | 0.0303030303 | |||
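The third column of this sheet appears to hold the relative frequency of each age value, that is, the number of respondents with that age divided by 33 (for example 3/33 for age 27). A short sketch of that calculation, with illustrative names:

```python
from collections import Counter

age = [18, 23, 24, 25, 27, 28, 27, 28, 27, 32, 35, 35, 36, 38, 41, 42, 43,
       43, 44, 44, 45, 45, 47, 50, 55, 57, 57, 58, 64, 65, 67, 72, 73]

counts = Counter(age)
# Relative frequency of each distinct age: count / number of respondents.
rel_freq = {value: n / len(age) for value, n in sorted(counts.items())}

for value, p in rel_freq.items():
    print(f"{value:>3}  {p:.10f}")
```

Each distinct age appears once in `rel_freq`, so these values sum to 1, whereas the sheet repeats the value on every row that shares the age.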
Data
| Gender (F=1, M=2) | Age |
| 1 | 18 |
| 1 | 23 |
| 1 | 24 |
| 1 | 25 |
| 1 | 27 |
| 1 | 28 |
| 1 | 27 |
| 2 | 28 |
| 1 | 27 |
| 1 | 32 |
| 1 | 35 |
| 1 | 35 |
| 2 | 36 |
| 2 | 38 |
| 1 | 41 |
| 2 | 42 |
| 1 | 43 |
| 1 | 43 |
| 2 | 44 |
| 1 | 44 |
| 2 | 45 |
| 2 | 45 |
| 2 | 47 |
| 1 | 50 |
| 2 | 55 |
| 1 | 57 |
| 2 | 57 |
| 1 | 58 |
| 1 | 64 |
| 2 | 65 |
| 2 | 67 |
| 1 | 72 |
| 1 | 73 |
data set3.xlsx
Question 1
| https://www.surveymonkey.com/r/JHSXPTF | ||
| 1. Create two frequency tables based on two separate questions from your survey. | ||
| 1. What is your Age? | ||
| Age | Frequency | Percentage |
| 18-24 | 3 | 9.09% |
| 25-34 | 7 | 21.21% |
| 35-44 | 10 | 30.30% |
| 45-54 | 4 | 12.12% |
| 55-64 | 5 | 15.15% |
| 65+ | 4 | 12.12% |
| 2. How many hours spent in the office in a day? | ||
| Time spent | Frequency | Percentage |
| 8-9 hours | 6 | 19.35% |
| 9-10 hours | 9 | 29.03% |
| Less than 8 hours | 8 | 25.81% |
| More than 10 Hours | 8 | 25.81% |
Question 2
| 2. Create a bar graph and a pie chart based on the data in the frequency tables. |
[Charts: "Age", responses by answer choice (18-24: 3, 25-34: 7, 35-44: 10, 45-54: 4, 55-64: 5, 65+: 4)]
[Charts: "Time Spent", frequency by answer choice (8-9 hours: 6, 9-10 hours: 9, Less than 8 hours: 8, More than 10 Hours: 8)]
Question 3
| 3. Determine the class intervals and create frequency distribution for each of the frequency tables. | |||
| Age/Class Intervals | Frequency | Cumulative frequency | Percentage |
| 15-24 | 3 | 3 | 9.09% |
| 25-34 | 7 | 10 | 21.21% |
| 35-44 | 10 | 20 | 30.30% |
| 45-54 | 4 | 24 | 12.12% |
| 55-64 | 5 | 29 | 15.15% |
| 65+ | 4 | 33 | 12.12% |
| Total | 33 | | 100.00% |
| Time spent in the office was recorded as categories, so no class intervals can be determined for it. | |||
| Hours spent | Frequency | Cumulative Frequency | Percentage |
| 8-9 hours | 6 | 6 | 19.35% |
| 9-10 hours | 9 | 15 | 29.03% |
| Less than 8 hours | 8 | 23 | 25.81% |
| More than 10 Hours | 8 | 31 | 25.81% |
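The cumulative frequency columns are running totals of the frequency columns, taken in the row order of each table. A minimal sketch using `itertools.accumulate`; the list names are illustrative.

```python
from itertools import accumulate

age_classes = ["15-24", "25-34", "35-44", "45-54", "55-64", "65+"]
age_freq = [3, 7, 10, 4, 5, 4]

hours_classes = ["8-9 hours", "9-10 hours",
                 "Less than 8 hours", "More than 10 Hours"]
hours_freq = [6, 9, 8, 8]

# Running totals in table order; the last entry equals the table total.
age_cum = list(accumulate(age_freq))
hours_cum = list(accumulate(hours_freq))

for label, f, c in zip(age_classes, age_freq, age_cum):
    print(f"{label:<20} {f:>3} {c:>3}")
for label, f, c in zip(hours_classes, hours_freq, hours_cum):
    print(f"{label:<20} {f:>3} {c:>3}")
```

A running total depends on the row order, so for the hours table the result is only meaningful if the categories are listed in a sensible sequence.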
Question 4
| 4. Create one frequency polygon of the data from the frequency distribution. |
[Frequency polygon: "Frequency and Cumulative Frequency" by age class (15-24, 25-34, 35-44, 45-54, 55-64, 65+)]
[Chart: Percentage by age class]
[Frequency polygon: "Frequency and Cumulative Frequency" by hours spent (8-9 hours, 9-10 hours, Less than 8 hours, More than 10 Hours)]
[Chart: Percentage by hours spent]
data collection.doc
Running Head: DATA COLLECTION 1
Data Collection: Work Stress
Asha Kolagunda Nagappa Shetty
Grand Canyon University: SYM 506
April 2, 2019
The topic I selected for my final project supports both qualitative and quantitative research on employees and how they are affected by stress and depression. The qualitative data consist of narrative accounts from my co-workers: their stories and opinions on why they feel more prone to stress and anxiety, which in turn lead to depression. The quantitative data are numerical and support a statistical analysis of trends and variables, such as the percentage of employees who experience work stress.
This paper aims to give a better understanding of the challenges employees face in handling stress and anxiety, and of why depression is four times more common among employees than among the self-employed. Employers may never fully understand how damaging work pressure has been to employees' health; from what I have witnessed, it is more devastating than any disease. Setting stereotypes aside, this research sheds light on the people who are affected by depression and illustrates how it affects their mental health.
I collected the data with a Survey Monkey survey in which respondents answered 6 questions with various answer choices. The survey is available at the following link.
https://www.surveymonkey.com/r/JHSXPTF
Analysis of the Survey Monkey Data:
Methodology
The Stress in the Workplace survey was conducted with the online tool Survey Monkey among adults aged 18 and older who reside in the U.S. and are employed full-time, employed part-time, or self-employed. Results were weighted as needed for age, sex, and working hours. Because the sample consists of those who were invited to participate in the survey, no estimates of theoretical sampling error can be calculated.
The sample comprised 32 individuals: employees, self-employed respondents, and respondents in other occupations.
Gender differences in job stress were investigated by collecting both qualitative data (stressful incidents at work, such as long days spent in the office) and quantitative data (rating scales of job strain) from the survey respondents. Content analysis of the qualitative data showed that stress is experienced by both genders. When men and women are compared on their job stress experiences, women are found to be more prone to stress. When time spent in the office is considered, employees spend extra time commuting and stay late at the office because of unfinished work and last-minute work requests.
The responses from self-employed respondents show that there is a lot to love about self-employment. You have the freedom to make your own choices and decisions. You can set your own hours, build your own investment (rather than somebody else’s), and never have to ask for time off for vacation or family events.
One of the biggest things that bothered employees about working for someone else was being told what to do.
When occupation was controlled for, women reported more overall psychological strain (as indicated by the qualitative data) and more depression (as indicated by the quantitative data) than men did. In this study, both qualitative and quantitative data indicated interaction effects between gender and occupation in predicting job stressors and strains. Finally, the relationship between interpersonal conflict and negative emotions or job satisfaction was stronger for employees than for the self-employed.
Key Findings
Combining the graphs across all respondents, about two-thirds (66 percent) of employed adults report that they are stressed by their job. However, less than half (44 percent) report being satisfied with the job they do.
study material.docx
ANALYSIS OF VARIANCE 387
INTRODUCTION
In this chapter, we continue our discussion of hypothesis testing. Recall that in Chapters 10
and 11 we examined the general theory of hypothesis testing. We described the case
where a sample was selected from the population. We used the z distribution (the standard
normal distribution) or the t distribution to determine whether it was reasonable to
conclude that the population mean was equal to a specified value. We tested whether
two population means are the same. In this chapter, we expand our idea of hypothesis
tests. We describe a test for variances and then a test that simultaneously compares
several population means to determine if they are equal.
COMPARING TWO POPULATION VARIANCES
In Chapter 11, we tested hypotheses about equal population means. The tests differed
based on our assumptions regarding whether the population standard deviations or
variances were equal or unequal. In this chapter, the assumption about equal population
variances is also important. In this section, we present a way to statistically test this
The F Distribution
The probability distribution used in this chapter is the F distribution. It was named to honor
Sir Ronald Fisher, one of the founders of modern-day statistics. The test statistic for several
situations follows this probability distribution. It is used to test whether two samples are
from populations having equal variances, and it is also applied when we want to compare
several population means simultaneously. The simultaneous comparison of several population
means is called analysis of variance (ANOVA). In both of these situations, the populations
must follow a normal distribution, and the data must be at least interval-scale.
What are the characteristics of the F distribution?
1. There is a family of F distributions. A particular member of the family is determined
by two parameters: the degrees of freedom in the numerator and the degrees of
freedom in the denominator. The shape of the distribution is illustrated by the following
graph. There is one F distribution for the combination of 29 degrees of freedom
in the numerator (df ) and 28 degrees of freedom in the denominator. There is
another F distribution for 19 degrees of freedom in the numerator and 6 degrees of
freedom in the denominator. The final distribution shown has 6 degrees of freedom
in the numerator and 6 degrees of freedom in the denominator. We will describe
the concept of degrees of freedom later in the chapter. Note that the shapes of the
distributions change as the degrees of freedom change.
2. The F distribution is continuous. This means that the value of F can assume an
infinite number of values between zero and positive infinity.
3. The F statistic cannot be negative. The smallest value F can assume is 0.
4. The F distribution is positively skewed. The long tail of the distribution is to the
right-hand side. As the number of degrees of freedom increases in both the numerator
and denominator, the distribution approaches a normal distribution.
5. The F distribution is asymptotic. As the values of F increase, the distribution approaches
the horizontal axis but never touches it. This is similar to the behavior of
the normal probability distribution, described in Chapter 7.
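Characteristics 3 and 4 can be illustrated with a small simulation: draw two samples from the same normal population, take the ratio of their sample variances, and repeat. This is only a sketch under assumed settings (samples of 7 observations each, giving 6 and 6 degrees of freedom, 2,000 repetitions, and a fixed seed); the function name `simulate_f_ratios` is illustrative.

```python
import random
from statistics import mean, median, variance


def simulate_f_ratios(n1, n2, trials, seed=0):
    """Ratios of sample variances from two samples of one normal population."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(0, 1) for _ in range(n1)]
        sample2 = [rng.gauss(0, 1) for _ in range(n2)]
        ratios.append(variance(sample1) / variance(sample2))
    return ratios


ratios = simulate_f_ratios(n1=7, n2=7, trials=2000)

# A ratio of two variances cannot be negative (characteristic 3).
print("smallest ratio:", min(ratios))
# A mean above the median is a sign of a long right tail (characteristic 4).
print("mean:", mean(ratios), "median:", median(ratios))
```

The simulated ratios cluster near 1 with occasional large values, which is the right-skewed shape described above.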
Testing a Hypothesis of Equal Population Variances
The first application of the F distribution that we describe occurs when we test the hypothesis
that the variance of one normal population equals the variance of another
normal population. The following examples will show the use of the test:
• A health services corporation manages two hospitals in Knoxville, Tennessee:
St. Mary’s North and St. Mary’s South. In each hospital, the mean waiting time in the
Emergency Department is 42 minutes. The hospital administrator believes that the
St. Mary’s North Emergency Department has more variation in waiting time than
St. Mary’s South.
• The mean rate of return on two types of common stock may be the same, but
there may be more variation in the rate of return in one than the other. A sample
of 10 technology and 10 utility stocks shows the same mean rate of return, but
there is likely more variation in the technology stocks.
• An on-line newspaper found that men and women spend about the same amount
of time per day accessing news apps. However, the same report indicated the times
of men had nearly twice as much variation compared to the times of women.
The F distribution is also used to test the assumption that the variances of two normal
populations are equal. Recall that in the previous chapter the t test to investigate
whether the means of two independent populations differed assumes that the variances
of the two normal populations are the same. See this list of assumptions on page 361.
The F distribution is used to test the assumption that the variances are equal.
To compare two population variances, we first state the null hypothesis. The null
hypothesis is that the variance of one normal population, $\sigma_1^2$, equals the variance of another
normal population, $\sigma_2^2$. The alternate hypothesis is that the variances differ. In this
instance, the null hypothesis and the alternate hypothesis are:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
To conduct the test, we select a random sample of observations, $n_1$, from one population
and a random sample of observations, $n_2$, from the second population. The test
statistic is defined as follows.
$F = \dfrac{s_1^2}{s_2^2}$
The terms $s_1^2$ and $s_2^2$ are the respective sample variances. If the null hypothesis is
true, the test statistic follows the F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
To reduce the size of the table of critical values, the larger sample variance is
placed in the numerator; hence, the tabled F ratio is always larger than 1.00. Thus, the
right-tail critical value is the only one required. The critical value of F for a two-tailed test
is found by dividing the significance level in half (α/2) and then referring to the appropriate
degrees of freedom in Appendix B.6. An example will illustrate.
EXAMPLE
Lammers Limos offers limousine service from Government Center in downtown Toledo,
Ohio, to Metro Airport in Detroit. Sean Lammers, president of the company, is
considering two routes. One is via U.S. 25 and the other via I-75. He wants to study
the time it takes to drive to the airport using each route and then compare the results.
He collected the following sample data, which is reported in minutes. Using the .10
significance level, is there a difference in the variation in the driving times for the
two routes?
SOLUTION
The mean driving times along the two routes are nearly the same. The mean time is
58.29 minutes for the U.S. 25 route and 59.0 minutes along the I-75 route. However,
in evaluating travel times, Mr. Lammers is also concerned about the variation
in the travel times. The first step is to compute the two sample variances. We’ll use
formula (3–9) to compute the sample standard deviations. To obtain the sample
variances, we square the standard deviations.
U.S. ROUTE 25
$\bar{x} = \dfrac{\Sigma x}{n} = \dfrac{408}{7} = 58.29 \qquad s = \sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{485.43}{7 - 1}} = 8.9947$
INTERSTATE 75
$\bar{x} = \dfrac{\Sigma x}{n} = \dfrac{472}{8} = 59.00 \qquad s = \sqrt{\dfrac{\Sigma (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{134}{8 - 1}} = 4.3753$
There is more variation, as measured by the standard deviation, in the U.S. 25 route
than in the I-75 route. This is consistent with his knowledge of the two routes; the U.S.
25 route contains more stoplights, whereas I-75 is a limited-access interstate highway.
However, the I-75 route is several miles longer. It is important that the service
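The F ratio for the Lammers Limos example can be sketched from the summary quantities given in the solution (sums of squared deviations of 485.43 with n = 7 for U.S. 25, and 134 with n = 8 for I-75). The helper names below are illustrative. The critical value is not computed here; it would be read from an F table at α/2 = .05 with the degrees of freedom printed by the code.

```python
def sample_variance(sum_sq_dev, n):
    """Sample variance from a sum of squared deviations and a sample size."""
    return sum_sq_dev / (n - 1)


# Summary quantities quoted in the solution.
routes = {
    "U.S. 25": {"sum_sq_dev": 485.43, "n": 7},
    "I-75": {"sum_sq_dev": 134.0, "n": 8},
}

variances = {name: sample_variance(r["sum_sq_dev"], r["n"])
             for name, r in routes.items()}

# The larger sample variance goes in the numerator, so F is at least 1.
numerator = max(variances, key=variances.get)
denominator = min(variances, key=variances.get)

f_stat = variances[numerator] / variances[denominator]
df = (routes[numerator]["n"] - 1, routes[denominator]["n"] - 1)

print(f"s^2 {numerator}: {variances[numerator]:.3f}")
print(f"s^2 {denominator}: {variances[denominator]:.3f}")
print(f"F = {f_stat:.2f} with df = {df}")
```

If the computed F exceeds the tabled critical value, the null hypothesis of equal variances is rejected at the .10 significance level.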