Probability and statistics

Angeela Gauchan

Dr. Nicholas Jacob

Intro to Probability and Statistics 1223

April 4th, 2021

2010: Time spent by different sexes in Europe

Project Part I

I was browsing for a dataset for this project and came across this one. I see a lot of influencers from European countries on my social media feeds (mostly Instagram), and they seem to have perfectly balanced lifestyles, so a dataset about how people in European countries actually spend their time, broken down by country and sex, appealed to me. The data cover how much time people spend on activities such as paid work, housework, and family care.

The following links lead to the source website and the CSV files for the dataset.

https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=tus_00week&lang=en

Several variables can be used to interpret this dataset. The quantitative variables are time spent (in hh:mm format), participation time (in hours), and participation rate. I will use them to compare and contrast how males and females spend their time in various areas of their lives, depending on where they live. The activity category is a categorical variable. Its values include sleep; eating; other and unspecified personal care; employment, related activities, and travel as part of or during the main and second job; main and second job and related travel; activities related to employment; study; school and university except homework; homework; free-time study; household and family care; food management except dishwashing; cleaning; laundry; construction; shopping; childcare; visiting; computing; hobbies; and so on. Because we are comparing different countries, country is also a nominal variable.

TABLE 1.1

I can see some differences in time use between males and females just by looking at a few sections of the data. Compared with men, women tend to spend more time on household chores. The two groups seem to spend about the same amount of time on childcare, however. I would like to explore the similarities and differences between them in other aspects of their lives by looking into the specifics of the dataset.

Project Part II

Below is the updated version of the dataset that I chose, which lists the nine most common activity categories. All times are in hh:mm per day.

TABLE 2.1

| SEX | GEO/ACL00 | Total | Personal care | Sleep | Eating | Employment, related activities | Main and second job and related travel | Household and family care | Leisure, social and associative life | TV and video | Travel except travel related to jobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Males | Belgium | 24:00:00 | 10:45 | 8:15 | 1:49 | 3:07 | 3:05 | 2:28 | 5:58 | 2:35 | 1:30 |
| Males | Bulgaria | 24:00:00 | 11:54 | 9:08 | 2:07 | 3:32 | 3:27 | 2:37 | 4:46 | 2:41 | 1:07 |
| Males | Germany (including former GDR from 1991) | 24:00:00 | 10:40 | 8:08 | 1:43 | 3:27 | 3:21 | 2:22 | 5:42 | 1:58 | 1:29 |
| Males | Estonia | 24:00:00 | 10:35 | 8:24 | 1:19 | 4:27 | 4:20 | 2:33 | 5:02 | 2:29 | 1:12 |
| Males | Spain | 24:00:00 | 11:11 | 8:36 | 1:47 | 4:21 | 4:17 | 1:37 | 5:16 | 2:00 | 1:16 |
| Males | France | 24:00:00 | 11:44 | 8:45 | 2:18 | 3:48 | 3:46 | 2:24 | 4:44 | 2:08 | 1:03 |
| Males | Italy | 24:00:00 | 11:16 | 8:17 | 1:57 | 4:15 | 4:11 | 1:35 | 5:05 | 1:52 | 1:35 |
| Males | Latvia | 24:00:00 | 10:46 | 8:35 | 1:33 | 5:00 | 4:55 | 1:50 | 4:45 | 2:18 | 1:28 |
| Males | Lithuania | 24:00:00 | 10:53 | 8:28 | 1:32 | 4:45 | 4:43 | 2:09 | 4:47 | 2:36 | 1:13 |
| Males | Poland | 24:00:00 | 10:44 | 8:21 | 1:33 | 4:01 | 3:58 | 2:22 | 5:20 | 2:34 | 1:13 |
| Males | Slovenia | 24:00:00 | 10:31 | 8:18 | 1:33 | 3:53 | 3:49 | 2:38 | 5:31 | 2:12 | 1:10 |
| Males | Finland | 24:00:00 | 10:23 | 8:22 | 1:23 | 3:48 | 3:46 | 2:16 | 5:56 | 2:25 | 1:12 |
| Males | United Kingdom | 24:00:00 | 10:22 | 8:18 | 1:24 | 4:10 | 4:06 | 2:18 | 5:22 | 2:37 | 1:30 |
| Males | Norway | 24:00:00 | 10:06 | 7:56 | 1:25 | 4:04 | 4:03 | 2:21 | 5:52 | 2:06 | 1:21 |
| Females | Belgium | 24:00:00 | 11:11 | 8:34 | 1:50 | 1:53 | 1:52 | 4:10 | 5:06 | 2:13 | 1:22 |
| Females | Bulgaria | 24:00:00 | 11:38 | 9:07 | 1:55 | 2:34 | 2:33 | 5:01 | 3:47 | 2:14 | 0:52 |
| Females | Germany (including former GDR from 1991) | 24:00:00 | 10:58 | 8:15 | 1:46 | 1:56 | 1:53 | 4:14 | 5:15 | 1:40 | 1:19 |
| Females | Estonia | 24:00:00 | 10:30 | 8:26 | 1:12 | 3:05 | 3:02 | 4:53 | 4:18 | 2:06 | 1:02 |
| Females | Spain | 24:00:00 | 11:05 | 8:32 | 1:44 | 2:06 | 2:05 | 4:55 | 4:26 | 1:46 | 1:05 |
| Females | France | 24:00:00 | 11:53 | 8:55 | 2:11 | 2:17 | 2:16 | 4:34 | 4:05 | 1:55 | 0:54 |
| Females | Italy | 24:00:00 | 11:12 | 8:19 | 1:52 | 1:52 | 1:50 | 5:20 | 4:06 | 1:29 | 1:14 |
| Females | Latvia | 24:00:00 | 10:53 | 8:44 | 1:26 | 3:29 | 3:26 | 3:56 | 4:08 | 1:55 | 1:20 |
| Females | Lithuania | 24:00:00 | 10:56 | 8:35 | 1:26 | 3:31 | 3:29 | 4:29 | 3:45 | 1:59 | 1:05 |
| Females | Poland | 24:00:00 | 11:03 | 8:35 | 1:34 | 2:15 | 2:14 | 4:45 | 4:32 | 2:03 | 1:06 |
| Females | Slovenia | 24:00:00 | 10:32 | 8:25 | 1:26 | 2:42 | 2:39 | 4:56 | 4:27 | 1:44 | 1:02 |
| Females | Finland | 24:00:00 | 10:38 | 8:32 | 1:19 | 2:33 | 2:32 | 3:56 | 5:17 | 2:02 | 1:07 |
| Females | United Kingdom | 24:00:00 | 10:43 | 8:27 | 1:26 | 2:24 | 2:21 | 4:15 | 4:55 | 2:09 | 1:25 |
| Females | Norway | 24:00:00 | 10:27 | 8:10 | 1:20 | 2:38 | 2:37 | 3:47 | 5:40 | 1:39 | 1:11 |

For the second part of the project, I examined the frequency and relative frequency for the categories on which males and females in the listed European countries spend the most time. For each category, I took the average time across all 28 rows (14 countries for each sex) and counted how many rows fall below that average.

TABLE 2.2

| Time most spent (average, hh:mm) | Frequency (below-average time) | Relative frequency |
|---|---|---|
| Personal care (avg 10:54) | 16 | 11.85% |
| Sleep (avg 8:28) | 16 | 11.85% |
| Eating (avg 1:38) | 17 | 12.59% |
| Employment related activities (avg 3:16) | 13 | 9.63% |
| Main and second job (avg 3:14) | 13 | 9.63% |
| Household and family care (avg 3:12) | 14 | 10.37% |
| Leisure, social and associative life (avg 4:55) | 15 | 11.11% |
| TV and video (avg 2:07) | 15 | 11.11% |
| Travel (avg 1:14) | 16 | 11.85% |
| Total | 135 | 100% |
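
The counting behind one row of this table can be sketched in Python. This is only a minimal sketch, not the procedure actually used for the project: it uses the 28 personal care values from Table 2.1, the helper names `to_minutes` and `below_average_count` are made up for illustration, and it assumes that "below average" means strictly less than the mean.

```python
# Personal care times (hh:mm) from Table 2.1: 14 male rows, then 14 female rows.
PERSONAL_CARE = [
    "10:45", "11:54", "10:40", "10:35", "11:11", "11:44", "11:16",
    "10:46", "10:53", "10:44", "10:31", "10:23", "10:22", "10:06",
    "11:11", "11:38", "10:58", "10:30", "11:05", "11:53", "11:12",
    "10:53", "10:56", "11:03", "10:32", "10:38", "10:43", "10:27",
]

def to_minutes(hhmm):
    """Convert an 'hh:mm' string to a whole number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def below_average_count(times):
    """Return (mean in minutes, number of values strictly below the mean)."""
    minutes = [to_minutes(t) for t in times]
    mean = sum(minutes) / len(minutes)
    count = sum(1 for m in minutes if m < mean)
    return mean, count

mean_minutes, frequency = below_average_count(PERSONAL_CARE)
hours, mins = divmod(int(mean_minutes), 60)  # truncate to whole minutes
print(f"average {hours}:{mins:02d}, below average: {frequency}")
# prints: average 10:54, below average: 16
```

Dividing this count by the total of the frequency column gives the relative frequency for the row.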

According to the dataset, Europeans spend the largest share of their day on personal care, most of which is sleep. What is even more fascinating is how little of the day goes to paid work: employment and related activities average only 3:16 per day. The main and second job category (3:14) is part of that figure, so the two should not be added together. That sounds like a good reason for you and me to relocate to Europe. Since the majority of social media influencers are freelancers, it makes sense that their feeds look flawless. After all, their work is to be on social media.

Next, I broke the same categories down by gender, as shown below.

TABLE 2.3

| Time most spent (average, hh:mm) | Male (count below average) | Female (count below average) |
|---|---|---|
| Personal care (avg 10:54) | 8 | 9 |
| Sleep (avg 8:28) | 10 | 6 |
| Eating (avg 1:38) | 8 | 8 |
| Employment related activities (avg 3:16) | 1 | 15 |
| Main and second job (avg 3:14) | 1 | 12 |
| Household and family care (avg 3:12) | 14 | 0 |
| Leisure, social and associative life (avg 4:55) | 4 | 9 |
| TV and video (avg 2:07) | 4 | 11 |
| Travel (avg 1:14) | 7 | 9 |

Since we are counting values below the average, a higher count means less time spent on that activity. The counts show that women in these European countries spend less time on paid work than men and, judging by the leisure and TV rows, less time on leisure and social activities as well. Compared with men, they also travel less and sleep more. No female row falls below the average for household and family care, while every male row does, which implies that women spend far more of their time caring for their households and families. It was also interesting to learn that men and women devote roughly equal amounts of time to personal care.
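
The split by sex can be sketched the same way. The snippet below is a hedged illustration for the Sleep column only, using the values from Table 2.1; the names `SLEEP`, `to_minutes`, and `below_by_sex` are illustrative, and the comparison is against the overall mean of all 28 rows, which is an assumption about how the counts were made.

```python
# Sleep times (hh:mm) from Table 2.1, in the table's country order.
SLEEP = {
    "Males": ["8:15", "9:08", "8:08", "8:24", "8:36", "8:45", "8:17",
              "8:35", "8:28", "8:21", "8:18", "8:22", "8:18", "7:56"],
    "Females": ["8:34", "9:07", "8:15", "8:26", "8:32", "8:55", "8:19",
                "8:44", "8:35", "8:35", "8:25", "8:32", "8:27", "8:10"],
}

def to_minutes(hhmm):
    """Convert an 'hh:mm' string to a whole number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

# Overall mean across both sexes (assumed to be the "avg" used in the table).
all_minutes = [to_minutes(t) for times in SLEEP.values() for t in times]
overall_mean = sum(all_minutes) / len(all_minutes)

# Count, for each sex, the rows strictly below the overall mean.
below_by_sex = {
    sex: sum(1 for t in times if to_minutes(t) < overall_mean)
    for sex, times in SLEEP.items()
}
print(below_by_sex)  # {'Males': 10, 'Females': 6}
```

More male rows than female rows fall below the average, which is the pattern behind the statement that women sleep more.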

Project Part III

For Part III, I created two graphs to display one of the quantitative variables in my dataset: time spent on personal care, the activity on which the most time is spent, recorded in hours and minutes (hh:mm). I converted each time to a decimal fraction of a 24-hour day, and the box plot of this variable is shown below.
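
The conversion from hh:mm to a decimal can be sketched as follows. This is a minimal sketch that assumes the decimal is the fraction of a 24-hour day, an assumption that reproduces the minimum (0.42) and maximum (0.50) reported in the five-number summary for this part; the function name `hhmm_to_day_fraction` is hypothetical.

```python
def hhmm_to_day_fraction(hhmm):
    """Convert an 'hh:mm' duration to a fraction of a 24-hour day."""
    hours, minutes = hhmm.split(":")
    return (int(hours) * 60 + int(minutes)) / (24 * 60)

# Shortest and longest personal care times in Table 2.1.
print(round(hhmm_to_day_fraction("10:06"), 2))  # 0.42
print(round(hhmm_to_day_fraction("11:54"), 2))  # 0.5
```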

Fig 3.1

Fig 3.3

These graphs convey several things about the data. The five-number summary and the mean from my calculations are listed below.

Five-number summary:

Minimum: 0.42 Quartile Q1: 0.44 Median: 0.45 Quartile Q3: 0.47 Maximum: 0.50

Average (mean): 0.45

There are no outliers in this dataset, as Fig 3.3 shows. The distribution is slightly skewed to the right: the mean is larger than the median, although the two agree when rounded to two decimal places. The plot also shows no gaps, which is consistent with time being a continuous variable.
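
These summary values can be reproduced from the personal care column of Table 2.1. The sketch below makes several assumptions that the report does not state: times are expressed as fractions of a 24-hour day, quartiles are taken as the medians of the lower and upper halves of the sorted data, and outliers are flagged with the usual 1.5 × IQR fences.

```python
from statistics import mean, median

# Personal care times from Table 2.1, in minutes (males first, then females).
MINUTES = [645, 714, 640, 635, 671, 704, 676, 646, 653, 644, 631, 623, 622, 606,
           671, 698, 658, 630, 665, 713, 672, 653, 656, 663, 632, 638, 643, 627]

values = sorted(m / 1440 for m in MINUTES)   # fraction of a 24-hour day
half = len(values) // 2
q1 = median(values[:half])                   # median of the lower half
q3 = median(values[-half:])                  # median of the upper half
five_number = (values[0], q1, median(values), q3, values[-1])

# 1.5 * IQR fences: anything outside them would be flagged as an outlier.
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print([round(v, 2) for v in five_number])   # [0.42, 0.44, 0.45, 0.47, 0.5]
print(round(mean(values), 2), outliers)     # 0.45 []
```

Before rounding, the mean (about 0.4546) sits slightly above the median (about 0.4510), which is the basis for calling the distribution right-skewed.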

Project Part IV

I suspected that the average time spent on personal care is greater than the average for any of the other categories.

H0: The mean time spent on personal care by females = the mean time spent on personal care by males.

Ha: The mean time spent on personal care by females ≠ the mean time spent on personal care by males.

To check my hypothesis, all I have to do is compare the average amount of time spent on personal care by males and by females. Although the figures do not differ greatly, Table 2.3 shows that females spend more time on personal care than males. This is consistent with my hypothesis, although a formal test is still needed to confirm it. We can also see in the graph that women in Europe tend to stay at home longer than men, which may be one of the reasons they spend more time on themselves.

In symbols (f = female, m = male):

H0: μf = μm

Ha: μf ≠ μm

For the categorical variable, I decided to look at which of the nine chosen categories has the least time spent by both males and females.

H0: The proportion of time spent on employment-related activities is p = 0.25.

Ha: The proportion of time spent on employment-related activities is p < 0.25.

I chose this because the category with the most time spent was used in all of the previous parts of the project, but I think we should also use a category with little time spent to formulate hypotheses and draw conclusions.

Project Part V

Going back to test the quantitative hypothesis, I created bootstrap samples. I found that the average time spent on personal care by males was 0.45 of a day, while the average for females was 0.46, which is very close. The standard error was 0.01 for the male sample and 0.004 for the female sample. I then computed the 95% confidence interval for each mean: for males the interval runs from 0.44 to 0.46, whereas for females it runs from 0.45 to 0.47. I rejected the null hypothesis because the two estimated means are not equal. Note, however, that the two intervals overlap between 0.45 and 0.46, so the evidence for a real difference is weak. Below are the two histograms of the bootstrap distributions for males and females (Table 5.1).
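
A bootstrap of this kind can be sketched with Python's standard library. This is an illustrative sketch rather than the tool actually used for the project: it resamples the 14 personal care values for each sex from Table 2.1 (as fractions of a 24-hour day), and it assumes a 95% interval of the form mean ± 2 standard errors. The function name `bootstrap_mean`, the number of resamples, and the seed are arbitrary choices, so the standard errors and interval endpoints will not match the report exactly.

```python
import random
from statistics import mean, stdev

# Personal care as a fraction of a 24-hour day, from Table 2.1 (minutes / 1440).
MALES = [m / 1440 for m in
         (645, 714, 640, 635, 671, 704, 676, 646, 653, 644, 631, 623, 622, 606)]
FEMALES = [m / 1440 for m in
           (671, 698, 658, 630, 665, 713, 672, 653, 656, 663, 632, 638, 643, 627)]

def bootstrap_mean(sample, n_resamples=5000, seed=0):
    """Return (sample mean, bootstrap SE, 95% interval as mean +/- 2 SE)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    boot_means = [
        mean(rng.choices(sample, k=len(sample)))   # resample with replacement
        for _ in range(n_resamples)
    ]
    se = stdev(boot_means)             # spread of the bootstrap means
    centre = mean(sample)
    return centre, se, (centre - 2 * se, centre + 2 * se)

for label, sample in (("Males", MALES), ("Females", FEMALES)):
    centre, se, (low, high) = bootstrap_mean(sample)
    print(f"{label}: mean={centre:.2f} SE={se:.3f} CI=({low:.2f}, {high:.2f})")
```

The sample means printed by this sketch round to 0.45 for males and 0.46 for females, in line with the values reported above.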

I also tested the categorical hypothesis using bootstrapping. The standard error was 0.0064, and the 95% confidence interval for the proportion of time spent on employment-related activities runs from 0.1249 to 0.1453. Below is the histogram of the bootstrap distribution (Table 5.2). Since the null value p = 0.25 is not within the confidence interval, we reject the null hypothesis.
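
The interval check for the proportion can be sketched as well. The snippet below is an assumption-laden illustration, not the original analysis: it treats each row's employment time from Table 2.1 as a share of a 24-hour day, bootstraps the mean share, and uses a percentile interval (cutting 2.5% off each tail). The seed and number of resamples are arbitrary, so the endpoints will differ somewhat from those reported.

```python
import random
from statistics import mean

# Employment and related activities from Table 2.1, in minutes per day
# (14 male rows, then 14 female rows).
EMPLOYMENT = [187, 212, 207, 267, 261, 228, 255, 300, 285, 241, 233, 228, 250, 244,
              113, 154, 116, 185, 126, 137, 112, 209, 211, 135, 162, 153, 144, 158]
proportions = [m / 1440 for m in EMPLOYMENT]   # share of a 24-hour day

rng = random.Random(1)                         # arbitrary seed for repeatability
boot = sorted(
    mean(rng.choices(proportions, k=len(proportions)))  # resample with replacement
    for _ in range(5000)
)
# Percentile interval: cut 2.5% off each tail of the bootstrap distribution.
low, high = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot)) - 1]

null_value = 0.25
print(f"95% interval: ({low:.4f}, {high:.4f})")
print("reject H0" if not (low <= null_value <= high) else "fail to reject H0")
```

Because no row in the table exceeds 5:00 of employment time (about 0.21 of a day), every bootstrap mean stays below 0.25, so this sketch always ends with "reject H0".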

TABLE 5.1

TABLE 5.2

TABLE 5.3