Scenario Paper Summary

Scenario 2

Student name

Institution affiliation

Introduction

Advances in technology have brought immense changes to the world we live in today. One area that has affected the lives of nearly everyone on the planet is the internet, especially social media. Social media has become a dominant mode of communication, allowing people thousands of miles apart to communicate as easily as if they were together. However, social media also has a dark side, which is the subject of this study.

Individuals who spend a significant amount of time on social media are at risk of being drawn into online sex work, and the number of people at risk of becoming involved in the online sex trade grows day by day. Data on the issue are scarce. However, a contributor on Kaggle.com scraped a popular European adult forum and published the data. The dataset is fairly large, with approximately 30,000 entries and 17 variables. For this exercise, we use a subset of the data consisting of 6 variables and 40 cases.

Sample size is very important when sampling from a very large population. Statistical tests depend heavily on it: according to the law of large numbers, the larger the sample, the more accurate the result will be. Moreover, the power of a study depends on both the sample size and the effect size.
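To make the sample-size point concrete, the sketch below compares the standard error of a sample proportion at two sample sizes. The proportion of 0.5 is a hypothetical value chosen only for illustration and is not taken from the forum data; the two sample sizes are the 40-case subset and the approximate size of the full dataset.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n cases."""
    return math.sqrt(p * (1 - p) / n)

P_ASSUMED = 0.5  # hypothetical proportion, for illustration only

se_small = standard_error(P_ASSUMED, 40)      # subset size used in this paper
se_large = standard_error(P_ASSUMED, 30_000)  # approximate size of the full dataset

print(f"n = 40:     SE = {se_small:.4f}")
print(f"n = 30,000: SE = {se_large:.4f}")
```

The standard error shrinks in proportion to the square root of the sample size, which is why a 40-case subset gives much less precise estimates than the full dataset would.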

Several issues should be considered when choosing a sample size. The first is the population parameter that needs to be estimated. The second is the cost of sampling: when the sample is too large, scarce resources are wasted. The third is how much is already known about the population, which affects how much information the study needs to collect. The fourth is the spread of the population, together with the practicality of collecting the data.

The sample

As explained, the sample is a subset of a large dataset collected on the members of an adult forum. The dataset has many variables, but only the username (coded), gender, age, amount of time since joining (categorized in years), and the risk of being involved in the online sex trade were considered. The sampling was done in an Excel spreadsheet. The 30,000 entries were loaded into Excel, and the data were cleaned by removing entries without a risk value. The remaining data were sampled using the Excel RAND() function. The duration of membership was calculated by subtracting the year in which the member joined the forum from the current year (2018). Because the number of years in the forum was numeric, it was converted into three categories.
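The cleaning, sampling, and categorizing steps described above can be sketched in Python. This is only an illustration of the procedure, not the Excel workflow that was actually used: random.sample stands in for the RAND() function, the field names user, join_year, and risk are hypothetical, and the toy records are made up to show the shape of the input.

```python
import random

CURRENT_YEAR = 2018  # reference year stated in the text

def membership_category(join_year, current_year=CURRENT_YEAR):
    """Map a join year to one of the three membership-duration categories."""
    years = current_year - join_year
    if years <= 1:
        return "One year or less"
    if years <= 4:
        return "Between two and four years"
    return "More than four years"

def draw_sample(records, n=40, seed=0):
    """Drop records without a risk value, then draw up to n of them at random."""
    cleaned = [r for r in records if r.get("risk") not in (None, "")]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    chosen = rng.sample(cleaned, min(n, len(cleaned)))
    # Attach the derived membership category to each sampled record.
    return [dict(r, membership=membership_category(r["join_year"])) for r in chosen]

# Made-up records, only to show the expected input shape.
toy_records = [
    {"user": "u1", "join_year": 2018, "risk": "No risk"},
    {"user": "u2", "join_year": 2015, "risk": "High risk"},
    {"user": "u3", "join_year": 2010, "risk": ""},  # dropped: no risk value
    {"user": "u4", "join_year": 2012, "risk": "No risk"},
]
sample = draw_sample(toy_records, n=2)
```

Because the duration is a whole number of years, the three categories cover every possible value without gaps.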

The explanatory variables are age, gender, duration of membership, number of comments in the public forum, and location. The dependent variable is the individual's risk of being involved in the online sex trade.

Statistical analysis and assumption

Since we are using only one independent variable (membership in years) and one dependent variable (risk), we will analyze the data using the chi-square test of independence.

Because the chi-square test of independence is applied to categorical data, it has only two assumptions.

1. The two variables should be measured at an ordinal or nominal (categorical) level. In our problem, the number of years since the member joined the adult forum is converted into categories as follows:

a. One year or less

b. Between two and four years

c. More than four years

The dependent variable is also categorical, with the following possibilities:

a. High risk

b. No risk

2. Each of the two variables (independent and dependent) should consist of two or more categorical, independent groups. As highlighted above, the independent variable has three levels, whereas the dependent variable has two categories.
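The two assumptions above amount to arranging the data as a 3 x 2 table of counts. A minimal sketch of that cross-tabulation follows; the category labels come from the lists above, while the function name and the toy observations are hypothetical and are not the study's data.

```python
from collections import Counter

ROWS = ["One year or less", "Between two and four years", "More than four years"]
COLS = ["High risk", "No risk"]

def contingency_table(pairs):
    """Count (membership, risk) pairs into a table with ROWS x COLS cells."""
    for membership, risk in pairs:
        # Every observation must fall into one of the predefined categories.
        if membership not in ROWS or risk not in COLS:
            raise ValueError(f"unknown category: {(membership, risk)}")
    counts = Counter(pairs)
    return [[counts[(r, c)] for c in COLS] for r in ROWS]

# Made-up observations, for illustration only.
toy_pairs = [
    ("One year or less", "No risk"),
    ("One year or less", "No risk"),
    ("Between two and four years", "High risk"),
    ("Between two and four years", "No risk"),
    ("More than four years", "High risk"),
]
table = contingency_table(toy_pairs)
for label, row in zip(ROWS, table):
    print(f"{label:<28}{row}")
```

Each case contributes to exactly one cell, which is what makes the groups independent of one another.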

Analysis

A quick view of the data can be seen in the bar plot below.

It is fairly straightforward to see that those who joined two years ago are at risk of being involved in the online sex trade. Furthermore, it seems that the risk of being involved increases as a member spends more time in the forum.

Chi-square test of independence

To use the chi-square test of independence, we first state our hypotheses.

H0: There is no relationship between the number of years since one joined the forum and the risk of being involved in online sex trade.

H1: There is a relationship between the number of years since one joined the forum and the risk of being involved in online sex trade.

Once we run the chi-square test, we will use the output to decide whether to reject or fail to reject the null hypothesis.
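For readers who want to see what the test computes, the sketch below derives the Pearson chi-square statistic, its degrees of freedom, and the expected counts from a table of observed counts. The function name and the toy table are hypothetical; the counts are not the study's data.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Made-up 3 x 2 table (membership category by risk), for illustration only.
toy_table = [[5, 15], [10, 10], [15, 5]]
statistic, df, expected = chi_square_independence(toy_table)

# Count the cells whose expected count falls below 5.
small_cells = sum(e < 5 for row in expected for e in row)
print(statistic, df, small_cells)
```

A table with three membership categories and two risk categories always gives (3 - 1) x (2 - 1) = 2 degrees of freedom.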

Results

Chi-Square Tests

                      Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square    1.907a    2     .385
Likelihood Ratio      2.802     2     .246
N of Valid Cases      40

a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is .98.
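The significance values in the table can be cross-checked by hand. With 2 degrees of freedom, the upper-tail probability of a chi-square statistic x has the closed form exp(-x / 2), so the short sketch below needs only the standard library. The function name is hypothetical, and the formula applies only when df = 2.

```python
import math

def p_value_df2(statistic):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-statistic / 2)

ALPHA = 0.05  # conventional significance level

pearson_p = p_value_df2(1.907)     # Pearson chi-square value from the table
likelihood_p = p_value_df2(2.802)  # likelihood ratio value from the table

print(round(pearson_p, 3), round(likelihood_p, 3))
print("reject H0" if pearson_p < ALPHA else "fail to reject H0")
```

Both values agree with the reported significance column to three decimal places.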

Interpretation

In the Chi-Square Tests table, we are interested in the results of the "Pearson Chi-Square" row. According to the table, χ²(2) = 1.907, p = 0.385. Because the p-value is greater than 0.05, we fail to reject the null hypothesis and conclude that there is no significant association between the number of years since an individual signed up to the adult forum and the risk of being involved in online sex trade. In other words, the data provide no evidence that the number of years influences the risk. Note that, according to the footnote of the table, 3 of the 6 cells have an expected count of less than 5, so the chi-square approximation should be read with caution.

Conclusion

We have therefore found no significant evidence that time spent as a member influences risk on social media platforms. However, it should be noted that the analysis assumes that risk is determined only by the number of years of membership. In a real-world situation, many independent factors (some of which depend on each other) can influence a given dependent factor. In our case, we do not know whether age, the number of times an individual has commented or posted on the forum, the number of profile pictures they have, and so on can explain the dependent variable.

Reference

www.kaggle.com