Journal of Data Science 355-376, DOI: 10.6339/JDS.201804_16(2).0007

CAN EMOTICONS BE USED TO PREDICT SENTIMENT?

Keenen Cates1, Pengcheng Xiao1,∗, Zeyu Zhang1, Calvin Dailey1

1 Department of Mathematics, University of Evansville

1800 Lincoln Ave, Evansville, Indiana, 47722 USA

Abstract: Getting a machine to understand the meaning of language is an important goal in a wide variety of fields, from advertising to entertainment. In this work, we focus on YouTube comments from the top two hundred trending videos as a source of user text data. Previous sentiment analysis models focus on using hand-labelled data or predetermined lexicons. Our goal is to train a model to label comment sentiment with emoticons by training on other user-generated comments containing emoticons. Naive Bayes and Recurrent Neural Network models are both investigated and implemented in this study, and the validation accuracies for the Naive Bayes model and the Recurrent Neural Network model are found to be .548 and .812, respectively.

Key words: Sentiment analysis, Emoticons, Natural Language Processing,

Machine Learning.

1. Introduction

Sentiment analysis is a branch of natural language processing that involves trying to

understand the underlying sentiment and emotion behind language. For example, “Have a

great day” has a positive sentiment, and “Have a bad day” has a negative sentiment.

Current state-of-the-art techniques for modelling sentiment in language involve using

machine learning and deep neural networks to classify the sentiment of language. For

example, SemEval is a yearly contest for trying to classify tweets as Positive, Negative, or

Neutral. Its findings advance the field of sentiment analysis and machine learning

(Rosenthal, Noura, and Preslav 2017).

1.1 Objectives

Our focus is on another major social platform, YouTube, which garners hundreds of thousands of comments and other user-generated statistics. User data yields important results in the social sciences. In particular, we are interested in the top trending YouTube videos, and aim to identify the sentiment of commenters by suggesting what emoticon a user might use with their comments. We suggest that emoticons give insight into the sentiment of the user, and that their pictographic nature gives us a better language for indicating emotion. Using the subset of comments with emoticons, we engineered a labelled dataset of comments and emoticons. Our models take advantage of this labelling to model the emoticon lexicon, which is then used to suggest what emoticons might accompany a comment (Hogenboom 2013). Using this dataset and the models we have created, we hope to answer whether or not we can accurately predict what emoticon a user might use.

1.2 Literature Review

Sentiment analysis drives many industries, and being able to correctly identify sentiment in a YouTube comment would allow automated systems to moderate comments or correctly recommend media or advertisements to users. In general, there are two methods that Natural Language Processing researchers use for sentiment analysis: lexicon-based and machine-learning-based. Sentiment analysis is a fairly robust field and has consistently seen interest since its conception. Interest in the field has grown rapidly with the surge in data that accompanied the rise of the internet; in many cases the amount of data is intractable. Social platforms such as YouTube by themselves generate more data than any one human could analyze. Therefore, a system of Natural Language Processing (NLP) is required to deal with the sheer volume of data.

Natural Language Processing can be considered a subset of cognitive science or computer science. The concept of natural language processing originally came about in the mid-20th century. The initial motivation was language translation (Salas-Zárate 2017). Natural Language Processing naturally lends itself to the field of Artificial Intelligence, as there is a strong desire for agents that can understand human language; for example, a chat bot. Sentiment analysis did not attract much attention until the early 2000s. The natural language processing systems that were developed at first were only applicable to narrow subject areas, such as answering questions with information from a database about moon rocks, or answering questions from a manual on airplane maintenance (Liu 2012). The explosion of social data quickly created a need to understand language sentiment automatically. With the ubiquity of social media in recent years, sentiment analysis has become more and more applicable to many fields. It has been one of the most active areas of research in natural language processing since the turn of the century (Pozzi 2017).

There are many commercial applications. Sentiment analysis may have significant effects for the fields of management, political science, economics, and other social sciences, among others (Liu 2012). Sentiment analysis, also known as opinion mining, refers to the process of creating automatic tools or systems which can derive subjective information from text in natural (human) languages, as opposed to computer code. The subjective information most commonly desired by researchers is opinions and sentiments, hence the name sentiment analysis. Sentiment analysis, while originally practiced only by computer scientists, has become widely used in management science and the social sciences. Microsoft, Google, Hewlett-Packard, IBM, and others have created their own systems for sentiment analysis.

Before the turn of the century, there were earlier developments in what would later become the field of sentiment analysis. Anderson and McMaster (1982) provided a way to model the affective tone of an entire document based on the “semantic differential scores” of each of the words in the document. The semantic meanings and scores were derived from a 1965 study by Heise. According to Lee and Pang (2002), the early 2000s marked an explosion of research in sentiment analysis. This increase was partially attributed to the increasing popularity of machine learning models and the availability of training sets with which those models could be trained. Turney (2002) used an algorithm based on parts-of-speech tagging and semantic orientation in order to classify online reviews as recommended or not recommended. Lee and Pang (2002) used machine learning techniques such as Support Vector Machines and Naive Bayes in order to classify the sentiment of movie reviews. Dave, Lawrence, and Pennock (2003) classified the polarity of web reviews based on several n-gram methods. Their approach was not as accurate when applied to individual sentences because it was developed for classifying reviews, which normally contain multiple sentences. Hu and Liu (2004) used a method that could predict the sentiment orientation of opinion words and therefore the opinion orientation of a sentence. It was an unsupervised method that did not require a corpus, and was loosely based on the work of Dave, Lawrence, and Pennock. It returned sentiments at the sentence level instead of for the entire review at once, and then combined the sentence-level sentiments to give a summary of the entire review. Moraes, Valiati, and Neto (2013) showed the effectiveness of machine learning approaches as opposed to lexicon-based models. They empirically compared Support Vector Machines and Artificial Neural Networks for sentiment analysis and found that the Artificial Neural Networks performed better. Wang, Liu, Sun, Wang, and Wang (2015) showed the effectiveness of Long Short-Term Memory (LSTM) recurrent neural networks for sentiment analysis by predicting the sentiments of tweets.

1.3 Sentiment Lexicon

The lexicon method splits input text into individual words or phrases called tokens. It then creates a table of these tokens and records the number of times each token appears in the text. The resulting tally is called a “Bag of Words” model. Once this is done, another tool called a “Sentiment Lexicon” is used to classify the bag of tokens. The Sentiment Lexicon holds pre-recorded sentiment values for each token, which can be simple positive or negative numbers or some other representation, such as vectors. The lexicon can be built either manually or with machine learning techniques. Once we have the tokenized input text and a suitable Sentiment Lexicon, the final task is to design a function that computes the final sentiment. The simplest way is to sum the sentiment values of all tokens. The lexicon method is a traditional approach to natural language processing problems, and it has a good theoretical basis. Many people still use and study this method even though it dates back to the 1960s. However, it does have drawbacks, such as ignoring the integrity and continuity of the text. The meaning of a sentence depends heavily on word order and context; these should not be ignored if we want a truly intelligent sentiment processing system (Taboada 2011).
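The lexicon method described above can be illustrated with a minimal Python sketch. The lexicon values, the whitespace tokenizer, and the function names are assumptions made for illustration only; they are not taken from any published lexicon or from the authors' code.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption); real sentiment lexicons are far larger.
TOY_LEXICON = {"great": 1.0, "good": 0.5, "bad": -1.0, "terrible": -1.5}


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon value of every token occurrence; unknown tokens add 0."""
    counts = bag_of_words(text)
    return sum(lexicon.get(token, 0.0) * n for token, n in counts.items())


print(lexicon_score("Have a great day"))  # positive score
print(lexicon_score("Have a bad day"))    # negative score
```

Because the score depends only on token counts, reordering the words of a sentence leaves the result unchanged, which is the drawback noted above.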

1.4 Machine Learning

In the machine learning approach to sentiment analysis, the classification algorithm uses a training set to learn a model based on features in the set. This makes a more nuanced classification possible and can help with ambiguous words or interpretations that vary by context. A method of feature extraction must be chosen. One option is n-grams, which are sequences of n consecutive words. Others use parts-of-speech information or emotional, affective, or semantic data. One of the disadvantages of the machine learning method is that it requires a large set of labelled data to be used as the training set. It is simpler to use the lexicon-based method unless a suitable training set is available (Salas-Zárate 2017).
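As a small illustration of n-gram feature extraction, the sketch below lists every run of n consecutive tokens in a sentence. The function name `ngrams` and the whitespace tokenization are assumptions chosen for this example.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "have a great day".split()
print(ngrams(tokens, 2))
```

With n = 2 the features keep pairs such as ("great", "day"), so some local word order survives that a single-word model would discard.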

We will need to classify the sentiments of the emoticons manually in order to prepare them for use in our analysis. Once that is done, we can compile our training set from the comments in the data that already contain emoticons, using the sentiment of each emoticon. Our model will then be able to classify and assign an emoticon to each comment in the data set that does not already contain one. Recurrent Neural Networks (RNNs) have had a great deal of success in Natural Language Processing. The reason is that text data is highly sequential: for example, the word “day” does not mean much unless you know the words that came before it, as in “Have a great day.” RNNs have advanced the state of the art over previous architectures on short-length text data (Lee and Dernoncourt 2016).

Given that previous attempts to model sentiment have not thoroughly explored emoticons, we hope to answer the question of whether or not we can accurately recommend emoticons that might accompany a piece of text. Once we have answered this, further research can attempt to analyze sentiment with emoticons automatically.

2. Methodology

2.1 Data

To get our data, we used the data science competition website Kaggle, where people share datasets, competitions, and tutorials. We found a dataset containing comments from the top 200 trending YouTube videos. The author of this dataset obtained the data through YouTube’s publicly available API, which allows developers to easily query for data on YouTube. The data contains profanity and nonsensical text, and in general is noisy. Some of it could be generated by bots, and we do no vetting to determine whether a comment actually comes from a human. The noisiness of the data might prevent us from training a successful model; however, we assume that the large amount of data will help our models perform well in spite of its low quality.

In order to answer the question of whether or not a model could recommend emoticons, we created two models that attempt to perform this recommendation. We also created a simple dummy model for purposes of comparison. We have roughly three hundred thousand comments with emoticons, and use them to bootstrap a dataset of comments with labels. More data is desirable, but this is a fairly large corpus for initial research.

In total, there are 691,388 rows in the dataset. A large proportion of them (more than 200,000) contain emoticons, so there is quite a bit of data, and it would be fairly straightforward to access the YouTube API and get more if needed. As for features, we use only the text; likes, reply threads, and so on are ignored in this phase of the project. On average, each text is 15 words long. Figure 1 shows some examples of how the data looks.

Figure 1: Example unprocessed data

2.2 Evaluation Metrics

The models will be evaluated using a holdout set of data, in which each model recommends five emoticons that might accompany a text. If at least one recommendation is an emoticon that occurs in the validation comment, then we consider it to be a “correct” guess. Accuracy is then the number of correct guesses divided by the total number of guesses. Keras calls this accuracy “top k categorical accuracy”, and it is the metric implemented for our models. Mathematically, for a comment $x \in \text{Comments}$ with matching label set $y \in \text{Labels}$, where $\text{predict\_labels}(x)$ returns the probability of each output class occurring:

$$\text{score}(x) = \begin{cases} 1 & \text{if any } p \in \operatorname{argmax}_{k=5}\left(\text{predict\_labels}(x)\right) \text{ is in } y \\ 0 & \text{otherwise} \end{cases}$$

The accuracy of the model is then

$$\text{accuracy} = \frac{\sum_{i=1}^{N} \text{score}(x_i)}{N}, \qquad x_i \in \text{Comments}, \qquad N = |\text{Comments}|.$$
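The metric defined above can be sketched in plain Python. This is an illustrative implementation, not the Keras one; it assumes each prediction is a list of class probabilities and each label set is a Python set of class indices, and the function names are hypothetical.

```python
def top_k(probabilities, k=5):
    """Indices of the k largest probabilities, highest first."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return order[:k]


def score(probabilities, true_labels, k=5):
    """1 if any of the top-k predicted classes is a true label, else 0."""
    return 1 if any(p in true_labels for p in top_k(probabilities, k)) else 0


def top_k_accuracy(all_probabilities, all_true_labels, k=5):
    """Mean score over all comments."""
    scores = [score(p, y, k) for p, y in zip(all_probabilities, all_true_labels)]
    return sum(scores) / len(scores)


probs = [0.10, 0.50, 0.05, 0.20, 0.15]
print(top_k_accuracy([probs, probs], [{3}, {2}], k=2))
```

In the usage line, class 3 is among the two most probable classes and class 2 is not, so one of the two guesses counts as correct.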

One consideration is that the distribution of emoticons occurring in the corpus is highly skewed (see Figure 2). This is a good reason to consider F1 scores, which might be better suited for future analysis. However, we chose top k accuracy because it more closely resembles the question we are asking.

Figure 2: Distribution of a subset of Emoticons

2.3 Analysis Plan

In order to compare the performance of our models, we created a holdout set of data meant only for validating accuracy. We also defined what a prediction would be: each model outputs its five highest-scoring predictions. If any of those predictions is among the labels of the validation comment, then we consider it an accurate prediction. To analyze the dataset, we compute the prediction accuracy of each model and compare those scores. One might also consider looking at the training accuracy of each model; however, these scores are not directly comparable, so we ignore them except for the purposes of optimizing the model.

2.4 Approach

In our approach, we had to make a few crucial assumptions and simplifications to contextualize our problem. Firstly, our dataset involved input data with multiple output classifications. For example, a user can add hundreds of the same emoticon or many different emoticons. As a preprocessing step, we narrowed down these classes to the unique emoticons that show up in a comment, and unrolled the data set so that each data point has a single label. Table 1 displays how each comment gets unrolled into multiple data points with single labels.

Original comment               Unrolled comment       Label
I loved this video! x x y      I loved this video!    x
                               I loved this video!    y

Table 1: Unrolling of data labels
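The unrolling step shown in Table 1 can be sketched as follows. The function name `unroll` is hypothetical, and x and y stand in for emoticons as they do in the table.

```python
def unroll(comment, emoticons):
    """Return one (comment, label) pair per unique emoticon, in first-seen order."""
    unique = list(dict.fromkeys(emoticons))  # drops repeats, keeps order
    return [(comment, emoticon) for emoticon in unique]


print(unroll("I loved this video!", ["x", "x", "y"]))
```

Repeated emoticons collapse to a single label, so a comment with many copies of the same emoticon contributes only one data point for it.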

The other assumption applies only to our Naive Bayes model: all words in a comment are independent. This assumption is difficult to justify, and it is not clear whether there is mutual dependence or mutual exclusivity between words. Our Recurrent Neural Network does not have this limitation because it can model the entire sequence.

2.5 Preprocessing

One of the most important steps is preprocessing, which is done before any model is trained. We first separate the data into comments with emoticons and comments without emoticons. We then make all comments lowercase and normalize both groups by creating a dictionary that maps punctuation to tokens, and a dictionary of word counts over all comments that uses the ordering of each word as its embedding. Table 2 shows an example of how the dictionaries are used to tokenize a comment. A similar process is used for the emoticons: we use a dictionary to encode them as integers. Preprocessing the comments in this way gives us a normalized integer sequence, which handles comments that use different capitalizations of the same word.
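A minimal sketch of this preprocessing is given below. The punctuation tokens, the choice of 0 as the id for unseen words, and all function names are assumptions for illustration; the paper does not list its actual dictionaries.

```python
from collections import Counter

# Hypothetical punctuation-to-token dictionary (assumption).
PUNCTUATION_TOKENS = {"!": "<EXCLAMATION>", "?": "<QUESTION>",
                      ".": "<PERIOD>", ",": "<COMMA>"}


def normalize(comment):
    """Lowercase the comment and replace punctuation marks with named tokens."""
    comment = comment.lower()
    for mark, token in PUNCTUATION_TOKENS.items():
        comment = comment.replace(mark, f" {token} ")
    return comment.split()


def build_vocabulary(comments):
    """Map each token to its rank by count (1 = most frequent)."""
    counts = Counter(tok for c in comments for tok in normalize(c))
    return {tok: rank
            for rank, (tok, _) in enumerate(counts.most_common(), start=1)}


def encode(comment, vocabulary, unknown_id=0):
    """Turn a comment into a sequence of integer ids; unseen tokens map to 0."""
    return [vocabulary.get(tok, unknown_id) for tok in normalize(comment)]


vocabulary = build_vocabulary(["I loved this video!", "Loved it!"])
print(encode("LOVED it!", vocabulary))
```

Because lowercasing happens before the lookup, "LOVED" and "loved" receive the same integer id.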

2.6 Dummy Model

For purposes of comparison, we created a very simple model that always predicts that a comment would use the emoticon with the largest prior probability. The motivation is that it gives us a baseline score to beat. If we can do significantly better than this, then we know that the models have potential.
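A baseline of this kind might look like the sketch below. Returning the k most frequent training labels, rather than a single label, is an assumption made here so that the baseline fits the top-five evaluation metric; the function names are hypothetical.

```python
from collections import Counter


def fit_dummy(train_labels, k=5):
    """Return the k most frequent training labels, most frequent first."""
    return [label for label, _ in Counter(train_labels).most_common(k)]


def predict_dummy(model, comment):
    """Ignore the comment entirely and return the same labels every time."""
    return list(model)


model = fit_dummy(["a", "b", "a", "c", "a", "b"], k=2)
print(predict_dummy(model, "any comment at all"))
```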

2.7 Naive Bayes Model

Our second model uses Bayesian statistics to create tables of posterior probabilities for each class given a word, using Bayes' rule. Naive Bayes is a conditional probability model. Given some instance to be classified, represented by a vector of features

$$\mathbf{x} = (x_1, \dots, x_n),$$

we compute the probability of each output class $C_k$ using the conditional probability

$$p(C_k \mid x_1, \dots, x_n).$$

Table 2: Tokenization Process

Since $n$ can be large, making this model less tractable, we need to reformulate our model using Bayes' rule. In plain English,

$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}}$$

And symbolically,

$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}$$

In practice, the numerator is the most important part, as the denominator does not depend on $C_k$, effectively making it a constant. The numerator is equivalent to the joint probability model, meaning we can replace the numerator with

$$p(C_k, x_1, \dots, x_n)$$

We can then rewrite the numerator using the chain rule for repeated applications of conditional probability; the derivation is in Appendix 1. Then we add the naive assumption of conditional independence, allowing us to further simplify our model to:

$$p(C_k \mid x_1, \dots, x_n) = \frac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)$$

where $Z$ is

$$Z = p(\mathbf{x}) = \sum_{k} p(C_k)\, p(\mathbf{x} \mid C_k),$$

which is a scaling factor that depends only on the instance. The derivation is in Appendix 2.

Figure 3: Naive Bayes Model

In order to make a classifier, we would generally take the argmax of the simplified model without Z, as in the usual definition of a Naive Bayes classifier. In our case we take the top five arguments instead, as our program recommends multiple emoticons that might be appropriate. We implement this model in Python, and the model follows Figure 3.

Another problem is that we have to deal with words that never show up in our corpus of texts. To handle this, we smooth the probabilities: any word or class that does not appear is given a very small probability that is close to, but not equal to, zero. Otherwise, the product of probabilities would be zero whenever a word is not in the corpus.
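The following sketch shows one way such a recommender could be written. It is not the authors' implementation: it uses additive (Laplace) smoothing as a stand-in for the unspecified smoothing technique, works in log space to avoid numerical underflow, and every class and method name is hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRecommender:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing constant (assumption)
        self.class_counts = Counter()            # data points per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        self.vocabulary = set()

    def fit(self, token_lists, labels):
        for tokens, label in zip(token_lists, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """log p(C_k) + sum_i log p(x_i | C_k) for every class (Z is omitted)."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary) + 1    # one extra slot for unseen words
        scores = {}
        for label, n in self.class_counts.items():
            denominator = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            log_p = math.log(n / total)
            for tok in tokens:
                log_p += math.log((self.word_counts[label][tok] + self.alpha) / denominator)
            scores[label] = log_p
        return scores

    def recommend(self, tokens, k=5):
        """Return up to k labels with the highest scores, best first."""
        scores = self.log_scores(tokens)
        return sorted(scores, key=scores.get, reverse=True)[:k]


model = NaiveBayesRecommender().fit(
    [["love", "this"], ["hate", "this"], ["love", "it"]],
    ["heart", "angry", "heart"],
)
print(model.recommend(["love"], k=1))
```

Because of the smoothing term, a word that never appeared in training still receives a small non-zero probability, so the score of every class stays finite.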

2.8 Recurrent Neural Network

Our third and final model is a recurrent neural network, and our architecture is shown in Table 3.

Input

Embedding Layer

LSTM Layer

LSTM Layer

LSTM Layer

Fully-Connected Layer

Output layer

Table 3: RNN Architecture

Recurrent neural networks are a class of neural networks whose connections form a directed cycle, allowing them to take time into account and giving them a notion of memory. This makes RNNs well suited to predicting arbitrary sequences by taking advantage of their memory.

The label data also undergoes another transformation before the RNN begins the learning process. Since the emoticons are encoded as integers, the representation implies an ordering that does not make sense, as one emoticon is not greater than another. To rectify this, we represent each integer as a one-hot vector: a fixed-length vector whose size is the total number of output classes, in which the integer is used as the index of the “hot” class. Table 4 gives a small example of encoding a small class space.

Table 4: One-Hot Encoding
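The one-hot encoding described above can be sketched in a few lines. The function name and the four-class example are illustrative assumptions.

```python
def one_hot(index, num_classes):
    """Return a list of zeros of length num_classes with a 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    vector = [0] * num_classes
    vector[index] = 1
    return vector


# Encode every class of a hypothetical four-emoticon label space.
for label in range(4):
    print(label, one_hot(label, 4))
```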

One of the major features of this model is the stacked LSTM layers. This architecture

allows us to better model hierarchical elements of language. This means each layer will represent progressively more complex parts of the hierarchy. One might imagine this in terms

of the composition of the human face. For example, the most basic element is an edge.

Then a more complex step would be individual elements of the face such as a nose or

mouth. Then the most complex part would be the entire face, and the composition of its

requisite parts.

The LSTM itself is able to remember previous contexts in sentences, meaning we could potentially get better performance as our model becomes better at modelling context. Our RNN took much longer to run, so in order to train the model we decided to use more powerful hardware in the form of a GPU. The neural network was trained on a GPU using FloydHub, a platform for running deep learning projects. The expense was roughly 14 dollars, as we subscribed to the Data Science plan, which gave us 10 hours of GPU time that we used for experimentation on multiple occasions. The price was remarkably cheap compared to other platforms such as Amazon. Usage of FloydHub is remarkably simple and resembles version control programs such as git. One simply uploads code to the website using command line tools and is given an interface to interact with the instance. This service was worthwhile to learn because it abstracted away elements such as infrastructure, version control, and storage, so we could focus on the problem.

In addition to our baseline architecture, we also perform dropout on each layer, which helps prevent overfitting because the network probabilistically “drops” some of its units during training, forcing it to build redundancies. For the training metric, we implemented the top k categorical accuracy metric described in the evaluation metrics section. For the objective function, we found that categorical cross entropy worked best; it typically works well in multi-class, single-label scenarios. Using TFLearn, a deep learning library for Python, we implemented the architecture we decided on with relative ease. TFLearn builds on top of TensorFlow, abstracting away many of the lower-level computational components and allowing the programmer to think about the layers and interactions between layers rather than how to build a well-known type of layer or cell.
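To make the objective function concrete, here is a small, library-free sketch of softmax followed by categorical cross entropy for a single one-hot label. It is meant only to illustrate the quantity being minimized and is not the TFLearn implementation; the function names and the `eps` clamp are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_label, probabilities, eps=1e-12):
    """-sum_k t_k * log(p_k); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_label, probabilities))


probabilities = softmax([2.0, 0.5, 0.1])
print(categorical_cross_entropy([1, 0, 0], probabilities))  # lower loss
print(categorical_cross_entropy([0, 0, 1], probabilities))  # higher loss
```

With a one-hot label, only the predicted probability of the true class contributes to the loss, which is why this objective pairs naturally with single-label data.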

2.9 Implementation

2.9.1 Programming Language and Libraries

• Python 3

• TFLearn, a deep learning library featuring a higher-level API for TensorFlow.

• TensorFlow, a deep learning library.

As mentioned throughout the text, the models were implemented using the listed libraries. We did our coding on the website FloydHub via IPython Notebooks, which abstracted away much of the setup. We split our code into three notebooks: one each for preprocessing, the Bayesian model, and the RNN. We ran into very few problems implementing our solution; however, some are outlined below.

2.9.2 Problems

• Bayes Smoothing We ran into a small hitch with the Bayesian model when querying prior probabilities for values that did not exist in the data. We resolved this with a technique that “smooths” the values by assigning a small probability to them.

• Skin Tone Modifiers Some emoticons modify other emoticons, e.g. allowing one to change the skin tone of the smiley face. We found that these confounded our predictions, and removed them as possible predictions.

• Finding loss, activation, and metrics We had to experiment many times to find the best loss, activation, and metric functions for our RNN. In our experience, this process can come down to simple trial and error.

2.10 Refinement

Originally, our RNN model did not perform as well as we had hoped; however, a few optimizations to our model vastly improved its performance. The first model we used was a multi-class, multi-label classifier, which performed very poorly: its accuracy was .508, which left much to be desired. We believe the reason is that instead of one-hot encoded vectors, we had many-hot encoded vectors. This means that the label space would be of order $2^{\#\text{ of emoticons}}$. Since this space is extremely large, the model would have trouble representing any reasonable portion of it. For this reason, we needed to unroll data points to perform multi-class, single-label classification. After adjusting our loss function, metric function, and activation function, we ended up with much better performance. We believe this is because of the reduction in potential labels to just the number of emoticons. In addition, hyperparameters such as learning rate and batch size were adjusted to find out which settings worked best. The best we found was a learning rate of .001 and a batch size of 128.
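The difference in label-space size can be checked with a short calculation. The emoticon counts used below are hypothetical, since the number of distinct emoticons in the dataset is not stated here.

```python
def label_space_sizes(num_emoticons):
    """Return (many-hot size, one-hot size) for a given number of emoticon classes."""
    many_hot = 2 ** num_emoticons  # every subset of emoticons is a distinct label
    one_hot = num_emoticons        # exactly one emoticon per data point
    return many_hot, one_hot


for n in (10, 50, 100):  # hypothetical emoticon counts
    print(n, label_space_sizes(n))
```

Even at 50 emoticons the many-hot space has more than 10^15 possible labels, while the unrolled formulation has only 50.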

3. Results

In order to validate the models, we created a holdout set of labelled data that none of the models got to use for training or testing. The accuracy of each model using top k categorical accuracy is given in Tables 5 and 6.

Model         Accuracy
Dummy         .527
Naive Bayes   .859
RNN           .702

Table 5: Training Accuracy Results

Model         Accuracy
Dummy         .527
Naive Bayes   .548
RNN           .812

Table 6: Validation Accuracy Results

Table 6 measures how well our recommendation engine gives us accurate emoticons to represent our text. Our results do not promote strong confidence in our Naive Bayes model’s ability to recommend emoticons; however, there are some potential improvements to the model, such as n-gram modelling. Notably, the Bayesian model performs decently on the training data (.859) but generalizes quite poorly (.548) and shows signs of over-fitting. The RNN, on the other hand, surprisingly performs worse on the training data (.702) but much better on the validation set (.812). Whatever the reason for this phenomenon, it is clear that the RNN generalizes much better.

3.1 Visualization of Model Functionality

We have a model that could be incorporated into a wide variety of applications; for example, a browser plugin that predicts what emoticons you might put with a comment and assists the user, similar to an auto-complete feature. One issue to consider is the nature of YouTube comments themselves, which might prevent the generalization of this model to other applications. However, the models do show that this sort of functionality is possible. For example, we pulled some examples from the data and ran them through our models to produce Table 7; the comments themselves seem to be quite different from more formal forms of language.

Table 7: Example data and predictions

While the machine learning back-end may not be the most sophisticated, the model does a good job in practice of giving recommendations, and we think it would be good enough to build applications on top of.

3.2 Limitations

One limitation of our models is that words that do not show up in the YouTube comment corpus cause issues, as our models have trouble predicting outputs for words they have never seen. One way to fix this might be to mine more comment data. One drawback of the Naive Bayes model is that it may not be able to model longer-term trends in comments; however, given the short length of the comments, this may be a non-issue. We are also limited in our choice of language modelling because we work at the word level. We would likely see a large improvement by expanding our level of modelling to some type of n-gram. The RNN has limitations in multi-class classification, and this may be hindering its ability to learn. Another limitation might be that the training time is cost prohibitive. The model would likely continue to learn and perform better with more training time and data, meaning ultimately a higher cost for the model. By contrast, the Naive Bayes model is easy to program and fast to run, with no need to train for hours upon hours.

Another major consideration is that an RNN might be a bad fit. We originally thought long-term sequential modelling would be important, but it turns out the average comment is only 15 words long. Since the texts are so short, we might have to thoroughly rethink our strategy if this sequential modelling turns out to be unimportant.

3.3 Future Work

In order to eliminate the assumption of independence in the Bayesian model, we can add complexity by changing the level at which we model the data. To do so, we would need to employ a skip-gram or n-gram model that contains larger parts of the sequence data. One might also explore alternative Bayesian models such as Hidden Markov Models. The same improvements to the data modelling using n-grams would likely improve the quality of the RNN results. The RNN model likely has a great deal of room for improvement; one might experiment with hyperparameter tuning or modifying the architecture. There are even more powerful models, such as CRNNs and GANs, that push the state of the art in deep learning. These models would be worth exploring; however, we pushed our newfound deep learning knowledge as far as we could in the time allotted.

Another important consideration is the unrolling of the data. Future work should

further explore how to deal with multi-class classification, which would likely involve

writing new validation and loss functions for the neural network model. However, the

Naive Bayes Model does not suffer from this limitation.

Future work might also try to further connect emoticons and sentiment. We hypothesize that emoticons naturally lend themselves to easy conversion into sentiment classes. However, our current models predict only what emoticon might be used, and the user of the model would have to infer what sentiment the emoticon might convey depending on context.

One might also find more optimizations by adding further preprocessing steps, for example, eliminating common English words that add very little information.

3.4 Reflection

Looking back at the process, here are the steps we took to get to the current models

• Literature Review We made sure to have a rough idea of what people in this field

have tried, and what the state of the art is.

• Deciding on a Model After reviewing the field, we made a decision on what models

we wanted to implement which set the tone for preprocessing and implementation.

• FloydHub Next we setup our programming environment with cloud computing in

mind. It’s important to setup an environment such as FloydHub or AWS to minimize

training time on a fast gpu. At this step we also made sure to download all the libraries we

would need

• Preprocessing a large majority of time was spent trying to learn how to deal with

the data, and exploring the data itself. We had to go through multiple iterations of

embedding and tokenization to find the method that made sense.

• Model Implementation: After preprocessing our data, this step was fairly

straightforward. Most of the time at this step was spent dealing with edge cases or optimizing

the models rather than on the actual implementation.

• Refinement: Refinement may have been the hardest part because we had to make

inferences about why our model was not performing up to our expectations. It’s hard to say what

the potential of each model was, so we kept iterating until we had something that seemed

substantial.

3.5 Conclusion

Overall, there are many areas for potential improvement, and our work serves as a

baseline for recommending emoticons. However, we have begun to answer our original

question: it seems plausible that emoticons can be assigned with reasonable accuracy to comments as

noisy as YouTube comments, making it easy for a casual observer to understand the

sentiment of a text.

Acknowledgment: The authors thank the anonymous referee for the constructive

review of the paper, which has greatly improved the quality of the article. The

authors would also like to thank the mathematics department at the University of

Evansville for its generous support.

Appendix

1. Chain rule for repeated applications of conditional probability.

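$$
\begin{aligned}
p(C_k, x_1, \dots, x_n) &= p(x_1, \dots, x_n, C_k) \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2, \dots, x_n, C_k) \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k)\, p(x_3, \dots, x_n, C_k) \\
&= \cdots \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k)
\end{aligned}
$$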

2. Naive assumption of conditional independence to simplify the model. Thus the joint

model can be derived via:

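$$
\begin{aligned}
p(C_k \mid x_1, \dots, x_n) &\propto p(C_k, x_1, \dots, x_n) \\
&= p(C_k)\, p(x_1 \mid C_k)\, p(x_2 \mid C_k)\, p(x_3 \mid C_k) \cdots \\
&= p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
\end{aligned}
$$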
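
To make the factorized form in item 2 concrete, the sketch below scores a tokenized comment against each class C_k by computing log p(C_k) plus the sum of log p(x_i | C_k), and picks the highest-scoring class. Working in log space and using add-one (Laplace) smoothing are assumptions made here for numerical stability and to handle unseen words; the toy training data and function names are hypothetical and do not reproduce the implementation used in this paper.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (tokens, label) pairs. Returns the counts needed for scoring."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def log_joint(tokens, label, class_counts, word_counts, vocab, alpha=1.0):
    """log p(C_k) + sum_i log p(x_i | C_k), with add-alpha smoothing (an assumption)."""
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / total_docs)
    denominator = total_words + alpha * len(vocab)
    for tok in tokens:
        score += math.log((word_counts[label][tok] + alpha) / denominator)
    return score

def predict(tokens, class_counts, word_counts, vocab):
    """Return the class with the largest (log) joint probability."""
    return max(
        class_counts,
        key=lambda c: log_joint(tokens, c, class_counts, word_counts, vocab),
    )

# Hypothetical toy data: tokenized comments labelled with an emoticon.
train = [
    (["love", "this", "song"], ":)"),
    (["great", "video"], ":)"),
    (["hate", "this", "ad"], ":("),
]
model = train_naive_bayes(train)
guess = predict(["love", "video"], *model)
print(guess)
```

Because the normalizing constant p(x_1, ..., x_n) is the same for every class, comparing the joint scores is enough to choose the most probable class.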

∗Corresponding author: [email protected]; fax: (812)488-2944