Journal of Data Science 355-376, DOI: 10.6339/JDS.201804_16(2).0007
CAN EMOTICONS BE USED TO PREDICT SENTIMENT?
Keenen Cates1, Pengcheng Xiao1,∗, Zeyu Zhang1, Calvin Dailey1
1 Department of Mathematics, University of Evansville
1800 Lincoln Ave, Evansville, Indiana, 47722 USA
Abstract: Getting a machine to understand the meaning of language is an
important goal in a wide variety of fields, from advertising to entertainment.
In this work, we focus on Youtube comments from the top two hundred
trending videos as a source of user text data. Previous sentiment analysis
models focus on using hand-labelled data or predetermined lexicons. Our
goal is to train a model to label comment sentiment with emoticons by
training on other user-generated comments containing emoticons. Naive
Bayes and Recurrent Neural Network models are both investigated and
implemented in this study, and the validation accuracies for the Naive Bayes
model and the Recurrent Neural Network model are found to be .548 and
.812, respectively.
Key words: Sentiment analysis, Emoticons, Natural Language Processing,
Machine Learning.
1. Introduction
Sentiment analysis is a branch of natural language processing that involves trying to
understand the underlying sentiment and emotion behind language. For example, “Have a
great day” has a positive sentiment, and “Have a bad day” has a negative sentiment.
Current state-of-the-art techniques for modelling sentiment in language use machine
learning and deep neural networks to classify the sentiment of text. For example,
SemEval is a yearly contest for classifying tweets as Positive, Negative, or Neutral.
Its findings advance the field of sentiment analysis and machine learning
(Rosenthal, Noura, and Preslav 2017).
1.1 Objectives
Our focus is on another major social platform, Youtube, which garners hundreds of
thousands of comments and other user-generated statistics. User data yields important
results in the social sciences. In particular, we are interested in the top trending
Youtube videos, and aim to identify the sentiment of commenters by suggesting what
emoticon a user might use with their comments. We suggest that emoticons give insight
into the sentiment of the user, and that their pictographic nature gives us a better
language to indicate emotion. Using the subset of comments with emoticons, we engineered
a labelled dataset of comments and emoticons. Our models take advantage of this
labelling to model the emoticon lexicon. This is further used to suggest what emoticons
might accompany a comment (Hogenboom 2013). Using this dataset and the models we
have created, we hope to answer whether or not we can accurately predict what emoticon a
user might use.
1.2 Literature Review
Sentiment Analysis drives many industries, and being able to correctly identify
sentiment in a Youtube comment would allow automated systems to moderate comments
or correctly recommend media or advertisements to users. In general, there are two
methods that Natural Language Processing researchers use for Sentiment Analysis:
lexicon-based and machine-learning-based. Sentiment Analysis is a fairly robust field,
and has consistently seen interest since its conception. Interest in the field has grown
rapidly with the surge in data that accompanied the rise of the internet; in many cases
the amount of data is intractable. Social platforms such as Youtube by themselves
generate more data than any one human could analyze. Therefore, a system of Natural
Language Processing (NLP) is required to deal with the sheer volume of data.
Natural Language Processing can be considered a subset of cognitive science or
computer science. The concept of natural language processing originally came about in the
mid-20th century. The initial motivation was language translation (Salas-Zárate 2017).
Natural Language Processing naturally lends itself to the field of Artificial Intelligence, as
there is a strong desire for agents that can understand human language; for example, a chat
bot. Sentiment Analysis did not pull much attention until the early 2000s. The natural
language processing systems that were developed at first were only applicable to narrow
subject areas, such as answering questions with information from a database about moon
rocks, or answering questions from a manual on airplane maintenance (Liu 2012). The
explosion of social data quickly created a necessity to autonomously understand language
sentiment. Especially with the ubiquitous nature of social media in recent years, the field
of sentiment analysis has become more and more applicable to many fields. It has been
one of the most active areas of research in the field of natural language processing since
the turn of the century (Pozzi 2017).
There are many commercial applications. It may have significant effects for the fields
of management, political science, economics, and other social sciences, among others (Liu
2012). Sentiment analysis, also known as opinion mining, refers to the process of creating
automatic tools or systems which can derive subjective information from text in natural
(human) languages, as opposed to computer code. The subjective information most
commonly desired by researchers is opinions and sentiments, hence the name sentiment
analysis. Sentiment analysis, while originally practiced only by computer scientists, has
become widely used in management science and the social sciences. Microsoft,
Google, Hewlett-Packard, IBM, and others have created their own systems for sentiment
analysis.
Before the turn of the century, there were earlier developments in what would later
become the field of sentiment analysis. Anderson and McMaster (1982) provided a way to
model the affective tone of an entire document based on the “semantic differential
scores” of each of the words in the document. The semantic meanings and scores were
derived from a 1965 study by Heise. According to Lee and Pang (2002), the early 2000s
marked an explosion of research in sentiment analysis. This increase was partially
attributed to the increasing popularity of machine learning models and the availability
of training sets with which they could be trained. Turney (2002) used an algorithm based
on parts-of-speech tagging and semantic orientation to classify online reviews as
recommended or not recommended. Lee and Pang (2002) used machine learning techniques
such as Support Vector Machines and Naive Bayes to classify the sentiment of movie
reviews. Dave, Lawrence, and Pennock (2003) classified the polarity of web reviews based
on several n-gram methods. Their method was not as accurate when applied to individual
sentences because it was developed for classifying reviews, which normally contain
multiple sentences. Hu and Liu (2004) used a method that could predict the sentiment
orientation of opinion words and therefore the opinion orientation of a sentence. It was
an unsupervised method that did not require a corpus, and was loosely based on the work
of Dave, Lawrence, and Pennock. It returned sentiments at the sentence level instead of
for the entire review at once, and then combined the sentence-level sentiments to give a
summary of the entire review. Moraes, Valiati, and Neto (2013) showed the effectiveness
of machine learning processes as opposed to
lexicon-based models. They empirically compared Support Vector Machines and
Artificial Neural Networks for sentiment analysis and found that the Artificial Neural
Networks performed better. Wang, Liu, Sun, Wang, and Wang (2015) showed the
effectiveness of long short-term memory (LSTM) recurrent neural networks for sentiment
analysis by predicting the sentiments of tweets.
1.3 Sentiment Lexicon
The lexicon method splits input text into individual words or phrases called tokens.
Then, it creates a table of these tokens and records the number of times each token
shows up in the text. The resulting tally is called a “Bag of Words” model. Once this
is done, another tool called a “Sentiment Lexicon” is used to classify the bag of tokens.
The Sentiment Lexicon holds a pre-recorded sentiment value for each token, which can be
a positive or negative number or some other representation, such as a vector. These
values can be assigned either manually or by machine learning techniques. Once we have
the tokenized input text and a suitable Sentiment Lexicon, the final task is to design a
function to compute the final sentiment. The simplest way is to sum the sentiment values
of the tokens. The lexicon method is a traditional way to deal with natural language
processing problems, and it has a good theoretical basis. Many people still use and
study this method despite its origins in the 1960s. However, it does have drawbacks,
such as ignoring the integrity and continuity of the text. The meaning of a sentence
depends heavily on the order of words and on context; these should not be ignored if we
want a truly intelligent sentiment processing system (Taboada 2011).
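The lexicon method described above can be sketched in a few lines of Python. This is only a minimal illustration: the lexicon `TOY_LEXICON`, its values, and the function names are hypothetical and are not taken from any published sentiment lexicon.

```python
from collections import Counter

# Hypothetical toy lexicon; real sentiment lexicons are far larger.
TOY_LEXICON = {"great": 1.0, "good": 0.5, "bad": -1.0, "terrible": -1.5}

def bag_of_words(text):
    """Split text into lowercase whitespace tokens and count each one."""
    return Counter(text.lower().split())

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum lexicon values over every token occurrence; unknown tokens add 0."""
    counts = bag_of_words(text)
    return sum(lexicon.get(token, 0.0) * n for token, n in counts.items())

positive = lexicon_score("Have a great day")
negative = lexicon_score("Have a bad day")
```

Because the score is a plain sum over a bag of tokens, a phrase such as "not good" still receives the positive value of "good", which illustrates the word-order drawback noted above.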
1.4 Machine Learning
In the Machine Learning technique of sentiment analysis the classification algorithm
uses a training set to learn a model based on features in the set. This makes a more nuanced
classification possible and can help with ambiguous words or interpretations that vary by
context. A method of feature extraction must be chosen. Some of these methods include
n-grams, which are sequences of n consecutive words. Others use parts-of-speech
information or emotional, affective, or semantic data. One of the disadvantages of the
machine learning method is that it requires a large set of labelled data to be used as the
training set. It is simpler to use the lexicon-based method unless a suitable training set is
available (Salas-Zárate 2017).
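As an illustration of n-gram feature extraction, the following sketch lists the n-grams of a tokenized sentence. The helper name `ngrams` is our own choice for this example and is not tied to any particular library.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "have a great day".split()
bigrams = ngrams(tokens, 2)
```

For the four-word sentence above this yields three bigrams; a text shorter than n yields no n-grams at all.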
We will need to classify the sentiments of the emoticons manually in order to prepare
them for use in our analysis. Once that is done, we can compile our training set using the
comments in the data that already contain emoticons, using the sentiments of each
emoticon. Then our model will be able to classify and assign an emoticon to each comment
in the data set that does not already contain one. Recurrent Neural Networks (RNNs) have
had a great deal of success in the Natural Language Processing realm. The reason is that
text data is highly sequential; for example, the word “day” does not mean much unless you
know the words that came before it, e.g. “Have a great day.” RNNs have advanced the state
of the art over previous architectures on short-length text data (Lee and Dernoncourt 2016).
Given previous attempts to model sentiment have not thoroughly explored emoticons,
we hope to answer the question of whether or not we can accurately recommend emoticons
that might accompany a piece of text. Once we have answered this, further research can
make attempts to analyze sentiment with emoticons on a machine.
2. Methodology
2.1 Data
To get our data, we used the Data Science Competition Website Kaggle. On this
website, people share datasets, competitions, and tutorials. We found a dataset containing
comments from the top 200 trending Youtube videos. The author of this dataset obtained
the data through Youtube’s publicly available API, which allows developers to easily
query for data on Youtube. The data itself contains profanity, nonsensical text, and in
general is noisy. Some of the data could be generated by bots, and we do no vetting to
determine whether a comment actually comes from a human. The noisiness of the data
might prevent us from training a successful model; however, we assume that the large
amount of data will help our models perform well in spite of the low quality of data.
In order to answer the question of whether or not a model could recommend emoticons,
we created two models that attempt to perform this recommendation. We also created a
simple dummy model for purposes of comparison. We have roughly three hundred
thousand comments with emoticons, and use them to bootstrap a dataset of comments with
labels. More data is desirable, but this is a fairly large corpus for initial research.
In total, there are 691,388 rows in the dataset. A large proportion of them (more than
200,000) contain emoticons, so there is quite a bit of data, and it would be fairly
straightforward to access the Youtube API and get more if needed. This gives us as much
data as we need for this study. As for features, we use only the text; likes, reply
threads, and so on are ignored in this phase of the project. On average, each text is 15
words long. Figure 1 shows some examples of how the data looks.
Figure 1: Example unprocessed data
2.2 Evaluation Metrics
The models will be evaluated using a holdout set of data, in which each model will
recommend five emoticons that might accompany a text. If at least one recommendation
is an emoticon that occurs in the validation comment, then we consider it to be a
“correct” guess. Accuracy is then the number of correct guesses divided by the total
number of guesses. Keras calls this metric “top k categorical accuracy”, and it is the
metric implemented for our models. Mathematically, for a comment $x \in \text{Comments}$
with matching label set $y \in \text{Labels}$, let $\text{predict\_labels}(x)$ return the
probability of each output class, and let $\text{top}_5(x)$ be the five classes with the
highest probabilities. Then
\[ \text{score}(x) = \begin{cases} 1 & \text{if } \text{top}_5(x) \cap y \neq \emptyset \\ 0 & \text{otherwise} \end{cases} \]
and the accuracy of the model is
\[ \text{accuracy} = \frac{1}{N} \sum_{i=1}^{N} \text{score}(x_i), \qquad x_i \in \text{Comments}, \quad N = |\text{Comments}|. \]
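The metric can be sketched in plain Python as follows. This is an illustrative re-implementation under our own naming (`top_k`, `top_k_accuracy`), not the Keras routine itself, and the toy probabilities and label sets are hypothetical.

```python
def top_k(probabilities, k=5):
    """Return the indices of the k largest probabilities, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i], reverse=True)
    return ranked[:k]

def top_k_accuracy(predictions, label_sets, k=5):
    """Fraction of comments whose top-k classes overlap the true label set."""
    hits = sum(1 for probs, labels in zip(predictions, label_sets)
               if set(top_k(probs, k)) & set(labels))
    return hits / len(predictions)

# Hypothetical toy data: 3 comments, 4 emoticon classes, k = 2.
predictions = [
    [0.1, 0.6, 0.2, 0.1],    # top-2 classes: 1 and 2
    [0.7, 0.1, 0.1, 0.1],    # top-2 classes: 0 and 1 (tie broken by index)
    [0.25, 0.25, 0.3, 0.2],  # top-2 classes: 2 and 0 (tie broken by index)
]
label_sets = [{2}, {3}, {0, 1}]
accuracy = top_k_accuracy(predictions, label_sets, k=2)
```

In this toy case the first and third comments count as correct and the second does not, so the accuracy is two thirds.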
One consideration is that the distribution of emoticons occurring in the corpus is
highly skewed (see Figure 2). This is a good reason to consider F1 scores, which might be
better suited to future analysis. However, we chose top k accuracy because it more
closely resembles the question we are asking.
Figure 2: Distribution of a subset of Emoticons
2.3 Analysis Plan
In order to compare the performance of our models, we created a holdout set of data
used only for validating accuracy. We also defined what a prediction would be for each
model: each model outputs its five highest-probability predictions. If any of those
predictions is among the labels of the validation comment, we consider it an accurate
prediction. To analyze the dataset, we then compute the prediction accuracy of each
model and compare those scores. One might also consider looking at the training accuracy
of each model; however, these scores are not directly comparable, so we ignore them
except for the purposes of optimizing the model.
2.4 Approach
In our approach, we had to make a few crucial assumptions and simplifications to
contextualize our problem. Firstly, our dataset involved input data with multiple output
classifications. For example, a user can add hundreds of the same emoticon or many
different emoticons. As a preprocessing step, we narrowed down these classes to the
unique emoticons that show up in a comment, and unrolled the data set so that each data
point has a single label. Table 1 displays how each comment gets unrolled into multiple
data points with single labels.
Original comment             Unrolled comment       Label
I loved this video! x x y    I loved this video!    x
                             I loved this video!    y
Table 1: Unrolling of data labels
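The unrolling step in Table 1 can be sketched as follows. The function name `unroll` and the input layout (a comment paired with the list of emoticons found in it) are assumptions made for this example; "x" and "y" are placeholders for two distinct emoticons.

```python
def unroll(comments):
    """Turn (text, emoticons) pairs into one (text, emoticon) row per unique emoticon."""
    rows = []
    for text, emoticons in comments:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        for emoticon in dict.fromkeys(emoticons):
            rows.append((text, emoticon))
    return rows

# "x" and "y" are placeholders for two distinct emoticons, as in Table 1.
rows = unroll([("I loved this video!", ["x", "x", "y"])])
```

The repeated emoticon contributes a single row, so one comment with two distinct emoticons becomes two single-label data points.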
The other assumption exists only for our Naive Bayes Model, and it is that all words
in the comments are independent. This assumption is difficult to back up, and it is not clear
whether there is mutual dependence or mutual exclusivity between words. However, our
Recurrent Neural Network does not have this limitation because it can model the entire
sequence.
2.5 Preprocessing
One of the most important steps is the preprocessing stage, which is done before any
model is trained. We first separate the data into comments with emoticons and comments
without emoticons. We then make all comments lowercase and normalize both sets using
two dictionaries: one mapping punctuation to tokens, and one built from word counts over
all comments, which uses each word's position in the count ordering as its embedding.
Table 2 shows an example of how the dictionaries are used to tokenize a comment. A
similar process is used to encode the emoticons: we use a dictionary to encode them as
integers. Preprocessing the comments in this way gives us a normalized integer sequence,
which handles comments that differ only in the capitalization of words.
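A minimal sketch of this kind of tokenization is given below. The punctuation tokens, the reserved id 0 for unseen words, and all function names are assumptions for illustration; they are not the exact dictionaries used in our notebooks.

```python
from collections import Counter

# Hypothetical punctuation-to-token mapping chosen for this example.
PUNCTUATION_TOKENS = {"!": "<exclamation>", "?": "<question>",
                      ".": "<period>", ",": "<comma>"}

def normalize(comment):
    """Lowercase a comment and replace punctuation with word-like tokens."""
    text = comment.lower()
    for mark, token in PUNCTUATION_TOKENS.items():
        text = text.replace(mark, f" {token} ")
    return text.split()

def build_vocabulary(comments):
    """Map each token to an integer id by descending frequency (1 = most common)."""
    counts = Counter(tok for c in comments for tok in normalize(c))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(), start=1)}

def encode(comment, vocabulary, unknown_id=0):
    """Convert a comment to its integer sequence; unseen tokens map to unknown_id."""
    return [vocabulary.get(tok, unknown_id) for tok in normalize(comment)]

comments = ["I loved this video!", "Loved it!"]
vocabulary = build_vocabulary(comments)
encoded = encode("LOVED this!", vocabulary)
```

Because every comment is lowercased before lookup, "LOVED this!" and "loved THIS!" produce the same integer sequence.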
2.6 Dummy Model
For purposes of comparison, we created a very simple model that always predicts the
emoticon with the largest prior probability. The motivation is that it gives us a
baseline score to beat. If we can do significantly better than this, then we know that
the models have potential.
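A baseline of this kind can be sketched as below. The class name `DummyModel` and its `fit`/`predict` interface are our own choices for illustration, and the labels "x", "y", and "z" are placeholders for emoticons.

```python
from collections import Counter

class DummyModel:
    """Baseline that ignores the comment and ranks emoticons by training frequency."""

    def fit(self, labels):
        counts = Counter(labels)
        total = sum(counts.values())
        # Prior probability of each emoticon, most frequent first.
        self.priors = [(label, n / total) for label, n in counts.most_common()]
        return self

    def predict(self, comment, k=1):
        """Return the k most frequent emoticons regardless of the comment text."""
        return [label for label, _ in self.priors[:k]]

# "x", "y", "z" are placeholders for emoticon labels.
model = DummyModel().fit(["x", "x", "y", "x", "z", "y"])
prediction = model.predict("any comment at all")
```

Since the prediction never depends on the text, any gain over this model can be attributed to what a model learns from the comment itself.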
2.7 Naive Bayes Model
Our second model uses Bayesian statistics: it builds tables of posterior probabilities
for each class given a word using Bayes' rule. Naive Bayes is a conditional probability
model. Suppose we are given some instance to be classified, represented by a vector of
features:
\[ \mathbf{x} = (x_1, \dots, x_n) \]
We then compute the probability of each output class $C_k$ using conditional probability
\[ p(C_k \mid x_1, \dots, x_n) \]
Table 2: Tokenization Process
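Since $n$ can be large, making this model less tractable, we need to reformulate our
model using Bayes' rule. In plain English,
\[ \text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}} \]
And symbolically,
\[ p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})} \]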
In practice, the numerator is the most important part, as the denominator does not depend
on $C_k$, effectively making it a constant. The numerator is equivalent to the joint
probability model, meaning we can replace the numerator with
\[ p(C_k, x_1, \dots, x_n) \]
We can then rewrite the numerator using the chain rule for repeated applications of
conditional probability; the derivation is in appendix 1. Then we add the naive
assumption of conditional independence, allowing us to further simplify our model
Figure 3: Naive Bayes Model
to:
\[ p(C_k \mid x_1, \dots, x_n) = \frac{1}{Z}\, p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) \]
Where $Z$ is:
\[ Z = p(\mathbf{x}) = \sum_{k} p(C_k)\, p(\mathbf{x} \mid C_k) \]
Here $Z$ is a scaling factor that depends only on the instance. The derivation is in
appendix 2. In order to make a classifier, we would generally take the argmax of the
simplified model without $Z$, which is the usual definition of a Naive Bayes classifier;
in our case we instead take the top five arguments, as our program recommends multiple
emoticons that might be appropriate. We implemented this model in Python, and the model
follows figure 3.
Another problem is that we have to deal with words that never show up in our corpus
of texts. To handle this, we smooth the probabilities: any word or class that does not
show up is given a very small probability that is close to, but not equal to, zero.
Otherwise, the product of probabilities would be zero whenever a word is not in the corpus.
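The classifier and the smoothing step can be sketched together as follows. This is a simplified illustration rather than our notebook code: the class name, the floor value `epsilon`, and the use of log probabilities (to avoid numerical underflow when many small terms are multiplied) are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Word-level Naive Bayes that returns the top-k classes for a comment."""

    def __init__(self, epsilon=1e-6):
        # epsilon is a hypothetical floor probability for unseen words.
        self.epsilon = epsilon

    def fit(self, rows):
        """rows: iterable of (list_of_tokens, label) pairs."""
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        for tokens, label in rows:
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def log_score(self, tokens, label):
        """log p(C_k) + sum_i log p(x_i | C_k), i.e. the numerator without Z."""
        score = math.log(self.class_counts[label] / self.total)
        n_words = sum(self.word_counts[label].values())
        for tok in tokens:
            likelihood = self.word_counts[label][tok] / n_words
            score += math.log(max(likelihood, self.epsilon))
        return score

    def predict(self, tokens, k=5):
        """Return the k classes with the highest unnormalised posterior."""
        ranked = sorted(self.class_counts,
                        key=lambda c: self.log_score(tokens, c), reverse=True)
        return ranked[:k]

# "x" and "y" are placeholders for emoticon labels.
rows = [("i loved this video".split(), "x"),
        ("loved it".split(), "x"),
        ("i hated this video".split(), "y")]
model = NaiveBayes().fit(rows)
ranking = model.predict("loved this".split(), k=2)
```

Dropping $Z$ does not change the ranking because it is the same for every class, and the floor value keeps a single unseen word from eliminating a class outright.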
2.8 Recurrent Neural Network
Our third and final model is a recurrent neural network; its architecture is shown in
table 3.
Input
Embedding Layer
LSTM Layer
LSTM Layer
LSTM Layer
Fully-Connected Layer
Output layer
Table 3: RNN Architecture
Recurrent Neural Networks are a class of neural networks whose connections form a
directed cycle, allowing them to take time into account through a notion of memory. This
makes RNNs well suited to predicting arbitrary sequences by taking advantage of that memory.
The label data also undergoes another transformation before the RNN begins the
learning process. Since the emoticons are encoded as integers, the representation
implies an ordering that does not make sense, as one emoticon is not greater than
another. To rectify this, we represent each integer as a one-hot vector: we take a
fixed-length vector whose size is the total number of output classes, and use the integer
as the index of the “hot” class. Table 4 gives a small example of encoding a small class space.
Table 4: One-Hot Encoding
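One-hot encoding of an integer label can be sketched as follows; the function name `one_hot` and the class space of four emoticons are hypothetical choices for this example.

```python
def one_hot(index, num_classes):
    """Return a list of zeros of length num_classes with a 1 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError("index must be in [0, num_classes)")
    vector = [0] * num_classes
    vector[index] = 1
    return vector

# Hypothetical class space of four emoticons encoded as 0..3.
encoded = [one_hot(i, 4) for i in (0, 2)]
```

Every vector has the same length and exactly one non-zero entry, so no class is numerically larger than another.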
One of the major features of this model is the stacked LSTM layers. This architecture
allows us to better model hierarchical elements of language. This means each layer will
represent progressively complex parts of the hierarchy. One might imagine this in terms
of the composition of the human face. For example, the most basic element is an edge.
Then a more complex step would be individual elements of the face such as a nose or
mouth. Then the most complex part would be the entire face, and the composition of its
requisite parts.
The LSTM itself is able to remember previous contexts in sentences, meaning we
could potentially get more performance as our model becomes better at modelling
context. Our RNN took much longer to run, so in order to train the model we decided to
use more powerful hardware in the form of a GPU. The neural network was trained on a
GPU using FloydHub, a platform for running deep learning projects. The expense was
roughly 14 dollars, as we subscribed to the Data Science plan, which gave us 10 hours of
GPU time that we used for experimentation on multiple occasions. The price was
remarkably cheap compared to other platforms such as Amazon. Usage of FloydHub is
remarkably simple, and resembles version control programs such as git. One simply
uploads code to the website using command line tools, and is given an interface to
interact with the instance. This service was worthwhile to learn because it abstracted
away elements such as infrastructure, version control, and storage, so we could focus on
the problem.
In addition to our baseline architecture, we also perform dropout on each layer, which
helps guard against overfitting because the network probabilistically “drops” some of
its units during training, which forces the network to build redundancies. For the
training metric, we implemented the top k categorical accuracy metric described in the
evaluation metrics. For the objective function, we found that categorical cross entropy
worked best; it typically works well in multi-class, single-label scenarios. Using
TFLearn, a deep learning library for Python, we implemented the architecture we decided
on with relative ease. TFLearn builds on top of TensorFlow, abstracting away many of the
lower-level computational components and allowing the programmer to think about the
layers and interactions between layers rather than how to build a well-known type of
layer or cell.
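For readers unfamiliar with the objective function, categorical cross entropy on a softmax output can be sketched in plain Python as below. This is a stand-alone illustration with hypothetical scores, not the TFLearn or TensorFlow implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot_label, probabilities, eps=1e-12):
    """Loss = -sum_c y_c * log(p_c); with a one-hot y this is -log p(true class)."""
    return -sum(y * math.log(max(p, eps))
                for y, p in zip(one_hot_label, probabilities))

# Hypothetical scores for three emoticon classes.
probabilities = softmax([2.0, 1.0, 0.1])
loss_right = categorical_cross_entropy([1, 0, 0], probabilities)
loss_wrong = categorical_cross_entropy([0, 0, 1], probabilities)
```

The loss is small when the true class receives high probability and grows as that probability shrinks, which is why it pairs naturally with one-hot, single-label targets.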
2.9 Implementation
2.9.1 Programming Language and Libraries
•Python 3
•TFLearn, a deep learning library featuring a higher-level API for TensorFlow.
•TensorFlow, a deep learning library.
As mentioned throughout the text, the models were implemented using the listed
libraries. We did our coding on the website FloydHub via IPython Notebooks, which
abstracted away much of the setup. We split our code into three notebooks: one each for
preprocessing, the Bayesian model, and the RNN. We ran into very few problems
implementing our solution; however, some are outlined below.
2.9.2 Problems
•Bayes Smoothing We ran into a small hitch with the Bayesian model when querying
prior probabilities for values that did not exist in the data. We resolved this by
“smoothing” the values, assigning a small probability to them.
•Skin Tone Modifiers Some emoticons modify other emoticons, e.g. allowing one to
change the skin tone of the smiley face. We found that these confounded our predictions,
and removed them as possible predictions.
•Finding loss, activation, and metrics We had to experiment many times to find the
best loss, activation, and metric functions for our RNN. In our experience, this process
can come down to simple trial and error.
2.10 Refinement
Originally, our RNN model did not perform as well as we had hoped; however, a few
optimizations to our model vastly improved its performance. The first model we used was
a multi-class, multi-label classifier, which performed very poorly: our RNN had an
accuracy of .508, which left much to be desired. We believe the reason is that instead of
one-hot encoded vectors, we had many-hot encoded vectors. This means that the label
space would be of order $2^{\#\text{ of emoticons}}$. Since this space is extremely
large, the model would have trouble representing any reasonable portion of it. For this
reason, we needed to unroll data points to perform multi-class, single-label
classification. After adjusting our loss function, metric function, and activation
function, we ended up with much better performance. We believe this to be because of the
reduction in potential labels to just the number of emoticons. In addition,
hyperparameters such as learning rate and batch size were adjusted to find out what
settings worked best. The best we found was a learning rate of .001 and a batch size of 128.
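To make the difference in label-space size concrete, a short worked comparison may help. The value $n = 20$ below is a hypothetical number of emoticons chosen only for illustration; it is not a count taken from the dataset.

```latex
\[
\underbrace{2^{n}}_{\text{many-hot label space}}
\quad \text{versus} \quad
\underbrace{n}_{\text{single-label space}},
\qquad
n = 20: \quad 2^{20} = 1{,}048{,}576 \ \text{ versus } \ 20 .
\]
```

Even for this small hypothetical $n$, the many-hot space is tens of thousands of times larger than the single-label space, and it doubles with every additional emoticon.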
3. Results
In order to validate the models, we created a holdout set of labelled data that none of
the models got to use for training or testing. The accuracy of each model using top k
categorical accuracy is in tables 5 and 6.
Model          Accuracy
Dummy          .527
Naive Bayes    .859
RNN            .702
Table 5: Training Accuracy Results
Model          Accuracy
Dummy          .527
Naive Bayes    .548
RNN            .812
Table 6: Validation Accuracy Results
Table 6 measures how well our recommendation engine gives us accurate emoticons to
represent our text. Our results do not promote strong confidence in our Naive Bayes
Model’s ability to recommend emoticons; however, there are some potential improvements
to the model, such as n-gram modelling. Notably, the Bayesian Model performs decently on
the training data (.859), but generalizes quite poorly (.548 on validation) and shows
signs of over-fitting. The RNN, on the other hand, surprisingly performs worse on the
training data (.702), but performs much better on the validation set (.812). Whatever the
reason for this phenomenon, it is clear that the RNN generalizes much better.
3.1 Visualization of Model Functionality
We have a model that could be incorporated into a wide variety of applications; for
example, a browser plugin that predicts what emoticons you might put with a comment
and assists the user, similar to an auto-complete feature. One issue to consider is the
nature of Youtube comments themselves, which might prevent the generalization of this
model to other applications: the comments seem to be quite different from more formal
forms of language. However, the models do show that this sort of functionality is
possible. For example, we have pulled some examples from the data and run them through
our models to produce Table 7.
Table 7: Example data and predictions
While the machine learning back-end may not be the most sophisticated, the model
does a good job in practice of giving recommendations, and we think the model would be
good enough to use for applications to be built on top of.
3.2 Limitations
One limitation of our models is that words that do not show up in the Youtube
comment corpus cause issues, as our models have trouble predicting outputs for words
that they have never seen. One way to fix this might be to mine for more comment data.
One drawback of the Naive Bayes Model is that it may not be able to model longer-term
trends in comments; however, with the short length of the comments, this may be a
non-issue. We are also limited in our choice of language modelling because we work at
the word level. We would likely see a large improvement by expanding our level of
modelling to some type of n-gram. The RNN has limitations in multi-class classification,
and this may be hindering its ability to learn. Another limitation might be that the
training time is cost-prohibitive. The model would likely continue to learn and perform
better with more training time and data, meaning ultimately a higher cost for the model.
The Naive Bayes model, by contrast, is easy to program and fast to run, with no need to
train for hours upon hours.
Another major consideration is that an RNN might be a bad fit. We originally thought
long-term sequential modelling would be important, but it turns out the average comment
is only 15 words long. Since the texts are so short, we might have to thoroughly rethink
our strategy if this sequential modelling turns out to be unimportant.
3.3 Future Work
In order to eliminate the assumption of independence in the Bayesian model, we can
add complexity by changing the level at which we model the data. To do so, we would need
to employ a skip-gram or n-gram model that captures larger parts of the sequence data.
One might also explore alternative Bayesian models such as Hidden Markov Models. The
same improvements to the data modelling using n-grams would likely improve the quality
of the RNN results. The RNN model likely has a great deal of room for improvement; one
might experiment with hyperparameter tuning or modifying the architecture. There are
even more powerful models, such as CRNNs and GANs, that push the state of the art in
deep learning. These models would be worth exploring; however, we pushed our newfound
deep learning knowledge as far as we could in the time allotted.
Another important consideration is the unrolling of the data. Future work should
further explore how to deal with multi-class classification, which would likely involve
writing new validation and loss functions for the neural network model. However, the
Naive Bayes Model does not suffer from this limitation.
Future work might also try to further connect emoticons and sentiment. We
hypothesize that emoticons naturally lend themselves to easy conversion into
sentiment classes. However, our current models predict only what emoticon might be used,
and the user of the model would have to infer what sentiment the emoticon conveys
depending on context.
One might also find more optimizations by adding further preprocessing steps, for
example, eliminating common English words that add very little information.
3.4 Reflection
Looking back at the process, here are the steps we took to get to the current models:
• Literature Review We made sure to have a rough idea of what people in this field
have tried, and what the state of the art is.
• Deciding on a Model After reviewing the field, we decided which models we wanted
to implement, which set the tone for preprocessing and implementation.
• FloydHub Next we set up our programming environment with cloud computing in
mind. It’s important to set up an environment such as FloydHub or AWS to minimize
training time by using a fast GPU. At this step we also made sure to download all the
libraries we would need.
• Preprocessing A large majority of our time was spent learning how to deal with
the data and exploring the data itself. We had to go through multiple iterations of
embedding and tokenization to find the method that made sense.
• Model Implementation After preprocessing our data, this step was fairly
straightforward. Most of the time at this step was spent dealing with edge cases or
optimizing models rather than on the actual implementation.
• Refinement Refinement may have been the hardest part because we had to make
inferences about why our model was not performing as well as we wanted. It’s hard to say
what the potential of each model was, so we kept iterating until we had something that
seemed substantial.
3.5 Conclusion
Overall, there are many areas for potential improvement, and our work serves as a
baseline for recommending emoticons. Nevertheless, we have begun to answer our original
question: it seems plausible that emoticons can be assigned accurately to comments as
noisy as Youtube comments, making it easy for a casual observer to understand the
sentiment of a text.
Acknowledgment: The authors thank the anonymous referee for the constructive
review of the paper, which has greatly improved the quality of the article. The
authors would also like to thank the Mathematics Department at the
University of Evansville for its generous support.
Appendix
1. Chain rule for repeated applications of conditional probability.
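\[
\begin{aligned}
p(C_k, x_1, \dots, x_n) &= p(x_1, \dots, x_n, C_k) \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2, \dots, x_n, C_k) \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k)\, p(x_3, \dots, x_n, C_k) \\
&= \dots \\
&= p(x_1 \mid x_2, \dots, x_n, C_k)\, p(x_2 \mid x_3, \dots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\, p(x_n \mid C_k)\, p(C_k)
\end{aligned}
\]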
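2. Naive assumption of conditional independence to simplify the model. Under this
assumption, $p(x_i \mid x_{i+1}, \dots, x_n, C_k) = p(x_i \mid C_k)$ for every feature
$x_i$. Thus the joint model can be derived via:
\[
\begin{aligned}
p(C_k \mid x_1, \dots, x_n) &\propto p(C_k, x_1, \dots, x_n) \\
&= p(C_k)\, p(x_1 \mid C_k)\, p(x_2 \mid C_k)\, p(x_3 \mid C_k) \cdots \\
&= p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
\end{aligned}
\]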
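The product form above can be turned into a classifier by picking the class $C_k$ with the largest score. The following is a minimal sketch, not the implementation used in this study: it assumes tokenized comments labelled with an emoticon, works with log-probabilities to avoid underflow, and adds add-one (Laplace) smoothing, which the derivation itself does not include. The names `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Collect class and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Return the label maximizing log p(C_k) + sum_i log p(x_i | C_k)."""
    class_counts, word_counts, vocab = model
    n_examples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n_examples)        # log prior, log p(C_k)
        total = sum(word_counts[label].values())    # words seen in this class
        for t in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A toy usage, with made-up training comments:

```python
model = train([
    (["great", "video"], ":)"),
    (["love", "this"], ":)"),
    (["bad", "video"], ":("),
    (["hate", "this"], ":("),
])
label = predict(["great"], model)
```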
References
[1] Anderson, C., McMaster, G. (1982). Computer Assisted Modeling of Affective Tone
in Written Documents. Computers and the Humanities, 16(1), 1-9.
[2] Brill, E., Mooney, R. J. (1997). An Overview of Empirical Natural Language
Processing. AI Magazine, 18(4), 13.
[3] Chipman, S. E. (2017). The Oxford Handbook of Cognitive Science. Oxford: Oxford
University Press.
[4] Dave, K., Lawrence, S., and Pennock, D. (2003). Mining the Peanut Gallery: Opinion
Extraction and Semantic Classification of Product Reviews. In Proceedings of the
12th International Conference on World Wide Web (WWW 03). ACM, New York,
NY, USA, 519-528.
[5] Hu, M., Liu, B. (2004). Mining and Summarizing Customer Reviews. In Proceedings
of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 168-177. ACM.
[6] Hogenboom, Alexander, et al. (2013) Exploiting emoticons in sentiment analysis.
Proceedings of the 28th Annual ACM Symposium on Applied Computing. ACM.
[7] Kang, Mangi, Jaelim Ahn, and Kichun Lee. (2017) Opinion mining using ensemble
text hidden Markov models for text classification. Expert Systems with Applications.
[8] Lee, Ji Young, and Franck Dernoncourt. (2016) Sequential short-text classification
with recurrent and convolutional neural networks. arXiv preprint arXiv:1603.03827.
[9] Liu, B. (2012). Sentiment Analysis and Opinion Mining. Morgan and Claypool.
LSTM Networks for Sentiment Analysis. DeepLearning 0.1 Documentation,
Deeplearning. Retrieved December 01, 2017.
[10] Mitchell, J. (Datasnaek). Trending Youtube Video Statistics and Comments.
Kaggle, Kaggle Inc., Aug./Sep. 2017.
[11] Moraes, R., Valiati, J. F., Neto, W. P. G. (2013). Document-level Sentiment
Classification: An Empirical Comparison between SVM and ANN. Expert Systems
with Applications, 40(2), 621-633.
[12] Naive Bayes classifier. Wikipedia, Wikimedia Foundation Inc., 30 Nov. 2017.
Available from http://en.wikipedia.org/wiki/Naive_Bayes_classifier.
[13] Pang, B., Lee, L., Vaithyanathan, S. (2002). Thumbs up? Proceedings of the ACL-02
Conference on Empirical Methods in Natural Language Processing - EMNLP 02.
[14] Pozzi, F. A. (2017). Sentiment Analysis in Social Networks. Amsterdam: Elsevier.
[15] Rosenthal, Sara, Noura Farra, and Preslav Nakov. (2017) SemEval-2017 Task 4:
Sentiment analysis in Twitter. Proceedings of the 11th International Workshop on
Semantic Evaluation.
[16] Salas-Zárate, M. P., Medina-Moreira, J., Lagos-Ortiz, K., Luna-Aveiga, H.,
Rodríguez-García, M. Á., and Valencia-García, R. (2017) Sentiment Analysis on
Tweets about Diabetes: An Aspect-Level Approach. Computational and
Mathematical Methods in Medicine, 1-9.
[17] Siersdorfer, Stefan, et al. (2010) How useful are your comments?: analyzing and
predicting Youtube comments and comment ratings. Proceedings of the 19th
International Conference on World Wide Web. ACM.
[18] Taboada, Maite, et al. (2011) Lexicon-based methods for sentiment analysis.
Computational Linguistics 37.2:267-307.
[19] Turney, P. D. (2002). Thumbs Up or Thumbs Down?: Semantic Orientation Applied
to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual
Meeting on Association for Computational Linguistics, pp. 417-424. Association for
Computational Linguistics.
[20] Wang, X., Liu, Y., Sun, C., Wang, B., Wang, X. (2015). Predicting Polarities of
Tweets by Composing Word Embeddings with Long Short-Term Memory. In
Proceedings of the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language
Processing, Volume 1: Long Papers, pp. 1343-1353, Beijing, China. Association for
Computational Linguistics.
[21] Whitelaw, C., Garg, N., Argamon, S. (2005). Using Appraisal Groups for Sentiment
Analysis. In Proceedings of the 14th ACM International Conference on Information
and Knowledge Management, pp. 625-631. ACM.
Keenen Cates1, Pengcheng Xiao1,∗, Zeyu Zhang1, Calvin Dailey1
1Department of Mathematics, University of Evansville
1800 Lincoln Ave, Evansville, Indiana, 47722 USA
∗Corresponding author: [email protected]; fax: (812)488-2944