
Sentiment Analysis: from Binary to Multi-Class Classification

A Pattern-Based Approach for Multi-Class Sentiment Analysis in Twitter

Mondher Bouazizi

Graduate School of Science and Technology

Keio University

Yokohama, Japan

Email: [email protected]

Tomoaki Ohtsuki

Department of Information and Computer Science

Faculty of Science and Technology, Keio University

Yokohama, Japan

Email: [email protected]

Abstract—Most state-of-the-art work on the automatic sentiment analysis and opinion mining of texts collected from social networks and microblogging websites is oriented towards classifying texts as positive or negative. In this paper, we propose a pattern-based approach that goes deeper in the classification of texts collected from Twitter (i.e., tweets). We classify tweets into 7 different classes; however, the approach can be extended to classify them into more classes. Experiments show that our approach reaches a classification accuracy of 56.9% and a precision of 72.58% on sentimental tweets (i.e., tweets other than neutral and sarcastic ones). Nevertheless, the approach proves to be very accurate in binary classification (i.e., classification into “positive” and “negative”) and ternary classification (i.e., classification into “positive”, “negative” and “neutral”): in the former case, we reach an accuracy of 87.5% on the same dataset after removing neutral and sarcastic tweets, and in the latter case, we reach an accuracy of 83.0%.

I. INTRODUCTION

Twitter has become one of the biggest web destinations: a very popular platform for people to express their thoughts about products [1] [2] or movies [3], share their daily experiences, and communicate their opinions about real-time and upcoming events, such as sports or political elections [4].

This ecosystem presents a very rich source of data to mine. However, due to the character limit (i.e., 140 characters per tweet), mining such data yields lower performance than mining longer texts. In addition, classification into multiple classes remains a challenging task: binary classification of a text usually relies on the sentiment polarity of its components (i.e., whether they are positive or negative), but when the positive and negative classes are divided into subclasses, the accuracy tends to decrease remarkably.

In this paper, we propose an approach that relies on writing patterns and special unigrams to classify tweets into 7 different classes, and we show that the proposed approach achieves good performance in terms of classification accuracy and precision. The main contributions of this paper are as follows:

1) We propose a set of pattern-based features, along with other features, to classify tweets.

2) We classify tweets into 7 different sentiment classes.

The remainder of this paper is structured as follows: in Section II, we present our motivations and describe some of the related work. In Section III, we formally define the aim of our work and describe the proposed method in detail. In Section IV, we detail our experiments and the results obtained. Section V concludes this paper and proposes possible directions for future work.

II. MOTIVATIONS AND RELATED WORK

A. Motivations

Social networks and microblogging websites such as Twitter have been the subject of many studies in recent years, and automatic sentiment analysis and opinion mining are hot topics of study. Social networks present a huge source of data representing the opinions of a significant, yet random, proportion of the users and customers of a product or a service. However, due to the informal language used, the presence of non-textual content, and the use of slang words and abbreviations, classifying data extracted from such microblogging websites is a challenging task. Ghag and Shah [5] identify, among others, “Hidden Sentiment Identification” (i.e., identifying the real feeling rather than the sentiment polarity), “Handling Polysemy” (i.e., dealing with words that have multiple meanings, possibly with different sentiment polarities) and “Mapping Slangs” (i.e., identifying the meaning and polarity of slang words) as the most challenging tasks facing the sentiment analysis of short microblog texts.

In a related context, state-of-the-art approaches mostly focus on binary and ternary sentiment classification. In other words, they classify texts either into “positive” and “negative”, or into “positive”, “negative” and “neutral”. However, to study the opinion of a user, it would be more interesting to go deeper in the classification and detect the sentiment hidden behind the post. The following two tweets, for example, are both negative, yet they reflect two completely different aspects:

• “Damn damn.. no iPhone support for windows XP x64. There are some workarounds, but I can’t figure this out.”

IEEE ICC 2016 SAC Social Networking

978-1-4799-6664-6/16/$31.00 ©2016 IEEE

Authorized licensed use limited to: University of the Cumberlands. Downloaded on July 24,2021 at 03:22:18 UTC from IEEE Xplore. Restrictions apply.

• “Nooooooooooo! My iPhone glass cracked :(”

In the first example, the user is expressing his fury at the lack of support for his phone on an operating system, whereas in the second, he is expressing sadness because of physical damage to his phone. The first example carries important information regarding the satisfaction of the user and might therefore be more important to study. In general, however, both kinds of information can be used, yet they have to be distinguished from each other.

B. Related Work

Twitter data mining has been a hot topic of research in the last few years. The nature of the data mined varies widely depending on the aim and the expected result; consequently, the techniques used to process the data and extract the needed information differ as well.

Akcora et al. [6] proposed a method to detect changes in public opinion over time and identify the news that led to breakpoints in public opinion. In a related context, Sriram et al. [7] proposed a method to classify tweets according to their nature into a set of classes including private messages, opinions and events.

However, most of the work has focused on the content of tweets and on how to extract users' opinions towards specific topics or objects. Pang et al. [8] pioneered the use of machine learning to classify texts based on their sentiment polarity: the authors used unigrams, bigrams and adjectives in different ways to classify a set of movie reviews as positive or negative. Other works built on this idea, and new types of features have been used for classification depending on the aim and the application: Boia et al. [9] and Manuel et al. [10] proposed two approaches that rely, respectively, on emoticons to detect the polarity of tweets and on slang words to assign a sentiment score to online texts. These two works showed how non-textual components can be used to detect the polarity of a text.

More recent works went deeper, and new models have been built: Gao and Sebastiani [11] recently proposed an approach that focuses on the distribution, or relative frequency, of the sentiment classes in the analyzed set. Moving from classification to quantification, the authors concluded that a quantification-specific algorithm gives better frequency estimates than regular classification-oriented algorithms.

Few works have addressed multi-class sentiment analysis. Most of them focused on assessing sentiment strength on different levels (e.g., “very negative”, “negative”, “neutral”, “positive” and “very positive”) or simply on assigning numeric sentiment scores to texts [12] [13]. Nevertheless, other works classified texts into different sentiment classes: Lin et al. [14] [15] proposed an approach that classifies documents into reader-emotion categories, relying on what they call similarity features and word emotion features, along with other basic features. Although the approach shows some potential, it is oriented towards the reader rather than the writer.

TABLE I: Structure of the Dataset Used

Class       Training set   Test set
Happiness   3000           225
Love        3000           219
Sadness     3000           223
Anger       3000           201
Hate        3000           157
Sarcasm     3000           199
Neutral     3000           176
Total       21 000         1400

Therefore, the sentiment classes they propose differ from what a writer might intend to show. Similarly, Ye et al. [16] studied the detection of the emotions of news articles from the reader's perspective, and tried various multi-label classification methods and feature selection strategies to determine which should be adopted to solve the problem. Liang et al. [17] proposed an emoticon recommendation system that suggests emoticons for posted texts, helping the author decide which emoticon to insert to convey what he intends.

III. PROPOSED APPROACH

A. Problem Statement

Given a set of tweets, we aim to classify each of them into one of the following 7 classes: “happiness”, “sadness”, “anger”, “love”, “hate”, “sarcasm” and “neutral”. To do so, we extract four sets of features from each tweet, refer to a training set, and use machine learning algorithms to perform the classification.

B. Data

For the purpose of this work, we manually collected and prepared 2 datasets as follows:

• Set 1: this set contains 21 000 tweets which have been manually classified into the 7 classes, each class containing 3 000 tweets. This set is used for training and, in the rest of this work, is referred to as the “training set”.

• Set 2: this set contains 1400 tweets, all of which have been manually checked and classified into the 7 classes. This set serves as a test set and, in the rest of this work, is referred to as the “test set”.

The structure of the dataset used is shown in TABLE I.

C. Features Extraction

Under different emotional conditions, humans tend to behave differently, including in the way they talk and express their feelings. It might therefore be important to rely not only on the vocabulary used, but also on the expressions and sentence structures used under the different conditions, to quantify and model these feelings. Based on these assumptions, we extract the following four families of features:


1) Sentiment-based features: Sentiment-based features are based on the sentiment polarity (i.e., “positive”/“negative”) of the different components of a tweet. We first extract the emotional scores of words using SentiStrength, which attributes sentiment scores to words: negative words have scores ranging from −1 (weakly negative) to −5 (extremely negative), and positive words have scores ranging from 1 (weakly positive) to 5 (extremely positive). We use SentiStrength to extract the following features:

• Total score of positive words, denoted by PW

• Total score of negative words, denoted by NW (taken as a positive value)

• Number of highly emotional positive words (i.e., having a score equal to or greater than 3), denoted by Npw

• Number of highly emotional negative words (i.e., having a score equal to or less than −3), denoted by Nnw

• Ratio of emotional words ρ(t), defined as

$$\rho(t) = \frac{PW(t) - NW(t)}{PW(t) + NW(t)} \qquad (1)$$

where t is the tweet. If the tweet does not contain any emotional word, ρ is set to 0.
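As a minimal sketch of these five word-level features, the Python snippet below assumes the tweet is already tokenized and that SentiStrength word scores are available as a plain dictionary; `lexicon` and `word_sentiment_features` are hypothetical names, and SentiStrength itself is not called. Words missing from the dictionary are treated as non-emotional.

```python
from typing import Dict, List


def word_sentiment_features(tokens: List[str], lexicon: Dict[str, int]) -> Dict[str, float]:
    """Compute PW, NW, Npw, Nnw and rho(t) for one tokenized tweet.

    `lexicon` is a hypothetical stand-in for SentiStrength word scores:
    positive words map to 1..5, negative words to -1..-5.
    """
    scores = [lexicon[w.lower()] for w in tokens if w.lower() in lexicon]
    pw = sum(s for s in scores if s > 0)     # total score of positive words
    nw = sum(-s for s in scores if s < 0)    # total score of negative words, kept positive
    npw = sum(1 for s in scores if s >= 3)   # highly emotional positive words
    nnw = sum(1 for s in scores if s <= -3)  # highly emotional negative words
    # Equation (1); rho is 0 when the tweet contains no emotional word.
    rho = (pw - nw) / (pw + nw) if pw + nw > 0 else 0.0
    return {"PW": pw, "NW": nw, "Npw": npw, "Nnw": nnw, "rho": rho}


lexicon = {"love": 3, "cracked": -2, "hate": -4}  # hypothetical scores
print(word_sentiment_features("I love it but the glass cracked".split(), lexicon))
# {'PW': 3, 'NW': 2, 'Npw': 1, 'Nnw': 0, 'rho': 0.2}
```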

We then add four more features by counting the number of positive, negative, joking (or ironic) and neutral emoticons. Joking emoticons are emoticons sometimes used with ironic or sarcastic statements (e.g., “:P”). Hashtags also carry emotional content: in some cases, they are used to disambiguate the real intention of the twitterer, particularly when he is being sarcastic. Therefore, we also count the number of positive and negative hashtags.

We then define 4 features that represent whether there is a sentiment contrast between the different components of the tweet. By contrast, we mean the coexistence of a negative component and a positive one within the same tweet: we extract such contrast between words, between hashtags, between words and hashtags, and between words and emoticons, and use them as extra features.
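The emoticon, hashtag and contrast features can be sketched in the same way. The emoticon sets and the word/hashtag scores below are hypothetical (the paper does not list them), and the contrast flags follow one plausible reading of the text: a flag is set when a positive and a negative component of the given kinds coexist in the tweet.

```python
from typing import Dict, List

# Hypothetical emoticon lists; the exact lists used in the paper are not given.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "<3"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
JOKING_EMOTICONS = {":P", ":-P", ";)"}
NEUTRAL_EMOTICONS = {":|", ":-|"}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS | JOKING_EMOTICONS | NEUTRAL_EMOTICONS


def emoticon_hashtag_contrast_features(tokens: List[str], lexicon: Dict[str, int]) -> Dict[str, int]:
    """Emoticon counts, hashtag polarity counts and the four contrast flags."""
    words = [t.lower() for t in tokens if not t.startswith("#") and t not in ALL_EMOTICONS]
    tags = [t[1:].lower() for t in tokens if t.startswith("#")]
    word_scores = [lexicon[w] for w in words if w in lexicon]
    tag_scores = [lexicon[h] for h in tags if h in lexicon]

    pos_emo = sum(t in POSITIVE_EMOTICONS for t in tokens)
    neg_emo = sum(t in NEGATIVE_EMOTICONS for t in tokens)
    w_pos, w_neg = any(s > 0 for s in word_scores), any(s < 0 for s in word_scores)
    h_pos, h_neg = any(s > 0 for s in tag_scores), any(s < 0 for s in tag_scores)

    return {
        "pos_emoticons": pos_emo,
        "neg_emoticons": neg_emo,
        "joking_emoticons": sum(t in JOKING_EMOTICONS for t in tokens),
        "neutral_emoticons": sum(t in NEUTRAL_EMOTICONS for t in tokens),
        "pos_hashtags": sum(s > 0 for s in tag_scores),
        "neg_hashtags": sum(s < 0 for s in tag_scores),
        # Contrast = a positive and a negative component coexist in the tweet.
        "contrast_words": int(w_pos and w_neg),
        "contrast_hashtags": int(h_pos and h_neg),
        "contrast_words_hashtags": int((w_pos and h_neg) or (w_neg and h_pos)),
        "contrast_words_emoticons": int((w_pos and neg_emo > 0) or (w_neg and pos_emo > 0)),
    }


features = emoticon_hashtag_contrast_features("I love Mondays #fail :P".split(),
                                              {"love": 3, "fail": -2})
print(features["contrast_words_hashtags"], features["joking_emoticons"])  # 1 1
```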

2) Punctuation and syntax-based features: In addition to sentiment-based features, we extract a second set of features that we call punctuation and syntax-based features. A certain use of punctuation marks, the repetition of vowels or the use of all-capital words may show how intense the sentiment of the person is. To detect such aspects, we extract the following features: the number of exclamation marks, the number of question marks, the number of dots, the number of all-capital words and the number of quotes.

We also add a sixth feature by checking whether any of the words contains a vowel that is repeated more than twice (e.g., “looooove”). If such a word exists, the feature is set to “true”; otherwise, it is set to “false”.
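A possible implementation of the six punctuation and syntax-based features is sketched below. The minimum length of two letters for an all-capital word and the set of quote characters are assumptions, not values taken from the paper.

```python
import re
from typing import Dict, Union

# A vowel followed by at least two more copies of itself, i.e. repeated more than twice.
REPEATED_VOWEL = re.compile(r"([aeiou])\1{2,}", re.IGNORECASE)
# Assumed quote characters: straight and curly double quotes.
QUOTE_CHARS = set('"\u201c\u201d')


def punctuation_features(tweet: str) -> Dict[str, Union[int, bool]]:
    def is_all_caps(word: str) -> bool:
        letters = "".join(c for c in word if c.isalpha())
        # Assumption: at least two letters, so that "I" or "A" is not counted.
        return len(letters) >= 2 and letters.isupper()

    return {
        "exclamation_marks": tweet.count("!"),
        "question_marks": tweet.count("?"),
        "dots": tweet.count("."),
        "all_caps_words": sum(is_all_caps(w) for w in tweet.split()),
        "quotes": sum(c in QUOTE_CHARS for c in tweet),
        "repeated_vowel": REPEATED_VOWEL.search(tweet) is not None,
    }


print(punctuation_features('NOOOO! I "looooove" waiting...'))
# {'exclamation_marks': 1, 'question_marks': 0, 'dots': 3, 'all_caps_words': 1, 'quotes': 2, 'repeated_vowel': True}
```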

3) Unigram-based features: Since they were proposed by Pang et al. [8], unigrams, and n-grams in general, have been used as basic features for sentiment analysis using machine learning. In these approaches, unigrams are collected from the training datasets, and either the count or the presence of these unigrams is used as a feature for classification. In our work, we make use of WordNet [18] to collect unigrams related to each sentiment class. We start with a small set of seed words for each class and use WordNet to collect their synonyms and hyponyms down to a certain depth.

Fig. 1: Number of unigrams collected from WordNet using the seed words proposed

The choice of synonyms and hyponyms is based on the fact that these words are highly correlated with the initial seed word and usually describe the same object, if not a more precise one. While synonyms usually refer to equivalent terms, hypernyms and hyponyms express the relationship between a more general term and its more specific instances. A hypernym, or superordinate, is a broader term than a hyponym, whereas a hyponym is a word or an expression that is more specific than its hypernym. For example, two of the direct hypernyms of the word “feeling” are “perception” and “idea”, while the words “happiness”, “anger” and “fear” are some of its hyponyms. Hypernyms might lose some of the specificities of the initial word; therefore, in our study, we collect only the synonyms and hyponyms of the seed words. On the other hand, hyponyms might also lose the original meaning of the word and collide with some of the other classes. Therefore, the depth down to which we collect hyponyms is set to a certain value Dhypo, which is a parameter to optimize.

We start with an initial set of seed words for each class (except the class “neutral”). The words selected are nouns, adjectives and verbs. We then collect their synonyms and hyponyms down to different depths. Fig. 1 shows the number of words for each depth, as well as the number/ratio of terms duplicated across classes. To obtain as many terms as possible while maintaining a low duplication ratio, and keeping in mind that the deeper we go, the more of the original meaning of the word we lose, we set Dhypo to 1. The collected words are associated with the corresponding classes and given the absolute value of the scores returned by SentiStrength (if a word has a score equal to 0 in SentiStrength, we give it a score of 1).
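The expansion step can be sketched in Python as a bounded graph walk. To keep the sketch self-contained, a tiny hand-written synonym/hyponym table (hypothetical entries) stands in for WordNet; with NLTK, the same information would come from `wordnet.synsets()`, `Synset.lemma_names()` and `Synset.hyponyms()`. The sketch assumes that following a synonym does not consume depth whereas each hyponym step does, and it also reports the terms shared by several classes, i.e., the kind of duplication counted in Fig. 1.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy lexical graph standing in for WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "happiness": ["felicity"], "happy": ["glad"],
    "anger": ["choler", "ire"], "hate": ["hatred"],
}
HYPONYMS: Dict[str, List[str]] = {
    "happiness": ["bliss", "cheerfulness"], "bliss": ["ecstasy"],
    "anger": ["fury", "indignation", "outrage"], "fury": ["rage"],
    "hatred": ["abhorrence", "outrage"],
}


def expand_seeds(seeds: List[str], d_hypo: int) -> Set[str]:
    """Collect seeds, their synonyms, and hyponyms down to depth d_hypo."""
    best_depth: Dict[str, int] = {}
    stack = [(seed, 0) for seed in seeds]
    while stack:
        term, depth = stack.pop()
        if best_depth.get(term, d_hypo + 1) <= depth:
            continue  # already reached at the same or a shallower depth
        best_depth[term] = depth
        stack.extend((syn, depth) for syn in SYNONYMS.get(term, []))
        if depth < d_hypo:
            stack.extend((hypo, depth + 1) for hypo in HYPONYMS.get(term, []))
    return set(best_depth)


def duplicated_terms(lexicons: Dict[str, Set[str]]) -> Set[str]:
    """Terms collected for more than one class."""
    counts = Counter(term for terms in lexicons.values() for term in terms)
    return {term for term, n in counts.items() if n > 1}


seeds = {"happiness": ["happiness", "happy"], "anger": ["anger"], "hate": ["hate"]}
lexicons = {cls: expand_seeds(words, d_hypo=1) for cls, words in seeds.items()}
print(sorted(lexicons["anger"]))  # ['anger', 'choler', 'fury', 'indignation', 'ire', 'outrage']
print(duplicated_terms(lexicons))  # {'outrage'}
```

Raising `d_hypo` to 2 would add “ecstasy” and “rage”, which illustrates how deeper hyponyms drift away from the seed meaning.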

We use the resulting sets of words to extract 6 features (one per class other than “neutral”) by counting the occurrences of the words in the tweet to classify, weighted by the scores of the words (all words are given positive scores regardless of their class).
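Once the per-class word lists are collected, the scoring and counting step might look as follows. The lists and scores are hypothetical; words absent from the score dictionary are treated like words with a score of 0 and therefore get a score of 1, which is an assumption. With the six non-neutral classes this yields the 6 features; the example uses three classes for brevity.

```python
from typing import Dict, List


def score_class_words(class_words: Dict[str, List[str]],
                      senti: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Give each collected word |SentiStrength score|, or 1 when that score is 0."""
    return {
        cls: {w: (abs(senti.get(w, 0)) or 1) for w in words}
        for cls, words in class_words.items()
    }


def unigram_features(tokens: List[str], scored: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """One feature per class: score-weighted count of that class's words in the tweet."""
    lowered = [t.lower() for t in tokens]
    return {cls: sum(words.get(t, 0) for t in lowered) for cls, words in scored.items()}


class_words = {"hate": ["hate", "outrage"], "anger": ["fury", "outrage"], "love": ["love", "adore"]}
senti = {"hate": -4, "fury": -4, "love": 3, "adore": 4}  # hypothetical scores
scored = score_class_words(class_words, senti)
print(unigram_features("I hate hate this outrage".split(), scored))
# {'hate': 9, 'anger': 1, 'love': 0}
```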

4) Pattern-based features: The idea of our pattern-related features is inspired by our previous work [19], in which we proposed an approach that relies on Part-of-Speech tags (PoS-tags) to extract sarcastic patterns. In our current work, we rely on the PoS-tags of words to extract similar patterns. However, instead of dividing words into two categories, we divide them into three: a first one, referred to as “EI”, containing words which might have emotional content; a second one, referred to as “CI”, containing non-emotional words whose content is important; and a third one, referred to as “GFI”, containing words whose grammatical function is important. If a word belongs to the first category, it is replaced by the corresponding expression shown in TABLE II along with its polarity (e.g., the word “good” would be replaced by POS-ADJECTIVE); if it belongs to the second, it is lemmatized and replaced by its lemma; and if it belongs to the third, it is replaced by the corresponding expression shown in TABLE II.

TABLE II: Expressions Used to Replace the Words of EI and GFI

PoS-tag                                    Expression
“CD”                                       [CARDINAL]
“FW”                                       [FOREIGNWORD]
“UH”                                       [INTERJECTION]
“LS”                                       [LISTMARKER]
“NN”, “NNS”, “NNP”, “NNPS”                 [NOUN]
“PRP”, “PRP$”                              [PRONOUN]
“MD”                                       [MODAL]
“RB”, “RBR”, “RBS”                         [ADVERB]
“VB”, “VBD”, “VBG”, “VBN”, “VBP”, “VBZ”    [VERB]
“WDT”, “WP”, “WP$”, “WRB”                  [WHDETERMINER]
“SYM”                                      [SYMBOL]

TABLE III: Part-of-Speech Tag Categories

Category   PoS Tags
CI         “CC”, “DT”, “EX”, “IN”, “MD”, “PDT”, “POS”, “RB”, “RBR”, “RBS”, “RP”, “TO”, “WDT”, “WP”, “WP$”, “WRB”
GFI        “CD”, “FW”, “LS”, “NNP”, “NNPS”, “PRP”, “PRP$”, “SYM”, “UH”
EI         “JJ”, “JJR”, “JJS”, “NN”, “NNS”, “VB”, “VBD”, “VBG”, “VBN”, “VBP”, “VBZ”

As mentioned above, the assignment of words to categories is based on their PoS-tags. The list of PoS-tags and their categories is given in TABLE III.

We generate the pattern vector of each tweet as defined above. For example, the following PoS-tagged tweet

“He/PRP is/VBP dummy/JJ ,/, why/WP would/VBD you/PRP think/VBP I/PRP want/VBP to/TO go/VB with/IN him/PRP !!!!/.”

gives, among others, the following pattern vector:

[PRONOUN VERB NEG-ADJECTIVE . why VERB PRONOUN VERB PRONOUN POS-VERB to VERB with PRONOUN .]
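The replacement rules can be sketched as a single pass over a PoS-tagged tweet. This is an illustration under assumptions: “JJ”, “JJR” and “JJS” are mapped to ADJECTIVE (TABLE II does not list them, but the example vector contains NEG-ADJECTIVE), punctuation tokens are collapsed to “.” as in the example, lowercasing stands in for a real lemmatizer, and the polarity dictionary is a hypothetical stand-in for SentiStrength. Tags that TABLE III does not classify are treated like CI words. With these assumptions, the sketch reproduces the example vector.

```python
from typing import Callable, Dict, List, Tuple

# EI tags (TABLE III) and their expressions; ADJECTIVE for JJ* is an assumption.
EI_EXPR = {"JJ": "ADJECTIVE", "JJR": "ADJECTIVE", "JJS": "ADJECTIVE",
           "NN": "NOUN", "NNS": "NOUN", "VB": "VERB", "VBD": "VERB",
           "VBG": "VERB", "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB"}
# GFI tags (TABLE III) and their expressions (TABLE II).
GFI_EXPR = {"CD": "CARDINAL", "FW": "FOREIGNWORD", "LS": "LISTMARKER",
            "NNP": "NOUN", "NNPS": "NOUN", "PRP": "PRONOUN", "PRP$": "PRONOUN",
            "SYM": "SYMBOL", "UH": "INTERJECTION"}
PUNCT_TAGS = {".", ",", ":"}  # assumption: punctuation collapses to "."


def pattern_vector(tagged: List[Tuple[str, str]], polarity: Dict[str, int],
                   lemmatize: Callable[[str], str] = str.lower) -> List[str]:
    out = []
    for word, tag in tagged:
        if tag in EI_EXPR:  # emotional category: expression plus polarity prefix
            score = polarity.get(word.lower(), 0)
            prefix = "POS-" if score > 0 else "NEG-" if score < 0 else ""
            out.append(prefix + EI_EXPR[tag])
        elif tag in GFI_EXPR:  # grammatical-function category: expression only
            out.append(GFI_EXPR[tag])
        elif tag in PUNCT_TAGS:
            out.append(".")
        else:  # CI (and any unclassified tag): replace by the lemma
            out.append(lemmatize(word))
    return out


tweet = [("He", "PRP"), ("is", "VBP"), ("dummy", "JJ"), (",", ","), ("why", "WP"),
         ("would", "VBD"), ("you", "PRP"), ("think", "VBP"), ("I", "PRP"),
         ("want", "VBP"), ("to", "TO"), ("go", "VB"), ("with", "IN"),
         ("him", "PRP"), ("!!!!", ".")]
print(" ".join(pattern_vector(tweet, {"dummy": -2, "want": 1})))
# PRONOUN VERB NEG-ADJECTIVE . why VERB PRONOUN VERB PRONOUN POS-VERB to VERB with PRONOUN .
```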

We define a pattern as an ordered sequence of words. Patterns are extracted from the training set such that their lengths satisfy:

$$L_{\min} \le \mathrm{Length}(pattern) \le L_{\max} \qquad (2)$$

where Lmin and Lmax are, respectively, the minimum and maximum allowed lengths of patterns in words, and Length(pattern) is the length of the pattern in words. The number of pattern lengths is referred to as NL and is equal to (Lmax − Lmin + 1). Only patterns that appear at least Nocc times in our training set for the same class are kept; the others are discarded. We then divide the resulting patterns into NF sets, where

$$N_F = N_L \times N_C \qquad (3)$$

where NL is the number of pattern lengths and NC is the number of classes (7 in our case).

TABLE IV: Pattern Features

Sentiment class \ Pattern length   L1    L2    ···   LN
1                                  F11   F12   ···   F1N
⋮                                  ⋮     ⋮     ⋱     ⋮
7                                  F71   F72   ···   F7N
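The extraction and grouping of patterns might be sketched as follows. The sketch assumes that a pattern is a contiguous sub-sequence of a pattern vector and that every occurrence counts toward Nocc; the paper does not state either choice explicitly. The result has at most NL × NC groups, one per (class, length) pair.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def extract_patterns(vectors_by_class: Dict[str, List[List[str]]], l_min: int = 3,
                     l_max: int = 10, n_occ: int = 2) -> Dict[Tuple[str, int], List[Pattern]]:
    """Group frequent patterns by (class, length): at most N_L x N_C groups."""
    groups: Dict[Tuple[str, int], List[Pattern]] = {}
    for cls, vectors in vectors_by_class.items():
        counts: Counter = Counter()
        for vec in vectors:
            for n in range(l_min, l_max + 1):  # length constraint of Equation (2)
                for i in range(len(vec) - n + 1):
                    counts[tuple(vec[i:i + n])] += 1
        for pattern, count in counts.items():
            if count >= n_occ:  # keep patterns seen at least N_occ times in this class
                groups.setdefault((cls, len(pattern)), []).append(pattern)
    return groups


training = {  # hypothetical pattern vectors
    "hate": [["PRONOUN", "NEG-VERB", "PRONOUN", "."], ["i", "NEG-VERB", "PRONOUN", "."]],
    "love": [["PRONOUN", "POS-VERB", "PRONOUN"]],
}
print(extract_patterns(training, l_min=3, l_max=4))
# {('hate', 3): [('NEG-VERB', 'PRONOUN', '.')]}
```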

We create NF features, as shown in TABLE IV. Each feature Fij of the table represents the degree of resemblance of the tweet to the patterns of sentiment class i and length j. Therefore, given a tweet t, we calculate the resemblance degree res(p, t) of each pattern p of the training set to the tweet t [19]:

$$res(p, t) = \begin{cases} 1, & \text{if the tweet vector contains the pattern as it is, in the same order},\\ \alpha \cdot n/N, & \text{if } n \text{ words out of the } N \text{ words of the pattern appear in the tweet in the correct order},\\ 0, & \text{if no word of the pattern appears in the tweet}. \end{cases}$$

Given the K patterns p1, …, pK of class i and length j that have the highest resemblance to the tweet t, the value of the feature Fij is

$$F_{ij} = \beta_j \sum_{k=1}^{K} res(p_k, t) \qquad (4)$$

where βj is a weight given to patterns of length Lj (regardless of their class). We give a different weight to each pattern length, since longer patterns are more likely to have a higher impact. Fij, as defined, measures the degree of resemblance of a tweet t to the patterns of class i and length j.
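Equation (4) and the resemblance function can be sketched as below. The sketch reads “as it is, in the same order” as a contiguous match, and takes n as the length of the longest common subsequence between the pattern and the tweet vector (words in the correct order, gaps allowed); both are interpretations, not details confirmed by the paper. The length weight uses βn = (n − 1)/(n + 1), with α = 0.03 and K = 5 as defaults, following the parameter values reported from [19] and the choice of K in this work.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (same order, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def resemblance(pattern: Sequence[str], tweet: Sequence[str], alpha: float = 0.03) -> float:
    p, t = list(pattern), list(tweet)
    if any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1)):
        return 1.0  # the whole pattern appears contiguously, in order
    return alpha * lcs_length(p, t) / len(p)  # alpha * n / N; 0 when no word matches


def beta(length: int) -> float:
    return (length - 1) / (length + 1)


def feature_f_ij(tweet: Sequence[str], patterns_ij: Sequence[Sequence[str]], length: int,
                 k: int = 5, alpha: float = 0.03) -> float:
    """Equation (4): beta_j times the sum of the K highest resemblance values."""
    top = sorted((resemblance(p, tweet, alpha) for p in patterns_ij), reverse=True)[:k]
    return beta(length) * sum(top)


tweet = ["PRONOUN", "NEG-VERB", "PRONOUN", "."]
patterns = [("NEG-VERB", "PRONOUN", "."), ("PRONOUN", "POS-VERB", "PRONOUN"),
            ("why", "VERB", "PRONOUN")]
print(round(feature_f_ij(tweet, patterns, length=3, k=2), 4))  # 0.51
```

Here the first pattern matches exactly (1), the second shares two ordered words (0.03 · 2/3 = 0.02), and β3 = 0.5, so F = 0.5 · (1 + 0.02).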

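In our previous work [19], we demonstrated that the optimal values for Nocc, Lmin, Lmax, α and the length weights β are as follows:

$$\begin{cases} N_{occ} = 2,\\ L_{\min} = 3,\\ L_{\max} = 10,\\ \alpha = 0.03,\\ \beta_n = (n - 1)/(n + 1), \quad \forall n \in \{3, \dots, 10\}, \end{cases}$$

where n denotes the pattern length.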

On the other hand, the parameter K was introduced in this work because we noticed a high imbalance between the numbers of patterns extracted for the different classes. Fig. 2 shows the classification accuracy using pattern-based features for different values of K. According to the figure, the optimal value is 5: higher values improve the accuracy during cross-validation but have no significant impact on the accuracy on the test set.

Fig. 2: Accuracy of Classification Using Pattern-Based Features for Different Values of K

TABLE V: Binary Classification Accuracy, Precision, Recall and F-Measure

Class      Accuracy   Precision   Recall   F-Measure
Positive   90.5%      82.4%       90.5%    86.3%
Negative   85.2%      92.2%       85.2%    88.6%
Overall    87.5%      87.9%       87.5%    87.6%

In the next section, we evaluate the model we built and present the results of our experiments for binary, ternary and multi-class classification.

IV. EXPERIMENTAL RESULTS

After extracting the features, we run different tests using the Random Forest [20] classifier. We use 4 Key Performance Indicators (KPIs) to evaluate the effectiveness of our approach: Accuracy, Precision, Recall and F-measure, where the F-measure is defined as follows:

$$F\text{-measure} = \frac{2 \cdot precision \cdot recall}{precision + recall} \qquad (5)$$
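Equation (5) is the harmonic mean of precision and recall. As a small check, the sketch below computes it both from rates and from raw counts, using the equivalent form 2TP / (2TP + FP + FN); with the counts of the positive class in the binary confusion matrix of TABLE VI (402 true positives, 86 false positives, 42 false negatives), both give about 86.3%, the F-measure reported in TABLE V. The function names are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Equation (5): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def f_measure_from_counts(tp: int, fp: int, fn: int) -> float:
    """Same quantity from raw counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


print(round(f_measure(0.824, 0.905), 3), round(f_measure_from_counts(402, 86, 42), 3))  # 0.863 0.863
```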

A. Binary Classification

We first run our experiment to detect the sentiment polarity of tweets. To this end, we remove the tweets belonging to the classes “neutral” and “sarcasm”, and group the other classes into two main classes, “positive” and “negative”. The former contains tweets from the classes “love” and “happiness”, while the latter contains tweets from the classes “hate”, “anger” and “sadness”. TABLE V shows the results of the classification. The accuracy obtained reaches 87.51%. Notably, the recall is highest for positive tweets (90.5%), whereas the precision is highest for negative tweets (92.2%). This means that tweets classified as negative are mostly negative, whereas tweets with positive polarity tend to be classified correctly more often, as shown in the confusion matrix presented in TABLE VI.
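The relabeling used for the binary and ternary experiments can be sketched as a simple mapping from the 7 classes to polarities. Applied to the test-set sizes of TABLE I, the binary mapping gives 444 positive and 581 negative tweets, which match the row totals of TABLE VI. The names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

POLARITY = {"happiness": "positive", "love": "positive", "sadness": "negative",
            "anger": "negative", "hate": "negative", "neutral": "neutral"}


def collapse_label(label: str, ternary: bool = False) -> Optional[str]:
    """Map a 7-class label to a polarity; None means the tweet is left out."""
    polarity = POLARITY.get(label)  # "sarcasm" is never kept
    if polarity == "neutral" and not ternary:
        return None  # neutral tweets are only used in the ternary setting
    return polarity


# Test-set sizes from TABLE I.
TEST_SET = {"happiness": 225, "love": 219, "sadness": 223, "anger": 201,
            "hate": 157, "sarcasm": 199, "neutral": 176}


def collapsed_counts(counts: Dict[str, int], ternary: bool = False) -> Dict[str, int]:
    totals: Counter = Counter()
    for label, n in counts.items():
        target = collapse_label(label, ternary)
        if target is not None:
            totals[target] += n
    return dict(totals)


print(collapsed_counts(TEST_SET))  # {'positive': 444, 'negative': 581}
```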

TABLE VI: Binary Classification Confusion Matrix

Class \ Classified as   Positive   Negative
Positive                402        42
Negative                86         495

B. Ternary Classification

Despite its importance, binary classification assumes that the given data are already known to be emotional. However, Twitter contains many tweets which have no emotional polarity, such as news tweets. Therefore, in this subsection, we add the neutral tweets described in our dataset (TABLE I) and rely on the same set of features to classify the tweets. The results obtained are given in TABLE VII, and the confusion matrix of the classification is given in TABLE VIII.

The results show that introducing a third class noticeably decreases the accuracy, to 83.0%. The new class (i.e., “neutral”) presents a low accuracy (58.2%) but a very high precision (90.4%). This may be explained by the fact that the amount of training data (i.e., number of tweets) for this class is lower than that for the other classes: 3 000 neutral tweets against 6 000 positive and 9 000 negative ones. Therefore, a tweet that clearly meets the conditions of the class “neutral” is easily detected by the classifier as “neutral”; however, many neutral tweets do not meet these conditions and are therefore misclassified. Overall, the results obtained are promising.

C. Multi-class classification

In this subsection, we use the 7 sentiment classes described in Section III. The classification results are given in TABLE IX, and the confusion matrix is given in TABLE X.

Despite the number of classes, the accuracy obtained is 56.9%, with a precision of 65.2%. More interestingly, some sentiments seem to be easier to detect than others. In particular, tweets belonging to the class “happiness” were classified with an accuracy of 83.1%, which shows that they are easily distinguished from the other classes. This might be due to the fact that, unlike negative tweets, positive tweets belong to only two classes, which are easy to distinguish from each other, whereas negative tweets are spread over three classes that are close to each other. A typical example is the following tweet: “Damn it.. I really hate when this happens. This crap doesn’t want to work!!!”. In this tweet, the user expresses both anger and hate. However, since he explicitly uses the word “hate”, the tweet would be classified as belonging to the class “hate”, although it shows anger more than hate.

TABLE VII: Ternary Classification Accuracy, Precision, Recall and F-Measure

Class      Accuracy   Precision   Recall   F-Measure
Positive   90.3%      77.6%       90.3%    83.5%
Negative   85.0%      86.5%       85.0%    85.8%
Neutral    58.2%      90.4%       58.2%    70.8%
Overall    83.0%      83.8%       83.0%    82.7%

TABLE VIII: Ternary Classification Confusion Matrix

Class \ Classified as   Positive   Negative   Neutral
Positive                401        38         5
Negative                81         494        6
Neutral                 35         39         103


TABLE IX: Multi-Class Classification Accuracy, Precision, Recall and F-Measure

Class       Accuracy   Precision   Recall   F-Measure
Happiness   83.1%      82.4%       83.1%    82.7%
Love        43.4%      59.7%       43.4%    50.3%
Neutral     62.7%      69.4%       62.7%    65.9%
Sadness     48.9%      71.7%       48.9%    58.1%
Anger       42.3%      77.3%       42.3%    54.7%
Hate        59.2%      68.4%       59.2%    63.5%
Sarcasm     59.1%      25.7%       59.1%    35.8%
Overall     56.9%      65.2%       56.9%    58.8%

TABLE X: Multi-Class Classification Confusion Matrix

Class \ Classified as   H     L    N     Sd    A    Ht   Sr
Happiness (H)           187   15   1     1     0    0    21
Love (L)                14    95   10    10    3    3    84
Neutral (N)             4     1    111   8     2    3    48
Sadness (Sd)            0     4    6     109   6    15   83
Anger (A)               6     5    8     16    85   14   67
Hate (Ht)               2     8    8     6     4    93   36
Sarcasm (Sr)            14    31   16    2     10   8    117
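The “Overall” rows of the result tables appear to be averages weighted by the number of tweets of each class (the support). The sketch below, an illustration rather than the authors' evaluation code, recomputes the per-class precision and the Overall row of TABLE IX from the confusion matrix in TABLE X, taking each class's support as its row total. It also makes the effect of the class “sarcasm” visible: its precision is only about 25.7% because many tweets from other classes are assigned to it.

```python
from typing import Dict, List

CLASSES = ["happiness", "love", "neutral", "sadness", "anger", "hate", "sarcasm"]
# TABLE X: rows are actual classes, columns are predicted classes (same order).
CONFUSION = [
    [187, 15, 1, 1, 0, 0, 21],
    [14, 95, 10, 10, 3, 3, 84],
    [4, 1, 111, 8, 2, 3, 48],
    [0, 4, 6, 109, 6, 15, 83],
    [6, 5, 8, 16, 85, 14, 67],
    [2, 8, 8, 6, 4, 93, 36],
    [14, 31, 16, 2, 10, 8, 117],
]


def summarize(cm: List[List[int]], labels: List[str]) -> Dict[str, object]:
    n = len(cm)
    total = sum(map(sum, cm))
    support = [sum(row) for row in cm]                               # actual tweets per class
    predicted = [sum(cm[r][c] for r in range(n)) for c in range(n)]  # tweets assigned per class
    precision = [cm[i][i] / predicted[i] if predicted[i] else 0.0 for i in range(n)]
    recall = [cm[i][i] / support[i] if support[i] else 0.0 for i in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    def weighted(values: List[float]) -> float:
        # Average weighted by each class's support (row total).
        return sum(v * s for v, s in zip(values, support)) / total

    return {
        "accuracy": sum(cm[i][i] for i in range(n)) / total,
        "precision": weighted(precision),
        "recall": weighted(recall),
        "f_measure": weighted(f1),
        "per_class_precision": dict(zip(labels, precision)),
    }


summary = summarize(CONFUSION, CLASSES)
print({k: round(v, 3) for k, v in summary.items() if k != "per_class_precision"})
# {'accuracy': 0.569, 'precision': 0.652, 'recall': 0.569, 'f_measure': 0.588}
```

Note that a support-weighted recall is always equal to the accuracy, which is why the Accuracy and Recall columns of the Overall rows coincide.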

On the other hand, the presence of the class “sarcasm” was a main reason for the low classification accuracy. Although sarcastic tweets themselves are detected with a relatively high accuracy (59.1%), many tweets from other classes are misclassified as sarcastic, which gives this class a precision of only 25.7%. This calls for detecting sarcastic statements in a first stage and discarding them before classification. The approach presented in [19] achieves good accuracy in classifying tweets as sarcastic or non-sarcastic.

Globally, however, we can confirm that classifying tweets into separate sentiment classes is a challenging task: as mentioned above, many tweets express more than one sentiment. Therefore, a more interesting task would be to quantify the sentiments present in a tweet, attributing to each tweet several sentiments with different scores.

V. CONCLUSION

In this paper, we have proposed a new approach for sentiment analysis in which a set of tweets is classified into 7 different classes. The results show some potential: the accuracy obtained for multi-class sentiment analysis on the dataset used was 56.9%. However, we believe that a more optimized training set would yield better performance.

Throughout this work, we demonstrated that multi-class sentiment analysis can achieve a high accuracy level for some classes (e.g., 83.1% for “happiness”), but it remains a challenging task overall. A more interesting task is to quantify the sentiments present in a tweet. Therefore, in future work, we will use the ternary classification (which achieved an accuracy of 83.0%) to classify tweets into “positive”, “negative” and “neutral”. The sentimental tweets (i.e., those classified as “positive” or “negative”) will then be given scores for the corresponding sentiment subclasses. Finally, we will use the approach of [19] at an earlier stage to set sarcastic tweets aside.

ACKNOWLEDGMENT

The research results have been achieved by “Cognitive Security: A New Approach to Securing Future Large Scale and Distributed Mobile Applications,” the Commissioned Research of the National Institute of Information and Communications Technology (NICT), JAPAN.

REFERENCES

[1] B. O’Connor, R. Balasubramanyan, B. Routledge, and N. Smith, “From tweets to polls: Linking text sentiment to public opinion time series,” in Proc. Int. AAAI Conf. Weblogs and Social Media, pp. 26–33, May 2010.

[2] M. A. Cabanlit and K. J. Espinosa, “Optimizing N-gram based text feature selection in sentiment analysis for commercial products in Twitter through polarity lexicons,” in Proc. 5th Int. Conf. Inform., Intelligence, Syst. and Applicat., pp. 94–97, July 2014.

[3] U. R. Hodeghatta, “Sentiment analysis of Hollywood movies on Twitter,” in Proc. IEEE/ACM ASONAM, pp. 1401–1404, Aug. 2013.

[4] J. M. Soler, F. Cuartero, and M. Roblizo, “Twitter as a tool for predicting elections results,” in Proc. IEEE/ACM ASONAM, pp. 1194–1200, Aug. 2012.

[5] K. Ghag and K. Shah, “Comparative analysis of the techniques for sentiment analysis,” in Proc. Int. Conf. Advances in Technology and Eng., pp. 1–7, Jan. 2013.

[6] C. G. Akcora, M. A. Bayir, M. Demirbas, and H. Ferhatosmanoglu, “Identifying breakpoints in public opinion,” in Proc. First Workshop on Social Media Analytics, pp. 62–66, July 2010.

[7] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, “Short text classification in twitter to improve information filtering,” in Proc. 33rd Int. ACM SIGIR Conf. Research and development in information retrieval, pp. 841–842, July 2010.

[8] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment classification using machine learning techniques,” in Proc. ACL-02 Conf. Empirical Methods in Natural Language Process., vol. 10, pp. 79–86, July 2002.

[9] M. Boia, B. Faltings, C.-C. Musat and P. Pu, “A :) is worth a thousand words: How people attach sentiment to emoticons and words in tweets,” in Proc. Int. Conf. Social Computing, pp. 345–350, Sept. 2013.

[10] K. Manuel, K. V. Indukuri, and P. R. Krishna, “Analyzing internet slang for sentiment mining,” in Proc. 2nd Vaagdevi Int. Conf. Inform. Technology for Real World Problems, pp. 9–11, Dec. 2010.

[11] W. Gao and F. Sebastiani, “Tweet Sentiment: From Classification to Quantification,” in Proc. IEEE/ACM ASONAM, pp. 97–104, Aug. 2015.

[12] Y.H.P.P. Priyadarshana, K.I.H. Gunathunga, K.K.A. Nipuni N.Perera, L. Ranathunga, P.M. Karunaratne, and T.M. Thanthriwatta, “Sentiment analysis: Measuring sentiment strength of call centre conversations,” in Proc. IEEE ICECCT, pp.1–9, March 2015.

[13] R. Srivastava and M. P. S. Bhatia, “Quantifying modified opinion strength: A fuzzy inference system for sentiment analysis,” in Proc. Int. Conf. Advances in Computing, Communications and Informatics, pp. 1512–1519, Aug. 2013.

[14] K. H. Lin, C. Yang, and H. Chen, “What emotions do news articles trigger in their readers?,” in Proc. ACM SIGIR ’07, pp. 733–734, July 2007.

[15] K. H. Lin, C. Yang, and H. Chen, “Emotion classification of online news articles from the reader’s perspective,” in Proc. IEEE/WIC/ACM WI-IAT ’08, vol. 1, pp. 220–226, Dec. 2008.

[16] L. Ye, R. Xu and J. Xu, “Emotion prediction of news articles from reader’s perspective based on multi-label classification,” in Proc. Int. Conf. Machine Learning and Cybernetics, vol. 5, pp. 2019–2024, July 2012.

[17] W. Liang, H. Wang, Y. Chu and C. Wu, “Emoticon recommendation in microblog using affective trajectory model,” in Proc. Annual Summit and Conf. Asia-Pacific Signal and Information Processing Association (APSIPA), pp. 1–5, Dec. 2014.

[18] C. Fellbaum, WordNet: an Electronic Lexical Database, Cambridge, Massachusetts, 1998.

[19] M. Bouazizi and T. Ohtsuki, “Sarcasm detection in Twitter,” to be published in IEEE Globecom, Dec. 2015.

[20] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, Jan. 2001.
