
Received July 1, 2017, accepted August 7, 2017, date of publication August 18, 2017, date of current version October 25, 2017.

Digital Object Identifier 10.1109/ACCESS.2017.2740982

A Pattern-Based Approach for Multi-Class Sentiment Analysis in Twitter

MONDHER BOUAZIZI AND TOMOAKI OHTSUKI, (Senior Member, IEEE)
Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan

Corresponding author: Mondher Bouazizi (bouazizi@ohtsuki.ics.keio.ac.jp)

ABSTRACT Sentiment analysis and opinion mining in social networks are nowadays a hot topic of research. However, most state-of-the-art work on the automatic sentiment analysis and opinion mining of texts collected from social networks and microblogging websites is oriented toward binary classification (i.e., classification into ‘‘positive’’ and ‘‘negative’’) or ternary classification (i.e., classification into ‘‘positive,’’ ‘‘negative,’’ and ‘‘neutral’’). In this paper, we propose a novel approach that, in addition to the aforementioned binary and ternary classification tasks, goes deeper and classifies texts collected from Twitter into multiple sentiment classes. While in this paper we limit our scope to seven sentiment classes, the proposed approach is scalable and can be run to classify texts into more classes. We first introduce SENTA, a tool we built to help users select, out of a wide variety of features, the ones that best fit their application and run the classification through an easy-to-use graphical user interface. We then use SENTA to run our own multi-class classification experiments. Our experiments show that the proposed approach reaches up to 60.2% accuracy on multi-class classification. Nevertheless, the approach proves to be very accurate in binary and ternary classification: in the former case, we reach an accuracy of 81.3% on the same data set after removing neutral tweets, and in the latter case, we reach an accuracy of 70.1%.

INDEX TERMS Twitter, sentiment analysis, machine learning.

I. INTRODUCTION

Twitter, as well as other Online Social Networks (OSNs) and microblogging websites, has literally become the biggest web destination for people to communicate with each other, express their thoughts about products [1], [2] or movies [3], share their daily experiences, and communicate their opinions about real-time and upcoming events, such as sports or political elections [4].

While new platforms such as Snapchat¹ focus on video- and multimedia-based communication, Twitter kept some properties that make it a very interesting subject of data mining:

• In its basic form, Twitter is a microblogging service that allows users to post brief text updates, with the distinctive property of not allowing more than 140 characters in one text message. This limitation turned out to be a very attractive property, since it allows posting quick, even real-time, updates regarding one’s activities and facilitates sharing, forwarding and quickly replying to status messages [5]. This allows news or information to spread quickly, regardless of whether it has a positive or a negative impact, and whether it is correct or false.

¹ https://www.snapchat.com

• The openness and ease of access to posts from different sources attract people more than ever: while in most cases access to someone’s updates requires mutual friendship on OSNs such as Facebook, Twitter allows any user to follow another without the reverse being required.

• The wide use of hashtags makes it easy for people to search for tweets dealing with a specific subject. Hashtags are labels ‘‘used on social network and microblogging services which makes it easier for users to find messages with a specific theme or content’’.²

Hashtags also allow users to categorize their own tweets so that other users know what the tweet is dealing with.

Thanks to these properties, this ecosystem presents a very rich source of data to mine. However, due to the limitation in terms of characters (i.e., 140 characters per tweet), mining such data yields lower performance than mining longer texts. In addition, classification into multiple classes remains a challenging task: binary classification of a text usually relies on the sentiment polarity of its components (i.e., whether they are positive or negative), whereas, when the positive and negative classes are divided into subclasses, the accuracy tends to decrease remarkably.

² https://en.wikipedia.org/wiki/Hashtag

VOLUME 5, 2017 2169-3536 2017 IEEE. Translations and content mining are permitted for academic research only.

Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

20617

M. Bouazizi, T. Ohtsuki: Pattern-Based Approach for Multi-Class Sentiment Analysis in Twitter

In this paper, we propose an approach that relies on writing patterns and special unigrams to classify tweets into 7 different classes, and we demonstrate that the proposed approach performs well in terms of classification accuracy and precision. The main contributions of this paper are as follows:

1) We introduce SENTA, a user-friendly tool that allows the extraction of a wide set of features from texts that cover both the content and the form,

2) We introduce, in addition to some conventional features, writing pattern-related features to help enhance the accuracy of classification,

3) We use SENTA to extract an optimal set of features to classify tweets into 7 different sentiment classes.

The remainder of this paper is structured as follows: in Section II, we present our motivations for this work, and in Section III, we describe some of the related work. In Section IV, we present SENTA, our tool to extract different features from tweets, which we later use to perform the multi-class classification. In Section V, we describe the proposed method in detail. In Section VI, we detail our experiments and the results obtained. Section VII concludes this paper and proposes possible directions for future work.

II. MOTIVATIONS

A. WHY MULTI-CLASS SENTIMENT ANALYSIS?

Social networks and microblogging websites such as Twitter have been the subject of many studies in the last few years, and automatic sentiment analysis and opinion mining are a hot topic of study. Social networks present a huge source of data representing the opinions of a significant, yet totally random, proportion of users and customers who are using a product or a service. However, due to the informal language used, the presence of non-textual content and the use of slang words and abbreviations, classifying data extracted from such microblogging websites is a rather challenging task. Ghag and Shah [6] define, among others, ‘‘Hidden Sentiment Identification’’ (the identification of the real feeling rather than the sentiment polarity), ‘‘Handling Polysemy’’ (the existence of multiple meanings, possibly with different sentiment polarities, for the same word) and ‘‘Mapping Slangs’’ (the identification of the meaning and the polarity of slang words) as the most challenging tasks facing the sentiment analysis of short microblog texts.

In a related context, the state-of-the-art approaches mostly focus on binary and ternary sentiment classification. In other words, they classify texts either into ‘‘positive’’ and ‘‘negative’’, or into ‘‘positive’’, ‘‘negative’’ and ‘‘neutral’’. However, to study the opinion of a user, it would be more interesting to go deeper in the classification and detect the sentiment hidden behind his post. The following two tweets are both negative, yet they reflect two completely different aspects:

• ‘‘Damn damn.. no iPhone support for windows XP x64. There are some workarounds, but I can’t figure this out.’’

• ‘‘Nooooooooooo! My iPhone glass cracked :(’’

In the first example, the user is expressing his fury at the absence of support for his phone on an operating system. In the second, however, the user is expressing sadness because of physical damage to his phone. The first example carries important information regarding the satisfaction of the user and might therefore be more important to study. In general, however, both kinds of information can be used, yet they have to be distinguished from each other.

B. THE NEED FOR AN OPEN-SOURCE TOOL FOR FEATURE EXTRACTION FROM TWEETS

Nowadays, a variety of tools such as LIWC [7] offer the option to extract advanced features from texts in different languages; however, most of these tools are paid and require some programming knowledge to use.

In addition, to the best of our knowledge, none of these tools offers the possibility to flexibly extract writing patterns, which can be used to enhance the performance of classification tasks such as sarcasm detection or, as in the current work, multi-class sentiment analysis.

Hence the need for a more flexible, yet easy-to-use and user-friendly tool that allows the extraction of multiple types of features, while offering the possibility to customize them depending on the use case, in order to obtain the highest possible performance.

In this work, we present SENTA, an open-source tool that performs feature extraction and saves the features either in an Excel spreadsheet or in a file that can be read by Weka [8] to perform the classification.

This tool, as described, is meant to be publicly open to contributions, and will hopefully serve as a starting point for an efficient open-source tool that performs text classification for any purpose.

III. RELATED WORK

Twitter data mining has been a hot topic of research in the last few years. The nature of the mined data varies widely depending on the aim and the expected final result. Consequently, the techniques used to process the data and extract the needed information differ.

Akcora et al. [9] proposed a method to determine changes in public opinion over time and identify the news that led to breakpoints in public opinion. In a related context, Sriram et al. [10] proposed a method to classify tweets, depending on their nature, into a set of classes including private messages, opinions and events.

However, most of the work has focused on the content of the tweets and on how to extract the opinions of users towards specific topics or objects. The work of Pang et al. [11] was the pioneering work on the use of machine learning to classify texts based on their sentiment polarity. In their work, the authors used unigrams, bigrams and adjectives in different ways to classify a set of movie reviews as positive or negative. Other works iterated on this idea, and new types of features have been used for the classification, depending on the aim and application: Boia et al. [12] and Manuel et al. [13] proposed two approaches that rely, respectively, on emoticons to detect the polarity of tweets and on slang words to assign a sentiment score to online texts. These two works showed how non-textual components can be used to detect the polarity of a text.

More recent works went deeper, and new models have been built: Gao and Sebastiani [14] proposed an approach that focuses on the distribution, or frequency, of sentiment classes in the analyzed set. Moving from classification to quantification, the authors concluded that a quantification-specific algorithm provides a better frequency estimation than regular classification-oriented algorithms.

Few works have been conducted on multi-class sentiment analysis. Most of them focused on assessing sentiment strength on different levels (e.g., ‘‘very negative’’, ‘‘negative’’, ‘‘neutral’’, ‘‘positive’’ and ‘‘very positive’’) or simply on giving numeric sentiment scores to texts [15], [16]. Nevertheless, other works were conducted to classify texts into different sentiment classes: Lin et al. [17], [18] proposed an approach that classifies documents into reader-emotion categories. They relied on what they qualify as similarity features and word emotion features, along with other basic features. Although the approach shows some potential, it is oriented towards the reader rather than the writer. Therefore, the proposed sentiment classes differ from what a writer might intend to show. Similarly, Ye et al. [19] studied the problem of detecting the emotions of news articles from the reader’s perspective, tried various multi-label classification methods and different feature selection strategies, and concluded which ones should be adopted to solve the problem. Liang et al. [20] proposed an emoticon recommendation system that recommends emoticons for posted texts to help the author decide which emoticon to insert to show what he intends.

IV. SENTA - A USER-FRIENDLY TOOL FOR FEATURE EXTRACTION FROM TEXTS

SENTA is a user-friendly tool we developed to extract different features from tweets, and from texts in general, in order to classify them into different classes in a later step. The extracted features vary widely and cover the content as well as the form of the text.

SENTA has several graphical interfaces that allow the user to easily input his data, choose the features he wants to extract, and save the output in different formats. In this work, we have used SENTA to extract the necessary features that we used to perform the task of multi-class sentiment analysis in Twitter.

FIGURE 1. Pre-processing of tweets.


A. TOOLS

SENTA was built using Java and Java FXML. While many libraries were used to build this program, OpenNLP was used for most of the tasks: it performs basic NLP tasks such as tokenization, Part-of-Speech (PoS) tagging and lemmatization of the texts (i.e., tweets in our case).

B. CONVENTION

For the rest of this section, the user of SENTA will be referred to as ‘‘the user’’, while the Twitter user whose tweet is processed will be referred to as ‘‘the twitterer’’.

In addition, by interface, we mean a graphical user interface of SENTA.

C. PRE-PROCESSING OF TWEETS

During this work, we pre-process each tweet as shown in Fig. 1: we start by removing the URLs, the tags at the beginning of the tweet and other irrelevant content. We then use OpenNLP to tokenize the tweet, get the Part-of-Speech (PoS) tags of the obtained tokens, and refer to both (tokens + PoS tags) to get the lemmas of all the words. We then generate what we call the negation vector of the tweet: a vector having the same length as the token sequence. If the tweet contains a negation word (e.g., ‘‘not’’, ‘‘never’’, etc.), all the tokens (words) that come after it, until the next punctuation mark, are considered negated and are assigned a value of 1 in the negation vector. This later helps detect which words are positive and which are negative. Obviously, many works such as [21] present better solutions for handling negation and polarity shifting in sentiment analysis; however, we opted for this more straightforward, simpler and faster approach.
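The pre-processing and negation-marking steps above can be sketched as follows. This is a minimal Python sketch, not SENTA’s Java implementation: a simple regular-expression tokenizer stands in for OpenNLP, PoS tagging and lemmatization are omitted, and the negation word list and the set of scope-ending punctuation marks are assumptions (the text only gives ‘‘not’’ and ‘‘never’’ as examples).

```python
import re

# Assumed negation words; the paper only names "not" and "never" as examples.
NEGATION_WORDS = {"not", "never", "no", "nobody", "nothing", "nowhere", "cannot"}
# Assumed punctuation marks that close a negation scope.
SCOPE_END = {".", ",", ";", ":", "!", "?"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
LEADING_TAGS_RE = re.compile(r"^(?:@\w+\s+)+")
# Stand-in for OpenNLP: contractions such as "can't" kept whole, other words, single symbols.
TOKEN_RE = re.compile(r"\w+n't|\w+|[^\w\s]")


def preprocess(tweet: str) -> str:
    """Remove URLs and the @-tags at the beginning of the tweet."""
    without_urls = URL_RE.sub("", tweet).strip()
    return LEADING_TAGS_RE.sub("", without_urls).strip()


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)


def is_negation(token: str) -> bool:
    lowered = token.lower()
    return lowered in NEGATION_WORDS or lowered.endswith("n't")


def negation_vector(tokens: list[str]) -> list[int]:
    """1 for tokens between a negation word and the next punctuation mark, else 0."""
    vector = [0] * len(tokens)
    negated = False
    for i, token in enumerate(tokens):
        if token in SCOPE_END:
            negated = False          # punctuation closes the negation scope
        elif negated:
            vector[i] = 1
        elif is_negation(token):
            negated = True           # the negation word itself stays 0
    return vector


tokens = tokenize(preprocess("@bob I do not like this phone, but the camera is great! http://t.co/x"))
print(tokens)
print(negation_vector(tokens))
```

Here ‘‘like’’, ‘‘this’’ and ‘‘phone’’ are marked as negated, while the comma resets the scope so that ‘‘great’’ keeps its positive polarity.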

We also made an internal tool that decomposes hashtags into words using a dictionary of word occurrence probabilities, as we will describe later on in this work. This decomposition is also used to detect any sentiment hidden in the hashtags. On a small set of hashtags (i.e., 100 different hashtags), our tool reached a decomposition accuracy of 88%.
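The decomposition algorithm is not detailed here, so the following is only one plausible sketch: a dynamic-programming segmentation that picks the split maximizing the sum of log word probabilities. The probability dictionary, the per-character penalty for unknown chunks and the maximum word length are hypothetical placeholders for SENTA’s corpus-derived dictionary.

```python
import math

# Hypothetical word-occurrence probabilities (a real dictionary would come from a large corpus).
WORD_PROB = {
    "i": 0.02, "love": 0.005, "my": 0.01, "new": 0.004, "phone": 0.001,
    "no": 0.006, "iphone": 0.0005, "support": 0.0008, "epic": 0.0002, "fail": 0.0004,
}
UNKNOWN_LOGP_PER_CHAR = math.log(1e-12)  # assumed penalty for out-of-dictionary chunks
MAX_WORD_LEN = 20


def word_logp(word: str) -> float:
    if word in WORD_PROB:
        return math.log(WORD_PROB[word])
    return UNKNOWN_LOGP_PER_CHAR * len(word)


def segment_hashtag(hashtag: str) -> list[str]:
    """Split a hashtag into its most probable sequence of words."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word of text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = best[start] + word_logp(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


print(segment_hashtag("#NoIphoneSupport"))
```

Working with log probabilities turns the product of word probabilities into a sum and avoids numerical underflow; with this toy dictionary, ‘‘iphone’’ is preferred over ‘‘i’’ + ‘‘phone’’ because its single probability beats the product of the two.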

D. GRAPHICAL USER INTERFACES

1) MAIN WINDOWS

a: PROJECT TYPE WINDOW

As mentioned above, SENTA was developed as a user-friendly tool to extract different possible features from texts. Therefore, different interfaces are present to assist the user throughout the process.

From the first window, shown in Fig. 2, the user chooses whether he wants to open an already existing project, import features from an existing file (and optionally add them to the ones he will extract in the next step), or start a new project.

FIGURE 2. The ‘‘Main’’ window of SENTA.

FIGURE 3. The ‘‘Open an existing project’’ window.

b: IMPORT PROJECT WINDOW

Importing an existing project assumes that the project has already been created. SENTA allows the user to save a project in a file with the extension ‘‘*.senta’’, along with the different files required to load it.

Fig. 3 shows the interface displayed when the user chooses to open an existing project. He can either browse his computer for a project or directly select one of the recently opened/created projects.

After selecting the file, the user needs to click ‘‘Get’’ to collect the different options, parameters and features of the project:

• Project type: this refers to whether the sets used in the existing project are a training set and a test set, or a training set and a non-annotated set. The difference between a test set and a non-annotated set will be explained later in this section.

• Project name: the name of the project as saved earlier, and this cannot be changed for the existing project, but when saving the current project, the user might choose a different name.

• Training and test files: these are the data sets used previously.

• Sentiment classes: these are the classes into which the tweets are to be classified (extracted from the training set).

• Features file: the different sets of features and feature parameters as selected previously for the opened project.

• Extra files: these are used to make the feature extraction faster, if the corresponding data have previously been extracted and saved. They will be explained further later.

For the same project, the user can choose a different training and/or test set (or non-annotated set). He can also choose not to use the old set of features and select new ones.

c: IMPORT FEATURES WINDOW

As stated above, in addition to extracting features, SENTA allows the import of extra features extracted using external tools, so that they are added to the set of features extracted by SENTA. Fig. 4 shows the window where the extra features can be imported.

In addition to the training and the test/non-annotated sets themselves, the user inputs 2 files corresponding to the extra features.


FIGURE 4. The ‘‘Import features’’ window.

The user needs to specify the format of the file. Only a Weka file (i.e., ‘‘*.arff’’), a text file (i.e., ‘‘*.txt’’, tab-separated) or a CSV file (i.e., ‘‘*.csv’’, comma-separated) can be imported.

The extra features extracted from both the training and the test/non-annotated set need to be provided for all the instances (tweets). In case one of the files is missing or in case of inconsistency in terms of number of instances, the extra features will be dismissed entirely.

Once the user specifies the location of all the files, he needs to click on ‘‘Collect features’’ to get the tweets and their features. The training and test/non-annotated sets must follow a specific format that will be discussed later on. As for the extra features files, it is highly recommended that they contain the Tweet ID field, so that the features can be matched to the actual tweets collected from the data sets. If such a field does not exist, the features will be attributed to the tweets automatically, in the same order. Obviously, in case of inconsistency (e.g., the number of lines in the data set file and in the features file are not equal), the features file will be dismissed.
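The matching rule for extra features (by Tweet ID when available, otherwise by line order, with dismissal on inconsistency) can be sketched as below. The column name ‘‘tweet_id’’ and the choice to also dismiss the file when its IDs do not match the data set’s IDs are assumptions, not SENTA’s documented behavior.

```python
import csv
import io
from typing import Optional

ID_FIELD = "tweet_id"  # hypothetical name of the Tweet ID column


def read_feature_rows(text: str, delimiter: str = ",") -> list[dict]:
    """Parse a CSV (delimiter=',') or tab-separated text file (delimiter='\\t') with a header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def attach_extra_features(tweet_ids: list[str], rows: list[dict]) -> Optional[dict]:
    """Map each tweet ID to its extra features, or return None if the file is dismissed."""
    if len(rows) != len(tweet_ids):
        return None  # inconsistent number of instances: dismiss the whole file
    if rows and all(ID_FIELD in row for row in rows):
        by_id = {row[ID_FIELD]: {k: v for k, v in row.items() if k != ID_FIELD} for row in rows}
        if set(by_id) != set(tweet_ids):
            return None  # assumption: unmatched or duplicated IDs also dismiss the file
        return {tid: by_id[tid] for tid in tweet_ids}
    # No Tweet ID column: attribute the features to the tweets in the same order.
    return dict(zip(tweet_ids, rows))


rows = read_feature_rows("tweet_id,f1\n2,0.5\n1,0.1\n")
print(attach_extra_features(["1", "2"], rows))
```

Matching by ID makes the result independent of the row order in the features file, which is why including the Tweet ID field is the safer option.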

d: CREATION OF A NEW PROJECT WINDOW

During this work, however, no features other than the ones extracted with SENTA are used. Therefore, we opt for the creation of a new project. To start a new project, the user is supposed to provide two data sets: a training set and either a test set or a non-annotated set, as shown in Fig. 5. The training set and test set have to contain at least the following attributes:

• Tweet ID: the unique ID of the tweet, which will be used in the rest of the work to identify the tweet and later to save the tweet’s features.

• Username: the name of the twitterer who posted the tweet. While this information is not used for any purpose during this work, it might be needed in a future extension (e.g., to detect the gender/location of the twitterer as extra features).

• Tweet message: the content of the tweet itself.

• Class: the user-defined class of the tweet.

The last attribute assumes that the tweets have already been manually annotated by the user and can therefore be used for training and/or testing. For the same reason, if the user opts for a non-annotated set, in which case he will extract the features and try to predict the classes of the different tweets, this attribute does not need to be provided, and if given, it is simply ignored.
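A minimal loader for the required attributes might look as follows. The actual input format is not specified in this section, so the tab-separated layout, the column order (Tweet ID, username, message, class) and the absence of a header row are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Tweet:
    tweet_id: str
    username: str
    message: str
    label: Optional[str]  # None for tweets from a non-annotated set


def load_tweets(lines: Iterable[str], annotated: bool = True) -> list[Tweet]:
    """Parse assumed tab-separated lines: ID, username, message[, class]."""
    required = 4 if annotated else 3
    tweets = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < required:
            raise ValueError(f"line {line_no}: expected at least {required} fields")
        # For a non-annotated set, a class column, if present, is simply ignored.
        label = fields[3] if annotated else None
        tweets.append(Tweet(fields[0], fields[1], fields[2], label))
    return tweets


def load_classes(tweets: list[Tweet]) -> list[str]:
    """Rough equivalent of 'Load classes': the distinct labels of an annotated set."""
    return sorted({t.label for t in tweets if t.label is not None})


train = load_tweets(["1\tbob\tLove it!\thappiness\n", "2\tamy\tSo mad\tanger\n"])
print(load_classes(train))
```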

Once the files containing the data sets are selected, the user can check the different classes by selecting ‘‘Load classes’’. The user also has the possibility to add extra classes. While this might seem irrelevant at this point, these extra classes can be used later to extract extra features (e.g., Unigram features) to enhance the accuracy of classification. This will be discussed later on in this section.

FIGURE 5. The ‘‘Start a new project’’ window.

e: FEATURE SELECTION WINDOW

After collecting the training tweets and the test/non-annotated tweets, the user is supposed to select the features he wants to extract. The features that can be extracted using SENTA are divided into 7 different sets, shown in Fig. 6, which we will cover later on. Note that all the interfaces that manage the extraction of features are similar.

The 7 sets of features are:

• Sentiment-related features
• Punctuation features
• Syntactic and stylistic features
• Semantic features
• Unigram features
• Top words
• Pattern-related features

To select a set of features, the user has to check it and then customize it. The small question mark button next to the name of the set of features opens a help window that explains what the set of features does and how to configure it.

The feature selection, along with its parameters, can be exported and re-imported for a future project at any time.

Once the features and their associated parameters are set, on the main window, the number of features to be extracted for each family of features is displayed.

f: SAVE PROJECT WINDOW

The user is then asked to choose the different options to save his project, as shown in Fig. 7: he has to specify a name for the project and a location where it will be saved, along with the different save options, including the type of output and whether some extra data are to be saved or not.

Inside the project directory specified, a subfolder will be created and named after the project name.

The features qualified as ‘‘Top words’’ and ‘‘Pattern-related features’’ require the extraction of some words, expressions or patterns from the training set (or from an independent set other than the test/non-annotated set), as we will discuss later. However, since this procedure takes some time, or since the user might prefer to extract these dictionaries from an independent set, SENTA offers the option to import them from a different source (and checks whether they are valid). SENTA also allows the user to save, at this stage, the patterns and/or top words that will be extracted from the current training set (this requires that the user has already selected these features for extraction).


FIGURE 6. The ‘‘Features selection’’ window.

The features, once extracted, can be saved in different formats: a Weka file (i.e., ‘‘*.arff’’), a text file (i.e., ‘‘*.txt’’, tab-separated) and/or a CSV file (i.e., ‘‘*.csv’’, comma-separated).
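To illustrate the three output formats, the sketch below serializes a feature matrix as a Weka ARFF file (numeric attributes plus a nominal class attribute) and as comma- or tab-separated text. It is not SENTA’s exporter; the attribute names, the quoting rule and the use of ‘‘?’’ (ARFF’s missing-value marker) for the unknown classes of a non-annotated set are our assumptions.

```python
import csv
import io
import re
from typing import Optional, Sequence

SPECIAL = re.compile(r"[\s,{}'\"%]")


def arff_quote(token: str) -> str:
    """Quote ARFF names/values that contain spaces or special characters."""
    if SPECIAL.search(token):
        return "'" + token.replace("\\", "\\\\").replace("'", "\\'") + "'"
    return token


def to_arff(relation: str, feature_names: Sequence[str], rows: Sequence[Sequence[float]],
            labels: Sequence[Optional[str]], classes: Sequence[str]) -> str:
    lines = [f"@RELATION {arff_quote(relation)}", ""]
    lines += [f"@ATTRIBUTE {arff_quote(name)} NUMERIC" for name in feature_names]
    lines.append("@ATTRIBUTE class {" + ",".join(arff_quote(c) for c in classes) + "}")
    lines += ["", "@DATA"]
    for values, label in zip(rows, labels):
        label_str = "?" if label is None else arff_quote(label)  # '?' marks a missing value
        lines.append(",".join(str(v) for v in values) + "," + label_str)
    return "\n".join(lines) + "\n"


def to_delimited(feature_names: Sequence[str], rows: Sequence[Sequence[float]],
                 labels: Sequence[Optional[str]], delimiter: str = ",") -> str:
    """CSV (delimiter=',') or tab-separated text (delimiter='\\t') with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(list(feature_names) + ["class"])
    for values, label in zip(rows, labels):
        writer.writerow(list(values) + ["" if label is None else label])
    return buffer.getvalue()


arff = to_arff("tweets", ["n_pos", "n_neg"], [[2, 0], [0, 1]],
               ["happiness", None], ["happiness", "sadness"])
print(arff)
```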

g: START EXTRACTION WINDOW

Once the project details have been set, the user can start the feature extraction and keep track of the task currently being run as well as the tasks already finished, as shown in Fig. 8. The displayed time is in seconds (s). The user can also pause the task at any time, but pausing neither frees memory nor releases the thread being run.

h: PROJECT SUMMARY WINDOW

The last of the main windows displays a recapitulation of the project along with the output files, as shown in Fig. 9.

In addition to the project name, directory and type, the recapitulation includes the location and size of the training and test sets, and the files generated along with the project file.

From this point, the user can go to the previous interface, go back to the main interface, or open the project directory in the system file explorer to browse the saved files.

2) FEATURE CUSTOMIZATION WINDOWS

The feature customization window appears when the user presses the ‘‘customize’’ button. For all the sets of features, we added a ‘‘Default’’ button that selects the features we used to perform the multi-class classification in the rest of this work, to make the experiments easy to replicate.

a: SENTIMENT FEATURES

Sentiment features are features that rely on the sentiment polarities of the different components of the text, such as the words themselves, emoticons, hashtags, etc. These features are extracted using already-built dictionaries and small sub-tools we use internally. Notably, we referred to SentiStrength to build our dictionary of emotional words; however, we are currently building our own. Sentiment features are divided into 5 sub-categories, as shown in Fig. 10:

- Textual features: these are features that deal with the textual component of the tweet. They include the following:
• Number of positive words
• Number of negative words
• Number of highly emotional positive words (i.e., words with a SentiStrength score greater than or equal to 3)
• Number of highly emotional negative words (i.e., words with a SentiStrength score less than or equal to −3)
• Number of capitalized positive words
• Number of capitalized negative words
• Ratio of emotional words ρ(t), defined as

$$\rho(t) = \frac{PW(t) - NW(t)}{PW(t) + NW(t)} \qquad (1)$$

where t is the tweet, and PW(t) and NW(t) are the total scores of the positive words and of the negative words, respectively, as returned by SentiStrength. If the tweet does not contain any emotional word, ρ(t) is set to 0.

FIGURE 7. The ‘‘Save project’’ window.

- Emoticon-related features: these include the count of positive, negative, neutral and joking (or ironic) emoticons. Neutral emoticons are ones that do not show a clear emotion, such as ‘‘(._.)’’, while joking emoticons are ones sometimes used with ironic or sarcastic statements (e.g., ‘‘:P’’).

- Hashtag-related features: these include the count of positive and negative hashtags. To decide on a hashtag’s polarity, we defined a simple probabilistic model that decomposes the hashtag into words and detects the polarity of the resulting expression.

- Slang words-related features: these include the count of positive and negative slang words. To extract these we refer to a dictionary containing the most common slang words along with their polarities.

- Contrast features: these detect whether there is any contrast between the different components. By contrast we mean the coexistence of a negative component and a positive one within the same tweet, whether the two components have the same nature (e.g., words, emoticons, etc.) or different natures (e.g., words vs emoticons, etc.). In total 5 features are extracted which include the contrast between words, between hashtags, between words and hashtags, between words and emoticons and between hashtags and emoticons.
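The textual counts, the ratio ρ(t) of Eq. (1) and the five contrast flags can be sketched as follows. The tiny lexicon is a hypothetical stand-in for SentiStrength word scores (positive scores for positive words, negative scores for negative words), and two further choices are our assumptions: a word marked in the negation vector has its score sign flipped, and NW(t) is taken as the sum of the absolute values of the negative scores, so that ρ(t) lies in [−1, 1].

```python
# Hypothetical lexicon standing in for SentiStrength word scores.
LEXICON = {"love": 3, "great": 3, "good": 2, "nice": 2,
           "hate": -4, "bad": -2, "cracked": -2, "damn": -3}


def textual_features(tokens: list[str], negation: list[int]) -> dict:
    """Counts of (highly emotional / capitalized) positive and negative words, plus rho(t)."""
    feats = dict.fromkeys(["pos_words", "neg_words", "high_pos", "high_neg",
                           "cap_pos", "cap_neg"], 0)
    pw = nw = 0  # PW(t): total positive score; NW(t): total magnitude of negative scores
    for token, negated in zip(tokens, negation):
        score = LEXICON.get(token.lower(), 0)
        if negated:
            score = -score  # assumption: negation flips the word's polarity
        capitalized = int(len(token) > 1 and token.isupper())
        if score > 0:
            feats["pos_words"] += 1
            feats["high_pos"] += int(score >= 3)
            feats["cap_pos"] += capitalized
            pw += score
        elif score < 0:
            feats["neg_words"] += 1
            feats["high_neg"] += int(score <= -3)
            feats["cap_neg"] += capitalized
            nw += -score
    total = pw + nw
    feats["rho"] = (pw - nw) / total if total else 0.0  # Eq. (1); 0 if no emotional word
    return feats


def contrast_features(words, hashtags, emoticons) -> dict:
    """Each argument is a (positive count, negative count) pair; returns five 0/1 flags."""
    def within(x):
        return int(x[0] > 0 and x[1] > 0)

    def across(x, y):
        return int((x[0] > 0 and y[1] > 0) or (x[1] > 0 and y[0] > 0))

    return {
        "words": within(words),
        "hashtags": within(hashtags),
        "words_hashtags": across(words, hashtags),
        "words_emoticons": across(words, emoticons),
        "hashtags_emoticons": across(hashtags, emoticons),
    }


tokens = ["I", "LOVE", "my", "phone", "but", "I", "hate", "the", "cracked", "glass"]
feats = textual_features(tokens, [0] * len(tokens))
contrast = contrast_features((feats["pos_words"], feats["neg_words"]), (0, 0), (0, 1))
print(feats, contrast)
```

In this example PW(t) = 3 and NW(t) = 4 + 2 = 6, so ρ(t) = (3 − 6)/(3 + 6) = −1/3.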

b: PUNCTUATION FEATURES

Punctuation features are related to the use of punctuation marks as well as the capitalization of words, etc., as shown in Fig. 11. They are divided into 4 sub-categories:

- Punctuation marks: these include the number of full stops, commas, semicolons, exclamation marks and question marks.

- Parentheses and similar symbols: these include the number of parentheses, brackets and braces.


FIGURE 8. The ‘‘Start of collection and project progress’’ window.

- Words and characters: these include the count of words and characters, the average number of words and characters per sentence, etc.

- Apostrophe and quotation marks
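A sketch of the four punctuation sub-categories computed on raw tweet text follows. The regular expressions used to split words and sentences, the decision to count opening and closing symbols together, and the extra count of fully capitalized words are illustrative assumptions.

```python
import re

MARKS = {"full_stops": ".", "commas": ",", "semicolons": ";",
         "exclamation_marks": "!", "question_marks": "?"}
ENCLOSING = {"parentheses": "()", "brackets": "[]", "braces": "{}"}
QUOTES = {"apostrophes": "'", "quotation_marks": '"'}
WORD_RE = re.compile(r"\w+(?:'\w+)?")
SENTENCE_SPLIT_RE = re.compile(r"[.!?]+")


def punctuation_features(text: str) -> dict:
    feats = {name: text.count(mark) for name, mark in MARKS.items()}
    # Opening and closing symbols are counted together (assumption).
    feats.update({name: sum(text.count(c) for c in pair) for name, pair in ENCLOSING.items()})
    feats.update({name: text.count(mark) for name, mark in QUOTES.items()})
    words = WORD_RE.findall(text)
    sentences = [s for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]
    n_sentences = max(len(sentences), 1)
    feats["words"] = len(words)
    feats["characters"] = len(text)
    feats["words_per_sentence"] = len(words) / n_sentences
    feats["characters_per_sentence"] = len(text) / n_sentences
    feats["capitalized_words"] = sum(1 for w in words if len(w) > 1 and w.isupper())
    return feats


print(punctuation_features("Nooooo! My iPhone glass cracked :( Why??"))
```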

c: SYNTACTIC AND STYLISTIC FEATURES

Syntactic and stylistic features are related to the use of words and expressions in the tweet/text. They are divided into 3 sub-categories, as shown in Fig. 12:

- Use of content words-related features: content words are nouns, verbs, adjectives and adverbs. The features extracted are the count and the ratio of each of these categories, separately.

- Syntactic features: these are related to the use of some speech forms, proper nouns, and symbols.

- Use of words: these are features related to the use of non-content words such as particles, interjections, pronouns and negations. They also include the use of uncommon words (which may, of course, be content words). To judge whether a word is common or not, we referred to a large amount of text collected online: we calculated the probability of use of the different words and qualified the top 5,000 words as ‘‘common’’, while the rest are considered ‘‘uncommon’’.
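The common/uncommon split can be sketched as below. Ranking words by raw frequency is equivalent to ranking them by estimated probability of use; the restriction to alphabetic tokens, the ratio feature and the toy corpus in the example are illustrative assumptions, while the default cut-off of 5,000 follows the text.

```python
from collections import Counter
from typing import Iterable


def build_common_words(corpus_tokens: Iterable[str], top_n: int = 5000) -> set[str]:
    """The top_n most frequent words of a reference corpus are considered 'common'."""
    counts = Counter(token.lower() for token in corpus_tokens if token.isalpha())
    return {word for word, _ in counts.most_common(top_n)}


def uncommon_word_features(tokens: list[str], common: set[str]) -> dict:
    words = [t.lower() for t in tokens if t.isalpha()]
    uncommon = sum(1 for w in words if w not in common)
    return {"uncommon_words": uncommon,
            "uncommon_ratio": uncommon / len(words) if words else 0.0}


common = build_common_words("the the the cat cat sat".split(), top_n=2)
print(common, uncommon_word_features(["The", "dog", "sat", "!"], common))
```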

d: SEMANTIC FEATURES

Semantic features are related to the meanings of words in the language as well as the logic behind them. Fig. 13 shows the features window. In the current version of the project, only a few features can be extracted. They include the use of opinion words or expressions, the use of highly sentimental words, the use of uncertainty words and the use of active and passive forms.

e: UNIGRAM FEATURES

Unigram features are special features that are extracted with reference to dictionaries built according to the user-defined parameters. Since their use by Pang et al. [11], unigrams, and n-grams in general, have served as basic features for sentiment analysis using machine learning. In the different approaches, unigrams are collected from the training data sets, and either the count or the presence of these unigrams is used as a feature for the classification. In this work, we make use of WordNet [22] to collect unigrams related to each sentiment class. The user is supposed to come up with a small set of seed words for each class and use WordNet to collect their synonyms and hyponyms down to a certain depth. The choice of synonyms and hyponyms is based on the fact that these words are highly correlated with the initial seed word and usually describe the same object, if not a more precise one. While synonyms usually refer to equivalent terms, hypernyms and hyponyms show the relationship between a more general term and its more specific instances.

FIGURE 9. The window displaying the ‘‘Summary of the project’’.

A hypernym, or superordinate, is a broader term than its hyponyms, whereas a hyponym is a word or expression that is more specific than its hypernym. For example, two of the direct hypernyms of the word ‘‘feeling’’ are ‘‘perception’’ and ‘‘idea’’, while ‘‘happiness’’, ‘‘anger’’ and ‘‘fear’’ are some of its hyponyms. Hypernyms might lose some of the specificities of the initial word; therefore, in our study, we collect only synonyms and hyponyms of the seed words. Hyponyms, too, might drift from the original meaning of the word and collide with other classes. Therefore, the depth down to which we collect hyponyms is limited to a value we refer to as Depth (or Dhypo), a parameter to be optimized by the user.

This is illustrated in Fig. 14, which shows how the dictionaries are built: we start with a set of seed words for each sentiment class, then collect their synonyms and hyponyms to obtain new sets of words, from which we further extract synonyms and hyponyms. The process is repeated Dhypo times.
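The depth-limited expansion can be sketched as a breadth-first walk. To keep the example self-contained, the sketch below uses a small hypothetical dictionary in place of WordNet; a real implementation could back it with NLTK's WordNet interface (`wordnet.synsets(word)`, `Synset.lemma_names()`, `Synset.hyponyms()`). Hypernyms are deliberately never followed, as described above.

```python
def expand_seed_words(seeds, lexicon, depth):
    """Collect synonyms and hyponyms of the seed words, `depth` levels deep.

    `lexicon` is a toy stand-in for WordNet (an assumption of this sketch):
    it maps a word to a dict with optional "synonyms" and "hyponyms" lists.
    """
    collected = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):  # depth plays the role of Dhypo
        next_frontier = set()
        for word in frontier:
            entry = lexicon.get(word, {})
            for related in entry.get("synonyms", []) + entry.get("hyponyms", []):
                if related not in collected:
                    collected.add(related)
                    next_frontier.add(related)
        frontier = next_frontier  # only newly found words are expanded next
    return collected


# Hypothetical lexicon for illustration only.
TOY_LEXICON = {
    "joy": {"synonyms": ["delight"], "hyponyms": ["elation"]},
    "elation": {"hyponyms": ["euphoria"]},
    "euphoria": {"hyponyms": ["rapture"]},
}
```

Each extra level of depth pulls in more specific (and eventually less related) words, which is why the depth has to be bounded.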

Fig. 15 shows the different parameters set for unigram features. In SENTA, the extracted words can either be used as individual binary features (i.e., one feature per word, indicating whether or not that word appears in the tweet/text), or be grouped per sentiment class, in which case the number of words from each set found in a given tweet is used as a separate feature. The words can also be separated by PoS (i.e., nouns, verbs, adjectives and adverbs, each taken separately), so that instead of one group of words per sentiment class, the user can get up to 4. This is because the total number of words to be extracted has to be set prior to the extraction. The user can also choose to collect only words of one or two of the 4 PoS. This set of features proved to be very effective in detecting the sentiment of tweets, as we will discuss later in this paper.
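The two modes described above (binary per word versus a count per class) can be illustrated with the short sketch below. It is hypothetical: it assumes the expanded word sets are already lowercased and that the tweet has been tokenized.

```python
def unigram_features(tokens, class_words, binary=False):
    """Turn per-class word sets into features for one tokenized tweet.

    class_words maps a sentiment class to its expanded word set.
    binary=True: one 0/1 feature per (class, word) pair.
    binary=False: one feature per class holding the number of matching tokens.
    """
    lowered = [t.lower() for t in tokens]
    present = set(lowered)
    features = {}
    for cls, words in class_words.items():
        if binary:
            for word in sorted(words):  # sorted for a stable feature order
                features[f"{cls}:{word}"] = int(word in present)
        else:
            features[cls] = sum(1 for t in lowered if t in words)
    return features
```

Splitting the groups by PoS amounts to using keys such as `"happiness/adjective"` in `class_words`, which multiplies the number of count features by up to 4.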

The sets of seed words can be defined by pressing ‘‘manage seedwords’’. By default, SENTA offers seed words for 12 different sentiment classes: when the user chooses to import the default seed words, those of any of these classes present in the project are added. The interface for adding a seed word is shown in Fig. 16: the user types the word, chooses its PoS and the class it belongs to.


FIGURE 10. The ‘‘Sentiment features customization’’ window.

FIGURE 11. The ‘‘Punctuation features customization’’ window.

f: TOP WORDS

Top words, as their name indicates, are the words that occur most often in the training set. Fig. 17 shows the parameters related to this set of features: the user can choose the PoS of the top words to be collected, whether the words of each PoS should be extracted separately, the number of top words per class or PoS, and, again, whether the features are binary or numeric.

FIGURE 12. The ‘‘Stylistic and semantic features customization’’ window.

FIGURE 13. The ‘‘Semantic features customization’’ window.

The two parameters ‘‘Min Ratio’’ and ‘‘Min Occurrence’’ define the criteria for extracting top words. For a positive sentiment class ‘‘A’’ (e.g., ‘‘Happiness’’), the ratio of the number of occurrences of a word in the positive sentiment tweets to that in all the negative sentiment tweets should be higher than ‘‘Min Ratio’’. In addition, the word has to occur in class ‘‘A’’ more times than the value set for the parameter ‘‘Min Occurrence’’. In this work, when running the multi-class sentiment analysis on our training and test tweets, we did not use Top Words as features, because they are partly redundant with the unigram features: many words appear in both.

FIGURE 14. Procedure of extraction of Unigrams using WordNet.

FIGURE 15. The ‘‘Unigram features customization’’ window.
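A minimal sketch of the two extraction criteria, under a literal reading of the text: the ratio compares a word's count over tweets of the same polarity as class A with its count over tweets of the opposite polarity, and candidates are ranked by their frequency in class A. The ranking, the `n_top` cut-off, and treating a zero opposite count as an infinite ratio are assumptions of this sketch.

```python
from collections import Counter


def top_words(tweets_by_class, polarity, cls, min_ratio, min_occurrence, n_top):
    """Select up to n_top top words for sentiment class `cls`.

    tweets_by_class maps class -> list of token lists; polarity maps class ->
    "positive" or "negative". A word is kept if it occurs more than
    min_occurrence times in `cls`, and its count over same-polarity tweets
    divided by its count over opposite-polarity tweets exceeds min_ratio.
    """
    same, opposite, in_class = Counter(), Counter(), Counter()
    for c, tweets in tweets_by_class.items():
        target = same if polarity[c] == polarity[cls] else opposite
        for tokens in tweets:
            target.update(t.lower() for t in tokens)
            if c == cls:
                in_class.update(t.lower() for t in tokens)
    selected = []
    for word, count in in_class.most_common():
        if count <= min_occurrence:
            break  # counts are sorted, so no later word can qualify
        ratio = same[word] / opposite[word] if opposite[word] else float("inf")
        if ratio > min_ratio:
            selected.append(word)
        if len(selected) == n_top:
            break
    return selected
```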

g: PATTERN-RELATED FEATURES

The idea of our pattern-related features was proposed in our previous work [23], in which we introduced an approach that relies on Part-of-Speech tags (PoS-tags) to extract sarcastic patterns. In SENTA, we elaborated further on this kind of feature and made the pattern extraction more generic. Patterns are extracted based on the PoS-tags of words: the possible PoS-tags (36 in total, plus a 37th one for punctuation) are divided into groups, and, given a sentence S containing n words, each word of S is subject to an action that depends on its PoS-tag, according to the rules defined by the user.

Fig. 18 shows the different parameters of the pattern features. First, the user defines whether the patterns are to be used each as a separate feature, or summed based on their length and sentiment class. If the features are separate (i.e., each pattern is a unique feature), only one pattern length is taken into account; otherwise, the user can choose a minimal and a maximal length for patterns. The user then chooses how many categories the PoS-tags are to be divided into, and specifies the action to perform for each category by pressing ‘‘Customize’’. The different actions for the different categories are given in Fig. 19: a word can be kept as it is, lemmatized, replaced by a specific expression, replaced by a user-defined expression, etc.

FIGURE 16. The ‘‘Seed words management’’ window.

Next, the user specifies, for each PoS tag, the category it belongs to by pressing the button ‘‘Define’’, which displays the window shown in Fig. 20.

Later in this work, when performing the multi-class classification, we give a concrete example of how patterns are extracted using SENTA. To be considered, a pattern must occur in a given sentiment class at least as many times as the value of the parameter ‘‘Min # of Occurrences’’. Given a full pattern vector t extracted from a tweet, and a pattern p extracted earlier from the training set, we define the following resemblance function [24]:

$$
res(p, t) =
\begin{cases}
1, & \text{if the tweet vector contains the pattern as it is, in the same order,}\\
\alpha, & \text{if all the words of the pattern appear in the tweet in the correct order but with other words in between,}\\
\gamma \cdot n/N, & \text{if } n \text{ words out of the } N \text{ words of the pattern appear in the tweet in the correct order,}\\
0, & \text{if no word of the pattern appears in the tweet.}
\end{cases}
\tag{1}
$$
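Eq. (1) can be implemented by checking for a contiguous match first and otherwise measuring how many pattern words appear in order, which is the length of the longest common subsequence (LCS) of the pattern and the tweet vector. Using the LCS for "n words in the correct order" is our reading of the definition; the defaults `alpha=0.1` and `gamma=0.02` are the values reported later in this paper.

```python
def _lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def _contains_contiguous(seq, sub):
    """True if `sub` appears in `seq` as one contiguous run."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def resemblance(pattern, tweet, alpha=0.1, gamma=0.02):
    """Resemblance res(p, t) of Eq. (1) between a pattern and a tweet vector."""
    pattern, tweet = list(pattern), list(tweet)
    if not pattern:
        return 0.0
    if _contains_contiguous(tweet, pattern):
        return 1.0
    n = _lcs_length(pattern, tweet)  # number of pattern words found in order
    if n == len(pattern):
        return alpha  # all words present, with gaps in between
    return gamma * n / len(pattern)  # equals 0 when no pattern word appears
```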

FIGURE 17. The ‘‘Top words features customization’’ window.


FIGURE 18. The ‘‘Pattern-related features customization’’ window.

FIGURE 19. The different actions for different PoS-Tags categories.

If the patterns are used as unique features, each feature takes the value of the resemblance as defined. Otherwise, the patterns are grouped based on their sentiment class and length, as shown in TABLE 1, where L1, …, LM are the different lengths of the patterns and S1, …, SM are the different sentiments (classes).

Given the K patterns p1, …, pK extracted for the sentiment class Si and the length Lj, the value of the feature Fij is

$$
F_{ij} = \sum_{k=1}^{K} res(p_k, t) \tag{2}
$$

FIGURE 20. The ‘‘PoS-Tags categories customization’’ window.

TABLE 1. Pattern features.

Fij, as defined, measures the degree of resemblance of a tweet t to the patterns of class i and length j. The user must therefore also set the two parameters of the resemblance function, α and γ.
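Eq. (2) is a grouped sum, sketched below. The function accepts any resemblance function as an argument so that it stays independent of how res(p, t) is implemented. Representing the training patterns as (class, token list) pairs is an assumption of this sketch.

```python
from collections import defaultdict


def grouped_pattern_features(tweet, patterns, res):
    """Compute F_ij of Eq. (2): summed resemblance per (class, length) group.

    patterns: list of (sentiment_class, pattern_tokens) pairs from training.
    res: callable res(pattern, tweet) -> float, e.g. the function of Eq. (1).
    """
    features = defaultdict(float)
    for sentiment, pattern in patterns:
        features[(sentiment, len(pattern))] += res(pattern, tweet)
    return dict(features)
```

With a strict contiguous-match stand-in for res, a tweet containing two of the "anger" length-3 patterns gets F = 2.0 for that group, while a group with no match gets 0.0.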


E. EXTENSIBILITY

Currently, SENTA extracts a set of basic features that allow performing tasks such as sentiment analysis, even for multiple classes. However, we believe that more advanced tasks will require additional features.

We are currently building sets of features, which we refer to as ‘‘advanced sentiment features’’, ‘‘advanced semantic features’’ and ‘‘advanced pattern features’’, that extract deeper features from the texts. Other features, related to causality, conditionality, the differentiation between informative and interrogative forms, etc., remain to be added.

In addition, SENTA currently supports only English, which is a significant limitation, since it cannot be applied to texts in other languages. We believe that supporting other languages, and/or detecting the language of the text automatically, would add more value. Last but not least, we plan to implement some machine learning algorithms, or to call Weka internally, to perform the classification for users who prefer to classify within SENTA and get the results directly rather than export their features; this would let them adjust the parameters and retry without leaving the program.

V. MULTI-CLASS SENTIMENT ANALYSIS - PROPOSED APPROACH

A. PROBLEM STATEMENT

Given a set of tweets, we aim to classify each of them into one of the following 7 classes: ‘‘love’’, ‘‘happiness’’, ‘‘fun’’, ‘‘neutral’’, ‘‘hate’’, ‘‘sadness’’ and ‘‘anger’’. To this end, we extract different sets of features from each tweet, refer to a training set, and use machine learning algorithms to perform the classification.

We chose these sentiment classes for several reasons. First, based on our observations in previous work [25], we concluded that we needed a balanced amount of data between negative and positive classes. In addition, as observed in [26], these sentiments are the ones most present in tweets.

B. DATA

For the sake of this work, we manually collected and prepared 2 datasets, as follows:

• Set 1: this set contains 21 000 tweets, manually classified into the 7 classes, with 3 000 tweets per class. This set is used for training; in the rest of this work, it is referred to as the ‘‘training set’’.

• Set 2: this set contains 19 740 tweets. All tweets are manually checked and classified into the 7 classes. This set will serve as a test set. Therefore, in the rest of this work, it will be referred to as the ‘‘test set’’.

The structure of the dataset used is shown in TABLE 2.

C. FEATURE EXTRACTION

Under different emotional conditions, humans tend to behave differently, including in the way they talk and express their feelings. It might therefore be important to rely not only on the vocabulary used, but also on the expressions and sentence structures used under the different conditions, to quantify and model these feelings. In the rest of this section, we rely on these assumptions to extract different sets (or families) of features.

TABLE 2. Structure of the dataset used.

The features are extracted using SENTA, the tool we introduced in Section IV.

1) SENTIMENT-BASED FEATURES

As stated above, sentiment-based features are based on the sentiment polarity (i.e., ‘‘positive’’/‘‘negative’’) of the different components of tweets. Out of the different features offered by SENTA, we extract the following ones:

• The number of positive words and that of negative words,
• The number of highly emotional positive words and that of highly emotional negative words,
• The ratio of emotional words,
• The number of positive and negative emoticons,
• The number of positive and negative slang words.
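The sentiment-based features listed above are lexicon lookups. The sketch below is illustrative only: the lexicon format (polarity, kind, strong flag) and its entries are hypothetical, and SENTA's own lexicons are not reproduced here.

```python
def sentiment_features(tokens, lexicon):
    """Count polarity-bearing tokens of one tweet.

    lexicon maps a token to (polarity, kind, strong): polarity is "positive"
    or "negative", kind is "word", "emoticon" or "slang", and strong marks
    highly emotional words. The lexicon format is an assumption.
    """
    feats = {f"{p}_{k}": 0 for p in ("positive", "negative")
             for k in ("word", "emoticon", "slang", "strong_word")}
    emotional = 0
    for token in tokens:
        # Try the raw token first (keeps emoticons intact), then lowercase.
        entry = lexicon.get(token) or lexicon.get(token.lower())
        if entry is None:
            continue
        polarity, kind, strong = entry
        feats[f"{polarity}_{kind}"] += 1
        if kind == "word":
            emotional += 1
            if strong:
                feats[f"{polarity}_strong_word"] += 1
    feats["emotional_ratio"] = emotional / len(tokens) if tokens else 0.0
    return feats
```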

2) PUNCTUATION-BASED FEATURES

While punctuation does not usually express a sentiment explicitly, except perhaps for exclamation marks, we believe that the excessive use of some marks (e.g., question marks, exclamation marks, etc.) reflects the strength of some sentiments. For example, the following two tweets might show different sentiments according to the annotators:

- ‘‘Why didn’t you go with him?’’
- ‘‘Why did you tell her???????’’

While the twitterers are asking questions in both examples, the annotators agreed on classifying the first tweet as totally neutral, whereas for the second, some of them pointed out that the twitterer is most likely angry or upset. Even though it is quite hard to tell whether this is the case, we agree with the annotators that the second tweet might be sentimental, regardless of which sentiment is present, while the first one is neutral.

Out of the variety of punctuation features, after our preliminary experiments, we decided to use the following ones:

• The number of full stops,
• The number of exclamation marks,
• The number of question marks,
• The total number of words,
• The number of quotation marks.
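These five features are simple character and token counts. In the sketch below, the word pattern (letters and apostrophes) and the set of quotation characters are assumptions; SENTA's tokenizer may differ.

```python
import re


def punctuation_features(text):
    """Punctuation-based features for one tweet."""
    return {
        "full_stops": text.count("."),
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
        # Straight and curly double quotes are both counted.
        "quotation_marks": text.count('"') + text.count("\u201c") + text.count("\u201d"),
        "words": len(re.findall(r"[A-Za-z']+", text)),
    }
```

On the two example tweets above, the question-mark count is 1 and 7 respectively, which is the signal the annotators' judgement suggests is informative.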


3) SYNTACTIC AND STYLISTIC FEATURES

In addition to the aforementioned sets of features, we also extract features related to the use of words. We first extract the ratios of nouns, verbs, adjectives and adverbs in the tweets (out of all the words, including hashtags, symbols, etc.). We also check whether or not the twitterer employed the comparative and/or the superlative forms.

Furthermore, our experiments showed the usefulness of the following features as good indicators of sentiment polarity and, for some of them, of the sentiment class:

• The total number of particles,
• The total number of interjections,
• The total number of pronouns, and that of pronouns of group I and II separately,
• The use of negation,
• The use, and the total number, of uncommon words.

4) SEMANTIC FEATURES

Semantic features focus on meaning in the language or on the logic inside the sentences. Although not all of these features have been implemented yet, we used a few of the existing ones, including:

• The use of opinion words,
• The use of highly sentimental words,
• The use of uncertainty words,
• The use of the passive form of speech.

5) UNIGRAM FEATURES VS TOP WORDS FEATURES

‘‘Unigram features’’, as described above, are numeric features extracted with the help of WordNet. In brief, a set of seed words is provided for each sentiment class, and we use WordNet to enrich it. We then extract N features (where N is the number of sentiments) by counting, for a given tweet, how many words from each set it contains.

‘‘Top words’’, on the other hand, are extracted from the training set itself. From all the training tweets of a given sentiment S, we collect the most commonly used words, while making sure that the extracted words express that sentiment (i.e., that the number of occurrences of a word in the tweets of sentiment S is sufficiently higher than its number of occurrences in the tweets of the other sentiments). These words are later used as indicators (features) to detect the sentiment of a given tweet.

However, given the nature of these two sets of features, a large part of their words overlap, creating unnecessary redundancy. Therefore, for the sake of this work, we discarded ‘‘Top Words features’’ and focused on what we called ‘‘Unigram Features’’.

We started with 6 sets of words (i.e., one for each sentiment except ‘‘Neutral’’), containing 486 words in total, an average of 81 words per sentiment. The initial sets have no overlap between words of sentiments of opposite polarities, while we tolerated some overlap between sentiments of the same polarity (e.g., the word ‘‘enjoy’’ is a seed word for both ‘‘happiness’’ and ‘‘fun’’). The selected words can be nouns, verbs, adjectives and adverbs.

As Fig. 21 shows, the overlap (or duplication) of words between different sentiments, including sentiments of different polarities, increases rapidly as the depth grows. Even though duplicated words are removed automatically, the amount of duplication is a crucial indicator of where to stop collecting words. In this work, we limited the depth to 2.

As described above, we use the resulting sets of words to extract 6 features, by counting the occurrences of the words in the tweet to classify, taking into consideration the score of the words.

FIGURE 21. Number of unigrams collected from WordNet using the seed words proposed.


6) PATTERN-BASED FEATURES

As described in Section IV, patterns are used as a complementary set of features to detect what unigrams cannot: while in most cases sentimental words are enough to tell the sentiment of a sentence, in other cases, people employ specific, longer expressions to convey their sentiment. For example, the following tweet shows happiness without employing any sentimental word that explicitly expresses happiness: ‘‘You took me to the world I always dreamt of!!! Thank you soooo much!’’ Although the word ‘‘thank’’ conveys a positive attitude, the happiness in this tweet comes from the expression as a whole: the twitterer shows happiness and thanks her friend for it.

To detect such expressions and learn them, we refer to patterns of speech.

We divide the PoS tags into three categories: a first one, referred to as ‘‘EI’’, containing words that might have emotional content; a second one, referred to as ‘‘CI’’, containing non-emotional words whose content is important; and a third one, referred to as ‘‘GFI’’, containing words whose grammatical function is important. If a word belongs to the first category, it is replaced by the corresponding expression shown in TABLE 3, along with its polarity (e.g., the word ‘‘good’’ would be replaced by POS-ADJECTIVE); if it belongs to the second, it is replaced by its lemma; and if it belongs to the third, it is replaced by the corresponding expression shown in TABLE 3.

TABLE 3. Expressions used to replace the words of EI and GFI.

As mentioned above, the classification into categories is based on the PoS-tag of the word. The list of part-of-speech tags and their categories is given in TABLE 4.

TABLE 4. Part-of-speech tag categories.

We generate the pattern vector of each tweet as defined above. For example, the following PoS-tagged tweet

‘‘He_PRP is_VBP dummy_JJ ,_, why_WP would_VBD you_PRP think_VBP I_PRP want_VBP to_TO go_VB with_IN him_PRP !!!!_.’’

gives the following pattern vector

[PRONOUN VERB NEG-ADJECTIVE . why VERB PRONOUN VERB PRONOUN POS-VERB to VERB with PRONOUN .]

which can later be used to generate smaller patterns following the rules defined (i.e., the minimal and maximal lengths of patterns).
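The conversion from a tagged tweet to a pattern vector can be sketched as follows. TABLE 3 and TABLE 4 are not reproduced here, so the tag-to-category mapping below is hypothetical, chosen only so that the example above is reproduced. Lowercasing stands in for real lemmatization, and the polarity lexicon is likewise an assumption.

```python
# Hypothetical category assignment; TABLE 4 gives the real one.
EI_TAGS = {"JJ": "ADJECTIVE", "VB": "VERB", "VBD": "VERB", "VBP": "VERB",
           "VBZ": "VERB", "NN": "NOUN", "RB": "ADVERB"}
CI_TAGS = {"WP", "TO", "IN"}
GFI_TAGS = {"PRP": "PRONOUN", ".": ".", ",": "."}


def pattern_vector(tagged_tweet, polarity_lexicon):
    """Convert a 'word_TAG word_TAG ...' string into a pattern vector.

    EI words become their PoS expression, prefixed with POS-/NEG- when the
    word is in polarity_lexicon; CI words are kept (lowercased as a stand-in
    for lemmatization); GFI words become their fixed expression.
    """
    vector = []
    for item in tagged_tweet.split():
        word, tag = item.rsplit("_", 1)  # split on the last underscore only
        if tag in EI_TAGS:
            polarity = polarity_lexicon.get(word.lower())
            prefix = f"{polarity}-" if polarity else ""
            vector.append(prefix + EI_TAGS[tag])
        elif tag in CI_TAGS:
            vector.append(word.lower())
        elif tag in GFI_TAGS:
            vector.append(GFI_TAGS[tag])
        # Tags outside the three categories are dropped in this sketch.
    return vector
```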

In this work, we opted for patterns of several lengths, summed per length and sentiment class, so that the resulting features are few in number and the classification runs faster.

Based on our previous work [25], and with a few adjustments, we set the values of Nocc, Lmin, Lmax, α and γ as follows:

$$
N_{occ} = 3,\quad L_{min} = 3,\quad L_{max} = 10,\quad \alpha = 0.1,\quad \gamma = 0.02.
$$
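One possible reading of how smaller patterns are derived and filtered is sketched below: every contiguous slice of the pattern vector with between Lmin and Lmax tokens is a candidate, and a candidate is kept for a class if it appears in at least Nocc of that class's tweets. Both the contiguous slicing and counting each tweet at most once are our assumptions, not details confirmed by the text.

```python
from collections import Counter


def sub_patterns(vector, l_min=3, l_max=10):
    """All contiguous sub-sequences of the pattern vector with l_min..l_max tokens."""
    out = []
    for length in range(l_min, min(l_max, len(vector)) + 1):
        for start in range(len(vector) - length + 1):
            out.append(tuple(vector[start:start + length]))
    return out


def frequent_patterns(vectors, n_occ=3, l_min=3, l_max=10):
    """Keep sub-patterns occurring in at least n_occ of one class's tweets."""
    counts = Counter()
    for vector in vectors:
        # A set, so each tweet contributes at most once per pattern (assumption).
        counts.update(set(sub_patterns(vector, l_min, l_max)))
    return {pattern for pattern, count in counts.items() if count >= n_occ}
```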

The parameter K, on the other hand, was introduced in this work because we noticed a strong imbalance between the numbers of patterns of the different classes. Fig. 22 shows the classification accuracy using pattern-based features for different values of K. According to the figure, the optimal value is 5: higher values improve the accuracy during cross-validation, but have little impact on the accuracy on the test set.

In the next section, we evaluate the model we built, and present the results of our experiments in the cases of binary, ternary and multi-class classification.

VI. EXPERIMENTAL RESULTS

After extracting the features, we ran different tests using the ‘‘Random Forest’’ [27] classifier. We use 4 Key Performance Indicators (KPIs) to evaluate the effectiveness of our approach: Accuracy, Precision, Recall and F-measure.

• Accuracy refers to the overall correctness of classification. It measures the ratio of correctly classified instances over the total number of instances.

• Precision refers to the fraction of the tweets correctly classified, for a given sentiment, over the total number of tweets classified as belonging to that sentiment.

• Recall refers to the fraction of tweets correctly classified, for a given sentiment, over the total number of tweets actually belonging to that sentiment. In other words, for a single sentiment, this KPI is the same as the classification accuracy of that sentiment.

• F-measure is defined as follows:

$$
\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{3}
$$
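All four KPIs can be read off a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and it reports 0 when a denominator is empty; libraries such as scikit-learn offer equivalent functions, but this version avoids any dependency.

```python
def classification_kpis(confusion, labels):
    """Accuracy plus per-class precision, recall and F-measure (Eq. 3).

    confusion[i][j] counts tweets whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(labels)))
    per_class = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column sum
        actual = sum(confusion[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f_measure": f}
    return correct / total, per_class
```

For example, with `[[8, 2], [1, 9]]` the accuracy is 17/20 = 0.85, and the first class has precision 8/9, recall 0.8 and F-measure 16/19.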

FIGURE 22. Accuracy of classification using pattern-based features for different values of K.

A. BINARY CLASSIFICATION

We first run our experiment to detect the sentiment polarity of tweets. For this purpose, we remove the tweets belonging to the class ‘‘Neutral’’ and group the other classes into two main classes, ‘‘Positive’’ and ‘‘Negative’’. The former contains tweets from the classes ‘‘Fun’’, ‘‘Happiness’’ and ‘‘Love’’, while the latter contains tweets from the classes ‘‘Hate’’, ‘‘Anger’’ and ‘‘Sadness’’. TABLE 5 shows the results of classification. The accuracy obtained reaches 81.3%. Notably, the recall is higher for negative tweets (83.5%), whereas the precision is higher for positive tweets (82.0%). This means that tweets classified as positive are mostly positive, whereas tweets with negative polarity tend to be classified correctly more often, as shown in the confusion matrix presented in TABLE 6.
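The regrouping of labels for the binary and ternary experiments is a fixed mapping. A small sketch, assuming labels are the class names as strings: in the binary setting, neutral tweets map to `None` so the caller can drop them.

```python
POSITIVE = {"love", "happiness", "fun"}
NEGATIVE = {"hate", "anger", "sadness"}


def to_polarity(label, ternary=False):
    """Map a fine-grained sentiment label to its coarse polarity.

    Returns None for "neutral" in the binary setting (the tweet is dropped).
    """
    label = label.lower()
    if label in POSITIVE:
        return "positive"
    if label in NEGATIVE:
        return "negative"
    if label != "neutral":
        raise ValueError(f"unknown sentiment class: {label}")
    return "neutral" if ternary else None
```

A binary data set can then be built with `[(t, to_polarity(y)) for t, y in data if to_polarity(y)]`.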

TABLE 5. Binary classification Accuracy, Precision, Recall and F-measure.

TABLE 6. Binary classification confusion matrix.

The accuracy is noticeably lower than that of our previous work [25]. This is because, in our previous work, we exploited information about the detailed sentiment class for unigram features and pattern features. In other words, when we extracted the features from the training and test sets, we counted unigrams belonging to the classes ‘‘Happiness’’, ‘‘Love’’, ‘‘Anger’’, etc. in tweets of both sets. Furthermore, we extracted patterns related to these detailed sentiments and used them to measure the resemblance between the training and the test tweets. While that was fair, given that we had a training set with the detailed sentiment subclasses, in a more general case, where one wants to classify tweets into ‘‘Positive’’ and ‘‘Negative’’, such information might not be available, and the training set will contain tweets labeled only as ‘‘Positive’’ or ‘‘Negative’’. Therefore, in this work, we treated the training set as having only two classes: only two unigram features are extracted, and patterns are extracted from the training set in only two subsets, positive patterns and negative patterns.

B. TERNARY CLASSIFICATION

Despite its importance, binary classification assumes that the given data are already known to be emotional. However, Twitter contains many tweets with no emotional polarity, such as news tweets. Therefore, in this subsection, we add the neutral tweets described earlier in our dataset, and rely on the same set of features to classify the tweets. As before, no information regarding the sentiment subclass is given or exploited here. The results obtained are given in TABLE 7, and the confusion matrix of classification is given in TABLE 8.

TABLE 7. Ternary classification Accuracy, Precision, Recall and F-measure.

The results show that introducing the third class noticeably decreases the accuracy, to 70.1%.


TABLE 8. Ternary classification confusion matrix.

The new class (i.e., ‘‘Neutral’’) presents a low accuracy and a low precision. This can be explained by the fact that the amount of training data (i.e., the number of tweets) for this class is lower than that for the other classes. In addition, tweets, regardless of their content, tend to be polarized (i.e., classified either as positive or as negative). This is because most of the features used, except for the pattern features, try to detect a sentimental component in a given tweet, or to find a resemblance between the tweet and those in the training set (which is highly unbalanced in favor of the sentimental classes over the neutral class).

Overall, the results obtained are promising.

C. MULTI-CLASS CLASSIFICATION

In this subsection, we use the 7 sentiment classes described in Section V. The classification results are given in TABLE 9, while the confusion matrix is given in TABLE 10.

TABLE 9. Multi-class classification Accuracy, Precision, Recall and F-measure.

TABLE 10. Multi-class classification confusion matrix.

Despite the larger number of classes, the accuracy obtained is 60.2%, with a precision of 60.8%. More interestingly, some sentiments seem to be easier to detect than others. In particular, tweets belonging to the classes ‘‘Love’’ and ‘‘Hate’’ were classified with accuracies of 75.2% and 90.9%, respectively. This shows that tweets of these classes are easily distinguished from those of other classes. This might be because other classes, such as ‘‘Happiness’’ and ‘‘Fun’’, are very close to each other, so that many tweets of one class are classified as belonging to the other.

The class ‘‘Neutral’’, on the other hand, presents the lowest precision: many tweets from all the other classes were classified as neutral. This does not agree with our observations in [25]. We believe that the main difference is that our current training set provides a cleaner reference for training. The training set used in [25] contains a lot of noise, and most of the noisy data are actually neutral but labeled with the other classes, which caused most of the neutral tweets to be misclassified and gave the class ‘‘Neutral’’ a very low recall.

D. DISCUSSION

Classifying tweets is, to begin with, a difficult task given the very limited length of tweets. The challenges presented in Section II have been tackled by many researchers but remain not completely solved. Based on this work, we can confirm that classifying tweets into separate sentiment classes is challenging: as mentioned above, many tweets present more than one sentiment. A more interesting task would therefore be quantifying the sentiments present in a tweet: a tweet would be attributed more than one sentiment, each with a different score. The attributed sentiments would represent all the sentiments detected in the tweet, whereas the scores would represent the estimated weight of each detected sentiment. We strongly believe that this would allow a more accurate description of the sentiments in the tweet, and would solve the main issue we encountered in this work: the existence of multiple sentiments in a tweet.

In a related vein, even though we ran several experiments on our dataset, we cannot confirm that the values set for the defined parameters are the optimal ones. SENTA has more than 12 different parameters for the different sets of features. We optimized the parameters of each family of features separately; however, this could be a non-optimal solution, given that the machine learning algorithm used (i.e., Random Forest) does not consider the features independently but builds the model from all the features combined. Yet it is impractical, and almost impossible, to try all the combinations of parameters across feature families to find the ones that give the highest accuracy.

Regarding the test set itself, its manual annotation was done on CrowdFlower³. Several annotators from different backgrounds participated in the annotation. To check the performance of the annotators, we randomly picked 300 tweets, annotated them ourselves, and compared our labels with those of the crowd annotators. Interestingly, we agreed on the sentiment polarity (whether the tweet is positive, negative or neutral) of 91.3% of the tweets. However, when it came to the sentiment itself, the rate of agreement dropped to 72%. For many of the tweets without agreement, we nevertheless understood why the annotators attributed one sentiment over another, and this goes back to the issue we highlighted earlier: the existence of multiple sentiments within the same tweet.

³ https://www.crowdflower.com
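The two agreement figures above are raw percent agreement computed at two granularities. The sketch below shows that computation on hypothetical label lists; it is not a chance-corrected measure such as Cohen's kappa. By construction, polarity agreement can never be lower than class agreement, which matches the drop from 91.3% to 72%.

```python
POLARITY = {"love": "positive", "happiness": "positive", "fun": "positive",
            "hate": "negative", "anger": "negative", "sadness": "negative",
            "neutral": "neutral"}


def agreement_rates(ours, theirs):
    """Raw percent agreement on the sentiment class and on its polarity."""
    pairs = list(zip(ours, theirs))
    if not pairs:
        raise ValueError("no annotations to compare")
    exact = sum(a == b for a, b in pairs) / len(pairs)
    polarity = sum(POLARITY[a] == POLARITY[b] for a, b in pairs) / len(pairs)
    return exact, polarity
```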

VII. CONCLUSION

In this paper, we have proposed a new approach for sentiment analysis, in which a set of tweets is classified into 7 different classes. The obtained results show some potential: the accuracy obtained for multi-class sentiment analysis on the data set used was 60.2%. However, we believe that a more optimized training set would yield better performance.

Throughout this work, we showed that multi-class sentiment analysis can reach a promising accuracy level, but that it remains a challenging task. A more interesting task is to quantify the sentiments present in a tweet. Therefore, in future work, we will use the ternary classification (which achieved an accuracy of 70.1%) to classify tweets into ‘‘Positive’’, ‘‘Negative’’ and ‘‘Neutral’’. The sentimental tweets (i.e., those classified as ‘‘Positive’’ or ‘‘Negative’’) will then be given scores for the corresponding sentiment subclasses.

ACKNOWLEDGMENT

The research results have been achieved by ‘‘Cognitive Security: A New Approach to Securing Future Large Scale and Distributed Mobile Applications,’’ the Commissioned Research of National Institute of Information and Communications Technology (NICT), JAPAN.

REFERENCES

[1] B. O’Connor, R. Balasubramanyan, B. Routledge, and N. Smith, ‘‘From tweets to polls: Linking text sentiment to public opinion time series,’’ in Proc. Int. AAAI Conf. Weblogs Social Media, May 2010, pp. 26–33.

[2] M. A. Cabanlit and K. J. Espinosa, ‘‘Optimizing N-Gram based text feature selection in sentiment analysis for commercial products in Twitter through polarity lexicons,’’ in Proc. 5th Int. Conf. Inform., Intell., Syst. Appl., Jul. 2014, pp. 94–97.

[3] U. R. Hodeghatta, ‘‘Sentiment analysis of Hollywood movies on Twitter,’’ in Proc. IEEE/ACM ASONAM, Aug. 2013, pp. 1401–1404.

[4] J. M. Soler, F. Cuartero, and M. Roblizo, ‘‘Twitter as a tool for predicting elections results,’’ in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2012, pp. 1194–1200.

[5] A. Java, X. Song, T. Finin, and B. Tseng, ‘‘Why we Twitter: Understanding microblogging usage and communities,’’ in Proc. 9th WebKDD 1st SNA-KDD Workshop Web Mining Social Netw. Anal., Aug. 2007, pp. 56–65.

[6] K. Ghag and K. Shah, ‘‘Comparative analysis of the techniques for sentiment analysis,’’ in Proc. Int. Conf. Adv. Technol. Eng., Jan. 2013, pp. 1–7.

[7] Y. R. Tausczik and J. W. Pennebaker, ‘‘The psychological meaning of words: LIWC and computerized text analysis methods,’’ J. Lang. Social Psychol., vol. 29, no. 1, pp. 24–54, Dec. 2010.

[8] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, ‘‘The WEKA data mining software: An update,’’ SIGKDD Explorations Newslett., vol. 11, no. 1, pp. 10–18, Jun. 2009.

[9] C. G. Akcora, M. A. Bayir, M. Demirbas, and H. Ferhatosmanoglu, ‘‘Identifying breakpoints in public opinion,’’ in Proc. 1st Workshop Social Media Anal., Jul. 2010, pp. 62–66.

[10] B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, ‘‘Short text classification in Twitter to improve information filtering,’’ in Proc. 33rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2010, pp. 841–842.

[11] B. Pang, L. Lee, and S. Vaithyanathan, ‘‘Thumbs up?: Sentiment classification using machine learning techniques,’’ in Proc. ACL-02 Conf. Empirical Methods Natural Lang. Process., vol. 10, Jul. 2002, pp. 79–86.

[12] M. Boia, B. Faltings, C.-C. Musat, and P. Pu, ‘‘A :) Is worth a thousand words: How people attach sentiment to emoticons and words in tweets,’’ in Proc. Int. Conf. Social Comput., Sep. 2013, pp. 345–350.

[13] K. Manuel, K. V. Indukuri, and P. R. Krishna, ‘‘Analyzing Internet slang for sentiment mining,’’ in Proc. 2nd Vaagdevi Int. Conf. Inform. Technol. Real World Problems, Dec. 2010, pp. 9–11.

[14] W. Gao and F. Sebastiani, ‘‘Tweet sentiment: From classification to quantification,’’ in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2015, pp. 97–104.

[15] Y. H. P. P. Priyadarshana, K. I. H. Gunathunga, K. K. A. N. N. Perera, L. Ranathunga, P. M. Karunaratne, and T. M. Thanthriwatta, ‘‘Sentiment analysis: Measuring sentiment strength of call centre conversations,’’ in Proc. IEEE ICECCT, Mar. 2015, pp. 1–9.

[16] R. Srivastava and M. P. S. Bhatia, ‘‘Quantifying modified opinion strength: A fuzzy inference system for sentiment analysis,’’ in Proc. Int. Conf. Adv. Comput., Commun. Informat., Aug. 2013, pp. 1512–1519.

[17] K. H.-Y. Lin, C. Yang, and H.-H. Chen, ‘‘What emotions do news articles trigger in their readers?’’ in Proc. ACM SIGIR, Jul. 2007, pp. 733–734.

[18] K. H.-Y. Lin, C. Yang, and H.-H. Chen, ‘‘Emotion classification of online news articles from the reader’s perspective,’’ in Proc. IEEE/WIC/ACM WI-IAT, vol. 1, Dec. 2008, pp. 220–226.

[19] L. Ye, R. Xu, and J. Xu, ‘‘Emotion prediction of news articles from reader’s perspective based on multi-label classification,’’ in Proc. Int. Conf. Mach. Learn. Cybern., vol. 5, Jul. 2012, pp. 2019–2024.

[20] W. B. Liang, H. C. Wang, Y. A. Chu, and C. H. Wu, ‘‘Emoticon recommendation in microblog using affective trajectory model,’’ in Proc. Asia–Pacific Signal Inf. Proc. Assoc. Ann. Summit Conf. (APSIPA), Dec. 2014, pp. 1–5.

[21] R. Xia, F. Xu, C. Zong, Q. Li, Y. Qi, and T. Li, ‘‘Dual sentiment analysis: Considering two sides of one review,’’ IEEE Trans. Knowl. Data Eng., vol. 27, no. 8, pp. 2120–2133, Aug. 2015.

[22] C. Fellbaum, WordNet: An Electronic Lexical Database. Cambridge, MA, USA: MIT Press, 1998.

[23] M. Bouazizi and T. Ohtsuki, ‘‘Sarcasm detection in Twitter: ‘All your products are incredibly amazing!!!’—Are they really?’’ in Proc. IEEE Globecom, Dec. 2015, pp. 1–6.

[24] D. Davidov, O. Tsur, and A. Rappoport, ‘‘Semi-supervised recognition of sarcastic sentences in Twitter and Amazon,’’ in Proc. 14th Conf. Comput. Natural Lang. Learn., Jul. 2010, pp. 107–116.

[25] M. Bouazizi and T. Ohtsuki, ‘‘Sentiment analysis: From binary to multi-class classification: A pattern-based approach for multi-class sentiment analysis in Twitter,’’ in Proc. IEEE ICC, May 2016, pp. 1–6.

[26] M. Bouazizi and T. Ohtsuki, ‘‘Sentiment analysis in Twitter: From classification to quantification of sentiments within tweets,’’ in Proc. IEEE GLOBECOM, Dec. 2016, pp. 1–6.

[27] L. Breiman, ‘‘Random forests,’’ Mach. Learn., vol. 45, no. 1, pp. 5–32, Jan. 2001.

MONDHER BOUAZIZI received the Bachelor Engineering Diploma in communications from SUPCOM, Carthage University, Tunisia, in 2010, and the master’s degree from Keio University in 2017, where he is currently pursuing the Ph.D. degree. He was a Telecommunication Engineer (access network quality and optimization) for three years with Ooredoo Tunisia.

TOMOAKI OHTSUKI (OTSUKI) (SM’01) received the B.E., M.E., and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1990, 1992, and 1994, respectively. From 1994 to 1995, he was a Post-Doctoral Fellow and a Visiting Researcher in electrical engineering with Keio University. From 1993 to 1995, he was a Special Researcher of Fellowships of the Japan Society for the Promotion of Science for Japanese Junior Scientists. From 1998 to 1999, he was with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA. From 1995 to 2005, he was with the Tokyo University of Science. In 2005, he joined Keio University. He is currently a Professor with Keio University. He has authored or co-authored over 140 journal papers and 340 international conference papers. He is involved in research on wireless communications, optical communications, signal processing, and information theory.

He is a fellow of the IEICE. He was a recipient of the 1997 Inoue Research Award for Young Scientist, the 1997 Hiroshi Ando Memorial Young Engineering Award, the Ericsson Young Scientist Award 2000, the 2002 Funai Information and Science Award for Young Scientist, the IEEE 1st Asia-Pacific Young Researcher Award 2001, the 5th International Communication Foundation Research Award, the 2011 IEEE SPCE Outstanding Service Award, the 27th TELECOM System Technology Award, the ETRI Journal’s 2012 Best Reviewer Award, and the 9th International Conference on Communications and Networking in China 2014 (CHINACOM ’14) Best Paper Award. He has given tutorials and keynote speeches at many international conferences, including IEEE VTC and IEEE PIMRC. He was a Vice President of the Communications Society of the IEICE. He served as Chair of the IEEE Communications Society Signal Processing for Communications and Electronics Technical Committee. He served as a Technical Editor of the IEEE Wireless Communications Magazine and an Editor of Elsevier Physical Communications. He is currently serving as an Area Editor of the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY and an Editor of the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS. He has served as general co-chair and symposium co-chair of many conferences, including IEEE GLOBECOM 2008 (SPC), IEEE ICC 2011 (CTS), IEEE GCOM 2012 (SPC), and IEEE SPAWC.