
Received December 27, 2018, accepted January 7, 2019, date of publication January 21, 2019, date of current version February 8, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2892852

A Hybrid Framework for Sentiment Analysis Using Genetic Algorithm Based Feature Reduction

FARKHUND IQBAL 1, JAHANZEB MAQBOOL HASHMI 2, BENJAMIN C. M. FUNG 3, (Senior Member, IEEE), RABIA BATOOL 1, ASAD MASOOD KHATTAK 1, SAIQA ALEEM 1, AND PATRICK C. K. HUNG 4

1 College of Technological Innovation, Zayed University, Abu Dhabi 144534, United Arab Emirates
2 School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
3 School of Information Studies, McGill University, Montreal, QC H3A 0G4, Canada
4 University of Ontario Institute of Technology, Oshawa, ON L1G 0C5, Canada

Corresponding author: Farkhund Iqbal ([email protected])

This work was supported by the Research Incentive Fund under Grant R15048, by the Research Clusters under Grant R17082 and Grant R16083, and by the Zayed University, United Arab Emirates.

ABSTRACT Due to the rapid development of Internet technologies and social media, sentiment analysis has become an important opinion mining technique. Recent research has described the effectiveness of different sentiment classification techniques, ranging from simple rule-based and lexicon-based approaches to more complex machine learning algorithms. While lexicon-based approaches have suffered from the lack of dictionaries and labeled data, machine learning approaches have fallen short in terms of accuracy. This paper proposes an integrated framework that bridges the gap between lexicon-based and machine learning approaches to achieve better accuracy and scalability. To solve the scalability issue that arises as the feature-set grows, a novel genetic algorithm (GA)-based feature reduction technique is proposed. Using this hybrid approach, we are able to reduce the feature-set size by up to 42% without compromising accuracy. A comparison of our feature reduction technique with the more widely used principal component analysis (PCA) and latent semantic analysis (LSA) based feature reduction techniques shows up to 15.4% higher accuracy than PCA and up to 40.2% higher accuracy than LSA. Furthermore, we evaluate our sentiment analysis framework on other metrics including precision, recall, F-measure, and feature size. To demonstrate the efficacy of GA-based designs, we also propose the novel cross-disciplinary area of geopolitics as a case study application for our sentiment analysis framework. The experimental results show that the framework accurately measures public sentiments and views regarding various topics such as terrorism, global conflicts, and social issues. We envisage the applicability of our proposed work in various areas including security and surveillance, law-and-order, and public administration.

INDEX TERMS Classifier, feature optimization, genetic algorithm, machine learning, sentiment analysis.

I. INTRODUCTION
The Internet and associated web technologies have dramatically changed the way our society works [1]. Social networks such as Facebook and Twitter have become commonplace for exchanging ideas, sharing information, promoting business and trade, running political and ideological campaigns, and promoting products and services [2]. Social media is generally studied from different perspectives, e.g., collecting business intelligence for product and service promotion, monitoring malicious activities for detecting and mitigating cyber-threats, and sentiment analysis for analyzing people's feedback and reviews. Sentiment analysis, often referred to as opinion mining, is the extraction, identification, or characterization of the sentiment from text using Natural Language Processing (NLP), statistics, or machine learning (ML) methods [3]. The field of sentiment analysis has been widely studied by researchers during the last few years [4], [5]. In this context, several approaches have been proposed, developed, and tested [3]. The most common approach is ML, which needs a significant dataset for training and learning the association between different aspects and sentiments. Furthermore, ML-based models usually target a

VOLUME 7, 2019 2169-3536 2019 IEEE. Translations and content mining are permitted for academic research only.

Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


https://orcid.org/0000-0002-9903-4862

https://orcid.org/0000-0001-9081-3598

https://orcid.org/0000-0001-8423-2906

https://orcid.org/0000-0002-3385-0613

F. Iqbal et al.: Hybrid Framework for Sentiment Analysis Using GA-Based Feature Reduction

simple global classification, rather than individual aspects of the reviewed product. There are three major techniques used for sentiment analysis: ML, lexicon-based, and rule-based approaches [6]. ML methods use different learning algorithms and a labeled dataset to train the classifier and to determine the sentiment [7]. The lexicon-based approach involves calculating the sentiment polarity of text using the semantic orientation of words or sentences [8]. The semantic orientation is a measure of subjectivity and opinion in text. The rule-based approach looks for opinion words in the text and then classifies it based on the number of positive and negative words [9]. It considers different rules for classification such as dictionary polarity, booster words, negation words, and idioms.

Sentiment analysis is mostly discussed in the context of product reviews, with questions like: Is this product review positive or negative? Are customers satisfied or dissatisfied? It also helps to answer Business Intelligence questions like: Why aren't consumers buying our product? However, cross-domain insights and applications of sentiment analysis are scarce [10]. Examples of such applications include the analysis of user opinion on politics, sociology, and the psychology of society.

The existing research on sentiment analysis focuses on three different approaches individually. Thus, it is evident that there is a wide gap in terms of integrated tools and techniques for sentiment analysis which allow users to plug, play, and test different algorithms and optimizations based on customized preferences and parameters. In the light of this discussion, we clearly see a growing need for an integrated sentiment analysis tool which should fill the gap presented in the previous research.

This paper proposes a hybrid approach to sentiment analysis which employs state-of-the-art ML algorithms and lexical databases to automatically analyze archives of online documents (e.g., reviews, chats, and social media data). We propose a novel Genetic Algorithm (GA) based solution to the feature reduction problem by developing a customized fitness function. The fitness function utilizes the SentiWordNet [11] lexicon to calculate the polarity difference between a class label and a feasible feature vector (potential solution). To the best of our knowledge, we are the first to employ such a hybrid approach with GA based optimized feature selection. This evolutionary approach to optimal feature selection results in increased accuracy and better scalability. The customized fitness function reduces the feature-set by up to 42% without any compromise on overall accuracy. Furthermore, to demonstrate the feasibility of the proposed feature reduction algorithm, we also perform a detailed comparison with other feature reduction algorithms, including PCA [12] and LSA [13], in which our system achieves up to 15.4% higher accuracy than PCA and up to 40.2% higher accuracy than LSA. PCA is a dimensionality reduction procedure that simplifies the complexity of high-dimensional data by reducing a large set of variables to a small set that still retains the information and trends present in the data. It projects a set of points onto a lower-dimensional affine subspace of "best fit". LSA is a method used in NLP that discovers a data representation of lower dimension than the original semantic space by analyzing relationships between documents and their terms. It reduces the dimension using a mathematical technique called singular value decomposition (SVD).

The second contribution of this work lies in the novelty of the proposed application area in the geopolitical context. There is a lack of modern sentiment analysis tools which provide insights into the cross-disciplinary domain of geopolitics. Hence, available insights about people's opinions on social media, magnified in a political context, and their impact on several uprisings in parts of the world are scarce. Notable examples of such uprisings include the London Riots, Occupy Wall Street, the Egyptian revolution, and the Arab Spring. We aim to address this problem by discussing our proposed framework in the context of user opinion in association with geopolitical uprisings or conflicts. The proposed framework classifies users' opinions based on their political affiliations. In addition, the extracted sentiments can be used for cyber-intelligence [14]. This could also be helpful in rooting out any foreign element involved in assisting local uprisings, hence making it beneficial for security agencies. Sentiment analysis and opinion mining could also help governments to keep a watch on the growing trends of any political uprising. It can also be helpful in the forensic investigation of criminals, identity tracing, and criminal network mining.

The major contributions of our work are as follows:

• We design, develop, and evaluate a hybrid sentiment analysis framework by combining ML and lexicon-based approaches in order to overcome the limitations of each method.

• We propose a novel feature reduction algorithm by employing a GA based approach with a customized fitness function. The fitness function utilizes SentiWordNet to evaluate feasible solutions, which results in improved system scalability.

• We analyze our proposed method, which shows improved accuracy compared to state-of-the-art feature reduction algorithms.

• We propose a novel application area of cross-disciplinary geopolitical analysis as a case study application for our framework, to measure public sentiments and views regarding various topics such as terrorism, global conflicts, and social issues.

To show the results of the proposed approach, a series of experiments is performed using three different datasets. The first is the UCI ML Repository's Sentiment Analysis dataset [15], which consists of review data from IMDB, Amazon, and Yelp. The second is a Twitter dataset from [16], and the third is a geopolitical dataset related to the 2016 United States Presidential Election [17]. The evaluation is based on several parameters including precision, recall, F-measure, scalability, and accuracy. Furthermore, we provide a run-time analysis of our GA based feature reduction algorithm.

The remainder of this paper is structured as follows. Section II presents the related work in the areas of sentiment analysis, text mining, and forensic investigation. Section III describes the proposed methodology and framework design. Experimental design and discussion are provided in Section IV. Section V presents the conclusion and possible future work.

II. RELATED WORK
In this section, we discuss prominent related research in the area of sentiment analysis and text mining. Our comparison criteria are based on the two factors discussed before: integration of sentiment analysis approaches in a unified way, and a cross-disciplinary application area. We are interested in seeing how users' opinions and social behavior can be helpful in analyzing the current geopolitical situation and uprisings.

Medhat et al. [18] presented a comprehensive overview of recently proposed algorithms, enhancements, and applications in the area of sentiment analysis. They also discussed fields related to sentiment analysis, e.g., transfer learning, emotion detection, and building resources, and aimed to give a full picture of sentiment analysis techniques and related fields in brief. Khan et al. [19] proposed a rule-based, domain-independent method which classifies subjective and objective sentences from reviews and blog comments. SentiWordNet is used to calculate the score and to determine the polarity. They showed that their proposed method is effective and outperforms ML-based methods, with an accuracy of 76.8% at the feedback level and 86.6% at the sentence level. Our proposed approach is aligned with these studies, as we also focus on ML and lexicon-based methods. However, we employ GA based optimized feature selection for training ML algorithms.

Agarwal et al. [20] examined sentiment analysis on Twitter data. They introduced POS-specific prior polarity features and explored the use of a tree kernel to obviate the need for tedious feature engineering. Their new features and the tree kernel performed at almost the same level, and both outperformed the state-of-the-art baseline techniques. Kouloumpis et al. [21] investigated the utility of linguistic features for detecting the sentiment of Twitter messages. They evaluated the usefulness of existing lexical resources as well as the creative language used in microblogging. Devies and Ghahramani [4] presented a language-independent model for sentiment analysis of short text forms, e.g., social network statuses. They used Twitter datasets to model happy and sad sentiments and showed that their system performed 10% better than the Naive Bayes (NB) model. These three papers apply sentiment analysis to short-text data, e.g., SMS and tweets.

Similarly, Pontiki et al. [22] described aspect based sentiment analysis. They identified the aspects of given target entities and the sentiment expressed for each aspect. They used manually annotated reviews of restaurants and laptops as a dataset. Njolstad et al. [23] proposed, defined, and evaluated four different feature categories composed of 26 article features for sentiment analysis. They used five different ML methods to train a sentiment classifier on Norwegian financial internet news articles and achieved classification precision of up to 71%. When comparing ML classifiers, they found that J48 yielded the highest performance, closely followed by Random Forest (RF). We present a similar comparison of different classifiers and their accuracy on our system. However, we extend our evaluation by including GA optimized features in the comparison.

Govindarajan [24] proposed a hybrid classification method based on integrating classification methods using an arcing classifier. The performance was analyzed in terms of accuracy. A classifier ensemble was designed using NB and GA, and the effectiveness of the ensemble technique for sentiment analysis was evaluated. Finally, the performance was evaluated under different performance metrics using movie review datasets. However, the work does not compare the performance of different classifiers and does not provide any optimization for feature size reduction.

We observe that most of the related work employs independent techniques for sentiment analysis while using few evaluation metrics. Furthermore, these works do not give the user the freedom to choose different algorithms, classifiers, and optimizations according to customized needs. In contrast, our proposed framework bridges the gap between sentiment analysis and geopolitical intelligence by providing 1) a unified framework with the facility to plug in different algorithms, cross-validation, and optimized feature selection, and 2) a two-dimensional analysis of public opinions in association with political uprisings by combining security and opinion mining.

III. PROPOSED FRAMEWORK
In this work, a unified framework has been developed which includes all the components required for sentiment analysis. This modular design provides different approaches to sentiment analysis with a focus on optimizations.

The proposed framework consists of different modules which govern the internal working of the system. In order to automate the entire framework, we employ a pipeline based approach in which different modules, ranging from data cleaning, preprocessing, GA, and feature generation and selection to sentiment analysis, are executed in pipeline fashion. Figure 1 shows the sentiment analysis pipeline of the whole framework. There are three main stages of the framework: data cleaning, data preprocessing, and the analysis engine. The algorithms and the internal working of each module are explained in the following sections.

FIGURE 1. Sentiment analysis framework pipeline.

Algorithm 1 SLANG_REMOVAL
/* Removes slang from a given text */
Input: T: text from file
Output: τ: updated text
T ← T.toLowerCase()
/* simple string tokenizer */
String[] L ← T.split(" ")
/* get slangs from dictionary */
Set<String> slangKey ← slangs.keySet()
foreach ti ∈ L do
    if slangKey contains ti then
        /* update the token in text */
        ti ← slangs.get(ti)
    end
end
/* rebuild the text from the updated tokens */
τ ← ""
foreach ti ∈ L do
    τ ← τ + ti + " "
end
return τ

Definition 1 (Polarity Score): The sentiment score of a given word as determined by the SentiWordNet ontology. The score ranges from 0 to 1.0, from extremely negative to extremely positive sentiment. □

A. DATA CLEANING
Data cleaning is the first module in the processing pipeline of this framework. In this phase, the extracted data is streamed from the files and held in memory for cleaning. This stage consists of three sub-stages.

1) GARBAGE REMOVAL
In this step, unwanted content, including non-ASCII characters, URLs, web addresses, and online links, is removed from the text using customized regular expressions.
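A minimal sketch of this step in Python is given below. The paper does not list its regular expressions, so the patterns and the function name `remove_garbage` are assumptions chosen only for illustration.

```python
import re

# Hypothetical patterns: the framework's actual expressions are not given in the paper.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
NON_ASCII_PATTERN = re.compile(r"[^\x00-\x7F]+")
WHITESPACE_PATTERN = re.compile(r"\s+")


def remove_garbage(text: str) -> str:
    """Strip URLs and non-ASCII characters, then collapse runs of whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_ASCII_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()
```

URLs are removed before non-ASCII characters so that a link containing such characters is dropped as a whole rather than split into fragments.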

2) SLANG CORRECTION
This step involves correcting any slang and abbreviated words used in online conversations. We use predefined dictionaries and maps to translate slang or abbreviations to their original, unabbreviated form, e.g., "ttyl" to "talk to you later" and "afk" to "away from keyboard". This is helpful for later stages because, during sentiment analysis, the abbreviated words carry no meaning for the analysis engine. The working of this module is explained in Algorithm 1.
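The dictionary lookup can be sketched in Python as follows. The two-entry `SLANG_MAP` is a stand-in built from the examples in the text, not the framework's actual dictionary, and `correct_slang` is a hypothetical name.

```python
# Stand-in dictionary; the framework's real slang map is not reproduced here.
SLANG_MAP = {
    "ttyl": "talk to you later",
    "afk": "away from keyboard",
}


def correct_slang(text, slang_map=SLANG_MAP):
    """Lower-case the text, split on whitespace, and expand known slang tokens."""
    tokens = text.lower().split()
    return " ".join(slang_map.get(token, token) for token in tokens)
```

Like the pseudocode, this sketch matches whole whitespace-delimited tokens only, so slang attached to punctuation (for example "ttyl!") would pass through unchanged.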

3) STOPWORD REMOVAL
Stopword removal removes very common words of a language, e.g., "an", "about", and "above". These words usually have no impact on NLP. We use CMU's Rainbow stopword list [25] to find stopwords in the data.
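As an illustration, stopword filtering reduces to a set-membership test. The small `STOPWORDS` set below is a hypothetical stand-in for the Rainbow list, which is much larger.

```python
# Tiny illustrative set; it is not the Rainbow stopword list.
STOPWORDS = frozenset({"a", "an", "about", "above", "the", "is", "of"})


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens whose lower-cased form is not a stopword."""
    return [token for token in tokens if token.lower() not in stopwords]
```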

B. PREPROCESSING
This module includes different NLP tasks, i.e., tokenization, word stemming, and part-of-speech tagging.

1) TOKENIZATION
Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. To tokenize the text, LingPipeTokenizer from the Apache Lucene package [26] is used, which preserves punctuation. Initially, we used StringTokenizer, but due to the inherent limitations of this tokenizer, we opted for the more capable LingPipeTokenizer. Custom data structures are designed to hold the tokens (Keyword) and sentences (list of Keywords) of each document.
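To illustrate what a punctuation-preserving tokenizer does, the sketch below uses a single regular expression. It is not the LingPipeTokenizer and makes no claim to match its output; the pattern is an assumption for illustration.

```python
import re

# Words (optionally with an internal apostrophe) or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word tokens while keeping punctuation as separate tokens."""
    return TOKEN_PATTERN.findall(text)
```

A plain whitespace split would instead leave punctuation glued to the neighbouring word ("stop," rather than "stop" and ","), which is one kind of limitation a more capable tokenizer avoids.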

2) STEMMING
Stemming is the process of reducing an inflected word to its base or root form. The framework uses the Porter-2 algorithm [27] to convert each token to its stem form and stores it in the Keyword object alongside the original token.

3) POS-TAGGING
POS tagging is the process of tagging a word in a text as corresponding to a particular part of speech, based on both its definition and its context. To obtain the part-of-speech tags of the words, we use the Maxent Tagger from Stanford CoreNLP [28]. Each Keyword object contains an original token, its stem form, and the POS tag associated with the token. Once the data is preprocessed, it is sent to the next module in the pipeline.
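The Keyword structure described above could look roughly like the following sketch. The field names are assumptions, and the stems and tags in any usage would come from the stemmer and tagger, which are not reimplemented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Keyword:
    """One preprocessed token: surface form, stem, and part-of-speech tag."""
    token: str
    stem: str
    pos_tag: str


# A sentence is modelled as a list of Keyword objects.
Sentence = List[Keyword]


def make_sentence(triples: List[Tuple[str, str, str]]) -> Sentence:
    """Build a sentence from (token, stem, pos_tag) triples supplied by earlier stages."""
    return [Keyword(token, stem, pos_tag) for token, stem, pos_tag in triples]
```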

C. ANALYSIS ENGINE
This is the most vital module of the framework. It includes all the natural language based techniques for sentiment analysis. Each sentence (list of Keywords) is fed to the analysis engine, which produces the aggregated sentiment polarity score of the sentence based on different sentiment analysis techniques, including lexicon-based, ML using bag-of-words as features, and a hybrid approach with feature reduction using GA. The complete architecture of our system is shown in Figure 2. We explain these approaches in detail in the following subsections.

1) LEXICON-BASED SENTIMENT ANALYSIS
In this approach, after preprocessing the data, the polarity score of each token in the document is calculated. To calculate the polarity score, the framework uses the SentiWordNet lexical database. The scores of all the tagged keywords are then aggregated at the document level to find the global score, and a sentiment value of either positive "P" or negative "N" is assigned. The algorithm for sentiment scoring using SentiWordNet is described in Algorithm 2. Lexicon-based approaches, which have proved to have higher accuracy, are, however, limited by the size of lexical databases such as WordNet and SentiWordNet. This is a potential drawback, which we aim to offset by employing a hybrid of lexicon-based and ML approaches.

FIGURE 2. Proposed sentiment analysis framework architecture.

Algorithm 2 POLARITY_SCORING_SWN
/* calculates aggregated polarity score of a sentence */
Input: S: sentence (a list of keywords)
Output: P: aggregated polarity score
sum ← 0
foreach Ti ∈ S do
    TTi ← getPosTag(Ti)
    score ← getSentiWordnetScore(Ti, TTi)
    sum ← sum + score
end
return sum
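The aggregation in Algorithm 2 can be sketched in Python with a toy lexicon. Everything in `TOY_LEXICON` is invented for illustration and is not taken from SentiWordNet; the signed scores and the rule that a non-negative total maps to "P" are also assumptions, since the paper does not state its decision threshold.

```python
# Invented (token, pos) -> score entries; real scores would come from SentiWordNet.
TOY_LEXICON = {
    ("good", "a"): 0.75,
    ("bad", "a"): -0.625,
    ("love", "v"): 0.5,
}


def polarity_score(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of (token, pos) pairs; unknown pairs contribute 0."""
    return sum(lexicon.get((token, pos), 0.0) for token, pos in sentence)


def sentiment_label(score):
    """Map an aggregated score to 'P' or 'N' (threshold at 0 is an assumption)."""
    return "P" if score >= 0 else "N"
```

Tokens missing from the lexicon contribute nothing to the total, which mirrors the coverage limitation of lexical databases noted in the text.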

2) ML USING BAG-OF-WORDS AS FEATURES
In this approach, we use different ML algorithms to classify the sentiment values of the given data. For this purpose, the Weka toolkit is used because it contains several classifier algorithms and offers rich analysis facilities. We start by modeling our preprocessed data for the Weka classifiers. To model the preprocessed data (list of tokens) and generate the feature vector, we employ a bag-of-words approach. This is a basic approach in which we include all the potential keywords in the feature vector. We start by reading each document and adding its keywords to a feature-set. Then, we append the sentiment value associated with that document as a class label and generate an ARFF file. Finally, we process this ARFF file in the Weka toolkit and run prominent classifier algorithms including J48, NB, PART, Sequential Minimal Optimization (SMO), Instance-Based with k-nearest neighbors (IBk), and JRip. These classifiers are described below.

• J48: J48 is a decision tree classifier in which an attribute is selected based on information gain from the training data to build each node of the tree. The selected attributes effectively split a set of training data into subsets enriched in one class or the other. It is widely used because it is simple to explain and interpret.

• NB: NB is a classification technique based on Bayes' Theorem. It works under the assumption that all the attributes of the training samples are independent. It is fast and can be used with a small amount of training data. Despite its simplicity, it has outperformed many sophisticated classification methods.

• PART: PART is a rule-based classification algorithm. It generates a rule using a divide-and-conquer strategy, removes all instances covered by that rule from the training collection, and proceeds recursively until no instance remains.

• SMO: SMO is used to solve the quadratic programming problem that arises in SVM training by breaking the problem into a series of smallest possible sub-problems. Many optimizations are designed to speed up the algorithm and its convergence.

• IBk: IBk is among the simplest of all ML algorithms used for classification and regression problems. It is a good choice for classification problems when there is little or no prior knowledge about the distribution of the data.

• JRip: JRip (RIPPER) is a basic and popular rule-learning algorithm. It implements a propositional rule learner and reduces error using repeated incremental pruning.

The detailed analysis is discussed in the results section.
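The bag-of-words modelling step described above (collect keywords, append the class label, write an ARFF file) can be sketched as follows. The sketch assumes binary presence features and tokens that are plain alphanumeric words; the paper does not say whether presence or counts are used, and `build_arff` is a hypothetical helper, not part of Weka.

```python
def build_arff(documents, relation="sentiment"):
    """Render (tokens, label) pairs as ARFF text with one {0,1} attribute per keyword."""
    # The feature-set is the union of all keywords seen in the documents.
    vocabulary = sorted({token for tokens, _ in documents for token in tokens})

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {token} {{0,1}}" for token in vocabulary]
    lines += ["@attribute class {P,N}", "", "@data"]

    for tokens, label in documents:
        present = set(tokens)
        row = ["1" if token in present else "0" for token in vocabulary]
        lines.append(",".join(row + [label]))
    return "\n".join(lines)
```

Because every distinct keyword becomes an attribute, the number of columns grows with the vocabulary, which is the scalability concern that motivates the feature reduction in the next subsection.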

3) HYBRID METHOD WITH OPTIMAL FEATURE SELECTION
In this approach, we again use ML algorithms to classify the sentiment values of the given data. However, the problem with the previous bag-of-words approach is that it does not scale well, since almost 80% of the input data gets included in the feature-set. This problem worsens as the size of the dataset grows. To solve this scalability problem, we have devised an efficient technique to reduce the feature-set size.

We propose an evolutionary Genetic Algorithm based approach that evaluates each document and, instead of choosing all the keywords, chooses a subset of keywords such that the discarded keywords do not affect the overall sentiment score of the document. In other words, we aim to reduce the feature-set size by extracting those keywords that contribute to the sentiment score of the entire document, while the excluded keywords have no effect. Once the feature selection is optimized, we use this feature-set to generate an ARFF file and then perform the analysis using ML classifiers.

Definition 2 (Chromosome (Genotype)): A set of parameters which define a proposed solution to the problem that the GA is trying to solve. A chromosome represents a candidate solution. □

Definition 3 (Population): A set of chromosomes (candidate solutions) that evolves towards a better solution over a number of generations in order to solve the problem. Different genetic operators, e.g., mutation and crossover, are applied to a population. □

Definition 4 (Fitness Function): The core of an evolutionary algorithm. It is a particular type of objective function which is responsible for performing the evaluation and returning a "fitness value" that reflects how optimal the solution is. The fitness value is used to determine which candidate solutions (chromosomes) will survive into the next generation. □
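As a concrete reading of Definitions 2 and 3, a chromosome can be held as a list of 0/1 genes, one per token, and a population as a list of such chromosomes. The helper names and the seeding scheme below are assumptions for illustration.

```python
import random


def random_chromosome(n_tokens, rng):
    """A chromosome is a 0/1 list: gene i says whether token i is kept as a feature."""
    return [rng.randint(0, 1) for _ in range(n_tokens)]


def random_population(pop_size, n_tokens, seed=0):
    """A population is a list of randomly seeded chromosomes."""
    rng = random.Random(seed)
    return [random_chromosome(n_tokens, rng) for _ in range(pop_size)]


def selected_tokens(tokens, chromosome):
    """Decode a chromosome into the subset of tokens it selects."""
    return [token for token, bit in zip(tokens, chromosome) if bit == 1]
```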

D. FEATURE OPTIMIZATION
Extracting features using bag-of-words data structures results in a significantly large feature vector because all the keywords which have any associated sentiment value are included in the feature vector. This technique, however, poses a significant scalability problem when using a larger dataset. To solve this problem, we need to optimize the feature vector by reducing its size while maintaining accuracy. In this section, we formulate this problem and propose a solution using an evolutionary Genetic Algorithm approach.

1) PROBLEM FORMULATION
Let W be the set of all possible keywords of a document. We are interested in finding a subset S of W, to be added to the feature-set, which gives a polarity value equal (or closest) to the labeled sentiment value without affecting the accuracy. This is important for the scalability of the program because, if we include all the possible features, the feature vector of larger documents will grow so large that it will not fit in memory. Hence, to solve this problem, we need to optimize the feature selection technique.

The selection of a set of minimum number of features from the larger feature-set is an optimization problem with local minima. As we have discussed before, GA, due to its evolutionary nature is a well-suited technique for such non-polynomial time problems.

2) MATHEMATICAL MODEL
We start by modeling our problem on the evolutionary model of GA. Let W be the set of all the tokens in a document after preprocessing and τ be the labeled sentiment value of this document.

$$W = \{ w_1, w_2, w_3, \ldots, w_n \} \tag{1}$$

Then, choose a set S such that

$$S \subset W \;\wedge\; \sum_{i=1}^{n} S_i = \tau \ (\text{or closest}) \tag{2}$$

where $S_i$ is the sentiment score of the i-th token in subset S. Then, we introduce a vector $\vec{x}$ as a feasible solution.

$$\text{Feasible Solution}: \quad \vec{x} = ( x_1, x_2, x_3, \ldots, x_n ) \tag{3}$$

where $x_i \in \{0, 1\}$ for $i = 1, 2, \ldots, n$, and

$$\sum_{i=1}^{n} w_i x_i = \tau \ (\text{or closest}) \tag{4}$$

$$\text{Objective Function}: \quad P(\vec{x}) = \sum_{i=1}^{n} w_i \cdot x_i = \tau \tag{5}$$

where $\vec{x} = ( x_1, x_2, x_3, \ldots, x_n )$.
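To make the model concrete, the sketch below evaluates the objective function of equation (5) and finds the closest subset by exhaustive search on a four-token example. The exhaustive search is not the paper's method; it enumerates all 2^n binary vectors and is shown only to illustrate what the GA approximates. The token scores are invented, and breaking ties in favour of fewer features is an assumption.

```python
from itertools import product


def objective(scores, x):
    """P(x): sum of w_i * x_i, where scores[i] is the sentiment score of token i."""
    return sum(w * xi for w, xi in zip(scores, x))


def best_subset_bruteforce(scores, tau):
    """Return the 0/1 vector whose P(x) is closest to tau, preferring fewer features."""
    candidates = product((0, 1), repeat=len(scores))
    return min(candidates, key=lambda x: (abs(tau - objective(scores, x)), sum(x)))
```

For scores (0.5, 0.25, 0.0, -0.25) and τ = 0.75, the first two tokens already reproduce the label, so the neutral and the negative token can be dropped from the feature-set.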

3) FITNESS CALCULATION
To apply GA to this problem, the binary string $\vec{x}$ is chosen as the genotype. The fitness function $f(\vec{x})$, which is a simple form of our objective function $P(\vec{x})$, is a vital part of our GA based feature selection. It determines the criteria for the best candidates, which will be allowed to produce offspring and survive into the next generation. We designed our fitness function so that the solution starts converging from the first generation. The fitness of a selected set of features depends on its relative distance from the labeled sentiment value, where distance is measured in terms of the polarity score determined by the SentiWordNet lexical database. The smaller the polarity distance between the class label and the calculated score, the more feasible the current solution and, hence, the more likely it is to survive into the next generation. The fitness function $f(\vec{x})$ used to evaluate the fitness of each individual genotype is described in equation (6). The fitness calculation is also described in Algorithm 3.

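$$\text{Fitness Function}: \quad f(\vec{x}) = s \cdot \big( \tau - P(\vec{x}) \big) + ( 1 - s ) \cdot P(\vec{x}) \tag{6}$$

where

$$s = \begin{cases} 1 & \text{if } \big( \tau - P(\vec{x}) \big) = 0 \ (\text{or min.}),\ \vec{x} \text{ is feasible} \\ 0 & \text{otherwise} \end{cases}$$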

Algorithm 3 FITNESS_CALCULATION
Input: T: list of tokens; G: current genotype; S: labelled sentiment
Output: score: polarity score
sum ← 0
foreach gi ∈ G do
    /* 1 means include, 0 means exclude. Calculate polarity score of only the subset of T determined by G */
    if gi = 1 then
        TTi ← getPosTag(Ti)
        score ← getSentiWordnetScore(Ti, TTi)
        sum ← sum + score
    end
end
score ← (S − sum)
return score
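Algorithm 3 can be sketched in Python as below. To stay self-contained, the sketch takes precomputed per-token scores instead of calling a tagger and SentiWordNet. The comparison helper `is_fitter`, which treats a smaller absolute difference as better, is an assumption based on the "polarity distance" wording in the text.

```python
def fitness(token_scores, genotype, labelled_sentiment):
    """Return S - sum, where sum covers only tokens whose gene is 1 (Algorithm 3)."""
    total = sum(score for score, gene in zip(token_scores, genotype) if gene == 1)
    return labelled_sentiment - total


def is_fitter(fitness_a, fitness_b):
    """Assumed ordering: a candidate closer to the label (smaller |S - sum|) is fitter."""
    return abs(fitness_a) < abs(fitness_b)
```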

4) ALGORITHM AND ANALYSIS
The algorithm for GA based feature selection is shown in Algorithm 4 [29]–[31]. We run the simulation for a fixed number of generations so that the entire population can converge to a single optimal solution. In each generation, the steps that constitute the working of the GA are performed: crossover, mutation, offspring generation, and fitness evaluation. The processing of a single generation is described in Algorithm 5.

Algorithm 4 FEATURE_SELECTION_GA
Input: A finite list A = {a1, a2, . . . , an} of tokens and a labelled sentiment value T
Output: a list of optimal features
Let P be the initial randomly seeded population and k be the number of generations
numGenerations ← k
count ← 0
while count < numGenerations do
    GENERATE_NEXT_GEN_GA(P, A, T)
    count ← count + 1
end
return P0

The time complexity of GA depends upon the fitness function. There are two ways to implement the algorithm: either keep generating new populations until the solution is achieved, or fix the number of generations to a large enough number k such that the solution can converge before reaching that limit. For the sake of simplicity, the latter is used and the limit is set to k generations. Let Np and Na be the size of the population and the size of the keyword list, respectively. In our case, the value of k is 5000 and the value of Np is 40. This is only the initial capacity; the list is recreated to accommodate new elements. In Algorithm 4, the outer while loop runs for k generations and, in each iteration, it calls GENERATE_NEXT_GEN_GA (Algorithm 5). In Algorithm 5, the while loop at line 2 runs until the new population reaches size Np. Inside this while loop, there are other loops that iterate over each gene gj of each chromosome Pi in operations like crossover and fitness evaluation (they are not shown in the algorithms for the sake of clarity). These inner loops iterate over the size of each chromosome, Nc, which is equal to Na. So the complexity of each evolution of the GA is:

T(n) = k * Np * Nc    (7)

T(n) = O(Np * Nc)    (8)

by ignoring the constant value and all the lower order terms. Please note that this time complexity is subject to fixing the number of generations to some constant k.
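To make Equation (7) concrete, the following minimal sketch counts gene-level steps for the stated values k = 5000 and Np = 40. The chromosome lengths used here (1000 and 2000) are illustrative assumptions, not values reported for a particular experiment, and the function name is hypothetical.

```python
def gene_operations(k: int, n_p: int, n_c: int) -> int:
    """Gene-level steps for k generations of n_p chromosomes with n_c genes each (Eq. 7)."""
    return k * n_p * n_c


K = 5000   # number of generations, as stated in the text
N_P = 40   # population size, as stated in the text

# N_c equals the keyword-list size N_a; 1000 and 2000 are illustrative values only.
for n_c in (1000, 2000):
    print(n_c, gene_operations(K, N_P, n_c))
```

With k and Np fixed, doubling the chromosome length doubles the count, which is the linear dependence on Nc expressed by Equation (8).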

IV. RESULTS AND DISCUSSION This section presents and discusses our results. We first describe the software and hardware setup used for the evaluations. We then explain the evaluation parameters and discuss the performance of our system on them. We use several performance metrics, namely precision, recall, F-measure, and execution time. We further compare different ML classifiers, namely NB, J48, PART, SMO, IB-k, and JRip. Finally, we discuss the relative performance of our framework under three different approaches to sentiment analysis: SentiWordNet alone, ML, and the hybrid method with a GA optimized feature-set.


Algorithm 5 GENERATE_NEXT_GEN_GA
Input: Initial population P, A and target T
Pn ← φ (let Pn be the new population)
while Pn.size < P.size do
    Let i, j, k and l be 4 distinct random integers.
    Choose 4 chromosomes ch1, ch2, ch3, ch4 at these random indices from P.
    Check the fitness between ch1 and ch2, and between ch3 and ch4, and let the winners be the two parents.
    w1 ← winner12
    w2 ← winner34
    Perform uniform crossover on w1 and w2 with probability 0.5 and generate 2 new children child1 and child2.
    probmutate ← 0.01
    r ← random()
    if r < probmutate then
        m ← random(child1.size)
        if child1(m) = 1 then
            child1(m) ← 0
        else
            child1(m) ← 1
        end
        m ← random(child2.size)
        if child2(m) = 1 then
            child2(m) ← 0
        else
            child2(m) ← 1
        end
    end
    isChild1Good ← child1.CalculateFitness() is better than w1.CalculateFitness()
    isChild2Good ← child2.CalculateFitness() is better than w2.CalculateFitness()
    if isChild1Good then
        Pn.add(child1)
    else
        Pn.add(w1)
    end
    if isChild2Good then
        Pn.add(child2)
    else
        Pn.add(w2)
    end
end
P ← Pn
return
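The steps of Algorithms 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Java implementation: the fitness function is passed in as a parameter, a larger fitness value is assumed to be "better", the fittest chromosome of the final population is returned in place of P0, and `toy_fitness` with its `TARGET` mask is a hypothetical stand-in for the sentiment-based fitness.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]  # one bit per token: 1 = keep the feature, 0 = drop it
Fitness = Callable[[Chromosome], float]


def tournament(pop: List[Chromosome], fitness: Fitness,
               rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Draw 4 chromosomes at distinct indices; the winners of the two pairs become parents."""
    c1, c2, c3, c4 = rng.sample(pop, 4)
    w1 = c1 if fitness(c1) >= fitness(c2) else c2
    w2 = c3 if fitness(c3) >= fitness(c4) else c4
    return w1, w2


def uniform_crossover(a: Chromosome, b: Chromosome,
                      rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Swap each gene between the two parents with probability 0.5."""
    child1, child2 = [], []
    for ga, gb in zip(a, b):
        if rng.random() < 0.5:
            ga, gb = gb, ga
        child1.append(ga)
        child2.append(gb)
    return child1, child2


def next_generation(pop: List[Chromosome], fitness: Fitness, rng: random.Random,
                    p_mutate: float = 0.01) -> List[Chromosome]:
    """One generation in the spirit of Algorithm 5."""
    new_pop: List[Chromosome] = []
    while len(new_pop) < len(pop):
        w1, w2 = tournament(pop, fitness, rng)
        child1, child2 = uniform_crossover(w1, w2, rng)
        if rng.random() < p_mutate:
            # Flip one randomly chosen bit in each child.
            for child in (child1, child2):
                m = rng.randrange(len(child))
                child[m] = 1 - child[m]
        # A child enters the new population only if it beats its parent.
        new_pop.append(child1 if fitness(child1) > fitness(w1) else w1)
        new_pop.append(child2 if fitness(child2) > fitness(w2) else w2)
    return new_pop[:len(pop)]  # guard for odd population sizes (not in Algorithm 5)


def select_features(n_tokens: int, fitness: Fitness, generations: int,
                    pop_size: int, seed: int = 0) -> Chromosome:
    """Outer loop in the spirit of Algorithm 4; returns the fittest chromosome (assumption)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_tokens)] for _ in range(pop_size)]
    for _ in range(generations):
        pop = next_generation(pop, fitness, rng)
    return max(pop, key=fitness)


# Hypothetical fitness for demonstration: number of bits agreeing with a fixed mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(ch: Chromosome) -> float:
    return sum(1 for g, t in zip(ch, TARGET) if g == t)


best = select_features(n_tokens=8, fitness=toy_fitness, generations=200, pop_size=40)
print(best, toy_fitness(best))
```

Because a parent is replaced only by a fitter child, each slot of the new population is at least as fit as the tournament winner it came from. The best chromosome of a generation can still be lost if it is never drawn in a tournament.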

A. EXPERIMENTAL SETUP In order to evaluate the different approaches to sentiment analysis used in the proposed framework, we use three datasets: the UCI ML dataset for sentiment scoring [15], which consists of user reviews and their sentiment scores from three different websites; the Twitter labeled sentiment analysis dataset [16]; and a geopolitical dataset related to the 2016 United States Presidential Election [17].

The experiments are performed on a desktop computer with a 2.6 GHz Core i7 processor, 8 GB of RAM, and 1 TB of hard disk space. The framework is developed in Java, with the Eclipse IDE as the workbench.

B. REVIEWS DATASET In this section, we discuss each experiment, the relevant graphs, and the performance metrics on the reviews dataset from the UCI ML repository. This dataset contains 3000 instances, each labeled with 1 for positive or 0 for negative sentiment. The reviews were collected randomly from three larger review sources: movie reviews from IMDB, restaurant reviews from Yelp, and product reviews from Amazon [32].

1) FEATURES SIZE AND ACCURACY COMPARISON OF ML APPROACHES First of all, we perform an accuracy comparison of GA based and non-GA based ML approaches for the reviews from the three sources. Figure 3, Figure 4, and Figure 5 show the comparison of both ML techniques on IMDB, Amazon, and Yelp reviews, respectively, using six different classifiers.

FIGURE 3. Accuracy comparison of two feature selection techniques on IMDB movies reviews.

FIGURE 4. Accuracy comparison of two feature selection techniques on Amazon product reviews.


FIGURE 5. Accuracy comparison of two feature selection techniques on Yelp restaurants reviews.

We observe that the accuracy obtained with the GA optimized reduced feature-set is almost equal to that of the non-GA based technique, even though the GA reduces the feature-set by about 40%. We also observe that, out of these six classifiers, NB and SMO show close to 80% accuracy in both approaches.

FIGURE 6. Features size comparison of feature vector before and after using GA optimization.

In order to substantiate our claim that the GA based approach for optimal feature selection yields a significant feature size reduction while maintaining similar accuracy, we perform a feature size comparison experiment. Figure 6 shows the size of the feature-set before and after we performed GA optimization on feature selection. GA optimization reduces the feature size by almost 40%, which is significant. Importantly, Figures 3, 4, and 5 have already shown that the accuracy of both approaches is nearly the same, yet GA optimization gives us a reduced feature size. This has a significant impact on the scalability of the system, because using bag-of-words as the feature-set can become a huge bottleneck on larger datasets.
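The effect of a GA selected feature subset on a bag-of-words vector can be illustrated with a short sketch. The vocabulary, the binary mask, and the use of raw term counts are all assumptions made for illustration; the framework's actual feature encoding may differ.

```python
from typing import Dict, List


def bag_of_words(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Count how often each vocabulary term occurs in a tokenized document."""
    counts: Dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return [counts.get(term, 0) for term in vocabulary]


def apply_mask(vector: List[int], mask: List[int]) -> List[int]:
    """Keep only the positions whose mask bit is 1 (the selected features)."""
    if len(vector) != len(mask):
        raise ValueError("vector and mask must have the same length")
    return [v for v, keep in zip(vector, mask) if keep == 1]


vocabulary = ["good", "bad", "great", "slow", "tasty"]  # hypothetical vocabulary
mask = [1, 1, 0, 0, 1]                                  # hypothetical GA output
doc = "good food tasty and good service".split()

full = bag_of_words(doc, vocabulary)
reduced = apply_mask(full, mask)
print(full)     # [2, 0, 0, 0, 1]
print(reduced)  # [2, 0, 1]
```

Here the mask keeps 3 of 5 features, so every document vector shrinks by 40%, and the classifier trains on the shorter vectors.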

Figure 7 shows the scalability of our system in terms of execution time for the GA optimized ML approach to sentiment analysis. It also shows the parts of the execution and how much time is spent in each step. We observe that the GA takes almost 60%-70% of the total execution time. However, our basic assumption is to optimize space to achieve better scalability at the cost of execution time. The execution time spent in GA operations can be reduced by employing different parallelization techniques; however, these approaches are beyond the scope of this paper.

FIGURE 7. Scalability and Time consumption of different steps.

FIGURE 8. Precision, Recall, and F-measure comparison of six different classifiers using GA optimized features on IMDB dataset.

2) COMPARING ML CLASSIFIERS UNDER GA OPTIMIZATION We perform a relative comparison of six different ML classifiers using the GA enhanced feature-set. We use precision, recall, and F-measure as the performance metrics for the classifiers. The metrics are calculated for both positive ("P") and negative ("N") classified documents on each classifier. Furthermore, we perform the same tests for the reviews from the three different sources discussed earlier. The results on the IMDB dataset are shown in Figure 8. We observe that the NB classifier has the highest recall, 0.89 for the negative class, followed by SMO with 0.8 for the negative class. This is in alignment with our previous accuracy comparison of ML classifiers, which shows that NB has the highest accuracy under GA. For precision, we observe a similar trend: NB for the positive class has the highest precision of 0.85, followed by SMO with 0.77. For F-measure, NB for the negative class has the highest value of 0.787, closely followed by SMO for the negative class with 0.755. Thus, for the negative class, NB has the highest values and SMO is second best, while for the positive class, SMO has the highest F-measure of 0.73, closely followed by NB with an F-measure of 0.719.

IB-k results in an F-measure of 0.652 for the positive class and 0.538 for the negative class. JRip shows the lowest score, with an F-measure of 0.599 for the positive class and 0.546 for the negative class.

FIGURE 9. Precision, Recall, and F-measure comparison of six different classifiers using GA optimized features on Amazon dataset.

FIGURE 10. Precision, Recall, and F-measure comparison of six different classifiers using GA optimized features on Yelp datasets.

The results of the other two datasets are shown in Figure 9 and Figure 10. For the Amazon dataset, we observe that JRip has the highest recall of 0.914, closely followed by J48 with 0.89 (both on negatively classified documents). Similarly, J48 has the highest precision for positively classified documents, closely followed by JRip for the positive class, but their F-measure is affected by low recall. For F-measure, NB for the positive class has the highest value of 0.784, closely followed by NB for the negative class with 0.782.

Lastly, we evaluate the Yelp dataset. For recall, JRip for the negative class has the highest value of 0.94, closely followed by IB-k for the negative class. For precision, JRip for the positive class has the highest value of 0.856, followed by J48 for the positive class with 0.844. For F-measure, J48 for the negative class has the highest value of 0.771, followed by NB for the negative class with 0.74.

Across all these results, the overall performance is best with the NB classifier, closely followed by SMO and J48. Since we evaluate three different approaches to sentiment analysis next, we use NB for the ML approach because it showed the best accuracy among the classifiers.
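The per-class precision, recall, and F-measure values reported in this subsection follow the standard definitions, which can be sketched as below. The confusion counts are hypothetical and chosen only to show how the positive and negative classes are scored separately; they are not taken from any of the figures.

```python
from typing import Dict


def class_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, and F-measure for one class from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical confusion counts for a binary sentiment task.
tp, fp, fn, tn = 80, 20, 40, 60

positive = class_metrics(tp, fp, fn)
# For the negative class the roles swap: true negatives act as hits,
# false negatives as false alarms, and false positives as misses.
negative = class_metrics(tn, fn, fp)

print(positive)
print(negative)
```

A classifier with very high recall on one class but low precision, as noted for JRip and J48 above, ends up with a modest F-measure because the harmonic mean is pulled toward the smaller of the two values.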

3) COMPARING THREE DIFFERENT APPROACHES TO SENTIMENT ANALYSIS As discussed before, we include three different approaches to sentiment analysis in our framework. The first uses only the SentiWordNet (SWN) ontology to find the polarity score of each keyword (after POS tagging) and then aggregates the overall score of a document to determine whether the overall notion is positive or negative. The second approach is based on ML, in which we use a feature-set and different classifiers (as discussed in previous sections) to classify positive and negative documents. This ML approach is further divided into two variants based on how the feature selection is performed. The first variant uses bag-of-words as the feature vector, which means that all keywords with a polarity score attached are included in the feature vector. The second variant uses GA based optimized feature selection: each document is modeled onto the GA model, and the GA simulation is run for several hundred generations to find the optimal set of features, which results in the best sentiment classification.
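The lexicon-only approach described above can be sketched as follows. The miniature lexicon, its scores, the summation rule, and the decision threshold at zero are assumptions made for illustration; this is not the SentiWordNet API, and the actual aggregation used in the framework may differ.

```python
from typing import Dict, List, Tuple

TaggedToken = Tuple[str, str]  # (lemma, POS tag)

# Hypothetical miniature lexicon: (lemma, POS tag) -> (positive score, negative score).
LEXICON: Dict[TaggedToken, Tuple[float, float]] = {
    ("good", "a"): (0.75, 0.0),
    ("bad", "a"): (0.0, 0.625),
    ("love", "v"): (0.5, 0.0),
    ("boring", "a"): (0.0, 0.5),
}


def document_score(tagged_tokens: List[TaggedToken]) -> float:
    """Sum (positive - negative) polarity over all tokens found in the lexicon."""
    total = 0.0
    for token in tagged_tokens:
        pos, neg = LEXICON.get(token, (0.0, 0.0))  # unknown tokens contribute nothing
        total += pos - neg
    return total


def classify(tagged_tokens: List[TaggedToken]) -> int:
    """Return 1 for positive and 0 for negative; ties count as negative (assumption)."""
    return 1 if document_score(tagged_tokens) > 0 else 0


review_a = [("good", "a"), ("movie", "n"), ("boring", "a")]
review_b = [("bad", "a"), ("love", "v")]
print(document_score(review_a), classify(review_a))
print(document_score(review_b), classify(review_b))
```

Because the decision depends only on which words appear in the lexicon, documents with few covered keywords tend toward a score near zero, which is one reason a purely lexicon-based classifier can lag behind trained models.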

We evaluate all three approaches (SWN, ML with all features, and ML with GA optimized features) using precision, recall, and F-measure as performance metrics. In order to provide a more detailed analysis, we perform these evaluations on the three review datasets discussed before, namely IMDB, Amazon, and Yelp. Finally, we also evaluate the accuracy of these three approaches on all three datasets. An important point to note is that the results for the ML approach are obtained using only the NB classifier, since the previous discussion showed that NB gave the best results for ML with both the GA and non-GA approaches.

FIGURE 11. Precision, Recall, and F-measure of SentiWordNet based sentiment analysis on IMDB, Amazon, and Yelp reviews.

Figure 11 shows the results of sentiment analysis only using SWN polarity scoring and aggregation. We observe that Amazon dataset for the negative class shows the highest precision score of 0.71 which is closely followed by Yelp dataset for the negative class with a score of 0.67. Similarly, Yelp dataset for the positive class has the highest recall of 0.58 while IMDB dataset for the negative class came afterward with 0.577. For F-measure, we found that IMDB for the negative class has the highest score of 0.615 followed by Yelp for positive class with a score of 0.608.

The results for the second approach, ML using bag-of-words as features, are shown in Figure 12. We observe that IMDB for the positive class has the highest precision of 0.868, followed by Amazon for the negative class with 0.838. For recall, IMDB for the negative class shows the highest value of 0.904, followed by Amazon for the positive class with 0.866. Similarly, Amazon for the positive class shows the highest F-measure of 0.80, and IMDB for the negative class comes next with a slightly lower score of 0.79.

FIGURE 12. Precision, Recall, and F-measure of simple feature selection for sentiment analysis on NB classifier.

FIGURE 13. Precision, Recall, and F-measure of GA optimized feature selection for sentiment analysis on NB classifier.

The results of the third approach, optimized feature selection using GA, are shown in Figure 13. We observe that IMDB for the positive class has the highest precision of 0.86, followed by Amazon for the negative class with a score of 0.786. Similarly, IMDB for the negative class has the highest recall of 0.896, while Amazon for the positive class comes next with a score of 0.788. For F-measure, IMDB for the negative class has the highest score of 0.787, followed by Amazon for the negative class with a slightly lower score of 0.782.

Finally, we compare the accuracy of the three approaches used for sentiment analysis in our framework. The accuracy comparison is shown in Figure 14. The SWN approach has almost 50% accuracy at best, which is not feasible for real-time analysis. The ML approaches (with and without GA optimization) have an accuracy ranging from 74% to 78%. However, the GA optimized ML technique has a 40% reduced feature-set. Overall, GA optimized ML has its highest accuracy of 77.9% on the Amazon dataset, while non-GA based ML reaches around 78.3%.

We conclude that, in sentiment analysis, GA based optimal feature selection does not much affect the accuracy as compared to non-GA based feature selection. At the same time, the GA based approach reduces the feature-set size by about 40% as compared to the other approaches.

FIGURE 14. Accuracy comparison of three main approaches to sentiment analysis on IMDB, Amazon, and Yelp reviews.

4) COMPARISON OF GA WITH PCA AND LSA In the final part of our evaluation, we demonstrate that our GA based feature reduction technique performs better than PCA and LSA based feature reduction. Figure 15 shows the accuracy of all three feature reduction techniques on the Amazon dataset. The GA based approach has, on average, 15.38% better accuracy than PCA and 20.29% better accuracy than LSA. The NB classifier shows the largest accuracy difference, with a 24.08% increase over PCA and a 27.32% increase over LSA based feature reduction. Based on these observations, we conclude that our GA based technique is more effective than these two well-known approaches to feature reduction.

FIGURE 15. Accuracy comparison of GA based reduction with PCA and LSA based reduction techniques on Amazon dataset.

C. TWITTER DATASET In this section, we discuss the experiment, relevant graphs, and performance metrics on the Twitter dataset.

1) FEATURES SIZE AND ACCURACY COMPARISON OF ML APPROACHES First of all, we perform an accuracy comparison of GA based and non-GA based ML approaches. Figure 16 shows the comparison of both ML techniques on the Twitter dataset using six different classifiers. We observe that the accuracy obtained with the GA optimized reduced feature-set is almost equal to that of the non-GA based technique, even though the GA reduces the feature-set by about 42%. We also observe that, out of these six classifiers, NB and SMO show close to 80% accuracy in both approaches. On our Twitter dataset, for IB-k, NB, and JRip, the GA based feature reduction technique yields almost 4.3%, 2%, and 0.8% better accuracy, respectively, than the non-GA based feature selection.

FIGURE 16. Accuracy comparison of two feature selection techniques on tweets.

For the Twitter dataset, the number of features before performing GA is 2722; after GA optimization of the feature-set, the feature size decreases to 1562, which is a reduction of about 42%. We have already seen in Figure 16 that the accuracy of both approaches is the same, or even better with GA in the case of IB-k, NB, and JRip, while GA optimization gives us a reduced feature-set size. This has a significant impact on the scalability of the system.
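The reduction reported for the Twitter feature-set can be checked with a few lines of arithmetic. The function names are hypothetical; the feature counts 2722 and 1562 are the ones stated above.

```python
def reduction_percent(before: int, after: int) -> float:
    """Share of the original feature-set removed, in percent."""
    return 100.0 * (before - after) / before


def extra_percent(before: int, after: int) -> float:
    """How much larger the original set is, relative to the reduced set, in percent."""
    return 100.0 * (before - after) / after


print(round(reduction_percent(2722, 1562), 1))  # 42.6
print(round(extra_percent(2722, 1562), 1))      # 74.3
```

The two figures answer different questions: the reduced set is about 42.6% smaller than the original, whereas the original set holds about 74.3% more features than the reduced one.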

FIGURE 17. Scalability and Time consumption of different steps on Twitter dataset.

Figure 17 shows the scalability of our system in terms of execution time for the GA optimized ML approach to sentiment analysis. It also shows the parts of the execution and how much time is spent in each step. We observe that preprocessing takes almost 55%-60% of the total execution time. This is because Twitter data is noisy and requires more cleaning before it can be processed for ML. After preprocessing, GA also consumes a lot of time; however, our basic assumption is to optimize space to achieve better scalability at the cost of execution time. As discussed earlier, the execution time spent in GA operations can be reduced by employing different parallelization techniques.

2) COMPARING ML CLASSIFIERS UNDER GA OPTIMIZATION Figure 18 shows a relative comparison of six different ML classifiers using the GA enhanced feature-set. We observe that the IB-k classifier has the highest recall, 0.895 for positively classified tweets, followed by J48 with 0.892, but both have a very low precision for the positive class. NB has the highest precision for both the positive class (0.747) and the negative class (0.849). NB also has the highest F-measure for both classes: 0.807 for the positive class, closely followed by SMO with 0.786, and 0.767 for the negative class, followed by SMO with 0.753.

FIGURE 18. Precision, Recall, and F-measure comparison of six different classifiers using GA optimized features on tweets.

We found that the overall performance is best with the NB classifier, closely followed by SMO, which is in accordance with our previous results on the reviews dataset.

3) COMPARING THREE DIFFERENT APPROACHES TO SENTIMENT ANALYSIS Our framework contains three different approaches to sentiment analysis. We have already seen in the previous discussion that NB gave the best results for ML with both the GA and non-GA approaches, so Figure 19 shows the accuracy comparison of these three approaches to sentiment analysis.

The SWN approach has almost 56% accuracy at best, which is not feasible for real-time analysis. Overall, GA optimized ML has the highest accuracy of almost 79%, while non-GA based ML has around 77%. GA based optimal feature selection has thus improved the accuracy while also reducing the feature-set size by about 42%.

4) COMPARISON OF GA WITH PCA AND LSA Figure 20 shows the accuracy of all three feature reduction techniques on the Twitter sentiment dataset. The GA based approach has, on average, 10.4% better accuracy than PCA and 14.5% better accuracy than LSA. All the classifiers show better accuracy on the GA based feature-set except IB-k, which shows better accuracy with the PCA based feature-set. The NB classifier shows the largest accuracy difference, with a 21.8% increase over PCA and a 23.1% increase over LSA based feature reduction. Hence, these results indicate that our GA based technique is more effective than these two well-known approaches to feature reduction.

FIGURE 19. Accuracy comparison of three main approaches to sentiment analysis on Twitter dataset.

FIGURE 20. Accuracy comparison of GA based reduction with PCA and LSA based reduction techniques.

D. GEOPOLITICAL DATASET In this section, we discuss the experiment, relevant graphs, and performance metrics on a geopolitical dataset related to the 2016 United States Presidential Election. This dataset contains tweet IDs collected using candidates and key election hashtags. We use Hydrator [33], a desktop application that takes in tweet IDs and returns the corresponding data from Twitter as JSON. We selected tweets related to the first debate for our case study. To label this dataset, we use the emoticon method used by several researchers [34], [35], with the emoticons selected by Hu et al. [36]. To test the proposed framework, we randomly selected almost 1000 tweets that discuss multiple topics related to the first debate. This dataset can be used to measure public sentiments and views regarding various topics, with applications in areas including security and surveillance, law-and-order, and public administration.
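Emoticon-based labeling of the kind described above can be sketched as follows. The emoticon sets here are hypothetical placeholders rather than the list selected by Hu et al. [36], and discarding tweets with no emoticon or with conflicting emoticons is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical emoticon sets, for illustration only.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}


def emoticon_label(tweet: str) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None when no clear label can be assigned."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        return None  # no emoticon at all, or both polarities present
    return 1 if has_pos else 0


print(emoticon_label("great debate tonight :)"))
print(emoticon_label("that answer was awful :("))
print(emoticon_label("watching the debate"))
```

Labels obtained this way are noisy, since an emoticon does not always reflect the sentiment of the text, but they avoid manual annotation of the collected tweets.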

1) FEATURES SIZE AND ACCURACY COMPARISON OF ML APPROACHES First of all, we perform an accuracy comparison of GA based and non-GA based ML approaches. Figure 21 shows the comparison of both ML techniques on the geopolitical dataset using six different classifiers. We observe that the accuracy obtained with the GA optimized reduced feature-set is almost equal to that of the non-GA based technique, even though the GA reduces the feature-set by about 34%. On the geopolitical dataset, out of these six classifiers, IB-k and SMO show more than 90% accuracy in both approaches, and for IB-k and JRip, the GA based technique shows 3.1% and 1.19% better accuracy, respectively, than the non-GA based method, even with the decreased feature-set. The IB-k classifier with the GA based approach outperforms all the classifiers with an accuracy of 95.7%.

FIGURE 21. Accuracy comparison of two feature selection techniques using geopolitical data.

FIGURE 22. Time comparison for models training of two feature selection techniques.

The number of features before performing GA was 1899; after GA optimization of the feature-set, the feature size decreased to 1246, which is a reduction of about 34%. As stated earlier, this reduced feature-set has a significant impact on the scalability of the system, which is shown in Figure 22. This figure compares the time required to build the ML models. We can see that applying GA based feature selection significantly decreases the time required to build the model. In the case of PART, which is the slowest among all the classifiers, the GA based technique reduces the time by 37%. The largest speedup, 55%, is achieved with JRip. The time values for NB and IB-k are too small to be shown on the graph, so we exclude them. For the time required to preprocess and apply GA on the data, we found the same patterns as shown in Figure 17.

2) COMPARING ML CLASSIFIERS UNDER GA OPTIMIZATION Figure 23 shows a relative comparison of six different ML classifiers using the GA enhanced feature-set. We observe that the IB-k classifier has the highest recall, 0.978 for the positive class, and the highest precision, 0.975 for the negative class. This results in the overall highest F-measure for the IB-k classifier on both the positive and negative classes, and hence the highest accuracy among all six classifiers. JRip also has a good recall of 0.944 for the positive class, but low precision affects its F-measure. SMO shows the second highest F-measure for both the positive and negative classes. Overall, performance is best with the IB-k classifier, closely followed by SMO.

FIGURE 23. Precision, Recall, and F-measure comparison of six different classifiers using GA optimized features on tweets.

3) COMPARING THREE DIFFERENT APPROACHES TO SENTIMENT ANALYSIS Our framework contains three different approaches to sentiment analysis. On the geopolitical dataset, we found that the IB-k classifier gives the best results with both the GA and non-GA based approaches. Figure 21 also shows that GA based IB-k gives higher accuracy than non-GA based IB-k. We therefore choose the IB-k classifier to compare our three approaches. Figure 24 shows the accuracy comparison of the three approaches to sentiment analysis for the IB-k classifier.

The SWN approach has almost 39.84% accuracy at best, which is not feasible for practical application. Overall, GA optimized ML has the highest accuracy of almost 95.7%, while non-GA based ML has around 92.6%. GA based optimal feature selection has thus improved the accuracy while also reducing the feature-set size by about 34%.

FIGURE 24. Accuracy comparison of three main approaches to sentiment analysis on geopolitical dataset.

4) COMPARISON OF GA WITH PCA AND LSA Figure 25 shows the accuracy of all three feature reduction techniques on the geopolitical dataset. All the classifiers show better accuracy on the GA based feature-set except J48 and JRip, which show better accuracy with the PCA based feature reduction. On the geopolitical dataset, IB-k has the highest accuracy and, with the GA based feature-set, shows 4.4% better accuracy than with PCA and 5% better accuracy than with LSA. The SMO classifier shows the largest accuracy difference, with a 9.6% increase over PCA and a 40.2% increase over LSA based feature reduction. This further supports that our GA based technique is more effective than these two well-known approaches to feature reduction.

FIGURE 25. Accuracy comparison of GA based reduction with PCA and LSA based reduction techniques on geopolitical dataset.

V. CONCLUSION In this paper, we have presented the design, development, and evaluation of our integrated sentiment analysis framework in detail. We employed three different approaches to sentiment analysis: SWN, ML, and ML with GA optimized feature selection. We proposed and developed a GA based evolutionary model for feature selection. This novel approach resulted in 36% - 42% reduced feature size and about 5% increased efficiency as compared to a normal ML approach. We also presented a detailed evaluation of these approaches with respect to different datasets. Furthermore, our detailed analysis of different ML classifiers revealed that the NB classifier has the highest accuracy (about 80%) when using our GA based optimal feature selection on the Twitter and reviews datasets, while on the geopolitical dataset, IB-k outperformed all the classifiers with an accuracy of 95%.

Furthermore, we evaluated the scalability of our proposed technique using an execution time comparison. We found that our system showed a linear speedup with the increased dataset size. Although the selection of the optimal feature-set using GA took about 60% to 70% of the total execution time on the reviews dataset, it still remained linear and produced a feature-set 40% smaller than the original feature-set. The GA based feature-set speeds up the building of the classifier models by up to 55%.

In order to demonstrate the benefit of our feature reduction algorithm over other feature reduction techniques, we have provided an accuracy comparison of the GA based hybrid approach with PCA and LSA. The results showed that our GA based feature reduction achieved up to 15.4% increased accuracy over PCA and up to 40.2% increased accuracy over LSA. This strengthens our claim that our proposed algorithm is fast, accurate, and scales well as the dataset grows bigger.

We conclude that our sentiment analysis framework has proved to be a useful addition to the discipline of opinion mining. It provides the flexibility of choosing among three widely used sentiment analysis techniques according to custom needs. With the additional benefits of GA based optimization, it reduces feature size and improves efficiency while maintaining scalability. In the future, we aim to extend this framework for cyber-intelligence so that it can help generate recommendations for law-enforcement agencies based on user opinions.

REFERENCES

[1] P. DiMaggio, E. Hargittai, W. R. Neuman, and J. P. Robinson, "Social implications of the Internet," Annu. Rev. Sociol., vol. 27, pp. 307–336, Aug. 2001.

[2] C. Wang and P. Zhang, "The evolution of social commerce: The people, management, technology, and information dimensions," Commun. Assoc. Inf. Syst., vol. 31, no. 5, pp. 1–23, 2012.

[3] B. Pang and L. Lee, "Opinion mining and sentiment analysis," Found. Trends Inf. Retr., vol. 2, nos. 1–2, pp. 1–135, 2008.

[4] A. Davies and Z. Ghahramani, "Language-independent Bayesian sentiment mining of Twitter," in Proc. Workshop Social Netw. Mining Anal., 2011, pp. 99–107.

[5] R. Prabowo and M. Thelwall, "Sentiment analysis: A combined approach," J. Informetrics, vol. 3, no. 2, pp. 143–157, 2009.

[6] A. Collomb, C. Costea, D. Joyeux, O. Hasan, and L. Brunie, "A study and comparison of sentiment analysis methods for reputation evaluation," Tech. Rep. RR-LIRIS-2014-002, 2014.

[7] E. Boiy and M.-F. Moens, "A machine learning approach to sentiment analysis in multilingual Web texts," Inf. Retr., vol. 12, no. 5, pp. 526–558, 2009.

[8] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment analysis," Comput. Linguistics, vol. 37, no. 2, pp. 267–307, 2011.

[9] X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proc. Int. Conf. Web Search Data Mining, 2008, pp. 231–240.

[10] R. Feldman, "Techniques and applications for sentiment analysis," Commun. ACM, vol. 56, no. 4, pp. 82–89, 2013.

[11] A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proc. LREC, vol. 6, 2006, pp. 417–422.

[12] S. Wold, K. Esbensen, and P. Geladi, "Principal component analysis," Chemometrics Intell. Lab. Syst., vol. 2, nos. 1–3, pp. 37–52, 1987.

[13] S. T. Dumais, "Latent semantic analysis," Annu. Rev. Inf. Sci. Technol., vol. 38, no. 1, pp. 188–230, 2004.

[14] S. Goel, "Cyberwarfare: Connecting the dots in cyber intelligence," Commun. ACM, vol. 54, no. 8, pp. 132–140, Aug. 2011.

[15] (2015). UCI ML Repository - Sentiment Analysis Dataset. Accessed: Jun. 8, 2018. [Online]. Available: http://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences

[16] J. A. Bowden. (2016). Twitter Sentiment Analysis. Accessed: Jun. 8, 2018. [Online]. Available: https://old.datahub.io/dataset/twitter-sentiment-analysis

[17] J. Littman, L. Wrubel, and D. Kerchner. (2016). 2016 United States Presidential Election Tweet IDS. Accessed: Dec. 21, 2018. [Online]. Available: https://doi.org/10.7910/DVN/PDI7IN

[18] W. Medhat, A. Hassan, and H. Korashy, "Sentiment analysis algorithms and applications: A survey," Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, 2014.

[19] A. Khan, B. Baharudin, and K. Khairullah, "Sentiment classification using sentence-level lexical based semantic orientation of online reviews," Trends Appl. Sci. Res., vol. 6, no. 10, pp. 1141–1157, 2011.

[20] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, "Sentiment analysis of Twitter data," in Proc. Workshop Lang. Social Media, 2011, pp. 30–38.

[21] E. Kouloumpis, T. Wilson, and J. Moore, "Twitter sentiment analysis: The good the bad and the OMG!" in Proc. ICWSM, vol. 11, 2011, pp. 538–541.

[22] M. Pontiki et al., "SemEval-2016 task 5: Aspect based sentiment analysis," in Proc. 8th Int. Workshop Semantic Eval. (SemEval), 2014, pp. 27–35.

[23] P. C. S. Njølstad, L. S. Høysæter, W. Wei, and J. A. Gulla, "Evaluating feature sets and classifiers for sentiment analysis of financial news," in Proc. IEEE/WIC/ACM Int. Joint Conf. Web Intell. (WI) Intell. Agent Technol. (IAT), vol. 2, Aug. 2014, pp. 71–78.

[24] M. Govindarajan, "Sentiment analysis of movie reviews using hybrid method of naive Bayes and genetic algorithm," Int. J. Adv. Comput. Res., vol. 3, no. 4, pp. 139–145, 2013.

[25] A. McCallum. (1998). Rainbow Stopwords. Accessed: Jun. 8, 2018. [Online]. Available: http://www.cs.cmu.edu/~mccallum/bow/rainbow/

[26] M. McCandless, E. Hatcher, and O. Gospodnetic, Lucene Action: Covers Apache Lucene 3.0. New York, NY, USA: Manning Publications, 2010.

[27] M. F. Porter, "An algorithm for suffix stripping," Program, vol. 14, no. 3, pp. 130–137, 1980.

[28] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics, Syst. Demonstrations, 2014, pp. 55–60. [Online]. Available: http://www.aclweb.org/anthology/P/P14/P14-5010

[29] L. M. Schmitt, "Theory of genetic algorithms," Theor. Comput. Sci., vol. 259, nos. 1–2, pp. 1–61, May 2001.

[30] M. Mitchell, An Introduction to Genetic Algorithms. Cambridge, MA, USA: MIT Press, 1998.

[31] D. Beasley, D. R. Bull, and R. R. Martin, "An overview of genetic algorithms: Part 1, fundamentals," Univ. Comput., vol. 15, no. 2, pp. 56–69, 1993.

[32] D. Kotzias, M. Denil, N. De Freitas, and P. Smyth, "From group to individual labels using deep features," in Proc. 21th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 597–606.

[33] (2016). Hydrator. Accessed: Dec. 21, 2018. [Online]. Available: https://github.com/DocNow/hydrator

[34] A. Pak and P. Paroubek, "Twitter as a corpus for sentiment analysis and opinion mining," in Proc. LREC, 2010, pp. 1320–1326.

[35] A. Bifet and E. Frank, "Sentiment knowledge discovery in twitter streaming data," in Proc. Int. Conf. Discovery Sci. Berlin, Germany: Springer, 2010, pp. 1–15.

[36] X. Hu, J. Tang, H. Gao, and H. Liu, "Unsupervised sentiment analysis with emotional signals," in Proc. 22nd Int. Conf. World Wide Web, 2013, pp. 607–618.



FARKHUND IQBAL holds the position of Associate Professor and Director of the Advanced Cyber Forensics Research Laboratory in the College of Technological Innovation, Zayed University, United Arab Emirates. He holds a master’s degree (2005) and a Ph.D. degree (2011) from Concordia University, Canada. He uses machine learning and Big Data techniques for problem solving in healthcare, cybersecurity, and cybercrime investigation in the smart and safe city domain. He has published more than 80 papers in highly ranked peer-reviewed journals and conferences. He is an Affiliate Professor in the School of Information Studies, McGill University, Canada, and an Adjunct Professor in the Faculty of Business and IT, University of Ontario Institute of Technology, Canada. Dr. Iqbal is the recipient of several prestigious awards and research grants. He has served as a chair and TPC member of several IEEE/ACM conferences, as a guest editor of special issues, and as a reviewer for highly ranked journals. He is a member of several professional organizations, including ACM and IEEE Digital society.

JAHANZEB MAQBOOL HASHMI received the B.S. degree from the National University of Sciences and Technology, Pakistan, through the Prime Minister’s ICT Scholarship Program, and the M.S. degree in computer engineering from Ajou University, South Korea, through the Korean Global IT Fellowship. He is currently pursuing the Ph.D. degree with Ohio State University.

BENJAMIN C. M. FUNG received the Ph.D. degree in computing science from Simon Fraser University, in 2007. He is currently a Canada Research Chair in Data Mining for Cybersecurity, an Associate Professor with the School of Information Studies, McGill University, and a Co-Curator in cybersecurity with the World Economic Forum. He has over 120 refereed publications that span the research forums of data mining, machine learning, privacy protection, cyber forensics, and building engineering. His data mining works in crime investigation and authorship analysis have been reported by media worldwide. He is a Licensed Professional Engineer in software engineering. He is a Senior Member of both the IEEE and ACM.

RABIA BATOOL received the master’s degree in computer science from Kyung Hee University, South Korea. She is currently a Researcher with the College of Technological Innovation, Zayed University. Her research interests include data mining, machine learning, natural language processing, and information extraction, and she has published several research papers in these areas.

ASAD MASOOD KHATTAK received the M.S. degree in information technology from the National University of Sciences and Technology, Islamabad, Pakistan, in 2008, and the Ph.D. degree in computer engineering from Kyung Hee University, South Korea, in 2012. He joined the College of Technological Innovation, Zayed University, Abu Dhabi, in 2014, where he is currently an Assistant Professor. He was a Postdoctoral Fellow with the Department of Computer Engineering, Kyung Hee University, for seven months, and then he started working with the same college as an Assistant Professor. He is currently leading two research projects and collaborating on three research projects in the same research fields. He has authored/co-authored more than 50 journal and conference articles in highly reputed venues.

SAIQA ALEEM received the M.S. degree in computer science from the University of Central Punjab, Pakistan, in 2004, the M.S. degree in information technology from United Arab Emirates University, in 2013, and the Ph.D. degree in electrical and computer engineering from the University of Western Ontario, Canada, in 2016. She has many years of academic and industrial experience, holding various technical positions. She is currently an Assistant Professor with Zayed University. She is also a Microsoft, CompTIA, and Cisco certified professional with MCSE, MCDBA, A+, and CCNA certifications. Her research interests include game development process, cloud computing, software process assessment model, the IoT, and social network analysis.

PATRICK C. K. HUNG received the bachelor’s degree in computer science from the University of New South Wales, Australia, the master’s and Ph.D. degrees in computer science from The Hong Kong University of Science and Technology, and the master’s degree in management sciences from the University of Waterloo, Canada. He was with Boeing Research and Technology, Seattle, with a focus on aviation services-related research with two U.S. patents on a mobile network dynamic workflow system. Before that, he was a Research Scientist with Commonwealth Scientific and Industrial Research Organization, Australia. He is currently a Professor and the Director of International Programs with the Faculty of Business and Information Technology, University of Ontario Institute of Technology, Canada. He is a Founding Member of the IEEE Technical Committee on Services Computing and the IEEE TRANSACTIONS ON SERVICES COMPUTING.

