
Received April 2, 2020, accepted April 14, 2020, date of publication April 22, 2020, date of current version May 8, 2020.

Digital Object Identifier 10.1109/ACCESS.2020.2989424

Sentiment Analysis With Comparison Enhanced Deep Neural Network

YUAN LIN¹, JIAPING LI², LIANG YANG³, KAN XU², AND HONGFEI LIN²

¹Faculty of Humanities and Social Sciences, Dalian University of Technology, Dalian 116024, China
²Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116024, China
³Department of Computer Science, Dalian University of Technology, Dalian 116024, China

Corresponding author: Liang Yang ([email protected])

This work was supported in part by the Natural Science Foundation of China under Grant 61976036, Grant 61702080, and Grant 61632011, in part by the National Key Research and Development Program of China under Grant 2018YFC0832101, in part by the Postdoctoral Science Foundation of China under Grant 2018M631788, and in part by the Fundamental Research Funds for the Central Universities under Grant DUT19RC(4)016.

ABSTRACT Sentiment analysis is a significant task in Natural Language Processing. It refers to classifying text according to its emotional tendency by extracting text features. Existing results show that models based on RNN and CNN perform well. To improve the performance of text sentiment analysis, we reformulate the classification task as a comparison problem and propose Comparison Enhanced Bi-LSTM with Multi-Head Attention (CE-B-MHA). Classifying through a comparison mechanism can be more efficient than performing complex calculations. In this model, bidirectional LSTM is used for initial feature extraction, and Multi-Head Attention extracts valuable information from different dimensions and representation subspaces. The comparison mechanism scores the feature vectors by comparing them with labeled vectors. The experimental results show that CE-B-MHA performs better than many existing models on three sentiment analysis datasets.

INDEX TERMS Sentiment analysis, machine learning, neural networks.

I. INTRODUCTION

Today, countless text messages are produced on the internet every day. On social media, people express their opinions in text form, and film reviews are likewise presented as text. This large and content-rich body of text can become an important data resource.

The short texts published on various platforms contain strong emotional tendencies and reflect the diverse views held by users. Sentiment analysis of these massive text messages is of great value to various industries. For example, by analyzing citizens’ different attitudes towards the same news event, the government can understand the public’s opinions on social events and related policies. By analyzing users’ attitudes towards a product function, the manufacturer can improve the product more readily. The potential value of sentiment analysis attracts extensive attention from researchers in different fields, such as data mining and natural language processing. Sentiment analysis of user-generated text has become a research hotspot in these fields.

The associate editor coordinating the review of this manuscript and approving it for publication was Huazhu Fu.

Sentiment analysis [1], also known as opinion mining [2], is a research field that analyzes people’s subjective feelings, such as emotions, evaluations, opinions and attitudes, towards products, services, organizations, individuals, events, subjects and their attributes. Text sentiment analysis is one of the most important tasks in the field of natural language processing. Neural network models based on Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) have been found to extract the context relations of sentences effectively. The classification accuracy of such a model can be improved by introducing an attention mechanism to filter out irrelevant information.

The attention mechanism was first proposed in the field of computer vision [3] and gradually entered the field of natural language processing [4]. The purpose of this mechanism is to allow the model to focus on important objects, which is represented mathematically as a weighted sum. The Multi-Head Attention mechanism used in this paper was proposed in “Attention is all you need” [5], published by a Google machine translation team in 2017.

Sequential information is particularly important for natural language processing tasks. It represents the logic and structure of the text. Although the Multi-Head Attention mechanism can obtain global structure, it has difficulty capturing sequence information effectively. In this paper, Bi-LSTM is introduced to capture sequence information, recover the order of the text, and improve the model.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

LSTM, a special type of recurrent neural network [6], has achieved great success in solving many problems. The LSTM network introduces a self-cycling mechanism that makes it easier to learn long-term dependencies than a simple recurrent structure. The bidirectional LSTM (Bi-LSTM) model feeds the sequence into the LSTM network in both the forward and backward directions, which better captures the contextual order information of the sequence [7].

Existing sentiment analysis models based on deep neural networks have little to do with psychology and have to use too many parameters. On the basis of a deep neural network, we introduce the comparison mechanism, a psychological concept, to enhance the learning ability of the model. It relies on comparisons between texts themselves instead of on a large number of parameters. Comparison is the most intuitive and effective way of classifying things in people’s daily life, and people often learn new things by comparing. In this paper, we reformulate the sentiment analysis task as a comparison problem. The sentiment analysis task aims to train a model that maps the features of an item to a score. We obtain the score by comparing the item with labeled samples, instead of fitting hard-to-learn patterns.

In this paper, Comparison Enhanced Bi-LSTM with Multi-Head Attention (CE-B-MHA) is proposed to improve the performance of sentiment analysis. CE-B-MHA combines the ability of Multi-Head Attention to obtain global information with the ability of Bi-LSTM to obtain local sequence information, and enhances it with a comparison mechanism.

The main contributions of our work can be summarized as follows:

• We combine Bi-LSTM with Multi-Head Attention and propose a new model with good performance. We use Bi-LSTM to capture the context relationship, and use Multi-Head Attention to capture long-distance text features.

• We propose a text sentiment analysis model, namely CE-B-MHA. On the basis of a deep neural network built from Bi-LSTM and Multi-Head Attention, we use a comparison mechanism to enhance the model and improve its performance.

• We use public datasets to verify the effectiveness of our model and compare it with current methods. Experimental results show that CE-B-MHA improves performance compared with several baselines.

The rest of our paper is structured as follows: Section II discusses related works, Section III gives details of our approach, Section IV gives our experimental results, and Section V summarizes this work.

II. RELATED WORKS

A. SENTIMENT ANALYSIS TASK

The sentiment analysis task can be divided into binary classification and multi-class classification according to the classification objective. In many cases, researchers divide the emotional polarity of text into positive and negative categories, which is commonly known as the binary classification task. Generally, positive text indicates a positive emotional tendency, while negative text indicates a negative emotional tendency. This simple classification method can be applied in many real situations, such as analyzing whether users view a certain commodity or literary work favorably, or gauging public opinion on a certain social event. In research, binary classification is also a common task for evaluating the classification ability of models.

In addition, multi-class classification is also an important direction. It can be divided into emotional level classification and fine-grained emotion classification. Emotional level classification divides the emotional tendency of text into several levels from negative to positive. Classifying emotions into levels 1 to 5, for example, is a five-class problem. Fine-grained emotion classification assigns text to categories of emotion. There is no common criterion for this classification, and the categories are generally chosen according to the research question, for example joy, anger, sadness, surprise, disgust, fear and neutrality.

B. SENTIMENT ANALYSIS METHODS

The methods used in text sentiment analysis can be divided into two categories: the emotion dictionary method and the machine learning method. Constructing an emotion dictionary and using it as a tool is the traditional way to judge the emotional polarity of texts [8]. Most emotion dictionaries need to be constructed manually. The principle is to collect words with an emotional tendency into a dictionary. Emotional words are strong indicators and are one of the important signals that a text contains an emotional tendency. When a text is entered, it is matched against the contents of the dictionary, and the emotional words found in the text determine its emotional polarity. However, the emotion dictionary method has some limitations. It covers too few forms of emotional expression and cannot keep up with emerging forms of expression, which makes the accuracy of its emotional judgments relatively low.
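The dictionary-matching procedure described above can be sketched in a few lines. This is only a minimal illustration: the tiny `LEXICON`, its weights, and the function name `lexicon_polarity` are hypothetical and are not taken from any published emotion dictionary.

```python
# Hypothetical toy lexicon: word -> polarity weight (assumed values).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "boring": -1, "hate": -1}

def lexicon_polarity(text):
    """Sum the polarity of every dictionary word found in the text."""
    tokens = text.lower().split()
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"  # no emotional word matched, or the matches cancel out

print(lexicon_polarity("I love this great film"))  # positive
print(lexicon_polarity("this movie slaps"))         # unknown
```

The second call shows the coverage limitation: an emerging expression that is absent from the dictionary contributes nothing, so the text cannot be classified.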

Nowadays, machine learning is a common method for researchers to analyze text emotion. The computer processes the text, extracts text features and outputs the sentiment analysis result. Machine learning methods have obvious advantages over the emotion dictionary method, which relies heavily on manual work. Machine learning methods for text sentiment analysis are mainly divided into supervised and unsupervised sentiment analysis. The method proposed in this paper is a supervised sentiment analysis method, which is briefly introduced in the following part.

The basic principle of supervised sentiment analysis is to train a model on labeled text and then use the trained model to conduct sentiment analysis on unlabeled text. In addition to traditional machine learning methods, such as the support vector machine, there are also deep learning methods such as CNN and RNN. Pang et al. [9] applied three representative classifiers (support vector machine, Naive Bayes and maximum entropy) to the text sentiment analysis task and reported high accuracy. Kim [10] proposed a CNN for text classification, which became one of the important baselines for the sentiment analysis task. Brueckner and Schuller [11] used Bi-LSTM, whose bidirectional propagation mechanism makes it possible to obtain both historical and future information. Tang et al. [12] used two different RNNs to conduct sentiment analysis that combines texts and themes. Wang et al. [13] proposed an attention-based Long Short-Term Memory network for aspect-level sentiment classification. The attention mechanism can concentrate on different parts of a sentence when different aspects are taken as input. Baziotis et al. [7] proposed a deep LSTM with attention (D-LSTM) to improve model performance. Shen et al. [14] proposed a novel LSTM, called ON-LSTM, for natural language processing problems. The neurons in this LSTM are specifically ordered to express richer information. Du et al. [15] proposed a new network architecture, called CRAN, which combines a recurrent neural network with a convolution-based attention model and further stacks an attention-based neural model to build a hierarchical sentiment classification model. In recent years, more and more new methods have emerged in this field. Many researchers have recognized the advantages of machine learning methods and applied them to text sentiment analysis to improve classification accuracy.

III. MODEL DESCRIPTION

CE-B-MHA first generates word vectors from the text and feeds them into the Bi-LSTM network. Bi-LSTM captures an initial representation of the context relationships in the encoded word sequence. The Multi-Head Attention mechanism in the model then captures long-distance text features effectively. Finally, the comparison mechanism scores each item by comparing it with samples, instead of fitting hard-to-learn patterns.

A. WORD EMBEDDING

Before text enters the network, it needs to be converted into word vectors for computer processing. Therefore, an Embedding Layer is added to encode the text. Word2Vec is a commonly used tool for training word vectors [16], which can convert a word into vector form quickly and effectively according to a given corpus. In this paper, Word2Vec is used in advance to process the text and generate the required word vectors. After the text is loaded, each sentence is divided into words, and the stop words are removed. In the embedding layer, the word vectors are read and input into the model as initialization values.
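The preprocessing steps above (split into words, remove stop words, look up word vectors) can be sketched as follows. All names here are illustrative assumptions: `STOP_WORDS` is a toy list, `word_vectors` holds random values standing in for pretrained Word2Vec vectors, and mapping unseen words to a zero vector is one possible convention that the text does not specify.

```python
import numpy as np

STOP_WORDS = {"the", "a", "is", "of"}   # assumed toy stop-word list
EMBED_DIM = 4                            # assumed toy dimensionality

# Stand-in for pretrained Word2Vec vectors (random values, illustration only).
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=EMBED_DIM) for w in ["movie", "great", "plot", "dull"]}

def sentence_to_matrix(sentence):
    """Split into words, drop stop words, and stack one vector per remaining word."""
    tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
    unknown = np.zeros(EMBED_DIM)        # assumption: zero vector for unseen words
    return tokens, np.stack([word_vectors.get(t, unknown) for t in tokens])

tokens, matrix = sentence_to_matrix("The movie is great")
print(tokens, matrix.shape)
```

The resulting matrix has one row per remaining word and is the kind of input the Bi-LSTM layer consumes.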

B. Bi-LSTM

We introduce an LSTM network to capture the contextual order information of sequences. LSTM [6] is a common method for processing sequence data. It introduces a self-cycling mechanism that makes it easier to learn long-term dependencies than a simple recurrent structure.

LSTM is a special type of recurrent neural network whose internal structure is called the “LSTM cell”. In the model, “gate” structures are used to pass information selectively. A gate consists of a sigmoid layer, whose outputs lie in [0,1], and a multiplication operation that removes information from or adds information to the cell state. Each LSTM cell has three gates, the Forget Gate, the Input Gate and the Output Gate, which perform different functions to control the cell state.

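$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{1}$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{2}$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$

$$\hat{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$

$$c_t = f_t \ast c_{t-1} + i_t \ast \hat{c}_t \tag{5}$$

$$h_t = o_t \ast \tanh(c_t) \tag{6}$$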

In the LSTM cell, $x_t$ represents the input of the current cell, $h_{t-1}$ the output of the previous cell, and $h_t$ the output of the current cell; $c_{t-1}$ and $c_t$ represent the previous and current cell states. $f_t$, $i_t$ and $o_t$ are the outputs of the three gates.

The model uses the Forget Gate to determine what information to discard from the cell state. The gate takes $h_{t-1}$ and $x_t$ as input and outputs a weight between 0 and 1 for each number in the cell state $c_{t-1}$: 1 means “completely retained” and 0 means “completely discarded”. The Input Gate determines how to add new information to the cell state. A tanh layer generates candidate value vectors, which are multiplied by the output of the sigmoid layer to determine the values that should be added to the cell state. Adding these values updates the old cell state to $c_t$. The Output Gate determines the output value based on the cell state $c_t$: the cell state is processed by tanh and multiplied by the output of the sigmoid layer to get the cell output $h_t$. LSTM can output all the hidden vectors from $h_1$ to $h_t$. The LSTM cell can be abstracted as a function with inputs $h_{t-1}$ and $x_t$ and output $h_t$, which can be expressed as follows:

$$h_t = \mathrm{LSTM}(x_t, h_{t-1}) \tag{7}$$
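As a hedged illustration of Eqs. (1) to (6), the following NumPy sketch performs LSTM steps over a toy sequence. The parameter layout (dictionaries keyed by gate name), the helper `init_params`, and the random initial values are assumptions made for this example. Unlike the abstract form in Eq. (7), the function also passes the cell state explicitly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM step following Eqs. (1)-(6)."""
    W, U, b = params["W"], params["U"], params["b"]             # dicts keyed by gate name
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # Eq. (1), forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # Eq. (2), input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # Eq. (3), output gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])    # Eq. (4), candidate values
    c_t = f_t * c_prev + i_t * c_hat                            # Eq. (5), new cell state
    h_t = o_t * np.tanh(c_t)                                    # Eq. (6), cell output
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases for the four layers."""
    rng = np.random.default_rng(seed)
    gates = "fioc"
    return {
        "W": {g: rng.normal(scale=0.1, size=(hidden_dim, input_dim)) for g in gates},
        "U": {g: rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)) for g in gates},
        "b": {g: np.zeros(hidden_dim) for g in gates},
    }

params = init_params(input_dim=4, hidden_dim=3)
h, c = np.zeros(3), np.zeros(3)
for x_t in np.ones((5, 4)):        # a toy sequence of five 4-dimensional inputs
    h, c = lstm_cell(x_t, h, c, params)
print(h.shape, c.shape)
```

Because the output gate lies in (0, 1) and tanh lies in (-1, 1), every component of `h` stays strictly between -1 and 1.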

The bidirectional LSTM (Bi-LSTM) can extract the features of sequences from both the forward and backward directions, and can better capture the contextual order information of sequences. In this paper, Bi-LSTM is adopted to extract local order information of text, and the forward LSTM and backward LSTM are combined to form Bi-LSTM. The forward and backward LSTMs produce the outputs $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, respectively.

$$\overrightarrow{h_t} = \mathrm{LSTM}(x_t, \overrightarrow{h_{t-1}}) \tag{8}$$

$$\overleftarrow{h_t} = \mathrm{LSTM}(x_t, \overleftarrow{h_{t-1}}) \tag{9}$$

$$h_t = \mathrm{Concat}(\overrightarrow{h_t}, \overleftarrow{h_t}) \tag{10}$$


FIGURE 1. Comparison Enhanced Bi-LSTM with Multi-Head Attention (CE-B-MHA).

The inputs of the forward LSTM cell are $\overrightarrow{h_{t-1}}$ and $x_t$, and its output is $\overrightarrow{h_t}$. Similarly, the inputs of the backward LSTM cell are $\overleftarrow{h_{t-1}}$ and $x_t$, and its output is $\overleftarrow{h_t}$. Splicing (Concat) $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ together generates a new vector that contains all the forward and backward information. The final output of the Bi-LSTM layer is a matrix made up of the vectors $h_1$ to $h_t$.
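The bidirectional wiring of Eqs. (8) to (10) can be sketched independently of the cell itself. In the sketch below, `make_step` builds a plain tanh recurrence as a stand-in for the LSTM cell (an assumption made purely to keep the example short); the names `run_direction` and `bidirectional` are likewise hypothetical.

```python
import numpy as np

def run_direction(xs, step, hidden_dim):
    """Apply h_t = step(x_t, h_{t-1}) along xs and return all hidden states."""
    h = np.zeros(hidden_dim)
    outputs = []
    for x_t in xs:
        h = step(x_t, h)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional(xs, fwd_step, bwd_step, hidden_dim):
    """Eqs. (8)-(10): run both directions and concatenate per time step."""
    h_fwd = run_direction(xs, fwd_step, hidden_dim)                # Eq. (8)
    h_bwd = run_direction(xs[::-1], bwd_step, hidden_dim)[::-1]    # Eq. (9), re-aligned to t
    return np.concatenate([h_fwd, h_bwd], axis=-1)                 # Eq. (10)

def make_step(input_dim, hidden_dim, seed):
    # Stand-in cell (plain tanh recurrence), NOT a full LSTM; assumed for brevity.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    U = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    return lambda x_t, h_prev: np.tanh(W @ x_t + U @ h_prev)

xs = np.random.default_rng(0).normal(size=(6, 4))    # 6 time steps, 4 features
fwd = make_step(4, 3, seed=1)
bwd = make_step(4, 3, seed=2)
H = bidirectional(xs, fwd, bwd, hidden_dim=3)
print(H.shape)   # one row per time step, forward and backward halves side by side
```

Reversing the backward outputs before concatenation keeps row t of the matrix aligned with word t, so each row combines what precedes and what follows that word.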

C. MULTI-HEAD ATTENTION

The Multi-Head Attention (MHA) mechanism is used to fully capture long-distance features and obtain global information. The output vectors of the Bi-LSTM layer, $h_1$ to $h_t$, are combined to form a matrix, which serves as all three inputs of MHA, named Q (Query), K (Key) and V (Value), so that Q = K = V, as shown in FIGURE 1.

$$Q = K = V = [h_1, h_2, \cdots, h_t] \tag{11}$$

$$\mathrm{SDPA}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_K}}\right) V \tag{12}$$

The Scaled Dot-Product Attention (SDPA) computes the dot product between Q and K and divides it by the scale factor $\sqrt{d_K}$, where $d_K$ is the dimension of the key vectors. The purpose of $\sqrt{d_K}$ is to play a regulatory role, so that the inner product does not become too large. The softmax operation normalizes the result into a probability distribution, which is then multiplied by the matrix V.
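Eq. (12) can be written directly in NumPy. The sketch below is an illustration under assumed toy shapes; the function names `softmax` and `sdpa` and the random input standing in for the Bi-LSTM outputs are choices made for this example.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sdpa(Q, K, V):
    """Scaled Dot-Product Attention, Eq. (12)."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # each row is a probability distribution
    return weights @ V, weights

H = np.random.default_rng(0).normal(size=(5, 8))   # 5 time steps, 8 features (toy values)
out, weights = sdpa(H, H, H)                        # Q = K = V, as in Eq. (11)
print(out.shape, weights.sum(axis=-1))
```

Each output row is a weighted sum of all rows of V, which is how a position can draw on information from any distance in the sentence.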

$$\mathrm{head}_i = \mathrm{SDPA}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}) \tag{13}$$

$$\mathrm{MHA}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \cdots, \mathrm{head}_h)\, W \tag{14}$$

In Multi-Head Attention, Q, K and V are linearly transformed h times with different parameters $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$, and the results are fed into Scaled Dot-Product Attention (SDPA). Each input is thus the original input after a linear transformation, which lets each result $\mathrm{head}_i$ learn features in a different representation space. After h rounds of Scaled Dot-Product Attention, the results $\mathrm{head}_1$ to $\mathrm{head}_h$ are spliced (Concat) into one large matrix. In this way, we get information from h different representation spaces. The spliced matrix is transformed linearly by the parameter W, and the obtained value is the result of Multi-Head Attention (MHA).
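A compact sketch of Eqs. (13) and (14) follows. The head size `d_head = d_model // n_heads`, the number of heads, and the random projection matrices are assumptions for illustration; the text does not state the values used in the experiments.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sdpa(Q, K, V):
    """Scaled Dot-Product Attention, Eq. (12)."""
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(Q, K, V, heads, W_out):
    """Eqs. (13)-(14): project, attend per head, concatenate, project again."""
    outputs = [sdpa(Q @ Wq, K @ Wk, V @ Wv) for Wq, Wk, Wv in heads]   # Eq. (13)
    return np.concatenate(outputs, axis=-1) @ W_out                    # Eq. (14)

rng = np.random.default_rng(0)
d_model, n_heads = 8, 2
d_head = d_model // n_heads          # assumed head size; not stated in the text
heads = [tuple(rng.normal(scale=0.1, size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]    # one (W_i^Q, W_i^K, W_i^V) triple per head
W_out = rng.normal(scale=0.1, size=(n_heads * d_head, d_model))

H = rng.normal(size=(5, d_model))    # stand-in for the Bi-LSTM outputs h_1..h_t
out = multi_head_attention(H, H, H, heads, W_out)
print(out.shape)
```

Giving every head its own projection triple is what allows the heads to attend in different representation subspaces before their outputs are merged by `W_out`.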

The Bi-LSTM layer can effectively obtain the context order information of sequences, while the Multi-Head Attention mechanism can learn information from different dimensions and representation subspaces and fully capture long-distance text features. They complement each other and can effectively improve the sentiment analysis ability of the model.

D. COMPARISON MECHANISM

We introduce a comparison mechanism to enhance the learning ability of the model. The comparison mechanism scores the sentence embedding generated by MHA by comparing it with samples. Positive and negative samples are selected from the labeled training data; in this paper they are selected at random. The numbers of positive and negative samples should be equal to obtain better results. The corresponding sentence vectors of these samples are generated and become part of the model.

A simple method is used to generate the sentence vectors of positive and negative samples. We get the word vectors of each sample from the Embedding Layer, and the sample vector is obtained by averaging all the word vectors in the sentence.

We use a neural network with a hidden layer as a similarity function to get a similarity score for classification. The activation functions of its layers are, respectively, “relu” and “sigmoid”. The input of the neural network is the concatenation (Concat) of the sentence embedding and a sample vector, and the output layer size is 1. Its hidden layer size V is the length of the sentence vector.

FIGURE 2. Details of comparison mechanism.

$$\mathrm{Score} = W_2\,(W_1\,\mathrm{Concat}(\mathit{sample}, s) + b_1) + b_2 \tag{15}$$

In the formula, the neural network is represented by two linear transformations with biases, and the parameters are $W_1$, $b_1$, $W_2$ and $b_2$; the activation functions are not written out. $s$ represents the sentence embedding, and $\mathit{sample}$ represents the sample vector.

A similarity score is calculated between every sample and the sentence embedding. These scores are integrated by calculating their weighted sum, which gives the result of the Comparison Mechanism.

$$r = \sum_{i=1}^{K} w_i\,\mathrm{Score}_i \tag{16}$$

In the formula, $w_i$ represents the weight of each score. We use one neural network layer to calculate the weights. K represents the number of samples selected.
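The comparison mechanism can be sketched end to end as follows, under several labeled assumptions. The hidden layer is assumed to use “relu” and the output layer “sigmoid”. The text does not say how the weight layer is wired, so here it is assumed to be a single dense layer over the score vector followed by a softmax. All parameter values are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def similarity_score(sample, s, W1, b1, W2, b2):
    """Eq. (15), with the activations assumed as hidden 'relu', output 'sigmoid'."""
    x = np.concatenate([sample, s])
    hidden = np.maximum(0.0, W1 @ x + b1)     # hidden layer, same length as s
    return sigmoid(W2 @ hidden + b2)          # output layer of size 1

def compare(s, samples, W1, b1, W2, b2, Wa, ba):
    scores = np.array([similarity_score(x, s, W1, b1, W2, b2) for x in samples])
    weights = softmax(Wa @ scores + ba)       # ASSUMED wiring of the weight layer
    return float(weights @ scores)            # Eq. (16), weighted sum of the scores

rng = np.random.default_rng(0)
d, K = 4, 6                                   # vector length, number of samples
# Each sample vector is the mean of that sample's word vectors (toy random words).
samples = [rng.normal(size=(int(rng.integers(3, 8)), d)).mean(axis=0) for _ in range(K)]
s = rng.normal(size=d)                        # stand-in for the sentence embedding from MHA

W1, b1 = rng.normal(scale=0.5, size=(d, 2 * d)), np.zeros(d)
W2, b2 = rng.normal(scale=0.5, size=d), 0.0
Wa, ba = rng.normal(scale=0.5, size=(K, K)), np.zeros(K)

r = compare(s, samples, W1, b1, W2, b2, Wa, ba)
print(r)
```

With softmax weights, `r` is a convex combination of sigmoid outputs and therefore lies between 0 and 1, which makes it usable as a binary classification score.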

IV. EXPERIMENTS

A. DATASETS

The experiment was performed on three datasets: the Large Movie Review Dataset [17], Semeval2017-task4-A English [18] and the Stanford Sentiment Treebank [19]. Their original training and test sets were used directly. 20% of the original training set was assigned as the validation set, and the rest was kept as the training set.
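The 80/20 division of the original training set can be sketched as below. Whether the data were shuffled before splitting, and the seed, are not stated in the text and are assumptions of this example, as is the function name.

```python
import random

def split_train_validation(examples, validation_fraction=0.2, seed=0):
    """Hold out a fraction of the original training set for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)      # assumption: shuffle before splitting
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_train_validation(list(range(100)))
print(len(train), len(val))   # 80 20
```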

The Large Movie Review Dataset (IMDB) is a commonly used IMDB comment sentiment analysis dataset, which contains 25,000 positive and 25,000 negative samples. Semeval2017-task4-A English (Semeval2017) is a dataset provided by task4 of the Semeval2017 competition, containing more than 7,000 positive and 3,000 negative samples. Stanford Sentiment Treebank (SST) is a sentiment analysis dataset provided by Stanford, containing 5,000 positive and 4,500 negative samples. For the datasets with three sentiments, the neutral samples were removed.

B. RESULTS

RNN, CNN, ON-LSTM [14], D-LSTM [7], and Bi-LSTM with attention were used as baselines. We ran ablation experiments on Bi-LSTM, MHA, B-MHA, CE, and CE-B-MHA, where B-MHA refers to Bi-LSTM with MHA and CE refers to the Comparison Mechanism used on its own.

We implemented the model using Keras, a popular Python-based deep learning tool. We use Adam [20] as the optimizer and cross entropy as the loss. TABLES 1, 2 and 3 show the experimental results for the three metrics, ACC, AUC and Precision, respectively. The item with the highest score in each column is highlighted in bold.
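For reference, the three reported metrics can be computed as in the following plain-Python sketch. The labels, scores, and the 0.5 decision threshold are made-up illustration values, not results from the paper, and the AUC is computed by its pairwise-ranking definition rather than by any particular library routine.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(predicted_pos) / len(predicted_pos) if predicted_pos else 0.0

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0]                       # toy labels
scores = [0.9, 0.4, 0.6, 0.3, 0.2]             # toy model outputs
y_pred = [int(s >= 0.5) for s in scores]       # assumed threshold of 0.5

print(accuracy(y_true, y_pred), precision(y_true, y_pred), auc(y_true, scores))
```

On this toy data the accuracy is 0.8, the precision is 1.0 and the AUC is 5/6. Precision depends on the decision threshold and on the class balance, whereas AUC depends only on how the scores rank positives against negatives.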

TABLE 1. The ACC of baselines and ablation experiments.

TABLE 2. The AUC of baselines and ablation experiments.

TABLE 1 shows the accuracy (ACC) on the three datasets. For IMDB, the ACC of CE-B-MHA is the highest among all models, about 1.4% higher than the baselines. After combining Bi-LSTM and MHA, B-MHA performs better than either one alone. After adding CE, the performance is further improved. For Semeval2017, the ACC of CE-B-MHA is again the highest among all models, about 3.1% higher than the baselines, and the ablation experiments give results similar to those on IMDB. For SST, the dataset is smaller and its sentences are shorter. Overly complex models are prone to serious overfitting, and the simpler model gets better results.

TABLES 2 and 3 show the AUC and Precision on the three datasets, respectively. For IMDB, the AUC and Precision of CE-B-MHA are the highest among all models. For Semeval2017, the AUC of CE-B-MHA is the highest among all models. Although CE-B-MHA achieves good results compared with the baselines, it is not optimal on all metrics. The enhancement effect of the Comparison Mechanism on Precision is weak. This may be due to an imbalance in the number of positive and negative samples.

TABLE 3. The Precision of baselines and ablation experiments.

FIGURE 3. The effect of comparison mechanism. We train CE-B-MHA on Semeval2017 with different sample sizes. This result is generated from the validation set.

C. THE EFFECT OF COMPARISON MECHANISM

We tested how different sample sizes affect the enhancement provided by the Comparison Mechanism. The sample size is the number of positive and negative sample pairs. FIGURE 3 shows the results of CE-B-MHA on Semeval2017 with different sample sizes. With fewer samples, the enhancement from the Comparison Mechanism is not obvious. When the sample size is higher than 35, the influence of sample size on model performance decreases.

V. CONCLUSION

We propose a method for text sentiment analysis named Comparison Enhanced Bi-LSTM with Multi-Head Attention (CE-B-MHA). Experiments show that CE-B-MHA improves sentiment analysis performance compared with existing classification models.

CE-B-MHA has a complex internal structure. It combines the ability of Multi-Head Attention to obtain global information with the ability of Bi-LSTM to obtain sequence information. The Bi-LSTM network is used to obtain the internal relation between the forward and backward directions of sentences and to obtain local order information. In addition, the Multi-Head Attention mechanism is used to fully capture long-distance features and learn relevant information from different dimensions and representation subspaces. Finally, CE-B-MHA introduces a comparison mechanism to enhance the learning ability of the model. The comparison mechanism scores items by comparing them with samples instead of fitting hard-to-learn patterns, and achieves good results.

REFERENCES

[1] T. Nasukawa and J. Yi, “Sentiment analysis: Capturing favorability using natural language processing,” in Proc. Int. Conf. Knowl. Capture (K-CAP), 2003, pp. 70–77.

[2] B. Liu, “Sentiment analysis and opinion mining,” Synth. Lectures Hum. Lang. Technol., vol. 5, no. 1, pp. 1–167, 2012.

[3] V. Mnih, N. Heess, and A. Graves, “Recurrent models of visual attention,” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2204–2212.

[4] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014, arXiv:1409.0473. [Online]. Available: http://arxiv.org/abs/1409.0473

[5] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, N. A. Gomez, and Ł. Kaiser, “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 5998–6008.

[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.

[7] C. Baziotis, N. Pelekis, and C. Doulkeridis, “DataStories at SemEval-2017 task 4: Deep LSTM with attention for message-level and topic-based sentiment analysis,” in Proc. 11th Int. Workshop Semantic Eval. (SemEval), 2017, pp. 747–754.

[8] J. Wiebe, T. Wilson, and C. Cardie, “Annotating expressions of opinions and emotions in language,” Lang. Resour. Eval., vol. 39, nos. 2–3, pp. 165–210, May 2005.

[9] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: Sentiment classification using machine learning techniques,” in Proc. ACL Conf. Empirical Methods Natural Lang. (ACL), vol. 10, 2002, pp. 79–86.

[10] Y. Kim, “Convolutional neural networks for sentence classification,” 2014, arXiv:1408.5882. [Online]. Available: http://arxiv.org/abs/1408.5882

[11] R. Brueckner and B. Schuller, “Social signal classification using deep BLSTM recurrent neural networks,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2014, pp. 4823–4827.

[12] D. Tang, B. Qin, X. Feng, and T. Liu, “Effective LSTMs for target-dependent sentiment classification,” 2015, arXiv:1512.01100. [Online]. Available: http://arxiv.org/abs/1512.01100

[13] Y. Wang, M. Huang, X. Zhu, and L. Zhao, “Attention-based LSTM for aspect-level sentiment classification,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 606–615.

[14] Y. Shen, S. Tan, A. Sordoni, and A. Courville, “Ordered neurons: Integrating tree structures into recurrent neural networks,” 2018, arXiv:1810.09536. [Online]. Available: http://arxiv.org/abs/1810.09536

[15] J. Du, L. Gui, Y. He, R. Xu, and X. Wang, “Convolution-based neural attention with applications to sentiment classification,” IEEE Access, vol. 7, pp. 22983–27992, 2019.

[16] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013, arXiv:1301.3781. [Online]. Available: http://arxiv.org/abs/1301.3781

[17] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proc. 49th Annu. Meeting Assoc. Comput. Linguistics, Hum. Lang. Technol., vol. 1, 2011, pp. 142–150.

[18] S. Rosenthal, N. Farra, and P. Nakov, “SemEval-2017 task 4: Sentiment analysis in Twitter,” 2019, arXiv:1912.00741. [Online]. Available: http://arxiv.org/abs/1912.00741

[19] R. Socher, A. Perelygin, J. Wu, and J. Chuang, “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2013, pp. 1631–1642.

[20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980

YUAN LIN received the B.S. and Ph.D. degrees from the Dalian University of Technology, China, in 2006 and 2012, respectively. He is currently an Associate Professor with the School of Public Administration and Law, Dalian University of Technology. His current research interests include information retrieval and machine learning.

JIAPING LI is currently pursuing the bachelor’s degree with the Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology. His current research interests include machine learning and sentiment analysis.

LIANG YANG received the B.S. and Ph.D. degrees from the Dalian University of Technology, China, in 2009 and 2017, respectively. He is currently a Lecturer with the School of Computer Science and Technology, Dalian University of Technology. His current research interests include machine learning and sentiment analysis.

KAN XU received the B.S., master’s, and Ph.D. degrees from the Dalian University of Technology, China, in 2005, 2009, and 2017, respectively. He is currently a Senior Engineer with the School of Computer Science and Technology, Dalian University of Technology. His current research interests include patent retrieval and learning to rank.

HONGFEI LIN received the B.S. degree from Northeastern Normal University, in 1983, the M.Sc. degree from the Dalian University of Technology, in 1992, and the Ph.D. degree from Northeastern University, in 2000. He is currently a Professor with the School of Computer Science and Technology, Dalian University of Technology. He is also the Director of the Information Retrieval Laboratory, Dalian University of Technology. He has published more than 100 research articles in various journals, conferences, and books. His research interests include information retrieval, text mining for biomedical literature, biomedical hypothesis generation, information extraction from huge biomedical resources, and learning-to-rank.
