Sentiment Classification and Aspect Based Sentiment Analysis On Yelp Reviews Using Deep Learning and Word Embeddings
To cite this article: Eman Saeed Alamoudi & Norah Saleh Alghamdi (2021): Sentiment
classification and aspect-based sentiment analysis on yelp reviews using deep learning and word
embeddings, Journal of Decision Systems, DOI: 10.1080/12460125.2020.1864106
1. Introduction
The web has become a very common source of information, a trend reinforced by the development of social media. The number of people regularly involved in social networking has been shown to be growing steadily (Hemmatian & Sohrabi, 2019). Blogs, tweets, articles, reviews, and social media conversations are analysed to collect people’s opinions. Online reviews are a type of user-generated content focusing on personal experiences with a product. They are considered the new generation of word-of-mouth, known as electronic word-of-mouth (eWOM; Jeong & Jang, 2011). Nowadays, online shopping websites provide forums for product reviews and for expressing opinions. People share their opinions on various topics in the form of comments, tweets, posts, and reviews (Hemmatian & Sohrabi, 2019). A study conducted by Li and Liu (2014) indicated that 81% of Internet users have searched for relevant comments at least once before purchasing a product, and that search rates for such comments ranged between 73% and 87% before using restaurants, hotels, and a range of other services. Such online content has a major impact on consumer decisions,
CONTACT Eman Saeed Alamoudi [email protected] College of Computers and Information Technology,
Taif University, Alhawiah, 888 Taif, Saudi Arabia
© 2021 Informa UK Limited, trading as Taylor & Francis Group
2 E. S. ALAMOUDI AND N. S. ALGHAMDI
as new consumers usually trust previous clients’ reviews more than the owners’ product advertisements or product details (Jeong & Jang, 2011). Both online users and companies have increasingly started paying attention to reviews. Online customers take previous customers’ experiences into account when choosing to buy a product or service. Moreover, public opinion about businesses plays an important role in marketing products, generating new opportunities, and forecasting sales. In addition, management teams within organisations need to evaluate this online feedback in order to determine the level of customer satisfaction.
Digital transformation (e.g. online shopping and social media platforms) integrates digital technology into business operations in order to solve problems related to old operating methods, attract new opportunities, and deliver valuable services to customers (Verhoef et al., 2021). Furthermore, incorporating big data analysis into business processes will lead to more accurate results through decision support systems (DSS; Osuszek et al., 2016). However, in the case of online reviews, their vast number makes it difficult for interested parties to read all of them and determine the opinions expressed and the quality of products. Sentiment analysis, a branch of artificial intelligence (AI), offers an automated way to classify the emotions expressed in review text. It is a method of extracting information from text that seeks to turn large, unstructured datasets into observable indices of sentiment (e.g. positive, negative, or neutral). The extracted information can be summarised as opinions presented in the form of numbers or graphs, which allows interested parties, such as managers or customers, to obtain the information they need quickly and easily. This motivates sentiment analysis and has generated growing interest in this field of research.
In this research, restaurant reviews from the Yelp website are analysed. This research employs natural language processing (NLP) and opinion mining, also known as sentiment analysis, to analyse online users’ reviews. The study examines the relationship between the content of reviews and the ratings that users assign to them, in order to automate sentiment classification by building models that can predict the sentiments expressed in the reviews.
The primary contributions of this paper are as follows:
(1) This paper compared three different categories of sentiment classification techniques: machine learning, deep learning and transfer learning.
(2) A novel unsupervised technique for aspect extraction, based on pre-trained language models and semantic similarity, has been applied; it eliminates the labour of data annotation and the need for supervised model training.
(3) The aspect polarity (or sentiment) has been detected, and the aspect average rating has been computed to enable a comparison of restaurants based on different aspects, such as food, service, ambience, and price, instead of comparing them only based on the overall rating.
(4) The analysis findings have been presented simply and conveniently to help customers find valuable information and make confident purchasing decisions and, on the other hand, to enable organisations to identify the level of customer satisfaction in order to make appropriate decisions.
The remainder of this paper is organised as follows: The Related Work section reviews
the latest work in the field. The Research Methodology section presents the background
JOURNAL OF DECISION SYSTEMS 3
of the methods used in this study. The Implementation and Experiment section describes the dataset and experimental settings. The Results and Evaluation section evaluates and discusses the results of the study. Finally, the conclusions, along with future work, are presented in the Conclusion and Future Work section.
2. Related work
All the approaches used in the field of sentiment classification and extraction can be
grouped into three main classes (Yadav & Vishwakarma, 2020): prediction-based methods,
lexicon-based methods, and hybrid methods.
Machine learning-based methods have been applied in various studies. Vairetti et al. (2020) proposed a modified version of the support vector machine (SVM) algorithm that introduces a new parameter, γ, to weight the contribution of each part of a review: the title and the body. To extract aspects from restaurant reviews (e.g. food, costs, service, environment, and anecdotes/miscellaneous), Kiritchenko et al. (2014) treated the problem as multi-label text classification and used five binary one-vs-all SVM classifiers.
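The one-vs-all decomposition used by Kiritchenko et al. (2014) can be sketched in plain Python: each aspect category becomes its own binary problem with an independent linear classifier, and a sentence receives every category whose classifier scores it positive. As a simplifying assumption, bag-of-words counts and a Pegasos-style linear SVM (hinge loss, subgradient descent, no bias term) stand in for the original features and SVM implementation; the toy sentences and the three categories are hypothetical.

```python
from collections import Counter

def featurize(text):
    # Bag-of-words counts: a simple stand-in for richer SVM features.
    return Counter(text.lower().split())

def binary_targets(label_sets, category):
    # One-vs-all: +1 if the sentence carries this category, -1 otherwise.
    return [1 if category in labels else -1 for labels in label_sets]

def score(w, x):
    # Linear decision value w . x over sparse feature dictionaries.
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_pegasos(features, targets, lam=0.1, epochs=20):
    # Pegasos-style subgradient descent on the L2-regularised hinge loss (no bias).
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            t += 1
            eta = 1.0 / (lam * t)
            violates_margin = y * score(w, x) < 1  # checked with the current weights
            w = {f: (1 - eta * lam) * v for f, v in w.items()}  # regularisation shrink
            if violates_margin:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def train_one_vs_all(texts, label_sets, categories):
    # One independent binary classifier per aspect category.
    feats = [featurize(t) for t in texts]
    return {c: train_pegasos(feats, binary_targets(label_sets, c)) for c in categories}

def predict(models, text):
    # Multi-label output: every category whose classifier gives a positive score.
    x = featurize(text)
    return {c for c, w in models.items() if score(w, x) > 0}

texts = ["the food was delicious", "the waiter was rude",
         "prices are too high", "tasty food and friendly staff"]
labels = [{"food"}, {"service"}, {"price"}, {"food", "service"}]
models = train_one_vs_all(texts, labels, ["food", "service", "price"])
print(predict(models, "delicious food"))
```

Because the binary problems are independent, a sentence can receive zero, one, or several categories, which is what makes the formulation multi-label.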
Lexicon-based methods have also been used in the literature, either on their own or jointly with prediction models. Yu et al. (2018) proposed a word vector refinement model that refines existing pre-trained word vectors using real-valued sentiment intensity scores provided by the extended version of the Affective Norms of English Words (E-ANEW) sentiment lexicon. Rezaeinia et al. (2019) introduced a novel method, improved word vectors (IWV), to increase the accuracy of pre-trained word embeddings in sentiment analysis. This is done by concatenating the traditional word2vec/GloVe (global vectors for word representation) embedding of every word in a sentence with three other embeddings, one of which is built from six different sentiment lexicons. To improve the accuracy and analysis speed of Twitter sentiment classification, Jianqiang et al. (2018) introduced a word embedding method called GloVe-DCNN (GloVe-deep convolution neural networks) that uses the AFINN lexicon. Fu et al. (2018) introduced the lexicon-enhanced LSTM (LE-LSTM) model to enhance LSTM networks for sentiment classification tasks on unlabelled datasets.
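The core operation behind IWV, concatenating a word's pre-trained embedding with additional feature vectors, can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional embeddings and the two lexicons are hypothetical, and only a lexicon-based extra vector is shown rather than all three extra embeddings used by Rezaeinia et al. (2019).

```python
def lexicon_features(word, lexicons):
    # One score per lexicon; 0.0 when the word is absent from that lexicon.
    return [lex.get(word, 0.0) for lex in lexicons]

def improved_word_vector(word, embeddings, lexicons, dim):
    # Concatenate the pre-trained embedding (zeros for out-of-vocabulary words)
    # with the lexicon-derived features.
    base = embeddings.get(word, [0.0] * dim)
    return list(base) + lexicon_features(word, lexicons)

# Hypothetical 3-dimensional embeddings and two hypothetical lexicons.
embeddings = {"great": [0.2, 0.7, -0.1], "bland": [0.1, -0.4, 0.3]}
lexicons = [{"great": 3.0, "bland": -1.0}, {"great": 0.8}]
print(improved_word_vector("great", embeddings, lexicons, dim=3))
# [0.2, 0.7, -0.1, 3.0, 0.8]
```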
The majority of recent research relies on deep learning techniques, which have repeatedly demonstrated their ability to obtain competitive results. One of the primary deep learning methods is the convolutional neural network (CNN), which was originally developed for computer vision. Xu et al. (2018) applied a novel and simple CNN model for aspect extraction that uses two forms of pre-trained embeddings: general-purpose embeddings and domain-specific embeddings. Giménez et al. (2020) proposed an approach named semantic-based padding to improve the performance of CNNs in NLP tasks.
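How a CNN scans text can be illustrated with a single 1D convolutional filter sliding over a sentence's word embeddings, followed by a ReLU and max-over-time pooling. This is a minimal pure-Python sketch with hypothetical two-dimensional embeddings and filter weights; it is not the architecture of Xu et al. (2018) or of any specific model, which learn many filters inside a deep learning framework.

```python
def conv1d(sequence, kernel, bias=0.0):
    # sequence: list of n word vectors, each of dimension d.
    # kernel: list of k weight vectors (dimension d), i.e. a filter spanning k words.
    k = len(kernel)
    feature_map = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        s = bias + sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        feature_map.append(max(0.0, s))  # ReLU activation
    return feature_map

def max_over_time(feature_map):
    # Keep the strongest response of the filter anywhere in the sentence.
    return max(feature_map)

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical embeddings of 3 words
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical filter of width 2
fmap = conv1d(sentence, bigram_filter)
print(fmap, max_over_time(fmap))  # [2.0, 1.0] 2.0
```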
Regarding the implementation of deep learning sequence models, the study by Rao et al. (2018) addressed one of the difficulties in deploying long short-term memory (LSTM) networks for document-level sentiment classification: modelling the semantic relations between sentences. To solve this problem, two improvements were introduced. The first is SR-LSTM, which stands for sentence representation LSTM. The second is SSR-LSTM (sorted SR-LSTM), which improves SR-LSTM by first removing sentences with weaker emotional polarity before the data are input to the SR-LSTM model. The idea is that although LSTMs can theoretically handle long sequences, they can still benefit from removing the parts of the sequence that are not relevant to the task at hand.
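The sentence-filtering idea behind SSR-LSTM can be sketched as follows: score each sentence's emotional polarity, keep only the most polarised sentences, and restore their original order before passing them to the sequence model. As an assumption for illustration, polarity strength is the absolute sum of word scores from a tiny hypothetical lexicon and the kept fraction is a free parameter; Rao et al. (2018) may score and select sentences differently.

```python
import math

# Hypothetical sentiment lexicon; a real system would use a full lexicon or a trained scorer.
LEXICON = {"great": 2.0, "delicious": 2.0, "rude": -2.0, "slow": -1.0}

def polarity_strength(sentence):
    # Absolute sum of word scores: a crude proxy for how emotionally charged a sentence is.
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return abs(sum(LEXICON.get(w, 0.0) for w in words))

def filter_sentences(sentences, keep_ratio=0.5):
    # Keep the most polarised fraction of sentences, restored to their original order.
    k = max(1, math.ceil(len(sentences) * keep_ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: polarity_strength(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

review = ["We arrived at seven.", "The food was delicious.",
          "Parking is nearby.", "The waiter was rude and slow."]
print(filter_sentences(review))
```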
A new form of deep contextualised word representation, embeddings from language models (ELMo), was introduced by Peters et al. (2018). It differs from conventional word embeddings in that every word is represented by a vector that is a function of the entire input sequence. A bidirectional LSTM (BiLSTM) network is used to train a language model (LM) on a large text corpus; each word in the downstream task is then represented by a weighted sum of the hidden vectors for that word across all BiLSTM layers. The downstream task model therefore learns the weights of this linear combination.
Howard and Ruder (2018) introduced ULMFiT (universal language model fine-tuning for text classification), an efficient form of transfer learning (TL) that can be applied to any NLP task. The algorithm has three key steps. First, an LM based on the state-of-the-art AWD-LSTM network is trained on a broad general-domain corpus to learn general language characteristics (Chen et al., 2010). Second, the LM is fine-tuned on the downstream task data to learn task-specific features, using discriminative fine-tuning and slanted triangular learning rates (STLR). Finally, two fully connected layers are added on top of the LM to form the classifier, which is fine-tuned on the task-specific data using STLR and gradual unfreezing of the layers, in order to preserve low-level representations while adjusting high-level ones.
Devlin et al. (2018) introduced bidirectional encoder representations from transformers (BERT) by pre-training a multi-layer bidirectional Transformer encoder based on the original implementation described by Vaswani et al. (2017). BERT training consists of two stages: pre-training and fine-tuning. BERT is pre-trained on two unsupervised tasks. The first is the masked language model (MLM), in which a percentage of the input tokens are randomly masked and the model’s goal is to predict the masked tokens.
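The MLM corruption step can be sketched in plain Python. The 15% selection rate and the 80/10/10 split (mask token, random token, unchanged token) follow Devlin et al. (2018); the function name, the toy vocabulary and the use of whole words instead of WordPiece token ids are simplifying assumptions.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    # Returns (corrupted tokens, targets); targets[i] is None where no prediction is made.
    if not tokens:
        return [], []
    rng = random.Random(seed)
    corrupted, targets = list(tokens), [None] * len(tokens)
    n_selected = max(1, round(len(tokens) * mask_rate))
    for i in rng.sample(range(len(tokens)), n_selected):
        targets[i] = tokens[i]        # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = "[MASK]"   # 80%: replace with the mask token
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, targets

tokens = "the food was great and the staff were friendly too".split()
corrupted, targets = mask_tokens(tokens, vocab=["food", "staff", "great", "slow"])
print(corrupted)
print(targets)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on the `[MASK]` symbol alone to know where predictions are required.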
The second task is next sentence prediction (NSP), which predicts whether sentence B actually follows sentence A in the original text. BERT is then fine-tuned for the downstream task by adding one output layer appropriate for the task and fine-tuning all parameters end to end.
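Returning to the ULMFiT steps described earlier, the slanted triangular learning rate (STLR) schedule and discriminative fine-tuning can be sketched as below. The formulas follow Howard and Ruder (2018), with their suggested values cut_frac = 0.1, ratio = 32 and a per-layer division factor of 2.6; the function names and the guard for very short runs are assumptions added for illustration.

```python
import math

def stlr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    # Slanted triangular learning rate at iteration t (Howard & Ruder, 2018).
    # max(1, ...) is an added guard so very short runs do not divide by zero.
    cut = max(1, math.floor(total_steps * cut_frac))
    if t < cut:
        p = t / cut                                      # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(lr_top, n_layers, factor=2.6):
    # Discriminative fine-tuning: each lower layer uses the rate of the layer above / 2.6.
    # Index 0 is the lowest layer, index n_layers - 1 the top layer.
    return [lr_top / factor ** (n_layers - 1 - i) for i in range(n_layers)]

schedule = [stlr(t, total_steps=1000) for t in range(0, 1001, 100)]
print([round(lr, 5) for lr in schedule])
print(discriminative_lrs(0.01, n_layers=3))
```

The rate rises quickly to lr_max over the first 10% of iterations and then decays slowly, which lets the model first move to a suitable region of parameter space and then refine its parameters.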
Our review of the literature found that sentiment classification and aspect extraction have not been studied satisfactorily on the Yelp dataset. No previous research has compared the performance of different types of prediction models (machine learning, deep learning, and transfer learning models) on two different classification tasks, binary classification and ternary classification. In addition, no simple way has been suggested to extract aspects from unlabelled datasets such as Yelp. This study aims to fill this research gap. It compares the efficiency of three different classification models using different feature extraction methods. Furthermore, it introduces a new approach to extracting aspects from restaurant reviews, which can easily be generalised to other aspect extraction domains.
3. Research methodology
This section discusses two methodologies, the first methodology is for review classifica
tion, and the second is the methodology of aspect extraction and polarity. The sequences
of the methodologies are described in Figures 1 and 2.
based on the approach of semantic similarity. To the best of our knowledge, we are the
first to use this method in the field of aspect extraction.
situation, further filtering was performed to retain only restaurants with over 500 reviews, as this criterion yields an appropriate number of reviews for comparative analysis. (2) SemEval-2014 Dataset: SemEval-2014 was used to examine the proposed aspect extraction approach, because the Yelp dataset is not annotated with the restaurants’ aspects. The SemEval-2014 dataset consists of annotated restaurant review data and was provided for Task 4 of SemEval-2014, Aspect-Based Sentiment Analysis (ABSA). It comprises 3,044 English sentences from the restaurant reviews of Ganu et al. (2009). The annotations include several details (e.g. aspect terms and aspect categories); this research focuses only on the aspect categories of each sentence. The dataset was downloaded from the META-SHARE website.3
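The restaurant filter described above (keeping only restaurants with more than 500 reviews) can be sketched as follows, assuming business records in JSON Lines form with `categories` and `review_count` fields, as in the Yelp business file. The category check and the function names are assumptions, and the earlier filtering steps are not reproduced here.

```python
import json

def load_businesses(lines):
    # Each non-empty line is assumed to hold one JSON object (JSON Lines format).
    return [json.loads(line) for line in lines if line.strip()]

def popular_restaurants(businesses, min_reviews=500):
    # Keep restaurants with strictly more than `min_reviews` reviews.
    kept = []
    for b in businesses:
        categories = b.get("categories") or ""  # may be missing or null
        if "Restaurants" in categories and b.get("review_count", 0) > min_reviews:
            kept.append(b)
    return kept

sample = [
    '{"name": "A", "categories": "Restaurants, Tea", "review_count": 1200}',
    '{"name": "B", "categories": "Restaurants", "review_count": 500}',
    '{"name": "C", "categories": "Gyms", "review_count": 900}',
]
print([b["name"] for b in popular_restaurants(load_businesses(sample))])  # ['A']
```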
represents a rating of 4 stars and above, whereas the negative class corresponds to 2 stars and below. The 3-star rating was assigned to the neutral class. Second, the Yelp dataset was adapted to the binary classification task (positive and negative reviews). The positive class represents a rating of 4 stars and above, whereas the negative class was assigned to 2 stars and below; the 3-star reviews were dropped. The data splitting mechanism is explained in detail in Table 1.
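The two labelling schemes can be sketched as a mapping from Yelp star ratings to classes; the function names and the (text, stars) input format are assumptions.

```python
def ternary_label(stars):
    # 3-class task: 4-5 stars positive, 1-2 stars negative, 3 stars neutral.
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return "neutral"

def binary_label(stars):
    # 2-class task: same thresholds, but 3-star reviews are dropped (None).
    label = ternary_label(stars)
    return None if label == "neutral" else label

def build_dataset(reviews, binary=False):
    # reviews: iterable of (text, stars) pairs.
    to_label = binary_label if binary else ternary_label
    labelled = [(text, to_label(stars)) for text, stars in reviews]
    return [(text, label) for text, label in labelled if label is not None]

reviews = [("Amazing tea!", 5), ("It was okay.", 3), ("Cold fries.", 1)]
print(build_dataset(reviews))
print(build_dataset(reviews, binary=True))
```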
Figure 3. A review text before and after the pre-processing stage for machine learning models.
'I like this restaurant !? .......' → 'I like this restaurant' → [[3, 40, 16, 75]]
Figure 4. A review text before and after the pre-processing stage for deep learning models.
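The deep learning pre-processing illustrated in Figure 4 (cleaning a review, then encoding it as integer ids) can be sketched as follows. The vocabulary, the id conventions (0 for padding, 1 for unknown words) and the padding length are hypothetical; in practice the vocabulary is built from the training data, so the ids differ from those shown in the figure.

```python
import re

def clean(text):
    # Replace punctuation and symbols with spaces, then collapse repeated whitespace.
    text = re.sub(r"[^A-Za-z0-9' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def encode(text, vocab, max_len, pad_id=0, oov_id=1):
    # Map lower-cased tokens to integer ids, then truncate or pad to max_len.
    ids = [vocab.get(token.lower(), oov_id) for token in clean(text).split()]
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

vocab = {"i": 2, "like": 3, "this": 4, "restaurant": 5}  # hypothetical vocabulary
raw = "I like this restaurant !? ......."
print(clean(raw))                     # I like this restaurant
print(encode(raw, vocab, max_len=6))  # [2, 3, 4, 5, 0, 0]
```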
and batch normalisation were applied after both 1D-convolutional layers and the 128-unit dense layer, with a dropout rate of 0.25 for the first 1D-convolutional layer and 0.1 for both the second 1D-convolutional layer and the 128-unit dense layer. Figure 5 shows the architecture layers of the CNN model.
The other hyperparameters were a maximum learning rate of 0.001 (1e−3), a weight decay (a regularisation technique for reducing overfitting) of 0.01, categorical cross-entropy loss, the Adam optimiser, a batch size of 256, and four training epochs.
Three different experiments were conducted using the CNN model. In the first experiment, the embedding layer’s weights were initialised randomly and fine-tuned during training. In the second experiment, the GloVe model (a word embedding model) from spaCy was used to initialise the weights of the embedding layer (30,000 rows, 300 columns), and these weights were frozen, that is, not trained (fine-tuned) during the experiment. In the third experiment, the same GloVe model from spaCy was used, but the weights were fine-tuned during the training phase.
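The three experiments differ only in how the embedding layer is initialised and whether it is updated during training. The sketch below builds such an embedding matrix from a word-to-id vocabulary and a dictionary of pre-trained vectors; the random range for missing words, the reserved padding row and the experiment labels are assumptions. With spaCy, the pre-trained rows could come from `nlp.vocab[word].vector`, and in Keras freezing corresponds to setting the embedding layer's `trainable` attribute to False.

```python
import random

def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    # vocab maps word -> integer id (id 0 reserved for padding);
    # pretrained maps word -> vector of length dim (e.g. GloVe vectors).
    rng = random.Random(seed)
    size = max(vocab.values()) + 1
    matrix = [[0.0] * dim for _ in range(size)]  # row 0 stays all zeros (padding)
    found = 0
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
            found += 1
        else:
            # Words without a pre-trained vector get small random values.
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return matrix, found

# Hypothetical labels for the three CNN experiments described above
# (for "random" initialisation, every row would be drawn randomly instead).
EXPERIMENTS = {
    "random_finetuned": {"init": "random", "trainable": True},
    "glove_frozen": {"init": "glove", "trainable": False},
    "glove_finetuned": {"init": "glove", "trainable": True},
}

vocab = {"food": 1, "great": 2, "teodora": 3}
pretrained = {"food": [0.1, 0.2], "great": [0.3, 0.4]}  # toy 2-d vectors instead of 300-d
matrix, found = build_embedding_matrix(vocab, pretrained, dim=2)
print(found, matrix[1], matrix[2])
```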
The second model is BERT; specifically, the BERT-base model was used. Before training BERT on the entire dataset, a sample of 25% of the data was selected to train the model as an initial training stage. The sample was drawn so that the classes appear in the same proportions as in the full dataset. The input length was set to 256 words, with an embedding layer of 30,000 rows and 768 columns. During this stage, the model was fine-tuned using the recommended hyperparameter values mentioned in Zhang et al. (2015): a maximum learning rate of 2e−5, Adam as the optimiser, four epochs, and a batch size of 32. In the second training stage, the BERT model, with the same architecture and the optimal weights obtained from the first stage, was trained on the full dataset. The hyperparameter values used in the second stage were a batch size of 32, a maximum learning rate of 1e−6, and one epoch. The third model is ALBERT (a lite BERT for self-supervised learning of language representations; Lan et al., 2019). We used the ALBERT-base model and applied the same training methodology and hyperparameter values as in the BERT experiment, except for the maximum learning rate: 5e−6 for the three-class dataset and 6e−6 for the binary dataset in the first stage, and 2e−7 for both datasets in the second stage. All experiments were carried out on an NVIDIA Tesla P100 16 GB GPU.
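The stratified 25% sample used in the first BERT training stage can be sketched as follows: the same fraction is drawn from every class, so class proportions match the full dataset. The seed and function name are assumptions; scikit-learn's `train_test_split` with its `stratify` argument is a common library alternative.

```python
import random
from collections import defaultdict

def stratified_sample(examples, fraction=0.25, seed=42):
    # examples: list of (text, label) pairs.
    # The same fraction is drawn from every class, so class proportions are preserved.
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    sample = []
    for label in sorted(by_label):  # sorted for reproducibility
        group = by_label[label]
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    rng.shuffle(sample)
    return sample

data = ([(f"pos review {i}", "positive") for i in range(80)]
        + [(f"neg review {i}", "negative") for i in range(20)])
subset = stratified_sample(data)
print(len(subset))
```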
determine the most significant aspects that customers discussed in their comments. The results showed three clear patterns, related to three aspects (food, service, and place), whereas the fourth cluster contained a mix of aspects with no single dominant aspect. Table 2 displays three example trigrams from each cluster.
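The clusters above are built from word trigrams. The sketch below covers only extracting and counting trigrams from review texts; the clustering algorithm and any cleaning steps are not described in this part of the text, so whitespace tokenisation and lower-casing are assumptions.

```python
from collections import Counter

def word_trigrams(text):
    # Contiguous sequences of three lower-cased, whitespace-separated words.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

def top_trigrams(texts, n=3):
    # Most frequent trigrams across a collection of reviews.
    counts = Counter(tg for text in texts for tg in word_trigrams(text))
    return counts.most_common(n)

reviews = ["The food was great", "the food was cold", "Friendly staff and fast service"]
print(top_trigrams(reviews, n=2))
```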
computed using the cosine similarity. An example of the semantic similarity implementation is explained in Table 3.
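The similarity step can be sketched as follows, assuming each sentence and each aspect label (e.g. Food, Service, Ambiance, Price) is represented by a single vector, such as the average of its word vectors. The averaging choice and the toy vectors are assumptions; in the described method the vectors would come from a pre-trained model.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # 0.0 for zero vectors

def mean_vector(vectors):
    # Assumed sentence (or aspect) representation: the average of its word vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def assign_aspect(sentence_vec, aspect_vecs):
    # Score the sentence against every aspect and pick the most similar one.
    scores = {aspect: cosine_similarity(sentence_vec, vec)
              for aspect, vec in aspect_vecs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical 3-d vectors; a real system would take them from a pre-trained model.
aspects = {"Food": [0.9, 0.1, 0.0], "Service": [0.1, 0.9, 0.1], "Price": [0.0, 0.2, 0.9]}
sentence = mean_vector([[0.8, 0.2, 0.1], [0.7, 0.0, 0.2]])  # e.g. two content words
best, scores = assign_aspect(sentence, aspects)
print(best, {a: round(s, 3) for a, s in scores.items()})
```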
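$$\text{Precision} = \frac{\text{Number of correct positive results}}{\text{Number of all results that were predicted as positive}} \quad (2)$$
$$\text{Recall} = \frac{\text{Number of correct positive results}}{\text{Number of all positive samples}} \quad (3)$$
$$\text{F1 Score} = 2 \times \frac{1}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \quad (4)$$
$$\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log p_{ij} \quad (5)$$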
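Eqs. (2) to (5) can be computed directly, as in the minimal sketch below. Here N is the number of samples, M the number of classes, y_ij is 1 when sample i belongs to class j (0 otherwise), and p_ij is the predicted probability of class j for sample i. The function names, the per-class treatment of precision and recall (one chosen positive label), and the clipping constant eps are assumptions; how multi-class scores were averaged is not stated here.

```python
import math

def precision_recall_f1(y_true, y_pred, positive):
    # Eqs. (2)-(4) for one class treated as "positive".
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    # 2 / (1/P + 1/R) rewritten as 2PR / (P + R) to avoid dividing by zero.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def log_loss(y_true_onehot, probs, eps=1e-15):
    # Eq. (5): mean over N samples of -sum_j y_ij * log(p_ij); p is clipped to avoid log(0).
    total = 0.0
    for y_row, p_row in zip(y_true_onehot, probs):
        total -= sum(y * math.log(min(max(p, eps), 1 - eps)) for y, p in zip(y_row, p_row))
    return total / len(y_true_onehot)

print(precision_recall_f1(["pos", "pos", "neg", "neg"], ["pos", "neg", "pos", "neg"], "pos"))
print(log_loss([[1, 0], [0, 1]], [[0.8, 0.2], [0.4, 0.6]]))
```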
Table 9. Average sentiment rating for highest rated and lowest rated restaurants in the 3-class Yelp dataset.
Restaurant’s name | No. of negative reviews | No. of neutral reviews | No. of positive reviews | Average sentiment rating
Brew Tea Bar | 10 | 15 | 1140 | 4.965
KFC | 1769 | 208 | 306 | 1.589
$$\text{Rating} = \left(\frac{P}{P+N}\right) \times 4 + 1 = \left(\frac{2}{2+0}\right) \times 4 + 1 = 5.00 \quad (6)$$
P represents the number of positive reviews (or positive sentences across all reviews), and N represents the number of negative reviews (or negative sentences across all reviews). The computed average rating is scaled to the 1–5 range so that it is comparable to the original rating used in the Yelp evaluation system. We compared the overall average rating and the aspect average rating of the highest-rated restaurant, Brew Tea Bar, and the lowest-rated restaurant, KFC. Brew Tea Bar’s aspect average rating is 4.3974, whereas its overall average rating is 4.9652. KFC’s aspect average rating is 2.2204, whereas its overall average rating is 1.5898. The two readings differ by only about 0.6 points. The radar graph in Figure 8 shows the average rating of Brew Tea Bar and KFC based on aspects. The close results indicate that a reasonable aspect average rating can be obtained for a restaurant, which provides a better understanding of the important information in customer reviews and can also be used to rank restaurants based on a selected aspect.
Figure 8. Average rating of ‘Brew Tea Bar’ and ‘KFC’ based on aspects.
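Eq. (6) is easy to check against Table 9. The sketch below implements it directly; returning None when there are no positive or negative mentions is an assumption added to avoid division by zero. Applied to the Table 9 counts (neutral reviews are not part of the formula), it gives 1140/(1140 + 10) × 4 + 1 ≈ 4.965 for Brew Tea Bar and 306/(306 + 1769) × 4 + 1 ≈ 1.59 for KFC, consistent with the table.

```python
def aspect_average_rating(positive, negative):
    # Eq. (6): share of positive mentions mapped onto the 1-5 star scale.
    if positive + negative == 0:
        return None  # added assumption: undefined when there are no polarised mentions
    return positive / (positive + negative) * 4 + 1

print(aspect_average_rating(2, 0))                  # 5.0, the worked example in Eq. (6)
print(round(aspect_average_rating(1140, 10), 4))    # Brew Tea Bar, Table 9 counts
print(round(aspect_average_rating(306, 1769), 4))   # KFC, Table 9 counts
```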
Table 11. Illustration of the proposed aspect extraction method on the SemEval-2014 dataset.
Sentences of reviews | Food | Service | Ambiance | Price | Predicted aspect | Actual aspect | Match
But the staff was so horrible to us. | 0.48779 | 0.49894 | 0.25924 | 0.37803 | Service | [Service] | True
To be completely fair, the only redeeming factor was the food, which was above average, but could not make up for all the other deficiencies of Teodora. | 0.53761 | 0.47861 | 0.23591 | 0.44604 | Food | [Food, anecdotes/miscellaneous] | True
The food is uniformly exceptional, with a very capable kitchen which will proudly whip up whatever you feel like eating, whether it’s on the menu or not. | 0.59301 | 0.48844 | 0.32160 | 0.40226 | Food | [Food] | True
Where Gabriela personally greets you and recommends you what to eat. | 0.51113 | 0.38971 | 0.28075 | 0.31797 | Food | [Service] | False
Not only was the food outstanding, but the little ‘perks’ were great. | 0.55962 | 0.48798 | 0.28862 | 0.41952 | Food | [Food, service] | True
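The Predicted aspect and Match columns of Table 11 can be reproduced from the similarity scores: the predicted aspect is the highest-scoring one, and a prediction appears to count as a match when it is among the annotated categories (compared case-insensitively). This matching rule is inferred from the table rows rather than stated in the text, and the five rows are illustrative, not the full evaluation.

```python
# Similarity scores and annotated categories copied from Table 11.
ROWS = [
    ({"Food": 0.48779, "Service": 0.49894, "Ambiance": 0.25924, "Price": 0.37803}, ["Service"]),
    ({"Food": 0.53761, "Service": 0.47861, "Ambiance": 0.23591, "Price": 0.44604},
     ["Food", "anecdotes/miscellaneous"]),
    ({"Food": 0.59301, "Service": 0.48844, "Ambiance": 0.32160, "Price": 0.40226}, ["Food"]),
    ({"Food": 0.51113, "Service": 0.38971, "Ambiance": 0.28075, "Price": 0.31797}, ["Service"]),
    ({"Food": 0.55962, "Service": 0.48798, "Ambiance": 0.28862, "Price": 0.41952},
     ["Food", "service"]),
]

def predicted_aspect(scores):
    # The aspect with the highest cosine similarity.
    return max(scores, key=scores.get)

def is_match(predicted, actual):
    # Inferred rule: correct if the prediction is one of the annotated categories.
    return predicted.lower() in (a.lower() for a in actual)

predictions = [predicted_aspect(scores) for scores, _ in ROWS]
matches = [is_match(p, actual) for p, (_, actual) in zip(predictions, ROWS)]
print(predictions, matches, sum(matches) / len(matches))
```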
experiments that involved pre-trained word embeddings). TL methods include CNN with fine-tuned GloVe and frozen GloVe embeddings, and fine-tuning of the BERT-base and ALBERT-base models. The models’ performance was assessed using several evaluation metrics: accuracy, precision, recall, F1 score, and log loss. The results of all experiments were discussed and compared to the current state-of-the-art results in the NLP sentiment analysis domain. A novel and universal method of aspect extraction was introduced, evaluated, discussed, and compared to the current state-of-the-art results in aspect extraction studies. Finally, each classification method and each prediction model has its own advantages and drawbacks, and choosing the appropriate approach is a difficult decision involving a degree of compromise. Deep learning approaches provided accurate performance and made it possible to skip the complicated feature extraction process, although they often require long training times. Machine learning models, on the other hand, required less computation but had demanding data preparation requirements. Moreover, the unsupervised aspect extraction method, with its competitive results, offers an easy and universal approach that can be adapted to distinct tasks. The entire pipeline, including the data pre-processing, the feature extraction, and the learning models built with specific structures and appropriate hyperparameter values, contributed to improving the results. However, transfer learning (pre-trained word embeddings), applied both when training the models and when computing semantic similarity, had the greatest effect in improving the results for both sentiment classification and aspect extraction.
Recommendations for future research include:
•Applying additional deep learning approaches, such as sequence models (e.g. recurrent neural networks (RNNs), gated recurrent units (GRUs), and LSTM) and other transformer models (e.g. RoBERTa and DistilBERT).
•Using human-labelled reviews, in which sentiments are classified by expert human annotators, to help overcome the problem of mislabelled reviews.
•Learning word embeddings on domain-specific tasks, which could boost the models’ performance.
•Training word embeddings on informal dialects, which may help improve the models’ performance.
•Extending the current research to other languages, such as Arabic.
•Applying the approach to reviews in other domains, such as online products.
•Incorporating attention mechanisms into the proposed aspect extraction method.
Notes
1. https://www.yelp.com/.
2. https://www.kaggle.com/.
3. http://metashare.ilsp.gr:8080/.
4. https://spacy.io/.
5. https://pypi.org/project/ktrain/.
6. https://github.com/marcotcr/lime/.
Acknowledgments
The authors offer their sincere thanks to Bank Albilad Chair of Electronic Commerce (BACEC) for their
financial support to conduct this successful research. This research was also funded by the Deanship
of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track
Research Funding Program.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the Princess Nourah Bint Abdulrahman University [1000-FTFP-20].
ORCID
Eman Saeed Alamoudi http://orcid.org/0000-0001-7186-9406
Norah Saleh Alghamdi http://orcid.org/0000-0001-6421-6001
References
Abdi, A., Shamsuddin, S.M., Hasan, S., & Piran, J. (2019). Deep learning-based sentiment classification of evaluative text based on multi-feature fusion. Information Processing and Management, 56(4), 1245–1259. https://doi.org/10.1016/j.ipm.2019.02.018
Batista, G.E.A.P.A., Prati, R.C., & Monard, M.C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20–29. https://doi.org/10.1145/1007730.1007735
Chen, D., Denman, S., Fookes, C., & Sridharan, S. (2010). AWD-LSTM. Proceedings of the 2010 digital image computing: Techniques and applications (DICTA 2010), 369–374. Sydney, Australia. https://doi.org/10.1109/DICTA.2010.69
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv. http://arxiv.org/abs/1810.04805
Do, H.H., Prasad, P.W.C., Maag, A., & Alsadoon, A. (2019). Deep learning for aspect-based sentiment analysis: A comparative review. Expert Systems with Applications, 118, 272–299. https://doi.org/10.1016/j.eswa.2018.10.003
Fu, X., Yang, J., Li, J., Fang, M., & Wang, H. (2018). Lexicon-enhanced LSTM with attention for general sentiment analysis. IEEE Access, 6, 71884–71891. https://doi.org/10.1109/ACCESS.2018.2878425
Ganu, G., Elhadad, N., & Marian, A. (2009). Beyond the stars: Improving rating predictions using review text content. WebDB, 9, 1–6.
Giménez, M., Palanca, J., & Botti, V. (2020). Semantic-based padding in convolutional neural networks for improving the performance in natural language processing: A case of study in sentiment analysis. Neurocomputing, 378, 315–323. https://doi.org/10.1016/j.neucom.2019.08.096
Hemmatian, F., & Sohrabi, M.K. (2019). A survey on classification techniques for opinion mining and sentiment analysis. Artificial Intelligence Review, 52(3), 1495–1545. https://doi.org/10.1007/s10462-017-9599-6
Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. ACL 2018: 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 1, 328–339. https://doi.org/10.18653/v1/p18-1031
Huang, A. (2008). Similarity measures for text document clustering. New Zealand computer science research student conference, NZCSRSC 2008, 49–56. Christchurch, New Zealand.
Jeong, E.H., & Jang, S.C.S. (2011). Restaurant experiences triggering positive electronic word-of-mouth (eWOM) motivations. International Journal of Hospitality Management, 30(2), 356–366. https://doi.org/10.1016/j.ijhm.2010.08.005
Jianqiang, Z., Xiaolin, G., & Xuejun, Z. (2018). Deep convolution neural networks for twitter sentiment analysis. IEEE Access, 6, 23253–23260. https://doi.org/10.1109/ACCESS.2017.2776930
Johnson, R., & Zhang, T. (2017). Deep pyramid convolutional neural networks for text categorisation. ACL 2017: 55th annual meeting of the association for computational linguistics, proceedings of the conference (Long papers), 1, 562–570. Vancouver, Canada. https://doi.org/10.18653/v1/P17-1052
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. 15th conference of the European chapter of the association for computational linguistics, EACL 2017, 2, 427–431. Valencia, Spain. https://doi.org/10.18653/v1/e17-2068
Kiritchenko, S., Zhu, X., Cherry, C., & Mohammad, S. (2014). NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014), 437–442. Dublin, Ireland. https://doi.org/10.3115/v1/s14-2076
Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2, 1097–1105.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: A lite BERT for self-supervised learning of language representations. ArXiv. http://arxiv.org/abs/1909.11942
Li, G., & Liu, F. (2014). Sentiment analysis based on clustering: A framework in improving accuracy and recognising neutral opinions. Applied Intelligence, 40(3), 441–452. https://doi.org/10.1007/s10489-013-0463-3
Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya & F. Damerau (Eds.), Handbook of natural language processing (2nd ed., pp. 627–666). Chapman & Hall: CRC Press.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 3111–3119. http://arxiv.org/abs/1310.4546
Osuszek, L., Stanek, S., & Twardowski, Z. (2016). Leverage big data analytics for dynamic informed decisions with advanced case management. Journal of Decision Systems, 25(Suppl. 1), 436–449. https://doi.org/10.1080/12460125.2016.1187401
Pennington, J., Socher, R., & Manning, C.D. (2014). GloVe: Global vectors for word representation. EMNLP 2014: 2014 Conference on Empirical Methods in Natural Language Processing, 1532–1543. https://doi.org/10.3115/v1/d14-1162
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualised word representations. Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: Human language technologies, 1, 2227–2237. New Orleans, Louisiana. https://doi.org/10.18653/v1/n18-1202
Ramos, J. (2003). Using TF–IDF to determine word relevance in document queries. Proceedings of the first instructional conference on machine learning, 242, 133–142.
Rao, G., Huang, W., Feng, Z., & Cong, Q. (2018). LSTM with sentence representations for document-level sentiment classification. Neurocomputing, 308, 49–57. https://doi.org/10.1016/j.neucom.2018.04.045
Rezaeinia, S.M., Rahmani, R., Ghodsi, A., & Veisi, H. (2019). Sentiment analysis based on improved pre-trained word embeddings. Expert Systems With Applications, 117, 139–147. https://doi.org/10.1016/j.eswa.2018.08.044
Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to fine-tune BERT for text classification? Lecture Notes in Computer Science, 11856 LNAI(May), 194–206. https://doi.org/10.1007/978-3-030-32381-3_16
Vairetti, C., Martínez-Cámara, E., Maldonado, S., Luzón, V., & Herrera, F. (2020). Enhancing the classification of social media opinions by optimizing the structural information. Future Generation Computer Systems, 102, 838–846. https://doi.org/10.1016/j.future.2019.09.023
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems. NIPS’17: Proceedings of the 31st international conference on neural information processing systems, 6000–6010. Long Beach, CA, USA. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
Verhoef, P.C., Broekhuizen, T., Bart, Y., Bhattacharya, A., Qi Dong, J., Fabian, N., & Haenlein, M. (2021). Digital transformation: A multidisciplinary reflection and research agenda. Journal of Business Research, 122, 889–901. https://doi.org/10.1016/j.jbusres.2019.09.022
Wei, Z., Miao, D., Chauchat, J.H., Zhao, R., & Li, W. (2009). N-grams based feature selection and text representation for Chinese text classification. International Journal of Computational Intelligence Systems, 2(4), 365–374. https://doi.org/10.1080/18756891.2009.9727668
Xu, H., Liu, B., Shu, L., & Yu, P.S. (2018). Double embeddings and CNN-based sequence labeling for aspect extraction. ACL 2018: 56th annual meeting of the association for computational linguistics, proceedings of the conference, 2, 592–598. Melbourne, Australia. https://doi.org/10.18653/v1/p18-2094
Yadav, A., & Vishwakarma, D.K. (2020). Sentiment analysis using deep learning architectures: A review. Artificial Intelligence Review, 53, 4335–4385. https://doi.org/10.1007/s10462-019-09794-5
Yu, L.C., Wang, J., Lai, K.R., & Zhang, X. (2018). Refining word embeddings using intensity scores for sentiment analysis. IEEE/ACM Transactions on Audio Speech and Language Processing, 26(3), 671–681. https://doi.org/10.1109/TASLP.2017.2788182
Zhang, H. (2005). Exploring conditions for the optimality of naïve Bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(2), 183–198. https://doi.org/10.1142/S0218001405003983
Zhang, X., Zhao, J., & Lecun, Y. (2015). Character-level convolutional networks for text. In NIPS, 649–657.
Zhao, R., & Mao, K. (2018). Fuzzy bag-of-words model for document representation. IEEE Transactions on Fuzzy Systems, 26(2), 794–804. https://doi.org/10.1109/TFUZZ.2017.2690222