International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 02 | Feb 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 95
UTILIZING TWITTER TO PERFORM AUTONOMOUS SENTIMENT
ANALYSIS
Akanksha Srivastava1, Mr. Sambhav Agarwal2
1M.Tech, Computer Science and Engineering, SR Institute of Management & Technology, Lucknow, India
2Associate Professor, Computer Science, and Engineering, SR Institute of Management & Technology, Lucknow
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Applications in many domains make Sentiment
Analysis an exciting area for study. The use of online polls and
surveys to gather public feedback on goods, current events,
and societal or political issues is on the rise. The public and
the stakeholders benefit from hearing the thoughts and
feelings of the general public when important choices must be
made. Opinion mining is the practice of gleaning insights from
online sources, including web search engines, blogs,
micro-blogs, Twitter, and social networks, to produce
meaningful conclusions. Twitter's user base provides a wealth
of material from which to gain insight into the public's
perspective. The massive volume of tweets, which are
unstructured text, makes it impractical to sift through the
information manually. Consequently, extracting and
summarizing tweets from corpora calls for computational
methods, which in turn require knowledge of terms that
convey emotion. Sentiment analysis of unstructured text may
be accomplished using a wide variety of computational
methods, models, and algorithms. The vast majority are based
on machine learning methods that use the Bag-of-Words (BoW)
representation. In this research, we used a lexicon-based
strategy to automatically identify sentiment for tweets
gathered from the Twitter public domain. To further
investigate the efficacy of alternative feature combinations,
we used three distinct machine learning algorithms for the
task of tweet sentiment identification: Naive Bayes (NB),
Maximum Entropy (ME), and Support Vector Machines (SVM).
Our results suggest that both NB with Laplace smoothing and
SVM are successful in categorizing the tweets. The features
used for NB are unigrams and Part-of-Speech (POS) tags, while
unigrams are used for SVM.
Key Words: Bag-of-Words, Lexicon, Machine Learning
Algorithms, Laplace Smoothing, Part-of-Speech.
1. INTRODUCTION
Two separate polls of over 2000 American adults found that
81% of Internet users (or 60% of Americans) have done
product research online at least once, and that 20% of
Internet users (15% of Americans) do so on a typical day.
Purchasing goods and services is not the only motive for
seeking information and sharing opinions online. The need
for access to current political information is another critical
factor to consider. At the moment, individuals may utilize
email for political campaigns by sharing information and
discussing candidates and issues online. Users tend to trust
online advice and recommendations, which consist mostly of
opinions. Despite the generally pleasant experiences of
American Internet users during online product research,
Horrigan [1] found that 58% of users reported that online
information was missing, hard to find, confusing, or
overwhelming. Therefore, there is a significant need for
improved information-access technologies to aid shoppers
and researchers. Web 2.0 sites like blogs, message boards,
and other kinds of social media have made it easier than ever
for customers to voice their thoughts and views on the
brands they use. In recent years, businesses have begun to
acknowledge the power that user reviews have in shaping
the perceptions of others and the standing of certain brands.
Companies are beginning to monitor social media so that they
can react to customer feedback and adjust their marketing,
brand positioning, product development, and other strategies
accordingly.
1.1. Opinion Mining and Sentiment Analysis
Extracting opinions from text is called "Opinion Mining" (OM).
Opinion mining is a new field at the intersection of
information retrieval, text mining, and computational
linguistics that seeks to detect the opinions expressed in
natural language texts, as described by Pang et al. [3].
Opinion mining is a subfield of knowledge discovery in
databases (KDD) that employs Natural Language Processing
(NLP) and statistical machine learning methods to identify
and distinguish between opinionated and factual content.
Tasks in opinion mining include locating opinions, labeling
them as positive, negative, or neutral, determining where
those opinions originated, and summarising them. The
primary goal of opinion mining is to automatically extract a
summary of the opinions about an entity from a large body of
unstructured text.
Opinion Mining and Sentiment Analysis (SA) are two names
for the same thing: the study of how people feel about
something. An individual's thoughts, feelings, and
impressions about a matter, as expressed in the form of an
opinion, are deeply personal and subjective. Individuals,
groups, and societies may benefit greatly from the advice
and counsel of others throughout the decision-making
process, as concluded by the work of Liu et al. [2]. To act
swiftly and wisely, people need information that is both
precise and brief. When making a choice, people often seek
advice from friends, family, and experts who have
formed an opinion or point of view based on their own
experiences, observations, conceptions, and beliefs (which
may be positive or negative).
2. SENTIMENT TARGET IDENTIFICATION
Identifying sentiment (opinion) targets is a crucial part of SA
work. The target may be anything from the subject of
a statement to its object. Everyone
involved in making and selling a product has to evaluate it
thoroughly in light of public and buyer
feedback. Automatically identifying and extracting aspects
mentioned in reviews is a key step in comparing reviews.
Opinion mining and summarization thus rely
heavily on product feature mining [10]. Sentiment analysis is
a difficult field of study. This is because, to correctly identify
opinion targets in a phrase or document, a system has to
discern evaluative expressions as well as qualities
that are not overtly present and must be inferred from
the semantics of the terms. Previous studies on
sentiment target identification have shown that several
Natural Language Processing (NLP) steps, including
preprocessing, Part-of-Speech tagging, noise reduction, feature
selection, and classification, are all necessary stages in the
extraction process.
3. METHODOLOGY
Collecting research data is more complex than it may seem
since it requires drawing important and relevant inferences.
Test data, subjective training data, and objective (neutral)
training data are the three types of data that have been
gathered. We first describe the Twitter API.
3.1. Twitter API
Developers may access Tweets, direct messages, media, and
other Twitter data using the Twitter API, which provides a
collection of programming interfaces. Through the API,
programmers may create products that communicate with
the Twitter service and carry out actions like publishing
Tweets, getting user information, and viewing trending
topics, among other things. Different endpoints,
authentication mechanisms, and usage constraints apply to the
API's several flavors, which include REST (Representational
State Transfer), streaming, and advertising. A Twitter
developer account, API keys, and access tokens are
prerequisites for interacting with the API.
3.2. Twython
Twython is a Python library for accessing the Twitter API. It
provides a simple and convenient way for Python developers
to interact with the Twitter platform and perform tasks such
as posting Tweets, retrieving user information, and accessing
timelines. Twython abstracts many of the complexities of the
Twitter API and provides a simple, Pythonic interface for
accessing the API's resources. To use Twython, you will need
to obtain API keys or access tokens from a Twitter developer
account, and then use these credentials to initialize a
Twython client object, which you can use to make API
requests. The library supports both REST and Streaming
APIs and includes functionality for OAuth 1.0a and OAuth 2.0
authentication.
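As a rough illustration of the client workflow described above, the sketch below wraps a search request in a small helper. It assumes a client object that behaves like a Twython instance, whose `search` method returns a dictionary with a "statuses" list in the shape of the Twitter REST API v1.1. The helper name `fetch_tweet_texts`, the `StubClient` class, and the credential placeholder names are hypothetical and are not the authors' code; a real run would need the `twython` package, valid credentials, and an API tier that still serves this endpoint.

```python
def fetch_tweet_texts(client, query, count=10):
    """Return the text of tweets matching `query`.

    `client` is assumed to behave like a Twython instance: its `search`
    method returns a dict with a "statuses" list (Twitter API v1.1 shape).
    """
    response = client.search(q=query, count=count)
    return [status["text"] for status in response.get("statuses", [])]


class StubClient:
    """Hypothetical stand-in for a Twython client so the sketch runs offline."""

    def search(self, q, count=10):
        tweets = [{"text": f"I love {q}"}, {"text": f"{q} is terrible"}]
        return {"statuses": tweets[:count]}


# With real credentials one would instead build the client roughly as:
#   from twython import Twython
#   client = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
texts = fetch_tweet_texts(StubClient(), "python", count=2)
print(texts)
```

Passing the client in as a parameter keeps the helper testable without network access, which is why a stub is used here.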
3.3. Data Preprocessing in Twitter
Data preprocessing in Twitter involves cleaning and
transforming Twitter data into a format that is suitable for
further analysis or modeling. This may include tasks such as:
1. Data Collection: Collecting raw data from the Twitter
API, such as tweets, user profiles, and trends.
2. Data Cleaning: Removing irrelevant information,
correcting errors, handling missing values, and
removing duplicates from the collected data.
3. Text Processing: Processing textual data from
tweets, such as removing stop words, stemming,
and converting text to lowercase.
4. Sentiment Analysis: Classifying tweets into positive,
negative, or neutral sentiment categories.
5. Data Transformation: Converting the data into a
format that is suitable for analysis, such as
converting textual data into numerical
representations.
6. Data Reduction: Reducing the dimensionality of the
data, such as aggregating data by user or period.
These steps ensure that the data is in a clean, consistent, and
usable format, and help improve the accuracy and reliability
of any subsequent analysis or modeling.
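The cleaning and text-processing steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stop-word list is a tiny hypothetical sample, stemming is omitted, and the function names `clean_tweet` and `deduplicate` are invented for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "at", "it", "this"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs and @mentions, and drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


def deduplicate(tweets):
    """Remove exact duplicate tweets while preserving their order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet not in seen:
            seen.add(tweet)
            unique.append(tweet)
    return unique


raw = [
    "Loving the new phone! http://example.com @shop #happy",
    "Loving the new phone! http://example.com @shop #happy",
    "This battery is AWFUL",
]
cleaned = [clean_tweet(t) for t in deduplicate(raw)]
print(cleaned)  # [['loving', 'new', 'phone', 'happy'], ['battery', 'awful']]
```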
3.4. Lexicon-Based Approach
The lexicon-based approach is a method used in sentiment
analysis and opinion mining to classify the sentiment of a
piece of text, such as a tweet, into positive, negative, or
neutral categories. The approach uses a predefined
lexicon, that is, a list of words associated with specific
sentiments.
In a lexicon-based approach, the sentiment of a piece of text
is determined by finding the words in the text
that match words in the lexicon and then aggregating the
sentiment scores associated with these words. The resulting
sentiment score is then used to classify the text as positive,
negative, or neutral.
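The matching-and-aggregation procedure just described can be sketched as follows. The lexicon entries, their scores, and the zero threshold are hypothetical values chosen for illustration; they are not taken from any published lexicon or from the authors' system.

```python
# Hypothetical toy lexicon: word -> sentiment score (positive > 0, negative < 0).
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0, "bad": -1.0, "awful": -2.0, "hate": -2.0}


def lexicon_sentiment(tokens, lexicon=LEXICON, threshold=0.0):
    """Sum the scores of tokens found in the lexicon and map the total to a label."""
    matched = [lexicon[t] for t in tokens if t in lexicon]
    score = sum(matched)
    if score > threshold:
        label = "positive"
    elif score < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, score, len(matched)


print(lexicon_sentiment(["i", "love", "this", "great", "phone"]))    # ('positive', 4.0, 2)
print(lexicon_sentiment(["awful", "battery", "but", "good", "screen"]))  # ('negative', -1.0, 2)
print(lexicon_sentiment(["the", "phone", "arrived"]))                 # ('neutral', 0, 0)
```

The last call shows the coverage limitation of the approach: a text with no lexicon words always comes out neutral.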
There are many different lexicons available for use in
sentiment analysis, each with its own strengths and weaknesses.
Some popular lexicons include SentiWordNet, the Harvard IV
dictionary, and the AFINN lexicon.
The lexicon-based approach is simple to implement and has
been widely used in sentiment analysis. However, it has
some limitations, such as being limited to the words in the
lexicon and not taking into account the context in which
words are used. To overcome these limitations, other
approaches such as machine learning and deep learning
models have been developed.
3.5. SentiWordNet
SentiWordNet is a lexicon for sentiment analysis and opinion
mining. It is an automatically constructed resource for the
English language that assigns sentiment scores to the synsets
(sets of synonymous word senses) of WordNet.
SentiWordNet assigns sentiment scores along
three dimensions: positivity, negativity, and objectivity. Each
entry in the lexicon is associated with three
scores, one per dimension, which sum to 1.
The scores are based on the collective sentiment of words
that are semantically similar to the word being scored.
SentiWordNet can be used as a resource in sentiment
analysis and opinion mining to classify the sentiment of a
piece of text into positive, negative, or neutral categories. To
do this, the sentiment scores of the words in the text are
aggregated to determine the overall sentiment of the text.
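A minimal sketch of this aggregation is given below, assuming each token has already been mapped to a single (positivity, negativity) pair. The numeric values are made up for illustration and are not actual SentiWordNet scores; the averaging rule and the `margin` parameter are likewise assumptions, since the aggregation scheme is not specified here.

```python
# Made-up (positivity, negativity) pairs for illustration only; these are NOT
# actual SentiWordNet values. Objectivity is derived as 1 - pos - neg.
TOY_SCORES = {
    "happy": (0.75, 0.0),
    "terrible": (0.0, 0.625),
    "phone": (0.0, 0.0),
}


def objectivity(pos, neg):
    """The three scores sum to 1, so objectivity is the remainder."""
    return 1.0 - pos - neg


def aggregate(tokens, scores=TOY_SCORES):
    """Average (positivity - negativity) over the tokens found in `scores`."""
    found = [scores[t] for t in tokens if t in scores]
    if not found:
        return 0.0
    return sum(pos - neg for pos, neg in found) / len(found)


def classify(tokens, margin=0.1):
    """Label a token list using a symmetric margin around zero (an assumption)."""
    value = aggregate(tokens)
    if value > margin:
        return "positive"
    if value < -margin:
        return "negative"
    return "neutral"


print(classify(["happy", "phone"]))     # positive
print(classify(["terrible", "phone"]))  # negative
print(classify(["phone"]))              # neutral
```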
SentiWordNet has been widely used in sentiment analysis
and has been reported to perform well relative to other
lexicons and machine learning models. It is a valuable
resource for researchers and practitioners in the field of
sentiment analysis.
4. RESULTS AND ANALYSIS
4.1. Naive Bayes
Naive Bayes is a simple probabilistic classifier based on
Bayes' Theorem. It is a popular algorithm in the field of
machine learning and is widely used for tasks such as text
classification, sentiment analysis, and spam filtering.
The basic idea behind Naive Bayes is to use Bayes' Theorem
to calculate the probability of a class (e.g., positive, negative,
or neutral sentiment) given a set of features (e.g., words in a
text). The algorithm assumes that the features are
conditionally independent given the class, meaning that,
within a class, the presence of one feature does not affect
the presence of another. This
is the "naive" part of the algorithm, hence its name.
There are several variants of the Naive Bayes algorithm,
including the Multinomial Naive Bayes, Bernoulli Naive
Bayes, and Gaussian Naive Bayes. Each variant is suited for
different types of data and different classification tasks.
Naive Bayes is a fast and effective algorithm for text
classification and sentiment analysis. It is simple to
implement and requires little data preparation. However, its
performance can be limited by the "naive" assumption of
independence between features, which is not always
accurate in practice. Despite this, Naive Bayes remains a
popular and widely used algorithm in the field of text
classification and sentiment analysis.
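To make the description concrete, here is a small from-scratch sketch of the multinomial variant with Laplace (add-one) smoothing, the configuration named in the abstract. The training examples are invented toy data, and the function names `train_nb` and `predict_nb` are illustrative; this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in class_counts:
        model["log_prior"][c] = math.log(class_counts[c] / len(docs))
        # Smoothed denominator: class token count plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c, prior in model["log_prior"].items():
        # Words outside the training vocabulary are ignored.
        scores[c] = prior + sum(
            model["log_lik"][c][w] for w in doc if w in model["vocab"]
        )
    return max(scores, key=scores.get)


# Invented toy training data.
docs = [["love", "great", "phone"], ["great", "screen"],
        ["hate", "awful", "battery"], ["awful", "phone"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["great", "phone"]))    # pos
print(predict_nb(model, ["awful", "battery"]))  # neg
```

Smoothing matters here because "battery" never occurs in a positive document: without the added count its likelihood would be zero and the logarithm undefined.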
4.2. For Twitter Dataset
We investigate a range of features that have a
significant impact on sentiment analysis. We used
N-gram features, namely unigrams (n = 1) and bigrams
(n = 2), which are often used in text classification tasks,
including sentiment analysis. In our experiments, we used
boolean features for both unigrams and bigrams: each
n-gram feature has an associated boolean value, which is
true if and only if the corresponding n-gram appears in the
tweet [12]. The features we employed are
listed in Table 1, along with the accuracy obtained
by each classifier. We also compared this dataset
with the one that Pang et al. [11] used for their research on
movie reviews. According to Table 1, with unigram features
and the NB classifier with Laplace smoothing, classification
accuracy was higher for tweets than for movie reviews;
with the MaxEnt classifier, however, accuracy was higher
for movie reviews than for tweets.
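The boolean n-gram features described above can be sketched as follows. The vocabulary and the example tweet are hypothetical; in practice the vocabulary would be built from the training corpus.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def boolean_features(tokens, vocabulary):
    """Map each n-gram in `vocabulary` to True iff it occurs in the tweet."""
    present = set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))
    return {gram: gram in present for gram in vocabulary}


# Hypothetical tweet and feature vocabulary (unigrams and bigrams).
tweet = ["not", "a", "good", "phone"]
vocabulary = [("good",), ("bad",), ("not", "a"), ("good", "phone"), ("not", "good")]
features = boolean_features(tweet, vocabulary)
for gram, value in features.items():
    print(gram, value)
```

Note that ("not", "good") is False because the two words are not adjacent in the tweet, which shows how bigrams capture only local word order.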
Table 1: Accuracy of tweets using different features
Table 2: F1 score of MNB classifier
The effectiveness of POS features has also been evaluated for
sentiment analysis. As a general rule, adjectives are regarded
as useful features for sentiment analysis since they serve
as reliable indicators of a writer's feelings. Using
adjectives alone gives results that are
comparable to those obtained with unigrams and
bigrams, as can be seen in row (5) of the results
table. Row (4) shows that when unigrams and POS are used
together as features, all three classifiers produce better results.
Row (1) shows
that SVM with unigram features yields the best
result of all the feature sets that were
considered. Detailed results for the Multinomial Naive Bayes
(MNB) classifier are given in Table 2, which reports the F1
score. The Receiver Operating Characteristic (ROC) curve of
the MNB classifier is shown in Figure 1. This curve is for
tweets that have been manually annotated.
Figure-1: ROC curve of MNB classifier for tweets
4.3. Emotion Dataset
Hashtags are often used by people to
communicate their thoughts and feelings. A
useful signal of feelings and sentiments can therefore be
gleaned from hashtagged phrases. We included these hashtags
in our machine learning pipeline to provide
it with more data. Figure 2 shows a snapshot of the
confusion matrix for the unigram features of our emotion dataset.
The F1 score of each class for the unigram
feature is also shown in this figure. Figure 3 shows the ROC curve
that was generated by our classifier.
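For readers unfamiliar with how per-class F1 scores follow from a confusion matrix, a short sketch is given below. The class names and counts are hypothetical and are not the results reported in this paper.

```python
def per_class_f1(confusion, labels):
    """Compute (precision, recall, F1) per class.

    `confusion[i][j]` is the number of items whose true class is i and
    whose predicted class is j.
    """
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(len(labels))) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[label] = (precision, recall, f1)
    return results


# Hypothetical counts for three emotion classes; not the paper's results.
labels = ["joy", "anger", "sadness"]
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
scores = per_class_f1(confusion, labels)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```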
Figure 2: Snapshot of emotion dataset
Table 3: Accuracy of emotion dataset using different
features
Table 4: F1 score of MNB classifier for unigram feature
Figure 3: ROC curve of MNB classifier for an emotion
data set
We observed that constructing a dataset by automatically
collecting tweets via hashtags has a clear advantage over a
dataset generated by manually annotating tweets. This is
because authors state their own feelings directly through
hashtags, whereas the conventional method of annotation
requires annotators to infer the writers' feelings from the
text, which cannot always be done accurately.
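The hashtag-based labeling idea can be sketched as below. The hashtag-to-emotion mapping is hypothetical, since the hashtags actually used are not listed here, and the rule of discarding tweets with conflicting labels is an assumption made for the example.

```python
import re

# Hypothetical mapping from hashtags to emotion labels; the hashtags actually
# used in the study are not listed in the text.
HASHTAG_LABELS = {"#happy": "joy", "#joy": "joy", "#angry": "anger", "#sad": "sadness"}

HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweet, mapping=HASHTAG_LABELS):
    """Return (text_without_label_hashtags, label), or None if no single label applies."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    found = {mapping[t] for t in tags if t in mapping}
    if len(found) != 1:
        return None  # no emotion hashtag, or conflicting ones
    # Remove the label-bearing hashtags so a classifier cannot simply memorise them.
    text = HASHTAG_RE.sub(
        lambda m: "" if m.group().lower() in mapping else m.group(), tweet
    )
    return " ".join(text.split()), found.pop()


print(label_by_hashtag("Got the job today #happy #newjob"))  # ('Got the job today #newjob', 'joy')
print(label_by_hashtag("Stuck in traffic"))                   # None
print(label_by_hashtag("so #happy and #sad"))                 # None
```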
5. CONCLUSION
As part of our study, we looked at the difficulties of
Sentiment Analysis and the many approaches used in this
area. Identifying sentiment in social media data is
notoriously challenging due to the data's richness and
subtlety. To determine which features are most useful
for Sentiment Analysis, we experimented with tweets
collected from the public domain. We used both machine
learning and lexicon-based algorithms for SA. The goal of
our project was to make the most efficient use of the
SentiWordNet lexicon to develop a Twitter Sentiment
Analysis platform. Using the SentiWordNet lexicon, we
obtained an accuracy of 75.20 percent on our dataset,
although we observed that this number varied significantly
from one domain to the next. Although the current lexicon
contains a huge number of terms with sentiment scores, it
lacks specific words that are common in particular domains, so
it is preferable to construct a lexicon from the test corpus and
use it for classification. Our model, which uses the Google
search engine to determine a term's score through pointwise
mutual information, outperforms the SentiWordNet lexicon
on our dataset and can deal with one of the difficulties of
Sentiment Analysis: the unexpected shift from positive to
negative sentiment.
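The exact scoring formula is not given above. One common way to turn pointwise mutual information (PMI) into a polarity score is to subtract a term's PMI with a negative seed word from its PMI with a positive seed word; the sketch below follows that scheme as an assumption. The seed words, the counts (which stand in for search-engine hit counts), and the function names are all hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( P(x, y) / (P(x) * P(y)) )."""
    return math.log2((count_xy / total) / ((count_x / total) * (count_y / total)))


def semantic_orientation(term, counts, total, pos_seed="excellent", neg_seed="poor"):
    """Score a term as PMI(term, pos_seed) - PMI(term, neg_seed)."""
    pos = pmi(counts[(term, pos_seed)], counts[term], counts[pos_seed], total)
    neg = pmi(counts[(term, neg_seed)], counts[term], counts[neg_seed], total)
    return pos - neg


# Hypothetical co-occurrence counts standing in for search-engine hit counts.
# Zero counts would need smoothing before taking logarithms.
counts = {
    "excellent": 1000, "poor": 1000,
    "reliable": 200, ("reliable", "excellent"): 40, ("reliable", "poor"): 10,
    "laggy": 100, ("laggy", "excellent"): 2, ("laggy", "poor"): 16,
}
total = 100_000

for term in ("reliable", "laggy"):
    print(term, round(semantic_orientation(term, counts, total), 3))
```

A positive score suggests the term co-occurs more with the positive seed and a negative score the opposite, which lets domain-specific words missing from a fixed lexicon still receive a polarity.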
REFERENCES
[1] C. Alm, D. Roth, and R. Sproat, “Emotions from the text:
machine learning for text-based emotion prediction,” in
Proceedings of HLT and EMNLP. ACL, 2005, pp. 579–586.
[2] S. Aman and S. Szpakowicz, “Using Roget’s thesaurus for
fine-grained emotion recognition,” in Proceedings of IJCNLP,
2008, pp. 296–302.
[3] P. Chesley, B. Vincent, L. Xu, and R. K. Srihari, “Using
verbs and adjectives to automatically classify blog
sentiment,” in AAAI Spring Symposium: Computational
Approaches to Analyzing Weblogs, 2006, pp. 27–29.
[4] M. D. Choudhury, S. Counts, and M. Gamon, “Not all
moods are created equal! exploring human emotional states
in social media,” in Proceedings of ICWSM, 2012.
[5] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, “Liblinear:
A library for large linear classification,” The Journal of
Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[6] K. Gimpel, N. Schneider, B. O’Connor, D. Das, D. Mills, J.
Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A.
Smith, “Part-of-speech tagging for Twitter: annotation,
features, and experiments,” in Proceedings of HLT: short
papers, ser. HLT ’11. Stroudsburg, PA, USA: ACL, 2011, pp.
42–47.
[7] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann,
and I. Witten, “The weka data mining software: an update,”
ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–
18, 2009.
[8] G. Mishne, “Experiments with mood classification in blog
posts,” in Proceedings of ACM SIGIR 2005 Workshop on
Stylistic Analysis of Text for Information Access.
[9] S. Mohammad, “#emotional tweets,” in Proceedings of the
Sixth International Workshop on Semantic Evaluation. ACL,
7-8 June 2012, pp. 246–255.
[10] A. Neviarouskaya, H. Prendinger, and M. Ishizuka,
“Affect analysis model: A novel rule-based approach to affect
sensing from text,” Natural Language Engineering, vol. 17,
no. 1, pp. 95–135, 2011.
[11] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?:
sentiment classification using machine learning techniques,”
in Proceedings of EMNLP. ACL, 2002, pp. 79–86.
[12] P. Shaver, J. Schwartz, D. Kirson, and C. O’Connor,
“Emotion knowledge: Further exploration of a prototype
approach,” Journal of Personality and Social Psychology, vol.
52, no. 6, pp. 1061–1086, 1987.
[13] C. Strapparava and R. Mihalcea, “Learning to identify
emotions in text,” in Proceedings of the 2008 ACM
symposium on Applied computing. ACM, 2008, pp. 1556–
1560.
[14] C. Strapparava and A. Valitutti, “Wordnet-affect: an
affective extension of wordnet,” in Proceedings of LREC, vol.
4. Citeseer, 2004, pp. 1083–1086.
[15] C. Strapparava and R. Mihalcea, “Semeval-2007 task 14:
affective text,” in Proceedings of the 4th International
Workshop on Semantic Evaluations, ser. SemEval ’07, 2007,
pp. 70–74.
[16] R. Tokuhisa, K. Inui, and Y. Matsumoto, “Emotion
classification using massive examples extracted from the
web,” in Proceedings of COLING. ACL, 2008, pp. 881–888.
[17] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing
contextual polarity in phrase-level sentiment analysis,” in
Proceedings of HLT and EMNLP. ACL, 2005, pp. 347–354.
[18] I. Witten, E. Frank, and M. Hall, Data Mining: Practical
machine learning tools and techniques. Morgan Kaufmann,
2011.
[19] C. Yang, K. Lin, and H. Chen, “Emotion classification
using web blog corpora,” in IEEE/WIC/ACM International
Conference on Web Intelligence. IEEE, 2007, pp. 275–278.

More Related Content

Similar to UTILIZING TWITTER TO PERFORM AUTONOMOUS SENTIMENT ANALYSIS (20)

PDF
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET Journal
 
PDF
Sentiment Analysis on Twitter data using Machine Learning
IRJET Journal
 
PDF
BANKING CHATBOT USING NLP AND MACHINE LEARNING ALGORITHMS
IRJET Journal
 
PDF
Review on Opinion Targets and Opinion Words Extraction Techniques from Online...
IRJET Journal
 
PDF
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
IRJET Journal
 
PDF
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET Journal
 
PDF
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET Journal
 
PDF
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 
PDF
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 
PDF
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
IRJET Journal
 
PDF
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
IRJET Journal
 
PDF
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
IRJET Journal
 
PDF
An Opinion Mining and Sentiment Analysis Techniques: A Survey
IRJET Journal
 
PDF
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
IRJET Journal
 
PDF
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
IJET - International Journal of Engineering and Techniques
 
DOCX
Python report on twitter sentiment analysis
AntaraBhattacharya12
 
PDF
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET Journal
 
PDF
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
IRJET Journal
 
PDF
Sentiment Analysis on Product Reviews Using Supervised Learning Techniques
IRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET Journal
 
Sentiment Analysis on Twitter data using Machine Learning
IRJET Journal
 
BANKING CHATBOT USING NLP AND MACHINE LEARNING ALGORITHMS
IRJET Journal
 
Review on Opinion Targets and Opinion Words Extraction Techniques from Online...
IRJET Journal
 
A Novel Voice Based Sentimental Analysis Technique to Mine the User Driven Re...
IRJET Journal
 
IRJET- Strength and Workability of High Volume Fly Ash Self-Compacting Concre...
IRJET Journal
 
IRJET- Implementing Social CRM System for an Online Grocery Shopping Platform...
IRJET Journal
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 
TOWARDS AUTOMATIC DETECTION OF SENTIMENTS IN CUSTOMER REVIEWS
ijistjournal
 
Analysis and Prediction of Sentiments for Cricket Tweets using Hadoop
IRJET Journal
 
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
IRJET Journal
 
Sentiment Analysis of Product Reviews and Trustworthiness Evaluation using TRS
IRJET Journal
 
An Opinion Mining and Sentiment Analysis Techniques: A Survey
IRJET Journal
 
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
IRJET Journal
 
[IJET V2I4P9] Authors: Praveen Jayasankar , Prashanth Jayaraman ,Rachel Hannah
IJET - International Journal of Engineering and Techniques
 
Python report on twitter sentiment analysis
AntaraBhattacharya12
 
IRJET- The Sentimental Analysis on Product Reviews of Amazon Data using the H...
IRJET Journal
 
A Survey on Analysis of Twitter Opinion Mining using Sentiment Analysis
IRJET Journal
 
Sentiment Analysis on Product Reviews Using Supervised Learning Techniques
IRJET Journal
 

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 

UTILIZING TWITTER TO PERFORM AUTONOMOUS SENTIMENT ANALYSIS

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 02 | Feb 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 95

Akanksha Srivastava1, Mr. Sambhav Agarwal2
1M.Tech, Computer Science and Engineering, SR Institute of Management & Technology, Lucknow, India
2Associate Professor, Computer Science and Engineering, SR Institute of Management & Technology, Lucknow

Abstract - Applications in many domains make Sentiment Analysis an exciting area for study. The use of online polls and surveys to get feedback from the public regarding goods, current events, and societal or political issues is on the rise. The public and the stakeholders benefit from hearing the thoughts and feelings of the general public when important choices must be made. Opinion mining is the practice of gleaning insights from online sources, including web search engines, blogs, micro-blogs, Twitter, and social networks, to produce meaningful conclusions. Twitter's user base provides a wealth of material from which to gain insight into the public's perspective. The massive volume of tweets, which are unstructured text, makes it challenging to analyze the information manually. Consequently, extracting and condensing the tweets from corpora calls for computational methods, which in turn require familiarity with terms that convey emotion. Sentiment analysis of unstructured text may be accomplished using a wide variety of computational methodologies, models, and algorithms. The vast majority are based on machine learning methods that use the Bag-of-Words (BoW) representation.
In this research, we used a lexicon-based strategy to automatically identify sentiment for tweets gathered from the Twitter public domain. To further investigate the efficacy of alternative feature combinations, we used three machine learning algorithms for tweet sentiment identification: Naive Bayes (NB), Maximum Entropy (ME), and Support Vector Machines (SVM). Our results suggest that both NB with Laplace smoothing and SVM are successful in categorizing the tweets. The features used for NB are unigrams and Part-of-Speech (POS) tags, while unigrams alone are used for SVM.

Key Words: Bag-of-Words, Lexicon, Machine Learning Algorithms, Laplace Smoothing, Part-of-Speech.

1. INTRODUCTION

Two separate polls of over 2000 American adults found that 81% of Internet users (or 60% of Americans) have done product research online at least once, and that 20% of Internet users (15% of Americans) do so on a typical day. Consumption of goods and services is not the only motivation for seeking and sharing opinions online. The need for access to current political information is another critical factor. Individuals now use email for political campaigning, sharing information and discussing candidates and issues online. Users tend to trust online advice and recommendations, which consist largely of opinion. Despite the generally pleasant experiences of American Internet users during online product research, Horrigan [1] found that 58% of users reported that online information was missing, difficult to find, confusing, or overwhelming. Therefore, there is a significant need for improved information-access technologies to aid shoppers and researchers. Web 2.0 sites like blogs, message boards, and other kinds of social media have made it easier than ever for customers to voice their thoughts and views on the brands they use.
In recent years, businesses have begun to acknowledge the power that user reviews have in shaping the perceptions of others and the standing of brands. Companies are beginning to monitor social media so that they can react to customer feedback and adjust their marketing, brand positioning, product development, and other strategies accordingly.

1.1. Opinion Mining and Sentiment Analysis

Extracting opinions from text is called "Opinion Mining" (OM). Opinion mining is a new field at the intersection of information retrieval, text mining, and computational linguistics that seeks to detect the opinions expressed in natural language texts, as described by Pang et al. [3]. It is a subfield of knowledge discovery in databases (KDD) that employs Natural Language Processing (NLP) and statistical machine learning methods to distinguish opinionated content from factual content. Tasks in opinion mining include locating opinions, labeling them as positive, negative, or neutral, determining where those opinions originated, and summarizing them. The primary goal of opinion mining is to automatically extract a summary of the opinions about an entity from a large body of unstructured text.

Opinion Mining and Sentiment Analysis (SA) are two names for the same thing: the study of how people feel about something. An individual's thoughts, feelings, and impressions about a matter, as expressed in the form of an opinion, are inherently personal and subjective. Individuals, groups, and societies may benefit greatly from the advice and counsel of others throughout the decision-making process, as concluded by the work of Liu et al. [2]. To act swiftly and wisely, people need information that is both precise and brief. When making a choice, people often seek advice from friends, family, and experts who have developed an opinion or point of view based on their own
experiences, observations, conceptions, and beliefs (which may be positive or negative).

2. SENTIMENT TARGET IDENTIFICATION

Identifying sentiment (opinion) targets is a crucial part of SA work. The target may be anything from the subject of a statement to its object. Everyone involved in making and selling a product has to evaluate it thoroughly in light of public and buyer feedback. Automatically identifying and extracting the aspects mentioned in reviews is a key step in comparing reviews. Opinion mining and summarization thus rely heavily on product feature mining [10]. Sentiment analysis is a difficult field of study because, to correctly identify opinion targets in a phrase or document, a system has to discern evaluative expressions as well as qualities that are not overtly present and must be inferred from the semantics of the terms. Previous studies on sentiment target identification have shown that several Natural Language Processing (NLP) steps, including preprocessing, Part-of-Speech tagging, noise reduction, feature selection, and classification, are all necessary stages in the extraction process.

3. METHODOLOGY

Collecting research data is more complex than it may seem, since the data must support important and relevant inferences. Three types of data have been gathered: test data, subjective training data, and objective (neutral) training data. The Twitter API is described first.

3.1. Twitter API

The Twitter API provides a collection of programming interfaces through which developers may access Tweets, DMs, media, and other Twitter data.
Through the API, programmers may create products that communicate with the Twitter service and carry out actions such as publishing Tweets, getting user information, and viewing trending topics. Different endpoints, authentication mechanisms, and usage limits apply to the API's several flavors, which include REST (Representational State Transfer), streaming, and advertising. A Twitter developer account, together with API keys and access tokens, is a prerequisite for interacting with the API.

3.2. Twython

Twython is a Python library for accessing the Twitter API. It provides a simple and convenient way for Python developers to interact with the Twitter platform and perform tasks such as posting Tweets, retrieving user information, and accessing timelines. Twython abstracts many of the complexities of the Twitter API behind a simple, Pythonic interface. To use Twython, you need to obtain API keys and access tokens from a Twitter developer account and use these credentials to initialize a Twython client object, which you can then use to make API requests. The library supports both the REST and Streaming APIs and includes functionality for OAuth 1.0a and OAuth 2.0 authentication.

3.3. Data Preprocessing in Twitter

Data preprocessing in Twitter involves cleaning and transforming Twitter data into a format that is suitable for further analysis or modeling. This may include tasks such as:

1. Data Collection: Collecting raw data from the Twitter API, such as tweets, user profiles, and trends.
2. Data Cleaning: Removing irrelevant information, correcting errors, handling missing values, and removing duplicates from the collected data.
3. Text Processing: Processing textual data from tweets, such as removing stop words, stemming, and converting text to lowercase.
4. Sentiment Analysis: Classifying tweets into positive, negative, or neutral sentiment categories.
5. Data Transformation: Converting the data into a format that is suitable for analysis, such as converting textual data into numerical representations.
6. Data Reduction: Reducing the dimensionality of the data, such as aggregating data by user or time period.

These steps ensure that the data is in a clean, consistent, and usable format, and help improve the accuracy and reliability of any subsequent analysis or modeling.

3.4. Lexicon-Based Approach

The lexicon-based approach is a method used in sentiment analysis and opinion mining to classify the sentiment of a piece of text, such as a tweet, into positive, negative, or neutral categories. The approach uses a predefined lexicon, a list of words that are associated with specific sentiments.

In a lexicon-based approach, the sentiment of a piece of text is determined by finding the words in the text that match entries in the lexicon and aggregating the sentiment scores associated with those words. The resulting sentiment score is then used to classify the text as positive, negative, or neutral.

Many lexicons are available for sentiment analysis, each with its own strengths and weaknesses. Some popular lexicons include SentiWordNet, the Harvard IV dictionary, and the AFINN lexicon.

The lexicon-based approach is simple to implement and has been widely used in sentiment analysis. However, it has some limitations: it covers only the words in the lexicon and does not take into account the context in which words are used. To overcome these limitations, other
approaches such as machine learning and deep learning models have been developed.

3.5. SentiWordNet

SentiWordNet is a lexicon for sentiment analysis and opinion mining. It is a lexical resource for the English language, built on WordNet, that provides sentiment scores for groups of synonymous word senses (synsets).

SentiWordNet assigns sentiment scores along three dimensions: positivity, negativity, and objectivity. Each entry in the lexicon is associated with three scores, one per dimension. The scores are based on the collective sentiment of words that are semantically similar to the word being scored.

SentiWordNet can be used as a resource in sentiment analysis and opinion mining to classify the sentiment of a piece of text into positive, negative, or neutral categories. To do this, the sentiment scores of the words in the text are aggregated to determine the overall sentiment of the text.

SentiWordNet has been widely used in sentiment analysis and has been reported to perform well in comparison with other lexicons. It is a valuable resource for researchers and practitioners in the field of sentiment analysis.

4. RESULTS AND ANALYSIS

4.1. Naive Bayes

Naive Bayes is a simple probabilistic classifier based on Bayes' Theorem. It is a popular machine learning algorithm that is widely used for tasks such as text classification, sentiment analysis, and spam filtering.

The basic idea behind Naive Bayes is to use Bayes' Theorem to calculate the probability of a class (e.g., positive, negative, or neutral sentiment) given a set of features (e.g., words in a text).
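As an illustration, Bayes' Theorem for a class and a set of features can be written as below. The symbols c (a sentiment class) and f_1, ..., f_n (the features observed in a text) are notation introduced here for the example and do not appear in the original paper.

```latex
P(c \mid f_1, \dots, f_n) = \frac{P(c)\, P(f_1, \dots, f_n \mid c)}{P(f_1, \dots, f_n)}
```

A classifier built on this rule assigns the class with the highest posterior probability. Since the denominator is the same for every class, it can be ignored when comparing classes.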
The algorithm assumes that the features are conditionally independent given the class, meaning that within a class the presence of one feature does not affect the presence of another. This is the "naive" part of the algorithm, hence its name.

There are several variants of the Naive Bayes algorithm, including Multinomial Naive Bayes, Bernoulli Naive Bayes, and Gaussian Naive Bayes. Each variant is suited to different types of data and different classification tasks.

Naive Bayes is a fast and effective algorithm for text classification and sentiment analysis. It is simple to implement and requires little data preparation. However, its performance can be limited by the "naive" assumption of independence between features, which often does not hold in practice. Despite this, Naive Bayes remains a popular and widely used algorithm for text classification and sentiment analysis.

4.2. Twitter Dataset

We investigate a wide range of features that have a significant impact on sentiment analysis. We have used n-gram features, namely unigrams (n = 1) and bigrams (n = 2), which are common in many text classification tasks, including sentiment analysis. In our experiments, we used boolean features over both unigrams and bigrams. Each n-gram feature has an associated boolean value, which is set to true if and only if the corresponding n-gram appears in the tweet [12]. The features that we employed are listed in Table 1, along with the accuracy obtained by each classifier. We also compare this dataset with the one that Pang et al. used for their research on movie reviews.
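The boolean n-gram features described above, combined with a Naive Bayes classifier that uses Laplace (add-one) smoothing, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the function `ngram_features`, the class `BooleanNaiveBayes`, and the four training tweets are all assumptions made up for the example.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, use_bigrams=True):
    """Return the set of unigrams (and optionally bigrams) present in a text.

    Set membership plays the role of the boolean feature value: an n-gram
    is "true" if and only if it occurs in the tweet.
    """
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    features = set(tokens)
    if use_bigrams:
        features.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


class BooleanNaiveBayes:
    """Hypothetical multinomial Naive Bayes over binarized n-gram features."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # feature counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocabulary.update(feats)
        return self

    def log_posteriors(self, text):
        """Unnormalized log P(class | text) for every class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            # Laplace smoothing: add 1 to every count, |V| to the denominator.
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for feat in ngram_features(text):
                if feat in self.vocabulary:  # skip n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_posteriors(text)
        return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
train_texts = [
    "i love this phone",
    "what a great day",
    "i hate this traffic",
    "this is a terrible day",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = BooleanNaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("i love this great day")
print(prediction)
```

Without smoothing, any n-gram that never occurred with a class during training would drive that class's probability to zero; adding one to every count avoids this, which matters for short texts such as tweets.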
According to Table 1, using unigrams as features with the NB classifier and Laplace smoothing gave higher classification accuracy on tweets than on movie reviews; with the MaxEnt classifier, however, accuracy was higher on movie reviews than on tweets.

Table 1: Accuracy of tweets using different features

Table 2: F1 score of MNB classifier

POS features have been shown to be effective in sentiment analysis. As a general rule, adjectives are regarded as useful for sentiment analysis since they serve as reliable indicators of a subject's feelings. Using adjectives alone gives results comparable to those obtained with unigrams and bigrams, as can be seen in row (5) of the results table. Row (4) of the results table shows that when unigrams and POS are used together as features, all three classifiers produce better results. The first row of the results table shows that SVM with unigram features yields the best result of all the feature sets considered. The detailed results of the MNB classifier are given in Table 2, which reports the F1 score. The Receiver Operating Characteristic (ROC) curve of the MNB classifier for manually annotated tweets is shown in Figure 1.

Figure-1: ROC curve of MNB classifier for tweets

4.3. Emotion Dataset

Hashtags are often used as a means for people to communicate their thoughts and feelings.
Therefore, a substantial amount of emotion and sentiment can be gleaned from these hashtagged phrases. We included these hashtags in our machine learning pipeline to provide it with more data. Figure 2 shows a snapshot of the confusion matrix for the unigram features of our emotion dataset, together with the F1 score of each class for the unigram feature. Figure 3 shows the ROC curve generated by our classifier.

Figure 2: Snapshot of emotion dataset

Table 3: Accuracy of emotion dataset using different features

Table 4: F1 score of MNB classifier for unigram feature
Figure 3: ROC curve of MNB classifier for an emotion data set

One finding of our experiment is that a dataset constructed by automatically collecting tweets via hashtags shows a clear advantage over a dataset produced by manually annotating tweets. This is because authors label their own feelings accurately, whereas the conventional annotation method requires annotators to infer the writers' feelings from the text, which is difficult to do accurately.

5. CONCLUSION

As part of our study, we looked at the difficulties of Sentiment Analysis and the many approaches used in this area. Identifying sentiment in social media data is notoriously challenging due to the data's richness and subtlety. To determine which features are most useful for Sentiment Analysis, we experimented with tweets collected from the public domain. We used both machine learning and lexicon-based algorithms for SA. The goal of our project was to make the most efficient use of the SentiWordNet lexicon to develop a Twitter Sentiment Analysis platform. Using the SentiWordNet lexicon, we obtained an accuracy of 75.20 percent on our dataset, although we observed that this number varied significantly from one domain to the next. Although the current lexicon contains a large number of terms with sentiment scores, it lacks words that are common in particular domains; it is therefore preferable to construct a lexicon from the test corpus and use it for classification.
Our model, which uses the Google search engine to determine a term's score via pointwise mutual information, outperforms the SentiWordNet lexicon on our dataset and can handle one of the difficulties of Sentiment Analysis: the unexpected shift from positive to negative sentiment.

REFERENCES

[1] C. Alm, D. Roth, and R. Sproat, “Emotions from the text: machine learning for text-based emotion prediction,” in Proceedings of HLT and EMNLP. ACL, 2005, pp. 579–586.
[2] S. Aman and S. Szpakowicz, “Using Roget’s thesaurus for fine-grained emotion recognition,” in Proceedings of IJCNLP, 2008, pp. 296–302.
[3] P. Chesley, B. Vincent, L. Xu, and R. K. Srihari, “Using verbs and adjectives to automatically classify blog sentiment,” in AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, 2006, pp. 27–29.
[4] M. D. Choudhury, S. Counts, and M. Gamon, “Not all moods are created equal! exploring human emotional states in social media,” in Proceedings of ICWSM, 2012.
[5] R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, “Liblinear: A library for large linear classification,” The Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[6] K. Gimpel, N. Schneider, B. O’Connor, D. Das, D. Mills, J. Eisenstein, M. Heilman, D. Yogatama, J. Flanigan, and N. A. Smith, “Part-of-speech tagging for Twitter: annotation, features, and experiments,” in Proceedings of HLT: short papers, ser. HLT ’11. Stroudsburg, PA, USA: ACL, 2011, pp. 42–47.
[7] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten, “The weka data mining software: an update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10–18, 2009.
[8] G. Mishne, “Experiments with mood classification in blog posts,” in Proceedings of ACM SIGIR 2005 Workshop on Stylistic Analysis of Text for Information Access.
[9] S. Mohammad, “#emotional tweets,” in Proceedings of the Sixth International Workshop on Semantic Evaluation. ACL, 7-8 June 2012, pp. 246–255.
[10] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, “Affect analysis model: A novel rule-based approach to affect sensing from text,” Natural Language Engineering, vol. 17, no. 1, pp. 95–135, 2011.
[11] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of EMNLP. ACL, 2002, pp. 79–86.
[12] P. Shaver, J. Schwartz, D. Kirson, and C. O’Connor, “Emotion knowledge: Further exploration of a prototype approach,” Journal of Personality and Social Psychology, vol. 52, no. 6, pp. 1061–1086, 1987.
[13] C. Strapparava and R. Mihalcea, “Learning to identify emotions in text,” in Proceedings of the 2008 ACM symposium on Applied computing. ACM, 2008, pp. 1556–1560.
[14] C. Strapparava and A. Valitutti, “Wordnet-affect: an affective extension of wordnet,” in Proceedings of LREC, vol. 4. Citeseer, 2004, pp. 1083–1086.
[15] C. Strapparava and R. Mihalcea, “Semeval-2007 task 14: affective text,” in Proceedings of the 4th International Workshop on Semantic Evaluations, ser. SemEval ’07, 2007, pp. 70–74.
[16] R. Tokuhisa, K. Inui, and Y. Matsumoto, “Emotion classification using massive examples extracted from the web,” in Proceedings of COLING. ACL, 2008, pp. 881–888.
[17] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of HLT and EMNLP. ACL, 2005, pp. 347–354.
[18] I. Witten, E. Frank, and M. Hall, Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2011.
[19] C. Yang, K. Lin, and H. Chen, “Emotion classification using web blog corpora,” in IEEE/WIC/ACM International Conference on Web Intelligence. IEEE, 2007, pp. 275–278.