Group Members :
Mohan Kamal Hassan
Vinay Mathukumalli
Sai Krishna Mannepalli
Sentimental Analysis Performance Advanced
approaches of Machine Learning
(Impacting People’s Daily lives with Products and
Services Reviews)
Guided By
Ahmet Ozkul
Professor, Ph.D
University Of New Haven
INDEX
Abstract
Introduction
Problem Statement
Literature Review
Proposed Methodology
Result, Comparison and Analysis
Conclusion
References
Abstract
Sentiment analysis or opinion mining is one of the major tasks of NLP
(Natural Language Processing). Sentiment analysis has gained much
attention in recent years. Sentiment analysis systems are being applied
in almost every business and social domain because opinions are central
to almost all human activities and are key influencers of our behaviors.
Our beliefs and perceptions of reality, and the choices we make, are
largely conditioned on how others see and evaluate the world. For this
reason, when we need to make a decision we often seek out the
opinions of others. This is true not only for individuals but also for
organizations.
Experiments for both sentence-level categorization and review-level
categorization are performed with promising outcomes.
• Objective: To determine the sentiment embedded in product reviews using
computational methods.
• Approaches: Utilizing two distinct sentiment analysis strategies:
• A traditional method employing the AFINN sentiment score lexicon.
• Advanced machine learning algorithms for a more nuanced analysis.
• Scope: Analysis spans across different sentiment levels, from negative to
positive (1 to 5 scale).
• Tools: Implementing various R packages and tools for text mining, data
manipulation, and machine learning.
• Outcome: Aiming to accurately classify the sentiment of reviews and
understand customer feedback.
• Application: Enhancing customer experience and business insights through
data-driven sentiment evaluation.
Introduction
Problem Statement
Many e-commerce sites are now on the market, each selling a wide
range of products. A single product can attract a very large number
of reviews, which makes it difficult for customers as well as for
the e-commerce company to interpret them.
Although star ratings exist, most customers still read the reviews,
so classifying reviews with adequate accuracy is needed to retain
customers.
• How to extract useful information and automatically build an
objective product-quality assessment system that can cope with this
mass of text is an emerging question in the related research field.
Opinion mining is a relatively new technology built on text mining
and natural language processing.
• It provides a way to summarize opinions about products by
recognizing the opinions expressed in the content. This work mainly
addresses sentence-level opinion mining and treats the statements
about product features for each viewpoint as the objects of analysis,
from which the authors' opinion inclinations can be found.
Literature Review
Paper title: “Sentiment analysis using product review data”
Authors: Xing Fang and Justin Zhan
• The paper tackles sentiment polarity categorization, one of the
fundamental problems of sentiment analysis.
• It proposes a general process for sentiment polarity categorization,
with detailed descriptions of each step.
• The data used in the study are online product reviews collected
from Amazon.com.
• Experiments on both sentence-level and review-level categorization
are performed with promising outcomes.
Paper title: “Mining the customer behavior using web usage
mining in e-commerce”
Author: Yadav, M. P.
• Explains customer behavior for e-commerce companies using K-means
clustering.
• With the rapid growth of the World Wide Web, users can easily find,
extract, filter, and evaluate whatever they want.
• With advances in technology, servers can now collect and store large
amounts of data, which helps companies understand customers'
perceptions.
• The paper therefore examines the relationship between web mining data
and e-commerce. Consumers mostly prefer to choose among the millions
of items in an online store rather than shop at a superstore, which
shows their interest in using e-commerce sites to engage in
international trade.
Paper title: “Web Mining Techniques in E-Commerce Applications”
Author: Ahmad Tasnim Siddiqui
• Explains that the web is today the best medium of communication in
modern business. Online purchasing has increased compared with window
shopping because it offers millions of product ranges. Companies can
attract most customers because e-commerce is not just buying and
selling over the internet; it is also a way to gain an advantage over
the big players in the market.
• For this purpose data mining, sometimes called knowledge discovery,
is used. The vast information available on the internet helps to
improve e-commerce applications. The authors then describe a proposed
architecture with four main components: business data, data obtained
from consumers' interactions, a data warehouse, and data analysis.
When the data analysis module finishes its task, it produces a report
that can be used by consumers as well as by the owners of the
e-commerce application.
• Data Preprocessing: Cleaned and prepared the text data by tokenizing, removing
stopwords, and stemming to ensure quality input for analysis.
• Keyword-Based Analysis: Applied the AFINN lexicon to assign sentiment scores to
individual words and aggregated these to determine the overall sentiment of each
review.
• Machine Learning Models: Trained multiple models including Logistic Regression,
Naive Bayes, SVM, Random Forest, and XGBoost to classify sentiments into different
levels.
• Document-Term Matrix (DTM): Converted text reviews into a DTM to create a
structured numerical representation of the text for machine learning processing.
• Model Evaluation: Assessed the performance of each model using confusion
matrices and ROC curves, evaluating their predictive accuracy and ability to
generalize.
• Binary and Multiclass Classification: Implemented Logistic Regression for binary
(positive/negative) sentiment analysis and other models for multiclass (1 to 5 scale)
sentiment rating.
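The preprocessing step listed above can be sketched as follows. The project itself used R packages; this Python version is only an illustration, and the stopword list, the `naive_stem` suffix stripper, and the sample review are assumptions standing in for a real stopword list and stemmer.

```python
import re

# Tiny illustrative stopword list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "i", "to", "of"}


def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The battery was amazing and it charged quickly"))
```

The resulting token lists are what a lexicon lookup or a document-term matrix would consume.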
Proposed Methodology
NB Maxent Classifier:
• The NB Maximum Entropy classifier, or Maxent as some prefer to call
it, is another well-known classifier. The idea behind Maxent
classifiers is to prefer the most uniform model that satisfies the
given constraints. Maxent models are feature-based: we use features
(drawn from product reviews, which in turn describe product features)
to find a distribution over the different classes using logistic
regression.
• The probability of a data point belonging to each class is
calculated, and Maxent is evaluated on three parameters: precision,
recall, and F-score. These let us judge its accuracy in the
classification process.
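As a minimal sketch of how a Maxent model turns features into a distribution over classes, the Python snippet below applies a softmax to per-class linear scores. The function name `maxent_probabilities` and the weights in `WEIGHTS` are hypothetical and not fitted to any data; the project's own implementation was in R.

```python
import math


def maxent_probabilities(features, weights):
    """Return P(class | features) as a softmax over per-class linear scores."""
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical weights for illustration only.
WEIGHTS = {
    "positive": {"great": 2.0, "broken": -1.5},
    "negative": {"great": -1.0, "broken": 2.5},
    "neutral": {},
}

print(maxent_probabilities({"great": 1.0}, WEIGHTS))
print(maxent_probabilities({}, WEIGHTS))  # no evidence: uniform distribution
```

With no active features every class receives the same probability, which mirrors the preference for the most uniform model when nothing constrains it.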
NB Tree Classifier:
• The NB boosted tree is a classifier that combines boosting with
decision trees. Boosting is a machine learning meta-algorithm used in
supervised learning, primarily to reduce bias. In boosting, weak
predictive classifiers are used to build weighted trees (over the
positive, negative, and neutral classes), which are then combined into
a single prediction. The result is assessed on three parameters:
precision, recall, and F-score.
• Boosted trees combine the strengths of the two techniques, and the
resulting scores let us judge accuracy in the classification process.
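The weighting idea behind boosting can be illustrated with a small AdaBoost sketch over one-level decision trees (stumps). This is a simplified binary version (+1 positive, -1 negative) written in Python for illustration; it is not the project's R model, and the feature rows (counts of positive and negative words per review) are made up.

```python
import math


def best_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Misclassified rows gain weight, correctly classified rows lose weight.
        w = [
            wi * math.exp(-alpha * yi * (sign if row[j] > thr else -sign))
            for wi, row, yi in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps."""
    score = sum(a * (s if row[j] > t else -s) for a, j, t, s in ensemble)
    return 1 if score >= 0 else -1


# Made-up rows: [count of positive words, count of negative words].
X = [[3, 0], [2, 1], [0, 2], [1, 3], [2, 0], [0, 1]]
y = [1, 1, -1, -1, 1, -1]
model = adaboost(X, y)
print([predict(model, row) for row in X])
```

Extending this to the three classes named above would need a multi-class variant or one-vs-rest training, which the sketch leaves out.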
NB Support Vector Machine (SVM) Classifier:
• The NB Support Vector Machine algorithm has a defined input and
output format: the input is a vector in a feature space and the output
is 0 or 1 (positive/negative). Text reviews (here, user reviews of the
purchased product) are not suitable for learning in their original
form, so they are first pre-processed and then transformed into a
format that matches the input of the learning algorithm.
• Each word is assigned to one class (positive, negative, or neutral),
and words that bear the same meaning belong to the same class. TF-IDF
weights are calculated for this purpose. The accuracy of the support
vector machine can then be measured and compared with that of the
other algorithms.
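The TF-IDF transformation mentioned above can be sketched in a few lines of Python. The variant used here (term frequency divided by document length, inverse document frequency as log(N / df)) is one common choice and an assumption; the deck does not say which weighting the R pipeline applied.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


docs = [["good", "phone"], ["bad", "phone"]]
for weights in tf_idf(docs):
    print(weights)
```

A term that occurs in every review ("phone" here) gets weight 0, so the vectors passed to the SVM emphasize words that distinguish one review from another.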
NB Random Forest Classifier:
• NB Random Forests are an ensemble learning method for classification
that operates by constructing many decision trees from the pool of
words and combining their class predictions. It produces multi-level
decisions based on the parameters on which the classification accuracy
depends. The correlation between trees is reduced by randomly
selecting the bag or pool of words, which increases predictive power
and efficiency.
• Predictions are made by aggregating the predictions of the trees
built on the different sampled data sets. The accuracy of the random
forest can then be measured and compared with that of the other
algorithms.
NB Bagging Classifier:
• NB Bagging (bootstrap aggregating) is an ensemble machine learning
technique that uses the simplest way of combining predictions of the
same type: each model decides which category (positive, negative, or
neutral) the text representation of a review belongs to, and these
decisions are combined. The individual classifiers are built from the
given dataset.
• Combining the classifiers' decisions in this way improves
performance. The evaluation parameters give the accuracy as a value,
so that the bagging technique can be used in future for sentiment
analysis of product reviews.
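A rough Python sketch of bagging follows: each base model is trained on a bootstrap sample (drawn with replacement) and the ensemble returns the majority vote. The base learner `train_word_voter` and the four labelled reviews are toy assumptions chosen to keep the example short; they are not the classifiers or data used in the project.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the most frequent prediction."""
    return Counter(predictions).most_common(1)[0][0]


def train_word_voter(sample):
    """Toy base learner: each known word votes for the labels it was seen with."""
    word_labels = {}
    for tokens, label in sample:
        for tok in tokens:
            word_labels.setdefault(tok, Counter())[label] += 1
    default = Counter(label for _, label in sample).most_common(1)[0][0]

    def predict(tokens):
        votes = Counter()
        for tok in tokens:
            votes.update(word_labels.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else default

    return predict


def bagging_predict(rows, tokens, n_models=25, seed=0):
    """Train n_models base learners on bootstrap samples and combine their votes."""
    rng = random.Random(seed)
    models = [train_word_voter(bootstrap_sample(rows, rng)) for _ in range(n_models)]
    return majority_vote([model(tokens) for model in models])


# Made-up labelled reviews for illustration.
REVIEWS = [
    (["great", "sound"], "positive"),
    (["love", "it"], "positive"),
    (["broken", "screen"], "negative"),
    (["arrived", "today"], "neutral"),
]
print(bagging_predict(REVIEWS, ["great", "screen"]))
```

A random forest follows the same pattern with decision trees as base learners and an additional random choice of features at each split.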
Results, Comparison and Analysis
• Sentiment scores were aggregated for each rating level. There are
more instances of positive sentiment than negative sentiment.
Figures: Ratings / Wordcloud
Sentimental Analysis
1 - Keyword Based Analysis
• The AFINN lexicon is a list of words, each rated with a sentiment
score that reflects how positive or negative the word is.
• Reviews are scanned and their words are matched against the AFINN
lexicon, so that each matched word receives the lexicon's predefined
score.
• The individual word scores are summed into an overall sentiment
score for each review, indicating its positive or negative nature.
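The keyword-based scoring described above amounts to a lookup and a sum. The Python sketch below uses a tiny `TOY_LEXICON` whose scores are illustrative values in AFINN's -5 to +5 range; a real run would load the full AFINN word list, and the project did this step in R.

```python
import re

# Illustrative scores only (assumption); load the real AFINN list in practice.
TOY_LEXICON = {"good": 3, "love": 3, "excellent": 4, "bad": -3, "terrible": -3}


def score_review(text, lexicon):
    """Sum the lexicon scores of all words in the review; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def polarity(score):
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "Good sound but terrible battery, really terrible"
total = score_review(review, TOY_LEXICON)
print(total, polarity(total))
```

A plain sum like this ignores negation ("not good" still scores as positive), which is one reason to compare it against trained classifiers.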
2 – NB and Machine Learning Classifiers
• Machine learning models are used to predict the sentiment of
reviews beyond simple positive or negative, often categorizing
into multiple levels of sentiment.
• Textual data is transformed into numerical form through a
Document-Term Matrix (DTM) to facilitate machine learning.
• Sparse columns of the Document-Term Matrix, where a term appears
in less than 2% of the documents, are removed to reduce noise and
computational complexity and so improve model performance.
• Each model is trained on a subset of the data (the training set,
80% of the total) to learn the patterns associated with the various
sentiment levels, and is evaluated on the testing set (the remaining
20%).
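The three steps above (building the DTM, dropping terms found in fewer than 2% of documents, and splitting 80/20) can be sketched in plain Python as follows. Function names, the random seed, and the toy corpus are assumptions; the project used R packages for these steps.

```python
import random


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for term in doc:
            row[index[term]] += 1
        matrix.append(row)
    return vocab, matrix


def remove_sparse_terms(vocab, matrix, min_doc_share=0.02):
    """Keep only terms that occur in at least min_doc_share of the documents."""
    n_docs = len(matrix)
    keep = [
        j for j in range(len(vocab))
        if sum(1 for row in matrix if row[j] > 0) / n_docs >= min_doc_share
    ]
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in matrix]


def train_test_split(rows, train_share=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Toy corpus: "rare" occurs in 1 of 100 documents (1%), below the 2% cutoff.
docs = [["good"] for _ in range(99)] + [["good", "rare"]]
vocab, dtm = document_term_matrix(docs)
vocab, dtm = remove_sparse_terms(vocab, dtm)
train, test = train_test_split(dtm)
print(vocab, len(train), len(test))
```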
Logistic Regression Classifier
• The Logistic Regression classifier achieved an accuracy of 71.15%
in binary sentiment classification.
• Sensitivity (recall) of 81.31% shows that the model is good at
identifying positive reviews, while specificity of 54.71% shows that
it is much less effective at recognizing negative ones.
• The model's precision is 74.39%.
• Balanced accuracy, the mean of sensitivity and specificity, is
68.01%.
• The Matthews Correlation Coefficient (MCC) stands at 0.3738,
indicating that the classifier performs better than a random guess
but still has room for improvement.
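For reference, the binary metrics quoted for Logistic Regression are all derived from the four cells of a confusion matrix. The Python sketch below computes them; the counts passed in the example are hypothetical, since the deck reports only the percentages. The last lines check that the reported sensitivity and specificity do average to the reported balanced accuracy.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts (positive = class 1)."""
    sensitivity = tp / (tp + fn)  # recall on positive reviews
    specificity = tn / (tn + fp)  # recall on negative reviews
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": mcc,
    }


# Hypothetical counts, not the project's confusion matrix.
print(binary_metrics(tp=80, fp=45, tn=55, fn=20))

# Consistency check on the reported figures: (81.31% + 54.71%) / 2 = 68.01%.
reported_balanced = (0.8131 + 0.5471) / 2
print(round(reported_balanced, 4))
```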
Naive Bayes Classifier and Random Forest
• The Naive Bayes classifier achieved an overall accuracy of 33.75%
in multi-class sentiment classification. This is above the 20%
expected from random guessing over five equally likely classes, but
still low.
• Sensitivity (recall) averaged across classes is 33.95%, so on
average the model correctly identifies about a third of the reviews
in a sentiment class.
• The model's balanced accuracy, which averages sensitivity and
specificity, is 58.72%, implying a much higher specificity than
sensitivity.
• The Matthews Correlation Coefficient (MCC) for the multi-class
classification is 0.1792, which is low and suggests considerable room
for improvement in the model's predictive capability.
• The model's precision (PPV) is low at 33.38%, so only about a third
of the predictions made for a given sentiment class are correct.
NB and Support Vector Machines
• The Support Vector Machine (SVM) classifier shows an accuracy of
38.25% in multi-class sentiment classification, a modest ability to
classify sentiment levels correctly.
• With a balanced accuracy of 61.35%, the SVM classifier is moderately
consistent in identifying sentiment across the different classes.
• Sensitivity (recall) and precision (PPV) are both around 38%: the
model finds about 38% of the reviews in each class, and about 38% of
its predictions for a class are correct.
• The Matthews Correlation Coefficient (MCC) of 0.2283 reflects a low
but better-than-random correlation between the observed and predicted
classifications.
• The detection prevalence of 20% matches what an average over five
classes always gives, because each review is predicted as exactly one
class and the per-class rates sum to 1. On its own it does not show
that the model is selective or prone to false negatives.
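The multi-class figures above are averages of one-vs-rest metrics computed per class. The Python sketch below shows one way to obtain them from a confusion matrix, assuming unweighted (macro) averaging; the 3-class matrix is a made-up example, not the project's 5-class results.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of reviews with true class i predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    sens, spec, prev = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
        prev.append((tp + fp) / total)  # share of reviews predicted as class c

    def mean(values):
        return sum(values) / len(values)

    return {
        "accuracy": sum(confusion[c][c] for c in range(k)) / total,
        "sensitivity": mean(sens),
        "specificity": mean(spec),
        "balanced_accuracy": (mean(sens) + mean(spec)) / 2,
        "detection_prevalence": mean(prev),
    }


# Made-up 3-class confusion matrix for illustration.
toy = [[5, 3, 2], [2, 6, 2], [1, 4, 5]]
print(macro_metrics(toy))
```

Because every review is assigned to exactly one class, the averaged detection prevalence is always 1/k, which is 0.2 for a five-level rating scale whatever the model predicts.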
NB and Random Forest Classifier
• The Random Forest classifier performs strongly, with an accuracy of
65.85% in multi-class sentiment classification.
• A high balanced accuracy of 78.56% suggests consistent performance
in identifying the various sentiment classes.
• Sensitivity (recall) and precision (PPV) are both approximately
65.87%, showing that the model is effective both at finding the
reviews of each sentiment class and at predicting classes correctly.
• The Matthews Correlation Coefficient (MCC) is 0.5745, a substantial
correlation between the observed and predicted classifications and
well above what would be expected by chance.
• The model's specificity is very high at 91.45%: for each sentiment
class, it rarely assigns that class to reviews that belong to another
one.
XGBoost Classifier
• The XGBoost classifier achieved perfect metrics on the training set:
accuracy, sensitivity (recall), specificity, precision (PPV), and the
Matthews Correlation Coefficient (MCC) all scored the maximum of 1.0,
implying no misclassification.
• The confusion matrix for the training data shows no false positives
or false negatives, which may suggest overfitting to the training
data.
• The confusion matrix for the testing data also shows 100% accuracy,
with zeros in all off-diagonal cells. A perfect score on held-out
data is unusual and is worth checking for data leakage, for example
the rating itself remaining among the features.
• The detection prevalence of 0.2 is the same on the training and test
sets, as expected for an average over five classes.
Conclusion
• The project applied several advanced hybrid machine learning models
to sentiment analysis of product reviews, with varying levels of
effectiveness.
• Machine learning models, notably NB with Random Forest and XGBoost,
outperformed the traditional keyword-based method and provided a
deeper analysis of sentiment.
• The NB hybrid Random Forest demonstrated high accuracy and robust
performance, marking it as a reliable classifier for sentiment
analysis tasks.
• NB with XGBoost achieved perfect scores on the training data, but
such perfect scores may indicate overfitting.
References
[1] S. ChandraKala1 and C. Sindhu2, “OPINION MINING AND
SENTIMENT CLASSIFICATION: A SURVEY,”.Vol .3(1),Oct-
2012,420-427.
[2] Kim S-M, Hovy E (2004) Determining the sentiment of
opinions In: Proceedings of the 20th international conference
on Computational Linguistics, page 1367.Association for
Computational Linguistics, Strasbourg, PA, USA.
[3] Liu B (2010) Sentiment analysis and subjectivity In:
Handbook of Natural Language Processing, Second Edition..
Taylor and Francis Group, Boca
[4] Liu B, Hu M, Cheng J (2005) Opinion observer: Analyzing
and comparing opinions on the web In: Proceedings of the
14th International Conference on World Wide Web, WWW ’05,
342–351.ACM, New York, NY, USA.
25
[5] Pang B, Lee L (2004) A sentimental education: Sentiment
analysis using subjectivity summarization based on minimum
cuts In: Proceedings of the 42Nd Annual Meeting on Association
for Computational Linguistics, ACL ’04.Association for
Computational Linguistics, Stroudsburg, PA, USA
[6] Liu B, Hu M, Cheng J (2005) Opinion observer: Analyzing and
comparing opinions on the web In: Proceedings of the 14th
International Conference on World Wide Web, WWW ’05, 342–
351..ACM, New York, NY, USA.
[7] Pang B, Lee L (2004) A sentimental education: Sentiment analysis
using subjectivity summarization based on minimum cuts In:
Proceedings of the 42Nd Annual Meeting on Association for
Computational Linguistics, ACL ’04..Association for Computational
Linguistics, Stroudsburg, PA, USA.
THANK YOU

More Related Content

PDF
Amazon Product Review Sentiment Analysis with Machine Learning
PPTX
How Does Customer Feedback Sentiment Analysis Work in Search Marketing?
PDF
Sentiment Analysis Using Hybrid Approach: A Survey
DOCX
Customer_Analysis.docx
PDF
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
PDF
Analysis Levels And Techniques A Survey
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
International Journal of Engineering Research and Development (IJERD)
Amazon Product Review Sentiment Analysis with Machine Learning
How Does Customer Feedback Sentiment Analysis Work in Search Marketing?
Sentiment Analysis Using Hybrid Approach: A Survey
Customer_Analysis.docx
IRJET- Survey of Classification of Business Reviews using Sentiment Analysis
Analysis Levels And Techniques A Survey
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)

Similar to Business Analytics Final Capstone Project Presenation PPT.pptx (20)

PDF
A Novel Jewellery Recommendation System using Machine Learning and Natural La...
PDF
K1802056469
PDF
Co-Extracting Opinions from Online Reviews
PDF
A Review on Sentimental Analysis of Application Reviews
PDF
Ijmer 46067276
PDF
Ijmer 46067276
DOCX
Camera ready sentiment analysis : quantification of real time brand advocacy ...
PPTX
Reasesrty djhjan S - explanation required.pptx
PPTX
Machine_learning_presentation_on_movie_recomendation_system.pptx
PDF
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
PDF
Using NLP Approach for Analyzing Customer Reviews
PDF
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
PPTX
Collaborative Filtering Recommendation System
PDF
IRJET- Analysis of Brand Value Prediction based on Social Media Data
PDF
Framework for Product Recommandation for Review Dataset
PPTX
Emerging Techniques in Machine Learning, Data Science and Internet of Things
PDF
B1802021823
PDF
IRJET- Fatigue Analysis of Offshore Steel Structures
PDF
IRJET- Web based Hybrid Book Recommender System using Genetic Algorithm
PDF
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...
A Novel Jewellery Recommendation System using Machine Learning and Natural La...
K1802056469
Co-Extracting Opinions from Online Reviews
A Review on Sentimental Analysis of Application Reviews
Ijmer 46067276
Ijmer 46067276
Camera ready sentiment analysis : quantification of real time brand advocacy ...
Reasesrty djhjan S - explanation required.pptx
Machine_learning_presentation_on_movie_recomendation_system.pptx
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
Using NLP Approach for Analyzing Customer Reviews
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
Collaborative Filtering Recommendation System
IRJET- Analysis of Brand Value Prediction based on Social Media Data
Framework for Product Recommandation for Review Dataset
Emerging Techniques in Machine Learning, Data Science and Internet of Things
B1802021823
IRJET- Fatigue Analysis of Offshore Steel Structures
IRJET- Web based Hybrid Book Recommender System using Genetic Algorithm
SUPPORT VECTOR MACHINE CLASSIFIER FOR SENTIMENT ANALYSIS OF FEEDBACK MARKETPL...
Ad

Recently uploaded (20)

DOCX
An investigation of the use of recycled crumb rubber as a partial replacement...
PPT
Unit - I.lathemachnespct=ificationsand ppt
PPT
Module_1_Lecture_1_Introduction_To_Automation_In_Production_Systems2023.ppt
PPTX
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
PDF
ST MNCWANGO P2 WIL (MEPR302) FINAL REPORT.pdf
PPTX
Real Estate Management PART 1.pptxFFFFFFFFFFFFF
PPTX
Solar energy pdf of gitam songa hemant k
PPTX
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
PDF
V2500 Owner and Operatore Guide for Airbus
PPTX
Module1.pptxrjkeieuekwkwoowkemehehehrjrjrj
PDF
Introduction to Machine Learning -Basic concepts,Models and Description
PPTX
chapter 1.pptx dotnet technology introduction
PPTX
IOP Unit 1.pptx for btech 1st year students
PDF
BBC NW_Tech Facilities_30 Odd Yrs Ago [J].pdf
PDF
ASPEN PLUS USER GUIDE - PROCESS SIMULATIONS
PDF
Module 1 part 1.pdf engineering notes s7
PPTX
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
PPTX
Design ,Art Across Digital Realities and eXtended Reality
PDF
electrical machines course file-anna university
PPTX
Research Writing, Mechanical Engineering
An investigation of the use of recycled crumb rubber as a partial replacement...
Unit - I.lathemachnespct=ificationsand ppt
Module_1_Lecture_1_Introduction_To_Automation_In_Production_Systems2023.ppt
ARCHITECTURE AND PROGRAMMING OF EMBEDDED SYSTEMS
ST MNCWANGO P2 WIL (MEPR302) FINAL REPORT.pdf
Real Estate Management PART 1.pptxFFFFFFFFFFFFF
Solar energy pdf of gitam songa hemant k
SE unit 1.pptx by d.y.p.akurdi aaaaaaaaaaaa
V2500 Owner and Operatore Guide for Airbus
Module1.pptxrjkeieuekwkwoowkemehehehrjrjrj
Introduction to Machine Learning -Basic concepts,Models and Description
chapter 1.pptx dotnet technology introduction
IOP Unit 1.pptx for btech 1st year students
BBC NW_Tech Facilities_30 Odd Yrs Ago [J].pdf
ASPEN PLUS USER GUIDE - PROCESS SIMULATIONS
Module 1 part 1.pdf engineering notes s7
INTERNET OF THINGS - EMBEDDED SYSTEMS AND INTERNET OF THINGS
Design ,Art Across Digital Realities and eXtended Reality
electrical machines course file-anna university
Research Writing, Mechanical Engineering
Ad

Business Analytics Final Capstone Project Presenation PPT.pptx

  • 1. Group Members : Mohan Kamal Hassan Vinay Mathukumalli Sai Krishna Mannepalli Sentimental Analysis Performance Advanced approaches of Machine Learning (Impacting People’s Daily lives with Products and Services Reviews) Guided By Ahmet Ozkul Professor, Ph.D University Of New Haven
  • 2. INDEX Abstract Introduction Problem Statement Literature Review Proposed Methodology Result, Comparison and Analysis Conclusion References
  • 3. Abstract Sentiment analysis or opinion mining is one of the major tasks of NLP (Natural Language Processing). Sentiment analysis has gained much attention in recent years. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. Experiments for both sentence-level categorization and review-level categorization are performed with promising outcomes.
  • 4. • Objective: To determine the sentiment embedded in product reviews using computational methods. • Approaches: Utilizing two distinct sentiment analysis strategies: • A traditional method employing the AFINN sentiment score lexicon. • Advanced machine learning algorithms for a more nuanced analysis. • Scope: Analysis spans across different sentiment levels, from negative to positive (1 to 5 scale). • Tools: Implementing various R packages and tools for text mining, data manipulation, and machine learning. • Outcome: Aiming to accurately classify the sentiment of reviews and understand customer feedback. • Application: Enhancing customer experience and business insights through data-driven sentiment evaluation. Introduction
  • 5. Problem Statement As we know that there are several E-Commerce sites has been in the market with different products. Thus there are so many reviews are being generated for only one product, thus the problem arises for the customers as well as E-Commerce company to understand the review Though there is a star rating, most of the customers go through the reviews thus classifying the reviews with an appropriate accuracy is need to retain the customer.
  • 6. • So how to extract useful information and build objective products’ quality test system automatically to deal with the massive textual information is emerging in the related research field. Opinion Mining is a new technology based on the technology of text mining and natural language processing. • It provides the approach to generate summary of the products.It recognizes the opinion of the contents which authors, mainly discusses the sentence-level opinion mining and treats the statements of the product features for each viewpoint as analysis objects, then we can find authors’ opinion inclinations. 6
  • 7. Literature Review Paper title: “Sentiment analysis using product review data” Author: Xing Fang and Justin Zhan • This paper, it explained that they aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. • A general process for sentiment polarity categorization is proposed with detailed process descriptions. • Data used in this study are online product reviews collected from Amazon.com. • Experiments for both sentence-level categorization and review- level categorization are performed with promising outcomes.
  • 8. Paper title: “Mining the customer behavior using web usage mining in e-commerce” Author: Yadav, M. P. • Explained customer behavior for E-commerce companies using K Mean. • With the drastic growth of WWW users can easily find, extract, filter and evaluated whatever they want. • With the advancement in technology servers are now able to collect and store a lot of data which can help them to know about customers perceptions. • Hence, to determine the relationship between web mining data and ecommerce. Consumers mostly prefer to choose among millions of ones in an online store to satisfy their demands instead to choose from a superstore. It shows that consumers have taken interest on e-commerce site to engage in international trade.
  • 9. Paper title: “Web Mining Techniques in E-Commerce Applications” Author: Ahmad Tasnim Siddiqui • Explained that today web is the best medium of communication in modern business. Now day’s online purchase has been increased as compared to window shopping as it provides millions of ranges.As, companies are able to attract most of the customers because ecommerce is not just buying and selling over internet but it also act as to get advantage on big giants of market. • For this purpose data mining sometimes called as knowledge discovery is used. As vast information has been provided on internet, it helps to improve e-commerce applications After that they explained the proposed architecture which contains mainly four components business data, data obtained from consumer’s interaction, data warehouse and data analysis. After finishing the task by data analysis module it’ll produce report which can be utilized by the consumers as well as the e-commerce application owners.
  • 10. • Data Preprocessing: Cleaned and prepared the text data by tokenizing, removing stopwords, and stemming to ensure quality input for analysis. • Keyword-Based Analysis: Applied the AFINN lexicon to assign sentiment scores to individual words and aggregated these to determine the overall sentiment of each review. • Machine Learning Models: Trained multiple models including Logistic Regression, Naive Bayes, SVM, Random Forest, and XGBoost to classify sentiments into different levels. • Document-Term Matrix (DTM): Converted text reviews into a DTM to create a structured numerical representation of the text for machine learning processing. • Model Evaluation: Assessed the performance of each model using confusion matrices and ROC curves, evaluating their predictive accuracy and ability to generalize. • Binary and Multiclass Classification: Implemented Logistic Regression for binary (positive/negative) sentiment analysis and other models for multiclass (1 to 5 scale) sentiment rating. Proposed Methodology
  • 11. NB Maxent Classifier: • The NB Maxent Entropy Classifier Another well-known classifier or Maxent as some people prefer to call it. The idea behind Maxent classifiers is that we are preferring is that the most uniform models that satisfy any given constraint. Maxent models are feature based models. We use these features (Based on product reviews which are based on product features) to find a distribution over the different classes using logistic regression (as an accuracy value). • The probability of a data point belonging to a particular class is calculated for measuring the accuracy of Maxent with respect to the parameters they are Precision, Recall and Fscore. Through which we can decide and identify its accuracy in classification process. 11
  • 12. NB Tree Classifier: • The NB Boosted tree is a classifier that is basically a combination of Boosting and Decision Trees. Boosting is a machine Meta learning algorithm for reducing ambiguity in supervised learning. In Boosting predictive classifiers are used to develop weighted trees (classes of positive negative and neutral) which are further combined into single prediction accuracy value, which is based on the three parameters they are (Precision, Recall and Fscore). • Boosted trees combine the strengths of two algorithms and provide the accuracy value, through which we can decide and identify its accuracy in classification process. 12
  • 13. NB Support Vector Machine (SVM) Classifier: • The NB Support Vector Machine algorithm has defined input and output format. Input is a vector space and output is 0 or 1 (positive/negative). Text reviews (particularly the user reviews for the product purchased) in original form are not suitable for learning. They are transformed into format which matches into input of machine learning algorithm input. For this pre-processing on text reviews is carried out. Then we carryout transformation. • Each word will correspond or belong to one class (the classes will be positive, negative and neutral) and identical words (which bears same meaning) belong to same classes. As we can calculate the TF-IDF for this purpose. Such that the accuracy of support vector machine algorithm can be identified and made a suitable comparison with other algorithms. 13
  • 14. NB Random Forest Classifier: • NB Random Forest is an ensemble learning method for classification that operates by constructing many decision trees from the pool of words belonging to the different classes. It produces multi-level decisions based on the parameters on which the classification accuracy depends. The correlation between trees is reduced by randomly selecting from the bag (pool) of words, which increases predictive power and efficiency. • Predictions are made by aggregating the predictions of the individual trees in the ensemble, so that the accuracy of the random forest algorithm can be measured and compared with the other algorithms.
  • 15. NB Bagging Classifier: • NB Bagging (bootstrap aggregating) is an ensemble machine learning technique that employs the simplest way of combining predictions from models of the same type: each model is trained on a random resample of the dataset and decides which class (positive, negative or neutral) a piece of text belongs to. • Combining these predictions improves performance over a single classifier. With the evaluation parameters we can express the accuracy as a value, so that the bagging technique can be used in future for sentiment analysis of product reviews.
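The bagging idea can be sketched in a few lines: draw bootstrap samples, train one base learner per sample, and combine the predictions by majority vote. The base learner below (a simple word voter) and the six training reviews are hypothetical stand-ins chosen to keep the example short; they are not the classifiers or data used in the project.

```python
import random
from collections import Counter, defaultdict


def train_word_voter(sample):
    """Base learner: remember the label each word co-occurred with most often."""
    word_labels = defaultdict(Counter)
    for tokens, label in sample:
        for token in tokens:
            word_labels[token][label] += 1
    # Used when a review contains no known word.
    fallback = Counter(label for _, label in sample).most_common(1)[0][0]

    def predict(tokens):
        votes = Counter()
        for token in tokens:
            if token in word_labels:
                votes[word_labels[token].most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else fallback

    return predict


def bagging_fit(data, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(train_word_voter(sample))
    return models


def bagging_predict(models, tokens):
    """Aggregate the base learners by majority vote."""
    return Counter(model(tokens) for model in models).most_common(1)[0][0]


# Invented training reviews.
data = [
    ("love this phone".split(), "positive"),
    ("great battery life".split(), "positive"),
    ("excellent screen".split(), "positive"),
    ("terrible battery".split(), "negative"),
    ("awful screen broke".split(), "negative"),
    ("hate this phone".split(), "negative"),
]
models = bagging_fit(data)
prediction = bagging_predict(models, "great excellent love".split())
```

A random forest follows the same pattern with decision trees as base learners and an extra random choice of features at each split.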
  • 16. Results, Comparison and Analysis: Ratings / Wordcloud • Aggregated sentiment scores for each rating level. There are more instances of positive sentiment than of negative sentiment.
  • 17. 1 - Keyword Based Analysis: • The AFINN lexicon is a list of words, each rated with a sentiment score that reflects how positive or negative the word is. • Reviews are scanned, and words are matched against the AFINN lexicon to assign each one the lexicon’s predefined score. • Individual word scores are summed to give an overall sentiment score for each review, indicating its positive or negative nature.
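The keyword-based scoring described above can be sketched as a lookup and a sum. The small dictionary below is an illustrative stand-in in the AFINN style (integer scores between -5 and +5); its entries and values are placeholders for the example, not the full lexicon, and the project itself worked in R rather than Python.

```python
import re

# Illustrative stand-in for the AFINN lexicon; values are placeholders.
LEXICON = {
    "good": 3,
    "great": 3,
    "love": 3,
    "bad": -3,
    "terrible": -3,
    "broken": -1,
}


def review_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of all words in a review (unknown words count 0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def polarity(score):
    """Map a summed score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


positive_score = review_score("Great phone, love it!")
negative_score = review_score("Terrible, the screen arrived broken.")
```

Because the score is a plain sum, negation ("not good") and sarcasm are not handled, which is one reason to compare this method against the machine learning classifiers.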
  • 18. 2 - NB and Machine Learning Classifiers: • Machine learning models are used to predict the sentiment of reviews beyond simple positive or negative, often categorizing it into multiple levels. • Textual data is transformed into numerical form through a Document-Term Matrix (DTM) so that machine learning can be applied. • Sparse columns in the Document-Term Matrix, where a term appears in fewer than 2% of the documents, are removed to reduce noise and computational complexity and to improve model performance. • Each model is trained on a subset of the data (the training set, 80% of the total) to learn the patterns associated with the various sentiment levels, and is evaluated on the testing set (the remaining 20%).
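The preparation steps above (Document-Term Matrix, removal of terms found in fewer than 2% of documents, 80/20 split) can be sketched in plain Python as follows. The function names, the toy corpus and the random seed are assumptions made for this illustration; the project performed these steps with R packages.

```python
import random


def document_term_matrix(docs):
    """Build a term-count matrix: one row per document, one column per term."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: col for col, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in enumerate(docs):
        for term in doc:
            matrix[row][index[term]] += 1
    return vocab, matrix


def remove_sparse_terms(vocab, matrix, min_doc_share=0.02):
    """Drop columns whose term occurs in fewer than min_doc_share of documents."""
    n_docs = len(matrix)
    keep = [
        col for col in range(len(vocab))
        if sum(1 for row in matrix if row[col] > 0) / n_docs >= min_doc_share
    ]
    return [vocab[col] for col in keep], [[row[col] for col in keep] for row in matrix]


def train_test_split(rows, labels, train_share=0.8, seed=42):
    """Shuffle the row indices and split them into training and testing parts."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_share)
    train, test = order[:cut], order[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


# Toy corpus of 100 tokenized reviews; "zzz" occurs in only 1% of them.
docs = [["good", "phone"]] * 60 + [["bad", "phone"]] * 39 + [["bad", "zzz"]]
labels = ["positive"] * 60 + ["negative"] * 40

vocab, dtm = document_term_matrix(docs)
vocab, dtm = remove_sparse_terms(vocab, dtm)
x_train, y_train, x_test, y_test = train_test_split(dtm, labels)
```

In the toy corpus the term "zzz" is dropped because it falls below the 2% threshold, leaving three columns for the classifiers.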
  • 19. Logistic Regression Classifier: • The Logistic Regression classifier achieved an accuracy of 71.15% in sentiment classification. • High sensitivity (recall) of 81.31% suggests the model is very good at identifying positive reviews, while a specificity of 54.71% shows it is less effective at recognizing negative sentiment. • The model's precision is 74.39%. • Balanced accuracy, the mean of sensitivity and specificity, is 68.01%, reflecting a reasonable trade-off across both sentiment classes. • The Matthews Correlation Coefficient (MCC) is 0.3738, indicating that the classifier performs better than a random guess but still has room for improvement.
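For reference, the metrics quoted on this slide can all be derived from the four cells of a binary confusion matrix. The sketch below uses hypothetical counts (the slides do not give the actual confusion matrix) and then checks that the reported balanced accuracy matches the mean of the reported sensitivity and specificity.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from confusion matrix cells."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    balanced_accuracy = (sensitivity + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fn=20, fp=30, tn=70)

# Consistency check on the values reported on the slide.
reported_balanced = (0.8131 + 0.5471) / 2  # 0.6801, i.e. 68.01%
```

MCC ranges from -1 to +1, with 0 corresponding to chance-level prediction, which is why a value of 0.3738 reads as better than random but far from perfect.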
  • 20. Naive Bayes Classifier and Random Forest: • The Naive Bayes classifier achieved an overall accuracy of 33.75% in multi-class sentiment classification, a weak result that is still above the 20% expected from uniform guessing over the five rating levels. • The sensitivity (recall) across classes is 33.95%, meaning that on average only about a third of the reviews in a sentiment class are identified correctly. • The model's balanced accuracy, which averages sensitivity and specificity, is 58.72%, which implies a specificity of about 83.5%. • The Matthews Correlation Coefficient (MCC) for the multi-class classification is 0.1792, which is low and suggests considerable room for improvement in the model's predictive capability. • The model's precision (ppv) is low at 33.38%, indicating that only about a third of its predictions for a given sentiment class are correct.
  • 21. NB and Support Vector Machines: • The Support Vector Machine (SVM) classifier shows an accuracy of 38.25% in multi-class sentiment classification, a modest ability to classify sentiment levels correctly. • With a balanced accuracy of 61.35%, the SVM classifier is moderately consistent in identifying sentiment across the different classes. • The sensitivity (recall) and precision (ppv) of the model are both around 38%, indicating that it finds roughly 38% of the reviews in each class and that its predictions for a class are correct at a similar rate. • The Matthews Correlation Coefficient (MCC) of 0.2283 reflects a low but better than random correlation between the observed and predicted classifications. • The detection prevalence of 20% should be read with care: if it is an average over the five sentiment classes, it equals 1/5 by construction, because the shares of reviews predicted into each class always sum to 1, so it says little about how selective the model is.
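For the multi-class results, a minimal sketch of how per-class metrics can be computed one-vs-rest and then macro-averaged is given below. Macro averaging is an assumption here (the slides do not state how the figures were averaged), and the three-class confusion matrix is invented for the example.

```python
def macro_metrics(confusion):
    """Macro-average one-vs-rest metrics.

    confusion[i][j] is the number of reviews with true class i predicted as j.
    Assumes every class occurs and is predicted at least once.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    per_class = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        per_class.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
            "detection_prevalence": (tp + fp) / total,
        })
    # Unweighted mean over classes.
    return {name: sum(m[name] for m in per_class) / k for name in per_class[0]}


# Invented confusion matrix for three classes with 100 reviews each.
confusion = [
    [50, 30, 20],
    [10, 60, 30],
    [5, 15, 80],
]
result = macro_metrics(confusion)
```

Whatever the model predicts, the macro-averaged detection prevalence comes out as 1/k (here 1/3, and 1/5 for five rating levels), since every review is assigned to exactly one class.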
  • 22. NB and Random Forest Classifier: • The Random Forest classifier demonstrates strong performance with an accuracy of 65.85%, indicating that it is well suited to multi-class sentiment classification. • A high balanced accuracy of 78.56% suggests consistent performance in identifying the various sentiment classes. • The model achieves good sensitivity (recall) and precision (ppv), both approximately 65.87%, showing its effectiveness in identifying and predicting sentiment classes correctly. • The Matthews Correlation Coefficient (MCC) is robust at 0.5745, suggesting a strong correlation between the observed and predicted classifications, well above what would be expected by chance. • The model's specificity is very high at 91.45%, indicating that it is very effective at correctly rejecting reviews that do not belong to a given sentiment class.
  • 23. XGBoost Classifier: • The XGBoost classifier achieved perfect performance metrics across the board on the training set, indicating that it has learned to classify the training data with 100% accuracy. • For the training data, every metric, including accuracy, sensitivity (recall), specificity, precision (ppv) and Matthews Correlation Coefficient (MCC), reached the maximum of 1.0, implying no misclassification. • The confusion matrix for the training data shows no false positives or false negatives, which may suggest overfitting to the training data. • The confusion matrix for the testing data also shows 100% accuracy, with zeros in all off-diagonal cells. Because overfitting alone would normally lower test accuracy, a perfect test score may also mean that information about the label is leaking into the features, and this should be checked before the result is relied on. • The detection prevalence of 0.2 is the same on the training and test sets; if it is an average over the five sentiment classes, it equals 1/5 by construction.
  • 24. Conclusion: • The project showcased the use of various advanced hybrid machine learning models to analyze sentiment in product reviews, with differing levels of effectiveness. • Machine learning models, notably NB with Random Forest and NB with XGBoost, outperformed the traditional keyword-based method, providing a deeper analysis of sentiment. • The NB hybrid Random Forest demonstrated high accuracy and robust performance, marking it as a reliable classifier for sentiment analysis tasks. • NB with XGBoost achieved perfect scores on the training data, but such perfect scores may indicate overfitting.