Online Feedback Correlation using Clustering
Research work done for CS 651: Internet Algorithms
Dedicated to Tibor Horvath, whose endless pursuit of a PhD (imagine that) kept him from researching this topic.
Problem Statement
  • Millions of reviews are available
  • Consumers read only a small number of reviews
  • Reviewer content is not always trustworthy
Problem Statement (continued)
  • What information in reviews is important?
  • What can we extract efficiently from the whole set of reviews to give consumers more utility than they already get?
Motivation
  • People increasingly rely on online feedback mechanisms when making choices [Guernsey 2000]
  • Online feedback mechanisms draw consumers
  • They provide a competitive edge
  • Their quality is currently poor
Current Solutions
  • “Good” review placement
  • Show a small number of reviews (. . . more)
  • Trustworthy?
Amazon Example
Observations
  • Consumers look at a product based on its overall rating
  • Consumers read the “editorial review” for content
  • Reviews can indicate common issues …
  • Can we correlate these reviews in some meaningful way?
Observations Lead to Hypotheses!
  • Hypothesis: Products with numerous similar negative reviews will often not be purchased, regardless of their positive reviews. Furthermore, the number of negative reviews is a strong indicator of the likelihood of certain flaws in a product.
Definitions
  • Semantic orientation: a polar classification of whether something is positive or negative
  • Natural language processing: deciphering parts of speech from free text
  • Feature: a quality of a product that customers care about
  • Feature vector: a vector representing a review in a d-dimensional space, where each dimension represents a feature
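The feature-vector definition can be illustrated with a short Java sketch. It assumes a fixed feature list, in line with the a priori feature list described under the project's simplifications, and a per-review map from feature name to semantic orientation. The class and method names are hypothetical and do not come from the original code. Each dimension holds +1 (positive), -1 (negative), or 0 (feature not mentioned).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: encodes one review as a feature vector over a fixed feature list. */
final class FeatureVectors {
    private FeatureVectors() {}

    static int[] encode(List<String> features, Map<String, Integer> orientations) {
        int[] vector = new int[features.size()];
        for (int i = 0; i < features.size(); i++) {
            // Unmentioned features default to 0; any other value is reduced to its sign.
            int orientation = orientations.getOrDefault(features.get(i), 0);
            vector[i] = Integer.signum(orientation);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> features = List.of("battery", "screen", "price", "sound");
        Map<String, Integer> review = Map.of("battery", -1, "sound", 1);
        System.out.println(Arrays.toString(encode(features, review))); // prints [-1, 0, 0, 1]
    }
}
```

Because every review is encoded against the same feature order, vectors from different reviews are directly comparable, which is what a clustering step needs.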
Overview of Project
  • Obtain a large repository of customer reviews
  • Extract features from the customer reviews and orient them
  • Create feature vectors (e.g. [1, 0, -1, 1, 1, -1, …]) from the reviews and features
  • Cluster the feature vectors to find large negative clusters
  • Analyze the clusters and compare them to the hypothesis
Related Work
Related work has fallen into three disparate camps:
  • Classification: classifying reviews as negative or positive
  • Domain specificity: the overall effect of reviews within a domain
  • Summarization: feature extraction to summarize reviews
Limitations of Related Work
  • Classification: overly summarizing
  • Domain specificity: hard to generalize, since it relies on domain information
  • Summarization: no overall knowledge of the collection
Close to Summarization?
  • Most closely related to the work on summarization by Hu and Liu
  • Summarization with dynamic feature extraction and orientation per review
Data for Project
  • Data from Amazon.com customer reviews
  • Available through the Amazon E-Commerce Service (ECS)
  • Four thousand products related to MP3 players
  • Over twenty thousand customer reviews
Technologies Used
  • Java to program the modules
  • Amazon ECS
  • NLProcessor (trial version) from Infogistics
  • Princeton’s WordNet as a thesaurus
  • KMLocal from David Mount’s group at the University of Maryland for clustering
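KMLocal is a C++ k-means library, and its interface is not reproduced here. The Java sketch below is a hypothetical stand-in that shows the kind of clustering step the project delegated to it. It runs plain Lloyd's k-means over feature vectors and seeds the centers with the first k points, which favors determinism over cluster quality. This is an illustration of k-means in general, not KMLocal's own algorithm or API.

```java
import java.util.Arrays;

/** Hypothetical sketch of Lloyd's k-means; returns each point's cluster index. */
final class LloydKMeans {
    private LloydKMeans() {}

    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k <= 0 || k > points.length) throw new IllegalArgumentException("need 0 < k <= points.length");
        int dims = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = points[c].clone(); // naive seeding: first k points
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest center (squared Euclidean distance).
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[p], centers[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: move each center to the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous center
                for (int d = 0; d < dims; d++) centers[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] reviews = {{-1, -1}, {1, 1}, {-1, -0.8}, {1, 0.9}};
        System.out.println(Arrays.toString(cluster(reviews, 2, 10))); // prints [0, 1, 0, 1]
    }
}
```

Lloyd's iteration only finds a local optimum that depends on the seeding. Libraries such as KMLocal exist partly to mitigate that, so results from this sketch should not be read as reproducing the project's clusters.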
Project Structure
Simplifications Made
  • Limited data set
  • Feature list created a priori
  • Features from the same sentence given the same orientation
  • Sentences without features neglected
  • Number of clusters chosen only to see correlations in the biggest cluster
  • Small adjective seed set
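Several of these simplifications can be combined in one sketch. It uses an a priori feature list and a small adjective seed set, gives every feature in a sentence that sentence's orientation, and skips sentences with no features. Several details are assumptions made only for illustration:

  • The whitespace/punctuation tokenizer.
  • The majority-vote rule over seed adjectives.
  • The example seed words.
  • Collapsing per-review totals to their sign.

The original pipeline used NLProcessor for part-of-speech tagging and WordNet as a thesaurus, and neither is modeled here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of sentence-level orientation under the listed simplifications. */
final class SentenceOrientation {
    private SentenceOrientation() {}

    // Assumed rule: +1 if positive seed adjectives outnumber negative ones, -1 if the reverse, else 0.
    static int orient(List<String> tokens, Set<String> positiveSeeds, Set<String> negativeSeeds) {
        int score = 0;
        for (String token : tokens) {
            if (positiveSeeds.contains(token)) score++;
            else if (negativeSeeds.contains(token)) score--;
        }
        return Integer.signum(score);
    }

    static Map<String, Integer> orientReview(List<String> sentences, Set<String> features,
                                             Set<String> positiveSeeds, Set<String> negativeSeeds) {
        Map<String, Integer> totals = new HashMap<>();
        for (String sentence : sentences) {
            // Naive tokenizer (assumption): lowercase, then split on non-word characters.
            List<String> tokens = Arrays.asList(sentence.toLowerCase(Locale.ROOT).split("\\W+"));
            List<String> found = tokens.stream()
                    .filter(features::contains)
                    .distinct()
                    .collect(Collectors.toList());
            if (found.isEmpty()) continue; // sentences without features are neglected
            int orientation = orient(tokens, positiveSeeds, negativeSeeds);
            // Every feature in the sentence receives the sentence's orientation.
            for (String feature : found) totals.merge(feature, orientation, Integer::sum);
        }
        // Collapse each feature's total across sentences to -1, 0 or +1.
        totals.replaceAll((f, total) -> Integer.signum(total));
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = orientReview(
                List.of("The battery is terrible.", "Great sound and great screen!", "Shipping was fast."),
                Set.of("battery", "sound", "screen"),
                Set.of("great", "good"),
                Set.of("terrible", "poor"));
        System.out.println(result);
    }
}
```

In the example, "sound" and "screen" share the positive orientation of their sentence, and the shipping sentence is dropped because it names no listed feature. This mirrors the coarseness the simplifications accept.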
Analysis
  • Associated clusters with products
  • Found negative clusters using a threshold (-0.1)
  • Eliminated non-negative clusters
  • Sorted the product list twice:
    ◦ by sales rank (given by Amazon)
    ◦ by the hypothesis, with a tweak: relative size × distortion
  • Computed Spearman’s distance between the two orderings
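The analysis slide leaves several definitions open. It does not say exactly how a cluster's negativity, relative size, or distortion were computed, nor which variant of Spearman's measure was used. The Java sketch below makes each assumption explicit:

  • A cluster counts as negative when its centroid's mean coordinate falls below the threshold.
  • Distortion is the sum of squared distances from members to the centroid.
  • Relative size is cluster size over total points.
  • Rankings are compared with the sum of squared rank differences and the Spearman's rho derived from it.

All names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helpers for scoring negative clusters and comparing two product rankings. */
final class ClusterAnalysis {
    private ClusterAnalysis() {}

    // Assumption: "negative" means the centroid's mean coordinate is below the threshold (slides use -0.1).
    static boolean isNegative(double[] centroid, double threshold) {
        double sum = 0;
        for (double x : centroid) sum += x;
        return sum / centroid.length < threshold;
    }

    // Distortion taken as the sum of squared Euclidean distances from members to their centroid.
    static double distortion(double[][] members, double[] centroid) {
        double total = 0;
        for (double[] member : members) {
            for (int d = 0; d < centroid.length; d++) {
                double diff = member[d] - centroid[d];
                total += diff * diff;
            }
        }
        return total;
    }

    // The slides' tweak, relative size * distortion; relative size assumed to be clusterSize / totalPoints.
    static double score(int clusterSize, int totalPoints, double distortion) {
        return (double) clusterSize / totalPoints * distortion;
    }

    // Sum of squared rank differences between two orderings of the same items (no ties assumed).
    static long squaredRankDistance(List<String> a, List<String> b) {
        if (a.size() != b.size()) throw new IllegalArgumentException("rankings differ in length");
        Map<String, Integer> rankInB = new HashMap<>();
        for (int i = 0; i < b.size(); i++) rankInB.put(b.get(i), i);
        long total = 0;
        for (int i = 0; i < a.size(); i++) {
            Integer j = rankInB.get(a.get(i));
            if (j == null) throw new IllegalArgumentException("item missing from second ranking: " + a.get(i));
            long diff = i - j;
            total += diff * diff;
        }
        return total;
    }

    // Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)); 1 means identical order, -1 means reversed.
    static double spearmanRho(List<String> a, List<String> b) {
        long n = a.size();
        if (n < 2) throw new IllegalArgumentException("need at least two items");
        return 1.0 - 6.0 * squaredRankDistance(a, b) / (double) (n * (n * n - 1));
    }

    public static void main(String[] args) {
        List<String> bySalesRank = List.of("p1", "p2", "p3", "p4");
        List<String> byHypothesis = List.of("p4", "p3", "p2", "p1");
        System.out.println(spearmanRho(bySalesRank, byHypothesis)); // prints -1.0
    }
}
```

A rank-based measure compares only orderings, not raw scores. This is presumably why sales rank and the hypothesis-based score, which live on different scales, can be compared at all.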
Results
  • The hypothesis held with 82% accuracy!
  • But most of the four thousand products were pruned due to poor orientation
Conclusion
  • Consumers are affected by negative reviews that correlate to show similar flaws
  • They are affected regardless of the positive reviews
Future Work
  • Use a larger seed set for adjectives
  • Use more sophisticated NLP techniques
  • Experiment with the size of clusters
  • Determine features dynamically using summarization techniques
  • Use different data sets
  • Use a different distance measure in clustering
Questions
