 
 
 
Predicting
Restaurants Rating
and Popularity based
on Yelp Dataset
  
MACHINE LEARNING PROJECT REPORT 
 
 
 
 
 
Submitted by 
ALIN BABU (67) 
NANDU O (66) 
LIJU THOMAS (36) 
 
 
 
 
 
 
Introduction  
A restaurant's rating on Yelp has become an important indicator of its future. In this
project, we focus on predicting the ratings and popularity change of restaurants. With data
from Yelp, we use several machine learning methods, including logistic regression and
Naive Bayes, to make these predictions. While logistic regression seems to perform
better than the others, predictions from all the methods are far from perfect. This suggests
that more data and a better-suited methodology could improve the results.
 
Project Objectives 
 
➔ To predict the ratings of restaurants on Yelp and their popularity change based on
restaurant features.
➔ To shed light on what customers value most about a restaurant.
 
Dataset 
➔ The data comes from the Yelp Dataset Challenge.
➔ It contains review data: text, time and rating.
➔ From the raw dataset, we select 20,000 samples for testing.
➔ Because cultures differ across cities, we focus only on restaurants in one particular
city and its surrounding areas in this project.
 
 
Algorithm and Methods 
 
  In this project we mainly use three machine learning algorithms to predict the
restaurant rating. All three are supervised learning algorithms:
★ Logistic Regression 
★ Multinomial Naive Bayes 
★ Naive Bayes 
Logistic Regression 
Logistic regression is a classification algorithm used to assign observations to a
discrete set of classes. Unlike linear regression, which outputs continuous values,
logistic regression transforms its output using the logistic sigmoid function to return a
probability value, which can then be mapped to two or more discrete classes.
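
The mapping from a linear score to a probability and then to a class can be sketched as
follows. This is a minimal illustration for the two-class case, not the code used in the
project; the weights, bias and feature values are made-up numbers chosen only to show the
mechanics.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: maps any real number into the interval [0, 1]."""
    # Two branches keep math.exp from overflowing for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample (binary case)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Map the probability to a discrete class using a threshold."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical model parameters and one hypothetical sample.
weights = [1.5, -2.0]
bias = 0.25
sample = [1.0, 0.5]

print(predict_proba(weights, bias, sample))
print(predict_label(weights, bias, sample))
```

With more than two classes, the same idea is usually extended with one-vs-rest models or a
softmax over the class scores.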
Naive Bayes 
  A Naive Bayes classifier is a probabilistic machine learning model used for
classification tasks. The classifier is based on Bayes' theorem.
Bayes Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Using Bayes' theorem, we can find the probability of A happening given that B has
occurred. Here, B is the evidence and A is the hypothesis. The assumption made is
that the predictors/features are independent: the presence of one particular feature
does not affect any other. Hence the classifier is called naive.
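
A small worked example may make the theorem and the independence assumption concrete.
The probabilities below are invented for illustration and are not taken from the Yelp
data: A is "the review is positive", and the evidence B is the presence of certain words.

```python
import math


def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A | B) from Bayes' theorem for a two-outcome hypothesis (A or not A)."""
    # P(B) is expanded with the law of total probability.
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


def naive_likelihood(word_probs):
    """Naive assumption: P(w1, w2, ... | A) is the product of the P(wi | A)."""
    return math.prod(word_probs)


# Hypothetical numbers: 60% of reviews are positive; the word "great" appears
# in 30% of positive reviews and in 5% of negative reviews.
p_single = posterior(0.6, 0.3, 0.05)

# Two words treated as independent given the class ("great" and "service").
p_two = posterior(0.6,
                  naive_likelihood([0.3, 0.2]),    # given positive
                  naive_likelihood([0.05, 0.1]))   # given negative

print(p_single, p_two)
```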
Multinomial Naive Bayes 
Multinomial Naive Bayes is a specialized version of Naive Bayes designed
for text documents. Whereas simple Naive Bayes would model a document as the
presence or absence of particular words, multinomial Naive Bayes explicitly models the
word counts and adjusts the underlying calculations to deal with them.
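
The role of word counts can be sketched with a from-scratch multinomial Naive Bayes on a
toy corpus. This is an illustrative sketch, not the project's implementation: the
documents, the labels "high" and "low", and the use of add-one (Laplace) smoothing are
all assumptions made for the example.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word probabilities per class."""
    vocab = sorted({w for d in docs for w in d})
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((counts[w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Pick the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c in log_prior:
        # Every occurrence of a word adds its log probability, so a word
        # repeated twice counts twice. Words outside the vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical tokenized reviews and rating classes.
docs = [["good", "good", "food"], ["great", "food"],
        ["bad", "service"], ["bad", "bad", "food"]]
labels = ["high", "high", "low", "low"]

log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict(["good", "good", "bad"], log_prior, log_likelihood))
print(predict(["good", "bad"], log_prior, log_likelihood))
```

On this toy data the review "good good bad" is classified as "high" because "good" is
counted twice, while "good bad", where each word occurs once, is classified as "low".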
 
 
Data Pre-Processing 
In this project we mainly use the Yelp Dataset, which consists of user reviews and
ratings. The fields we consider are restaurant name, date, comfortability, star rating,
comments and review id. Out of these, we manually select the two features that are
essential for our prediction:
❖ Star rating 
❖ Comments 
After selecting the valid features, we find the missing values of each attribute and then
extract root words from the comments using the methods below:
❖ Removing punctuation
❖ Removing stop words 
❖ Stemming - The process of reducing inflected or derived words to their root/base form.
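
The three steps above can be sketched as a small pipeline. This is only an illustration
under stated assumptions: the stop-word list is a tiny hypothetical one, and the stemmer
is a naive suffix stripper standing in for a real algorithm such as Porter's, which the
report does not name.

```python
import string

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "or", "of", "to", "in", "it", "this"}

# Naive suffix stripping as a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Lowercase, remove punctuation, remove stop words, then stem."""
    text = remove_punctuation(comment.lower())
    tokens = remove_stop_words(text.split())
    return [naive_stem(t) for t in tokens]


print(preprocess("The food was amazing, and the waiters helped us quickly!"))
# → ['food', 'amaz', 'waiter', 'help', 'us', 'quick']
```

As "amaz" shows, stems need not be dictionary words; they only need to be consistent so
that variants of the same word map to the same feature.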
 
Performance Evaluation 
 
Logistic Regression 
 
 
Naive Bayes 
 
 
 
 
Multinomial Naive Bayes 
 
 
 
Conclusion 
 
  After testing with 20,000 samples, we can see that logistic regression performs
better than the other methods. One possible explanation is that the assumptions of the
other models are problematic for this data, and logistic regression is more robust to
problematic model assumptions. However, the prediction still needs further improvement.
We compared our best predictor, logistic regression, with a random-number predictor and
a constant-number predictor: the logistic predictor is only slightly better than the
constant-number predictor. This suggests that more data and a better-suited methodology
could improve the results.
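
The baseline comparison can be sketched as below. This is not the project's evaluation
code: it assumes that the constant-number predictor always outputs the most frequent
training rating and that the random-number predictor draws uniformly from the observed
ratings. The star ratings are made-up values, not results from the Yelp data.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def constant_predictor(y_train, n):
    """Assumed baseline: always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * n


def random_predictor(y_train, n, seed=0):
    """Assumed baseline: draw uniformly from the labels seen in training."""
    rng = random.Random(seed)
    labels = sorted(set(y_train))
    return [rng.choice(labels) for _ in range(n)]


# Hypothetical star ratings for illustration only.
y_train = [5, 4, 4, 3, 4, 5, 4, 1, 4, 2]
y_test = [4, 4, 5, 3, 4, 4, 2, 4]

constant_acc = accuracy(y_test, constant_predictor(y_train, len(y_test)))
random_acc = accuracy(y_test, random_predictor(y_train, len(y_test)))
print(constant_acc, random_acc)
```

When one rating dominates the data, the constant baseline is already hard to beat, which
is why a model that is only slightly better than it leaves room for improvement.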
 