LEXICON-BASED SENTIMENT ANALYSIS FOR PERSIAN TEXT
Author:
Mohammadhassan Khodashahi
Twitter: @KhodashahiMh
Email: mh.khodashai@gmail.com
1
CONTENTS
➢ Motivation
➢ Introduction
➢ Approach
✓ Data collection
✓ Processing resources
▪ Tokenizer
▪ Sentence splitter
▪ POS tagger
▪ JAPE rules
➢ Testing
✓ Performance testing (1)
✓ Performance testing (2)
➢ Conclusion and future work
➢ References
2
MOTIVATION
• Online shopping has become increasingly popular with customers.
• E-business has improved in many respects.
• User feedback and comments on products play an important role in other
users’ decisions.
• Sentiment analysis processes this volume of data to give enterprises the
information they need to improve.
3
INTRODUCTION
• Definition (Wikipedia): Sentiment analysis (opinion mining) refers to the use
of natural language processing and text mining to identify and extract
subjective information in source materials.
• Sentiment analysis determines the attitude of a speaker as …
• Why Persian? The number of Persian websites and users is increasing, but
there has been little such research for the Persian language.
4
APPROACH
• Data collection:
● Gathered 7,179 Persian adjectives.
● Asked Persian speakers to vote on the sentiment of each adjective.
● Received a total of 8,278 votes on the adjectives.
● Calculated the average vote for each word to define its sentiment.
5
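The averaging step can be sketched in Python as below. This is a minimal sketch, not the author's implementation: the slides give neither the vote scale nor the rule that turns an average into a label, so the -2 to +2 scale, the `NEUTRAL_BAND` threshold, and the name `build_lexicon` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: averages with |avg| <= NEUTRAL_BAND are labeled neutral.
NEUTRAL_BAND = 0.5


def build_lexicon(votes):
    """Average the crowd votes per adjective and derive a polarity label.

    `votes` is an iterable of (adjective, score) pairs; each score is assumed
    to be an integer from -2 (very negative) to +2 (very positive).
    Returns {adjective: (average_score, label)}.
    """
    scores = defaultdict(list)
    for word, score in votes:
        scores[word].append(score)

    lexicon = {}
    for word, word_scores in scores.items():
        avg = mean(word_scores)
        if avg > NEUTRAL_BAND:
            label = "positive"
        elif avg < -NEUTRAL_BAND:
            label = "negative"
        else:
            label = "neutral"
        lexicon[word] = (avg, label)
    return lexicon


# Usage: "خوب" = good, "بد" = bad, "معمولی" = ordinary.
lexicon = build_lexicon([("خوب", 2), ("خوب", 1), ("بد", -2), ("معمولی", 0)])
print(lexicon["خوب"])  # (1.5, 'positive')
```

With 8,278 votes over 7,179 adjectives (about 1.15 votes per word), if every adjective received at least one vote, then at least 6,080 of them rest on a single vote, so the averages for most words are individual judgments rather than consensus scores.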
APPROACH…
6
APPROACH…
• Tokenizer: splits the text into very simple tokens.
• Sentence splitter: fragments the text into sentences.
• Gazetteer: our gazetteer is the list of collected adjectives with their sentiment.
7
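A rough Python analogue of these three processing resources, for illustration only: the actual system uses GATE's Java components, and the regular expressions, the rule of summing gazetteer scores into one label, and the function names below are assumptions, not the author's code.

```python
import re

# Sentence boundary: whitespace that follows ".", "!", "?" or the
# Arabic-script question mark (U+061F).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?\u061f])\s+")

# Token: a run of word characters. The zero-width non-joiner (U+200C) is
# allowed inside a token so that a compound like "بی" + ZWNJ + "معرفت" stays whole.
TOKEN = re.compile(r"[\w\u200c]+")


def split_sentences(text):
    """Split text at terminal punctuation followed by whitespace."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Return the word tokens of a sentence, dropping punctuation."""
    return TOKEN.findall(sentence)


def score_text(text, gazetteer):
    """Sum the gazetteer scores of every token in every sentence (assumed rule)."""
    return sum(
        gazetteer.get(token, 0)
        for sentence in split_sentences(text)
        for token in tokenize(sentence)
    )


def classify(text, gazetteer):
    """Map the summed score to a polarity label."""
    score = score_text(text, gazetteer)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Usage with a hypothetical two-entry gazetteer ("good" and "bad").
gazetteer = {"خوب": 1.5, "بد": -2.0}
print(classify("هتل خوب بود. غذا بد بود!", gazetteer))  # negative
```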
APPROACH…
• JAPE rules: the patterns we want to identify are formulated as
regular-expression grammar rules.
❖ Word level
“بی” “Bi” and “نا” “Na” + noun => negative word (e.g., “نامرد”, “dishonorable”; “بی معرفت”, “ungrateful”)
“ن” + verb => negative verb (e.g., “نمی فهمد”, “does not understand”)
❖ Sentence level
“He is not a liar”
“That film had a lot of famous actors, but it couldn’t attract people’s attention”
8
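The two word-level rules and the sentence-level negation example can be approximated in plain Python. This is not the author's JAPE grammar: the tiny `NOUNS`, `VERBS`, and `NEGATION_CUES` sets are hypothetical stand-ins for a POS tagger and lexicon, and flipping the score of the whole sentence is a deliberately crude negation scope.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, often written between a prefix and its stem

# Hypothetical mini-lexicons standing in for a POS tagger.
NOUNS = {"مرد", "معرفت", "ادب"}
VERBS = {"می" + ZWNJ + "فهمد", "فهمید"}
NEGATION_CUES = {"not", "نیست"}  # English "not", Persian "is not"


def word_rule(token, next_token=None):
    """Apply the two word-level rules to one token.

    "بی"/"نا" + noun -> "negative_word"  (e.g. نامرد, بی معرفت)
    "ن" + verb       -> "negated_verb"   (e.g. نمی + ZWNJ + فهمد)
    Returns None when neither rule fires.
    """
    for prefix in ("بی", "نا"):
        # Prefix attached to the noun, with or without a ZWNJ.
        if token.startswith(prefix) and token[len(prefix):].lstrip(ZWNJ) in NOUNS:
            return "negative_word"
        # Prefix written as a separate token, as in "بی معرفت".
        if token == prefix and next_token in NOUNS:
            return "negative_word"
    # Negative prefix attached to a known verb form.
    if token.startswith("ن") and token[1:] in VERBS:
        return "negated_verb"
    return None


def sentence_score(tokens, lexicon):
    """Sum lexicon scores and flip the sign if the sentence has a negation cue.

    Example: "He is not a liar" turns the negative score of "liar" positive.
    """
    score = sum(lexicon.get(t, 0) for t in tokens)
    if any(t in NEGATION_CUES for t in tokens):
        score = -score
    return score


print(word_rule("نامرد"))                                      # negative_word
print(sentence_score(["He", "is", "not", "a", "liar"], {"liar": -2}))  # 2
```

A real rule set would limit the negation to the phrase it governs; the whole-sentence flip here would, for instance, wrongly invert a second, unrelated opinion in the same sentence.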
TESTING
• Performance test (1):
Evaluated on a Persian website: www.iran-booking.com
Accuracy: 69%
9
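The slides report only the overall figure. Assuming accuracy means the share of items whose predicted label matches a human reference label (the slides do not say how the iran-booking.com texts were labeled), it can be computed as below. The per-class recall is an addition here, not something the slides report; it helps because one accuracy number can hide a skewed class distribution.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Share of items whose predicted label equals the reference label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("nothing to evaluate")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def per_class_recall(predicted, gold):
    """For each reference class, the share of its items labeled correctly."""
    totals = Counter(gold)
    hits = Counter(g for p, g in zip(predicted, gold) if p == g)
    return {label: hits[label] / n for label, n in totals.items()}


predicted = ["positive", "negative", "positive", "negative"]
gold = ["positive", "negative", "negative", "negative"]
print(accuracy(predicted, gold))  # 0.75
```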
10
TESTING…
• Performance test (2)
Gathered about 5,000 news articles from the most popular Persian news websites,
ranked by Alexa rating.
Chose 800 articles at random and asked Persian speakers to vote on their
sentiment.
11
PERFORMANCE TEST (2)…
1,200 votes were gathered on the 800 news articles.
12
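One way to turn the crowd votes into reference labels is a simple majority per article. This is a sketch under assumptions, since the slides do not say how the votes were aggregated: `gold_labels` and `single_vote_items` are hypothetical helpers, and tied articles get `None` so they can be left out of the evaluation.

```python
from collections import Counter, defaultdict


def gold_labels(votes):
    """Derive one reference label per news item by majority vote.

    `votes` is an iterable of (item_id, label) pairs. Items whose two most
    frequent labels are tied map to None.
    """
    by_item = defaultdict(Counter)
    for item_id, label in votes:
        by_item[item_id][label] += 1

    gold = {}
    for item_id, counts in by_item.items():
        ranked = counts.most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        gold[item_id] = None if tied else top_label
    return gold


def single_vote_items(votes):
    """Count the items that received exactly one vote."""
    per_item = Counter(item_id for item_id, _ in votes)
    return sum(1 for n in per_item.values() if n == 1)


votes = [(1, "positive"), (1, "positive"), (1, "negative"), (2, "negative")]
print(gold_labels(votes))  # {1: 'positive', 2: 'negative'}
```

With 1,200 votes on 800 articles (1.5 votes per article on average), if every article received at least one vote, then at least 400 articles have exactly one vote, so for many items the "majority" is a single annotator's opinion.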
CONCLUSION AND FUTURE WORK
• For the first Persian API, we have reached 69% accuracy so far.
How can the accuracy be improved?
❖ More JAPE rules, especially for constructions with “but”, “although”, and …
❖ Encouraging more Persian speakers to vote on adjectives and phrases to
build a better lexicon database.
❖ Handling informal text too, which is common in tweets and comments.
13
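As a sketch of the kind of “but” rule listed above, the clause after a contrast cue can be weighted more heavily than the clause before it. The cue list (English “but” and the Persian “اما” and “ولی”), the 0.5/1.0 weights, and `contrast_score` are hypothetical; “although” is left out because its dominant clause depends on clause order and punctuation, which this token-level sketch does not see.

```python
CONTRAST_CUES = {"but", "اما", "ولی"}  # hypothetical cue list
BEFORE_WEIGHT = 0.5  # hypothetical: the clause before the cue counts half
AFTER_WEIGHT = 1.0   # the clause after the cue counts fully


def contrast_score(tokens, lexicon):
    """Score a tokenized sentence, letting the clause after a contrast cue dominate.

    Recurses on the remainder, so a sentence with several cues is handled
    clause by clause from left to right.
    """
    for i, token in enumerate(tokens):
        if token in CONTRAST_CUES:
            before = sum(lexicon.get(t, 0) for t in tokens[:i])
            after = contrast_score(tokens[i + 1:], lexicon)
            return BEFORE_WEIGHT * before + AFTER_WEIGHT * after
    return sum(lexicon.get(t, 0) for t in tokens)


lexicon = {"beautiful": 2, "rude": -2}
tokens = "the hotel was beautiful but the staff was rude".split()
print(contrast_score(tokens, lexicon))  # -1.0 (a plain sum would give 0)
```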
REFERENCES
• [1] Mohammad Sadegh Hajmohammadi, Roliana Ibrahim, “A SVM-Based Method for Sentiment Analysis in Persian Language”
• [2] https://gate.ac.uk/
• [3] Majid Sezavar, http://sazvar.student.um.ac.ir/
• [4] http://www.coreservlets.com/Apache-Tomcat-Tutorial/tomcat-7-with-eclipse.html
• [5] http://javapapers.com/web-service/java-web-service-using-eclipse
14
THANKS FOR YOUR ATTENTION…
15
