SUB EVENT DETECTION
ON SOCIAL MEDIA
Kshitij Kansal	

Maaz Anwar Nomani	

Ahmed Ali Durga
Information Retrieval and Extraction
INTRODUCTION
1. Motivation	

• Social Media is filled with a lot of information.	

• Information is shared well before the news appears on news
websites.

• The information shared captures even the minute details which
news websites might overlook.

• This gives us a lot of scope for early news detection with
finer-grained details.

2. Objective	

• We aim to propose an automatic method for extracting Sub-Events
from given Social Media feeds.
SUB EVENT
• What is a Sub Event?	

• Any kind of information which is too small to be conveyed as a part
of the whole event,

• yet large enough to affect an appreciably large community of readers.

• Includes aftermath of an event, real time notifications, responses,
public sentiments and reports.	

• Why Sub Event?	

• Closely related to a particular community.

• Can be used to enhance the knowledge of an event.	

• Can measure the public sentiments along the whole course of
occurrence of the event.
OUR EXPERIMENT
• Detecting the "sub events" in the Twitter Stream related to the
US Presidential Elections.
• Main Event: US Presidential Elections and the Victory of
Barack Obama.
• Sub Events: Victories or defeats of some famous candidates,
public sentiments across the course of elections, changes in the
stock market as the trends start to pour out etc.
• The approach is not specific to this dataset. It can be applied
to any dataset in the form of a Twitter stream.
APPROACH
We followed an organised approach where we divided the
whole process into the following three sub parts, which
were dealt with separately and later integrated.

• Tweet filtering and Noise Reduction	

• Sub Event Detection	

• Sub Event Summarization
TWEET FILTERING AND NOISE REDUCTION
Aim: To eliminate the useless tweets which do not convey much
information regarding the event.
• The provided Tweet Stream is cleaned using a self-defined filter.
• The filter takes into account the linguistic aspects of the language
and context filtering.
• Remove Diacritic marks
• Consider only ASCII characters
• Ignore repetitions
• Ignore Multiple Punctuations
• Consider only tweets starting with capitals
• Remove extremely small and large tweets
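The filter rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' filter: the length bounds `MIN_LEN` and `MAX_LEN` are hypothetical values, "repetitions" is assumed to mean repeated tweets, and "only ASCII characters" is assumed to mean stripping non-ASCII characters rather than discarding the whole tweet.

```python
import re
import unicodedata

# Hypothetical length bounds in characters; the slides give no numbers.
MIN_LEN, MAX_LEN = 20, 140


def clean_tweet(text):
    """Strip diacritics, non-ASCII characters and repeated punctuation."""
    # Decompose accented characters so each accent becomes a separate mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping non-ASCII code points removes those marks and other non-ASCII text.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of the same punctuation mark ("!!!" becomes "!").
    collapsed = re.sub(r"([!?.,;:])\1+", r"\1", ascii_only)
    # Squeeze whitespace left behind by the removals.
    return re.sub(r"\s+", " ", collapsed).strip()


def filter_stream(tweets):
    """Yield cleaned tweets that pass every rule, skipping repeated tweets."""
    seen = set()
    for raw in tweets:
        tweet = clean_tweet(raw)
        # Remove extremely small and extremely large tweets.
        if not (MIN_LEN <= len(tweet) <= MAX_LEN):
            continue
        # Consider only tweets starting with a capital letter.
        if not tweet[0].isupper():
            continue
        # Assumption: "repetitions" means the same tweet seen again.
        key = tweet.lower()
        if key in seen:
            continue
        seen.add(key)
        yield tweet
```

Calling `list(filter_stream(stream))` on any iterable of tweet strings returns the cleaned tweets in their original order.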
SUB EVENT EXTRACTION
Aim: To extract tweets that express some defining moments in the
event.
• To be applied on the filtered stream available from the noise
reduction module.
• Build a dictionary of the tweet words and generate a Tweet Vector for each tweet.
• Find the distance between the tweets.
• Group together the similar tweets.
• Chunks of relevant tweets will form the sub events.
• Hashing of the tweet stream to increase the speed of the system
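The grouping step can be pictured with a simple greedy pass. The sketch below is an assumption for illustration only: the slides do not say which clustering procedure was used, and both `shared_word_ratio` and the 0.5 threshold default are hypothetical helpers that compare each tweet against the first tweet of every existing group.

```python
def shared_word_ratio(a, b):
    """Fraction of the shorter tweet's distinct words that also occur in the other."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    shorter = min(len(words_a), len(words_b))
    return len(words_a & words_b) / shorter if shorter else 0.0


def group_tweets(tweets, threshold=0.5):
    """Greedy single pass: join the first sufficiently similar group, else start one."""
    groups = []
    for tweet in tweets:
        for group in groups:
            # Compare against the group's first tweet as its representative.
            if shared_word_ratio(tweet, group[0]) > threshold:
                group.append(tweet)
                break
        else:
            # No existing group was similar enough, so open a new one.
            groups.append([tweet])
    return groups
```

Each returned group is a chunk of related tweets and so a candidate sub event. The result depends on the order of the stream, which is a known limitation of single-pass grouping.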
EXTRACTION ...
Dictionary Creation and Vector Generation
• Dictionary Creation:
• Bag of Word Representation.
• Stop Word Removal.
• Assign unique ID to the words.
• Vector Generation:
• Create an n-dimensional vector for each tweet
• n is the number of words in the dictionary.
• Vector value = 1, if word present
• Vector value = 0, if word not present
• Create sparse vector for space optimization.
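A minimal sketch of the dictionary and vector steps above. The stop-word list is a tiny illustrative placeholder, and storing the sparse vector as the set of word IDs whose value is 1 is an assumption about one reasonable representation, not necessarily the one used in the project.

```python
# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}


def tokenize(tweet):
    """Lowercase, strip surrounding punctuation and drop stop words."""
    words = (w.strip(".,!?;:\"'()").lower() for w in tweet.split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_dictionary(tweets):
    """Assign a unique integer ID to every distinct word, in order of first appearance."""
    dictionary = {}
    for tweet in tweets:
        for word in tokenize(tweet):
            dictionary.setdefault(word, len(dictionary))
    return dictionary


def to_sparse_vector(tweet, dictionary):
    """Sparse binary vector: only the IDs of the words that are present."""
    return frozenset(dictionary[w] for w in tokenize(tweet) if w in dictionary)


def to_dense_vector(sparse, n):
    """Expand to the full n-dimensional 0/1 vector (n = dictionary size)."""
    return [1 if i in sparse else 0 for i in range(n)]
```

Because a tweet contains only a handful of the n dictionary words, keeping just the present IDs saves most of the space a dense vector would need.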
EXTRACTION ...
Distance and Similarity Measures
• Euclidean Distance:
• Simple distance between the tweet vectors.
• Similar to finding the distance between points in n-dimensional space.
• n being the size of Tweet dictionary.
• Similarity Measure:
• Count the number of words the two tweets have in common.
• If greater than some threshold, assume them to be similar.
• Threshold (in our case): 50% of the length of the smaller tweet.
• Takes into account the length of tweets i.e. Normalization.
• Cosine Similarity:
• Similar to the above method.
• Also takes into account the length i.e. Normalization.
• Works by finding out the angle between the two tweets.
• Tweets are treated as points in n-dimensional space.
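The three measures above can be written compactly for binary tweet vectors. This sketch assumes each tweet is given as the set of dictionary IDs of its words (the positions whose value is 1), and that "length of the smaller tweet" means its number of distinct words; both are assumptions for illustration.

```python
import math


def euclidean_distance(a, b):
    """For 0/1 vectors the squared distance is the number of positions that differ."""
    return math.sqrt(len(a ^ b))


def overlap_similar(a, b, ratio=0.5):
    """True if the shared-word count exceeds `ratio` times the smaller tweet's length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return False
    return len(a & b) > ratio * shorter


def cosine_similarity(a, b):
    """Cosine of the angle between two 0/1 vectors: shared / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Euclidean distance grows with tweet length even when two tweets share most of their words, which is why the two normalized measures are usually easier to threshold.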
EXTRACTION ...
Hashing
• Increases the speed of retrieval module
• Locality Sensitive Hashing
• Dimensionality reduction of high-dimensional data
• Maximizes the probability of collision of similar
tweets.
• PyLucene
• Python extension for using Java Lucene
• Apache Lucene is a free/open source
information retrieval software library
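To illustrate the idea of Locality Sensitive Hashing, here is a small random-hyperplane sketch in plain Python. The slides do not say which LSH family was used, so the hyperplane scheme, the function names and the parameters `n_bits` and `seed` are all assumptions; no PyLucene calls are shown.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_bits, n_dims, seed=0):
    """Draw n_bits random hyperplanes in n_dims-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_bits)]


def signature(word_ids, hyperplanes):
    """One bit per hyperplane: which side of it the tweet vector falls on."""
    # For a 0/1 vector the dot product is the sum of the plane's entries
    # at the positions of the words that are present.
    return tuple(
        int(sum(plane[i] for i in word_ids) >= 0.0) for plane in hyperplanes
    )


def bucket_tweets(vectors, hyperplanes):
    """Group tweet indices by signature; only tweets in a bucket get compared."""
    buckets = defaultdict(list)
    for idx, word_ids in enumerate(vectors):
        buckets[signature(word_ids, hyperplanes)].append(idx)
    return buckets
```

Tweets at a small angle to each other are likely, though not guaranteed, to agree on every bit and land in the same bucket, so the expensive pairwise comparison can be restricted to tweets sharing a bucket. The signature is also a much shorter representation than the original n-dimensional vector.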
SUMMARIZATION
• Related tweets are extracted and stored in separate files.
• The sub event needs to be extracted from these related tweets.
• Some kind of summarization of the collected tweets is required.
• Summarization needs to be in human readable form.
• Should be able to convey the happenings in the sub event.
• If possible, crawl data from the URLs linked in the tweets and use it for
summarization.
• Image support will increase its attractiveness and user
acceptability.
SUMMARIZATION ...
• Important for the end user evaluation.
• Thus, summarization forms the crux of the content defined by a
sub-event.
• Two approaches to automatic summarization
• Extraction: Works by selecting a subset of existing words,
phrases, or sentences in the original text to form the summary
• Abstraction: Works by building an internal semantic representation
and then using natural language generation techniques to create a
summary that is closer to what a human might generate
SUMMARIZATION ...
• A Spanning Phrase approach is used.
• Take into account the most frequent words in the
cluster of tweets and club them.
• Choose two to be the maximum frequency of a word
'w' occurring in all the tweets.
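The frequency idea behind this slide can be illustrated with a simple extractive sketch. This is not the Spanning Phrase algorithm itself, whose details the slides do not give. It is a stand-in that counts in how many tweets of a cluster each word occurs and then picks the tweet covering most of the frequent words. The names `frequent_words`, `pick_representative` and `top_k` are hypothetical.

```python
from collections import Counter


def frequent_words(cluster, top_k=5):
    """Words ranked by the number of tweets in the cluster that contain them."""
    counts = Counter()
    for tweet in cluster:
        # Use a set so a word repeated inside one tweet is counted once.
        counts.update(set(tweet.lower().split()))
    return [word for word, _ in counts.most_common(top_k)]


def pick_representative(cluster, top_k=5):
    """Return the tweet that contains the most of the cluster's frequent words."""
    top = set(frequent_words(cluster, top_k))
    return max(cluster, key=lambda t: len(top & set(t.lower().split())))
```

Words are split on whitespace only, so punctuation attached to a word is kept; running the tweets through the same cleaning used earlier in the pipeline would make the counts more reliable.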

Group-13 Project 15 Sub event detection on social media
