Faceting analyzed fields with some sprinkles of probability theory
conjures trending topic analysis and other interesting insights
Boaz Leskes
Elasticsearch
@bleskes
Work done for Buzzcapture
Trending?
© Buzzcapture
[Figure labels: reference, reference, topic]
topic reference
≠
topic reference
P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}
Berlin buzzwords 2013 - Faceting analyzed fields with some sprinkles of probability theory
topic reference
P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}
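The formula reads as a document-frequency fraction: the number of documents in a set that contain word w, divided by the size of that set. A minimal Python sketch of that reading follows. The function name `term_probability` and the toy topic and reference sets are hypothetical and not taken from the talk.

```python
from typing import Dict, Iterable, Set


def term_probability(docs: Iterable[Set[str]]) -> Dict[str, float]:
    """P(w|T): fraction of documents in the set that contain term w."""
    docs = list(docs)
    if not docs:
        return {}
    counts: Dict[str, int] = {}
    for terms in docs:
        for w in terms:  # each document is a set, so a term counts once per document
            counts[w] = counts.get(w, 0) + 1
    return {w: c / len(docs) for w, c in counts.items()}


# Hypothetical toy data (illustration only).
topic_docs = [{"jelly", "bean", "android"}, {"jelly", "bean"},
              {"android", "phone"}, {"bean"}]
reference_docs = [{"android", "phone"}, {"phone"},
                  {"bean", "phone"}, {"android"}]

p_topic = term_probability(topic_docs)
p_reference = term_probability(reference_docs)

for w in sorted(p_topic):
    print(w, p_topic[w], p_reference.get(w, 0.0))
```

Comparing the two dictionaries term by term shows which words are over-represented in the topic set relative to the reference set.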
[Figure: terms (brown, dog, fox, quick) linked by arrows to document IDs (2, 5, 6, 10, 12, 13); example posting lists: 2 5 10 12 and 5 6 12 13]
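The picture of terms pointing at document IDs can be sketched in Python as an inverted index, plus the reverse mapping from document IDs back to terms that faceting needs. This is only an illustration: the slide does not say which term owns which posting list, so the assignment below is hypothetical, as are the names `uninvert` and `facet`.

```python
from collections import defaultdict

# Hypothetical assignment of the two posting lists to terms.
inverted = {
    "brown": [2, 5, 10, 12],
    "quick": [5, 6, 12, 13],
}


def uninvert(index):
    """Turn term -> document IDs into document ID -> terms."""
    by_doc = defaultdict(list)
    for term in sorted(index):
        for doc_id in index[term]:
            by_doc[doc_id].append(term)
    return dict(by_doc)


def facet(by_doc, doc_ids):
    """Count how many of the given documents contain each term."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        for term in by_doc.get(doc_id, []):
            counts[term] += 1
    return dict(counts)


by_doc = uninvert(inverted)
print(sorted(by_doc))              # document IDs that carry at least one term
print(facet(by_doc, [5, 6, 12]))   # term counts over a hypothetical result set
```

Every (document, term) pair in `by_doc` is one "arrow", which is why the reverse mapping can grow much larger than the term dictionary itself.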
In our index:
• Terms = 12GB
• “Arrows” = 41GB
{
    tweet: {
        type:      "string",
        analyzer:  "whitespace",
        fielddata: {
            filter: {
                regex: {
                    pattern:   "^#.*"
                },
                frequency: {
                    min:       10
                }
            }
        }
    }
}
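To make the effect of this filter concrete, here is a rough Python imitation of what it keeps: terms that match the regular expression and appear in at least `min` documents. It is a sketch under simplifying assumptions, not Elasticsearch's implementation (which, for instance, applies frequency filtering per segment). The function name `filter_terms` and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def filter_terms(docs, pattern=r"^#.*", min_docs=10):
    """Keep terms that match `pattern` and occur in at least `min_docs` documents."""
    regex = re.compile(pattern)
    doc_freq = Counter()
    for text in docs:
        # Imitate the whitespace analyzer: split on whitespace only,
        # and count each term once per document.
        doc_freq.update(set(text.split()))
    return {t for t, n in doc_freq.items() if n >= min_docs and regex.match(t)}


# Hypothetical sample data (illustration only).
tweets = ["#bbuzz talk on faceting"] * 12 + ["#rare hashtag once", "faceting again"]
kept = filter_terms(tweets)
print(kept)
```

Only frequent hashtags survive, so the structure loaded into memory covers a small fraction of the original terms.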
Drop terms that occur too rarely
Drop docs with too many terms
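The second pruning step, dropping documents with too many terms, can be sketched as follows, together with a count of the document-to-term "arrows" before and after. The threshold, the function names and the sample data are hypothetical and chosen only for illustration.

```python
def drop_large_docs(doc_terms, max_terms):
    """Discard documents that carry more than `max_terms` distinct terms."""
    return {d: t for d, t in doc_terms.items() if len(t) <= max_terms}


def count_arrows(doc_terms):
    """Number of (document, term) pairs that must be held in memory."""
    return sum(len(t) for t in doc_terms.values())


# Hypothetical data: two ordinary documents and one with 50 distinct terms.
doc_terms = {
    1: {"quick", "fox"},
    2: {"brown", "dog"},
    3: {f"spam{i}" for i in range(50)},
}

pruned = drop_large_docs(doc_terms, max_terms=10)
print(count_arrows(doc_terms), count_arrows(pruned))
```

A single oversized document accounts for most of the pairs in this toy example, which is the motivation for cutting such documents before building the in-memory structure.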
[Figure labels: reference, reference, topic]
iculture    10,122
floor        8,998
cover        6,874
toy          4,402
ground       3,841

4.0          7,878
4.1          4,292
rtacties     4,078
jelly        2,905
bean         2,857
