Building a real-time Tweet map with Flink in six weeks
OSTMap: fast PoC development with Flink
Proof of concept: an important tool in the industry
• A PoC is often necessary to show feasibility to customers
• It touches several topics:
  • Scalability
  • Stream processing
  • Batch processing
  • Storage and querying of data
• OSTMap as an example PoC
Goals for OSTMap
• Increase trust in big data technologies on the customer side
  • It is easy to build an application with current technologies
  • With almost no experience
• Teach students big data technologies
  • Recruiting
  • Bring big data to the university
• Build a real-time application to view recent geotagged tweets on a map
• Search for terms and users, show these tweets on a map
• Analytics:
  • First data science jobs
  • …
Industry in practice: IT-Ringvorlesung 2016
• A course at the University of Leipzig
• Students work on projects of local companies
• Six students
• Over a period of six weeks, not full time
• Weekly meetings
• GitHub project: github.com/IIDP/OSTMap
Nico Graebling, Vincent Märkl, Hans Dieter Pogrzeba, Christopher Schott, Christopher Rost, Kevin Shrestha
Michael Schmeißer, Martin Grimmer, Matthias Kricke
mgm technology partners
We bring applications into production!
• Innovative software solution provider with application responsibility
• Specialist for highly scalable, transactional online applications
• Central lines of business: Insurance, E-Commerce, E-Government
• Founded in 1994
• 347 employees, 9 offices (2014)
• Revenue: € 43.7 million (2014)
• Part of Allgeier SE
ScaDS
Competence center for scalable data services and solutions Dresden/Leipzig
• Bundles the Big Data research expertise of TU Dresden and Leipzig University
• Drive Big Data innovations
• Bring industry and science together
• Knowledge exchange and transfer
Walking skeleton
“A Walking Skeleton is a tiny implementation of the system that performs a small end-to-end function. It need not use the final architecture, but it should link together the main architectural components. The architecture and the functionality can then evolve in parallel.”
- Alistair Cockburn
GIF from http://blog.codeclimate.com/blog/2014/03/20/kickstart-your-next-project-with-a-walking-skeleton
Milestone 1
Read stream, store data as JSON file, show tweets, read data from JSON files
Milestone 2
Write to and read from Accumulo, show tweets on map, full table scans, slow visualization
Milestone 3
Term index, geotemporal index, UI improvements, clustering, …
OSTMap – stream, batch, storage and querying
geotagged tweets
webservice
a) stream processing
b) batch processing
c) querying data
Stream processing of incoming data – first version
Operators: GeoTweetSource, DateExtraction, KeyGeneration, RawTweetSink (data for time index)
This enabled us to build a slow term search and a slow map search via full table scans.
Stream processing of incoming data – final version
Operators:
• GeoTweetSource, DateExtraction, KeyGeneration, RawTweetSink (data for time index)
• TermExtraction (tokenizing), UserExtraction, TermIndexSink (data for term index)
• LanguageExtraction, 1 minute window, sum by language, LanguageFrequencySink (data for language statistics)
• GeoTemporalIndexCreation, GeoTemporalIndexSink (data for geotemporal index)
Now we were able to build a faster term and map search and a language frequency visualization.
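The language statistics branch (1 minute window, sum by language) can be sketched in plain Python. This is not Flink code and not the OSTMap implementation; the function name `language_counts`, the `(timestamp_ms, language)` input tuples and the millisecond timestamps are assumptions made for illustration.

```python
from collections import Counter

WINDOW_MS = 60_000  # 1 minute tumbling window, as named on the slide


def language_counts(tweets):
    """Count tweets per (window start, language).

    `tweets` is an iterable of (timestamp_ms, language) tuples (hypothetical input shape).
    """
    counts = Counter()
    for timestamp_ms, language in tweets:
        # Align the timestamp to the start of its 1 minute window.
        window_start = timestamp_ms - timestamp_ms % WINDOW_MS
        counts[(window_start, language)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0, "de"), (59_999, "de"), (60_000, "de"), (30_000, "en")]
    print(language_counts(sample))
```

Each (window start, language) pair would correspond to one entry written by the language frequency sink; in a streaming engine the window is closed by time rather than by the end of the input.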
Batch processing
• Initial creation of the term index and geotemporal index for already processed tweets
• Data export
• Other statistics like:
  • Area or distance a user covers with their tweets
Storage
Accumulo table design
Table                     Row                         Column Family       Column Qualifier        Value
RawTweetData (TimeIndex)  timestamp, hash (8b + 4b)   -                   -                       raw tweet json
TermIndex                 term                        field (user, text)  RawTweetData key (12b)  -
LanguageFrequency         time bucket (YYYYMMDDhhmm)  language-tag        -                       tweet count (4b)
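A minimal sketch of how the row keys in this table design could be built. It assumes that "8b + 4b" means an 8 byte timestamp followed by a 4 byte hash, that the timestamp is in milliseconds, and that time buckets are formatted in UTC. `zlib.crc32` is only a stand-in for the hash, which the slides do not name, and both function names are hypothetical.

```python
import struct
import zlib
from datetime import datetime, timezone


def raw_tweet_key(timestamp_ms: int, tweet_json: str) -> bytes:
    """Row key for RawTweetData: 8 byte timestamp + 4 byte hash = 12 bytes."""
    # Stand-in hash (assumption): CRC32 of the raw JSON, an unsigned 32 bit value.
    tweet_hash = zlib.crc32(tweet_json.encode("utf-8"))
    # Big-endian packing keeps keys sorted by time for non-negative timestamps.
    return struct.pack(">qI", timestamp_ms, tweet_hash)


def time_bucket(timestamp_ms: int) -> str:
    """Row key for LanguageFrequency: minute bucket formatted as YYYYMMDDhhmm."""
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M")


if __name__ == "__main__":
    key = raw_tweet_key(1462060800000, '{"text": "hello"}')
    print(len(key), time_bucket(1462060800000))
```

The 12 byte result is what the TermIndex table would store in its column qualifier to point back at the raw tweet.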
Geotemporal Index for OSTMap
Geo index
• Geo data is partitioned by geohash (z curve), a function from 2d coordinate space to 1d key space
• Geohashes are used as row keys in Accumulo
Geohash cells on the map:
9z db dc df dg
9x d8 d9 dd de
9r d2 d3 d6 d7
9p d0 d1 d4 d5
3z 6b 6c 6f 6g
Sorted row keys: …, 3z, 6b, 6c, 6f, 6g, 9p, 9r, 9x, 9z, d0, d1, d2, d3, d4, d5, d6, …, dg
Row      CF           CQ
geohash  RawTweetKey  -
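The mapping from 2d coordinates to a 1d key can be illustrated with the standard geohash encoding, which interleaves longitude and latitude bits and writes them in base32. This is a sketch of the general technique, not the code used in OSTMap; the function name and the default precision of 2 characters (matching the two-character cells on the slide) are assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 2) -> str:
    """Encode a coordinate as a geohash with `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(51.34, 12.37))  # a coordinate in Leipzig
```

Because nearby points usually share a prefix, sorting rows by geohash keeps most spatial neighbours close together in the key space.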
Geotemporal Index for OSTMap
Geo index – querying?
• Data is partitioned by geohash
• Calculate the coverage of the bounding box: 9p, 9r, d0, d1, d2, d3
• Calculate scan ranges from the coverage: range: [9p], range: [9r], range: [d0,d1,d2,d3]
• Accumulo iterators run over the scan ranges and return the result
Row      CF           CQ
geohash  RawTweetKey  lat/lon
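The step from coverage to scan ranges can be sketched as merging geohashes that are direct neighbours in the sorted key space. The sketch assumes all geohashes in the coverage have the same length; `scan_ranges` and `geohash_to_int` are hypothetical names, and how OSTMap computes its ranges is not shown in the slides.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_to_int(geohash: str) -> int:
    """Position of a geohash in the 1d key space (for equal-length hashes)."""
    value = 0
    for char in geohash:
        value = value * 32 + BASE32.index(char)
    return value


def scan_ranges(coverage):
    """Merge covered geohash cells into inclusive (start, end) scan ranges."""
    cells = sorted(set(coverage), key=geohash_to_int)
    ranges = []
    for cell in cells:
        if ranges and geohash_to_int(cell) == geohash_to_int(ranges[-1][1]) + 1:
            # Cell directly follows the previous range end: extend that range.
            ranges[-1] = (ranges[-1][0], cell)
        else:
            ranges.append((cell, cell))
    return ranges


if __name__ == "__main__":
    print(scan_ranges(["9p", "9r", "d0", "d1", "d2", "d3"]))
```

For the coverage on the slide this yields three ranges: 9p alone, 9r alone (9q lies between them and is not covered), and d0 to d3 as one contiguous scan.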
Geotemporal Index for OSTMap
Add some time!
• Partitioned by geohash, with time buckets (day, lon, lat)
• The day is prefixed to the geohash in the row key, so each day (day 1, day 2, …, day i, …) repeats the geohash key space
Sorted row keys, day 1: …, 13z, 16b, 16c, 16f, 16g, 19p, 19r, 19x, 19z, 1d0, 1d1, 1d2, 1d3, 1d4, 1d5, 1d6, …, 1dg
Sorted row keys, day 2: …, 23z, 26b, 26c, 26f, 26g, 29p, 29r, 29x, 29z, 2d0, 2d1, 2d2, 2d3, 2d4, 2d5, 2d6, …, 2dg
Row           CF           CQ
day, geohash  RawTweetKey  lat/lon
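With the day in front of the geohash, a query for a bounding box and a time span needs one set of scan ranges per day. A minimal sketch, assuming the day is written as a plain string prefix as in the slide's illustration (the real key encoding is not given); `geotemporal_ranges` is a hypothetical name.

```python
def geotemporal_ranges(days, geo_ranges):
    """Repeat every geohash scan range for every requested day.

    `days` is an iterable of day buckets, `geo_ranges` a list of inclusive
    (start_geohash, end_geohash) pairs.
    """
    return [
        (f"{day}{start}", f"{day}{end}")
        for day in days
        for start, end in geo_ranges
    ]


if __name__ == "__main__":
    print(geotemporal_ranges([1, 2], [("9p", "9p"), ("d0", "d3")]))
```

The number of scans grows with the number of days in the query, which is the price for keeping each day's data contiguous.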
Geotemporal Index for OSTMap
What about Hotspots?
• Partitioned by geohash, with time buckets (day, lon, lat)
Figure: the geohash grid next to the day-prefixed row keys (…, 13z, …, 1dg for day 1; …, 23z, …, 2dg for day 2; …).
Row           CF           CQ
day, geohash  RawTweetKey  lat/lon
Geotemporal Index for OSTMap
What about Hotspots?
• Partitioned by geohash, with time buckets (day, lon, lat) and a leading spreading byte (sb)
• spreading byte = hash(tweet) % 255
• Reproducible
• Pre-split tables in Accumulo
Figure: row keys grouped by spreading byte and distributed over node 0, node 1, node 2, …, node n, for example …, 01d2, 01d3, 01d4, …, 02d2, 02d3, 02d4, … on one node and …, 11d2, 11d3, 11d4, …, 12d2, 12d3, 12d4, … on the next.
Row               CF           CQ
sb, day, geohash  RawTweetKey  lat/lon
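A sketch of the spreading byte under stated assumptions: `zlib.crc32` stands in for the unnamed hash (a deterministic hash is needed for the value to be reproducible; Python's built-in `hash()` on strings varies between processes), the day is stored in 2 bytes, and all function names are hypothetical. Note that `% 255` yields values 0 to 254.

```python
import zlib

NUM_SPREAD_VALUES = 255  # from the slide: hash(tweet) % 255


def spreading_byte(tweet_json: str) -> int:
    """Deterministic spreading byte for a tweet (CRC32 is a stand-in hash)."""
    return zlib.crc32(tweet_json.encode("utf-8")) % NUM_SPREAD_VALUES


def row_key(tweet_json: str, day: int, geohash: str) -> bytes:
    """Row layout from the slide: sb, day, geohash (2 byte day is an assumption)."""
    return (
        bytes([spreading_byte(tweet_json)])
        + day.to_bytes(2, "big")
        + geohash.encode("ascii")
    )


def fan_out(day: int, geohash_start: str, geohash_end: str):
    """One inclusive (start, end) scan range per possible spreading byte."""
    day_bytes = day.to_bytes(2, "big")
    return [
        (
            bytes([sb]) + day_bytes + geohash_start.encode("ascii"),
            bytes([sb]) + day_bytes + geohash_end.encode("ascii"),
        )
        for sb in range(NUM_SPREAD_VALUES)
    ]


if __name__ == "__main__":
    print(row_key('{"text": "hello"}', 1, "d2"))
    print(len(fan_out(1, "d0", "d3")))
```

Writes for the same day and geohash now land under many different key prefixes, so pre-split tables can place them on different nodes; the trade-off is that every query has to be repeated for each spreading byte value.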
demo
Thank you
Martin Grimmer grimmer[at]informatik.uni-leipzig.de
Matthias Kricke kricke[at]informatik.uni-leipzig.de
Michael Schmeißer michael.schmeisser[at]mgm-tp.com
www.mgm-tp.com
www.scads.de
