Exploring data with Elasticsearch and Kibana
Patrick Puecher, Developer
SFSCon, November 10th 2017
Elastic Stack (the ELK Stack)
Elasticsearch, Kibana, Beats, Logstash
Elasticsearch
- Distributed, RESTful search engine
- Based on Lucene
- Written in Java
- Apache License
- APIs
  - Indices APIs
  - Document APIs
  - Search APIs
  - …
Kibana
- Visualize your data
- Histograms, line graphs, pie charts, …
- Time Series with Timelion
Logstash
- Server-side data processing pipeline
- How Logstash works
  - Inputs
    - file, syslog, redis, beats, …
  - Filters
    - split, mutate (convert, rename, add_field, remove_field), date, …
  - Outputs
    - elasticsearch, file, email, exec, …
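The input, filter, output flow can be pictured as a chain of functions. The Python sketch below is only a conceptual model, not how Logstash is implemented; `run_pipeline`, `apply_filter`, `split_words` and `uppercase` are hypothetical names chosen for illustration.

```python
def apply_filter(flt, events):
    """Run one filter stage; a filter may emit zero, one or many events."""
    for event in events:
        yield from flt(event)


def run_pipeline(inputs, filters, output):
    """Push every input event through all filters, then hand it to the output."""
    events = inputs
    for flt in filters:
        events = apply_filter(flt, events)
    for event in events:
        output(event)


def split_words(event):
    # Comparable in spirit to a split filter: one event becomes several.
    for word in event["message"].split():
        yield {"message": word}


def uppercase(event):
    # Comparable in spirit to a mutate filter: the event is modified.
    yield {**event, "message": event["message"].upper()}


collected = []
run_pipeline([{"message": "hello elastic stack"}],
             [split_words, uppercase],
             collected.append)
```

Each stage only sees events, so stages can be reordered or swapped without touching the others, which is the property the plugin lists above rely on.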
Beats
- Send data from machines to Logstash and Elasticsearch
- Beats family
  - Filebeat
    - log files
  - Metricbeat
    - system and service metrics
  - Packetbeat
    - network data
  - Winlogbeat
    - Windows event logs
  - Heartbeat (beta)
    - uptime monitoring
Demo time!
1. Big Data 4 Tourism
- Input: CSV file
- Data processing: Java API
- Visualizing: Kibana
2. Instagram Data
- Input: JSON files
- Data processing: Logstash & jq
- Visualizing: Kibana
Demo 1: Big Data 4 Tourism
- Collect and visualize accommodation enquiries and bookings
  ○ Create Elasticsearch index
  ○ Tourism Data Collector (https://github.com/idm-suedtirol/big-data-for-tourism)
    - Upload and process CSV files
    - Written in Java
    - Open Source ツ
  ○ Kibana to visualize
- Big Data 4 Tourism working group by IDM Südtirol - Alto Adige
  ○ Brandnamic, HGV, IDM Südtirol - Alto Adige, Internet Consulting, Limitis, LTS, Peer GmbH, SiMedia …
Demo 1: Create Elasticsearch index
PUT /tourism-data_2017
{
  "mappings": {
    "enquiry": {
      "properties": {
        "arrival": { "type": "date", "format": "epoch_millis||date" },
        "departure": { "type": "date", "format": "epoch_millis||date" },
        "country.code": { "type": "keyword" },
        "country.name": { "type": "keyword" },
        "country.latlon": { "type": "geo_point" },
        "adults": { "type": "byte" },
        "children": { "type": "byte" },
        "destination.code": { "type": "short" },
        "destination.name": { "type": "keyword" },
        "destination.latlon": { "type": "geo_point" },
        "category.code": { "type": "byte" },
        "category.name": { "type": "keyword" },
        "booking": { "type": "boolean" },
        "cancellation": { "type": "boolean" },
        "submitted_on": { "type": "date", "format": "epoch_millis||date||date_hour_minute_second" },
        "length_of_stay": { "type": "short" }
      }
    }
  }
}
Sample CSV row:
"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"
Demo 1: Tourism Data Collector
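The collector itself is written in Java and is not reproduced here. As a rough illustration of the kind of work it does, the Python sketch below turns one CSV row into a document shaped like the enquiry mapping. The column order is an assumption inferred from the order of the mapping fields. `row_to_document`, the nested field layout and the derived `length_of_stay` are hypothetical and not taken from the project.

```python
import csv
import io
from datetime import date

# Assumed column order (not confirmed by the slides).
COLUMNS = ["arrival", "departure", "country_code", "adults", "children",
           "destination_code", "category_code", "booking", "cancellation",
           "submitted_on"]


def row_to_document(line):
    """Convert one quoted CSV line into a dict ready to be indexed."""
    values = next(csv.reader(io.StringIO(line)))
    row = dict(zip(COLUMNS, values))
    arrival = date.fromisoformat(row["arrival"])
    departure = date.fromisoformat(row["departure"])
    doc = {
        "arrival": row["arrival"],
        "departure": row["departure"],
        "adults": int(row["adults"]),
        "children": int(row["children"]),
        "destination": {"code": int(row["destination_code"])},
        "category": {"code": int(row["category_code"])},
        "booking": row["booking"] == "1",
        "cancellation": row["cancellation"] == "1",
        "submitted_on": row["submitted_on"],
        # Assumption: length of stay is the number of nights between the dates.
        "length_of_stay": (departure - arrival).days,
    }
    if row["country_code"]:
        # An empty country column is simply left out of the document.
        doc["country"] = {"code": row["country_code"]}
    return doc


sample = '"2015-01-01","2015-01-03","","2","0","21027","1","1","0","2015-01-01T01:59:00"'
document = row_to_document(sample)
```

Names and coordinates for country, destination and category would have to come from lookup tables, which the slides do not show.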
Demo 1: Visualize sample data (I)
Demo 1: Visualize sample data (II)
Demo 2: Instagram data
How to get Instagram posts from South Tyrol?
Mission: Must-see places for route planner
Demo 2: Instagram data
1. Get a shape file of South Tyrol (http://geoportal.buergernetz.bz.it/)
2. Use QGIS to create a regularly spaced grid of points
3. Export points as latitude and longitude coordinates
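The talk builds the grid in QGIS. For readers without QGIS, a comparable grid over a plain bounding box can be sketched in Python. The bounding box values, the 5000 m spacing and the function name `grid_points` are assumptions for illustration. The sketch does not clip points to the province outline the way a shape file would.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_points(south, west, north, east, spacing_m):
    """Return (lat, lng) pairs on a regular grid covering a bounding box."""
    lat_step = spacing_m / METERS_PER_DEGREE_LAT
    rows = int((north - south) / lat_step) + 1
    points = []
    for i in range(rows):
        lat = south + i * lat_step
        # One degree of longitude shrinks with the cosine of the latitude.
        lng_step = spacing_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        cols = int((east - west) / lng_step) + 1
        for j in range(cols):
            points.append((round(lat, 6), round(west + j * lng_step, 6)))
    return points


# Hypothetical bounding box, used only to exercise the function.
points = grid_points(46.2, 10.4, 47.1, 12.5, 5000)
```

With a search radius equal to the grid spacing, neighbouring search circles overlap, so no area between grid points is left uncovered.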
Demo 2: Instagram rate limits & scopes
- Global rate limits on the Instagram platform (https://www.instagram.com/developer/limits/)
  - 5000 API calls / hour
- Scopes
  - public_content - to read any public profile info and media on a user’s behalf (applications no longer accepted) :’(
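The hourly limit bounds how many grid points can be polled. A small Python calculation makes the budget explicit, assuming one API call per grid point per poll and the 10-minute polling interval used later in the demo. `max_grid_points` is a hypothetical helper.

```python
CALLS_PER_HOUR = 5000       # global limit quoted on the slide
POLL_INTERVAL_MINUTES = 10  # polling interval used later in the demo


def max_grid_points(calls_per_hour, poll_interval_minutes):
    """Largest number of grid points that fits the hourly call budget."""
    polls_per_hour = 60 // poll_interval_minutes
    return calls_per_hour // polls_per_hour


budget = max_grid_points(CALLS_PER_HOUR, POLL_INTERVAL_MINUTES)
```

Under these assumptions the budget works out to 833 grid points, since 5000 calls are shared across 6 polls per hour.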
Demo 2: Instagram search API
GET https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000
{
  "data": [
    {
      "id": "1614761577805643016_1157147895",
      "user": {
        "id": "1157147895", "full_name": "Marc Hochstaffl", "profile_picture": "…", "username": "marc_hochstaffl"
      },
      "images": {
        "thumbnail": {"width": 150, "height": 150, "url": "…"}, "low_resolution": {…}, "standard_resolution": {…}
      },
      "created_time": "1506714602",
      "caption": { … },
      "user_has_liked": false,
      "likes": {"count": 181},
      "tags": ["sam", "karposfasttrail", "autumn\ud83c\udf41", "ahrntal", "hundskehljoch"],
      "filter": "Normal",
      "comments": {"count": 3},
      "type": "image",
      "link": "https://www.instagram.com/p/BZoyRmCDf0I/",
      "location": {"latitude": 47.05, "longitude": 12.06667, "name": "Hundskehljoch", "id": 1033509208},
      "attribution": null,
      "users_in_photo": []
    }
  ],
  "meta": {"code": 200}
}
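One search URL is needed per grid point. A minimal Python sketch for assembling them follows. The endpoint and parameter names are copied from the slide, while `search_url` and the placeholder token `key` are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.instagram.com/v1/media/search"


def search_url(lat, lng, access_token, distance=5000):
    """Build the media search URL for one grid point."""
    query = urlencode({
        "lat": lat,
        "lng": lng,
        "access_token": access_token,
        "distance": distance,  # search radius in metres
    })
    return f"{BASE_URL}?{query}"


url = search_url("47.051124693028548", "12.039835734128651", "key")
```

Passing the coordinates as strings keeps every digit exported from the grid, instead of letting a float conversion shorten them.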
Demo 2: Create Elasticsearch index
PUT /_template/instagram
{
  "template": "instagram*",
  "mappings": {
    "_default_": {
      "properties": {
        "images": { … },
        "carousel_media": { … },
        "geoip": { "type": "geo_point" },
        "users_in_photo": { … },
        "link": { … },
        "created_time": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_second"
        },
        "caption": { … },
        "type": { "type": "keyword" },
        "tags": { "type": "keyword" },
        "filter": { "type": "keyword" },
        "likes.count": { "type": "integer" },
        "comments.count": { "type": "integer" },
        "location": { … },
        "id": { "type": "keyword" },
        "user": { … }
      }
    }
  }
}
Demo 2: Grab and store posts using Logstash (I)
input {
  http_poller {
    urls => {
      insta1 => "/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000"
      insta2 => "/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000"
      …
    }
    keepalive => false
    cookies => false
    request_timeout => 30
    schedule => { every => "10m" }
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "instagram-%{+YYYYMM}"
    document_id => "%{id}"
  }
}
Demo 2: Grab and store posts using Logstash (II)
filter {
  split { field => "data" }
  if [data][id] {
    mutate {
      convert => {
        "[data][comments][count]" => "integer"
        "[data][likes][count]" => "integer"
      }
      rename => {
        "[data][created_time]" => "[created_time]"
        "[data][images]" => "[images]"
        "[data][comments][count]" => "[comments_count]"
        …
        "[data][id]" => "[id]"
        "[data][user]" => "[user]"
        "[data][likes][count]" => "[likes_count]"
      }
      add_field => [ "geoip", "%{[location][latitude]},%{[location][longitude]}" ]
      remove_field => ["data", "meta"]
    }
    date {
      match => ["[caption][created_time]", "UNIX"]
      target => "[caption][created_time]"
    }
    date {
      match => ["[created_time]", "UNIX"]
      remove_field => [ "[created_time]" ]
    }
  }
}
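For readers unfamiliar with Logstash filters, the same transformation can be sketched in plain Python. This is an approximation under assumptions: the elided renames are taken to hoist all remaining fields to the top level, posts without a location are given no `geoip` field, and `split_and_flatten` is a hypothetical name. The second date filter has no target, so it writes to Logstash's default `@timestamp` field, which the sketch mirrors.

```python
from datetime import datetime, timezone


def split_and_flatten(response):
    """Turn one API response into one flat event per post."""
    events = []
    for post in response.get("data", []):  # split { field => "data" }
        if "id" not in post:               # if [data][id] { ... }
            continue
        event = dict(post)
        # rename: nested counters become flat fields
        event["comments_count"] = int(event.pop("comments")["count"])
        event["likes_count"] = int(event.pop("likes")["count"])
        location = post.get("location")
        if location:
            # add_field: "lat,lon" string, a format geo_point accepts
            event["geoip"] = f'{location["latitude"]},{location["longitude"]}'
        # date: parse the UNIX timestamp, then drop the source field
        created = int(event.pop("created_time"))
        event["@timestamp"] = datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
        events.append(event)
    return events


sample_response = {
    "data": [{
        "id": "1614761577805643016_1157147895",
        "created_time": "1506714602",
        "likes": {"count": 181},
        "comments": {"count": 3},
        "location": {"latitude": 47.05, "longitude": 12.06667},
    }],
    "meta": {"code": 200},
}
events = split_and_flatten(sample_response)
```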
Demo 2: Grab and store posts using Linux Shell
#!/bin/bash
insta=(
  'https://api.instagram.com/v1/media/search?lat=47.051124693028548&lng=12.039835734128651&access_token=key&distance=5000'
  'https://api.instagram.com/v1/media/search?lat=47.049359378811829&lng=12.105570031601609&access_token=key&distance=5000'
)
count=0
while [ "x${insta[count]}" != "x" ]
do
  MIN=$(date -d '11 minutes ago' +"%s") # only fetch recent posts to reduce bandwidth
  URL="${insta[count]}&min_timestamp=$MIN"
  # jq emits one bulk action line plus one document line per post
  curl -s "$URL" | jq -c '.data[] | .geoip = ((.location.latitude | tostring) + "," + (.location.longitude | tostring)) |
    {"index": {"_index": ("instagram-" + (.created_time | tonumber | gmtime | strftime("%Y%m"))), "_type": "feed", "_id": .id}},
    .' | curl -s -XPOST localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @- & # start in background; the header is required from Elasticsearch 6.0 on
  if [ $(( (count + 1) % 20 )) = 0 ]; then # run at most 20 requests in parallel
    wait
  fi
  count=$((count + 1))
done
wait # let the remaining background requests finish
Use a cron job to run the shell script every 10 minutes!
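The jq expression is dense, so here is roughly the same step as a Python sketch: each post becomes two lines of a bulk request body, an action line followed by the document. The index naming and the `feed` type follow the shell script, while `to_bulk_lines` is a hypothetical helper that assumes every post carries a location.

```python
import json
from datetime import datetime, timezone


def to_bulk_lines(post):
    """Return the action line and document line for one post."""
    doc = dict(post)
    location = post["location"]
    doc["geoip"] = f'{location["latitude"]},{location["longitude"]}'
    # Monthly index derived from the post's creation time (UTC), like gmtime in jq.
    month = datetime.fromtimestamp(int(post["created_time"]), tz=timezone.utc).strftime("%Y%m")
    action = {"index": {"_index": f"instagram-{month}", "_type": "feed", "_id": post["id"]}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


sample_post = {
    "id": "1614761577805643016_1157147895",
    "created_time": "1506714602",
    "location": {"latitude": 47.05, "longitude": 12.06667},
}
bulk_body = to_bulk_lines(sample_post)
```

Because the document ID is the Instagram post ID, fetching the same post again in a later run overwrites the stored document rather than creating a duplicate.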
Demo 2: Visualize posts by date
July - August
Demo 2: Daily rhythm (1 for Monday … 7 for Sunday)
Sunday, 2 pm - 7 pm
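The weekday numbering on this slide matches the ISO convention. How such a weekday and hour could be derived from a post's `created_time` is sketched below in Python. The slides do not show how the values were computed for Kibana, so `weekday_and_hour` and the choice of UTC are assumptions.

```python
from datetime import datetime, timezone


def weekday_and_hour(created_time, tz=timezone.utc):
    """Return (ISO weekday 1-7, hour 0-23) for a UNIX timestamp string."""
    moment = datetime.fromtimestamp(int(created_time), tz=tz)
    return moment.isoweekday(), moment.hour


rhythm = weekday_and_hour("1506714602")
```

For a chart of local habits, passing the region's own time zone as `tz` would be the more meaningful choice, since UTC shifts the hours.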
Demo 2: Top locations by number of posts (I)
Riva del Garda
Trento
Bolzano
Tre Cime
Merano
Demo 2: Top tags by number of posts
snukiefulmartinisisters
giuliavalentina
valentinavignali valentinavignali
querly_official
igworld_global
manueldietrich
photography
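Rankings like this one are commonly backed by an Elasticsearch terms aggregation. The Python sketch below only builds the request body and sends nothing. The field name `tags` comes from the index template, while `top_terms_query` and the aggregation name `top_terms` are hypothetical.

```python
import json


def top_terms_query(field, size=10):
    """Build a search body that returns the most frequent values of a field."""
    return {
        "size": 0,  # no individual hits, only the aggregation result
        "aggs": {
            "top_terms": {
                "terms": {"field": field, "size": size}
            }
        },
    }


body = top_terms_query("tags", size=10)
print(json.dumps(body, indent=2))
```

The field has to be mapped as `keyword`, as `tags` is in the template, so that whole tags are counted rather than analyzed tokens.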
Demo 2: Top travellers
Demo 2: Influencer Trentino
Demo 2: Influencer South Tyrol
Demo 2: Glassy human
Data-Driven Advertising



