Alejandro Llaves
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
allaves@fi.upm.es
Oct 21 2015
Virtual Clusters for (RDF) Stream Processing
Outline

Some context: morph-streams++

Motivation

Use case: Sensor Cloud data integration

Topologies everywhere

Setting up a virtual cluster

Deploying Storm topologies

Conclusion
Some context...
Motivation

Integrating an unbounded stream of heterogeneous
sensor observations

Solution:
– Storm topologies for real-time processing
– Semantic Sensor Network (SSN) ontology for
modelling observations
– SWEET ontology for environmental phenomena
Use case: Sensor Cloud data integration (1/3)
Sensor Cloud

Viticulture, water
management, weather
monitoring, oyster farming...

RESTful API – JSON

Network → Platform →
Sensor → Phenomenon →
Observation

Lack of semantic
descriptions, e.g.
rain_trace vs Rain.

Multiple HTTP requests to
query various streams.
Source: CSIRO
Use case: Sensor Cloud data integration (2/3)

Sensor Cloud messages to field-named tuples

SWEET annotations for heterogeneous phenomena descriptions
<sample time="2015-05-28T16:30" value="15" sensor="bom_gov_au.94961.air.air_temp"/>
["2015-05-28T16:32", "2015-05-28T16:30", "15", "bom_gov_au", "94961", "air", "air_temp", "-43.3167", "147.0075"]
Tuple fields, in order: system time, sampling time, value, network, platform, sensor, phenomenon, latitude, longitude
Bolts: SensorCloudParser Bolt → SweetAnnotations Bolt
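The parsing step above can be sketched in plain Python. This is an illustration only, not the SensorCloudParser Bolt itself: the field name "value", the function name parse_sample and the PLATFORM_LOCATIONS lookup are assumptions (the slide does not say where the coordinates come from).

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Field names follow the labels on the slide; "value" is an assumed name
# for the unlabelled third field.
Observation = namedtuple(
    "Observation",
    ["system_time", "sampling_time", "value", "network", "platform",
     "sensor", "phenomenon", "latitude", "longitude"],
)

# Hypothetical lookup of platform coordinates, keyed by (network, platform).
PLATFORM_LOCATIONS = {("bom_gov_au", "94961"): ("-43.3167", "147.0075")}


def parse_sample(xml_text, system_time, locations=PLATFORM_LOCATIONS):
    """Turn one <sample/> element into a field-named tuple."""
    sample = ET.fromstring(xml_text)
    # The sensor id encodes network.platform.sensor.phenomenon.
    network, platform, sensor, phenomenon = sample.get("sensor").split(".")
    latitude, longitude = locations[(network, platform)]
    return Observation(system_time, sample.get("time"), sample.get("value"),
                       network, platform, sensor, phenomenon,
                       latitude, longitude)


obs = parse_sample(
    '<sample time="2015-05-28T16:30" value="15" '
    'sensor="bom_gov_au.94961.air.air_temp"/>',
    system_time="2015-05-28T16:32",
)
print(list(obs))
```

Naming the fields lets later steps refer to obs.phenomenon or obs.network instead of positional indexes.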
Use case: Sensor Cloud data integration (3/3)
SSN mapping
SSNConverter
Bolt
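The mapping diagram from this slide is not recoverable here, so the following is only a rough sketch of what an SSN conversion could look like. It uses three terms from the SSN ontology (ssn:Observation, ssn:observedBy, ssn:observedProperty). The base URI, the URI layout and the function names are hypothetical, and the result value and time modelling are left out.

```python
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/sensorcloud/"  # hypothetical base URI


def to_ssn_triples(obs):
    """Map one field-named observation to a few SSN triples (URIs only)."""
    sensor_uri = BASE + "sensor/" + "/".join(
        [obs["network"], obs["platform"], obs["sensor"]])
    property_uri = BASE + "property/" + obs["phenomenon"]
    obs_uri = (BASE + "observation/" + obs["network"] + "." + obs["platform"]
               + "." + obs["sensor"] + "." + obs["phenomenon"]
               + "/" + obs["sampling_time"])
    return [
        (obs_uri, RDF_TYPE, SSN + "Observation"),
        (obs_uri, SSN + "observedBy", sensor_uri),
        (obs_uri, SSN + "observedProperty", property_uri),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples."""
    return "\n".join("<%s> <%s> <%s> ." % t for t in triples)


example = {
    "sampling_time": "2015-05-28T16:30",
    "network": "bom_gov_au",
    "platform": "94961",
    "sensor": "air",
    "phenomenon": "air_temp",
}
triples = to_ssn_triples(example)
print(to_ntriples(triples))
```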
Topologies everywhere

A Storm topology “is a graph of stream transformations
where each node is a spout or bolt”.
https://storm.apache.org/documentation/Tutorial.html

Example of simple topology
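To make the "graph of stream transformations" idea concrete, here is a toy pipeline in plain Python. It does not use Storm's API: generators stand in for a spout and two bolts, and every sensor id other than the first is made up for illustration.

```python
def sensor_id_spout():
    """Spout: a source of tuples (finite here; unbounded in a real stream)."""
    for sensor_id in [
        "bom_gov_au.94961.air.air_temp",
        "bom_gov_au.94961.rain.rain_trace",  # hypothetical id
        "bom_gov_au.94962.air.air_temp",     # hypothetical id
    ]:
        yield sensor_id


def phenomenon_bolt(stream):
    """Bolt: keep only the phenomenon part of each sensor id."""
    for sensor_id in stream:
        yield sensor_id.split(".")[-1]


def count_bolt(stream):
    """Bolt: emit a running count per phenomenon."""
    counts = {}
    for phenomenon in stream:
        counts[phenomenon] = counts.get(phenomenon, 0) + 1
        yield phenomenon, counts[phenomenon]


# Wiring the graph: spout -> phenomenon_bolt -> count_bolt
results = list(count_bolt(phenomenon_bolt(sensor_id_spout())))
print(results)
```

In Storm the same wiring is declared on a topology and each node can run as several parallel tasks; the chain above is a single-threaded stand-in.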
Setting up a virtual cluster (1/2)
Wirbelsturm - https://github.com/miguno/wirbelsturm/

Allows deploying (local or remote) virtual clusters.

Focus on Big Data technologies: Storm, Kafka,
Zookeeper...

Uses Vagrant for “easy to configure, reproducible, and
portable work environments” - https://docs.vagrantup.com/v2/why-vagrant/index.html

Uses Puppet for provisioning: installation and
configuration of SW packages in the cluster nodes.
Setting up a virtual cluster (2/2)

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html
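The "Check Storm GUI" step can also be scripted. A minimal sketch, assuming the UI is exposed at the URL given on the slide; the function name ui_reachable and the timeout are assumptions, and the check only tells whether the page answers with HTTP 200, not whether the cluster is healthy.

```python
import urllib.error
import urllib.request


def ui_reachable(url="http://localhost:28080/index.html", timeout=3.0):
    """Return True if the page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Storm UI reachable:", ui_reachable())
```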
Deploying Storm topologies

$ ./deploy

Show wirbelsturm.yaml

Check Storm GUI -
http://localhost:28080/index.html

Describe simple topology

Compile & deploy

Describe a topology set

Configure Kafka

Compile & deploy
Conclusion

Wirbelsturm allows easy configuration & deployment of virtual clusters,
with focus on Big Data technologies.

SSN and SWEET ontologies to model and integrate environmental
sensor observations.

Parallelizing bottleneck tasks reduces the average message
processing latency, but only up to a point. More about Storm
parallelization: http://bit.ly/1NVyjU2

Delaying RDF conversion does not speed up the processing of Sensor
Cloud messages in the tested environment.

Submitted paper to IJSWIS, special issue on Velocity and Variety
Dimensions of Big Data – Llaves, Corcho et al.
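One way to see why parallelizing a bottleneck only helps up to a point is a toy capacity model: a pipeline can sustain no more messages per second than its slowest stage, so adding tasks to the bottleneck pays off only until another stage becomes the limit. The rates below are hypothetical, the model assumes each stage scales linearly with its number of tasks, and it is not derived from the measurements behind these slides.

```python
def pipeline_capacity(rates, parallelism):
    """Messages/s the pipeline can sustain: the slowest stage sets the limit."""
    return min(rate * tasks for rate, tasks in zip(rates, parallelism))


# Hypothetical per-task service rates (messages/s) for three stages:
# parser, annotator, converter. The converter is the bottleneck.
rates = [400, 500, 100]

# Raise only the parallelism of the bottleneck stage.
capacities = [pipeline_capacity(rates, [1, 1, tasks]) for tasks in range(1, 7)]
print(capacities)
```

When messages arrive faster than this capacity they queue up and latency grows, so raising the bottleneck's capacity lowers latency until the next-slowest stage takes over.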
What's coming next

Flying faster with Heron - https://blog.twitter.com/2015/flying-faster-with-twitter-heron
The presented research has been funded by the Ministerio de
Economía y Competitividad (Spain) under the project "4V:
Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora
de Datos" (TIN2013-46238-C4-2-R) and by the EU Marie Curie
IRSES project SemData (612551), and supported by an AWS in
Education Research Grant award.
Alejandro Llaves
allaves@fi.upm.es
Thanks!
