Benchmarking of distributed linked data streaming systems
This project has received funding from the European Union's H2020 research and innovation action program under grant agreement number 688227.
The project runtime is December 2015 until November 2018.
The HOBBIT project
Pavel Smirnov
AGT International
Stream Reasoning Workshop
January 17, 2018
Overview
• The HOBBIT project
• DEBS challenges
• Available benchmarks overview
• Summary
The HOBBIT project. Overview
Goal
To abolish the barriers in the adoption and deployment of Big Linked Data by European companies through:
• The deployment of benchmarks on data that reflects reality within realistic settings.
• The provision of corresponding industry-relevant key performance indicators (KPIs).
• The computation of comparable results on standardized hardware.
• The institution of an independent and thus bias-free organization to conduct regular benchmarks and provide the European industry with up-to-date performance results.
Deliverables:
• The benchmarking platform (the HOBBIT platform)
• The set of benchmarks with KPIs
• Benchmarking association
http://project-hobbit.eu
The HOBBIT platform. Business logic
Customer
• Requires ranking of alternative solutions by some KPI
• Submits benchmarks
Solution provider (vendor)
• E.g. DB, streaming platforms, ML frameworks, etc.
• Submits systems
The HOBBIT platform (online or local instance)
Provides:
1. Automatic benchmark executions
2. Leaderboards (online or private)
Main advantages:
1. Streaming fashion
2. Docker virtualization
3. RDF-enabled
http://github.com/hobbit-project/platform
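The customer's need on this slide, a ranking of alternative solutions by some KPI, can be illustrated with a minimal Java sketch. It is not taken from the HOBBIT code base: the class `LeaderboardSketch`, the `Entry` type, the system names and the KPI values are all hypothetical, and the sketch only assumes that every benchmark run yields one numeric KPI per system.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: ranks benchmarked systems by a single KPI. */
final class LeaderboardSketch {

    /** One benchmark result: the system that ran and the KPI value it achieved. */
    static final class Entry {
        final String system;
        final double kpi;

        Entry(String system, double kpi) {
            this.system = system;
            this.kpi = kpi;
        }
    }

    /**
     * Returns a new list sorted best-first.
     * Pass higherIsBetter = true for KPIs such as throughput,
     * false for KPIs such as latency.
     */
    static List<Entry> rank(List<Entry> results, boolean higherIsBetter) {
        Comparator<Entry> byKpi = Comparator.comparingDouble(e -> e.kpi);
        List<Entry> sorted = new ArrayList<>(results);
        sorted.sort(higherIsBetter ? byKpi.reversed() : byKpi);
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up latency values in milliseconds, for illustration only.
        List<Entry> results = List.of(
                new Entry("system-a", 120.0),
                new Entry("system-b", 85.5),
                new Entry("system-c", 97.2));
        // Latency-like KPI: lower values rank first.
        for (Entry e : rank(results, false)) {
            System.out.println(e.system + " " + e.kpi);
        }
    }
}
```

A leaderboard built on several KPIs would need an explicit tie-breaking or weighting rule, which the slide does not specify.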
The HOBBIT platform. Architecture
The data pipeline:
1. Send raw/initial data (optional)
2. Send raw tuples
3.1 Send tasks (task = {tuple, id})
3.2 Send expected results per task
4. Send actual results per task
5. Send the “expected-actual” pairs
6. Send KPIs back to the controller
7. Send KPIs back to the platform
Benchmark (customer’s application)
System components (black box for customers)
Platform components
The online platform:
http://master.project-hobbit.eu/
Cluster: 6 nodes, each with 2 × 64-bit Intel Xeon E5-2630v3 (8 cores, 2.4 GHz, HT, 20 MB cache per processor), 256 GB RAM, 1 Gb Ethernet
Nodes (benchmark/system): 3/3
https://github.com/hobbit-project/platform/wiki/Overview
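Steps 3 to 6 of the data pipeline on this slide (tasks with ids, expected results, actual results, "expected-actual" pairs, KPIs) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the platform's evaluation code: the class `EvaluationSketch`, its method names, the string-typed results and the two KPIs (accuracy and average latency) are hypothetical choices, and message transport between components is left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of pairing expected and actual results per task id. */
final class EvaluationSketch {
    // Task id -> expected result, and task id -> time the task was sent.
    private final Map<String, String> expected = new HashMap<>();
    private final Map<String, Long> sentAtMillis = new HashMap<>();
    private int evaluated = 0;
    private int correct = 0;
    private long totalLatencyMillis = 0;

    /** Steps 3.1 and 3.2: a task goes to the system, its expected result is stored. */
    void taskSent(String taskId, String expectedResult, long timestampMillis) {
        expected.put(taskId, expectedResult);
        sentAtMillis.put(taskId, timestampMillis);
    }

    /** Steps 4 and 5: the actual result arrives and is paired with the expected one. */
    void resultReceived(String taskId, String actualResult, long timestampMillis) {
        String expectedResult = expected.remove(taskId);
        Long sentAt = sentAtMillis.remove(taskId);
        if (expectedResult == null || sentAt == null) {
            return; // unknown or duplicate task id: ignored in this sketch
        }
        evaluated++;
        if (expectedResult.equals(actualResult)) {
            correct++;
        }
        totalLatencyMillis += timestampMillis - sentAt;
    }

    /** Step 6: share of evaluated tasks whose actual result matched the expected one. */
    double accuracy() {
        return evaluated == 0 ? 0.0 : (double) correct / evaluated;
    }

    /** Step 6: mean time between sending a task and receiving its result. */
    double averageLatencyMillis() {
        return evaluated == 0 ? 0.0 : (double) totalLatencyMillis / evaluated;
    }

    public static void main(String[] args) {
        EvaluationSketch eval = new EvaluationSketch();
        eval.taskSent("task-1", "anomaly", 0L);
        eval.taskSent("task-2", "normal", 10L);
        eval.resultReceived("task-1", "anomaly", 40L);
        eval.resultReceived("task-2", "anomaly", 30L);
        System.out.println("accuracy=" + eval.accuracy()
                + " avgLatencyMs=" + eval.averageLatencyMillis());
    }
}
```

Keying both maps by task id is what lets results arrive out of order, which matters when the system under test is distributed.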
The HOBBIT platform. Technologies
Platform communication channel (RabbitMQ only)
Data transportation channel (app-specific: RabbitMQ, Kafka, Netty, Akka…)
Platform side:
1. Java
2. RabbitMQ
3. Docker + Swarm
4. GitLab
5. Redis
6. Virtuoso (RDF)
7. NodeJS
8. KeyCloak
Application side (defaults):
1. Java
2. RabbitMQ
Design and upload to HOBBIT
Design (the standard HOBBIT way)
Design components using the manuals:
• Develop a benchmark component in Java
• Develop a component in Java
• Develop a system adapter
• Develop a system adapter in Java
Create Docker files using details (manual)
Build images (manual)
- Lots of understanding and manual work
- Impossible to debug locally *
- Upload non-tested images *
- No logs from the online platform, only GUI *
Design (alternative using the Java SDK)
Clone and extend the basic code:
https://github.com/hobbit-project/java-sdk-example
Run tests locally as pure Java code
Debug Docker images by running tests
Update ttl-files for your project
Configure remote project details
+ Clone and extend standard classes with your logic
+ Test and debug your code from the IDE
+ Build Docker images on demand from the IDE
+ Run your images from the IDE, check all internal logs
+ Upload fully tested images
Upload
Create a project at
https://git.project-hobbit.eu
Create an account at
https://master.project-hobbit.eu
Upload Docker images to
https://git.project-hobbit.eu
Find your benchmark or system at
https://master.project-hobbit.eu
* Unless you have a local HOBBIT deployment
Example: single benchmark run
http://master.project-hobbit.eu/
Example: challenges & leaderboards
http://master.project-hobbit.eu/
Challenges: DEBS GC 2017
DEBS Grand Challenge 2017 successfully completed
Anomaly detection for injection molding machines over RDF streams.
• 14 teams registered
• 7 teams passed the correctness check
• 2 were awarded (main and audience award)
StreaML Open Challenge is open; prize: 500 €
The main result:
For the first time we can objectively quantify the performance of a distributed stream processing pipeline running analytics algorithms
https://project-hobbit.eu/challenges/debs-grand-challenge/
https://project-hobbit.eu/open-challenges/streaml-open-challenge/
The anomaly detector (after at least W time units from start):
1. Find cluster centers over W time units
2. Train Markov model over last W time units
3. Apply Markov model for anomaly detection
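The anomaly detector stages named on this slide (cluster centers, Markov model training, Markov-based detection) can be sketched in Java as below. This is a simplified illustration, not the challenge's reference implementation: it assumes one-dimensional sensor values, takes the cluster centers as already computed, uses a first-order Markov chain, and flags an anomaly when the probability of a short run of transitions falls below a threshold. The class `MarkovAnomalySketch`, the sample values and the threshold are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: clustering states plus a first-order Markov model. */
final class MarkovAnomalySketch {

    /** Index of the cluster center closest to the value. */
    static int nearestCenter(double value, double[] centers) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (Math.abs(value - centers[i]) < Math.abs(value - centers[best])) {
                best = i;
            }
        }
        return best;
    }

    /** Maps every value in the window to the index of its nearest center. */
    static int[] toStates(double[] window, double[] centers) {
        return Arrays.stream(window).mapToInt(v -> nearestCenter(v, centers)).toArray();
    }

    /** Trains the model: p[i][j] estimates P(next state = j | current state = i). */
    static double[][] train(int[] states, int k) {
        double[][] p = new double[k][k];
        for (int t = 0; t + 1 < states.length; t++) {
            p[states[t]][states[t + 1]] += 1.0;
        }
        for (double[] row : p) {
            double sum = Arrays.stream(row).sum();
            if (sum > 0) {
                for (int j = 0; j < k; j++) {
                    row[j] /= sum;
                }
            }
        }
        return p;
    }

    /** Product of the transition probabilities along the given state sequence. */
    static double sequenceProbability(int[] states, double[][] p) {
        double prob = 1.0;
        for (int t = 0; t + 1 < states.length; t++) {
            prob *= p[states[t]][states[t + 1]];
        }
        return prob;
    }

    /** Flags the sequence when its probability under the model is below the threshold. */
    static boolean isAnomaly(int[] lastStates, double[][] p, double threshold) {
        return sequenceProbability(lastStates, p) < threshold;
    }

    public static void main(String[] args) {
        double[] centers = {0.0, 10.0};                             // assumed given
        double[] window = {0.1, 0.2, 9.8, 0.3, 0.1, 10.1, 0.2, 0.0}; // made-up readings
        double[][] model = train(toStates(window, centers), centers.length);
        int[] observed = toStates(new double[] {9.9, 10.2}, centers);
        System.out.println("anomaly=" + isAnomaly(observed, model, 0.05));
    }
}
```

In a streaming setting the window would slide, so the centers and the transition matrix would be recomputed as new tuples arrive; the sketch trains once on a fixed window to keep the logic visible.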
Challenges: DEBS GC 2018
DEBS Grand Challenge 2018 has just started
https://project-hobbit.eu/challenges/debs2018-grand-challenge/
Prediction of arrival times and ports on marine traffic data.
Prize: 1000 € + publication in the DEBS proceedings (the conference will be in New Zealand)
DEBS Grand Challenge 2017
• Synthetically generated data
• Predefined algorithms
• True RDF-streaming benchmark
• Focus: correctness check, throughput, latency
DEBS Grand Challenge 2018
• Real annotated data
• No predefined approach
• True ML benchmark
• Focus: prediction accuracy, performance
Available benchmarks overview
Versioning Benchmark
• Benchmark for assessing the ability of versioning systems to efficiently manage evolving datasets and queries
Data Storage Benchmark
• Benchmark for RDF data storage solutions against an interactive workload in a real-world scenario, using various dataset sizes
Linking Benchmark
• Benchmark for assessing the performance of instance matching tools that implement string-based approaches
Faceted Browsing Benchmark
• Benchmark for systems which support browsing through linked data by iterative transitions performed by an intelligent user
ODIN Benchmark
• Benchmark for data extraction solutions for structured data
• Simulates the ingestion, storage and retrieval of streams of RDF data
Spatial Benchmark
• Benchmark for systems which deal with topological relations proposed in the state-of-the-art DE-9IM model
Question Answering Benchmark
• Benchmark for ranking question answering systems based on their performance and accuracy
GERBIL Benchmark
• Benchmark for entity annotation and disambiguation tools
• 9 annotators, 11 RDF datasets
Stream Machine Learning Benchmark
• Benchmark for assessing the performance of anomaly detection for injection molding machines over RDF streams
Stream Machine Learning Benchmark v2
• Benchmark for assessing the accuracy of prediction over a stream of marine traffic data
http://github.com/hobbit-project
Summary
The HOBBIT platform
• Ability to benchmark heterogeneous distributed systems in a streaming fashion
• A set of benchmarks to compare relevant Linked Data technologies and solutions
• We apply the HOBBIT platform to rank machine-learning pipelines over RDF streams
• The platform may serve as a basis for benchmarking stream-reasoning solutions
Q&A
Thank you for your attention!
psmirnov@agtinternational.com
http://twitter.com/smirnp
http://twitter.com/AGTIntl
