Session 9 - Big Data and Machine Learning with FIWARE
Fernando López, Cloud & Platform Senior Expert
fernando.lopez@fiware.org
@flopezaguilar
FIWARE Foundation, e.V.
Learning Goals
● Introduction to Big Data
● Differences between Apache Flink and Spark
● FIWARE connectors
● (Work in Progress) Machine Learning in FIWARE
Introduction to Big Data
Big Data
Big Data Analytics
[Figure: Big Data analytics approaches placed by time to act (milliseconds, seconds, minutes, hours, days) and size of the data handled per second (100 events/10 KBs, 1k events/1 MBs, 100k events/100 MBs):
▪ Indexed Storage (RDBMS, Apache Solr)
▪ Interactive Processing (e.g. Drill, BigQuery, OLAP)
▪ MapReduce (e.g. Spark, Hadoop)
▪ Realtime Analytics (CEP, Stream Processing)
▪ In-Memory Computing (e.g. Spark, SAP Hana, VoltDB)]
NGSI-LD
Sources: https://docbox.etsi.org/ISG/CIM/Open/NGSI-LD_introduction.pdf and https://www.webfirst.com/services/open-data-solutions
ETL architecture
Source: https://www.red-gate.com/simple-talk/sql/database-delivery/database-lifecycle-management-for-etl-systems/
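An ETL pipeline extracts data from source systems, transforms it (parsing, cleaning, reshaping) and loads the result into a target store such as a data warehouse. The plain-Scala sketch below illustrates the three stages under stated assumptions: the `id;value` input format, the `Reading` type and the in-memory "warehouse" map are hypothetical stand-ins, not part of any FIWARE component (Scala 2.13+ for `toDoubleOption`).

```scala
import scala.collection.mutable

object EtlSketch {
  // Hypothetical typed record produced by the transform step.
  final case class Reading(id: String, celsius: Double)

  // Hypothetical raw rows in an "id;value" format; one row is malformed.
  val rawRows: Seq[String] = Seq("room1;21.5", "room2;bad", "room3;19.0")

  // Extract: read raw rows from the source (an in-memory sequence here).
  def extract(): Seq[String] = rawRows

  // Transform: parse each row and drop the ones that cannot be parsed.
  def transform(rows: Seq[String]): Seq[Reading] =
    rows.flatMap { row =>
      row.split(";") match {
        case Array(id, value) => value.toDoubleOption.map(v => Reading(id, v))
        case _                => None
      }
    }

  // Load: write the cleaned records into the target store
  // (a mutable map standing in for a data warehouse table).
  def load(records: Seq[Reading], target: mutable.Map[String, Double]): Unit =
    records.foreach(r => target.update(r.id, r.celsius))

  // Runs the pipeline end to end and returns the loaded contents.
  def run(): Map[String, Double] = {
    val warehouse = mutable.Map.empty[String, Double]
    load(transform(extract()), warehouse)
    warehouse.toMap
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

The malformed `room2` row is dropped during the transform step, so only `room1` and `room3` reach the target store.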
Lambda architecture
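In a Lambda architecture, a batch layer periodically recomputes views over the complete master dataset, a speed layer covers only the data that arrived after the last batch run, and a serving layer merges both when answering queries. The sketch below shows that split in plain Scala; the `Event` type, the cut-off timestamp and the count aggregation are illustrative assumptions (Scala 2.13+ for `groupMapReduce`).

```scala
object LambdaSketch {
  // Hypothetical event: a sensor reports a count at time ts.
  final case class Event(sensor: String, count: Int, ts: Long)

  // Batch layer: recompute the view over the whole master dataset up to the cut-off.
  def batchView(master: Seq[Event], cutoff: Long): Map[String, Int] =
    master.filter(_.ts <= cutoff).groupMapReduce(_.sensor)(_.count)(_ + _)

  // Speed layer: aggregate only the events that arrived after the cut-off.
  def speedView(recent: Seq[Event], cutoff: Long): Map[String, Int] =
    recent.filter(_.ts > cutoff).groupMapReduce(_.sensor)(_.count)(_ + _)

  // Serving layer: answer a query by merging both views.
  def query(batch: Map[String, Int], speed: Map[String, Int], sensor: String): Int =
    batch.getOrElse(sensor, 0) + speed.getOrElse(sensor, 0)

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("s1", 2, 10L), Event("s1", 3, 20L), Event("s2", 1, 15L), Event("s1", 4, 30L))
    val cutoff = 20L
    println(query(batchView(events, cutoff), speedView(events, cutoff), "s1"))
  }
}
```

For sensor `s1`, the batch view contributes 2 + 3 and the speed view contributes 4, so the merged answer is 9.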
Kappa architecture
Source: Siddharth Mittal
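A Kappa architecture drops the separate batch layer: all data goes through one stream-processing path fed by an append-only log, and reprocessing means replaying the log from the beginning with the new job logic. The sketch below models this in plain Scala; the in-memory `log` stands in for a durable log (for example a Kafka topic), and the two job versions `latest` and `maximum` are hypothetical.

```scala
object KappaSketch {
  final case class Event(sensor: String, value: Double)

  // Append-only log: the single source of truth (stand-in for a durable log).
  val log: Vector[Event] =
    Vector(Event("s1", 20.0), Event("s2", 18.0), Event("s1", 24.0), Event("s1", 22.0))

  // A stream job is a stateful fold over the log, starting at a given offset.
  def runJob[S](from: Int, init: S)(step: (S, Event) => S): S =
    log.drop(from).foldLeft(init)(step)

  // Job version 1: latest value per sensor.
  def latest(state: Map[String, Double], e: Event): Map[String, Double] =
    state.updated(e.sensor, e.value)

  // Job version 2: maximum value per sensor.
  def maximum(state: Map[String, Double], e: Event): Map[String, Double] =
    state.updated(e.sensor, math.max(state.getOrElse(e.sensor, Double.MinValue), e.value))

  def main(args: Array[String]): Unit = {
    val v1 = runJob(0, Map.empty[String, Double])(latest)
    // New logic needs no batch layer: replay the same log from offset 0.
    val v2 = runJob(0, Map.empty[String, Double])(maximum)
    println(s"v1 = $v1, v2 = $v2")
  }
}
```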
Simple Smart solutions: Reference Architecture
Draco
Kurento
Wirecloud
QuantumLeap
Knowage
Flink
CrateDB
FIWARE Cosmos: Orion Flink Connector
Features
▪ The Cosmos Generic Enabler makes Big Data analysis of context data easier by integrating
with some of the most popular Big Data platforms.
▪ Batch Processing
▪ Stream Processing (Real-time)
▪ Direct data ingestion
▪ Direct connection with Context Broker
▪ Multiple Sinks
Apache Flink
▪ Framework and distributed processing engine for stateful computations over unbounded
and bounded data streams.
▪ Designed to run in all common cluster environments, perform computations at in-memory
speed and at any scale.
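The idea of stateful computations over unbounded and bounded streams can be illustrated without Flink itself: an operator keeps per-key state and updates it event by event, so the same logic works whether or not the input ends. In this plain-Scala sketch, a lazy `Iterator` plays the role of the stream and a mutable map plays the role of keyed state; `runningCounts` is a hypothetical name, and real Flink state is additionally partitioned across the cluster and checkpointed for fault tolerance.

```scala
import scala.collection.mutable

object StatefulStreamSketch {
  // Emits (key, count so far) for every incoming event. The mutable map is the
  // per-key state; the Iterator is consumed lazily, so the input may be unbounded.
  def runningCounts(events: Iterator[String]): Iterator[(String, Int)] = {
    val state = mutable.Map.empty[String, Int]
    events.map { key =>
      val updated = state.getOrElse(key, 0) + 1
      state.update(key, updated)
      key -> updated
    }
  }

  def main(args: Array[String]): Unit = {
    // Bounded stream: ends after three events.
    println(runningCounts(Iterator("a", "b", "a")).toList)
    // Unbounded stream: only the five results requested are ever computed.
    val unbounded = Iterator.continually(Seq("x", "y")).flatten
    println(runningCounts(unbounded).take(5).toList)
  }
}
```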
Architecture
Connection
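[Diagram: the ORION Context Broker sends HTTP POST notifications to the OrionSource of a Flink Job (JAR) built with orion-flink-connector and running on a Flink Cluster; the job's OrionSink writes results back to the Context Broker with HTTP POST/PUT/PATCH.]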
OrionSource
▪ Receives notifications from the Orion Context Broker on a given port.
▪ The received data is a stream of NgsiEvent objects.
val eventStream = env.addSource(new OrionSource(9001))
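The basic example reads `event.entities`, `entity.id` and `entity.attrs(...).value` from each NgsiEvent. The sketch below mirrors only those fields with hypothetical case classes (they are not the connector's own classes, which carry more information) to show how one notification is flattened into per-entity temperature readings. Filtering out entities without a `temperature` attribute is an added assumption; the slide's example does not do it.

```scala
object NgsiEventSketch {
  // Hypothetical stand-ins that model only the fields used on the slides.
  final case class Attribute(value: Any)
  final case class Entity(id: String, attrs: Map[String, Attribute])
  final case class NgsiEvent(entities: Seq[Entity])

  // Flattens notifications into entities, then reads each numeric "temperature"
  // attribute as a Float, as the slide's pipeline does.
  def temperatures(events: Seq[NgsiEvent]): Seq[(String, Float)] =
    events
      .flatMap(_.entities)
      .filter(_.attrs.contains("temperature")) // assumption: skip entities without it
      .map(e => e.id -> e.attrs("temperature").value.asInstanceOf[Number].floatValue())

  def main(args: Array[String]): Unit = {
    val notification = NgsiEvent(Seq(
      Entity("Room1", Map("temperature" -> Attribute(21.5))),
      Entity("Room2", Map("pressure" -> Attribute(720)))
    ))
    println(temperatures(Seq(notification)))
  }
}
```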
OrionSink
▪ Sends data back to the Orion Context Broker.
▪ Takes a stream of OrionSinkObject instances as input, each with these fields:
• content: message content as a String. JSON content must be stringified first.
• url: URL to which the message is sent.
• contentType: HTTP content type of the message (JSON or Plain).
• method: HTTP method of the message (POST, PUT or PATCH).
OrionSink.addSink(processedDataStream)
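The four OrionSinkObject fields map directly onto an HTTP request. As a rough illustration only (this is not how the connector is implemented internally), the sketch below builds, without sending, the request such a message describes, using the JDK's `java.net.http.HttpRequest` (Java 11+). `SinkMessage`, `ContentKind` and the `Room1` URL are hypothetical.

```scala
import java.net.URI
import java.net.http.HttpRequest

object SinkRequestSketch {
  // Hypothetical mirror of the two content types listed on the slide.
  sealed trait ContentKind { def mimeType: String }
  case object Json extends ContentKind { val mimeType = "application/json" }
  case object Plain extends ContentKind { val mimeType = "text/plain" }

  // Hypothetical mirror of the four OrionSinkObject fields.
  final case class SinkMessage(content: String, url: String, contentType: ContentKind, method: String)

  // Builds (but does not send) the HTTP request a message describes.
  def toRequest(m: SinkMessage): HttpRequest =
    HttpRequest.newBuilder(URI.create(m.url))
      .header("Content-Type", m.contentType.mimeType)
      .method(m.method, HttpRequest.BodyPublishers.ofString(m.content))
      .build()

  def main(args: Array[String]): Unit = {
    // JSON content is passed as an already-stringified body.
    val body = """{"temperature_min": {"value": 19.5, "type": "Number"}}"""
    val msg = SinkMessage(body, "http://localhost:1026/v2/entities/Room1/attrs", Json, "POST")
    val req = toRequest(msg)
    println(s"${req.method()} ${req.uri()}")
  }
}
```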
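Basic example
```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.fiware.cosmos.orion.flink.connector.{ContentType, HTTPMethod, OrionSink, OrionSinkObject, OrionSource}

object BasicExample {
  final val URL_CB = "http://flinkexample_orion_1:1026/v2/entities/"
  final val CONTENT_TYPE = ContentType.JSON
  final val METHOD = HTTPMethod.POST

  // Assumed definition (Temp_Node is not shown on the slides). The field names must
  // match keyBy("id") and min("temperature"); toString builds the JSON body for Orion.
  case class Temp_Node(id: String, temperature: Float) {
    override def toString: String =
      "{\"temperature_min\": {\"value\": " + temperature + ", \"type\": \"Number\"}}"
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Create Orion Source. Receive notifications on port 9001
    val eventStream = env.addSource(new OrionSource(9001))

    // Process event stream: minimum temperature per entity id over a
    // 5-second window that slides every 2 seconds.
    // Note: keyBy(String) and timeWindow belong to the older DataStream API
    // and are deprecated in later Flink releases.
    val processedDataStream = eventStream
      .flatMap(event => event.entities)
      .map(entity => {
        val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
        new Temp_Node(entity.id, temp)
      })
      .keyBy("id")
      .timeWindow(Time.seconds(5), Time.seconds(2))
      .min("temperature")
      .map(tempNode => {
        // POST to /v2/entities/<id>/attrs updates or appends attributes of that entity
        val url = URL_CB + tempNode.id + "/attrs"
        OrionSinkObject(tempNode.toString, url, CONTENT_TYPE, METHOD)
      })

    // Add Orion Sink
    OrionSink.addSink(processedDataStream)
    // …
  }
}
```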
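In the example, `.timeWindow(Time.seconds(5), Time.seconds(2))` creates sliding windows 5 seconds long that start every 2 seconds, so each reading falls into two or three overlapping windows, and `.min("temperature")` reduces each (id, window) pair to its minimum. The plain-Scala sketch below reproduces that window assignment on timestamped readings, with window starts aligned to multiples of the slide (Flink's default alignment when no offset is set). It treats each reading's timestamp as its window time and computes all windows at once, whereas Flink emits each result when its window closes; `Reading` and the millisecond timestamps are assumptions, and timestamps are taken to be non-negative.

```scala
object SlidingWindowMinSketch {
  // Hypothetical timestamped reading (timestamps in milliseconds, assumed >= 0).
  final case class Reading(id: String, tsMillis: Long, temperature: Float)

  // Start times of all windows [start, start + size) that contain ts, with
  // starts aligned to multiples of `slide` (newest first).
  def windowStarts(ts: Long, size: Long, slide: Long): Seq[Long] = {
    val lastStart = ts - (ts % slide)
    lastStart to (ts - size + 1) by -slide
  }

  // Minimum temperature per (entity id, window start), like keyBy + timeWindow + min.
  def slidingMin(readings: Seq[Reading], size: Long, slide: Long): Map[(String, Long), Float] =
    readings
      .flatMap(r => windowStarts(r.tsMillis, size, slide).map(start => (r.id, start) -> r.temperature))
      .groupMapReduce(_._1)(_._2)((a, b) => math.min(a, b))

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("Room1", 1000L, 21.0f),
      Reading("Room1", 3000L, 19.5f),
      Reading("Room1", 6000L, 22.0f)
    )
    // 5-second windows sliding every 2 seconds, as in the Flink example.
    slidingMin(readings, 5000L, 2000L).toSeq.sortBy(_._1._2).foreach(println)
  }
}
```

With these readings, the windows starting at 0 ms and 2000 ms both report 19.5, because the 3000 ms reading falls inside both.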
FIWARE Cosmos: Orion Spark Connector
Spark Components
Spark Scheduler
[Figure: an example job DAG over datasets A to G, combining map, union, groupBy and join operations and divided into Stage 1, Stage 2 and Stage 3; the legend marks cached data partitions.]
▪ Dryad-like DAGs
▪ Pipelines functions within a stage
▪ Cache-aware work reuse & locality
▪ Partitioning-aware to avoid shuffles
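Spark's scheduler cuts a job's DAG into stages at shuffle boundaries: narrow transformations such as `map` or `filter` are pipelined inside one stage, while a wide transformation such as `groupBy` needs a shuffle and starts a new stage. The sketch below applies that rule to a linear chain of operators. The `Op` model is hypothetical and much simpler than Spark's, which works on the full RDD dependency graph and, as the last bullet notes, can skip a shuffle when the inputs are already suitably partitioned.

```scala
object StageSplitSketch {
  // Hypothetical operator model: narrow ops can be pipelined, wide ops need a shuffle.
  sealed trait Op { def name: String }
  final case class Narrow(name: String) extends Op
  final case class Wide(name: String) extends Op

  // Splits a linear chain of operators into stages: a wide op starts a new stage,
  // narrow ops are appended to (pipelined into) the current one.
  def stages(ops: List[Op]): List[List[String]] =
    ops.foldLeft(List.empty[List[String]]) {
      case (Nil, op)           => List(List(op.name))
      case (acc, Wide(name))   => acc :+ List(name)
      case (acc, Narrow(name)) => acc.init :+ (acc.last :+ name)
    }

  def main(args: Array[String]): Unit = {
    val job = List(Narrow("map"), Narrow("filter"), Wide("groupBy"), Narrow("map"), Wide("join"))
    stages(job).zipWithIndex.foreach { case (ops, i) =>
      println(s"Stage ${i + 1}: ${ops.mkString(" -> ")}")
    }
  }
}
```

With the sample job this yields three stages: map -> filter, groupBy -> map, and join.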
Motivation for Spark
▪ Iterative algorithms (machine learning, graphs)
▪ Interactive data mining tools (R, Excel, Python)
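Iterative algorithms scan the same dataset many times, which is where keeping it in memory pays off (in Spark, via `cache()` or `persist()`). The plain-Scala sketch below fits y = w * x by gradient descent twice, once re-reading the data on every iteration and once reading it a single time, and counts the reads. The dataset, the learning rate and the `loadPoints` stand-in for storage access are hypothetical.

```scala
object IterativeReuseSketch {
  private var loadCount = 0

  // Stand-in for reading the training data from storage; counts every read.
  def loadPoints(): Vector[(Double, Double)] = {
    loadCount += 1
    Vector((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))
  }

  def loads: Int = loadCount
  def resetLoads(): Unit = { loadCount = 0 }

  // One gradient-descent step for y = w * x under mean squared error.
  def step(w: Double, points: Vector[(Double, Double)], lr: Double): Double = {
    val grad = points.map { case (x, y) => 2 * (w * x - y) * x }.sum / points.size
    w - lr * grad
  }

  // Without caching: the data is read again on every iteration.
  def fitReloading(iterations: Int, lr: Double): Double =
    (1 to iterations).foldLeft(0.0)((w, _) => step(w, loadPoints(), lr))

  // With caching: the data is read once and reused by every iteration.
  def fitCached(iterations: Int, lr: Double): Double = {
    val cached = loadPoints()
    (1 to iterations).foldLeft(0.0)((w, _) => step(w, cached, lr))
  }

  def main(args: Array[String]): Unit = {
    val w1 = fitReloading(50, 0.05)
    println(s"reloading: w = $w1 after $loads reads")
    resetLoads()
    val w2 = fitCached(50, 0.05)
    println(s"cached:    w = $w2 after $loads reads")
  }
}
```

Both variants converge to w ≈ 2.0; the reloading variant reads the data 50 times, the cached one once.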
Connection
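[Diagram: the ORION Context Broker sends HTTP POST notifications to the OrionReceiver of a Spark Job (JAR) built with orion-spark-connector and running on a Spark Cluster; the job's OrionSink writes results back to the Context Broker with HTTP POST/PUT/PATCH.]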
FIWARE & Machine Learning
Machine Learning
Machine Learning Development Lifecycle
Machine Learning Algorithms
▪ Some solutions have high algorithmic complexity.
▪ Some can be parallelized in a cluster (FlinkML).
▪ Others can use GPUs (e.g. TensorFlow).
▪ Even though each case may be different, we try to set up a generic life cycle.
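Algorithms that can be parallelized in a cluster typically split the data into partitions, compute a partial result on each partition independently, and combine the partial results. The sketch below shows this for one full-batch gradient-descent step on y = w * x, using Scala `Future`s on a local thread pool as a stand-in for cluster workers. It is a generic illustration of data parallelism, not FlinkML's API, and the data and learning rate are made up.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object DataParallelGradientSketch {
  type Point = (Double, Double)

  // Sum of squared-error gradients for y = w * x over one partition.
  def partialGradient(w: Double, part: Seq[Point]): Double =
    part.map { case (x, y) => 2 * (w * x - y) * x }.sum

  // One data-parallel step: partitions are processed concurrently (like cluster
  // workers), then the partial sums are combined into the full-batch gradient.
  def parallelStep(w: Double, partitions: Seq[Seq[Point]], lr: Double): Double = {
    val n = partitions.map(_.size).sum
    val partials = partitions.map(p => Future(partialGradient(w, p)))
    val total = Await.result(Future.sequence(partials), Duration.Inf).sum
    w - lr * total / n
  }

  def fit(partitions: Seq[Seq[Point]], iterations: Int, lr: Double): Double =
    (1 to iterations).foldLeft(0.0)((w, _) => parallelStep(w, partitions, lr))

  def main(args: Array[String]): Unit = {
    val data = (1 to 6).map(i => (i.toDouble, 2.0 * i)) // hypothetical data, y = 2x
    val partitions = data.grouped(2).toSeq               // three partitions of two points
    println(fit(partitions, 100, 0.01))
  }
}
```

Because the partial sums add up to the same total as a single sequential pass, each parallel step matches the sequential full-batch step (up to floating-point rounding).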
ML Standard Solution
▪ Each problem requires an analysis of which ML algorithm suits the data.
▪ Then the training dataset needs to be set up.
▪ Each problem may be slightly different ("same same but different").
▪ We can provide solutions for some cases, each with a suitable dataset.
▪ The tool to use (Spark, Flink, TensorFlow) depends on the chosen ML algorithm, since not every
ML algorithm is available in every framework.
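The first two steps (choosing an algorithm that suits the data and setting up the training dataset) can be sketched as: split the data into training and test sets, fit each candidate algorithm on the training set, and keep the one with the lowest test error. The candidates (a mean baseline and a least-squares line), the synthetic y = 3x + 1 data and the every-fifth-point split are hypothetical choices for illustration; in practice the split is usually randomized and the candidates come from the chosen framework.

```scala
object ModelSelectionSketch {
  type Point = (Double, Double)
  type Model = Double => Double

  // Hypothetical dataset: y = 3x + 1 for x = 0..9.
  val data: Vector[Point] = (0 until 10).map(i => (i.toDouble, 3.0 * i + 1)).toVector

  // Deterministic split for reproducibility: every fifth point goes to the test set.
  def split(points: Vector[Point]): (Vector[Point], Vector[Point]) = {
    val (test, train) = points.zipWithIndex.partition { case (_, i) => i % 5 == 0 }
    (train.map(_._1), test.map(_._1))
  }

  // Candidate 1: always predict the mean target of the training set.
  def fitMean(train: Vector[Point]): Model = {
    val mean = train.map(_._2).sum / train.size
    (_: Double) => mean
  }

  // Candidate 2: ordinary least-squares line y = a + b * x.
  def fitLinear(train: Vector[Point]): Model = {
    val xm = train.map(_._1).sum / train.size
    val ym = train.map(_._2).sum / train.size
    val b = train.map { case (x, y) => (x - xm) * (y - ym) }.sum /
      train.map { case (x, _) => (x - xm) * (x - xm) }.sum
    val a = ym - b * xm
    (x: Double) => a + b * x
  }

  // Mean squared error of a model on a dataset.
  def mse(model: Model, points: Vector[Point]): Double =
    points.map { case (x, y) => math.pow(model(x) - y, 2) }.sum / points.size

  // Fits each candidate on the training set and keeps the lowest test error.
  def select(points: Vector[Point]): (String, Double) = {
    val (train, test) = split(points)
    val candidates = Map("mean" -> fitMean(train), "linear" -> fitLinear(train))
    candidates.map { case (name, model) => name -> mse(model, test) }.minBy(_._2)
  }

  def main(args: Array[String]): Unit =
    println(select(data))
}
```

On this linear data the least-squares candidate wins with zero test error, while the mean baseline scores 112.5.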
Current Status
Orion Connector
Orion Source/Receiver + Orion Sink ✔ ✔
RTD Documentation ✔ ✔
Unit Tests ✔ ✔
Examples ✔ ✔
Step-by-step tutorial ✔
NGSI-LD support
Summary: Terms
● OLAP, Online Analytical Processing.
● OLTP, Online Transaction Processing.
● RDBMS, Relational Database Management System.
● ETL, Extract, Transform, Load.
● ERP, Enterprise Resource Planning.
● CRM, Customer Relationship Management.
● OSV, Output Slot Vector.
● BI, Business Intelligence.
● HDFS, Hadoop Distributed File System.
● DAG, Directed Acyclic Graph. The DAG defines the dataflow of the application: its vertices define the
operations to be performed on the data, and its edges define how data flows between them.
References
▪ FIWARE Catalogue
• https://www.fiware.org/developers/catalogue
▪ FIWARE Academy:
• https://fiware-academy.readthedocs.io/en/latest/processing/wirecloud
▪ Installation, administration & reference documentation is available on Read The Docs:
• https://fiware-cosmos-flink.readthedocs.io
▪ GitHub
• https://github.com/ging/fiware-cosmos-orion-flink-connector
• https://github.com/ging/fiware-cosmos-orion-spark-connector
• https://github.com/ging/fiware-cosmos-orion-flink-connector-examples
• https://github.com/ging/fiware-cosmos-orion-spark-connector-examples
Question & Answer
fiware-tech-help@lists.fiware.org