Dataflow with
Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Meetup
Hadoop Summit – San Jose
27 June 2016
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Slides available at the conclusion of the talk:
http://slideshare.net/aldrinpiri/
Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Let’s Connect A to B
Producers A.K.A Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Why is moving data effectively hard?
• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service
Physical Store
Gateway
Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Core Data Center at HQ
Server Cluster
On Delivery Routes
Trucks Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
Physical Store
Gateway
Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark /
Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Why is moving data effectively hard when scoped internally?
• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery
Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global
Why is moving data effectively hard when scoped globally?
• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery
The Unassuming Line: A Case Study
We’ve seen a few lines show up in the wild thus far
Internet! Inter- & Intra- connections in
our global courier enterprise
Spotlight: Arthur Lacôte, https://thenounproject.com/turo/
Dataflow Line Anatomy 101
Let’s dissect what this line typically represents
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or
Application
Script or
Application
Data Data
Disparate Transport
Mechanisms
Dataflow Line Anatomy 201
Sometimes that transport is just more lines
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or
Application
Script or
Application
Line Inception
Data Data
Dataflow Line Anatomy 301
But those lines could also have components…
Fig 1. Lineus Worldwidewebus. Common Name: Internet! Fig 2. Good Recursion Joke
NoSuchJokeException
footage not found
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Apache NiFi
Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
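To make "Backpressure" and "Prioritized queuing" from the feature list concrete, here is a minimal Python sketch. It is not NiFi code: the `Connection` class, its `backpressure_threshold` parameter, and the priority numbers are all hypothetical names chosen for illustration. The idea shown is that a queue refuses new items once a threshold is reached, and hands out the most urgent item first.

```python
import heapq
import itertools


class Connection:
    """Hypothetical bounded, prioritized queue between two processing steps."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Queue an item; return False (apply backpressure) once the threshold is reached."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the most urgent item (lowest priority number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [
    conn.offer("routine scan", priority=5),
    conn.offer("damaged parcel alert", priority=1),
    conn.offer("another scan", priority=5),  # refused: the queue is at its threshold
]
first_out = conn.poll()  # the alert jumps ahead of the routine scan
```

A producer that sees `False` from `offer` is expected to slow down or retry later, which is the essence of backpressure. The sketch does not cover "pressure release" (discarding or aging off data).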
Apache NiFi Subproject: MiNiFi
• Let me get the key parts of NiFi close to where data begins and provide bidirectional communication
• NiFi lives in the data center. Give it an enterprise server or a cluster of them.
• MiNiFi lives as close as possible to where data is born and is a guest on that device or system
Let’s revisit our courier service from the perspective of NiFi
Physical Store
Gateway
Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark /
Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Client
Libraries
Client
Libraries
MiNiFi
MiNiFi
NiFi NiFi NiFi NiFi NiFi NiFi
Client
Libraries
Apache NiFi Managed Dataflow
SOURCES
REGIONAL
INFRASTRUCTURE
CORE
INFRASTRUCTURE
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
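The FBP terms in the table can be sketched as a toy, single-threaded Python model. This is an illustration under assumptions, not NiFi's implementation: the class names mirror the NiFi terms, but `on_trigger`, `has_room`, `run`, and the three sample functions are hypothetical. Process groups are left out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:                      # FBP "information packet"
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


class Connection:                    # FBP "bounded buffer"
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def has_room(self):
        return len(self.queue) < self.capacity


class Processor:                     # FBP "black box"
    def __init__(self, work, inbound=None, outbound=None):
        self.work, self.inbound, self.outbound = work, inbound, outbound

    def on_trigger(self):
        """Handle at most one FlowFile; return True if any progress was made."""
        if self.outbound is not None and not self.outbound.has_room():
            return False             # downstream is full, so wait
        if self.inbound is None:
            incoming = None          # source processor: nothing to consume
        elif self.inbound.queue:
            incoming = self.inbound.queue.popleft()
        else:
            return False             # nothing queued yet
        result = self.work(incoming)
        if result is not None and self.outbound is not None:
            self.outbound.queue.append(result)
        return incoming is not None or result is not None


def run(processors):                 # FBP "scheduler" (flow controller)
    progressed = True
    while progressed:                # stop once a full round makes no progress
        progressed = any([p.on_trigger() for p in processors])


pending = [b"scan-1", b"scan-2", b"scan-3"]
delivered = []


def generate(_):
    return FlowFile(content=pending.pop(0)) if pending else None


def tag(flowfile):
    flowfile.attributes["origin"] = "truck"
    return flowfile


def deliver(flowfile):
    delivered.append(flowfile)       # sink: consumes and emits nothing


a_to_b, b_to_c = Connection(capacity=1), Connection(capacity=1)
run([
    Processor(generate, outbound=a_to_b),
    Processor(tag, inbound=a_to_b, outbound=b_to_c),
    Processor(deliver, inbound=b_to_c),
])
```

Each processor only knows its inbound and outbound connections, so steps can be rearranged without touching their code. The capacity of 1 forces every step to wait for the next one, which is the "interact at differing rates" role of the bounded buffer.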
FlowFiles & Data Agnosticism
• NiFi is data agnostic!
• But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
“Be conservative in what you do, be liberal in what you accept from others”
FlowFiles are like HTTP data
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!
FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content *
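The HTTP analogy can be sketched in Python: a header of key/value pairs travels with an opaque body. This is a hedged illustration only. The `FlowFile` dataclass, the `from_http_response` helper, and the `http.status.line` attribute name are hypothetical and are not part of NiFi; only `fileSize` echoes an attribute name shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class FlowFile:
    attributes: dict   # the "header": string keys mapped to string values
    content: bytes     # the "content": opaque bytes, never interpreted here


def from_http_response(raw: bytes) -> FlowFile:
    """Split a raw HTTP response into header fields (attributes) and body (content)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}   # hypothetical attribute name
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    attributes["fileSize"] = str(len(body))       # size is derived from the content
    return FlowFile(attributes, body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Connection: close\r\n"
       b"\r\n"
       b"Hello world!\n")
flowfile = from_http_response(raw)
```

Routing decisions can then be made from `flowfile.attributes` alone, without parsing `flowfile.content`, which is what makes the data-agnostic design workable.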
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community
Architecture
Master: NiFi Cluster Manager (NCM)
OS/Host
JVM
NiFi Cluster Manager – Request Replicator
Web Server
Slaves: NiFi Nodes (the diagram shows three, each with the same layout)
OS/Host
JVM
Flow Controller
Web Server
Processor 1 … Extension N
Local Storage: FlowFile Repository, Content Repository, Provenance Repository
NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
FlowFile repository: F1 → C1
Content repository: C1
Provenance repository: P1 F1 – Create
AFTER
FlowFile repository: F1 → C1, F2 → C1
Content repository: C1
Provenance repository: P1 F1 – Create; P2 F1 – Route; P3 F2 – Clone (F1)
NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
FlowFile repository: F1 → C1
Content repository: C1
Provenance repository: P1 F1 – CREATE
AFTER
FlowFile repository: F1 → C1, F1.1 → C2
Content repository: C1 (plaintext), C2 (encrypted)
Provenance repository: P1 F1 – CREATE; P2 F1.1 – MODIFY
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Demo
Community
Learn, Share at Birds of a Feather
Streaming, DataFlow & Cybersecurity
Thursday June 30
6:30 pm, Ballroom C
Thank You
Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Demo
Community
Brief history of the Apache NiFi Community
Matured at NSA 2006-2014
• Contributors from Government and several commercial industries
• Releases on a 6-8 week schedule
• Apache NiFi 1.0.0 release on the horizon
• Zero-Master Clustering
Timeline
2006: Code developed at NSA
November 2014: Code available open source, ASL v2
July 2015: Achieved TLP status in just 7 months
Today
Extension / Integration Points
NiFi Term | Description
FlowFile Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.
Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subproject MiNiFi site
http://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi


Editor's Notes

  • #28: Introduce the architecture of NiFi, describe major system components, and describe the single node and clustering models. For each component describe its available (and potential) deployment models (relate it to Hadoop).