Data at Scales & the Values of Starting Small
Aldrin Piri - @aldrinpiri
DataWorks Summit 2017 – Munich
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'
Agenda
Apache NiFi: A Primer
Apache MiNiFi
Architecture
Apache NiFi: The Ecosystem
Community
Byte Scales for Data
SI Prefix | Scale
(none) | 10^0
kilo | 10^3
mega | 10^6
giga | 10^9
tera | 10^12
peta | 10^15
exa | 10^18
zetta | 10^21
yotta | 10^24
Annotations on the slide: “Big Data”, “everything else”, “Greek for”
The Problem at Hand
Producers A.K.A Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Use Case: Courier Service
Gathering data from disparate sources
[Diagram: courier network with NiFi instances]
Sites: Physical Store, On Delivery Routes, Distribution Center, Core Data Center at HQ
Components shown: Gateway Server, Mobile Devices, Registers, Trucks, Deliverers, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, NiFi
Icon credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
Apache NiFi: A Primer
Key Features and Principles
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
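The buffering bullets above (backpressure and pressure release) can be illustrated with a small sketch. This is not NiFi's implementation: the class name `BoundedConnection`, the `threshold` parameter, and the choice to model pressure release by discarding the oldest queued item are all assumptions made for illustration.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical bounded queue between two processing steps."""

    def __init__(self, threshold, release_oldest=False):
        self.threshold = threshold            # max items held before pushing back
        self.release_oldest = release_oldest  # assumed form of "pressure release"
        self._items = deque()
        self.released = []                    # items discarded to relieve pressure

    def offer(self, item):
        """Enqueue an item; return False to signal backpressure upstream."""
        if len(self._items) >= self.threshold:
            if not self.release_oldest:
                return False
            # Pressure release: give up the oldest item to make room.
            self.released.append(self._items.popleft())
        self._items.append(item)
        return True

    def poll(self):
        """Dequeue the oldest item, or None when the queue is empty."""
        return self._items.popleft() if self._items else None
```

With `release_oldest=False` a full queue makes the producer wait, which trades latency for completeness; with `release_oldest=True` the flow keeps moving at the cost of dropped data, which corresponds to the loss-tolerance choice listed under flow-specific QoS.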
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
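A toy model can make the FBP vocabulary above concrete. The sketch below is illustrative only and does not use any NiFi API: `FlowFile`, `uppercase_processor`, and `run_flow` are hypothetical names, the "Connection" is a plain `deque`, and the "Flow Controller" is a single-threaded loop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: an attribute map plus opaque content."""
    attributes: dict = field(default_factory=dict)
    content: bytes = b""


def uppercase_processor(flowfile):
    """Black box: transforms content and notes the change in an attribute."""
    attributes = {**flowfile.attributes, "transformed": "true"}
    return FlowFile(attributes, flowfile.content.upper())


def run_flow(processors, source):
    """Toy scheduler: moves FlowFiles through each processor via a queue."""
    connection = deque(source)            # bounded buffer stand-in
    for processor in processors:
        next_connection = deque()
        while connection:
            next_connection.append(processor(connection.popleft()))
        connection = next_connection
    return list(connection)
```

Wrapping a list of processors in a function of its own would play the role of a process group here: a new component built purely by composing existing ones.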
FlowFiles are like HTTP data
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!
FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content *
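The analogy on this slide can be sketched by splitting an HTTP response into its header and content and mapping them onto an attribute map plus an opaque payload. This is a minimal illustration, not how NiFi ingests HTTP: the function names and the `http.`-prefixed attribute keys are assumptions, and only `fileSize` is taken from the attribute names shown on the slide.

```python
def split_http_message(raw):
    """Split a raw HTTP response into status line, header dict, and body bytes."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return lines[0], headers, body


def to_flowfile_like(raw):
    """Map headers to attributes and keep the body as untouched binary content."""
    _, headers, body = split_http_message(raw)
    # Attribute names below are hypothetical, chosen only for this sketch.
    attributes = {"http." + name.lower(): value for name, value in headers.items()}
    attributes["fileSize"] = str(len(body))
    return {"attributes": attributes, "content": body}
```

The content is never decoded or inspected, which is the point of the comparison: routing decisions can be made from the attributes alone while the payload stays opaque.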
FlowFiles & Data Agnosticism
• NiFi is data agnostic!
• But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”
Apache MiNiFi
“Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer”
• NiFi lives in the data center. Give it an enterprise server or a cluster of them.
• MiNiFi lives as close as possible to where data is born and is a guest on that device or system.
Apache MiNiFi
Realities of computing outside the comforts of the data center
• Limited computing capability
• Limited power/network
• Restricted software library/platform availability
• No UI
• Physically inaccessible
• Not frequently updated
• Competing standards/protocols
• Scalability
• Privacy & Security
MiNiFi: Precedent from NiFi
A quick look at NiFi Site to Site
• Provides the semantics between two NiFi components across network boundaries
– A custom protocol for inter-NiFi communication
– Secure, Extensible, Load Balanced & Scalable Delivery to Cluster
• Extracted out to a client library which powers integration into popular frameworks like Apache Spark, Apache Storm, Apache Flink, and Apache Apex
• Attributes and the FlowFile format maintained
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
MiNiFi: Precedent from NiFi
A deeper dive into provenance
• Fine-grained, event-level access to interactions with FlowFiles
– CREATE, RECEIVE, FETCH, SEND, DOWNLOAD, DROP, EXPIRE, FORK, JOIN …
• Captures the associated attributes/metadata at the time of the event
• A map of a FlowFile’s journey and how it relates to other FlowFiles in a system
– MiNiFi lets us capture more of that journey and further illuminate the map of data processing
http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance
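The provenance ideas on this slide (typed events, an attribute snapshot taken at event time, and relationships between FlowFiles) can be sketched as a small in-memory log. This is a simplified model and not NiFi's provenance repository: `ProvenanceEvent`, `ProvenanceLog`, and the `parent_ids` field are hypothetical names, and recording a FORK on the child with a pointer to its parent is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str         # e.g. "RECEIVE", "FORK", "SEND"
    flowfile_id: str
    attributes: tuple       # snapshot of (key, value) pairs at event time
    parent_ids: tuple = ()  # FlowFiles this one was derived from


class ProvenanceLog:
    """Hypothetical append-only log of provenance events."""

    def __init__(self):
        self.events = []

    def record(self, event_type, flowfile_id, attributes, parent_ids=()):
        # Copy the attributes so later changes do not rewrite history.
        snapshot = tuple(sorted(attributes.items()))
        self.events.append(
            ProvenanceEvent(event_type, flowfile_id, snapshot, tuple(parent_ids))
        )

    def lineage(self, flowfile_id):
        """Return (flowfile_id, event_type) pairs, ancestors first."""
        own = [e for e in self.events if e.flowfile_id == flowfile_id]
        ancestors = []
        for event in own:
            for parent_id in event.parent_ids:
                ancestors.extend(self.lineage(parent_id))
        return ancestors + [(e.flowfile_id, e.event_type) for e in own]
```

Recording events on an edge agent as well as in the data center would extend the same lineage back toward the point where the data was created, which is the benefit the slide attributes to MiNiFi.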
MiNiFi: Precedent from NiFi
RECEIVE event
Apache MiNiFi
Departures from NiFi in getting the right fit
• The feedback loop is longer and not guaranteed
– Removal of Web Server and UI
• Declarative configuration
– Lends itself well to CM processes
– Extensible interface to support varying formats
  • Currently provided in YAML
• Reduced set of bundled components
Apache MiNiFi: Scoping
Provide all the key principles of NiFi in varying, smaller footprints
• Go small: Java – Write once, run anywhere*
– Feature parity and reuse of core NiFi libraries
• Go smaller: C++ – Write once**, run anywhere
• Go smallest: Write n-many times, run anywhere
Language libraries to support tagging, FlowFile format, Site to Site protocol, and provenance generation without a processing framework
– Mobile: Android & iOS
– Language SDKs
WHAT IS THIS!?
A NiFi FOR ANTS!?!
MiNiFi: Use Case - Connected Car
• Outside vehicle’s network firewall
• On telematics layer
[Diagram labels: VEHICLE NETWORK FIREWALL; TRANSMIT, EXECUTE, FILTER, PRIORITIZE, PARSE, LISTEN, ROUTE]
Connecting the Drops
SOURCES
REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE
Use Case: Courier Service with Apache NiFi & MiNiFi
Gathering data from disparate sources
[Diagram: the courier network from the earlier use case, now with MiNiFi and Client Libraries alongside NiFi]
Sites: Physical Store, On Delivery Routes, Distribution Center, Core Data Center at HQ
Components shown: Gateway Server, Mobile Devices, Registers, Trucks, Deliverers, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, NiFi, MiNiFi, Client Libraries
Data Provenance
▪ Constrained
▪ High-latency
▪ Localized context
▪ Hybrid – cloud/on-premises
▪ Low-latency
▪ Global context
Origin – attribution
Replay – recovery
Evolution of topologies
Long retention
Types of Lineage
• Event
• Configuration
Apache NiFi: The Ecosystem
We’ve provided a framework to extend the reach of data ingest
• Site-to-Site in MiNiFi instances provides machine-to-machine (M2M) communication
– Data arrives at NiFi in a transparent manner, allowing integration into existing flows
• Similar attention to extensibility in both Java and C++ clients allows agents to fit the needs of your organization
• Reduced footprint allows NiFi functionality to aid in the production of high-fidelity data, more closely attributable and tracked from where it is generated
Apache NiFi: The Ecosystem
But more instances complicate my operational management!
• Enter the MiNiFi Command & Control
– Provide tooling to map the UX of interactive command and control in NiFi to the design and deploy approach of MiNiFi
https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
Apache NiFi: The Ecosystem
Building on efforts for reusable components in the community
• Configuration Management of Flows & Versioning
– The evolution of templates to better support SDLC functions
– https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
• Extension Repositories
– Publish & Share extension bundles (NARs)
– https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Registry%29+for+Dynamically-loaded+Extensions
• Variable Registry
– Initial framework support & file-based implementation
– https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry
Why Apache NiFi & MiNiFi?
• Moving data is multifaceted in its challenges, and these are present in different contexts at varying scopes
– Think of our courier example and organizations like it: inter vs. intra, domestically, internationally
• Provide common tooling and extensions that are commonly needed, but be flexible for extension
– Leverage existing libraries and the expansive Java ecosystem for functionality
– Allow organizations to integrate with their existing infrastructure
• Empower folks managing your infrastructure to make changes and reason about issues that are occurring
– Data Provenance to show context and data’s journey
– User Interface/Experience a key component
Learn more and join us!
Apache NiFi site
https://nifi.apache.org
Subproject MiNiFi site
https://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
https://issues.apache.org/jira/browse/MINIFI
Follow us on Twitter
@apachenifi
Apache NiFi Crash Course
Thursday, 6 April
11:15 AM – 1:45 PM, Room 12
• Learn more about NiFi, the community, and work through a hands-on lab
• Seats available on a first come, first served basis
• Make sure you have the latest version of VirtualBox installed
• More details: https://tinyurl.com/nifi-cc-munich17
Learn, Share at Birds of a Feather
IOT, STREAMING & DATA FLOW
Thursday, April 6
5:50 pm, Room 5
Thank You
