Introduction to DataFlow
management using Apache NiFi
Presented by: Anshuman Ghosh
Topics we will cover
- DataFlow and its challenges
- What is Apache NiFi: history, key features, core components
- Architecture to start with NiFi (single-server setup)
- Architecture to scale with NiFi (NiFi cluster setup)
- Fundamentals of the NiFi Web UI
- Building a NiFi DataFlow processor
- Live demo
- Testing
- Deployment and automation
- What next?
- Q&A
DataFlow
- The term “DataFlow” can be used in a variety of contexts.
- In our context, it is the flow of information between systems.
- A robust platform to create, manage and automate the flow of enterprise data is crucial.
- There are many tools for data gathering and data flow, but more often than not we lack an integrated platform that ties them together.
- Ideally, all of this would be seamlessly integrated.
What enterprises look for
To be able to get data from any source
… to the systems that perform analytics
… and to those that make it available to users
Common DataFlow challenges
- System failure
- Mismatch between the rates of data production and consumption
- Data priorities that change dynamically
- Protocol and format changes; new systems, new protocols
- Need for bidirectional data flow
- Transparency and control
- Security and privacy
Brief history of Apache NiFi
- Developed at the NSA (National Security Agency, USA) over a period of more than 8 years.
- NSA engineers developed a project called “Niagara Files”, which later became NiFi.
- Through the NSA Technology Transfer Program, it was made available as an open-source Apache project, “Apache NiFi”, in 2014.
- Onyara, a company formed by members of the original NiFi team, was acquired by Hortonworks in 2015; Hortonworks ships NiFi as “Hortonworks DataFlow (HDF), powered by Apache NiFi”.
What is Apache NiFi
- Holistically, Apache NiFi is an integrated platform to collect, conduct and curate real-time data (data in motion).
- It provides end-to-end DataFlow management from any source* to any destination*.
- It provides data logistics: real-time operational visibility and control of the DataFlow.
- It supports powerful and scalable directed graphs of data routing and transformation.
- All of this in a reliable and secure manner.
*See the official documentation for the complete list of supported sources and destinations.
Key features
- Guaranteed data delivery (“at least once” semantics)
- Data buffering and back pressure
- Data prioritization in queues
- Flow-specific settings for “latency vs. throughput”
- Data provenance
- Visual control
- Flow templates
- Recovery/recording through the content repository
- Clustering to scale out
- Security
- Classloader isolation
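To make “data buffering and back pressure” and “data prioritization in queues” concrete, here is a minimal Java sketch of a connection queue that releases FlowFiles in priority order and refuses new ones once a back-pressure threshold is reached. It is a conceptual model, not NiFi’s API: SimpleFlowFile, BoundedPrioritizedQueue and the “priority” attribute convention (lower value served first) are assumptions made for illustration. In NiFi itself, reaching the threshold stops the upstream processor from being scheduled rather than rejecting individual FlowFiles as this sketch does.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, simplified FlowFile: only attributes are modelled, content is omitted.
record SimpleFlowFile(Map<String, String> attributes) {
    // Assumed convention: a numeric "priority" attribute, lower value = served first.
    // Missing or non-numeric values sort last.
    int priority() {
        try {
            return Integer.parseInt(attributes.getOrDefault("priority", ""));
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }
}

// Toy connection queue with prioritization and an object-count back-pressure threshold.
class BoundedPrioritizedQueue {
    private final int backPressureThreshold;
    private final PriorityQueue<SimpleFlowFile> queue =
            new PriorityQueue<>(Comparator.comparingInt(SimpleFlowFile::priority));

    BoundedPrioritizedQueue(int backPressureThreshold) {
        this.backPressureThreshold = backPressureThreshold;
    }

    // While this is true, the upstream producer should stop sending.
    boolean isBackPressureEngaged() {
        return queue.size() >= backPressureThreshold;
    }

    // Returns false instead of queueing when back pressure is engaged.
    boolean offer(SimpleFlowFile flowFile) {
        if (isBackPressureEngaged()) {
            return false;
        }
        return queue.offer(flowFile);
    }

    // Highest-priority FlowFile, or null if the queue is empty.
    SimpleFlowFile poll() {
        return queue.poll();
    }
}
```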
Core components of NiFi
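- At its core, NiFi follows the concepts of Flow-Based Programming (FBP).
- The core components of NiFi, with their FBP equivalents, are:
  - FlowFile: the unit of data moving through the system (FBP “information packet”), made up of attributes and content.
  - FlowFile Processor: the component that does the actual work, such as routing or transformation (FBP “black box”).
  - Connection: links Processors and acts as a bounded buffer (queue) between them.
  - Flow Controller: the scheduler that manages Processors and allocates threads to them.
  - Process Group: a set of Processors and Connections grouped into one unit (FBP “subnet”).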
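The mapping onto Flow-Based Programming can be sketched in a few lines of Java. This is a toy model under stated assumptions, not NiFi’s API: FlowFile, Result, Processor and MiniFlowController are hypothetical names, content is a String instead of a byte stream, and the “flow controller” runs everything on one thread until all queues are empty.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// FlowFile: the FBP "information packet" = attributes (metadata) + content.
record FlowFile(Map<String, String> attributes, String content) {}

// What a processor emits: the FlowFile plus the relationship it was routed to.
record Result(String relationship, FlowFile flowFile) {}

// FlowFile Processor: the FBP "black box"; callers only see the relationship it picks.
interface Processor {
    Result onTrigger(FlowFile input);
}

// Toy Flow Controller: owns the processors and the connections (queues) between them.
class MiniFlowController {
    private final Map<String, Processor> processors = new LinkedHashMap<>();
    private final Map<String, Queue<FlowFile>> inputQueues = new HashMap<>();
    // "source/relationship" -> name of the downstream component
    private final Map<String, String> connections = new HashMap<>();

    void addProcessor(String name, Processor processor) {
        processors.put(name, processor);
        inputQueues.computeIfAbsent(name, k -> new ArrayDeque<>());
    }

    // Connection: FlowFiles routed to `relationship` by `from` are queued for `to`.
    // `to` may be a plain sink name that is never scheduled.
    void connect(String from, String relationship, String to) {
        connections.put(from + "/" + relationship, to);
        inputQueues.computeIfAbsent(to, k -> new ArrayDeque<>());
    }

    void enqueue(String name, FlowFile flowFile) {
        inputQueues.get(name).add(flowFile);
    }

    Queue<FlowFile> queueFor(String name) {
        return inputQueues.get(name);
    }

    // Single-threaded scheduling: keep triggering processors that have input until all are idle.
    void runUntilIdle() {
        boolean didWork = true;
        while (didWork) {
            didWork = false;
            for (Map.Entry<String, Processor> entry : processors.entrySet()) {
                FlowFile next = inputQueues.get(entry.getKey()).poll();
                if (next == null) {
                    continue;
                }
                didWork = true;
                Result result = entry.getValue().onTrigger(next);
                String target = connections.get(entry.getKey() + "/" + result.relationship());
                if (target != null) {
                    inputQueues.get(target).add(result.flowFile());
                }
                // Unconnected relationship: dropped here. NiFi instead requires every
                // relationship to be connected or auto-terminated before a processor can start.
            }
        }
    }
}
```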
Core components diagram
- This is how a typical NiFi DataFlow might look.
NiFi Architecture
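- NiFi executes within a JVM on a host operating system. Its primary components on the JVM are the Web Server, the Flow Controller, Extensions, and the FlowFile, Content and Provenance repositories.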
NiFi Architecture – Clustering
- Typical NiFi cluster
Core components of NiFi Cluster
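- NiFi Cluster Manager (NCM): in NiFi 0.x, the master that manages the nodes; since NiFi 1.0 this role is taken by a Cluster Coordinator elected through ZooKeeper (zero-master clustering).
- Nodes: the NiFi instances that do the actual data processing.
- Primary Node: the node elected to run processors that are configured to execute on the primary node only.
- Isolated Processors: processors that run only on the Primary Node instead of on every node.
- Heartbeats: nodes periodically report their health and status; a node that stops sending heartbeats is marked as disconnected.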
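Heartbeat handling can be sketched as a monitor that records when each node last reported and flags nodes that have been silent for too long. HeartbeatMonitor, its parameters and the “interval times allowed misses” timeout rule are illustrative assumptions, not NiFi’s implementation; in NiFi the heartbeat interval is configurable and the real logic lives inside the framework.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy coordinator-side view of node heartbeats (not NiFi's implementation).
class HeartbeatMonitor {
    private final long heartbeatIntervalMillis;
    private final int maxMissedHeartbeats;
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();

    HeartbeatMonitor(long heartbeatIntervalMillis, int maxMissedHeartbeats) {
        this.heartbeatIntervalMillis = heartbeatIntervalMillis;
        this.maxMissedHeartbeats = maxMissedHeartbeats;
    }

    // Called whenever a node reports in; timestamps are passed in to keep the sketch testable.
    void onHeartbeat(String nodeId, long nowMillis) {
        lastHeartbeatMillis.put(nodeId, nowMillis);
    }

    // Nodes silent for longer than interval * allowed misses, sorted by id.
    Set<String> disconnectedNodes(long nowMillis) {
        long timeoutMillis = heartbeatIntervalMillis * maxMissedHeartbeats;
        Set<String> disconnected = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                disconnected.add(entry.getKey());
            }
        }
        return disconnected;
    }
}
```

For example, with a 5-second interval and 8 allowed misses, a node that has been silent for more than 40 seconds would be reported as disconnected.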
Fundamentals of the Web UI
Building a DataFlow Processor
- Drag the “Processor” icon from the Components Toolbar onto the canvas; this opens the “Add Processor” dialog.
- Configure the general ‘SETTINGS’ for the processor.
- Provide the ‘SCHEDULING’ information, such as the scheduling strategy (e.g. timer driven or CRON driven), the number of concurrent tasks and the run schedule.
- Set up the mandatory and optional ‘PROPERTIES’.
- Auto-alert mechanism: if the configuration has an error, NiFi flags the processor as invalid and will not allow it to start.
- Once everything is set, we are ready to start the processor.
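The auto-alert behaviour amounts to validating the configuration before the processor may start. A minimal sketch, assuming a hypothetical PropertySpec record with just a name and a required flag (NiFi’s real equivalent is a PropertyDescriptor with its validators); ProcessorConfigValidator is likewise a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a property definition: a name and whether it is mandatory.
record PropertySpec(String name, boolean required) {}

class ProcessorConfigValidator {
    // Returns one message per problem; an empty list means the configuration is valid.
    static List<String> validate(List<PropertySpec> specs, Map<String, String> values) {
        List<String> errors = new ArrayList<>();
        for (PropertySpec spec : specs) {
            String value = values.get(spec.name());
            if (spec.required() && (value == null || value.isBlank())) {
                errors.add("'" + spec.name() + "' is required");
            }
        }
        return errors;
    }

    // Mirrors the UI rule: an invalid processor cannot be started.
    static boolean canStart(List<PropertySpec> specs, Map<String, String> values) {
        return validate(specs, values).isEmpty();
    }
}
```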
Demo 1
- In this demo, we will go through a NiFi DataFlow that performs the following steps:
  - Connect to Kafka and consume from a topic.
  - Store the consumed data in local storage (optional).
  - Anonymize IP addresses.
  - Merge content before writing to HDFS (to avoid the HDFS small-files problem).
  - Finally, store the Kafka data in HDFS.
- We will also look into error handling and the use of the NiFi Expression Language.
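The slides do not say how the demo anonymizes IP addresses or sizes merged bundles, so the following Java sketch only shows one common approach under that caveat: masking the last octet of each IPv4 address, and bundling many small records into fewer, larger payloads before they are written to HDFS. Demo1Helpers and its methods are hypothetical; in the demo itself these steps are performed by NiFi processors (NiFi’s MergeContent processor, for instance, bins FlowFiles by configurable entry counts and sizes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Demo1Helpers {
    // Matches IPv4-looking tokens; octet values are not range-checked in this sketch.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\b");

    // Masks the last octet of every IPv4 address, e.g. 10.1.2.3 becomes 10.1.2.0.
    static String anonymizeIPv4(String text) {
        Matcher matcher = IPV4.matcher(text);
        return matcher.replaceAll("$1.$2.$3.0");
    }

    // Bundles small records into payloads of at most maxEntries lines each,
    // joined with a newline demarcator, so fewer files land in HDFS.
    static List<String> mergeByCount(List<String> records, int maxEntries) {
        List<String> bundles = new ArrayList<>();
        for (int i = 0; i < records.size(); i += maxEntries) {
            List<String> bin = records.subList(i, Math.min(i + maxEntries, records.size()));
            bundles.add(String.join("\n", bin));
        }
        return bundles;
    }
}
```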
Demo 2
- In this demo, we will go through a NiFi DataFlow that performs the following steps:
  - Collect/fetch data files from a local directory.
  - Update/add attributes.
  - Convert JSON strings into database INSERT statements.
  - Connect to PostgreSQL and run the inserts.
  - Handle errors.
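Converting a flat JSON object into a parameterised INSERT can be sketched as below. The sketch assumes the JSON has already been parsed into a Map (to avoid a JSON library dependency), that keys are trusted column names, and that values are bound as JDBC parameters rather than concatenated into the SQL. JsonToInsert and InsertStatement are hypothetical names; in NiFi 1.x, the ConvertJSONToSQL and PutSQL processors cover this pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// A parameterised INSERT plus its ordered bind values.
record InsertStatement(String sql, List<Object> args) {}

class JsonToInsert {
    // Builds an INSERT for one flat JSON object that has already been parsed into a Map.
    // Column order follows the map's iteration order (use a LinkedHashMap to control it).
    static InsertStatement toInsert(String table, Map<String, Object> row) {
        StringJoiner columns = new StringJoiner(", ");
        StringJoiner placeholders = new StringJoiner(", ");
        List<Object> args = new ArrayList<>();
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            columns.add(entry.getKey());   // assumed to be a trusted column name
            placeholders.add("?");         // value is bound later, never concatenated
            args.add(entry.getValue());
        }
        String sql = "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
        return new InsertStatement(sql, args);
    }
}
```

With JDBC, each value in args() would then be bound with PreparedStatement.setObject(i + 1, value) before the statement is executed against PostgreSQL.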
Unit testing components
- For component testing, the nifi-mock module can be used with JUnit.
- The TestRunner interface (org.apache.nifi.util) allows us to test Processors and Controller Services:
  - Instantiate a new TestRunner.
  - Add and configure any Controller Services.
  - Set Processor properties with setProperty(PropertyDescriptor, String).
  - Enqueue FlowFiles using the enqueue methods of the TestRunner.
  - Start the Processor by calling the TestRunner’s run() method.
  - Validate the output with the TestRunner’s assertAllFlowFilesTransferred and assertTransferCount methods.
- More details can be found here: https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#testing
- Add the nifi-mock Maven dependency.
- Call the static newTestRunner method of the TestRunners class.
- Call addControllerService to add a Controller Service.
- Set its properties with setProperty(ControllerService, PropertyDescriptor, String).
- Enable the service with enableControllerService(ControllerService).
- Set processor properties with setProperty(PropertyDescriptor, String).
- Enqueue FlowFiles with the overloaded enqueue methods, which accept byte[], InputStream or Path.
- Call run(int): this calls the methods annotated with @OnScheduled, then the Processor’s onTrigger method (once per iteration), then the @OnUnscheduled and finally the @OnStopped methods.
- Validate the results with assertAllFlowFilesTransferred and assertTransferCount.
- Access the output FlowFiles by calling getFlowFilesForRelationship().
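The lifecycle order that run(int) follows can be modelled without NiFi on the classpath. The sketch below is a simplified stand-in, not nifi-mock: LifecycleProcessor, MiniTestRunner and RecordingProcessor are hypothetical, and plain interface methods replace the @OnScheduled, @OnUnscheduled and @OnStopped annotations. With the real module, the equivalent test would call TestRunners.newTestRunner(...), enqueue(...), run(2) and assertAllFlowFilesTransferred(...).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hooks standing in for @OnScheduled, onTrigger, @OnUnscheduled and @OnStopped.
interface LifecycleProcessor {
    void onScheduled();
    void onTrigger();
    void onUnscheduled();
    void onStopped();
}

// Simplified model of the call order described for TestRunner.run(int).
class MiniTestRunner {
    private final LifecycleProcessor processor;

    MiniTestRunner(LifecycleProcessor processor) {
        this.processor = processor;
    }

    void run(int iterations) {
        processor.onScheduled();
        for (int i = 0; i < iterations; i++) {
            processor.onTrigger(); // one onTrigger call per iteration
        }
        processor.onUnscheduled();
        processor.onStopped();
    }
}

// Records every call so a test can assert on the order.
class RecordingProcessor implements LifecycleProcessor {
    final List<String> calls = new ArrayList<>();

    public void onScheduled() { calls.add("scheduled"); }
    public void onTrigger() { calls.add("trigger"); }
    public void onUnscheduled() { calls.add("unscheduled"); }
    public void onStopped() { calls.add("stopped"); }
}
```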
Error handling
- The following can occur:
  - Unexpected data format
  - Network connection or disk failure
  - A bug in the processor
- The framework distinguishes ProcessException from all other exceptions (such as NullPointerException):
  - ProcessException: roll back the session and penalize the FlowFiles.
  - All others: roll back the session, penalize the FlowFiles and yield the Processor.
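The decision table above can be written down as a small Java sketch. SketchProcessException stands in for NiFi’s org.apache.nifi.processor.exception.ProcessException, and ErrorPolicy and FrameworkAction are hypothetical names; the real handling happens inside the NiFi framework, not in processor code.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for org.apache.nifi.processor.exception.ProcessException.
class SketchProcessException extends RuntimeException {
    SketchProcessException(String message) {
        super(message);
    }
}

enum FrameworkAction { ROLLBACK_SESSION, PENALIZE_FLOWFILES, YIELD_PROCESSOR }

class ErrorPolicy {
    // Maps an exception escaping onTrigger to the framework's reaction, as listed on the slide.
    static Set<FrameworkAction> actionsFor(Throwable error) {
        Set<FrameworkAction> actions =
                EnumSet.of(FrameworkAction.ROLLBACK_SESSION, FrameworkAction.PENALIZE_FLOWFILES);
        if (!(error instanceof SketchProcessException)) {
            // Unanticipated failure (e.g. a bug causing NullPointerException):
            // also back off the whole processor, not just the FlowFiles.
            actions.add(FrameworkAction.YIELD_PROCESSOR);
        }
        return actions;
    }

    // Runs one trigger; returns no actions on success, otherwise the framework's reaction.
    static Set<FrameworkAction> trigger(Runnable onTrigger) {
        try {
            onTrigger.run();
            return EnumSet.noneOf(FrameworkAction.class);
        } catch (RuntimeException e) {
            return actionsFor(e);
        }
    }
}
```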
Testing automation, Deployment
- NiFi provides a REST API for all components; the full documentation can be found here: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
- The Apache NiFi community is working to improve this area.
- We can set up deployment as follows:
  - Create the application, i.e. the entire DataFlow, on your local machine and test it.
  - Wrap it in a process group (optional).
  - Create a template (from the Web UI or via a REST API call).
  - Download the template (from the Web UI or via a REST API call).
  - Use a REST API call to import the template into the new environment.
  - Use a REST API call to instantiate the template.
  - Use REST API calls to update the instantiated Processors (properties, scheduling, settings, etc.).
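The REST calls in these deployment steps can be prepared with Java’s built-in java.net.http client. This sketch assumes an unsecured NiFi 1.x instance (templates are a 1.x feature and were removed in NiFi 2.x), a base URL such as http://localhost:8080/nifi-api, and endpoint paths that should be checked against the REST API documentation linked above for your NiFi version. NiFiDeployRequests is a hypothetical helper that only builds the requests.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that only builds (does not send) NiFi 1.x REST requests.
class NiFiDeployRequests {
    private final String baseUrl; // assumed, e.g. "http://localhost:8080/nifi-api"

    NiFiDeployRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Minimal JSON string quoting (backslashes and double quotes only).
    private static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    private HttpRequest.Builder at(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path));
    }

    // Download a template's XML from the source environment.
    HttpRequest downloadTemplate(String templateId) {
        return at("/templates/" + templateId + "/download").GET().build();
    }

    // Import template XML into a process group in the target environment.
    HttpRequest importTemplate(String processGroupId, String templateXml) {
        return at("/process-groups/" + processGroupId + "/templates/import")
                .header("Content-Type", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(templateXml))
                .build();
    }

    // Place an instance of an imported template on the canvas at (0, 0).
    HttpRequest instantiateTemplate(String processGroupId, String templateId) {
        String body = "{\"templateId\":" + json(templateId) + ",\"originX\":0.0,\"originY\":0.0}";
        return at("/process-groups/" + processGroupId + "/template-instance")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Change one property; NiFi rejects the update unless the revision version is current.
    HttpRequest updateProcessorProperty(String processorId, long revisionVersion,
                                        String property, String value) {
        String body = "{\"revision\":{\"version\":" + revisionVersion + "},"
                + "\"component\":{\"id\":" + json(processorId) + ","
                + "\"config\":{\"properties\":{" + json(property) + ":" + json(value) + "}}}}";
        return at("/processors/" + processorId)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Each request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The revision version passed to updateProcessorProperty must match the processor’s current revision, which can be read first with a GET on /processors/{id}.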
Deployment
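- Another option is to copy the whole flow (flow.xml.gz) from one environment to another.
  - This copies the entire canvas, not just a single process group.
  - Sensitive property values in flow.xml.gz are encrypted with the key configured in nifi.properties (nifi.sensitive.props.key), so the target environment must use the same key.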
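Before copying flow.xml.gz, a simple pre-flight check is to compare the sensitive-property settings in both environments’ nifi.properties files. The sketch assumes the property names nifi.sensitive.props.key and nifi.sensitive.props.algorithm; SensitiveKeyCheck is a hypothetical helper, and a missing key is treated as a mismatch so the check fails safe.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Objects;
import java.util.Properties;

// Hypothetical pre-flight check before copying flow.xml.gz between environments.
class SensitiveKeyCheck {
    static final String KEY = "nifi.sensitive.props.key";
    static final String ALGORITHM = "nifi.sensitive.props.algorithm";

    // Load a nifi.properties file (or any properties source).
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // True only if both sides define the same non-empty key and the same algorithm,
    // i.e. encrypted values in the copied flow should be decryptable on the target.
    static boolean compatible(Properties source, Properties target) {
        String key = source.getProperty(KEY, "");
        return !key.isEmpty()
                && key.equals(target.getProperty(KEY, ""))
                && Objects.equals(source.getProperty(ALGORITHM), target.getProperty(ALGORITHM));
    }
}
```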
What is next
- We plan to work further on the testing and deployment side and update this material.
- Read more on NiFi development here: https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
- And the user guide here: https://nifi.apache.org/docs/nifi-docs/html/user-guide.html
- We have carried out POCs on some of our real use cases; please find them here:
  - Link: HDFS data ingestion using Apache
  - Link: How to setup Apache NiFi
  - Link: Expression Language Guide
- For any questions and/or suggestions, please come by or write to us.
Q&A
- Questions?
Thank you!
Presented by: Anshuman Ghosh
