Introduction to Apache NiFi 1.11.4
Timothy Spann
Principal DataFlow Field Engineer
Cloudera
@PaasDev
© 2020 Cloudera, Inc. All rights reserved. 2
Welcome to Future of Data - Princeton
@PaasDev
https://www.meetup.com/futureofdata-princeton/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
Meetup Presenter
Who am I?
Principal DataFlow Field Engineer
@PaasDev
DZone Zone Leader and Big Data MVB;
Princeton NJ Future of Data Meetup;
ex-Pivotal Field Engineer;
Apache Kafka, TensorFlow, Apache Spark RefCards
https://github.com/tspannhw https://www.datainmotion.dev/
https://dzone.com/users/297029/bunkertor.html
EXAMPLE REFERENCE ARCHITECTURE
[Diagram labels: sensors; data flow apps powered by Apache NiFi; data syndication service by Apache Kafka (Kafka topic: iot); storage layer; Apache Impala; Cloudera Machine Learning; model execution]
Cloudera Flow Management
Enable easy ingestion, routing, management, and delivery of any data anywhere (edge, cloud,
data center) to any downstream system, with built-in end-to-end security and provenance
ACQUIRE PROCESS DELIVER
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full data provenance from acquisition to delivery
• Diverse, Non-Traditional Sources
• Eco-system integration
Advanced tooling to industrialize flow development
(Flow Development Life Cycle)
NiFi 1.10
Stateless Engine
• Granular containers per flow
• Flows From NiFi Registry
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
bin/nifi.sh stateless RunFromRegistry Continuous --file kafka.json
https://github.com/apache/nifi/blob/ea1becac4fc519c54b8b4d21773e68f8da364755/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless/README.md
Stateless Engine
• See also Parameters
• Docker
• YARN
• Kubernetes (K8s)
• Stateful NiFi clusters
• Apache OpenWhisk (FaaS)
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
{
  "registryUrl": "http://tspann-mbp15-hw14277:18080",
  "bucketId": "140b30f0-5a47-4747-9021-19d4fde7f993",
  "flowId": "0540e1fd-c7ca-46fb-9296-e37632021945",
  "ssl": {
    "keystoreFile": "",
    "keystorePass": "",
    "keyPass": "",
    "keystoreType": "",
    "truststoreFile": "/Library/Java/JavaVirtualMachines/amazon-corretto-11.jdk/Contents/Home/lib/security/cacerts",
    "truststorePass": "changeit",
    "truststoreType": "JKS"
  },
  "parameters": {
    "broker": "4.317.852.100:9092",
    "topic": "iot",
    "group_id": "nifi-stateless-kafka-consumer",
    "DestinationDirectory": "/tmp/nifistateless/output2/",
    "output_dir": "/Users/tspann/Documents/nifi-1.10.0-SNAPSHOT/logs/output"
  }
}
https://github.com/tspannhw/stateless-examples
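The stateless configuration file and launch command from the Stateless Engine slides can be sketched in Python. This is only an illustration of the file layout shown in the deck: the helper names build_stateless_config and stateless_command, the localhost registry URL, and the placeholder bucket and flow IDs are assumptions, not part of NiFi.

```python
import json

# Keys of the "ssl" section as they appear in the slide's JSON example.
SSL_KEYS = (
    "keystoreFile", "keystorePass", "keyPass", "keystoreType",
    "truststoreFile", "truststorePass", "truststoreType",
)


def build_stateless_config(registry_url, bucket_id, flow_id, parameters, ssl=None):
    """Hypothetical helper: assemble a dict with the same top-level keys as the slide's JSON."""
    ssl_section = {key: "" for key in SSL_KEYS}
    ssl_section.update(ssl or {})
    return {
        "registryUrl": registry_url,
        "bucketId": bucket_id,
        "flowId": flow_id,
        "ssl": ssl_section,
        "parameters": dict(parameters),
    }


def stateless_command(config_path):
    """Hypothetical helper: return the launch command from the slide as an argument list."""
    return ["bin/nifi.sh", "stateless", "RunFromRegistry", "Continuous",
            "--file", config_path]


config = build_stateless_config(
    registry_url="http://localhost:18080",  # assumption: a local NiFi Registry
    bucket_id="<bucket-uuid>",              # placeholder, copy from your registry
    flow_id="<flow-uuid>",                  # placeholder, copy from your registry
    parameters={"topic": "iot", "group_id": "nifi-stateless-kafka-consumer"},
)
config_text = json.dumps(config, indent=2)
command = stateless_command("kafka.json")

print(config_text)
print(" ".join(command))
```

Writing config_text to kafka.json and running the printed command from the NiFi home directory would correspond to the invocation on the slide.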
Parameters
• Parameters
• Parameter Context
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Parameters
• Advanced Editors
• Easy to Use
• PARAM
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Parameters
• Configure Externally with JSON Files to Execute Stateless Flows
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Parameters
• Create / Edit Parameters from NiFi or in JSON Files
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Parameter Context
• Sensitive or Normal
• Connect to Multiple Process Groups
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
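To make the Parameters and Parameter Context slides concrete, here is a small Python model of how a context of sensitive and normal parameters might be resolved into property values. NiFi references parameters in properties with the #{name} syntax; everything else here (the Parameter class, the resolve and describe helpers, the kafka_password parameter, and the localhost broker) is a hypothetical illustration, not NiFi's implementation.

```python
import re
from dataclasses import dataclass

# Matches NiFi-style parameter references such as #{topic}.
PARAM_REF = re.compile(r"#\{([A-Za-z0-9_.\- ]+)\}")


@dataclass(frozen=True)
class Parameter:
    value: str
    sensitive: bool = False  # sensitive values are masked when displayed


def resolve(text, context):
    """Replace every #{name} reference in text with the value from the context."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"parameter not defined in context: {name}")
        return context[name].value
    return PARAM_REF.sub(substitute, text)


def describe(context):
    """Return a display-safe view of the context with sensitive values masked."""
    return {name: "********" if p.sensitive else p.value
            for name, p in context.items()}


# One context can be shared by several process groups (modeled here as plain dicts).
context = {
    "broker": Parameter("localhost:9092"),                   # assumption: local broker
    "topic": Parameter("iot"),
    "kafka_password": Parameter("s3cret", sensitive=True),   # hypothetical parameter
}
ingest_group = {"Kafka Brokers": "#{broker}", "Topic Name": "#{topic}"}
resolved = {prop: resolve(value, context) for prop, value in ingest_group.items()}

print(resolved)
print(describe(context))
```

Keeping the values in one context means that moving a flow between environments only requires swapping the context, not editing each processor.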
RetryFlowFile
• Configurable Retries
• Maximum #
• Penalties
• When to Fail
• Reuse Mode
https://medium.com/@abdelkrim.hadjidj/apache-nifi-1-10-series-simplifying-error-handling-7de86f130acd
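The retry behavior listed on the RetryFlowFile slide (a maximum number of retries, penalties, and a point at which to fail) can be modeled in a few lines of Python. This is a simplified sketch, not the processor's code: the attribute name flowfile.retries, the default of 3 retries, and the relationship names are assumptions based on the processor's documented defaults, and Reuse Mode is not modeled.

```python
def route_retry(attributes, max_retries=3, retry_attribute="flowfile.retries",
                penalize_retries=True):
    """Return (relationship, updated_attributes, penalized) for one pass of a flow file."""
    updated = dict(attributes)
    raw = updated.get(retry_attribute, "0")
    try:
        count = int(raw)
    except ValueError:
        # The counter attribute was overwritten with a non-numeric value.
        return "failure", updated, False
    if count >= max_retries:
        # Retry budget used up: clear the counter and stop retrying.
        updated.pop(retry_attribute, None)
        return "retries_exceeded", updated, False
    updated[retry_attribute] = str(count + 1)
    return "retry", updated, penalize_retries


# Send the same flow file around the retry loop until it stops being retried.
attrs = {"filename": "reading.json"}
history = []
while True:
    relationship, attrs, penalized = route_retry(attrs)
    history.append(relationship)
    if relationship != "retry":
        break

print(history)
```

In a flow, the retry relationship would loop back to the failing processor, while retries_exceeded would go to an error-handling path.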
BackPressure Prediction
• OrdinaryLeastSquares
• SimpleRegression
• Enable analytics feature
http://lonnifi.blogspot.com/2019/11/back-pressure-prediction-deep-dive.html?es_id=5233333939
https://youtu.be/Tt8TSlHu7PE
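The idea behind back pressure prediction with ordinary least squares can be illustrated with a short Python sketch: fit a straight line to recent queue counts and extrapolate to the back pressure threshold. This is a hand-rolled illustration under simplifying assumptions (evenly trusted samples, a purely linear trend), not NiFi's analytics code; the function names and sample data are hypothetical, and 10,000 is used as the object threshold because it is the NiFi default.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples taken at more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def seconds_until_back_pressure(samples, threshold):
    """samples: list of (seconds, queued_count). Returns the predicted seconds
    after the last sample until the threshold is reached, or None if the
    queue is not growing."""
    xs, ys = zip(*samples)
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    time_at_threshold = (threshold - intercept) / slope
    return max(0.0, time_at_threshold - xs[-1])


# Hypothetical queue counts sampled once a minute.
samples = [(0, 1000), (60, 2000), (120, 3000), (180, 4000)]
remaining = seconds_until_back_pressure(samples, threshold=10_000)
print(remaining)
```

A queue that is draining or flat yields None, which matches the intuition that no back pressure is predicted when the trend is not upward.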
ParquetReader / ParquetWriter Records
• Native Record Processors for Apache Parquet Files!
• CSV <-> Parquet
• XML <-> Parquet
• Avro <-> Parquet
• JSON <-> Parquet
• More...
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_7.html
PostSlack
• Post Images to Slack
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
https://www.datainmotion.dev/2019/11/nifi-110-postslack-easy-image-upload.html
Remote Input Port in a Process Group
• Put Remote Connections for Site-To-Site (S2S) Anywhere!
• Not only top level
• Drop down simplicity
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Many New Features
• Prometheus Reporting Task
• Experimental Encrypted content repository
• PublishKafka Partition Support
• Toolkit module to generate and build Swagger
• GeoEnrichIPRecord Processor
• Command Line Diagnostics
• RocksDB FlowFile Repository
• PutBigQueryStreaming Processor
• Enhanced DevOps and CI/CD
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
ELT/ETL Lookup Services
• DatabaseRecordLookupService
• KuduLookupService
• HBase_2_ListLookupService
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.10.0
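The lookup services listed above are typically used to enrich records in flight. The general pattern can be sketched in Python with an in-memory dictionary standing in for the backing store (a database table, Kudu, or HBase). The function name, field names, and sample data are hypothetical; the matched and unmatched outputs mirror how a lookup-based enrichment step usually splits its results.

```python
def lookup_enrich(records, lookup_table, key_field, result_field):
    """Add result_field to each record whose key_field has an entry in lookup_table.

    Returns (matched, unmatched) lists; input records are not modified.
    """
    matched, unmatched = [], []
    for record in records:
        value = lookup_table.get(record.get(key_field))
        if value is None:
            unmatched.append(dict(record))
        else:
            matched.append({**record, result_field: value})
    return matched, unmatched


# Hypothetical sensor readings enriched with a location looked up by sensor ID.
locations = {"s-1": "Princeton", "s-2": "Newark"}
readings = [
    {"sensor_id": "s-1", "temp_c": 21.5},
    {"sensor_id": "s-9", "temp_c": 19.0},
]
matched, unmatched = lookup_enrich(readings, locations, "sensor_id", "location")

print(matched)
print(unmatched)
```

Swapping the dictionary for a database-backed lookup changes where the data lives but not the shape of the enrichment step.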
NiFi 1.11 Features
• Improved handling and support for partitions when sending data to Azure Event Hubs.
• All repositories (Content, FlowFile, Provenance) can now be encrypted on disk, controlled at the application level.
• Class loader isolation now includes isolating native libraries within the NARs! A huge help for interacting with many Hadoop vendors or other systems from the same NiFi cluster.
• Keytab Credential Service is now supported, making it easy to configure secure communications with the Hortonworks Schema Registry.
• IBM MQ is now easier to integrate with the existing NiFi JMS processors.
• Metrics Events Reporting Task
• Rules Action Handler Lookup Service
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12346451
Apache NiFi 1.11.4 Features
Reporting Tasks
Total number of reporting tasks.
Examples of new components:
- Prometheus Reporting Task
- Azure Log Analytics RT
- Azure Provenance RT
- Query NiFi Reporting Task
- Metrics Event Reporting Task
Controller Services
Total number of controller services.
Examples of new components:
- Rules Engine Controller Service
- Kudu Lookup Service
- Azure Storage Credentials
- Amazon S3 Encryption Service
- HBase List Lookup Service
- Parquet Reader/Writer
Processors
Total number of processors.
Examples of new components:
- Accumulo processors
- Put Elasticsearch Record
- Put BigQuery Streaming
- RetryFlowFile
Other Features of Apache NiFi 1.11.4
JDK 11 Support
Improvements:
- Class loading isolation with native libraries
Security
- Encrypted content repository & flow file repository (tech preview)
Operations
Improvements:
- Monitoring analytics and rule-based monitoring
- Parameters to improve CI/CD and support sensitive properties
https://www.youtube.com/watch?v=IUjz-rhA3xs
Cloud, VMs, Containers and Pods
https://hub.docker.com/r/apache/nifi/
https://hub.helm.sh/charts/cetic/nifi
Example
Useful Links
https://www.datainmotion.dev/2020/02/connecting-apache-nifi-to-apache-atlas.html
https://dev.to/tspannhw/quicktip-ingesting-google-analytics-api-with-apache-nifi-mg1
https://dev.to/tspannhw/analyzing-wood-burning-stoves-with-flank-stack-minifi-flink-nifi-kafka-kudu-36on
https://dev.to/tspannhw/cloudera-edge2ai-minifi-java-agent-with-raspberry-pi-and-thermal-camera-and-air-quality-sensor-part-1-3oo9
https://dev.to/tspannhw/iot-series-minifi-agent-on-raspberry-pi-4-with-enviro-hat-for-environmental-monitoring-and-analytics-l8d
https://dev.to/tspannhw/introducing-mm-flank-an-apache-flink-stack-for-rapid-streaming-development-from-edge-2-ai-5c12
https://dev.to/tspannhw/nifi-1-10-postslack-easy-image-upload-22mh
https://dev.to/tspannhw/nifi-toolkit-cli-for-nifi-1-10-213h
THANK YOU
