Real-Time Stock Processing
With Apache NiFi, Apache Flink and Apache Kafka
Timothy Spann, Principal DataFlow Field Engineer @
Pierre Villard, Senior Product Manager @
Who?
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal DataFlow Field Engineer. Princeton Future of Data Meetup.
https://github.com/tspannhw/EverythingApacheNiFi
https://community.cloudera.com/t5/Community-Articles/Real-Time-Stock-Processing-With-Apache-NiFi-and-Apache-Kafka/ta-p/249221
Pierre Villard
Twitter & GitHub - @pvillard31 // Blog: www.pierrevillard.com
Committer and PMC member for Apache NiFi (in the community since 2015)
Senior Product Manager at Cloudera for products around Apache NiFi, NiFi Registry and MiNiFi
Previously at Google & Hortonworks
What?
This talk is about ingesting real-time data from many sources and building a dashboard on top of it to track
our stocks in real time.
This use case is a good example of how some of the best Apache solutions for streaming applications
can be combined.
NiFi, Kafka and Flink in a few numbers
- Apache NiFi (version 1.12.x) - created by the NSA (development began in 2006) and open sourced in 2014
350+ contributors, 1200+ people in Slack, 3.1M+ Docker pulls
Many sub-projects: NiFi, MiNiFi Java, MiNiFi C++, NiFi Registry, etc.
- Apache Kafka (version 2.6.x) - created and open sourced by LinkedIn - initial release in 2011
700+ contributors
- Apache Flink (version 1.11.x) - initial release in 2011
750+ contributors, 2nd top repository by number of commits, top active project on mailing lists
What is NiFi used for?
Streaming Reference Architecture
[Diagram: pipeline stages, labels as extracted from the slide]
- Collect: Data Collection at the Edge, Apache NiFi / MiNiFi (sensors, IoT; databases; file systems; app sidecar; live streams; MQ; logs; network; anything… you name it!)
- Buffer: Apache Kafka topics, Ingest Gateway, powered by Kafka
- Distribute: Apache NiFi, Data Flow Apps, powered by NiFi
- Buffer: Apache Kafka syndicate topics, Syndicate Services, powered by Kafka
- Replication / Data Deployment: syndicate topics, Syndicate Services, powered by Kafka
- Analyze: Streaming Analytics Apps, Stream Processing, powered by Flink
- Analyze: Streaming OLAP, Analytics & Time Series Store, powered by Druid & Kudu
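To make the Distribute and Buffer stages of this architecture concrete, here is a minimal Python sketch. It is not taken from the talk and uses neither NiFi nor Kafka: an in-memory dictionary of queues stands in for Kafka topics, and the `distribute` function stands in for a routing dataflow. The topic names (`stocks`, `errors`), the record schema (`symbol`, `price`) and the sample records are all assumptions made for illustration.

```python
import json
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for Kafka topics: topic name -> queue of messages.
topics = defaultdict(deque)


def distribute(raw_record: str) -> str:
    """Route one collected record to a topic and return the topic name.

    Assumed schema: a JSON object with a "symbol" and a numeric "price".
    Anything that does not match is sent to the "errors" topic unchanged.
    """
    try:
        record = json.loads(raw_record)
        symbol = record["symbol"]
        price = float(record["price"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric price.
        topics["errors"].append(raw_record)
        return "errors"
    topics["stocks"].append({"symbol": symbol, "price": price})
    return "stocks"


# Made-up sample input: one valid tick, one non-JSON line, one tick without a price.
collected = ['{"symbol": "CLDR", "price": "12.5"}', "not json", '{"symbol": "IBM"}']
routed = [distribute(r) for r in collected]
print(routed)
```

Keeping bad records on their own topic, rather than dropping them, mirrors the idea of buffering everything so that downstream consumers can decide what to do with errors.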
Where?
- CDP services are optimized for the elastic compute & ‘always-on’ storage services provided by any cloud provider
- Web service hosted and managed by Cloudera
- Hosted in your cloud environment, but managed by the CDP Management Console
- Shared Data Experience (SDX) technologies form a secure and governed data lake backed by object storage (S3, ADLS, GCS)
- Services shown: Flow Management, Streams Messaging, Streaming Analytics
How? This use case architecture
[Diagram labels: Stock Data, Logs, Errors, Aggregates, Other data, SQL, Analytics]
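The slide does not show how the Aggregates are computed, so the following is only a sketch of one common approach: a per-symbol tumbling-window aggregate, the kind of result a streaming SQL query could produce. It is plain Python rather than Flink, and the 60-second window size, the tick layout `(timestamp_seconds, symbol, price)` and the sample values are assumptions for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size, not specified in the talk


def aggregate(ticks):
    """Group (timestamp_seconds, symbol, price) ticks into tumbling windows.

    Returns a dict keyed by (symbol, window_start) with min, max, avg and count.
    """
    buckets = defaultdict(list)
    for ts, symbol, price in ticks:
        # Align each tick to the start of the window it falls into.
        window_start = ts - (ts % WINDOW_SECONDS)
        buckets[(symbol, window_start)].append(price)
    return {
        key: {
            "min": min(prices),
            "max": max(prices),
            "avg": sum(prices) / len(prices),
            "count": len(prices),
        }
        for key, prices in buckets.items()
    }


# Made-up sample ticks.
ticks = [(0, "CLDR", 10.0), (30, "CLDR", 12.0), (61, "CLDR", 11.0), (15, "IBM", 120.0)]
result = aggregate(ticks)
print(result[("CLDR", 0)])
```

This batch version only illustrates the grouping logic; a real stream processor also has to handle late and out-of-order events, which this sketch ignores.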
Demonstration
Let’s see all of this in action…
Thanks! Questions?
Timothy Spann, Principal DataFlow Field Engineer @
Pierre Villard, Senior Product Manager @
