Citizen Streaming Engineer - A How To
Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems / Data Architect
● Experience:
○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, Data Lakes, IoT and more.
Demo
ApacheCon2022_Citizen Streaming Engineer - A How To
Demo
https://github.com/tspannhw/airquality
Why Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Unified Messaging Model
Diagram: a Pulsar topic/partition serving both streaming and messaging through four subscription types: Exclusive (a second consumer is rejected, marked X), Failover (a standby consumer takes over in case of failure in Consumer B-0), Shared (several consumers attached to one subscription) and Key-Shared.
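To make the four subscription types concrete, here is a minimal Scala model of which consumer receives each message. It is a simplified, hypothetical sketch (the `SubType`, `Msg` and `Dispatch` names are invented here), not Pulsar's dispatcher: real Key-Shared subscriptions assign hash ranges rather than the modulo used below, and the broker's choice of the active Failover consumer is reduced here to "first live consumer in attach order".

```scala
// Simplified, hypothetical model of subscription delivery semantics (not Pulsar's dispatcher).
sealed trait SubType
case object Exclusive extends SubType
case object Failover  extends SubType
case object Shared    extends SubType
case object KeyShared extends SubType

final case class Msg(key: String, value: String)

object Dispatch {
  /** For each message, returns the name of the consumer that receives it.
    * `consumers` lists attached consumers in attach order; `failed` marks consumers that are down. */
  def assign(sub: SubType, consumers: Vector[String], failed: Set[String], msgs: Seq[Msg]): Seq[String] = {
    val alive = consumers.filterNot(failed.contains)
    require(alive.nonEmpty, "no live consumer")
    sub match {
      // Exclusive: only one consumer may attach, so every message goes to it
      case Exclusive =>
        require(consumers.size == 1, "exclusive subscription allows one consumer")
        msgs.map(_ => alive.head)
      // Failover: the first live consumer is active; the rest are standbys
      case Failover => msgs.map(_ => alive.head)
      // Shared: messages are spread round-robin across live consumers
      case Shared => msgs.indices.map(i => alive(i % alive.size))
      // Key-Shared: the same key always lands on the same live consumer (modulo stand-in for hash ranges)
      case KeyShared => msgs.map(m => alive(java.lang.Math.floorMod(m.key.hashCode, alive.size)))
    }
  }
}
```

Running the Failover case with `Set("B-0")` as `failed` shows the diagram's takeover: every message moves to the next consumer in line.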
Ecosystem
Easy to Build Streaming Data Pipelines
● Ingest Data
● Route, Transform, Enrich
● Join Data
● ML Model Access
● Store
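The ingest, enrich, route and store stages above can be sketched as plain Scala functions over a hypothetical air-quality reading. The `Reading` type, the `sensorId,aqi,epochMillis` line format and the `aq-<category>` topic names are assumptions for illustration; the category labels follow the US EPA AQI bands. In the real stack, NiFi or Pulsar would do the ingest, Pulsar Functions the routing and enrichment, and Spark or a sink the storage.

```scala
// Hypothetical sketch: pipeline stages from the slide as plain Scala functions.
final case class Reading(sensorId: String, aqi: Int, ts: Long)

object Pipeline {
  // Ingest: parse a raw "sensorId,aqi,epochMillis" line; malformed lines become None
  def ingest(line: String): Option[Reading] = line.split(",").map(_.trim) match {
    case Array(id, aqi, ts) =>
      for { a <- aqi.toIntOption; t <- ts.toLongOption } yield Reading(id, a, t)
    case _ => None
  }

  // Enrich: US EPA AQI category for a reading
  def category(aqi: Int): String =
    if (aqi <= 50) "good"
    else if (aqi <= 100) "moderate"
    else if (aqi <= 150) "unhealthy-sensitive"
    else if (aqi <= 200) "unhealthy"
    else if (aqi <= 300) "very-unhealthy"
    else "hazardous"

  // Route: pick a destination topic from the category (hypothetical naming scheme)
  def route(r: Reading): String = s"persistent://public/default/aq-${category(r.aqi)}"

  // Ingest -> enrich/route, grouping readings by destination as a stand-in for "store"
  def run(lines: Seq[String]): Map[String, Seq[Reading]] =
    lines.flatMap(ingest).groupBy(route)
}
```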
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow-specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over 300 components
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
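Back pressure and prioritized queuing can be pictured with a small model of one NiFi connection. This is a hypothetical Scala sketch, not NiFi code: NiFi sets an object-count and a data-size back pressure threshold per connection and stops scheduling the upstream processor when either is reached; here `offer` simply refuses new FlowFiles instead, and the integer `priority` stands in for a NiFi prioritizer.

```scala
import scala.collection.mutable

// Hypothetical model of a NiFi connection: a prioritized queue with back pressure thresholds.
final case class FlowFile(id: Long, sizeBytes: Long, priority: Int)

final class Connection(objectThreshold: Int, sizeThresholdBytes: Long) {
  // Lower priority number is served first; ties fall back to arrival id (FIFO)
  private val byPriority: Ordering[FlowFile] =
    Ordering.by((f: FlowFile) => (f.priority, f.id)).reverse
  private val queue = mutable.PriorityQueue.empty[FlowFile](byPriority)
  private var queuedBytes = 0L

  // Back pressure is on once either the object count or the queued bytes reach their threshold
  def backPressured: Boolean = queue.size >= objectThreshold || queuedBytes >= sizeThresholdBytes

  // Refuses the FlowFile (caller retries later) while back pressure is applied
  def offer(f: FlowFile): Boolean =
    if (backPressured) false
    else { queue.enqueue(f); queuedBytes += f.sizeBytes; true }

  // Takes the highest-priority FlowFile, releasing its bytes from the queue total
  def poll(): Option[FlowFile] =
    if (queue.isEmpty) None
    else { val f = queue.dequeue(); queuedBytes -= f.sizeBytes; Some(f) }
}
```

Draining one FlowFile releases the pressure, which is the "pressure release" half of the bullet above.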
Use Apache NiFi For Ingest
https://streamnative.io/apache-nifi-connector/
● Ingest Data
● Cleanse
Apache NiFi <-> Apache Pulsar
Use Apache Pulsar For Ingest
Use Pulsar to Route/Transform/Enrich
● Libraries
● Functions
● Connectors
● AMQP, Kafka and MQTT protocol support
● Tiered Storage
Use Schemas
● Use JSON data with a JSON schema
● Consistency, Contracts, Clean Data
● Schemas make the data easy to query with SQL:
○ Pulsar SQL (Presto SQL)
○ Flink SQL
○ Spark Structured Streaming
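A schema works as a contract between producers and consumers. The Scala sketch below models that idea with a hypothetical field spec checked against decoded JSON records (represented as a `Map`). Pulsar itself keeps schemas in its built-in schema registry and checks producers against them, so this only illustrates why a contract keeps data clean; the field names are assumptions.

```scala
// Hypothetical data-contract check: a field spec each decoded JSON record must satisfy.
sealed trait FieldType
case object IntField    extends FieldType
case object DoubleField extends FieldType
case object StringField extends FieldType

final case class SchemaSpec(fields: Map[String, FieldType]) {
  // Lists contract violations in field-name order; an empty list means the record is clean
  def violations(record: Map[String, Any]): List[String] =
    fields.toList.sortBy(_._1).flatMap { case (name, tpe) =>
      (record.get(name), tpe) match {
        case (None, _)                      => List(s"missing field '$name'")
        case (Some(_: Int), IntField)       => Nil
        case (Some(_: Double), DoubleField) => Nil
        case (Some(_: String), StringField) => Nil
        case (Some(v), t)                   => List(s"field '$name' expected $t but got $v")
      }
    }
}
```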
Use Pulsar Functions
● Use Java, Python or Go
● Simple way to add functionality
● Route / Filter / Transform
● Call Machine Learning Models
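As a sketch of a route/filter/transform function: Pulsar Functions can run JVM code that implements the plain `java.util.function.Function` interface, so a Scala class can be written against it. The class name, the `sensorId,aqi` input format and the threshold of 100 are hypothetical, and the "null means no output message" behavior is our reading of the Pulsar Functions documentation, so verify it for the version you deploy.

```scala
import java.util.function.{Function => JFunction}

// Hypothetical filter + transform function using the java.util.function.Function interface.
class HighAqiAlert extends JFunction[String, String] {
  private val Threshold = 100 // assumed alert threshold

  override def apply(input: String): String =
    input.split(",").map(_.trim) match {
      // Transform: turn a high reading into a JSON alert message
      case Array(id, aqi) if aqi.toIntOption.exists(_ > Threshold) =>
        s"""{"sensorId":"$id","aqi":${aqi.toInt},"alert":true}"""
      // Filter: return null so that (we assume) no output message is published
      case _ => null
    }
}
```

A Scala class deployed this way would most likely need to be packaged as a fat JAR that includes the Scala standard library.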
Deploying AI With an Event-Driven Platform
https://dzone.com/trendreports/enterprise-ai-1
ML Models via Python / Java FN
● Visual Question and Answer
● Natural Language Processing
● Sentiment Analysis
● Text Classification
● Named Entity Recognition
● Content-based Recommendations
• Predictive Maintenance
• Fault Detection
• Fraud Detection
• Time-Series Predictions
• Naive Bayes
Functions for Enrichment
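As an illustration of a small model an enrichment function could call to tag each message, here is a toy multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is a hypothetical Scala sketch: the talk does not show its model code, the training examples are invented, and a real function would load a trained model rather than fit one on startup.

```scala
// Toy multinomial Naive Bayes text classifier with Laplace smoothing (hypothetical sketch).
final class NaiveBayes(examples: Seq[(String, String)]) { // (label, text) pairs
  private def tokens(text: String): Seq[String] =
    text.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty)

  private val labels = examples.map(_._1).distinct
  private val docsPerLabel: Map[String, Int] =
    examples.groupBy(_._1).map { case (l, xs) => l -> xs.size }
  // Per-label word frequencies
  private val wordCounts: Map[String, Map[String, Int]] =
    examples.groupBy(_._1).map { case (l, xs) =>
      l -> xs.flatMap(x => tokens(x._2)).groupBy(identity).map { case (w, ws) => w -> ws.size }
    }
  private val vocabSize = examples.flatMap(x => tokens(x._2)).distinct.size

  // log P(label) + sum of log P(word | label), with add-one smoothing over the vocabulary
  private def logScore(label: String, words: Seq[String]): Double = {
    val counts = wordCounts(label)
    val total = counts.values.sum
    math.log(docsPerLabel(label).toDouble / examples.size) +
      words.map(w => math.log((counts.getOrElse(w, 0) + 1.0) / (total + vocabSize))).sum
  }

  // Returns the label with the highest posterior score
  def classify(text: String): String = {
    val words = tokens(text)
    labels.maxBy(l => logScore(l, words))
  }
}
```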
Use Apache Flink to Join / Aggregate
Continuous SQL
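What a continuous tumbling-window join/aggregate computes can be re-enacted over a finite list. The sketch below is hypothetical Scala, not Flink code: it averages AQI per sensor per fixed, non-overlapping window (roughly what a Flink SQL `GROUP BY sensorId, TUMBLE(ts, INTERVAL '1' MINUTE)` produces) and then inner-joins each result with a location lookup table. Flink would evaluate the same logic incrementally over an unbounded stream, using watermarks to decide when a window is complete.

```scala
// Hypothetical batch re-enactment of a tumbling-window average plus a lookup join.
final case class Reading(sensorId: String, aqi: Int, tsMillis: Long)

object TumblingAvg {
  // Start of the window containing ts (windows aligned to multiples of sizeMillis)
  def windowStart(ts: Long, sizeMillis: Long): Long = ts - java.lang.Math.floorMod(ts, sizeMillis)

  // Aggregate: average AQI per (sensorId, windowStart)
  def aggregate(rs: Seq[Reading], sizeMillis: Long): Map[(String, Long), Double] =
    rs.groupBy(r => (r.sensorId, windowStart(r.tsMillis, sizeMillis)))
      .map { case (key, group) => key -> group.map(_.aqi).sum.toDouble / group.size }

  // Join: attach each sensor's location; windows for unknown sensors are dropped (inner join)
  def joinLocation(agg: Map[(String, Long), Double],
                   locations: Map[String, String]): Map[(String, Long), (Double, String)] =
    agg.collect { case (k @ (id, _), avg) if locations.contains(id) => (k, (avg, locations(id))) }
}
```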
Use Apache Spark To Store
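import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("airquality-to-parquet").getOrCreate()
// Read the airquality topic as a streaming DataFrame (needs the Pulsar Spark connector on the classpath)
val dfPulsar = spark.readStream.format("pulsar")
  .option("service.url", "pulsar://pulsar1:6650")
  .option("admin.url", "http://pulsar1:8080")
  .option("topic", "persistent://public/default/airquality").load()
// File sinks need an output path and a checkpoint location (example paths); "truncate" is a console-sink option
val pQuery = dfPulsar.selectExpr("*")
  .writeStream.format("parquet")
  .option("path", "/tmp/airquality/parquet")
  .option("checkpointLocation", "/tmp/airquality/checkpoint")
  .start()
pQuery.awaitTermination()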
https://pulsar.apache.org/docs/en/adaptors-spark/
Use Pulsar to Stream to Lakehouses
(Queuing + Streaming)
Simple Data Pipeline
Streaming FLiP-ML Apps
Architecture diagram: StreamNative Hub and StreamNative Cloud; unified batch and stream computing (batch + stream); unified batch and stream storage with tiered storage offload (queuing + streaming); Pulsar alongside KoP, MoP and WebSocket interfaces and Pulsar sinks; connected to streaming, edge gateway, protocol, CDC and app workloads.
Continuous Air Quality Aggregate Monitoring
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
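Two of the operations listed above, Dedupe and Batch, are small enough to sketch directly. This is a hypothetical Scala illustration: `dedupe` drops a message whose id appeared among the last `memory` distinct ids, and `batch` groups the survivors into fixed-size lists. Pulsar also offers broker-side message deduplication keyed on producer sequence ids; the sketch below is not that mechanism.

```scala
import scala.collection.mutable

// Hypothetical dedupe (bounded memory of recent ids) and fixed-size batching.
object DedupeBatch {
  def dedupe[A](msgs: Seq[A], id: A => String, memory: Int): Seq[A] = {
    val seen = mutable.LinkedHashSet.empty[String] // insertion-ordered, oldest id first
    msgs.filter { m =>
      val key = id(m)
      if (seen.contains(key)) false // duplicate within the remembered window: drop it
      else {
        seen += key
        if (seen.size > memory) seen -= seen.head // forget the oldest remembered id
        true
      }
    }
  }

  // Batch: consecutive groups of `size` messages; the last batch may be smaller
  def batch[A](msgs: Seq[A], size: Int): Seq[Seq[A]] = msgs.grouped(size).toSeq
}
```

The bounded memory is the design trade-off: a larger `memory` catches duplicates that arrive further apart, at the cost of holding more ids.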
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark, Java and Open Source friends.
https://bit.ly/32dAJft
Let’s Keep in Touch!
Tim Spann
Developer Advocate
PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
Python For Pulsar on Pi
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
Thanks