Deep Dive into Building Streaming Applications with Apache Pulsar
Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems/ Data Architect
● Experience:
○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT and more.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
Why Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Pulsar Benefits
● Building Microservices
● Asynchronous Communication
● Building Real Time Applications
● Highly Resilient
● Tiered storage
Key Pulsar Concepts: Architecture
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Storage (metadata & service discovery)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
Messages - the basic unit of Pulsar
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer that produces the message. If you do not specify a producer name, the default name is used. Used for message de-duplication.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
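The producer name and sequence ID fields are what make de-duplication possible. The following is a minimal Python sketch of that idea, assuming a simplified broker that remembers the highest sequence ID seen per producer and drops anything at or below it. The `Message` and `Deduplicator` names are hypothetical and this is not Pulsar's own implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """Illustrative model of the message fields listed above."""
    value: bytes
    producer_name: str
    sequence_id: int
    key: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class Deduplicator:
    """Hypothetical stand-in for broker-side de-duplication."""

    def __init__(self) -> None:
        # Highest sequence ID accepted so far, per producer name.
        self._last_seen: Dict[str, int] = {}

    def accept(self, msg: Message) -> bool:
        last = self._last_seen.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # already seen: treat as a duplicate
        self._last_seen[msg.producer_name] = msg.sequence_id
        return True


dedup = Deduplicator()
first = Message(b"reading-1", producer_name="thrm", sequence_id=0, key="sensor-a")
retry = Message(b"reading-1", producer_name="thrm", sequence_id=0, key="sensor-a")
results = [dedup.accept(first), dedup.accept(retry)]
print(results)
```

The retried message carries the same producer name and sequence ID as the first one, so the sketch rejects it while still accepting later sequence IDs from the same producer.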
Key Pulsar Concepts: Messaging vs Streaming
Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order.
Streaming - Streaming works best in situations where the order of messages is important.
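The difference between the two models can be sketched without a broker. This is a simplified illustration in plain Python, assuming round-robin delivery for the work queue and full in-order delivery for the stream; the function names and sample messages are hypothetical.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List

messages = [f"task-{i}" for i in range(6)]


def queue_dispatch(msgs: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Work-queue style: each message goes to exactly one worker, round-robin."""
    assignments: Dict[str, List[str]] = defaultdict(list)
    for worker, msg in zip(cycle(workers), msgs):
        assignments[worker].append(msg)
    return dict(assignments)


def stream_dispatch(msgs: List[str], readers: List[str]) -> Dict[str, List[str]]:
    """Streaming style: every reader sees the full sequence in publish order."""
    return {reader: list(msgs) for reader in readers}


queued = queue_dispatch(messages, ["worker-a", "worker-b"])
streamed = stream_dispatch(messages, ["reader-a", "reader-b"])
print(queued)
print(streamed)
```

In the queue case no single worker observes the overall order, which is acceptable for independent tasks. In the stream case each reader can rely on the order in which messages were published.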
Connectivity
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Schema Registry
● The Schema Registry stores schema-1, schema-2 and schema-3 (value=Avro/Protobuf/JSON), each as schema data plus an ID.
● Producers and consumers each keep a local cache for schemas (schema data + ID).
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID.
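The register-once, look-up-once flow of the schema registry can be sketched as follows. This is a toy in-memory model, assuming JSON as a stand-in for Avro/Protobuf serialization; `SchemaRegistry`, `Producer` and `Consumer` here are hypothetical classes, not the Pulsar client API.

```python
import json
from typing import Dict, Tuple


class SchemaRegistry:
    """Hypothetical in-memory registry mapping schema IDs to definitions."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}
        self.registrations = 0  # how often a producer contacted the registry
        self.lookups = 0        # how often a consumer contacted the registry

    def register(self, definition: str) -> int:
        self.registrations += 1
        if definition not in self._ids:
            schema_id = len(self._by_id) + 1
            self._ids[definition] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[definition]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self._registry = registry
        self._cache: Dict[str, int] = {}  # local cache: definition -> schema ID

    def send(self, definition: str, record: dict) -> Tuple[int, bytes]:
        if definition not in self._cache:  # register only on a cache miss
            self._cache[definition] = self._registry.register(definition)
        return self._cache[definition], json.dumps(record).encode("utf-8")


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self._registry = registry
        self._cache: Dict[int, str] = {}  # local cache: schema ID -> definition

    def read(self, schema_id: int, payload: bytes) -> dict:
        if schema_id not in self._cache:  # fetch only on a cache miss
            self._cache[schema_id] = self._registry.get(schema_id)
        return json.loads(payload.decode("utf-8"))


registry = SchemaRegistry()
producer = Producer(registry)
consumer = Consumer(registry)
definition = '{"type": "record", "name": "thermal", "fields": [{"name": "uuid", "type": "string"}]}'
decoded = []
for uuid in ("a-1", "a-2"):
    schema_id, payload = producer.send(definition, {"uuid": uuid})
    decoded.append(consumer.read(schema_id, payload))
print(decoded, registry.registrations, registry.lookups)
```

Two messages are exchanged, but the registry is contacted only once by each side because the second message hits the local caches.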
Kafka On Pulsar (KoP)
MQTT On Pulsar (MoP)
AMQP On Pulsar (AoP)
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Figure: a producer and a consumer connect to Brokers 1-3, which serve Topic1-Part1, Topic1-Part2 and Topic1-Part3; Segments 1 to X are stored on Bookies 1-3; a query coordinator uses topic metadata to dispatch the query to SQL workers.]
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Pulsar Functions
A serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within Pulsar Broker.
Pulsar Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
Run a Local Standalone Bare Metal
wget https://archive.apache.org/dist/pulsar/pulsar-2.10.1/apache-pulsar-2.10.1-bin.tar.gz
tar xvfz apache-pulsar-2.10.1-bin.tar.gz
cd apache-pulsar-2.10.1
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
<or> Run in Docker
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.10.1 \
  bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conf
bin/pulsar-admin namespaces create conf/europe
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conf
bin/pulsar-admin topics create persistent://conf/europe/first
bin/pulsar-admin topics list conf/europe
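The topic names used in these commands follow the pattern persistent://tenant/namespace/topic. A small Python sketch that splits such a name into its parts, assuming only fully qualified names are passed in; `TopicName` and `parse_topic` are hypothetical helpers, not part of the Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into domain, tenant, namespace and topic."""
    domain, sep, rest = name.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://conf/europe/first")
print(parsed)
```

For the topic created above, the tenant is `conf`, the namespace is `europe` and the topic is `first`, matching the arguments given to the tenants, namespaces and topics commands.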
Install Python 3 Pulsar Client
pip3 install 'pulsar-client[all]==2.10.1'
Includes AARCH64, ARM, M2, INTEL, …
For Python on Pulsar on Pi https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
https://pypi.org/project/pulsar-client/2.10.0/#files
Building a Python 3 Producer
import pulsar
# Connect to the local standalone broker
client = pulsar.Client('pulsar://localhost:6650')
# Topic created earlier with pulsar-admin
producer = client.create_producer('persistent://conf/europe/first')
producer.send('Simple Text Message'.encode('utf-8'))
client.close()
Building a Python 3 Cloud Producer OAuth
python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 \
  -t persistent://public/default/pyth \
  --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'
import argparse
from pulsar import Client, AuthenticationOauth2
parse = argparse.ArgumentParser(prog='prod.py')
parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
# Arguments matching the -t and --auth-params flags on the command line above
parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
args = parse.parse_args()
client = Client(args.service_url, authentication=AuthenticationOauth2(args.auth_params))
https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
Example Avro Schema Usage
import pulsar
from pulsar.schema import *
from pulsar.schema import AvroSchema
class thermal(Record):
    uuid = String()
client = pulsar.Client('pulsar://pulsar1:6650')
thermalschema = AvroSchema(thermal)
producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro',
    schema=thermalschema, properties={"producer-name": "thrm"})
thermalRec = thermal()
thermalRec.uuid = "unique-name"
# Use the record's uuid as the partition key
producer.send(thermalRec, partition_key=thermalRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Thermal
Example Json Schema Usage
import pulsar
from pulsar.schema import *
from pulsar.schema import JsonSchema
class weather(Record):
    uuid = String()
client = pulsar.Client('pulsar://pulsar1:6650')
wsc = JsonSchema(weather)
producer = client.create_producer(topic='persistent://public/default/wthr',
    schema=wsc, properties={"producer-name": "wthr"})
weatherRec = weather()
weatherRec.uuid = "unique-name"
# Use the record's uuid as the partition key
producer.send(weatherRec, partition_key=weatherRec.uuid)
https://github.com/tspannhw/FLiP-Pi-Weather
https://github.com/tspannhw/FLiP-PulsarDevPython101
Building a Python3 Consumer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
consumer = client.subscribe('persistent://conf/europe/first', subscription_name='mine')
try:
    while True:
        msg = consumer.receive()
        print("Received message: '%s'" % msg.data())
        # Acknowledge so the broker does not redeliver the message
        consumer.acknowledge(msg)
except KeyboardInterrupt:
    pass
finally:
    client.close()
MQTT from Python
pip3 install paho-mqtt
import json
import paho.mqtt.client as mqtt
# paho-mqtt 1.x style constructor taking the client ID
client = mqtt.Client("rpi4iot")
row = { }
# readings holds the sensor value gathered earlier in the script
row['gasKO'] = str(readings)
json_string = json.dumps(row)
json_string = json_string.strip()
client.connect("pulsar-server.com", 1883, 180)
client.publish("persistent://public/default/mqtt-2", payload=json_string, qos=0, retain=True)
https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
Web Sockets from Python
pip3 install websocket-client
import websocket, base64, json
topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/topic1'
ws = websocket.create_connection(topic)
message = "Hello Philly ETE Conference"
# The WebSocket API expects the payload as a base64-encoded string
message_bytes = message.encode('ascii')
base64_bytes = base64.b64encode(message_bytes)
base64_message = base64_bytes.decode('ascii')
ws.send(json.dumps({'payload': base64_message, 'properties': {'device': 'macbook'}, 'context': 5}))
response = json.loads(ws.recv())
https://pulsar.apache.org/docs/en/client-libraries-websocket/
https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
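The reading side of the WebSocket API mirrors the producer example: frames arrive as JSON with a base64 payload, and each one is acknowledged by sending back its message ID. The sketch below handles only the framing, with no network connection. It assumes the `messageId`, `payload` and `properties` field names as I understand them from the Pulsar WebSocket documentation, and the sample frame is made up for illustration.

```python
import base64
import json
from typing import Tuple


def decode_consumer_frame(frame: str) -> Tuple[str, str]:
    """Return the decoded text of a received frame and the ack frame to send back."""
    msg = json.loads(frame)
    text = base64.b64decode(msg["payload"]).decode("utf-8")
    ack = json.dumps({"messageId": msg["messageId"]})
    return text, ack


# Made-up frame shaped like what a WebSocket consumer endpoint would deliver.
sample_frame = json.dumps({
    "messageId": "CAAQAw==",
    "payload": base64.b64encode(b"Hello Philly ETE Conference").decode("ascii"),
    "properties": {"device": "macbook"},
})

text, ack = decode_consumer_frame(sample_frame)
print(text)
print(ack)
```

With a live connection, `frame` would come from `ws.recv()` and `ack` would be passed to `ws.send()`, following the same websocket-client calls used in the producer example.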
Kafka from Python
pip3 install kafka-python
import json
from kafka import KafkaProducer
from kafka.errors import KafkaError
row = { }
# readings holds the sensor value gathered earlier in the script
row['gasKO'] = str(readings)
json_string = json.dumps(row)
json_string = json_string.strip()
producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
producer.send('topic-kafka-1', json_string.encode('utf-8'))
producer.flush()
https://github.com/streamnative/kop
https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
Deploy Python Functions
bin/pulsar-admin functions create --auto-ack true --py py/src/sentiment.py \
  --classname "sentiment.Chat" --inputs "persistent://public/default/chat" \
  --log-topic "persistent://public/default/logs" --name Chat \
  --output "persistent://public/default/chatresult"
https://github.com/tspannhw/pulsar-pychat-function
Pulsar IO Function in Python 3.9+
from pulsar import Function
import json
class Chat(Function):
    def __init__(self):
        pass
    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
https://github.com/tspannhw/pulsar-pychat-function
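Because a function's logic lives in a plain `process(input, context)` method, it can be exercised locally before deploying. The sketch below is one way to do that, assuming a hypothetical `FakeContext` that provides only `get_logger()` and `get_message_id()`. The `Chat` class here omits the `pulsar.Function` base class so that it runs without the client installed, and the returned JSON (the input fields plus the message ID) is an illustrative choice, not the original function's output.

```python
import json
import logging


class FakeContext:
    """Hypothetical stand-in exposing only the two context calls used here."""

    def __init__(self, message_id: str) -> None:
        self._message_id = message_id
        self._logger = logging.getLogger("chat-function-test")

    def get_logger(self) -> logging.Logger:
        return self._logger

    def get_message_id(self) -> str:
        return self._message_id


class Chat:
    """Same method shape as a Pulsar Function, without the base class."""

    def process(self, input, context):
        logger = context.get_logger()
        msg_id = context.get_message_id()
        fields = json.loads(input)
        logger.info("processing message %s", msg_id)
        # Illustrative enrichment: attach the message ID to the parsed fields.
        fields["message_id"] = str(msg_id)
        # The returned value is what would be published to the output topic.
        return json.dumps(fields)


result = json.loads(Chat().process('{"comment": "hello"}', FakeContext("1:0:-1")))
print(result)
```

Keeping the logic testable this way means the deployed class only needs to add the `Function` base class and can be checked with ordinary unit tests first.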
Building a Golang Pulsar App
http://pulsar.apache.org/docs/en/client-libraries-go/
go get -u "github.com/apache/pulsar-client-go/pulsar"
package main
import (
    "log"
    "time"
    "github.com/apache/pulsar-client-go/pulsar"
)
func main() {
    client, err := pulsar.NewClient(pulsar.ClientOptions{
        URL:               "pulsar://localhost:6650",
        OperationTimeout:  30 * time.Second,
        ConnectionTimeout: 30 * time.Second,
    })
    if err != nil {
        log.Fatalf("Could not instantiate Pulsar client: %v", err)
    }
    defer client.Close()
}
Pulsar Producer
import java.util.UUID;
import java.net.URL;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerBuilder;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;
PulsarClient client = PulsarClient.builder()
    .serviceUrl(serviceUrl)
    .authentication(
        AuthenticationFactoryOAuth2.clientCredentials(
            new URL(issuerUrl), new URL(credentialsUrl), audience))
    .build();
Spring RabbitMQ/AMQP Producer
rabbitTemplate.convertAndSend(topicName, DataUtility.serializeToJSON(observation));
Spring MQTT Producer
MqttMessage mqttMessage = new MqttMessage();
mqttMessage.setPayload(DataUtility.serialize(payload));
mqttMessage.setQos(1);
mqttMessage.setRetained(true);
mqttClient.publish(topicName, mqttMessage);
Spring Kafka Producer
ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topicName, uuidKey.toString(),
    DataUtility.serializeToJSON(message));
kafkaTemplate.send(producerRecord);
Pulsar Simple Producer
String pulsarKey = UUID.randomUUID().toString();
String OS = System.getProperty("os.name").toLowerCase();
ProducerBuilder<byte[]> producerBuilder = client.newProducer().topic(topic)
.producerName("demo");
Producer<byte[]> producer = producerBuilder.create();
MessageId msgID = producer.newMessage().key(pulsarKey).value("msg".getBytes())
.property("device",OS).send();
producer.close();
client.close();
Pulsar Function Java
import java.util.function.Function;
public class MyFunction implements Function<String, String> {
    public String apply(String input) {
        // Your code here
        return doBusinessLogic(input);
    }
}
Pulsar Function SDK
import java.util.UUID;
import org.apache.pulsar.client.impl.schema.JSONSchema;
import org.apache.pulsar.functions.api.*;
public class AirQualityFunction implements Function<byte[], Void> {
    @Override
    public Void process(byte[] input, Context context) throws Exception {
        context.getLogger().debug("File:" + new String(input));
        // Your code here: build the Observation from the input bytes
        context.newOutputMessage("topicname", JSONSchema.of(Observation.class))
            .key(UUID.randomUUID().toString())
            .property("prop1", "value1")
            .value(observation)
            .send();
        return null;
    }
}
Setting Subscription Type Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
    .topic(topic)
    .subscriptionName("subscriptionName")
    .subscriptionType(SubscriptionType.Shared)
    .subscribe();
Subscribing to a Topic and Setting Subscription Name Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
    .topic(topic)
    .subscriptionName("subscriptionName")
    .subscribe();
Producing Object Events From Java
ProducerBuilder<Observation> producerBuilder = pulsarClient.newProducer(JSONSchema.of(Observation.class))
    .topic(topicName)
    .producerName(producerName)
    .sendTimeout(60, TimeUnit.SECONDS);
Producer<Observation> producer = producerBuilder.create();
MessageId msgID = producer.newMessage()
    .key(someUniqueKey)
    .value(observation)
    .send();
Monitoring and Metrics Check
curl http://pulsar1:8080/admin/v2/persistent/conf/europe/first/stats | python3 -m json.tool
bin/pulsar-admin topics stats-internal persistent://conf/europe/first
curl http://pulsar1:8080/metrics/
bin/pulsar-admin topics peek-messages --count 5 --subscription ete-reader persistent://conf/europe/first
bin/pulsar-admin topics subscriptions persistent://conf/europe/first
Metrics: Broker
Broker metrics are exposed under "/metrics" at port 8080. You can change the port by updating webServicePort to a different port in the broker.conf configuration file.
All the metrics exposed by a broker are labeled with cluster=${pulsar_cluster}. The name of the Pulsar cluster is the value of ${pulsar_cluster}, configured in the broker.conf file.
For more information: https://pulsar.apache.org/docs/en/reference-metrics/#broker
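Since every broker metric carries a cluster label, the text returned by the /metrics endpoint can be filtered per cluster. The following is a rough Python sketch that parses the Prometheus-style text format with regular expressions. The `SAMPLE` text, its metric names and its values are assumptions made up for illustration rather than real broker output, and `metrics_for_cluster` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Made-up sample in the Prometheus text format; real output is much longer.
SAMPLE = '''\
# TYPE pulsar_topics_count gauge
pulsar_topics_count{cluster="standalone",namespace="conf/europe"} 1
pulsar_rate_in{cluster="standalone",namespace="conf/europe"} 12.5
pulsar_topics_count{cluster="other",namespace="public/default"} 4
'''

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def metrics_for_cluster(text: str, cluster: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (name, labels, value) for every labeled sample from the given cluster."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip comments, blank lines and unlabeled samples
        labels = dict(LABEL.findall(match.group("labels")))
        if labels.get("cluster") == cluster:
            rows.append((match.group("name"), labels, float(match.group("value"))))
    return rows


rows = metrics_for_cluster(SAMPLE, "standalone")
for name, labels, value in rows:
    print(name, labels["namespace"], value)
```

Against a running broker, the text would come from the /metrics endpoint instead of `SAMPLE`. A dedicated Prometheus client parser would be more robust for production use than these regular expressions.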
Metrics: Broker
These metrics are available for brokers:
● Namespace metrics
○ Replication metrics
● Topic metrics
○ Replication metrics
● ManagedLedgerCache metrics
● ManagedLedger metrics
● LoadBalancing metrics
○ BundleUnloading metrics
○ BundleSplit metrics
● Subscription metrics
● Consumer metrics
● ManagedLedger bookie client metrics
Cleanup
bin/pulsar-admin topics delete persistent://conf/europe/first
bin/pulsar-admin namespaces delete conf/europe
bin/pulsar-admin tenants delete conf
Java for Pulsar
● https://github.com/tspannhw/airquality
● https://github.com/tspannhw/FLiPN-AirQuality-REST
● https://github.com/tspannhw/pulsar-airquality-function
● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
● https://github.com/tspannhw/FLiP-Py-ADS-B
● https://github.com/tspannhw/pulsar-adsb-function
● https://github.com/tspannhw/airquality-amqp-consumer
● https://github.com/tspannhw/airquality-mqtt-consumer
● https://github.com/tspannhw/airquality-consumer
● https://github.com/tspannhw/airquality-kafka-consumer
Python For Pulsar on Pi
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
● https://github.com/tspannhw/FLiP-PulsarDevPython101
● https://github.com/tspannhw/airquality
Let’s Keep in Touch!
Tim Spann
Developer Advocate
PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

ApacheCon2022_Deep Dive into Building Streaming Applications with Apache Pulsar

  • 1. Deep Dive into Building Streaming Applications with Apache Pulsar
  • 2. Tim Spann Developer Advocate ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems/ Data Architect ● Experience: ○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT and more.
  • 3. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 4. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 5. Why Apache Pulsar? Unified Messaging Platform Guaranteed Message Delivery Resiliency Infinite Scalability
  • 7. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Key Pulsar Concepts: Architecture MetaData Storage
  • 8. Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Message De-Duplication. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Message De-Duplication. Messages - the basic unit of Pulsar
  • 9. Key Pulsar Concepts: Messaging vs Streaming Message Queueing - Queueing systems are ideal for work queues that do not require tasks to be performed in a particular order. Streaming - Streaming works best in situations where the order of messages is important.
  • 10. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 11. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 15. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordin ator . . . . . . SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata Pulsar SQL
  • 16. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 17. Pulsar Functions: a serverless event streaming framework
      ● Lightweight computation similar to AWS Lambda.
      ● Specifically designed to use Apache Pulsar as a message bus.
      ● Function runtime can be located within the Pulsar Broker.
  • 21. Pulsar Functions
      ● Consume messages from one or more Pulsar topics.
      ● Apply user-supplied processing logic to each message.
      ● Publish the results of the computation to another topic.
      ● Support multiple programming languages (Java, Python, Go).
      ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
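The consume, process, publish cycle can be sketched with a plain Python function. This assumes the SDK-free "native" function form, in which the Python runtime calls a module-level `process(input)` and publishes its return value to the output topic; the `comment` and `length` field names are made up for the example.

```python
import json


def process(input):
    """Enrich one JSON message; the return value is what gets published."""
    fields = json.loads(input)
    fields["length"] = len(fields.get("comment", ""))
    return json.dumps(fields)


# Local check without a broker: call the function directly.
result = process('{"comment": "hello"}')
```

Because the logic is an ordinary function, it can be unit tested locally before being deployed with `pulsar-admin functions create`.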
  • 22. Run a Local Standalone Bare Metal
      wget https://archive.apache.org/dist/pulsar/pulsar-2.10.1/apache-pulsar-2.10.1-bin.tar.gz
      tar xvfz apache-pulsar-2.10.1-bin.tar.gz
      cd apache-pulsar-2.10.1
      bin/pulsar standalone
      (For Pulsar SQL Support)
      bin/pulsar sql-worker start
      https://pulsar.apache.org/docs/en/standalone/
  • 23. <or> Run in Docker
      docker run -it -p 6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.10.1 bin/pulsar standalone
      https://pulsar.apache.org/docs/en/standalone-docker/
  • 24. Building Tenant, Namespace, Topics
      bin/pulsar-admin tenants create conf
      bin/pulsar-admin namespaces create conf/europe
      bin/pulsar-admin tenants list
      bin/pulsar-admin namespaces list conf
      bin/pulsar-admin topics create persistent://conf/europe/first
      bin/pulsar-admin topics list conf/europe
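The admin commands above build up the topic name in the form `persistent://tenant/namespace/topic`. As a small illustration of that layout, the sketch below splits a topic name into its parts; `TopicName` and `parse_topic` are hypothetical helpers, not part of the Pulsar client.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split 'persistent://tenant/namespace/topic' into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://conf/europe/first")
```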
  • 25. Install Python 3 Pulsar Client
      pip3 install 'pulsar-client[all]==2.10.1'
      Includes AARCH64, ARM, M2, INTEL, …
      For Python on Pulsar on Pi: https://github.com/tspannhw/PulsarOnRaspberryPi
      https://pulsar.apache.org/docs/en/client-libraries-python/
      https://pypi.org/project/pulsar-client/2.10.0/#files
  • 26. Building a Python 3 Producer
      import pulsar

      client = pulsar.Client('pulsar://localhost:6650')
      # topic matches the tenant/namespace/topic created with pulsar-admin (conf/europe/first)
      producer = client.create_producer('persistent://conf/europe/first')
      producer.send(('Simple Text Message').encode('utf-8'))
      client.close()
  • 27. Building a Python 3 Cloud Producer OAuth
      python3 prod.py -su pulsar+ssl://name1.name2.snio.cloud:6651 -t persistent://public/default/pyth --auth-params '{"issuer_url":"https://auth.streamnative.cloud", "private_key":"my.json", "audience":"urn:sn:pulsar:name:myclustr"}'

      import argparse
      from pulsar import Client, AuthenticationOauth2

      parse = argparse.ArgumentParser(prog='prod.py')
      parse.add_argument('-su', '--service-url', dest='service_url', type=str, required=True)
      # -t and --auth-params appear on the command line above, so they are declared here too
      parse.add_argument('-t', '--topic', dest='topic', type=str, required=True)
      parse.add_argument('--auth-params', dest='auth_params', type=str, required=True)
      args = parse.parse_args()
      client = Client(args.service_url, authentication=AuthenticationOauth2(args.auth_params))

      https://github.com/streamnative/examples/blob/master/cloud/python/OAuth2Producer.py
      https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
  • 28. Example Avro Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import AvroSchema

      class thermal(Record):
          uuid = String()

      client = pulsar.Client('pulsar://pulsar1:6650')
      thermalschema = AvroSchema(thermal)
      producer = client.create_producer(topic='persistent://public/default/pi-thermal-avro',
                                        schema=thermalschema, properties={"producer-name": "thrm"})
      thermalRec = thermal()
      thermalRec.uuid = "unique-name"
      # the slide passed an undefined `uniqueid`; the record's uuid is used as the key here
      producer.send(thermalRec, partition_key=thermalRec.uuid)

      https://github.com/tspannhw/FLiP-Pi-Thermal
  • 29. Example JSON Schema Usage
      import pulsar
      from pulsar.schema import *
      from pulsar.schema import JsonSchema

      class weather(Record):
          uuid = String()

      client = pulsar.Client('pulsar://pulsar1:6650')
      wsc = JsonSchema(weather)
      producer = client.create_producer(topic='persistent://public/default/wthr',
                                        schema=wsc, properties={"producer-name": "wthr"})
      weatherRec = weather()
      weatherRec.uuid = "unique-name"
      # the slide passed an undefined `uniqueid`; the record's uuid is used as the key here
      producer.send(weatherRec, partition_key=weatherRec.uuid)

      https://github.com/tspannhw/FLiP-Pi-Weather
      https://github.com/tspannhw/FLiP-PulsarDevPython101
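  • 30. Building a Python 3 Consumer
      import pulsar

      client = pulsar.Client('pulsar://localhost:6650')
      # topic matches the tenant/namespace/topic created with pulsar-admin (conf/europe/first)
      consumer = client.subscribe('persistent://conf/europe/first', subscription_name='mine')

      while True:
          msg = consumer.receive()
          print("Received message: '%s'" % msg.data())
          consumer.acknowledge(msg)

      client.close()  # reached only if the loop above is interrupted or changed to exit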
  • 31. MQTT from Python
      pip3 install paho-mqtt

      import json
      import paho.mqtt.client as mqtt

      # constructor form from paho-mqtt 1.x; 2.x also expects a callback API version argument
      client = mqtt.Client("rpi4iot")
      row = {}
      row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere (not shown on the slide)
      json_string = json.dumps(row)
      json_string = json_string.strip()
      client.connect("pulsar-server.com", 1883, 180)
      client.publish("persistent://public/default/mqtt-2", payload=json_string, qos=0, retain=True)

      https://www.slideshare.net/bunkertor/data-minutes-2-apache-pulsar-with-mqtt-for-edge-computing-lightning-2022
  • 32. WebSockets from Python
      pip3 install websocket-client

      import websocket, base64, json

      topic = 'ws://server:8080/ws/v2/producer/persistent/public/default/topic1'
      ws = websocket.create_connection(topic)
      message = "Hello Philly ETE Conference"
      message_bytes = message.encode('ascii')
      base64_bytes = base64.b64encode(message_bytes)
      base64_message = base64_bytes.decode('ascii')
      ws.send(json.dumps({'payload': base64_message, 'properties': {'device': 'macbook'}, 'context': 5}))
      response = json.loads(ws.recv())

      https://pulsar.apache.org/docs/en/client-libraries-websocket/
      https://github.com/tspannhw/FLiP-IoT/blob/main/wspulsar.py
      https://github.com/tspannhw/FLiP-IoT/blob/main/wsreader.py
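The WebSocket producer frame is a JSON document whose payload is base64-encoded. The sketch below isolates just that encoding step so it can be checked without a broker; `encode_frame` and `decode_payload` are hypothetical helper names, and the frame keys mirror the ones used on the slide.

```python
import base64
import json
from typing import Dict


def encode_frame(text: str, properties: Dict[str, str], context: int) -> str:
    """Build the JSON frame sent over the WebSocket producer endpoint."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"payload": payload, "properties": properties, "context": context})


def decode_payload(frame: str) -> str:
    """Recover the original text from a frame (what a reader would do)."""
    return base64.b64decode(json.loads(frame)["payload"]).decode("utf-8")


frame = encode_frame("Hello Philly ETE Conference", {"device": "macbook"}, 5)
```

Encoding as UTF-8 rather than ASCII is a small deviation from the slide that lets non-ASCII text pass through unchanged.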
  • 33. Kafka from Python
      pip3 install kafka-python

      import json
      from kafka import KafkaProducer
      from kafka.errors import KafkaError

      row = {}
      row['gasKO'] = str(readings)  # readings: sensor value collected elsewhere (not shown on the slide)
      json_string = json.dumps(row)
      json_string = json_string.strip()
      producer = KafkaProducer(bootstrap_servers='pulsar1:9092', retries=3)
      producer.send('topic-kafka-1', json_string.encode('utf-8'))
      producer.flush()

      https://github.com/streamnative/kop
      https://docs.streamnative.io/platform/v1.0.0/concepts/kop-concepts
  • 34. Deploy Python Functions
      bin/pulsar-admin functions create --auto-ack true --py py/src/sentiment.py --classname "sentiment.Chat" --inputs "persistent://public/default/chat" --log-topic "persistent://public/default/logs" --name Chat --output "persistent://public/default/chatresult"
      https://github.com/tspannhw/pulsar-pychat-function
  • 35. Pulsar IO Function in Python 3.9+
      from pulsar import Function
      import json

      class Chat(Function):
          def __init__(self):
              pass

          def process(self, input, context):
              logger = context.get_logger()
              msg_id = context.get_message_id()
              fields = json.loads(input)

      https://github.com/tspannhw/pulsar-pychat-function
  • 36. Building a Golang Pulsar App
      http://pulsar.apache.org/docs/en/client-libraries-go/
      go get -u "github.com/apache/pulsar-client-go/pulsar"

      package main

      import (
          "log"
          "time"

          "github.com/apache/pulsar-client-go/pulsar"
      )

      func main() {
          client, err := pulsar.NewClient(pulsar.ClientOptions{
              URL:               "pulsar://localhost:6650",
              OperationTimeout:  30 * time.Second,
              ConnectionTimeout: 30 * time.Second,
          })
          if err != nil {
              log.Fatalf("Could not instantiate Pulsar client: %v", err)
          }
          defer client.Close()
      }
  • 37. Pulsar Producer
      import java.util.UUID;
      import java.net.URL;
      import org.apache.pulsar.client.api.Producer;
      import org.apache.pulsar.client.api.ProducerBuilder;
      import org.apache.pulsar.client.api.PulsarClient;
      import org.apache.pulsar.client.api.MessageId;
      import org.apache.pulsar.client.impl.auth.oauth2.AuthenticationFactoryOAuth2;

      PulsarClient client = PulsarClient.builder()
          .serviceUrl(serviceUrl)
          .authentication(
              AuthenticationFactoryOAuth2.clientCredentials(
                  new URL(issuerUrl), new URL(credentialsUrl), audience))
          .build();
  • 39. Spring MQTT Producer
      MqttMessage mqttMessage = new MqttMessage();
      mqttMessage.setPayload(DataUtility.serialize(payload));
      mqttMessage.setQos(1);
      mqttMessage.setRetained(true);
      mqttClient.publish(topicName, mqttMessage);
  • 40. Spring Kafka Producer
      ProducerRecord<String, String> producerRecord = new ProducerRecord<>(topicName, uuidKey.toString(), DataUtility.serializeToJSON(message));
      kafkaTemplate.send(producerRecord);
  • 41. Pulsar Simple Producer
      String pulsarKey = UUID.randomUUID().toString();
      String OS = System.getProperty("os.name").toLowerCase();
      ProducerBuilder<byte[]> producerBuilder = client.newProducer().topic(topic)
          .producerName("demo");
      Producer<byte[]> producer = producerBuilder.create();
      MessageId msgID = producer.newMessage().key(pulsarKey).value("msg".getBytes())
          .property("device", OS).send();
      producer.close();
      client.close();
  • 42. Pulsar Function Java
      import java.util.function.Function;

      public class MyFunction implements Function<String, String> {
          public String apply(String input) {
              return doBusinessLogic(input); // your code here
          }
      }
  • 43. Pulsar Function SDK
      import java.util.UUID;
      import org.apache.pulsar.client.impl.schema.JSONSchema;
      import org.apache.pulsar.functions.api.*;

      public class AirQualityFunction implements Function<byte[], Void> {
          @Override
          public Void process(byte[] input, Context context) throws Exception {
              context.getLogger().debug("File:" + new String(input));
              // your code here; observation is an Observation built from the input (not shown on the slide)
              context.newOutputMessage("topicname", JSONSchema.of(Observation.class))
                  .key(UUID.randomUUID().toString())
                  .property("prop1", "value1")
                  .value(observation)
                  .send();
              return null;
          }
      }
  • 44. Setting Subscription Type (Java)
      Consumer<byte[]> consumer = pulsarClient.newConsumer()
          .topic(topic)
          .subscriptionName("subscriptionName")
          .subscriptionType(SubscriptionType.Shared)
          .subscribe();
  • 45. Subscribing to a Topic and Setting Subscription Name (Java)
      Consumer<byte[]> consumer = pulsarClient.newConsumer()
          .topic(topic)
          .subscriptionName("subscriptionName")
          .subscribe();
  • 46. Producing Object Events From Java
      ProducerBuilder<Observation> producerBuilder = pulsarClient.newProducer(JSONSchema.of(Observation.class))
          .topic(topicName)
          .producerName(producerName).sendTimeout(60, TimeUnit.SECONDS);
      Producer<Observation> producer = producerBuilder.create();
      MessageId msgID = producer.newMessage()
          .key(someUniqueKey)
          .value(observation)
          .send();
  • 47. Monitoring and Metrics Check
      curl http://pulsar1:8080/admin/v2/persistent/conf/europe/first/stats | python3 -m json.tool
      bin/pulsar-admin topics stats-internal persistent://conf/europe/first
      curl http://pulsar1:8080/metrics/
      bin/pulsar-admin topics peek-messages --count 5 --subscription ete-reader persistent://conf/europe/first
      bin/pulsar-admin topics subscriptions persistent://conf/europe/first
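The stats endpoint returns JSON, so a backlog check can be scripted. The sketch below works on a hand-written sample rather than real broker output; the field names (`subscriptions`, `msgBacklog`, `msgRateIn`) are assumed from the topic stats format and should be checked against the actual response, and `backlog_by_subscription` is a hypothetical helper.

```python
import json
from typing import Dict

# Hand-written sample for illustration only, not captured from a broker.
SAMPLE_STATS = json.loads("""
{
  "msgRateIn": 12.5,
  "msgRateOut": 12.5,
  "subscriptions": {
    "mine": {"msgBacklog": 0},
    "ete-reader": {"msgBacklog": 42}
  }
}
""")


def backlog_by_subscription(stats: dict) -> Dict[str, int]:
    """Map each subscription name to its reported message backlog."""
    subscriptions = stats.get("subscriptions", {})
    return {name: sub.get("msgBacklog", 0) for name, sub in subscriptions.items()}


backlogs = backlog_by_subscription(SAMPLE_STATS)
lagging = sorted(name for name, count in backlogs.items() if count > 0)
```

In practice the dictionary would come from the `curl .../stats` call above instead of the embedded sample.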
  • 48. Metrics: Broker
      Broker metrics are exposed under "/metrics" at port 8080. You can change the port by updating webServicePort to a different port in the broker.conf configuration file.
      All the metrics exposed by a broker are labeled with cluster=${pulsar_cluster}. The value of ${pulsar_cluster} is the name of the Pulsar cluster, as configured in the broker.conf file.
      For more information: https://pulsar.apache.org/docs/en/reference-metrics/#broker
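The "/metrics" endpoint serves Prometheus text format, where each sample carries labels such as `cluster`. A minimal parsing sketch follows; the sample text is hand-written for illustration, `parse_metrics` is a hypothetical helper, and the regular expressions cover only simple label values without escaped quotes.

```python
import re
from typing import Dict, List, Tuple

# Hand-written sample in Prometheus text format, not captured from a broker.
SAMPLE = '''\
# TYPE pulsar_topics_count gauge
pulsar_topics_count{cluster="standalone",namespace="conf/europe"} 1
pulsar_rate_in{cluster="standalone",namespace="conf/europe"} 3.5
'''

# name, optional {labels}, value, optional trailing timestamp
LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Return (metric name, labels, value) for every sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse_metrics(SAMPLE)
clusters = {labels["cluster"] for _, labels, _ in samples}
```

Collecting the distinct `cluster` label values is a quick way to confirm which cluster a broker reports itself as.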
  • 49. Metrics: Broker
      These metrics are available for brokers:
      ● Namespace metrics
          ○ Replication metrics
      ● Topic metrics
          ○ Replication metrics
      ● ManagedLedgerCache metrics
      ● ManagedLedger metrics
      ● LoadBalancing metrics
          ○ BundleUnloading metrics
          ○ BundleSplit metrics
      ● Subscription metrics
      ● Consumer metrics
      ● ManagedLedger bookie client metrics
  • 50. Cleanup
      bin/pulsar-admin topics delete persistent://conf/europe/first
      bin/pulsar-admin namespaces delete conf/europe
      bin/pulsar-admin tenants delete conf
  • 51. Java for Pulsar
      ● https://github.com/tspannhw/airquality
      ● https://github.com/tspannhw/FLiPN-AirQuality-REST
      ● https://github.com/tspannhw/pulsar-airquality-function
      ● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
      ● https://github.com/tspannhw/FLiP-Py-ADS-B
      ● https://github.com/tspannhw/pulsar-adsb-function
      ● https://github.com/tspannhw/airquality-amqp-consumer
      ● https://github.com/tspannhw/airquality-mqtt-consumer
      ● https://github.com/tspannhw/airquality-consumer
      ● https://github.com/tspannhw/airquality-kafka-consumer
  • 52. Python For Pulsar on Pi
      ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
      ● https://github.com/tspannhw/FLiP-Pi-Thermal
      ● https://github.com/tspannhw/FLiP-Pi-Weather
      ● https://github.com/tspannhw/FLiP-RP400
      ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
      ● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
      ● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
      ● https://github.com/tspannhw/PythonPulsarExamples
      ● https://github.com/tspannhw/pulsar-pychat-function
      ● https://github.com/tspannhw/FLiP-PulsarDevPython101
      ● https://github.com/tspannhw/airquality
  • 53. FLiP Stack Weekly
      This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark, Java and Open Source friends.
      https://bit.ly/32dAJft
  • 54. Let’s Keep in Touch!
      Tim Spann, Developer Advocate
      PaaSDev
      https://www.linkedin.com/in/timothyspann
      https://github.com/tspannhw