Stream Processing with Apache Flink in the Cloud
and Stream Sharing
Kai Waehner
Field CTO, Confluent
Data Streaming is part of our everyday lives
Trending Shows >
Recommendations >
Popular TV >
……
Personalization
Popularity score
Pattern detection
Categorization
Curated features & virality
Data Streaming with Confluent
Use cases: Streaming Data Pipelines, Data Sharing, Real-time Analytics, Cybersecurity, IoT & Telematics, ML & AI, Customer 360, …
Building blocks: Real-Time Applications, Streaming Apps and Pipelines, Stream Processing, Core Kafka (Compute, Storage)
Governance, Connectors
Platform: Security, Networking, Observability
Stream Processing for EVERYONE
Kafka Consumer and Producer, Kafka Streams, ksqlDB, Stream Designer
Flexibility, Simplicity
Stream Processing with Apache Flink
Serverless Flink as part of Confluent Cloud
Two Apache projects, born a few years apart
• Apache Kafka: stream data
• Apache Flink: process streams
Immerok acquisition: accelerates our efforts to bring a cloud-native Flink service to our customers
● Also building a cloud-native Flink service
● Employs leading PMC members & committers
for Apache Flink
● Tackling some of the hardest problems in
cloud data infrastructure
Our Flink service will employ the same product principles we’ve followed for Kafka
Cloud-Native: Serverless experience
Eliminate the operational burden of managing Flink with a fully managed, cloud-native service that is simple, secure, and scalable
Complete: Integrated platform
Leverage Flink fully integrated with Confluent’s complete feature set, enabling developers to build stream processing applications quickly, reliably, and securely
Everywhere: Deployment flexibility
Seamlessly process your data everywhere it resides with a Flink service that spans the three major cloud providers
Why Stream Processing with Apache Flink?
Stream processing use cases
Data Exploration
Engineers and analysts both need to be able to simply read and understand the event streams stored in Kafka
● Metadata discovery
● Throughput analysis
● Data sampling
● Interactive query
Data Pipelines
Data pipelines are used to enrich, curate, and transform event streams, creating new derived event streams
● Filtering
● Joins
● Projections
● Aggregations
● Flattening
● Enrichment
Real-time Apps
Whole ecosystems of apps feed on event streams, automating action in real time
● Threat detection
● Quality of Service
● Fraud detection
● Intelligent routing
● Alerting
Data Exploration
…
…
SELECT * FROM input where eventType='A'
Aggregates and Rich Temporal Functions
Timeline: 00:00 00:20 00:40 01:00 01:20
Input events: A A A A B B
WINDOWS:
SELECT EventType, COUNT(*)
TUMBLE (... , INTERVAL '20' SECOND)
GROUP BY EventType
Output: A,3 A,1 B,2
TEMPORAL ANALYTICS FUNCTIONS:
…
COUNT(*) OVER
( PARTITION BY EventType
ORDER BY order_time
RANGE BETWEEN INTERVAL '20' SECOND PRECEDING AND CURRENT ROW
)
Output: A,1 A,2 A,3 A,4 B,1 B,2
Windowing and temporal analytics functions offer a rich set of constructs for real-time processing and scenarios such as fraud detection.
● Windows (tumbling, hopping, etc.): events produced at regular intervals
● Temporal analytics functions: events produced immediately
● Full composability of operators (windows of windows, aggregates of aggregates, etc.)
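The difference between the tumbling window and the OVER aggregate can be sketched in plain Python. This only illustrates the semantics and is not Flink code: the function names and the event timestamps (seconds since 00:00) are assumptions, chosen so that the results line up with the outputs shown on the slide.

```python
from collections import Counter

# Hypothetical events as (timestamp_in_seconds, event_type); not taken from the deck.
EVENTS = [(5, "A"), (10, "A"), (15, "A"), (22, "A"), (45, "B"), (50, "B")]


def tumbling_counts(events, size):
    """Count events per (window_start, event_type) for fixed, non-overlapping windows."""
    counts = Counter()
    for ts, event_type in events:
        window_start = (ts // size) * size  # every event belongs to exactly one window
        counts[(window_start, event_type)] += 1
    return sorted(counts.items())


def over_range_counts(events, preceding):
    """For each event, count same-type events with a timestamp in [ts - preceding, ts]."""
    ordered = sorted(events)
    result = []
    for ts, event_type in ordered:
        n = sum(
            1
            for other_ts, other_type in ordered
            if other_type == event_type and ts - preceding <= other_ts <= ts
        )
        result.append((event_type, n))  # one output row per input event
    return result
```

The tumbling version yields one row per window and key once the window is complete, while the OVER version yields one row for every incoming event, which matches the "regular intervals" versus "immediately" distinction above.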
Data Enrichment
Orders: t1, 21.5 USD; t3, 55 EUR; t5, 35.3 EUR
Currency Rate: t0, EUR:USD=1.01; t2, EUR:USD=1.05; t4, EUR:USD=1.10
Enriched orders: t1, 21.5 USD; t3, 57.75 USD; t5, 38.83 USD
SELECT
  order_id,
  price,
  orders.currency,
  conversion_rate,
  order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.order_time
ON orders.currency = currency_rates.currency;
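What the temporal join does with the sample values on this slide can be sketched in Python: each order picks the rate version that was valid at its own order time. The helper names are hypothetical, the timestamps t0 to t5 are modelled as the integers 0 to 5, and treating USD orders as having a rate of 1 is an assumption made so that the result matches the slide.

```python
from bisect import bisect_right
from decimal import Decimal

# Versioned rates per currency as (valid_from, rate_to_usd), sorted by time.
RATE_HISTORY = {
    "EUR": [(0, Decimal("1.01")), (2, Decimal("1.05")), (4, Decimal("1.10"))],
}

# Orders as (order_time, price, currency), using the values from the slide.
ORDERS = [
    (1, Decimal("21.5"), "USD"),
    (3, Decimal("55"), "EUR"),
    (5, Decimal("35.3"), "EUR"),
]


def rate_as_of(currency, ts):
    """Return the latest rate whose valid_from is <= ts, or None if there is none."""
    if currency == "USD":
        return Decimal("1")  # assumption: USD needs no conversion
    versions = RATE_HISTORY.get(currency, [])
    times = [valid_from for valid_from, _ in versions]
    idx = bisect_right(times, ts) - 1
    return versions[idx][1] if idx >= 0 else None


def to_usd(orders):
    """Enrich each order with the USD amount at the rate valid at order time."""
    enriched = []
    for ts, price, currency in orders:
        rate = rate_as_of(currency, ts)
        # Like a LEFT JOIN, keep the order even when no rate is known.
        enriched.append((ts, None if rate is None else price * rate))
    return enriched
```

The order at t3 uses the rate from t2 (1.05) rather than the later rate from t4, which is the point of joining "as of" the order time.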
Complex Event Processing (CEP) - Pattern Detection
Pattern: A; B: price<lag(price); C: price>lag(price); D: price<lag(price); E: price>lag(price)
MATCH_RECOGNIZE(
  PARTITION BY stock_ticker
  MEASURES
    A AS firstvalue,
    LAST(Z) AS lastvalue
  PATTERN (A B+ C+ D+ E+)
  DEFINE
    B AS price<LAST(price),
    C AS price>LAST(price),
    D AS price<LAST(price),
    E AS price>LAST(price) AND price>LAST(C)
)
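The down-up-down-up shape that the pattern A B+ C+ D+ E+ describes can be sketched in Python for a single ticker by encoding each price move as a character and matching with a regular expression. This is a simplified illustration, not Flink's matching algorithm: the function names are hypothetical, and the extra condition on E (price above the last C price) is deliberately left out.

```python
import re

# One or more falls, rises, falls, rises: the B+ C+ D+ E+ part of the pattern.
W_SHAPE = re.compile(r"d+u+d+u+")


def encode_moves(prices):
    """Encode each consecutive price change as 'd' (down), 'u' (up) or '-' (flat)."""
    moves = []
    for prev, cur in zip(prices, prices[1:]):
        moves.append("d" if cur < prev else "u" if cur > prev else "-")
    return "".join(moves)


def find_w_shapes(prices):
    """Return (first_index, last_index) into prices for every match.

    Move i links prices[i] to prices[i + 1], so a match over moves[s:e]
    spans prices[s] (row A) through prices[e] (the last E row).
    """
    moves = encode_moves(prices)
    return [(m.start(), m.end()) for m in W_SHAPE.finditer(moves)]
```

For a match `(first, last)`, `prices[first]` plays the role of the first-value measure and `prices[last]` the last-value measure.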
Read once, Write many
Fan-out queries using Flink SQL
INSERT INTO cluster1.topicA
SELECT * FROM input where eventType='A'
INSERT INTO cluster1.topicB
SELECT * FROM input where eventType='B'
…
Input fans out to: topicA, topicB, topicC, topicD
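The read-once, write-many idea behind these fan-out statements can be sketched in Python: the input is scanned a single time and each event is routed to the output whose filter it matches. The function name and the dictionary shape of the events are assumptions for illustration; the target names follow the `cluster1.topicA` style used on the slide.

```python
from collections import defaultdict

# Route table: eventType value -> target topic (names follow the slide).
ROUTES = {"A": "cluster1.topicA", "B": "cluster1.topicB"}


def fan_out(events, routes):
    """Read the input once and append each event to the topic chosen by its eventType."""
    outputs = defaultdict(list)
    for event in events:
        target = routes.get(event["eventType"])
        if target is not None:  # events without a matching route are dropped
            outputs[target].append(event)
    return dict(outputs)
```

Each entry in the route table corresponds to one `INSERT INTO ... SELECT ... WHERE eventType=...` statement, but the input is iterated only once for all of them.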
Support for multi-clusters
Clusters: Cluster1, Cluster2, Cluster3, Cluster4
Topics: topicA, topicB, topicC, topicD
Cross cluster processing
Flink
ksqlDB
Kafka Streams
Serverless Apache Flink in Confluent Cloud
Product roadmap
We are planning to GA our Flink service in Q4 2023
1. Early Access, Spring 2023
Feature Highlights: New SQL capabilities, Metadata integration, Cross cluster queries, Admin UI, CLI
2. Public Preview, Late Summer 2023
Feature Highlights: OAuth and RBAC, Metered usage, Cloud UX, Notebook querying
3. Limited Availability, Late Fall 2023
Feature Highlights: 99.99% uptime SLA, Private networking, Pricing & packaging
4. General Availability, Winter 2023/24
Feature Highlights: GA in all three clouds, Cluster autoscaling
Integration across Kafka Clusters
Data Exploration
Flink SQL Query against Kafka Topic (ANSI SQL)
Complex Event Processing (Pattern Matching)
TL;DR - Serverless Flink in Confluent Cloud
Confluent Stream Sharing
Secure, trusted, real-time data sharing
Available now: GA
● Easy collaboration on live data with partners, customers, and vendors
✓ One-click sharing
✓ Trusted and governed
✓ Secure, granular, auditable
● Single, org-wide portal to discover data streams from trusted sources
Confluent Stream Sharing in a Decentralized Data Mesh
Internal and external data sharing in real-time
Faster time to market, better customer experience and new business models
Generate an AsyncAPI Specification
Confluent Stream Sharing
Who is Stream Sharing for?
● Mainly for developers and architects in companies with medium-to-high Kafka maturity (Phases 3+ of the streaming maturity model) with a use case to share their Kafka topics with any external parties (e.g., vendors, partners, customers) or other internal teams (e.g., from different LoB)
What pain points is it trying to solve?
● Out-of-sync data: most existing solutions involve dumping data from Kafka to a sink in a batch process and copying data from it before moving it to an external source, which turns real-time data into stale data.
● Operational complexities: setting up, maintaining and scaling these sharing pipelines to meet security and privacy requirements requires complex integration work and is operationally taxing
● Vendor lock-in: most sharing solutions require both the Data Provider and Data Recipient to be on the same platform, resulting in contractual complexity and vendor lock-in
Confluent Stream Sharing
What are the differences between Stream Sharing and Cluster Linking in their sharing capabilities? And which one should we recommend to customers?
Stream Sharing is our default data-sharing solution
There are three major differences between Cluster Linking and Stream Sharing
● 1) With Stream Sharing, Data Recipients can consume directly from the Data Provider’s Kafka cluster without the need to copy the data, saving cluster infrastructure and provisioning effort. Cluster Linking requires a destination cluster ready for byte-by-byte replication
○ Sharing grants recipients access to the shared topic and Schema Registry subjects. 1 topic + n shared Schema Registry subjects are included in the same share.
● 2) Data Recipients can use any platform to consume from Stream Sharing, whether it’s Confluent Cloud (CC), Confluent Platform (CP), OSS Kafka, MSK or Aiven. Cluster Linking requires the Data Recipient to be on a CC Dedicated Cluster or CP
● 3) Only an email address is needed for the Data Provider to share via Stream Sharing, whereas in Cluster Linking, both parties’ cluster IDs and API credentials are required across multiple setup and provisioning steps.
