1
KSQL – Streaming SQL for Apache Kafka
Matthias J. Sax | Software Engineer
matthias@confluent.io
@MatthiasJSax
2
A Brief History of Kafka and Confluent
Timeline, 2012 to 2018:
• 0.7 Cluster mirroring
• 0.8 Intra-cluster replication
• 0.9 Data integration (Connect API)
• 0.10 Data processing (Streams API)
• 0.11 Exactly-once semantics
• 1.0 Enterprise Ready
• CP 4.1 KSQL GA
3
Why KSQL?
• Enable Stream Processing for Non-Engineers
• Everybody knows SQL
• Look Ma, no code!
• Declarative stream processing language
• Fast prototyping
• Streaming SQL engine for Apache Kafka
• Streaming ETL
• Ad-hoc topic inspection
4
Trade-Offs
Flexibility → Simplicity
Consumer, Producer
• subscribe()
• poll()
• send()
• flush()
Kafka Streams
• filter()
• join()
• aggregate()
KSQL
• Select…from…
• Join…where…
• Group by…
5
Core Concepts KSQL
6
Creating a Stream
CREATE STREAM clickstream (
    time BIGINT,
    url VARCHAR,
    status INTEGER,
    bytes INTEGER,
    user_id VARCHAR,
    agent VARCHAR)
  WITH (
    value_format = 'JSON',
    kafka_topic = 'my_clickstream_topic'
  );
7
Querying Streams
CREATE STREAM user_clicks AS
SELECT *
FROM clickstream
WHERE user_id = 'mjsax';
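To make the semantics of this persistent query concrete, here is a minimal Python sketch of a continuous filter. It is not how KSQL is implemented; the `user_clicks` function and the sample records are hypothetical and only mimic the `WHERE user_id = 'mjsax'` predicate on an in-memory sequence standing in for the `clickstream` topic.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, object]


def user_clicks(clickstream: Iterable[Record], user_id: str = "mjsax") -> Iterator[Record]:
    """Lazily forward only the records whose user_id matches (the WHERE clause)."""
    for record in clickstream:
        if record.get("user_id") == user_id:
            yield record


# Hypothetical sample input standing in for records read from the topic.
sample = [
    {"user_id": "mjsax", "url": "/index.html", "bytes": 2048},
    {"user_id": "alice", "url": "/about.html", "bytes": 512},
    {"user_id": "mjsax", "url": "/docs.html", "bytes": 4096},
]

filtered = list(user_clicks(sample))
print([r["url"] for r in filtered])
```

Because the function is a generator, it handles one record at a time and never needs the whole input, which is the property that lets the same logic run over an unbounded stream.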
8
Windowed Aggregations
CREATE TABLE clicks AS
  SELECT user_id, COUNT(url)
  FROM clickstream
  WINDOW TUMBLING (SIZE 30 SECONDS)
  WHERE bytes > 1024
  GROUP BY user_id
  HAVING COUNT(url) > 20;
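The windowed aggregation can be sketched in plain Python to show how records are bucketed. This is an illustration under assumptions, not the KSQL runtime: it treats the `time` column as an event timestamp in milliseconds, processes a finite list, and returns only the final counts, whereas a KSQL table is updated continuously. The function name `windowed_click_counts` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SIZE_MS = 30_000  # tumbling window of 30 seconds


def windowed_click_counts(
    records: Iterable[dict], min_bytes: int = 1024, min_count: int = 20
) -> Dict[Tuple[str, int], int]:
    """Count clicks per (user_id, window start), keeping only large responses."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in records:
        if r["bytes"] <= min_bytes:  # row-level filter: bytes > 1024
            continue
        # Tumbling windows do not overlap: each timestamp maps to exactly one window.
        window_start = r["time"] - r["time"] % WINDOW_SIZE_MS
        counts[(r["user_id"], window_start)] += 1
    # Group-level filter: COUNT(url) > 20
    return {key: c for key, c in counts.items() if c > min_count}


# Hypothetical input: 25 large clicks in the first window, 5 in the second,
# and 30 small responses that the row-level filter drops.
records = (
    [{"user_id": "mjsax", "time": i * 1000, "bytes": 2048} for i in range(25)]
    + [{"user_id": "mjsax", "time": 30_000 + i * 1000, "bytes": 2048} for i in range(5)]
    + [{"user_id": "alice", "time": i * 1000, "bytes": 100} for i in range(30)]
)

result = windowed_click_counts(records)
print(result)
```

The row-level filter runs before grouping and the count threshold runs after it, which is the usual SQL evaluation order for WHERE and HAVING.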
9
Windowed Aggregations
10
Core Concepts KSQL
Do you think that’s a table you are querying?
12
Creating a Table
CREATE TABLE users (
    user_id INTEGER,
    registered_at BIGINT,
    username VARCHAR,
    name VARCHAR,
    city VARCHAR,
    level VARCHAR)
  WITH (
    key = 'user_id',
    kafka_topic = 'clickstream_users',
    value_format = 'AVRO');
13
Creating a Table
Confluent Schema Registry FTW!
CREATE TABLE users
  WITH (
    key = 'user_id',
    kafka_topic = 'clickstream_users',
    value_format = 'AVRO');
14
Using Tables
15
Joins for Enrichment
CREATE STREAM vip_actions AS
  SELECT c.user_id, u.name, c.url, c.status
  FROM clickstream c
  LEFT JOIN users u ON c.user_id = u.user_id
  WHERE u.level = 'Platinum';
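A stream-table join can be pictured as a per-record lookup into the latest table state. The Python sketch below is a simplification under assumptions: the table is a plain dict keyed by `user_id`, the display name is taken from the table's `name` column, and all identifiers and sample values are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def vip_actions(clicks: Iterable[dict], users: Dict[str, dict]) -> Iterator[dict]:
    """Enrich each click with user data, then keep only Platinum users."""
    for click in clicks:
        # LEFT JOIN: the lookup may find no matching user row.
        user: Optional[dict] = users.get(click["user_id"])
        level = user["level"] if user is not None else None
        # Filter on the table side; unmatched clicks have no level and are dropped.
        if level == "Platinum":
            yield {
                "user_id": click["user_id"],
                "name": user["name"],
                "url": click["url"],
                "status": click["status"],
            }


# Hypothetical table state and click events.
users = {
    "u1": {"name": "Ada Lovelace", "level": "Platinum"},
    "u2": {"name": "Alan Turing", "level": "Gold"},
}
clicks = [
    {"user_id": "u1", "url": "/checkout", "status": 200},
    {"user_id": "u2", "url": "/cart", "status": 200},
    {"user_id": "u3", "url": "/home", "status": 404},
]

enriched = list(vip_actions(clicks, users))
print(enriched)
```

Each click sees the table as it is at lookup time, so a later change to a user's level affects only clicks that arrive afterwards.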
16
Joins for Enrichment
17
Stream-Table-Duality
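One way to read the duality: a table is the result of replaying a changelog stream, and the current rows of a table can be written out again as a stream. The sketch below illustrates that idea in plain Python; the function names and sample data are hypothetical, and a `None` value is used as a delete marker in the spirit of Kafka tombstone records.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[str]]


def stream_to_table(changelog: Iterable[Change]) -> Dict[str, str]:
    """Replay a changelog: the latest value per key wins, None deletes the key."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_stream(table: Dict[str, str]) -> List[Change]:
    """Emit every current row as one change record."""
    return list(table.items())


# Hypothetical changelog of user levels.
changelog = [
    ("alice", "Gold"),
    ("bob", "Silver"),
    ("alice", "Platinum"),  # update overwrites the earlier value
    ("bob", None),          # delete
]

table = stream_to_table(changelog)
print(table)

# Replaying the stream derived from the table reproduces the same table.
roundtrip = stream_to_table(table_to_stream(table))
print(roundtrip == table)
```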
18
How to deploy and use KSQL
19
How to run KSQL
#1 Client-server
[Diagram: one KSQL CLI and three KSQL Servers, each server in its own JVM, with a Kafka Cluster]
20
How to run KSQL
#1 Client-server
• Start any number of server nodes
    bin/ksql-server-start
• Start one or more CLIs and point them to a server
    bin/ksql https://myksqlserver:8090
• All servers share the processing load
    Technically, instances of the same Kafka Streams application
    Scale up/down without restart
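Load sharing among instances of the same application rests on dividing the input topic partitions among the running instances. The sketch below uses a simple round-robin rule to show the effect of adding a server. It is an assumption for illustration only: the real Kafka Streams assignment logic is more involved, and the server names and partition count are hypothetical.

```python
from typing import Dict, Iterable, List


def assign_partitions(partitions: Iterable[int], instances: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over instances round-robin (illustrative rule only)."""
    assignment: Dict[str, List[int]] = {name: [] for name in instances}
    for index, partition in enumerate(sorted(partitions)):
        owner = instances[index % len(instances)]
        assignment[owner].append(partition)
    return assignment


# Six input partitions shared by two servers, then by three after scaling up.
two_servers = assign_partitions(range(6), ["server-1", "server-2"])
three_servers = assign_partitions(range(6), ["server-1", "server-2", "server-3"])
print(two_servers)
print(three_servers)
```

Under this rule, each server's share shrinks when a server joins and grows when one leaves, while the set of partitions being processed stays the same.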
21
How to run KSQL
#2 As a standalone application
[Diagram: three KSQL Servers, each in its own JVM, with a Kafka Cluster]
22
How to run KSQL
#2 As a standalone application
• Start any number of server nodes
    Pass a file of KSQL statements to execute
    bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
    Version-control your queries and transformations as code
• All running engines share the processing load
    Technically, instances of the same Kafka Streams application
    Scale up/down without restart
23
How to run KSQL
#3 Embedded in an application
[Diagram: three JVM app instances, each containing a KSQL Engine and application code, with a Kafka Cluster]
24
How to run KSQL
#3 Embedded in an application
• Embed directly in your Java application
• Generate and execute KSQL queries through the Java API
    Version-control your queries and transformations as code
• All running application instances share the processing load
    Technically, instances of the same Kafka Streams application
    Scale up/down without restart
25
Internals
26
Internals
Read input from Kafka
Operator DAG:
• filter/map/aggregation/joins
• Operators can be stateful
Write result back to Kafka
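The read, process, write shape above can be sketched as a tiny pipeline with one stateless operator (a filter) and one stateful operator (a running sum). This is a hand-written illustration, not generated KSQL or Kafka Streams code: lists stand in for the Kafka input and output topics, a dict stands in for the state store, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def run_topology(input_records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Filter successful requests, then keep a running byte total per user."""
    state: Dict[str, int] = {}           # state store backing the aggregation
    output: List[Tuple[str, int]] = []   # stands in for the output topic
    for record in input_records:         # read input
        if record["status"] != 200:      # stateless operator: filter
            continue
        key = record["user_id"]
        state[key] = state.get(key, 0) + record["bytes"]  # stateful operator: aggregate
        output.append((key, state[key]))                  # write each updated result
    return output


# Hypothetical input records.
records = [
    {"user_id": "mjsax", "status": 200, "bytes": 100},
    {"user_id": "alice", "status": 404, "bytes": 50},
    {"user_id": "mjsax", "status": 200, "bytes": 300},
]

updates = run_topology(records)
print(updates)
```

The output contains one record per state update rather than a single final total, which matches the idea that results are written back as a continuous stream.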
27
Internals
28
Runtime: Kafka Streams
29
Distributed State
30
Scaling
31
Take home
• Streaming SQL engine for Apache Kafka
• Leverages Kafka Streams
• Open Source: Apache 2.0
• https://github.com/confluentinc/ksql/
• Included in Confluent Platform 4.1
• https://www.confluent.io/product/ksql/
• https://docs.confluent.io/current/ksql/docs/index.html
32
Thank You
We are hiring!
