Tim is a teacher, author and technology leader with
Confluent. He is not only an expert on KSQL but he can
also frequently be found speaking at conferences in the
United States and all over the world. He is the co-presenter
of various O’Reilly training videos on topics ranging from Git
to Distributed Systems, and he is the author of Gradle
Beyond the Basics.
Tim Berglund
Senior Director of Developer Experience,
Confluent
Housekeeping Items
● This session will last about an hour.
● It will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
KSQL is a Declarative Stream Processing Language
KSQL is the Streaming SQL Engine for Apache Kafka
KSQL Concepts
Exploring KSQL Patterns
KSQL Concepts
• Streams are first-class citizens
• Tables are first-class citizens
• Some queries are persistent
• All queries run until terminated
CREATE STREAM clickstream
WITH (
value_format = 'JSON',
kafka_topic = 'my_clickstream_topic'
);
Creating a Stream
• Let’s say we have a topic called my_clickstream_topic
• The topic contains JSON data
• KSQL now knows about that topic
Exploring that Stream
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
• Now that the stream exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
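As a rough illustration of what this filter does, here is a minimal Python sketch, not KSQL itself. It treats the stream as an iterable of dictionaries and yields only the projected fields of matching events. The sample events and the `filter_clicks` name are assumptions made up for this example.

```python
from typing import Dict, Iterable, Iterator

TARGET_AGENT = "Mozilla/5.0 (compatible; MSIE 6.0)"


def filter_clicks(events: Iterable[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Yield status and bytes for each event whose user_agent matches.

    A generator is used because a stream has no last record: results are
    produced one at a time for as long as events keep arriving.
    """
    for event in events:
        if event.get("user_agent") == TARGET_AGENT:
            yield {"status": event["status"], "bytes": event["bytes"]}


# Hypothetical sample events standing in for messages on the topic.
sample_events = [
    {"status": 200, "bytes": 512, "user_agent": TARGET_AGENT},
    {"status": 404, "bytes": 0, "user_agent": "curl/7.58.0"},
    {"status": 500, "bytes": 128, "user_agent": TARGET_AGENT},
]

matches = list(filter_clicks(sample_events))
```

With a finite list the generator simply ends; against a live topic the equivalent query would keep running until it is terminated.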
CREATE TABLE users
WITH (
key = 'user_id',
kafka_topic = 'clickstream_users',
value_format = 'JSON'
);
Creating a Table
• We have a topic called clickstream_users
• The topic contains JSON data
• The topic contains changelog data
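To make "changelog data" concrete, the following Python sketch (an illustration only, not how KSQL is implemented) replays a list of keyed messages into a dictionary, so that the latest message for each key wins. The sample records and the `materialize` helper are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Record = Dict[str, str]


def materialize(changelog: Iterable[Tuple[str, Record]]) -> Dict[str, Record]:
    """Replay (key, value) messages in order; later values replace earlier ones."""
    table: Dict[str, Record] = {}
    for key, value in changelog:
        table[key] = value
    return table


# Hypothetical messages keyed by user_id; u1 is updated by the third message.
changelog = [
    ("u1", {"username": "alice", "level": "Gold"}),
    ("u2", {"username": "bob", "level": "Platinum"}),
    ("u1", {"username": "alice", "level": "Platinum"}),
]

users = materialize(changelog)
```

After the replay the table holds one row per key, which is the sense in which a table is a collection of evolving facts.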
Inspecting that Table
SELECT user_id, username
FROM users
WHERE level = 'Platinum';
• Now that the table exists, we can examine its contents
• Simple, declarative filtering
• A non-persistent query
Joining a Stream to a Table
• Now that we have clickstream and users, we can join them
• This allows us to do filtering of clicks on a user attribute
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
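The join above can be sketched in Python as a lookup against the current state of the table for every click that arrives. This is a simplified model under stated assumptions: the table is a plain dictionary keyed by user id, and the sample data and the `vip_actions` function name are invented for the example. Note that although the query is a LEFT JOIN, the WHERE clause on u.level drops clicks with no matching user, because a missing level cannot equal 'Platinum'.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def vip_actions(clicks: Iterable[Row], users: Dict[str, Row]) -> Iterator[Row]:
    """Enrich each click with its user row and keep only Platinum users."""
    for click in clicks:
        user = users.get(click["userid"])  # LEFT JOIN: user may be missing
        if user is not None and user["level"] == "Platinum":
            yield {
                "userid": click["userid"],
                "page": click["page"],
                "action": click["action"],
            }


# Hypothetical table state and click events.
users_table = {
    "u1": {"username": "alice", "level": "Platinum"},
    "u2": {"username": "bob", "level": "Gold"},
}
clicks = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u2", "page": "/cart", "action": "add"},
    {"userid": "u9", "page": "/home", "action": "view"},  # unknown user
]

vip = list(vip_actions(clicks, users_table))
```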
Usage Patterns
KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
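The tumbling-window count can be illustrated with a short Python sketch. It assumes each attempt carries a timestamp in milliseconds and assigns it to the 5-second window that contains it; the sample data and the `possible_fraud` function are hypothetical. Unlike KSQL, which updates results continuously as records arrive, this sketch processes a finished batch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5_000  # SIZE 5 SECONDS
THRESHOLD = 3      # HAVING count(*) > 3


def possible_fraud(attempts: Iterable[Tuple[int, str]]) -> Dict[Tuple[str, int], int]:
    """Count attempts per (card_number, window_start) and keep counts above the threshold."""
    counts: Counter = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap, so each event belongs to exactly one.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


# Hypothetical attempts: card A is tried four times within the first window.
attempts = [
    (1_000, "A"),
    (2_000, "A"),
    (3_000, "A"),
    (4_000, "A"),
    (4_500, "B"),
    (6_000, "A"),
]

flagged = possible_fraud(attempts)
```

Here only card A in the window starting at 0 ms is flagged; its fifth attempt at 6,000 ms falls into the next window and starts a fresh count.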
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
KSQL for Data Transformation
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT='JSON',
TIMESTAMP='view_time') AS
SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
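The effect of PARTITION BY with six partitions can be sketched in Python as hashing each record's key and taking the result modulo the partition count. This is only an illustration: Kafka's default partitioner uses a murmur2 hash, while the sketch substitutes CRC32 from the standard library as a stand-in, and the sample records and helper names are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_PARTITIONS = 6  # matches PARTITIONS=6 in the statement above


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(records: Iterable[Dict[str, str]], key_field: str = "user_id") -> Dict[int, List[Dict[str, str]]]:
    """Group records by the partition chosen for their key field."""
    partitions: Dict[int, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        partitions[partition_for(record[key_field])].append(record)
    return dict(partitions)


# Hypothetical page-view records.
views = [
    {"user_id": "u1", "page": "/home"},
    {"user_id": "u2", "page": "/cart"},
    {"user_id": "u1", "page": "/checkout"},
]

by_partition = repartition(views)
```

The point of the sketch is the invariant rather than the specific hash: every record with the same user_id ends up in the same partition.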
Demo
Deployment Patterns
KSQL in Local Mode
[Diagram: Kafka Cluster; one JVM containing the KSQL Server and the KSQL CLI]
• Starts a CLI and a server in the same JVM
• Ideal for developing on your laptop
bin/ksql-cli local
• Or with customized settings
bin/ksql-cli local --properties-file ksql.properties
KSQL in Client-Server Mode
[Diagram: Kafka Cluster; three JVMs, each running a KSQL Server; a separate KSQL CLI]
• Start any number of server nodes
bin/ksql-server-start
• Start one or more CLIs and point them to a server
bin/ksql-cli remote https://myksqlserver:8090
• All servers share the processing load
Technically, the servers are instances of the same Kafka Streams application
Scale up or down without a restart
KSQL in Application Mode
[Diagram: Kafka Cluster; three JVMs, each running a KSQL Server]
• Start any number of server nodes
Pass a file of KSQL statements to execute
bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
Version-control your queries and transformations as code
• All running engines share the processing load
Technically, the engines are instances of the same Kafka Streams application
Scale up or down without a restart
Resources and Next Steps
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Thank you for attending Exploring KSQL Patterns.



Editor's Notes

  • #5: Really, stream processing is still a pretty new discipline. We are only on the second generation of OSS tooling (depending on how you look at things), and most people who are building streaming systems are building their first. As a result, most stream processing requires a bunch of custom code, often deployed to specialized infrastructure, coded against specialized APIs. And hey, sometimes that’s what you gotta do, but having a declarative language and getting infrastructure problems out of your way is a good thing. KSQL aims to do both things.
  • #6: Another way to put that is that KSQL is a SQL engine for Kafka. It’s not a subset of ANSI SQL (it can’t be, since streaming systems deal with unbounded data sets and relational databases are fundamentally about bounded data sets, and that difference matters), but man, how great would it be to have a SQL-like language to describe the stream processing computation you want done to the data you have stored in Kafka topics? (If you don’t know Kafka already, it’s a messaging system, and topics are just queues of messages. Basic stuff here, and don’t let it confuse you if you’re new to all of this.)
  • #7: Where does it fit into my system? What is the language syntax like?
  • #8: Architecture diagram: stuff goes into Kafka, KSQL processes it, and it goes out again. KSQL takes the place of more complex options that have preceded it, like the Streams API or the Producer and Consumer APIs.
  • #9: KSQL is familiar, but is also different in important ways. What is a stream? An unbounded sequence of facts. What is a table? A collection of evolving facts. We’ll see examples. Queries tend to run until you stop them. This is counterintuitive, but remember we’re dealing with streaming data here. There’s never a “last” record. Persistent queries are really stream processing programs that run in KSQL.
  • #10: OK, so we want to make ourselves a stream out of a topic we have in Kafka. How do we start? This is a lightweight abstraction on top of the topic. Note that the stream has metadata, but the metadata is extracted automatically from the topic.
  • #11: It’s not an ad-hoc query language as such, but since you can define stream processing jobs with it, it’s certainly possible to use it to do arbitrary filtering and projection on existing topics. You have to create streams first, of course, but we’ve gone over that now.
  • #12: Creating a table. Note that this is fundamentally tabular data: the key is the user_id, so each message in the topic is an update to that user’s record. We don’t need to specify the metadata, because it gets sucked in from the topic.
  • #13: It’s not an ad-hoc query language as such, but since you can define stream processing jobs with it, it’s certainly possible to use it to do arbitrary filtering and projection on existing topics. You have to create streams first, of course, but we’ve gone over that now.
  • #16: On the third bullet: often people build streaming pipelines with Kafka dumping data into C* or Elastic. Well, you’re probably going to need to do some work on the data along the way. No need to have a Spark Streaming job running now!
  • #19: KSQL also turns out to be super-useful for housekeeping and administrative actions that would otherwise require a stream-transforming program of some sort to be written and tested, or changes in the underlying source data systems to produce to a topic in a different format in the first place. In this example we’re simply taking all the records from the ‘clickstream’ topic and copying them into a new ‘views_by_userid’ topic. We’ve asked for the new topic to be written out in JSON format (notice that the input stream could be in any other format KSQL can read), explicitly asked for there to be 6 partitions of this output topic, and asked for the record timestamps to be populated from the value of the ‘view_time’ field in the input topic. Finally, the records should be distributed across the 6 partitions based on their ‘user_id’. All the options we’re specifying here have sensible defaults and can be omitted if you don’t want or need to override them.
  • #22: When we’re looking to select tools for solving a particular tech problem in front of us we are always making trade-offs. In kafka-land, one interesting set of trade-offs to consider is this, fairly typical, spectrum: ranging from very flexible and low-level on the left side, using the original kafka client producer and consumer APIs – think of this as being at the level of ‘get-message’, ‘put message’ and you of course have to take care of many details of orchestrating these reads and writes yourself; up through something like the kafka streams api, shown in the center here, where we can hide a lot of lower-level implementation concerns and focus on using functions which operate on a stream of records as a whole – perhaps filtering or transforming every record that passes by in a more functional-programming style. The real shift when using this is in mindset, to a place where you think of passing functions to be run against everything in a stream rather than strictly iterating over the stream yourself. KSQL shifts it up another gear to a place where we can declaratively transform one or more streams into another stream, using syntax and ideas that may be more familiar. Notice how, as we go from left to right on this spectrum, each thing builds upon the preceding one – both conceptually and also literally in terms of implementation – each of these APIs is built around the preceding one
  • #27: Leave resource management to dedicated systems such as Kubernetes (k8s). All running engines share the processing load. Technically, they are instances of the same Kafka Streams application. Scale up or down without a restart.
  • #29: This is open source, and you should get involved. You can check out the code on GitHub or play with the many examples there. Also, you are hereby solemnly adjured to join the Slack community and ask questions there!