Modifying Your SQL Streaming Queries on the Fly: The Impossible Trinity
Yingjun Wu
Founder, RisingWave Labs
Who Am I?
• Yingjun Wu
• Founder @ RisingWave
• Ex-AWS Redshift
• Ex-IBM Research Almaden
• PhD, National University of Singapore
• Visiting PhD, Carnegie Mellon University
What Am I Doing?
• Building RisingWave, an open-source SQL streaming database
• Extremely easy
• Extremely cost-efficient
Adopted by hundreds of enterprises and SMBs!
Streaming analytics
Real-time monitoring/alerting
Real-time ETL
Database replication
…
SQL Stream Processing
[Diagram labels: operational databases, messaging queues, BI tools, stream processing systems]
• Continuously monitor crypto data
Challenges
• Queries may be modified from time to time!
• Upstream data schema may change

    Column name   Column type
    product_id    VARCHAR
    price         NUMERIC
    open_24h      NUMERIC
    volume_24h    TIMESTAMP

    Schema change!

    Column name   Column type
    product_id    VARCHAR
    price         NUMERIC
    open_12h      NUMERIC
    volume_24h    TIMESTAMP

• Business logic may change
    Logic change! 5 → 60
• Not a big problem for batch queries…
    [Diagram labels: Query 1, Query 2]
• But a HUGE problem for streaming queries!
    Continuous query processing!
State Management
• Consider time window functions

SELECT … FROM
TUMBLE(crypto_source, …, interval '5' second);

[Diagram: inputs i1, i2, …, i96 → State → outputs o1, o2, …, o6. Input progress: 96]
[Diagram: inputs i97, i98 arrive → output o7. Input progress: 98. Updated! Add new values and evict expired values]
[Diagram: input i99 arrives → output o8. Input progress: 99. Updated! Add new values and evict expired values]

Let's modify the SQL query!

SELECT … FROM
TUMBLE(crypto_source, …, interval '60' second);

[Diagram: at input progress 96, the old query's State (inputs i1, i2, …, i96; outputs o1, o2, …, o6) is shown next to the new query and its own State]
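The "add new values and evict expired values" step can be sketched in a few lines. This is a minimal illustration, not RisingWave's implementation: the class name `TumblingWindowState`, the sum aggregate, and the assumption that timestamps arrive in order are all hypothetical choices made for the example.

```python
from collections import defaultdict


class TumblingWindowState:
    """Hypothetical per-window running sums for a tumbling window of `size` seconds."""

    def __init__(self, size):
        self.size = size
        self.windows = defaultdict(float)  # window start -> running sum
        self.progress = 0                  # number of inputs consumed so far

    def ingest(self, ts, value):
        """Add a new value, then evict and return every window that has closed."""
        start = ts - ts % self.size
        self.windows[start] += value
        self.progress += 1
        # Assumes in-order timestamps: a window is final once a later window opens.
        closed = sorted(w for w in self.windows if w + self.size <= start)
        return [(w, self.windows.pop(w)) for w in closed]


state = TumblingWindowState(size=5)
for ts, value in [(0, 1), (1, 2), (5, 3), (11, 4)]:
    for window_start, total in state.ingest(ts, value):
        print(f"window starting at {window_start}: sum = {total}")
```

The state only ever holds the windows that are still open, which is why it is tied to the window size the query was written with.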
The Naive Solution
• Recompute!

Old query:
SELECT … FROM
TUMBLE(crypto_source, …, interval '5' second);

New query:
SELECT … FROM
TUMBLE(crypto_source, …, interval '60' second);

• Mark the progress
    Input progress: 96. Latest progress: 96
• Drop the entire state
    Input progress: 96 → 0. Latest progress: 96
• Rebuild from scratch
    Input progress: 0 → 2 → … → 96. Latest progress: 96
    State updated!! Inputs i1, i2, …, i96 are replayed; outputs o1, o2, …, o6
• Continue processing
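The four steps above can be sketched as a replay loop. This is a simplified illustration under stated assumptions: `log` is a hypothetical in-memory list of `(timestamp, value)` inputs, the aggregate is a per-window sum, and `recompute` is a name invented for the example.

```python
def recompute(log, latest_progress, new_size):
    """Rebuild window state for a new window size by replaying the input log."""
    marked = latest_progress          # 1. mark the progress
    state = {}                        # 2. drop the entire state
    progress = 0                      #    input progress goes back to 0
    while progress < marked:          # 3. rebuild from scratch
        ts, value = log[progress]
        start = ts - ts % new_size
        state[start] = state.get(start, 0) + value
        progress += 1
    return state, progress            # 4. continue processing from here


log = [(t, 1) for t in range(96)]     # 96 inputs, one per second
state, progress = recompute(log, latest_progress=96, new_size=60)
print(state, progress)
```

The loop touches every input consumed so far, so the cost of a query change grows with the amount of history rather than with the size of the change.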
The Ideal Solution
• Reuse!

[Diagram: the old query holds State built from inputs i1, i2, …, i96, with outputs o1, o2, …, o6. Can the new query reuse it?]

Old query:
SELECT … FROM …
WHERE …
ORDER BY … LIMIT 10;

New query:
SELECT … FROM …
WHERE …
ORDER BY … LIMIT 5;

• Detect reusable patterns (pattern matching!)
    Old plan: TABLE, FILTER (WHERE …), TOP N (ORDER BY …, LIMIT = 10)
    New plan: TABLE, FILTER (WHERE …), TOP N (ORDER BY …, LIMIT = 5)
    FILTER is stateless; TOP N is stateful
• Modify internal state (truncate!)
    Before: Entry 1, Entry 2, …, Entry 5, …, Entry 10
    After: Entry 1, Entry 2, …, Entry 5
• Continue processing
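The LIMIT 10 → LIMIT 5 case can be sketched as a truncation of the TOP N state. This is a minimal sketch assuming an append-only input and an ascending sort order; `top_n` and `shrink_limit` are hypothetical helper names, not functions of any real system.

```python
import heapq


def top_n(rows, n):
    """State of a TOP N operator: the n smallest rows, in sort order."""
    return heapq.nsmallest(n, rows)


def shrink_limit(state, old_limit, new_limit):
    """Reuse an existing TOP N state for a smaller LIMIT by truncating it."""
    if new_limit > old_limit:
        # The old state never kept these rows, so truncation cannot help.
        raise ValueError("a larger LIMIT needs rows the old state does not hold")
    return state[:new_limit]


rows = [42, 7, 19, 3, 88, 56, 1, 64, 23, 11, 95, 30]
old_state = top_n(rows, 10)
new_state = shrink_limit(old_state, old_limit=10, new_limit=5)
print(new_state == top_n(rows, 5))
```

Truncation works here because the top 5 rows are always a prefix of the top 10. Going the other way (LIMIT 5 to LIMIT 10) has no such shortcut, which hints at why reuse is hard to generalize.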
The Ideal Solution
• Reuse!

SELECT … FROM
TUMBLE(crypto_source, …, interval '5' second);

SELECT … FROM
TUMBLE(crypto_source, …, interval '60' second);

[Diagram: the old query's State (inputs i1, i2, …, i96; outputs o1, o2, …, o6) next to the new query's State]
The Ideal Solution
• Dozens of SQL clauses
• Unlimited number of combinations…
WHERE, LIMIT, UNION, HAVING, DISTINCT, ORDER BY, JOIN, GROUP BY, …
The Real World
Recompute!   General but inefficient
Reuse!       Efficient but not general
The Impossible Trinity
[Diagram labels: Efficiency, Generality, Consistency]
It's impossible to achieve all three!
Which one should we sacrifice?

Option: Sacrifice efficiency
    Strategy: Always recompute
    Pros: General; easy to implement
    Cons: Can take a very long time to recompute

Option: Sacrifice generality
    Strategy: Do our best to reuse
    Pros: Efficient
    Cons: Very difficult to implement; very limited

Option: Sacrifice consistency
    Strategy: Try to reuse whenever possible
    Pros: General enough and also efficient
    Cons: May compromise consistency
The Impossible Trinity
Pattern matching!

Old query:
SELECT … FROM …
WHERE …
ORDER BY … LIMIT 10;

New query:
SELECT … FROM …
WHERE …
ORDER BY … LIMIT 5;

Query plans, with the query plan "context" in parentheses:
    Old plan: TABLE, FILTER (WHERE …), TOP N (ORDER BY …, LIMIT = 10)
    New plan: TABLE, FILTER (WHERE …), TOP N (ORDER BY …, LIMIT = 5)
    FILTER is stateless; TOP N is stateful

Query plans without the context:
    Old plan: TABLE, FILTER, TOP N
    New plan: TABLE, FILTER, TOP N
    Internal state: Entry 1, Entry 2, …, Entry 5, …, Entry 10
    Directly reuse without editing!

The window example:
SELECT … FROM
TUMBLE(crypto_source, …, interval '5' second);
SELECT … FROM
TUMBLE(crypto_source, …, interval '60' second);
    Old plan: TABLE, TUMBLE
    New plan: TABLE, TUMBLE
    TUMBLE is stateful
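Matching plans by operator kind while setting the "context" aside can be sketched as follows. The `Op` type, its fields, and both helper functions are hypothetical names for illustration; a real planner's operator tree is richer than a flat list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str            # e.g. "TABLE", "FILTER", "TOP_N", "TUMBLE"
    stateful: bool
    context: tuple = ()  # operator parameters, e.g. (("LIMIT", 10),)


def same_pattern(old_plan, new_plan):
    """True when both plans list the same operator kinds in the same order."""
    return [op.kind for op in old_plan] == [op.kind for op in new_plan]


def changed_stateful_ops(old_plan, new_plan):
    """Stateful operators whose context differs between two matching plans."""
    return [new.kind for old, new in zip(old_plan, new_plan)
            if new.stateful and old.context != new.context]


old_plan = [Op("TABLE", False), Op("FILTER", False), Op("TOP_N", True, (("LIMIT", 10),))]
new_plan = [Op("TABLE", False), Op("FILTER", False), Op("TOP_N", True, (("LIMIT", 5),))]
print(same_pattern(old_plan, new_plan), changed_stateful_ops(old_plan, new_plan))
```

A pattern match only says the plans have the same shape. The list of changed stateful operators is where reused state may no longer agree with the new query, which is the consistency risk this option accepts.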
Our Implementation
• Principle: let the users decide!

[Flowchart comparing the old query and the new query]
• Is there a common pattern?
    No: Recompute!
    Yes: Can we preserve consistency?
        Yes: Reuse!
        No: "Are you okay to sacrifice consistency?"
            Yes: Reuse! (Inconsistent!)
            No: Recompute!
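The decision flow can be written as one small function. This is a sketch of the flowchart as read from the slides; the function name, the three boolean inputs, and the returned labels are assumptions made for the example.

```python
def choose_strategy(common_pattern, consistency_preserved, user_accepts_inconsistency):
    """Pick how to apply a query change, following the flowchart's three questions."""
    if not common_pattern:
        return "recompute"                 # nothing to reuse
    if consistency_preserved:
        return "reuse"                     # best case
    if user_accepts_inconsistency:
        return "reuse (inconsistent)"      # the user chose speed over consistency
    return "recompute"                     # the user chose consistency over speed


for answers in [(False, False, False), (True, True, False),
                (True, False, True), (True, False, False)]:
    print(answers, "->", choose_strategy(*answers))
```

Only the third question is put to the user; the first two are answered by the system from the old and new query plans.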
Optimization
• Best case: reuse states without compromising consistency
• Worst case: recompute from scratch
Can we optimize the worst-case scenario?

• Recompute! (drop the entire state, rebuild from scratch, continue processing)
    Scale out!!!

• "S3 as primary storage"
    [Diagram: in traditional stream processing systems, compute nodes hold the States and persistent storage holds a Checkpoint ("state as checkpoint"); with "S3 as primary storage", compute nodes hold a Cache and persistent storage holds the States]

• Scale out during the recomputation (backfilling) stage
    [Diagram: additional compute nodes, each with a Cache, are added while the States stay in persistent storage. Scale out to recompute]

• Scale in once recomputation is done
    [Diagram: the extra compute nodes are removed while the States stay in persistent storage. Scale in after recomputation]
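Scaling out the backfill can be illustrated by splitting the replay across workers and merging the partial results. This is only a sketch: Python threads stand in for compute nodes, the in-memory `log` stands in for persistent storage, and the function names and per-window sum are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def backfill_partition(events, size):
    """Rebuild per-window sums for one worker's share of the input."""
    state = {}
    for ts, value in events:
        start = ts - ts % size
        state[start] = state.get(start, 0) + value
    return state


def parallel_backfill(log, size, workers):
    """Replay the log on several workers, then merge their disjoint states."""
    # Route events by window so each window is owned by exactly one worker.
    parts = [[] for _ in range(workers)]
    for ts, value in log:
        parts[(ts // size) % workers].append((ts, value))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(backfill_partition, parts, [size] * workers))
    merged = {}
    for partial in partials:
        merged.update(partial)  # safe because no window spans two workers
    return merged


log = [(t, 1) for t in range(96)]
print(parallel_backfill(log, size=60, workers=4))
```

Because no worker keeps the state once the merge is done, the pool can shrink again after the backfill, which mirrors the scale-in step.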
More Challenges
[Diagram: four materialized views, stacked]
• Stacked materialized views, or dependent streaming jobs!
• What if an upstream materialized view gets changed?
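One way to see the scope of the problem is to list every view that sits downstream of the one being changed. The sketch below is an illustration only: the dependency map, the view names `mv1` to `mv4`, and the function name are hypothetical, and the slides do not say how such views should then be handled.

```python
from collections import deque


def downstream_views(depends_on, changed):
    """Return every view that directly or indirectly reads from `changed`.

    `depends_on` maps a view name to the list of upstream views it reads from.
    """
    children = {}
    for view, upstreams in depends_on.items():
        for upstream in upstreams:
            children.setdefault(upstream, []).append(view)

    affected, seen, queue = [], {changed}, deque([changed])
    while queue:  # breadth-first walk down the dependency graph
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


depends_on = {"mv2": ["mv1"], "mv3": ["mv2"], "mv4": ["mv2"]}
print(downstream_views(depends_on, "mv1"))
```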
Conclusion
• Modifying SQL streaming queries on the fly can cause trouble
• We can choose only two of generality, efficiency, and consistency
• Optimizations can be applied to greatly accelerate query recomputation
Thanks! Q&A?
risingwave.com/slack

More Related Content

Similar to Modifying Your SQL Streaming Queries on the Fly: The Impossible Trinity (20)

PDF
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
HostedbyConfluent
 
PPTX
Why and how to leverage the simplicity and power of SQL on Flink
DataWorks Summit
 
PDF
Streaming SQL
Julian Hyde
 
PDF
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Julian Hyde
 
PDF
Julian Hyde - Streaming SQL
Flink Forward
 
PDF
Streaming SQL
Julian Hyde
 
PPTX
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward
 
PPTX
Row patternmatching12ctech14
stewashton
 
KEY
10x Performance Improvements
Ronald Bradford
 
KEY
10x improvement-mysql-100419105218-phpapp02
promethius
 
PPTX
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
Flink Forward
 
PDF
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
Timo Walther
 
PPTX
Webinar: Flink SQL in Action - Fabian Hueske
Ververica
 
PPTX
Sql analytic queries tips
Vedran Bilopavlović
 
PPTX
Stream Analytics with SQL on Apache Flink
Fabian Hueske
 
PPT
Dbms
philipsinter
 
PDF
Microsoft Big Data @ SQLUG 2013
Nathan Bijnens
 
PDF
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
PPTX
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
HostedbyConfluent
 
Why and how to leverage the simplicity and power of SQL on Flink
DataWorks Summit
 
Streaming SQL
Julian Hyde
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Julian Hyde
 
Julian Hyde - Streaming SQL
Flink Forward
 
Streaming SQL
Julian Hyde
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward
 
Row patternmatching12ctech14
stewashton
 
10x Performance Improvements
Ronald Bradford
 
10x improvement-mysql-100419105218-phpapp02
promethius
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
Flink Forward
 
ApacheCon 2020 - Flink SQL in 2020: Time to show off!
Timo Walther
 
Webinar: Flink SQL in Action - Fabian Hueske
Ververica
 
Sql analytic queries tips
Vedran Bilopavlović
 
Stream Analytics with SQL on Apache Flink
Fabian Hueske
 
Microsoft Big Data @ SQLUG 2013
Nathan Bijnens
 
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 

More from HostedbyConfluent (20)

PDF
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
PDF
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
PDF
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
PDF
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
PDF
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
PDF
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
PDF
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
PDF
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
PDF
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
PDF
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
PDF
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
PDF
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
PDF
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
PDF
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
PDF
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
PDF
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
PDF
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
PDF
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
PDF
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
PDF
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 
Ad

Recently uploaded (20)

PDF
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
PDF
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
PPTX
Top Managed Service Providers in Los Angeles
Captain IT
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
Why Orbit Edge Tech is a Top Next JS Development Company in 2025
mahendraalaska08
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PDF
Women in Automation Presents: Reinventing Yourself — Bold Career Pivots That ...
DianaGray10
 
PPTX
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
PPTX
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
PDF
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
PDF
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
PDF
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
PPTX
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
PDF
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
PDF
Ampere Offers Energy-Efficient Future For AI And Cloud
ShapeBlue
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
PDF
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
Top Managed Service Providers in Los Angeles
Captain IT
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
Why Orbit Edge Tech is a Top Next JS Development Company in 2025
mahendraalaska08
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
Women in Automation Presents: Reinventing Yourself — Bold Career Pivots That ...
DianaGray10
 
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
Ampere Offers Energy-Efficient Future For AI And Cloud
ShapeBlue
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Ad

Modifying Your SQL Streaming Queries on the Fly: The Impossible Trinity

  • 1. Modifying Your SQL Streaming Queries on the Fly: The impossible Trinity Yingjun Wu Founder, RisingWave Labs
  • 2. Who Am I? 2 • Yingjun Wu • Founder @ RisingWave • Ex-AWS Redshift • Ex-IBM Research Almaden • PhD, National University of Singapore • Visiting PhD, Carnegie Mellon University
  • 3. What Am I Doing? • Building RisingWave, an open-source SQL streaming database • Extremely easy • Extremely cost-efficient 3 Adopted in hundreds of enterprises and SMBs! Streaming analytics Real-time monitoring/alerting Real-time ETL Database replication …
  • 4. SQL Stream Processing 4 Operational databases Messaging queues BI tools Stream processing systems
  • 5. SQL Stream Processing 5 Operational databases Messaging queues BI tools Stream processing systems
  • 6. SQL Stream Processing 6 • Continuously monitor crypto data
  • 7. SQL Stream Processing 7 • Continuously monitor crypto data
  • 8. SQL Stream Processing 8 • Continuously monitor crypto data
  • 9. SQL Stream Processing 9 • Continuously monitor crypto data
  • 10. Challenges • Queries may be modified from time to time! 10
  • 11. Challenges • Queries may be modified from time to time! • Upstream data schema may change 11 Column name Column type product_id VARCHAR price NUMERIC open_24h NUMERIC volume_24h TIMESTAMP Column name Column type product_id VARCHAR price NUMERIC open_12h NUMERIC volume_24h TIMESTAMP Schema change!
  • 12. Challenges • Queries may be modified from time to time! • Upstream data schema may change • Business logic may change 12 5 60 Logic change!
  • 13. Challenges • Queries may be modified from time to time! • Upstream data schema may change • Business logic may change • Not a big problem for batch queries… 13 Query 1 Query 2
  • 14. Challenges • Queries may be modified from time to time! • Upstream data schema may change • Business logic may change • Not a big problem for batch queries… • But a HUGE problem for streaming queries! 14 Continuous query processing!
  • 15. State Management • Consider time window functions 15 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 Input progress 96
  • 16. State Management • Consider time window functions 16 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o7 i98 i97 o6 Updated! Add new values and evict expired values Input progress 98
  • 17. State Management • Consider time window functions 17 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o7 i98 i97 o6 o8 i99 Input progress 99 Updated! Add new values and evict expired values
  • 18. State Management • Consider time window functions 18 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 Input progress 96 Let’s modify the SQL query!
  • 19. State Management • Consider time window functions 19 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 Let’s modify the SQL query!
  • 20. State Management • Consider time window functions 20 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); State Input progress 96
  • 21. State Management • Consider time window functions 21 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); State Input progress 96
  • 22. The Native Solution • Recompute! 22
  • 23. The Native Solution • Recompute! 23 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96
  • 24. The Native Solution • Recompute! 24 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 Latest progress 96 Mark the progress
  • 25. The Native Solution • Recompute! 25 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 0 Latest progress 96 Mark the progress Drop the entire state
  • 26. The Native Solution • Recompute! 26 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 0 Latest progress 96 State Mark the progress Drop the entire state Rebuild from scratch
  • 27. The Native Solution • Recompute! 27 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 2 Latest progress 96 State i2 i1 Updated!! Mark the progress Drop the entire state Rebuild from scratch
  • 28. The Native Solution • Recompute! 28 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 2 Latest progress 96 State i2 i1 Mark the progress Drop the entire state Rebuild from scratch Updated!!
  • 29. The Native Solution • Recompute! 29 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 Latest progress 96 State i96 i2 i1 … o2 o1 … o6 Mark the progress Drop the entire state Rebuild from scratch Updated!!
  • 30. The Native Solution • Recompute! 30 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 Latest progress 96 State i96 i2 i1 … o2 o1 … o6 Mark the progress Drop the entire state Rebuild from scratch Continue processing Updated!!
  • 31. The Native Solution • Recompute! 31 SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); Input progress 96 Latest progress 96 State Updated!! i96 i2 i1 … o2 o1 … o6 Mark the progress Drop the entire state Rebuild from scratch Continue processing
  • 33. The Ideal Solution • Reuse! 33 State Old query New query State i96 i2 i1 o2 o1 … … o6
  • 34. The Ideal Solution • Reuse! 34 State Old query New query State Reuse? i96 i2 i1 o2 o1 … … o6
  • 35. The Ideal Solution • Reuse! 35 State State i96 i2 i1 o2 o1 … … o6 Internal state Entry 1 Entry 2 … Entry 5 … Entry 10 Internal state Entry 1 Entry 2 … Entry 5 SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5;
  • 36. The Ideal Solution • Reuse! 36 State State i96 i2 i1 o2 o1 … … o6 Internal state Entry 1 Entry 2 … Entry 5 … Entry 10 Internal state Entry 1 Entry 2 … Entry 5 Truncate! SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5; Reuse!
  • 37. The Ideal Solution • Reuse! 37 State State i96 i2 i1 o2 o1 … … o6 Internal state Entry 1 Entry 2 … Entry 5 … Entry 10 Internal state Entry 1 Entry 2 … Entry 5 Truncate! Reuse! SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5;
  • 38. The Ideal Solution • Reuse! 38 State i96 i2 i1 o2 o1 … … o6 Detect reusable patterns SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5;
  • 39. The Ideal Solution • Reuse! 39 State i96 i2 i1 o2 o1 … … o6 Detect reusable patterns Pattern matching! SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5;
  • 40. The Ideal Solution • Reuse! 40 SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5; TABLE FILTER WHERE … TABLE FILTER WHERE … TOP N TOP N ORDER BY … LIMIT = 10 ORDER BY … LIMIT = 5 Stateful Stateless Pattern matching! Detect reusable patterns
  • 41. The Ideal Solution • Reuse! 41 State i96 i2 i1 o2 o1 … … o6 Detect reusable patterns Modify internal state Internal state Entry 1 Entry 2 … Entry 5 … Entry 10 Truncate! SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5;
  • 42. The Ideal Solution • Reuse! 42 State SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5; i96 i2 i1 o2 o1 … … o6 Detect reusable patterns Modify internal state Continue processing Internal state Entry 1 Entry 2 … Entry 5
  • 43. The Ideal Solution • Reuse! 43 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); State
  • 44. The Ideal Solution • Reuse! 44 i96 i2 i1 o2 o1 State … … SELECT … FROM TUMBLE(crypto_source, …, interval ‘5’ second); o6 SELECT … FROM TUMBLE(crypto_source, …, interval ‘60’ second); State
  • 45. The Ideal Solution • Dozens of SQL clauses • Unlimited number of combinations… 45 WHERE LIMIT UNION HAVING DISTINCT ORDER BY JOIN GROUP BY ……
  • 46. The Real World 46 Recompute! Reuse! General but inefficient Efficient but not general
  • 51. The Impossible Trinity 51 Efficiency Generality Consistency • Sacrifice efficiency! It’s impossible to achieve all three!
  • 52. The Impossible Trinity 52 Efficiency Generality Consistency • Sacrifice generality! It’s impossible to achieve all three!
  • 53. The Impossible Trinity 53 Efficiency Generality Consistency It’s impossible to achieve all three! • Sacrifice consistency!
  • 55. The Impossible Trinity 55 Efficiency Generality Consistency Option: Sacrifice efficiency. Strategy: Always recompute. Pros: General; easy to implement. Cons: Can take a very long time to recompute.
  • 56. The Impossible Trinity 56 Efficiency Generality Consistency Option: Sacrifice generality. Strategy: Do our best to reuse. Pros: Efficient. Cons: Very difficult to implement; very limited.
  • 57. The Impossible Trinity 57 Efficiency Generality Consistency Option: Sacrifice consistency. Strategy: Try to reuse whenever possible. Pros: General enough and also efficient. Cons: May compromise consistency.
  • 59. The Impossible Trinity 59 Pattern matching! SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5; TABLE FILTER WHERE … TABLE FILTER WHERE … TOP N TOP N ORDER BY … LIMIT = 10 ORDER BY … LIMIT = 5 Stateful Stateless Query plan “Context”
  • 61. The Impossible Trinity 61 Pattern matching! Internal state Entry 1 Entry 2 … Entry 5 … Entry 10 SELECT … FROM WHERE … ORDER BY … LIMIT 10; SELECT … FROM WHERE … ORDER BY … LIMIT 5; TABLE FILTER TABLE FILTER TOP N TOP N Stateful Stateless Directly reuse without editing!
  • 62. The Impossible Trinity 62 Pattern matching! TABLE TABLE TUMBLE TUMBLE Stateful SELECT … FROM TUMBLE(crypto_source, …, interval '5' second); SELECT … FROM TUMBLE(crypto_source, …, interval '60' second);
  • 64. Our Implementation 64 • Principle: let the users decide!
  • 65. Our Implementation 65 • Principle: let the users decide! Old query New query
  • 66. Our Implementation 66 • Principle: let the users decide! Old query New query Is there a common pattern?
  • 67. Our Implementation 67 • Principle: let the users decide! Old query New query Is there a common pattern? Recompute! Can we preserve consistency? Yes No
  • 68. Our Implementation 68 • Principle: let the users decide! Old query New query Is there a common pattern? “Are you okay to sacrifice consistency?” Recompute! Can we preserve consistency? Reuse! Yes No Yes No
  • 69. Our Implementation 69 • Principle: let the users decide! Old query New query Is there a common pattern? “Are you okay to sacrifice consistency?” Recompute! Can we preserve consistency? Reuse! Recompute! Reuse! (Inconsistent!) Yes No Yes No Yes No
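The flowchart built up on slides 65 to 69 can be read as a three-question decision procedure. The sketch below encodes one reading of that flattened diagram; the function name, the `Action` enum, and the three inputs are hypothetical, and how a real system answers the first two questions is outside its scope.

```python
from enum import Enum


class Action(Enum):
    REUSE = "reuse"
    REUSE_INCONSISTENT = "reuse (inconsistent)"
    RECOMPUTE = "recompute"


def plan_modification(has_common_pattern, can_preserve_consistency, ask_user):
    """Pick a strategy for replacing an old streaming query with a new one.

    `ask_user` is a callable so the user is prompted only when the first
    two checks cannot settle the question.
    """
    if not has_common_pattern:
        return Action.RECOMPUTE           # nothing to reuse
    if can_preserve_consistency:
        return Action.REUSE               # best case: reuse state safely
    # "Are you okay to sacrifice consistency?"
    if ask_user():
        return Action.REUSE_INCONSISTENT
    return Action.RECOMPUTE
```

For example, `plan_modification(True, False, lambda: False)` falls through to recomputation, because a common pattern exists but the user declined to give up consistency.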
  • 70. Optimization • Best case: reuse states without compromising consistency • Worst case: recompute from scratch 70 Can we optimize the worst-case scenario?
  • 71. Optimization 71 SELECT … FROM TUMBLE(crypto_source, …, interval '5' second); SELECT … FROM TUMBLE(crypto_source, …, interval '60' second); Input progress 96 Latest progress 96 State i96 i2 i1 … o2 o1 … o6 Drop the entire state Rebuild from scratch Continue processing • Recompute! Updated!!
  • 72. Optimization 72 SELECT … FROM TUMBLE(crypto_source, …, interval '5' second); SELECT … FROM TUMBLE(crypto_source, …, interval '60' second); Input progress 96 Latest progress 96 State i96 i2 i1 … o2 o1 … o6 Drop the entire state Rebuild from scratch Continue processing • Recompute! Scale out!!! Updated!!
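The recompute path (drop the entire state, rebuild from scratch, continue processing) can be illustrated with a toy tumbling-window count. This sketch assumes the input history is retained and replayable, uses integer timestamps in seconds, and counts events per window as a stand-in for whatever the elided SELECT computes; `tumble_counts` is a hypothetical helper.

```python
from collections import defaultdict


def tumble_counts(events, window_seconds, state=None):
    """Count events per tumbling window, optionally continuing from `state`."""
    state = defaultdict(int) if state is None else state
    for ts in events:
        window_start = ts - ts % window_seconds  # start of the window holding ts
        state[window_start] += 1
    return state


history = [1, 3, 4, 6, 12, 58, 61, 65]  # retained input, available for replay
old_state = tumble_counts(history, 5)    # old query: 5-second windows

# Query modified to 60-second windows:
state = tumble_counts(history, 60)            # drop old state, rebuild from scratch
state = tumble_counts([119, 120], 60, state)  # continue processing new input
```

The cost of the rebuild grows with the length of the retained history, which is why the worst case is worth optimizing.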
  • 73. Optimization 73 • “S3 as primary storage” State State State States State State State Compute nodes Persistent storage States Checkpoint Cache Cache Cache “state as checkpoint” Traditional stream processing systems State State State
  • 74. Optimization 74 • Scale out during the recomputation (backfilling) stage [Diagram: compute nodes holding caches; persistent storage holding states]
  • 75. Optimization 75 • Scale out during the recomputation (backfilling) stage [Diagram: more compute nodes holding caches; persistent storage holding states; "Scale out to recompute"]
  • 76. Optimization 76 • Scale in once recomputation is done [Diagram: fewer compute nodes holding caches; persistent storage holding states; "Scale in after recomputation"]
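The scale-out idea can be mimicked on one machine: split the retained input across a temporary pool of workers for the backfill, merge their partial results, and let the pool shrink afterwards. In this sketch threads stand in for compute nodes, which is an assumption made purely for illustration, and `backfill` and `backfill_partition` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def backfill_partition(events, window_seconds):
    """Count events per tumbling window for one slice of the input."""
    return Counter(ts - ts % window_seconds for ts in events)


def backfill(history, window_seconds, workers):
    """Recompute window counts with `workers` parallel workers."""
    # One slice of the retained input per worker.
    partitions = [history[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:  # scale out
        partials = pool.map(backfill_partition, partitions,
                            [window_seconds] * workers)
        merged = Counter()
        for partial in partials:
            merged.update(partial)  # Counter.update adds the counts
    # Leaving the `with` block releases the extra workers (scale in).
    return merged


history = [1, 3, 4, 6, 12, 58, 61, 65, 119, 120]
result = backfill(history, 60, workers=4)
```

The merge step is cheap here because counts are additive; aggregates that cannot be combined from partial results would need a different partitioning scheme.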
  • 78. More Challenges 78 Materialized view Materialized view Materialized view Materialized view • Stacked materialized views, or dependent streaming jobs!
  • 79. More Challenges 79 Materialized view Materialized view Materialized view Materialized view • Stacked materialized views, or dependent streaming jobs! • What if an upstream materialized view gets changed?
  • 81. Conclusion • Modifying SQL streaming queries on the fly can cause trouble • We can choose only two of generality, efficiency, and consistency • Optimizations can be applied to greatly accelerate query recomputation 81