Sergei Petrunia
Vicentiu Ciorbaru
Window functions
in MariaDB
2
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
4
Scalar functions
select
concat(Name, ' in ', Country)
from Cities
Peking in CHN
Berlin in DEU
Moscow in RUS
Chicago in USA
+------+---------+---------+
| ID | Name | Country |
+------+---------+---------+
| 1891 | Peking | CHN |
| 3068 | Berlin | DEU |
| 3580 | Moscow | RUS |
| 3795 | Chicago | USA |
+------+---------+---------+
• Compute values based on the current row
5
Aggregate functions
• Compute summary for the group
• Group is collapsed into summary row
select
country, sum(Population) as total
from Cities
group by country
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+---------+----------+
| country | total |
+---------+----------+
| DEU | 4030488 |
| RUS | 8389200 |
| USA | 11467668 |
+---------+----------+
6
Window functions
• Function is computed over an ordered partition (=group)
• Groups are not collapsed
select
name,
rank() over (partition by country
order by population desc)
from cities
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+-----------+------+
| name | rank |
+-----------+------+
| Berlin | 1 |
| Frankfurt | 2 |
| Moscow | 1 |
| New York | 1 |
| Chicago | 2 |
| Seattle | 3 |
+-----------+------+
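A rough Python emulation of the query above may help show what "not collapsed" means: every input row still yields one output row, and the rank restarts in each partition. This is only a sketch of the semantics, not how MariaDB evaluates the query; the function name rank_by_population is made up for illustration.

```python
from itertools import groupby

# Rows copied from the slide: (name, country, population).
cities = [
    ("Berlin", "DEU", 3386667),
    ("Frankfurt", "DEU", 643821),
    ("Moscow", "RUS", 8389200),
    ("New York", "USA", 8008278),
    ("Chicago", "USA", 2896016),
    ("Seattle", "USA", 563374),
]

def rank_by_population(rows):
    """Emulate rank() over (partition by country order by population desc)."""
    # Sort by the partition key first, then by population, largest first.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    result = []
    for _country, partition in groupby(ordered, key=lambda r: r[1]):
        rank, previous = 0, None
        for position, (name, _c, population) in enumerate(partition, start=1):
            # Rows with equal population (peers) keep the same rank.
            if population != previous:
                rank, previous = position, population
            result.append((name, rank))
    return result

ranked = rank_by_population(cities)
for name, rank in ranked:
    print(f"{name:<10} {rank}")
```

The output has six rows, one per city, unlike the three summary rows of the GROUP BY query.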
8
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
9
Basic Window Functions
select
name, incidents
from
support_staff
+----------+-----------+
| name | incidents |
+----------+-----------+
| Claudio | 10 |
| Valeriy | 9 |
| Daniel | 9 |
| Geoff | 9 |
| Stephane | 8 |
+----------+-----------+
10
row_number()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
+----------+-----------+---------+
| name | incidents | ROW_NUM |
+----------+-----------+---------+
| Claudio | 10 | 1 |
| Valeriy | 9 | 2 |
| Daniel | 9 | 3 |
| Geoff | 9 | 4 |
| Stephane | 8 | 5 |
+----------+-----------+---------+
11
rank()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK
from
support_staff
+----------+-----------+---------+------+
| name | incidents | ROW_NUM | RANK |
+----------+-----------+---------+------+
| Claudio | 10 | 1 | 1 |
| Valeriy | 9 | 2 | 2 |
| Daniel | 9 | 3 | 2 |
| Geoff | 9 | 4 | 2 |
| Stephane | 8 | 5 | 5 |
+----------+-----------+---------+------+
12
dense_rank()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK,
dense_rank() over (order by incidents desc) as DENSE_R
from
support_staff
+----------+-----------+---------+------+---------+
| name | incidents | ROW_NUM | RANK | DENSE_R |
+----------+-----------+---------+------+---------+
| Claudio | 10 | 1 | 1 | 1 |
| Valeriy | 9 | 3 | 2 | 2 |
| Daniel | 9 | 4 | 2 | 2 |
| Geoff | 9 | 2 | 2 | 2 |
| Stephane | 8 | 5 | 5 | 3 |
+----------+-----------+---------+------+---------+
13
ntile(n)
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK,
dense_rank() over (order by incidents desc) as DENSE_R,
ntile(4) over (order by incidents desc) as QUARTILE
from
support_staff
+----------+-----------+---------+------+---------+----------+
| name | incidents | ROW_NUM | RANK | DENSE_R | QUARTILE |
+----------+-----------+---------+------+---------+----------+
| Claudio | 10 | 1 | 1 | 1 | 1 |
| Valeriy | 9 | 2 | 2 | 2 | 1 |
| Daniel | 9 | 3 | 2 | 2 | 2 |
| Geoff | 9 | 4 | 2 | 2 | 3 |
| Stephane | 8 | 5 | 5 | 3 | 4 |
+----------+-----------+---------+------+---------+----------+
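The QUARTILE column follows from how ntile(n) sizes its buckets: when the rows do not divide evenly, the first buckets each get one extra row. A small Python sketch of that rule (the helper name ntile_buckets is an assumption for illustration, not a MariaDB API):

```python
def ntile_buckets(n_buckets, n_rows):
    """Return the 1-based bucket number for each of n_rows ordered rows."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

# Five support_staff rows split into four tiles, as on the slide.
quartiles = ntile_buckets(4, 5)
print(quartiles)
```

With five rows and four buckets, only the first bucket holds two rows, which is why Claudio and Valeriy share quartile 1.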
14
Conclusions so far
• Window functions are similar to aggregates
• Computed on (current_row, ordered_list(window_rows))
• Can compute relative standing of a row with respect to other rows
• RANK, DENSE_RANK, ...
15
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
16
Framed window functions
• Some Window Functions use FRAMES
– e.g. Aggregates that are used as
window functions
• Window function is computed on rows
in the frame.
• Frame is inside PARTITION BY
• Frame moves with the current row
• There are various frame types
17
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data
FROM sensor_data;
18
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
)
FROM sensor_data;
19
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
)
FROM sensor_data;
20
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
ROWS BETWEEN
3 PRECEDING AND
3 FOLLOWING )
FROM sensor_data;
21
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
ROWS BETWEEN
6 PRECEDING AND
6 FOLLOWING )
FROM sensor_data;
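The effect of a ROWS BETWEEN n PRECEDING AND n FOLLOWING frame can be sketched in Python as a centered moving average. This is an emulation of the frame semantics only; the readings in raw_data are hypothetical and the input is assumed to be already ordered by time.

```python
def moving_average(values, preceding, following):
    """Emulate AVG(x) OVER (ORDER BY t ROWS BETWEEN p PRECEDING AND f FOLLOWING)."""
    smoothed = []
    for i in range(len(values)):
        # The frame is clipped at the start and end of the partition.
        start = max(0, i - preceding)
        end = min(len(values), i + following + 1)
        frame = values[start:end]
        smoothed.append(sum(frame) / len(frame))
    return smoothed

raw_data = [10, 12, 30, 11, 9, 10, 13]  # hypothetical sensor readings
print(moving_average(raw_data, 3, 3))
print(moving_average(raw_data, 6, 6))
```

A wider frame averages over more neighbours, so it smooths more aggressively; rows near the edges see a shorter frame.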
22
Account balance statement
• Generate balance sheet for bank account.
• Incoming transactions.
• Outgoing transactions.
+-----+----------+--------+
| tid | date | amount |
+-----+----------+--------+
| 1 | 20160401 | 2000 |
| 2 | 20160402 | -30.5 |
| 3 | 20160404 | -45.5 |
| 4 | 20160405 | -125.5 |
| 5 | 20160406 | 100.3 |
+-----+----------+--------+
select tid, date, amount
from transactions
where account_id = 12345;
23
Account balance statement
SELECT tid, date, amount
FROM transactions
WHERE account_id = 12345;
+-----+----------+--------+
| tid | date | amount |
+-----+----------+--------+
| 1 | 20160401 | 2000 |
| 2 | 20160402 | -30.5 |
| 3 | 20160404 | -45.5 |
| 4 | 20160405 | -125.5 |
| 5 | 20160406 | 100.3 |
+-----+----------+--------+
24
Account balance statement
SELECT tid, date, amount,
( SELECT SUM(amount)
FROM transactions t
WHERE t.date <= o.date AND
t.account_id = 12345 ) AS balance
FROM transactions o
WHERE o.account_id = 12345;
+-----+----------+--------+----------+
| tid | date | amount | balance |
+-----+----------+--------+----------+
| 1 | 20160401 | 2000 | 2000 |
| 2 | 20160402 | -30.5 | 1969.5 |
| 3 | 20160404 | -45.5 | 1924 |
| 4 | 20160405 | -125.5 | 1798.5 |
| 5 | 20160406 | 100.3 | 1898.8 |
+-----+----------+--------+----------+
25
Account balance statement
SELECT tid, date, amount,
SUM(amount) OVER (ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW) AS balance
FROM transactions
WHERE account_id = 12345;
+-----+----------+--------+----------+
| tid | date | amount | balance |
+-----+----------+--------+----------+
| 1 | 20160401 | 2000 | 2000 |
| 2 | 20160402 | -30.5 | 1969.5 |
| 3 | 20160404 | -45.5 | 1924 |
| 4 | 20160405 | -125.5 | 1798.5 |
| 5 | 20160406 | 100.3 | 1898.8 |
+-----+----------+--------+----------+
26
Account balance statement
• How do queries compare?
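+--------+---------------+------------------+
| # Rows | Regular SQL   | Window Functions |
+--------+---------------+------------------+
| 100    | 3.72 sec      | 0.01 sec         |
| 500    | 30.04 sec     | 0.01 sec         |
| 1000   | 59.6 sec      | 0.02 sec         |
| 2000   | 1 min 59 sec  | 0.03 sec         |
| 4000   | 4 min 1 sec   | 0.04 sec         |
| 16000  | 18 min 26 sec | 0.18 sec         |
+--------+---------------+------------------+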
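The two queries differ in shape: the correlated subquery re-sums earlier rows for every row, while the window function can carry a running total. The Python sketch below mirrors only that difference in shape, using the five transactions from the slides; it makes no claim about the timings measured above, and both function names are made up for illustration.

```python
from decimal import Decimal
from itertools import accumulate

# Amounts from the slide, ordered by date. Decimal avoids float rounding noise.
amounts = [Decimal(a) for a in ("2000", "-30.5", "-45.5", "-125.5", "100.3")]

def balance_resummed(amounts):
    """Like the correlated subquery: sum all rows up to the current one, per row."""
    return [sum(amounts[: i + 1]) for i in range(len(amounts))]

def balance_running(amounts):
    """Like SUM(...) OVER (... UNBOUNDED PRECEDING AND CURRENT ROW): one pass."""
    return list(accumulate(amounts))

for balance in balance_running(amounts):
    print(balance)
```

Both functions return the same balances; the first does work proportional to the square of the row count, the second to the row count.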
27
RANGE-type frames
• Useful when interval of interest has multiple/missing rows
• ORDER BY column -- one numeric column
• RANGE n PRECEDING
rows with R.column >= (current_row.column – n)
• RANGE n FOLLOWING
rows with R.column <= (current_row.column + n)
• CURRENT ROW
current row and rows with R.column = current_row.column
28
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | |
| 20160410 | wine | 4 | |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
30
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
31
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | 18 |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
32
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | 18 |
| 20160410 | snack | 12 | 18 |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
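The sum column above can be reproduced with a short Python emulation of the RANGE frame. It is a sketch of the semantics, not of MariaDB's implementation, and it treats exp_date as a plain integer exactly as the slide does (which would not be correct across month boundaries). The function name range_sum is an assumption.

```python
# Rows from the slide: (exp_date, name, amount).
expenses = [
    (20160407, "bus", 4),
    (20160409, "beer", 2),
    (20160410, "wine", 4),
    (20160410, "snack", 12),
]

def range_sum(rows, preceding):
    """Emulate SUM(amount) OVER (ORDER BY exp_date
    RANGE BETWEEN n PRECEDING AND CURRENT ROW)."""
    sums = []
    for cur_date, _name, _amount in rows:
        # Frame membership depends on the ORDER BY value, not on row position,
        # so rows sharing the current date (peers) are always included.
        total = sum(a for d, _n, a in rows
                    if cur_date - preceding <= d <= cur_date)
        sums.append(total)
    return sums

print(range_sum(expenses, 1))
```

Both rows dated 20160410 get the same sum because they are peers of each other; a ROWS frame would treat them differently.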
33
Date columns with RANGE-type frames
• Date columns and temporal intervals (MDEV-9727)
AVG(value) OVER (ORDER BY date_col
RANGE BETWEEN INTERVAL 1 MONTH PRECEDING
AND INTERVAL 1 MONTH FOLLOWING)
• SQL Standard allows this
• Not supported by PostgreSQL or MS SQL Server
• Intend to support in MariaDB.
34
FRAME syntax
• ROWS|RANGE PRECEDING|FOLLOWING:
35
Frames summary
• Some window functions use frames
– e.g. Aggregate functions used as window functions
• Frame moves with the current row
• RANGE/ROWS-type frames
– MariaDB supports all kinds
• Useful for
– Cumulative sums
– Running averages
– Getting aggregates without doing GROUP BY
36
The Island problem
• Given a set of ordered integers, find the start and end of
sequences that have no missing numbers.
Ex: 2, 3, 10, 11, 12, 15, 16, 17
• A common problem, with plenty of use cases:
– Used in sales to identify activity periods.
– Detecting outages.
– Stock market analysis.
37
The Island problem
SELECT value
FROM islands
ORDER BY value;
+-------+
| value |
+-------+
| 2 |
| 3 |
| 10 |
| 11 |
| 12 |
| 15 |
| 16 |
| 17 |
+-------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
38
The Island problem
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | a |
| 3 | a |
| 10 | b |
| 11 | b |
| 12 | b |
| 15 | c |
| 16 | c |
| 17 | c |
+-------+------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
39
The Island problem
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | a |
| 3 | a |
| 10 | b |
| 11 | b |
| 12 | b |
| 15 | c |
| 16 | c |
| 17 | c |
+-------+------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
SELECT MIN(value) AS start_range,
MAX(value) AS end_range
FROM islands
GROUP BY grp;
40
The Island problem – generating the groups
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 3 |
| 3 | 3 |
| 10 | 12 |
| 11 | 12 |
| 12 | 12 |
| 15 | 17 |
| 16 | 17 |
| 17 | 17 |
+-------+------+
42
The Island problem – generating the groups
SELECT value,
( SELECT MIN(B.value)
FROM islands AS B
WHERE B.value >= A.value
AND NOT EXISTS
( SELECT *
FROM islands AS C
WHERE C.value = B.value + 1)
) AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 3 |
| 3 | 3 |
| 10 | 12 |
| 11 | 12 |
| 12 | 12 |
| 15 | 17 |
| 16 | 17 |
| 17 | 17 |
+-------+------+
43
The Island problem – generating the groups
SELECT value,
ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 1 |
| 3 | 2 |
| 10 | 3 |
| 11 | 4 |
| 12 | 5 |
| 15 | 6 |
| 16 | 7 |
| 17 | 8 |
+-------+------+
44
The Island problem – generating the groups
SELECT value,
value - ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 1 |
| 3 | 1 |
| 10 | 7 |
| 11 | 7 |
| 12 | 7 |
| 15 | 9 |
| 16 | 9 |
| 17 | 9 |
+-------+------+
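The reason value - ROW_NUMBER() works as a group key is that both terms grow by one inside a run of consecutive integers, so their difference stays constant. A Python sketch of the same idea, assuming the values are distinct and already sorted (the function name islands is chosen for illustration):

```python
from itertools import groupby

values = [2, 3, 10, 11, 12, 15, 16, 17]

def islands(sorted_values):
    """Return (start_range, end_range) for each run of consecutive integers."""
    numbered = enumerate(sorted_values, start=1)  # pairs of (row_number, value)
    ranges = []
    # value - row_number is constant within a run of consecutive values.
    for _grp, run in groupby(numbered, key=lambda pair: pair[1] - pair[0]):
        run_values = [value for _row_number, value in run]
        # MIN and MAX of a sorted run are its first and last elements.
        ranges.append((run_values[0], run_values[-1]))
    return ranges

print(islands(values))
```

The group keys computed here are 1, 7 and 9, the same grp values as in the table above.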
45
The Island problem – generating the groups
SELECT value,
value - ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
SELECT value,
( SELECT MIN(B.value)
FROM islands AS B
WHERE B.value >= A.value
AND NOT EXISTS
(SELECT *
FROM islands AS C
WHERE C.value = B.value + 1)
) AS grp
FROM islands as A
ORDER BY value;
46
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
47
Window Functions and other SQL constructs
• Can have WIN_FUNC(AGG_FUNC)
[Diagram: Join → Group → Check HAVING → Compute Window Functions → DISTINCT → Sort + Limit]
• Window functions can appear in
– SELECT list
– ORDER BY clause
48
Filtering on window function value
• How to filter for e.g. RANK() < 3 ? Use a subquery.
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
49
Filtering on window function value
• How to filter for e.g. RANK() < 3 ? Use a subquery.
select * from (
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
) as TBL
where TBL.ROW_NUM < 3
50
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
51
Computing Window functions
[Diagram: table, table, table → join → group]
• Join, grouping
• All partitions are mixed together
52
Computing Window functions
• Put join output into a temporary table
• Sort it by (PARTITION BY clause, ORDER BY clause)
[Diagram: table, table, table → join → group → sort (by PARTITION BY clause, ORDER BY clause)]
53
Computing window function for a row
• Can look at
– Current row
– Rows in the partition, ordered
• Can compute the window function
• Computing values individually would
be expensive
– O(#rows_in_partition ^ 2)
54
“Streamable” window functions
• ROW_NUMBER, RANK, DENSE_RANK, ...
– Can walk down and compute values on
the fly
• NTILE, CUME_DIST, PERCENT_RANK
– Get #rows in the partition
– Then walk down and compute values on
the fly.
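Once the rows are sorted, the ranking functions need only the previous row's ORDER BY value to decide whether a new peer group starts. A single-pass Python sketch under that assumption (the generator name streaming_ranks is made up; this illustrates the idea, not MariaDB's code):

```python
def streaming_ranks(sorted_keys):
    """Yield (row_number, rank, dense_rank) in one pass over ordered keys."""
    rank = dense_rank = 0
    previous = object()  # sentinel that compares unequal to every real key
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != previous:
            # A new peer group starts: rank jumps to the row number,
            # dense_rank advances by exactly one.
            rank = row_number
            dense_rank += 1
            previous = key
        yield row_number, rank, dense_rank

incidents = [10, 9, 9, 9, 8]  # support_staff, already sorted by incidents desc
for row in streaming_ranks(incidents):
    print(row)
```

NTILE, CUME_DIST and PERCENT_RANK cannot be produced this way alone, because they also depend on the total number of rows in the partition.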
55
Computing framed window functions
• window_func(rows_in_the_frame)
• Frame moves with the current row
56
Computing framed window functions
[Diagram: as the frame moves, the value 20 leaves and the value 10 enters, so $total becomes $total+10-20]
• window_func(rows_in_the_frame)
• Frame moves with the current row
• Some functions allow adding and removing rows
– SUM, COUNT, AVG, BIT_OR, BIT_*
• Can compute efficiently
– Done in MariaDB 10.2.0.
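The add-and-remove idea can be sketched in Python for SUM over a ROWS frame: each row enters the running total once and leaves it once, so the frame is never re-scanned. This is a minimal illustration with an invented function name, not MariaDB's implementation.

```python
def sliding_sums(values, preceding, following):
    """SUM over ROWS BETWEEN p PRECEDING AND f FOLLOWING, updated incrementally."""
    n = len(values)
    sums = []
    total = 0
    lo = hi = 0  # the current frame is values[lo:hi]
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:  # rows entering the frame are added
            total += values[hi]
            hi += 1
        while lo < new_lo:  # rows leaving the frame are subtracted
            total -= values[lo]
            lo += 1
        sums.append(total)
    return sums

print(sliding_sums([1, 2, 3, 4, 5], 1, 1))
```

COUNT follows the same pattern, and AVG can be derived from a running SUM and COUNT.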
57
Some aggregates make streaming hard
[Diagram: sliding frame over the values 20, 21, 19, 10, labeled MAX=21 and MAX=?]
• MIN, MAX
• Need to track the whole window
– Doable for small frames
● Can also re-calculate
– Hard for bigger frames
• Are big frames used?
• Not implemented yet.
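For reference, one well-known general technique for a sliding maximum is a monotonic deque, which keeps only the rows that can still become the maximum. The Python sketch below shows that technique for a ROWS frame; it is not taken from MariaDB and says nothing about how MariaDB does or will implement MIN/MAX.

```python
from collections import deque

def sliding_max(values, preceding, following):
    """MAX over ROWS BETWEEN p PRECEDING AND f FOLLOWING using a monotonic deque."""
    n = len(values)
    candidates = deque()  # row indexes; their values are strictly decreasing
    result = []
    hi = 0  # rows values[:hi] have entered the frame so far
    for i in range(n):
        new_hi = min(n, i + following + 1)
        while hi < new_hi:
            # A row that is not larger than a newer row can never be the maximum again.
            while candidates and values[candidates[-1]] <= values[hi]:
                candidates.pop()
            candidates.append(hi)
            hi += 1
        lo = max(0, i - preceding)
        while candidates[0] < lo:  # drop rows that have left the frame
            candidates.popleft()
        result.append(values[candidates[0]])
    return result

print(sliding_max([20, 21, 19, 10], 1, 1))
```

Each row is pushed and popped at most once, so the cost stays linear in the partition size regardless of the frame width.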
58
LEAD and LAG issues
• LAG(expr, N) – “expr N rows before”
– LAG(expr,1) - previous
• Non-constant N?
• Lookups to arbitrary rows
– Expensive
– Worth doing at all?
LAG(..., 2)
59
Summary for computing window functions
• Sort by (partition_by, order_by)
• Then walk through and compute window functions
• Most functions can be computed on-the-fly
• Framed window functions require moving the frame
– SUM, COUNT, AVG .. - can update value as frame moves
– MIN, MAX – more complex
• LEAD, LAG may require random reads
60
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
61
Optimizations
Do window functions
optimizations matter?
62
A query with window functions
select
'web' as channel
,web.item
,web.return_ratio
,web.return_rank
,web.currency_rank
from (
select
item
,return_ratio
,currency_ratio
,rank() over (order by return_ratio) as return_rank
,rank() over (order by currency_ratio) as currency_rank
from
( select ws.ws_item_sk as item
,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/
cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as return_ratio
,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/
cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as currency_ratio
from
web_sales ws left outer join web_returns wr
on (ws.ws_order_number = wr.wr_order_number and
ws.ws_item_sk = wr.wr_item_sk)
,date_dim
where
wr.wr_return_amt > 10000
and ws.ws_net_profit > 1
and ws.ws_net_paid > 0
and ws.ws_quantity > 0
and ws_sold_date_sk = d_date_sk
and ws_sold_date_sk between 2452245 and 2452275
and d_year = 2001
and d_moy = 12
group by ws.ws_item_sk
) in_web
) web
where
web.return_rank <= 10 or web.currency_rank <= 10
[Diagram: table, table, table → join → group → sort → Window functions]
63
Still, there are optimizations
• Doing fewer sorts
• Condition pushdown through PARTITION BY
64
Doing fewer sorts
[Diagram: tbl, tbl, tbl → join → sort]
select
rank() over (order by incidents),
ntile(4) over (order by incidents),
rank() over (order by incidents,
join_date)
from
support_staff
• Each window function requires a sort
• Can avoid sorting if using an index (MariaDB: not yet)
• Window functions with identical PARTITION BY/ORDER BY must share the sort step
• Those with compatible PARTITION BY/ORDER BY may share the sort step
– MariaDB: yes (but has bugs at the moment)
– PostgreSQL: yes, limited
65
Condition pushdown through PARTITION BY
select * from (
select
name, dept,
rank() over (partition by dept
order by incidents desc) as R
from
staff
) as TBL
where dept='Support'
[Diagram: staff rows in partitions Development, Consulting, Support; sort]
66
Condition pushdown through PARTITION BY
• Other databases have this
• In MariaDB, requires:
MDEV-9197: Pushdown conditions into non-mergeable views/
derived tables
MDEV-7486: Condition pushdown from HAVING into WHERE
• These are 10.2 tasks too
• Considering it
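Why the pushdown is safe can be sketched in Python: a condition on the PARTITION BY column removes whole partitions, and the window function value of a row depends only on its own partition. The staff rows below are hypothetical, and rank_per_dept is an illustrative emulation rather than anything from MariaDB.

```python
from itertools import groupby

staff = [  # hypothetical rows: (name, dept, incidents)
    ("Ann", "Support", 7),
    ("Bob", "Development", 3),
    ("Cid", "Support", 9),
    ("Dee", "Consulting", 5),
    ("Eve", "Development", 8),
]

def rank_per_dept(rows):
    """Emulate rank() over (partition by dept order by incidents desc)."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    out = []
    for _dept, partition in groupby(ordered, key=lambda r: r[1]):
        rank, previous = 0, None
        for position, (name, dept, incidents) in enumerate(partition, start=1):
            if incidents != previous:
                rank, previous = position, incidents
            out.append((name, dept, rank))
    return out

# Filter after ranking every partition (the query as written).
late = [row for row in rank_per_dept(staff) if row[1] == "Support"]
# Filter first, then sort and rank only the surviving partition (pushed down).
early = rank_per_dept([row for row in staff if row[1] == "Support"])
print(late == early)
```

The same rewrite would not be valid for a condition on a column that is not part of PARTITION BY, since removing rows inside a partition changes the ranks of the remaining rows.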
67
Optimizations summary
• Not much need/room for optimizations in many cases
– Window function is a small part of the query
• Optimizations to have
– Share the sort across window functions (have [bugs])
– Condition pushdown through PARTITION BY
● Depends on another 10.2 task
● Want to have it
68
Conclusions
• Window functions coming in MariaDB 10.2!
• Already have ~SQL:2003 level features
• Intend to have ~SQL:2011 features
– Comparable with “big 3” databases
• Work on optimizations is in progress
– Send us your cases.
69
Thanks
Q & A

Window functions in MariaDB 10.2

  • 2. 2 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 3. 3 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 4. 4 Scalar functions select concat(Name, ' in ', Country) from Cities Peking in CHN Berlin in DEU Moscow in RUS Chicago in USA +------+---------+---------+ | ID | Name | Country | +------+---------+---------+ | 1891 | Peking | CHN | | 3068 | Berlin | DEU | | 3580 | Moscow | RUS | | 3795 | Chicago | USA | +------+---------+---------+ • Compute values based on the current row
  • 5. 5 Aggregate functions • Compute summary for the group • Group is collapsed into summary row select country, sum(Population) as total from Cities group by country +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +---------+----------+ | country | total | +---------+----------+ | DEU | 4030488 | | RUS | 8389200 | | USA | 11467668 | +---------+----------+
  • 6. 6 Window functions • Function is computed over an ordered partition (=group) • Groups are not collapsed select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  • 7. 7 Window functions • Function is computed over an ordered partition (=group) • Groups are not collapsed select name, rank() over (partition by country, order by population desc) from cities +-----------+---------+------------+ | name | country | population | +-----------+---------+------------+ | Berlin | DEU | 3386667 | | Frankfurt | DEU | 643821 | | Moscow | RUS | 8389200 | | New York | USA | 8008278 | | Chicago | USA | 2896016 | | Seattle | USA | 563374 | +-----------+---------+------------+ +-----------+------+ | name | rank | +-----------+------+ | Berlin | 1 | | Frankfurt | 2 | | Moscow | 1 | | New York | 1 | | Chicago | 2 | | Seattle | 3 | +-----------+------+
  • 8. 8 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 9. 9 Basic Window Functions select name, incidents from support_staff +----------+-----------+ | name | incidents | +----------+-----------+ | Claudio | 10 | | Valeriy | 9 | | Daniel | 9 | | Geoff | 9 | | Stephane | 8 | +----------+-----------+
  • 10. 10 row_number() select name, incidents, row_number() over (order by incidents desc) as ROW_NUM from support_staff +----------+-----------+---------+ | name | incidents | ROW_NUM | +----------+-----------+---------+ | Claudio | 10 | 1 | | Valeriy | 9 | 2 | | Daniel | 9 | 3 | | Geoff | 9 | 4 | | Stephane | 8 | 5 | +----------+-----------+---------+
  • 11. 11 rank() select name, incidents, row_number() over (order by incidents desc) as ROW_NUM, rank() over (order by incidents desc) as RANK from support_staff +----------+-----------+---------+------+ | name | incidents | ROW_NUM | RANK | +----------+-----------+---------+------+ | Claudio | 10 | 1 | 1 | | Valeriy | 9 | 2 | 2 | | Daniel | 9 | 3 | 2 | | Geoff | 9 | 4 | 2 | | Stephane | 8 | 5 | 5 | +----------+-----------+---------+------+
  • 12. 12 dense_rank() select name, incidents, row_number() over (order by incidents desc) as ROW_NUM, rank() over (order by incidents desc) as RANK, dense_rank() over (order by incidents desc) as DENSE_R from support_staff +----------+-----------+---------+------+---------+ | name | incidents | ROW_NUM | RANK | DENSE_R | +----------+-----------+---------+------+---------+ | Claudio | 10 | 1 | 1 | 1 | | Valeriy | 9 | 3 | 2 | 2 | | Daniel | 9 | 4 | 2 | 2 | | Geoff | 9 | 2 | 2 | 2 | | Stephane | 8 | 5 | 5 | 3 | +----------+-----------+---------+------+---------+
  • 13. 13 ntile(n) select name, incidents, row_number() over (order by incidents desc) as ROW_NUM, rank() over (order by incidents desc) as RANK, dense_rank() over (order by incidents desc) as DENSE_R, ntile(4) over (order by incidents desc) as QUARTILE from support_staff +----------+-----------+---------+------+---------+----------+ | name | incidents | ROW_NUM | RANK | DENSE_R | QUARTILE | +----------+-----------+---------+------+---------+----------+ | Claudio | 10 | 1 | 1 | 1 | 1 | | Valeriy | 9 | 2 | 2 | 2 | 1 | | Daniel | 9 | 3 | 2 | 2 | 2 | | Geoff | 9 | 4 | 2 | 2 | 3 | | Stephane | 8 | 5 | 5 | 3 | 4 | +----------+-----------+---------+------+---------+----------+
  • 14. 14 Conclusions so far • Window functions are similar to aggregates • Computed on (current_row, ordered_list(window_rows)) • Can compute relative standing of row wrt other rows • RANK, DENSE_RANK, ...
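The four ranking functions can be tried side by side. The sketch below is an illustration only: it assumes Python's bundled SQLite (3.25 or newer) as a stand-in for MariaDB, and the aliases row_num, rnk, dense_r and quartile are chosen here, not taken from the slides.

```python
import sqlite3

# In-memory SQLite stands in for MariaDB (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE support_staff (name TEXT, incidents INTEGER)")
conn.executemany(
    "INSERT INTO support_staff VALUES (?, ?)",
    [("Claudio", 10), ("Valeriy", 9), ("Daniel", 9), ("Geoff", 9), ("Stephane", 8)],
)

# All four functions share one window definition: ORDER BY incidents DESC.
ranking_rows = conn.execute(
    """
    SELECT name,
           incidents,
           row_number() OVER (ORDER BY incidents DESC) AS row_num,
           rank()       OVER (ORDER BY incidents DESC) AS rnk,
           dense_rank() OVER (ORDER BY incidents DESC) AS dense_r,
           ntile(4)     OVER (ORDER BY incidents DESC) AS quartile
    FROM support_staff
    ORDER BY row_num
    """
).fetchall()

for row in ranking_rows:
    print(row)
```

The order of the three rows tied at 9 incidents is not defined by the query, so row_num and quartile may be assigned differently among them from run to run or engine to engine; rnk and dense_r are the same for all three.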
  • 15. 15 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 16. 16 Framed window functions • Some Window Functions use FRAMES – e.g. Aggregates that are used as window functions • Window function is computed on rows in the frame. • Frame is inside PARTITION BY • Frame moves with the current row • There are various frame types
  • 17. 17 Smoothing Noisy Data • Noisy data acquisition solution SELECT time, raw_data FROM sensor_data;
  • 18. 18 Smoothing Noisy Data • Noisy data acquisition solution SELECT time, raw_data, AVG(raw_data) OVER ( ) FROM sensor_data;
  • 19. 19 Smoothing Noisy Data • Noisy data acquisition solution SELECT time, raw_data, AVG(raw_data) OVER ( ORDER BY time ) FROM sensor_data;
  • 20. 20 Smoothing Noisy Data • Noisy data acquisition solution SELECT time, raw_data, AVG(raw_data) OVER ( ORDER BY time ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING ) FROM sensor_data;
  • 21. 21 Smoothing Noisy Data • Noisy data acquisition solution SELECT time, raw_data, AVG(raw_data) OVER ( ORDER BY time ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING ) FROM sensor_data;
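A small runnable version of the smoothing query, as a sketch only. It assumes SQLite (via Python's sqlite3) in place of MariaDB, uses made-up readings with a single spike, and narrows the frame to 1 PRECEDING / 1 FOLLOWING so the arithmetic is easy to follow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_data (time INTEGER, raw_data REAL)")
# Hypothetical readings, invented for illustration: a flat signal with one spike.
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?)",
    [(1, 10.0), (2, 10.0), (3, 40.0), (4, 10.0), (5, 10.0)],
)

# Centered moving average: each row is averaged with its direct neighbours.
smoothed = conn.execute(
    """
    SELECT time,
           raw_data,
           AVG(raw_data) OVER (
               ORDER BY time
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS smoothed
    FROM sensor_data
    ORDER BY time
    """
).fetchall()

for row in smoothed:
    print(row)
```

At the first and last row the frame is simply shorter, so those averages cover two readings instead of three. A wider frame, as on the slides, flattens the spike further.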
  • 22. 22 Account balance statement • Generate balance sheet for bank account. • Incoming transactions. • Outgoing transactions. +-----+----------+--------+ | tid | date | amount | +-----+----------+--------+ | 1 | 20160401 | 2000 | | 2 | 20160402 | -30.5 | | 3 | 20160404 | -45.5 | | 4 | 20160405 | -125.5 | | 5 | 20160406 | 100.3 | +-----+----------+--------+ select tid, date, amount from transactions where account_id = 12345;
  • 23. 23 Account balance statement SELECT tid, date, amount FROM transactions WHERE account_id = 12345; +-----+----------+--------+ | tid | date | amount | +-----+----------+--------+ | 1 | 20160401 | 2000 | | 2 | 20160402 | -30.5 | | 3 | 20160404 | -45.5 | | 4 | 20160405 | -125.5 | | 5 | 20160406 | 100.3 | +-----+----------+--------+
  • 24. 24 Account balance statement SELECT o.tid, o.date, o.amount, ( SELECT SUM(t.amount) FROM transactions t WHERE t.date <= o.date AND t.account_id = 12345 ) AS balance FROM transactions o WHERE o.account_id = 12345; +-----+----------+--------+----------+ | tid | date | amount | balance | +-----+----------+--------+----------+ | 1 | 20160401 | 2000 | 2000 | | 2 | 20160402 | -30.5 | 1969.5 | | 3 | 20160404 | -45.5 | 1924 | | 4 | 20160405 | -125.5 | 1798.5 | | 5 | 20160406 | 100.3 | 1898.8 | +-----+----------+--------+----------+
  • 25. 25 Account balance statement SELECT tid, date, amount, SUM(amount) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance FROM transactions WHERE account_id = 12345; +-----+----------+--------+----------+ | tid | date | amount | balance | +-----+----------+--------+----------+ | 1 | 20160401 | 2000 | 2000 | | 2 | 20160402 | -30.5 | 1969.5 | | 3 | 20160404 | -45.5 | 1924 | | 4 | 20160405 | -125.5 | 1798.5 | | 5 | 20160406 | 100.3 | 1898.8 | +-----+----------+--------+----------+
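The two formulations of the balance query can be checked against each other. This is a sketch under assumptions: SQLite (via Python's sqlite3) stands in for MariaDB, the table layout is guessed from the slides, the outer alias o is introduced here, and one extra row for a different account is added to show that the filter applies before the window is computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (tid INTEGER, account_id INTEGER, date INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, 12345, 20160401, 2000.0),
        (2, 12345, 20160402, -30.5),
        (3, 12345, 20160404, -45.5),
        (4, 12345, 20160405, -125.5),
        (5, 12345, 20160406, 100.3),
        (6, 99999, 20160403, 500.0),  # hypothetical row for another account
    ],
)

# Correlated subquery: re-scans the account's rows once per output row.
subquery_balance = conn.execute(
    """
    SELECT o.tid, o.date, o.amount,
           (SELECT SUM(t.amount) FROM transactions t
            WHERE t.date <= o.date AND t.account_id = 12345) AS balance
    FROM transactions o
    WHERE o.account_id = 12345
    ORDER BY o.date
    """
).fetchall()

# Window function: one ordered pass with a growing frame.
window_balance = conn.execute(
    """
    SELECT tid, date, amount,
           SUM(amount) OVER (
               ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS balance
    FROM transactions
    WHERE account_id = 12345
    ORDER BY date
    """
).fetchall()

for row in window_balance:
    print(row[0], row[1], row[2], round(row[3], 2))
```

Both queries return the same balances. The subquery form does work proportional to the square of the row count, which is consistent with the timing gap reported in the comparison.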
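  • 26. 26 Account balance statement • How do queries compare? +--------+---------------+------------------+ | # Rows | Regular SQL | Window Functions | +--------+---------------+------------------+ | 100 | 3.72 sec | 0.01 sec | | 500 | 30.04 sec | 0.01 sec | | 1000 | 59.6 sec | 0.02 sec | | 2000 | 1 min 59 sec | 0.03 sec | | 4000 | 4 min 1 sec | 0.04 sec | | 16000 | 18 min 26 sec | 0.18 sec | +--------+---------------+------------------+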
  • 27. 27 RANGE-type frames • Useful when interval of interest has multiple/missing rows • ORDER BY column: one numeric column • RANGE n PRECEDING: rows with R.column >= (current_row.column - n) • RANGE n FOLLOWING: rows with R.column <= (current_row.column + n) • CURRENT ROW: current row and rows with R.column = current_row.column
  • 28. 28 RANGE-type frames • Expenses from today and yesterday: +----------+-------+--------+------+ | exp_date | name | amount | sum | +----------+-------+--------+------+ | 20160407 | bus | 4 | 4 | | 20160409 | beer | 2 | | | 20160410 | wine | 4 | | | 20160410 | snack | 12 | | +----------+-------+--------+------+ select *, sum(amount) over (order by exp_date range between 1 preceding and current row) as sum from expenses
  • 30. 30 RANGE-type frames • Expenses from today and yesterday: +----------+-------+--------+------+ | exp_date | name | amount | sum | +----------+-------+--------+------+ | 20160407 | bus | 4 | 4 | | 20160409 | beer | 2 | 2 | | 20160410 | wine | 4 | | | 20160410 | snack | 12 | | +----------+-------+--------+------+ select *, sum(amount) over (order by exp_date range between 1 preceding and current row) as sum from expenses
  • 31. 31 RANGE-type frames • Expenses from today and yesterday: +----------+-------+--------+------+ | exp_date | name | amount | sum | +----------+-------+--------+------+ | 20160407 | bus | 4 | 4 | | 20160409 | beer | 2 | 2 | | 20160410 | wine | 4 | 18 | | 20160410 | snack | 12 | | +----------+-------+--------+------+ select *, sum(amount) over (order by exp_date range between 1 preceding and current row) as sum from expenses
  • 32. 32 RANGE-type frames • Expenses from today and yesterday: +----------+-------+--------+------+ | exp_date | name | amount | sum | +----------+-------+--------+------+ | 20160407 | bus | 4 | 4 | | 20160409 | beer | 2 | 2 | | 20160410 | wine | 4 | 18 | | 20160410 | snack | 12 | 18 | +----------+-------+--------+------+ select *, sum(amount) over (order by exp_date range between 1 preceding and current row) as sum from expenses
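The finished table can be reproduced with a short script. It is a sketch that assumes SQLite 3.28 or newer (via Python's sqlite3) as a stand-in for MariaDB; the alias sum_2d is chosen here. Dates are stored as plain integers as on the slides, so "1 PRECEDING" only means "yesterday" within a single month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (exp_date INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [
        (20160407, "bus", 4),
        (20160409, "beer", 2),
        (20160410, "wine", 4),
        (20160410, "snack", 12),
    ],
)

# RANGE frames are defined on ORDER BY values, not on row positions:
# the frame holds every row with exp_date in [current - 1, current].
range_rows = conn.execute(
    """
    SELECT exp_date, name, amount,
           SUM(amount) OVER (
               ORDER BY exp_date
               RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS sum_2d
    FROM expenses
    ORDER BY exp_date, name
    """
).fetchall()

for row in range_rows:
    print(row)
```

Both rows dated 20160410 get 18 because they are peers: CURRENT ROW in a RANGE frame includes every row with the same ORDER BY value. The missing day 20160408 contributes nothing, which a ROWS frame could not express.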
  • 33. 33 Date columns with RANGE-type frames • Date columns and temporal intervals (MDEV-9727) AVG(value) OVER (ORDER BY date_col RANGE BETWEEN INTERVAL 1 MONTH PRECEDING AND INTERVAL 1 MONTH FOLLOWING) • SQL Standard allows this • Not supported by PostgreSQL or MS SQL Server • Intend to support in MariaDB.
  • 34. 34 FRAME syntax • ROWS|RANGE PRECEDING|FOLLOWING:
  • 35. 35 Frames summary • Some window functions use frames – e.g. Aggregate functions used as window functions • Frame moves with the current row • RANGE/ROWS-type frames – MariaDB supports all kinds • Useful for – Cumulative sums – Running averages – Getting aggregates without doing GROUP BY
  • 36. 36 The Island problem • Given a set of ordered integers, find the start and end of sequences that have no missing numbers. Ex: 2, 3, 10, 11, 12, 15, 16, 17 • A common problem, with plenty of use cases: – Used in sales to identify activity periods. – Detecting outages. – Stock market analysis.
  • 37. 37 The Island problem SELECT value FROM islands ORDER BY value; +-------+ | value | +-------+ | 2 | | 3 | | 10 | | 11 | | 12 | | 15 | | 16 | | 17 | +-------+ +-------------+-----------+ | start_range | end_range | +-------------+-----------+ | 2 | 3 | | 10 | 12 | | 15 | 17 | +-------------+-----------+
  • 38. 38 The Island problem SELECT value, (SELECT ??? ) AS grp FROM islands ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | a | | 3 | a | | 10 | b | | 11 | b | | 12 | b | | 15 | c | | 16 | c | | 17 | c | +-------+------+ +-------------+-----------+ | start_range | end_range | +-------------+-----------+ | 2 | 3 | | 10 | 12 | | 15 | 17 | +-------------+-----------+
  • 39. 39 The Island problem SELECT value, (SELECT ??? ) AS grp FROM islands ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | a | | 3 | a | | 10 | b | | 11 | b | | 12 | b | | 15 | c | | 16 | c | | 17 | c | +-------+------+ +-------------+-----------+ | start_range | end_range | +-------------+-----------+ | 2 | 3 | | 10 | 12 | | 15 | 17 | +-------------+-----------+ SELECT MIN(value) AS start_range, MAX(value) AS end_range FROM islands GROUP BY grp;
  • 40. 40 The Island problem – generating the groups SELECT value, (SELECT ??? ) AS grp FROM islands ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | 3 | | 3 | 3 | | 10 | 12 | | 11 | 12 | | 12 | 12 | | 15 | 17 | | 16 | 17 | | 17 | 17 | +-------+------+
  • 41. 41 The Island problem – generating the groups SELECT value, ( SELECT MIN(B.value) FROM islands AS B WHERE B.value >= A.value AND NOT EXISTS ( SELECT * FROM islands AS C WHERE C.value = B.value + 1) ) AS grp FROM islands as A ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | 3 | | 3 | 3 | | 10 | 12 | | 11 | 12 | | 12 | 12 | | 15 | 17 | | 16 | 17 | | 17 | 17 | +-------+------+
  • 42. 42 The Island problem – generating the groups SELECT value, ( SELECT MIN(B.value) FROM islands AS B WHERE B.value >= A.value AND NOT EXISTS ( SELECT * FROM islands AS C WHERE C.value = B.value + 1) ) AS grp FROM islands as A ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | 3 | | 3 | 3 | | 10 | 12 | | 11 | 12 | | 12 | 12 | | 15 | 17 | | 16 | 17 | | 17 | 17 | +-------+------+
  • 43. 43 The Island problem – generating the groups SELECT value, ROW_NUMBER() OVER (ORDER BY value) AS grp FROM islands as A ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | 1 | | 3 | 2 | | 10 | 3 | | 11 | 4 | | 12 | 5 | | 15 | 6 | | 16 | 7 | | 17 | 8 | +-------+------+
  • 44. 44 The Island problem – generating the groups SELECT value, value - ROW_NUMBER() OVER (ORDER BY value) AS grp FROM islands as A ORDER BY value; +-------+------+ | value | grp | +-------+------+ | 2 | 1 | | 3 | 1 | | 10 | 7 | | 11 | 7 | | 12 | 7 | | 15 | 9 | | 16 | 9 | | 17 | 9 | +-------+------+
  • 45. 45 The Island problem – generating the groups SELECT value, value - ROW_NUMBER() OVER (ORDER BY value) AS grp FROM islands as A ORDER BY value; SELECT value, ( SELECT MIN(B.value) FROM islands AS B WHERE B.value >= A.value AND NOT EXISTS (SELECT * FROM islands AS C WHERE C.value = B.value + 1) ) AS grp FROM islands as A ORDER BY value;
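Putting the pieces together, the group key value - ROW_NUMBER() can feed the MIN/MAX aggregation through a derived table. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for MariaDB; the derived-table alias numbered is chosen here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE islands (value INTEGER)")
conn.executemany(
    "INSERT INTO islands VALUES (?)",
    [(v,) for v in (2, 3, 10, 11, 12, 15, 16, 17)],
)

# Inside a run of consecutive integers, value and ROW_NUMBER() both grow by 1,
# so their difference is constant per island and changes at every gap.
island_ranges = conn.execute(
    """
    SELECT MIN(value) AS start_range,
           MAX(value) AS end_range
    FROM (
        SELECT value,
               value - ROW_NUMBER() OVER (ORDER BY value) AS grp
        FROM islands
    ) AS numbered
    GROUP BY grp
    ORDER BY start_range
    """
).fetchall()

print(island_ranges)
```

The trick relies on the values being distinct integers; duplicate values would need DENSE_RANK() or a de-duplication step first.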
  • 46. 46 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 47. 47 Window Functions and other SQL constructs • Can have WIN_FUNC(AGG_FUNC) • Processing order: Join -> Group -> Check HAVING -> Compute Window Functions -> DISTINCT -> Sort + Limit • Window functions can appear in – SELECT list – ORDER BY clause
  • 48. 48 Filtering on window function value • How to filter for e.g. RANK() < 3 ? Use a subquery. select name, incidents, row_number() over (order by incidents desc) as ROW_NUM from support_staff
  • 49. 49 Filtering on window function value • How to filter for e.g. RANK() < 3 ? Use a subquery. select * from ( select name, incidents, row_number() over (order by incidents desc) as ROW_NUM from support_staff ) as TBL where TBL.ROW_NUM < 3
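The derived-table pattern can be run as follows. This is a sketch assuming SQLite (via Python's sqlite3) in place of MariaDB. Window functions are computed after WHERE has been applied, which is why the filter has to sit one query level above the window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE support_staff (name TEXT, incidents INTEGER)")
conn.executemany(
    "INSERT INTO support_staff VALUES (?, ?)",
    [("Claudio", 10), ("Valeriy", 9), ("Daniel", 9), ("Geoff", 9), ("Stephane", 8)],
)

# The inner query numbers the rows; the outer query filters on that number.
top_two = conn.execute(
    """
    SELECT *
    FROM (
        SELECT name, incidents,
               row_number() OVER (ORDER BY incidents DESC) AS row_num
        FROM support_staff
    ) AS tbl
    WHERE tbl.row_num < 3
    ORDER BY tbl.row_num
    """
).fetchall()

print(top_two)
```

Which of the three staff members tied at 9 incidents lands in second place is not defined; adding a tie-breaker column to the window's ORDER BY makes the result deterministic.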
  • 50. 50 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 51. 51 Computing Window functions • Join, grouping • All partitions are mixed together [diagram: table, table, table -> join -> group]
  • 52. 52 Computing Window functions • Put join output into a temporary table • Sort it by (PARTITION BY clause, ORDER BY clause) [diagram: table, table, table -> join -> group -> sort; sort by: PARTITION BY clause, ORDER BY clause]
  • 53. 53 Computing window function for a row • Can look at – Current row – Rows in the partition, ordered • Can compute the window function • Computing values individually would be expensive – O(#rows_in_partition ^ 2)
  • 54. 54 “Streamable” window functions • ROW_NUMBER, RANK, DENSE_RANK, ... – Can walk down and compute values on the fly • NTILE, CUME_DIST, PERCENT_RANK – Get #rows in the partition – Then walk down and compute values on the fly.
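A minimal sketch of the "walk down and compute on the fly" idea for one partition that is already sorted by its ORDER BY key. The function names are hypothetical and this is not MariaDB's implementation, only an illustration of why one pass is enough for ROW_NUMBER, RANK and DENSE_RANK, while NTILE needs the row count first.

```python
def stream_ranks(sorted_keys):
    """Yield (row_number, rank, dense_rank) for keys already in ORDER BY order."""
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that compares unequal to every real key
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != previous:
            rank = row_number   # rank jumps to the current position
            dense_rank += 1     # dense rank advances by exactly one
            previous = key
        yield row_number, rank, dense_rank


def stream_ntile(row_count, buckets):
    """Yield the NTILE bucket for each row; needs the partition size up front."""
    base, extra = divmod(row_count, buckets)
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets take one additional row each.
        size = base + (1 if bucket <= extra else 0)
        for _ in range(size):
            yield bucket


incidents = [10, 9, 9, 9, 8]  # support_staff, ordered by incidents desc
ranks = list(stream_ranks(incidents))
quartiles = list(stream_ntile(len(incidents), 4))
print(ranks)
print(quartiles)
```

Each function keeps only a constant amount of state per partition, which is what makes these window functions cheap once the sort is done.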
  • 55. 55 Computing framed window functions • window_func(rows_in_the_frame) • Frame moves with the current row
  • 56. 56 Computing framed window functions • window_func(rows_in_the_frame) • Frame moves with the current row • Some functions allow adding and removing rows – SUM, COUNT, AVG, BIT_OR, BIT_* • Can compute efficiently – Done in MariaDB 10.2.0. [diagram: the frame moves down one row; 20 leaves, 10 enters, $total becomes $total+10-20]
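The add-and-remove idea can be sketched for SUM over a ROWS frame. The function name and parameters below are hypothetical, and the code illustrates the technique rather than MariaDB's actual code: the running total is adjusted by the rows that enter and leave the frame instead of being recomputed per row.

```python
def sliding_sum(values, preceding, following):
    """SUM over ROWS BETWEEN `preceding` PRECEDING AND `following` FOLLOWING."""
    n = len(values)
    total = 0
    lo = 0  # index of the first row inside the frame
    hi = 0  # index one past the last row inside the frame
    out = []
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:   # rows entering at the bottom of the frame
            total += values[hi]
            hi += 1
        while lo < new_lo:   # rows leaving at the top of the frame
            total -= values[lo]
            lo += 1
        out.append(total)
    return out


def naive_sum(values, preceding, following):
    """Reference version: recompute the frame for every row."""
    return [
        sum(values[max(0, i - preceding): i + following + 1])
        for i in range(len(values))
    ]


data = [20, 10, 5, 7]
print(sliding_sum(data, 1, 1))
```

Every row is added once and removed at most once, so the cost per partition is linear in its size regardless of the frame width. COUNT and AVG follow the same pattern with a row counter next to the total.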
  • 57. 57 Some aggregates make streaming hard • MIN, MAX • Need to track the whole window – Doable for small frames ● Can also re-calculate – Hard for bigger frames • Are big frames used? • Not implemented yet. [diagram: values 20, 21, 19, 10; MAX=21 while 21 is in the frame, MAX=? once it leaves]
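For comparison, one well-known general technique for a sliding maximum is a monotonic deque. The slides do not mention it and nothing here describes MariaDB's plans; the sketch only shows why MAX needs more bookkeeping than SUM: removing a row cannot be undone with arithmetic, so candidate rows have to be remembered. The function name is hypothetical, and the frame is restricted to ROWS BETWEEN n PRECEDING AND CURRENT ROW for brevity.

```python
from collections import deque


def sliding_max(values, preceding):
    """MAX over ROWS BETWEEN `preceding` PRECEDING AND CURRENT ROW."""
    candidates = deque()  # row indices; their values are strictly decreasing
    out = []
    for i, value in enumerate(values):
        # An older row that is not larger than the new row can never be
        # the maximum again, so it is discarded.
        while candidates and values[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        # The frame advances one row at a time, so at most one candidate
        # (the oldest) can fall out of it per step.
        if candidates[0] < i - preceding:
            candidates.popleft()
        out.append(values[candidates[0]])
    return out


print(sliding_max([20, 21, 19, 10], 1))
```

The memory used is bounded by the frame size in the worst case (a strictly decreasing input), which matches the slide's point that the whole window may need to be tracked.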
  • 58. 58 LEAD and LAG issues • LAG(expr, N) – “expr N rows before” – LAG(expr,1) - previous • Non-constant N? • Lookups to arbitrary rows – Expensive – Worth doing at all? LAG(..., 2)
  • 59. 59 Summary for computing window functions • Sort by (partition_by, order_by) • Then walk through and compute window functions • Most functions can be computed on-the-fly • Framed window functions require moving the frame – SUM, COUNT, AVG .. - can update value as frame moves – MIN, MAX – more complex • LEAD, LAG may require random reads
  • 60. 60 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 62. 62 A query with window functions select 'web' as channel ,web.item ,web.return_ratio ,web.return_rank ,web.currency_rank from ( select item ,return_ratio ,currency_ratio ,rank() over (order by return_ratio) as return_rank ,rank() over (order by currency_ratio) as currency_rank from ( select ws.ws_item_sk as item ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as return_ratio ,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/ cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as currency_ratio from web_sales ws left outer join web_returns wr on (ws.ws_order_number = wr.wr_order_number and ws.ws_item_sk = wr.wr_item_sk) ,date_dim where wr.wr_return_amt > 10000 and ws.ws_net_profit > 1 and ws.ws_net_paid > 0 and ws.ws_quantity > 0 and ws_sold_date_sk = d_date_sk and ws_sold_date_sk between 2452245 and 2452275 and d_year = 2001 and d_moy = 12 group by ws.ws_item_sk ) in_web ) web where web.return_rank <= 10 or web.currency_rank <= 10 [diagram: table, table, table -> join -> group -> sort -> window functions]
  • 63. 63 Still, there are optimizations • Doing fewer sorts • Condition pushdown through PARTITION BY
  • 64. 64 Doing fewer sorts select rank() over (order by incidents), ntile(4) over (order by incidents), rank() over (order by incidents, join_date) from support_staff • Each window function requires a sort • Can avoid sorting if using an index (MariaDB: not yet) • Identical PARTITION/ORDER BY must share the sort step • Compatible PARTITION/ORDER BY may share the sort step – MariaDB: yes (but has bugs at the moment) – PostgreSQL: yes, limited [diagram: tbl, tbl, tbl -> join -> sort]
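One way to picture "compatible" sort sharing: a window whose sort key list is a prefix of another window's list can reuse that longer sort. The sketch below is a simplified illustration with a hypothetical function name, not the optimizer's actual logic; it treats each window as a tuple of column names (PARTITION BY columns first, then ORDER BY columns) and ignores ASC/DESC.

```python
def plan_sorts(window_specs):
    """Group windows so that each group needs only one sort.

    Returns a list of [sort_key, windows_served] entries.
    """
    sorts = []
    # Longest key lists first, so that shorter prefixes can attach to them.
    for spec in sorted(window_specs, key=len, reverse=True):
        for entry in sorts:
            sort_key = entry[0]
            if sort_key[: len(spec)] == spec:  # spec is a prefix of sort_key
                entry[1].append(spec)
                break
        else:
            sorts.append([spec, [spec]])
    return sorts


# The three windows from the query above.
specs = [("incidents",), ("incidents",), ("incidents", "join_date")]
plan = plan_sorts(specs)
for sort_key, served in plan:
    print("sort by", sort_key, "serves", len(served), "window(s)")
```

Here a single sort by (incidents, join_date) serves all three window functions, since rows sorted that way are also sorted by incidents alone.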
  • 65. 65 Condition pushdown through PARTITION BY select * from ( select name, dept, rank() over (partition by dept order by incidents desc) as R from staff ) as TBL where dept='Support' [diagram: staff -> sort over partitions Development, Consulting, Support, versus sort over Support only]
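The rewrite is safe because the condition only refers to the PARTITION BY column: it keeps or drops whole partitions, and ranks inside a partition never depend on rows of other partitions. The sketch below checks this on made-up data, assuming SQLite (via Python's sqlite3) as a stand-in for MariaDB; the rows and the alias r are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, incidents INTEGER)")
# Hypothetical rows, made up for this example.
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("Ann", "Support", 12),
        ("Bob", "Support", 7),
        ("Cid", "Support", 7),
        ("Dee", "Development", 3),
        ("Eli", "Consulting", 5),
    ],
)

# Filter applied outside the derived table: every department is ranked first.
filtered_outside = conn.execute(
    """
    SELECT * FROM (
        SELECT name, dept,
               rank() OVER (PARTITION BY dept ORDER BY incidents DESC) AS r
        FROM staff
    ) AS tbl
    WHERE dept = 'Support'
    ORDER BY r, name
    """
).fetchall()

# Filter pushed down by hand: only the Support partition is ranked.
pushed_down = conn.execute(
    """
    SELECT name, dept,
           rank() OVER (PARTITION BY dept ORDER BY incidents DESC) AS r
    FROM staff
    WHERE dept = 'Support'
    ORDER BY r, name
    """
).fetchall()

print(filtered_outside == pushed_down)
```

A condition on a non-partitioning column, such as incidents > 5, could not be pushed down this way, because removing rows inside a partition would change the ranks of the remaining ones.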
  • 66. 66 Condition pushdown into PARTITION BY • Other databases have this • In MariaDB, requires: MDEV-9197: Pushdown conditions into non-mergeable views/ derived tables MDEV-7486: Condition pushdown from HAVING into WHERE • These are 10.2 tasks too • Considering it
  • 67. 67 Optimizations summary • Not much need/room for optimizations in many cases – Window function is a small part of the query • Optimizations to have – Share the sort across window functions (have [bugs]) – Condition pushdown through PARTITION BY ● Depends on another 10.2 task ● Want to have it
  • 68. 68 Conclusions • Window functions coming in MariaDB 10.2! • Already have ~SQL:2003 level features • Intend to have ~SQL:2011 features – Comparable with “big 3” databases • Work on optimizations is in progress – Send us your cases.