Sergei Petrunia
Vicentiu Ciorbaru
Window functions
in MariaDB
2
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
4
Scalar functions
select
concat(Name, ' in ', Country)
from Cities
Peking in CHN
Berlin in DEU
Moscow in RUS
Chicago in USA
+------+---------+---------+
| ID | Name | Country |
+------+---------+---------+
| 1891 | Peking | CHN |
| 3068 | Berlin | DEU |
| 3580 | Moscow | RUS |
| 3795 | Chicago | USA |
+------+---------+---------+
• Compute values based on the current row
5
Aggregate functions
• Compute summary for the group
• Group is collapsed into summary row
select
country, sum(Population) as total
from Cities
group by country
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+---------+----------+
| country | total |
+---------+----------+
| DEU | 4030488 |
| RUS | 8389200 |
| USA | 11467668 |
+---------+----------+
6
Window functions
• Function is computed over an ordered partition (=group)
• Groups are not collapsed
select
name,
rank() over (partition by country
order by population desc)
from cities
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+-----------+------+
| name | rank |
+-----------+------+
| Berlin | 1 |
| Frankfurt | 2 |
| Moscow | 1 |
| New York | 1 |
| Chicago | 2 |
| Seattle | 3 |
+-----------+------+
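To make "groups are not collapsed" concrete, here is a minimal pure-Python sketch of what the query above computes. It is only an emulation of the semantics under the assumption that the rows fit in memory; the function name `rank_by_partition` is made up for this illustration and is not a MariaDB API.

```python
from itertools import groupby

# Sample rows taken from the slide: (name, country, population).
cities = [
    ("Berlin", "DEU", 3386667),
    ("Frankfurt", "DEU", 643821),
    ("Moscow", "RUS", 8389200),
    ("New York", "USA", 8008278),
    ("Chicago", "USA", 2896016),
    ("Seattle", "USA", 563374),
]

def rank_by_partition(rows):
    """Emulate rank() over (partition by country order by population desc)."""
    # Sort by the partition key first, then by population, descending.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    result = []
    for _country, partition in groupby(ordered, key=lambda r: r[1]):
        prev_population = None
        rank = 0
        for position, (name, _c, population) in enumerate(partition, start=1):
            # Ties keep the previous rank; otherwise rank = position in partition.
            if population != prev_population:
                rank = position
                prev_population = population
            result.append((name, rank))
    return result

ranked = rank_by_partition(cities)
for name, rank in ranked:
    print(name, rank)
```

Every input row produces exactly one output row, which is the difference from GROUP BY.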
8
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
9
Basic Window Functions
select
name, incidents
from
support_staff
+----------+-----------+
| name | incidents |
+----------+-----------+
| Claudio | 10 |
| Valeriy | 9 |
| Daniel | 9 |
| Geoff | 9 |
| Stephane | 8 |
+----------+-----------+
10
row_number()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
+----------+-----------+---------+
| name | incidents | ROW_NUM |
+----------+-----------+---------+
| Claudio | 10 | 1 |
| Valeriy | 9 | 2 |
| Daniel | 9 | 3 |
| Geoff | 9 | 4 |
| Stephane | 8 | 5 |
+----------+-----------+---------+
11
rank()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK
from
support_staff
+----------+-----------+---------+------+
| name | incidents | ROW_NUM | RANK |
+----------+-----------+---------+------+
| Claudio | 10 | 1 | 1 |
| Valeriy | 9 | 2 | 2 |
| Daniel | 9 | 3 | 2 |
| Geoff | 9 | 4 | 2 |
| Stephane | 8 | 5 | 5 |
+----------+-----------+---------+------+
12
dense_rank()
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK,
dense_rank() over (order by incidents desc) as DENSE_R
from
support_staff
+----------+-----------+---------+------+---------+
| name | incidents | ROW_NUM | RANK | DENSE_R |
+----------+-----------+---------+------+---------+
| Claudio | 10 | 1 | 1 | 1 |
| Valeriy | 9 | 3 | 2 | 2 |
| Daniel | 9 | 4 | 2 | 2 |
| Geoff | 9 | 2 | 2 | 2 |
| Stephane | 8 | 5 | 5 | 3 |
+----------+-----------+---------+------+---------+
13
ntile(n)
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM,
rank() over (order by incidents desc) as RANK,
dense_rank() over (order by incidents desc) as DENSE_R,
ntile(4) over (order by incidents desc) as QUARTILE
from
support_staff
+----------+-----------+---------+------+---------+----------+
| name | incidents | ROW_NUM | RANK | DENSE_R | QUARTILE |
+----------+-----------+---------+------+---------+----------+
| Claudio | 10 | 1 | 1 | 1 | 1 |
| Valeriy | 9 | 2 | 2 | 2 | 1 |
| Daniel | 9 | 3 | 2 | 2 | 2 |
| Geoff | 9 | 4 | 2 | 2 | 3 |
| Stephane | 8 | 5 | 5 | 3 | 4 |
+----------+-----------+---------+------+---------+----------+
14
Conclusions so far
• Window functions are similar to aggregates
• Computed on (current_row, ordered_list(window_rows))
• Can compute relative standing of row wrt other rows
• RANK, DENSE_RANK, ...
15
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
16
Framed window functions
• Some Window Functions use FRAMES
– e.g. Aggregates that are used as window functions
• Window function is computed on rows in the frame.
• Frame is inside PARTITION BY
• Frame moves with the current row
• There are various frame types
17
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data
FROM sensor_data;
18
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
)
FROM sensor_data;
19
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
)
FROM sensor_data;
20
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
ROWS BETWEEN
3 PRECEDING AND
3 FOLLOWING )
FROM sensor_data;
21
Smoothing Noisy Data
• Noisy data
acquisition
solution
SELECT time, raw_data,
AVG(raw_data) OVER (
ORDER BY time
ROWS BETWEEN
6 PRECEDING AND
6 FOLLOWING )
FROM sensor_data;
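The effect of a ROWS frame can be sketched in plain Python. This is a hedged illustration of the semantics only, assuming the readings are already ordered by time; `moving_average` and the sample values are invented for the example.

```python
def moving_average(values, preceding, following):
    """Emulate AVG(x) OVER (ORDER BY t ROWS BETWEEN p PRECEDING AND f FOLLOWING)."""
    smoothed = []
    for i in range(len(values)):
        # The frame is clipped at the partition edges, so early and late rows
        # are averaged over fewer values.
        lo = max(0, i - preceding)
        hi = min(len(values), i + following + 1)
        frame = values[lo:hi]
        smoothed.append(sum(frame) / len(frame))
    return smoothed

# Hypothetical sensor readings, already ordered by time.
raw_data = [10, 12, 30, 11, 9, 10, 13]
print(moving_average(raw_data, 3, 3))
print(moving_average(raw_data, 6, 6))  # wider frame, smoother output
```

A wider frame averages over more neighbours, which is why the 6 PRECEDING / 6 FOLLOWING variant smooths more aggressively.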
22
Account balance statement
• Generate balance sheet for bank account.
• Incoming transactions.
• Outgoing transactions.
+-----+----------+--------+
| tid | date | amount |
+-----+----------+--------+
| 1 | 20160401 | 2000 |
| 2 | 20160402 | -30.5 |
| 3 | 20160404 | -45.5 |
| 4 | 20160405 | -125.5 |
| 5 | 20160406 | 100.3 |
+-----+----------+--------+
select tid, date, amount
from transactions
where account_id = 12345;
24
Account balance statement
SELECT tid, date, amount,
( SELECT SUM(amount)
FROM transactions t
WHERE t.date <= o.date AND
t.account_id = 12345 ) AS balance
FROM transactions o
WHERE o.account_id = 12345;
+-----+----------+--------+----------+
| tid | date | amount | balance |
+-----+----------+--------+----------+
| 1 | 20160401 | 2000 | 2000 |
| 2 | 20160402 | -30.5 | 1969.5 |
| 3 | 20160404 | -45.5 | 1924 |
| 4 | 20160405 | -125.5 | 1798.5 |
| 5 | 20160406 | 100.3 | 1898.8 |
+-----+----------+--------+----------+
25
Account balance statement
SELECT tid, date, amount,
SUM(amount) OVER (ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND
CURRENT ROW) AS balance
FROM transactions
WHERE account_id = 12345;
+-----+----------+--------+----------+
| tid | date | amount | balance |
+-----+----------+--------+----------+
| 1 | 20160401 | 2000 | 2000 |
| 2 | 20160402 | -30.5 | 1969.5 |
| 3 | 20160404 | -45.5 | 1924 |
| 4 | 20160405 | -125.5 | 1798.5 |
| 5 | 20160406 | 100.3 | 1898.8 |
+-----+----------+--------+----------+
26
Account balance statement
• How do queries compare?
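+--------+---------------+------------------+
| # Rows | Regular SQL   | Window Functions |
+--------+---------------+------------------+
| 100    | 3.72 sec      | 0.01 sec         |
| 500    | 30.04 sec     | 0.01 sec         |
| 1000   | 59.6 sec      | 0.02 sec         |
| 2000   | 1 min 59 sec  | 0.03 sec         |
| 4000   | 4 min 1 sec   | 0.04 sec         |
| 16000  | 18 min 26 sec | 0.18 sec         |
+--------+---------------+------------------+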
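The gap between the two approaches can be sketched in Python. This is an illustration of the two evaluation strategies, not of how any server executes them: the subquery form re-sums all earlier rows for each row, while the window form carries one running total. It assumes the rows are already ordered by date and that dates are distinct, as in the sample data; both function names are made up.

```python
from decimal import Decimal
from itertools import accumulate

# Amounts from the slide, in date order. Decimal keeps the cents exact.
amounts = [Decimal("2000"), Decimal("-30.5"), Decimal("-45.5"),
           Decimal("-125.5"), Decimal("100.3")]

def balance_by_rescanning(values):
    """Like the correlated subquery: re-sum every earlier row for each row."""
    return [sum(values[: i + 1]) for i in range(len(values))]

def balance_by_running_sum(values):
    """Like SUM(...) OVER (... UNBOUNDED PRECEDING AND CURRENT ROW): one pass."""
    return list(accumulate(values))

print(balance_by_rescanning(amounts))
print(balance_by_running_sum(amounts))
```

Both return the balance column shown earlier; the first does work proportional to the square of the row count, the second to the row count.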
27
RANGE-type frames
• Useful when interval of interest has multiple/missing rows
• ORDER BY column -- one numeric column
• RANGE n PRECEDING
rows with R.column >= (current_row.column – n)
• RANGE n FOLLOWING
rows with R.column <= (current_row.column + n)
• CURRENT ROW
current row and rows with R.column = current_row.column
28
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | |
| 20160410 | wine | 4 | |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
30
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
31
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | 18 |
| 20160410 | snack | 12 | |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
32
RANGE-type frames
• Expenses from today and yesterday:
+----------+-------+--------+------+
| exp_date | name | amount | sum |
+----------+-------+--------+------+
| 20160407 | bus | 4 | 4 |
| 20160409 | beer | 2 | 2 |
| 20160410 | wine | 4 | 18 |
| 20160410 | snack | 12 | 18 |
+----------+-------+--------+------+
select
*,
sum(amount) over (order by exp_date
range between 1 preceding and
current row) as sum
from expenses
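A small Python sketch of the RANGE semantics used in this example. It treats `exp_date` as a plain integer, exactly as the slide does, so it is only meaningful while the dates stay within one month; `range_sum` is an illustrative name, and the quadratic loop is chosen for clarity rather than speed.

```python
# Rows from the slide: (exp_date, name, amount).
expenses = [
    (20160407, "bus", 4),
    (20160409, "beer", 2),
    (20160410, "wine", 4),
    (20160410, "snack", 12),
]

def range_sum(rows, preceding):
    """Emulate SUM(amount) OVER (ORDER BY exp_date
    RANGE BETWEEN n PRECEDING AND CURRENT ROW)."""
    sums = []
    for cur_date, _name, _amount in rows:
        # The frame is defined by values, not row positions: every row whose
        # exp_date lies in [cur_date - n, cur_date], including peers of the
        # current row that share its date.
        total = sum(amount for d, _n, amount in rows
                    if cur_date - preceding <= d <= cur_date)
        sums.append(total)
    return sums

print(range_sum(expenses, 1))
```

Both rows dated 20160410 get the same sum because they are peers: CURRENT ROW in a RANGE frame includes all rows with the same ORDER BY value.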
33
Date columns with RANGE-type frames
• Date columns and temporal intervals (MDEV-9727)
AVG(value) OVER (ORDER BY date_col
RANGE BETWEEN INTERVAL 1 MONTH PRECEDING
AND INTERVAL 1 MONTH FOLLOWING)
• SQL Standard allows this
• Not supported by PostgreSQL or MS SQL Server
• Intend to support in MariaDB.
34
FRAME syntax
• ROWS|RANGE PRECEDING|FOLLOWING:
35
Frames summary
• Some window functions use frames
– e.g. Aggregate functions used as window functions
• Frame moves with the current row
• RANGE/ROWS-type frames
– MariaDB supports all kinds
• Useful for
– Cumulative sums
– Running averages
– Getting aggregates without doing GROUP BY
36
The Island problem
• Given a set of ordered integers, find the start and end of
sequences that have no missing numbers.
Ex: 2, 3, 10, 11, 12, 15, 16, 17
• A common problem, with plenty of use cases:
– Used in sales to identify activity periods.
– Detecting outages.
– Stock market analysis.
37
The Island problem
SELECT value
FROM islands
ORDER BY value;
+-------+
| value |
+-------+
| 2 |
| 3 |
| 10 |
| 11 |
| 12 |
| 15 |
| 16 |
| 17 |
+-------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
38
The Island problem
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | a |
| 3 | a |
| 10 | b |
| 11 | b |
| 12 | b |
| 15 | c |
| 16 | c |
| 17 | c |
+-------+------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
39
The Island problem
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | a |
| 3 | a |
| 10 | b |
| 11 | b |
| 12 | b |
| 15 | c |
| 16 | c |
| 17 | c |
+-------+------+
+-------------+-----------+
| start_range | end_range |
+-------------+-----------+
| 2 | 3 |
| 10 | 12 |
| 15 | 17 |
+-------------+-----------+
SELECT MIN(value) AS start_range,
MAX(value) AS end_range
FROM islands
GROUP BY grp;
40
The Island problem – generating the groups
SELECT value, (SELECT ??? ) AS grp
FROM islands
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 3 |
| 3 | 3 |
| 10 | 12 |
| 11 | 12 |
| 12 | 12 |
| 15 | 17 |
| 16 | 17 |
| 17 | 17 |
+-------+------+
41
The Island problem – generating the groups
SELECT value,
( SELECT MIN(B.value)
FROM islands AS B
WHERE B.value >= A.value
AND NOT EXISTS
( SELECT *
FROM islands AS C
WHERE C.value = B.value + 1)
) AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 3 |
| 3 | 3 |
| 10 | 12 |
| 11 | 12 |
| 12 | 12 |
| 15 | 17 |
| 16 | 17 |
| 17 | 17 |
+-------+------+
42
The Island problem – generating the groups
SELECT value,
( SELECT MIN(B.value)
FROM islands AS B
WHERE B.value >= A.value
AND NOT EXISTS
( SELECT *
FROM islands AS C
WHERE C.value = B.value + 1)
) AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 3 |
| 3 | 3 |
| 10 | 12 |
| 11 | 12 |
| 12 | 12 |
| 15 | 17 |
| 16 | 17 |
| 17 | 17 |
+-------+------+
43
The Island problem – generating the groups
SELECT value,
ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 1 |
| 3 | 2 |
| 10 | 3 |
| 11 | 4 |
| 12 | 5 |
| 15 | 6 |
| 16 | 7 |
| 17 | 8 |
+-------+------+
44
The Island problem – generating the groups
SELECT value,
value - ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
+-------+------+
| value | grp |
+-------+------+
| 2 | 1 |
| 3 | 1 |
| 10 | 7 |
| 11 | 7 |
| 12 | 7 |
| 15 | 9 |
| 16 | 9 |
| 17 | 9 |
+-------+------+
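Why `value - ROW_NUMBER()` works: inside a gap-free run both the value and the row number grow by one per row, so their difference stays constant, and every gap makes it jump. A minimal Python sketch of the whole solution, assuming the integers are distinct (`find_islands` is a made-up name):

```python
from itertools import groupby

def find_islands(values):
    """Return (start_range, end_range) for every gap-free run of integers."""
    ordered = sorted(values)
    # grp = value - row_number, as in the query above.
    keyed = [(value - row_number, value)
             for row_number, value in enumerate(ordered, start=1)]
    islands = []
    # Equivalent of GROUP BY grp with MIN(value), MAX(value).
    for _grp, run in groupby(keyed, key=lambda pair: pair[0]):
        run_values = [value for _g, value in run]
        islands.append((run_values[0], run_values[-1]))
    return islands

print(find_islands([2, 3, 10, 11, 12, 15, 16, 17]))
```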
45
The Island problem – generating the groups
SELECT value,
value - ROW_NUMBER() OVER
(ORDER BY value)
AS grp
FROM islands as A
ORDER BY value;
SELECT value,
( SELECT MIN(B.value)
FROM islands AS B
WHERE B.value >= A.value
AND NOT EXISTS
(SELECT *
FROM islands AS C
WHERE C.value = B.value + 1)
) AS grp
FROM islands as A
ORDER BY value;
46
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
47
Window Functions and other SQL constructs
• Can have WIN_FUNC(AGG_FUNC)
[Diagram: Join → Group → Check HAVING → Compute Window Functions → DISTINCT → Sort + Limit]
• Window functions can appear in
– SELECT list
– ORDER BY clause
48
Filtering on window function value
• How to filter for e.g. RANK() < 3 ? Use a subquery.
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
49
Filtering on window function value
• How to filter for e.g. RANK() < 3 ? Use a subquery.
select * from (
select
name, incidents,
row_number() over (order by incidents desc) as ROW_NUM
from
support_staff
) as TBL
where TBL.ROW_NUM < 3
50
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
51
Computing Window functions
[Diagram: table, table, table → join → group]
• Join, grouping
• All partitions are mixed together
52
Computing Window functions
• Put join output into a temporary table
• Sort it by (PARTITION BY clause, ORDER BY clause)
[Diagram: table, table, table → join → group → sort (by PARTITION BY clause, ORDER BY clause)]
53
Computing window function for a row
• Can look at
– Current row
– Rows in the partition, ordered
• Can compute the window function
• Computing values individually would
be expensive
– O(#rows_in_partition ^ 2)
54
“Streamable” window functions
• ROW_NUMBER, RANK, DENSE_RANK, ...
– Can walk down and compute values on the fly
• NTILE, CUME_DIST, PERCENT_RANK
– Get #rows in the partition
– Then walk down and compute values on the fly.
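A rough Python sketch of such a single pass over one sorted partition. It is not MariaDB's implementation, only an illustration: the three ranking values need nothing but the previous row's key, while NTILE also needs the partition size before the walk starts. `ranking_functions` is an invented name.

```python
def ranking_functions(sorted_keys, buckets):
    """One pass over an already-sorted partition.
    Returns (row_number, rank, dense_rank, ntile) for every row."""
    n = len(sorted_keys)               # NTILE needs the row count up front
    base, extra = divmod(n, buckets)   # the first `extra` buckets get one more row
    out = []
    rank = dense = 0
    prev = object()                    # sentinel that equals no real key
    tile = left_in_tile = 0
    for row_number, key in enumerate(sorted_keys, start=1):
        if key != prev:                # a new peer group starts here
            rank = row_number
            dense += 1
            prev = key
        if left_in_tile == 0:          # current bucket is full, open the next
            tile += 1
            left_in_tile = base + (1 if tile <= extra else 0)
        left_in_tile -= 1
        out.append((row_number, rank, dense, tile))
    return out

# incidents column of support_staff, ordered by incidents desc
for row in ranking_functions([10, 9, 9, 9, 8], buckets=4):
    print(row)
```

The output matches the ROW_NUM, RANK, DENSE_R and QUARTILE columns of the ntile(n) example.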
55
Computing framed window functions
• window_func(rows_in_the_frame)
• Frame moves with the current row
56
Computing framed window functions
[Diagram: as the frame moves down, the row with value 20 leaves and the row with value 10 enters, so $total becomes $total+10-20]
• window_func(rows_in_the_frame)
• Frame moves with the current row
• Some functions allow rows to be added and removed
– SUM, COUNT, AVG, BIT_OR, BIT_*
• Can compute efficiently
– Done in MariaDB 10.2.0.
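The add-and-remove idea can be sketched in Python for SUM over a ROWS frame. This is a simplified illustration under the assumption of one partition held in a list, not the server code; `sliding_sum` is a hypothetical name.

```python
def sliding_sum(values, preceding, following):
    """SUM over ROWS BETWEEN p PRECEDING AND f FOLLOWING, updated incrementally."""
    n = len(values)
    sums = []
    total = 0
    lo = hi = 0                      # the current frame is values[lo:hi]
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:           # rows entering at the bottom of the frame
            total += values[hi]
            hi += 1
        while lo < new_lo:           # rows leaving at the top of the frame
            total -= values[lo]
            lo += 1
        sums.append(total)
    return sums

print(sliding_sum([1, 2, 3, 4, 5], 1, 1))
```

Each row is added once and removed at most once, so the cost is linear in the partition size regardless of the frame width.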
57
Some aggregates make streaming hard
[Diagram: column of values 20, 21, 19, 10 with a moving frame; labels MAX=21 and MAX=?]
• MIN, MAX
• Need to track the whole window
– Doable for small frames
● Can also re-calculate
– Hard for bigger frames
• Are big frames used?
• Not implemented yet.
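For reference, one well-known way to track a sliding MAX without rescanning the frame is a monotonic queue. The technique is not taken from this talk and nothing here implies it is what MariaDB does or will do; the sketch assumes a ROWS frame over one in-memory partition, and `sliding_max` is an invented name.

```python
from collections import deque

def sliding_max(values, preceding, following):
    """MAX over ROWS BETWEEN p PRECEDING AND f FOLLOWING using a monotonic queue."""
    n = len(values)
    candidates = deque()   # row indexes whose values are strictly decreasing
    maxima = []
    hi = 0                 # rows before index hi have already entered the frame
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n, i + following + 1)
        while hi < new_hi:
            # A new row makes every smaller-or-equal older candidate useless.
            while candidates and values[candidates[-1]] <= values[hi]:
                candidates.pop()
            candidates.append(hi)
            hi += 1
        while candidates[0] < new_lo:   # drop candidates that left the frame
            candidates.popleft()
        maxima.append(values[candidates[0]])
    return maxima

print(sliding_max([20, 21, 19, 10], 1, 1))
```

The queue can still grow as large as the frame when values arrive in decreasing order, which is the "need to track the whole window" cost named above.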
58
LEAD and LAG issues
• LAG(expr, N) – “expr N rows before”
– LAG(expr,1) - previous
• Non-constant N?
• Lookups to arbitrary rows
– Expensive
– Worth doing at all?
LAG(..., 2)
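With a constant N, LAG over a sorted partition is a fixed-distance look-back. A minimal Python sketch, assuming the partition is an in-memory list (`lag` is an illustrative helper, and a negative offset is used here to mimic LEAD):

```python
def lag(values, offset, default=None):
    """Emulate LAG(expr, offset) over an already-sorted partition."""
    n = len(values)
    result = []
    for i in range(n):
        j = i - offset                 # row that is `offset` rows before row i
        result.append(values[j] if 0 <= j < n else default)
    return result

print(lag([10, 20, 30, 40], 1))   # previous row
print(lag([10, 20, 30, 40], 2))   # two rows back
```

On a list the lookup is free; when the sorted rows live in a temporary table, each such lookup can become a random read, and a per-row N makes the access pattern unpredictable.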
59
Summary for computing window functions
• Sort by (partition_by, order_by)
• Then walk through and compute window functions
• Most functions can be computed on-the-fly
• Framed window functions require moving the frame
– SUM, COUNT, AVG .. - can update value as frame moves
– MIN, MAX – more complex
• LEAD, LAG may require random reads
60
Plan
• What are window functions
– Basic window functions
– Frames
– Window functions and other parts of SQL
• Computing window functions
• Optimizations
61
Optimizations
Do window functions
optimizations matter?
62
A query with window functions
select
'web' as channel
,web.item
,web.return_ratio
,web.return_rank
,web.currency_rank
from (
select
item
,return_ratio
,currency_ratio
,rank() over (order by return_ratio) as return_rank
,rank() over (order by currency_ratio) as currency_rank
from
( select ws.ws_item_sk as item
,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/
cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4) )) as return_ratio
,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/
cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4) )) as currency_ratio
from
web_sales ws left outer join web_returns wr
on (ws.ws_order_number = wr.wr_order_number and
ws.ws_item_sk = wr.wr_item_sk)
,date_dim
where
wr.wr_return_amt > 10000
and ws.ws_net_profit > 1
and ws.ws_net_paid > 0
and ws.ws_quantity > 0
and ws_sold_date_sk = d_date_sk
and ws_sold_date_sk between 2452245 and 2452275
and d_year = 2001
and d_moy = 12
group by ws.ws_item_sk
) in_web
) web
where
web.return_rank <= 10 or web.currency_rank <= 10
[Diagram: table, table, table → join → group → sort → Window functions]
63
Still, there are optimizations
• Doing fewer sorts
• Condition pushdown through PARTITION BY
64
Doing fewer sorts
[Diagram: tbl, tbl, tbl → join → sort]
select
rank() over (order by incidents),
ntile(4) over (order by incidents),
rank() over (order by incidents,
join_date)
from
support_staff
• Each window function requires a sort
• Can avoid sorting if using an index (MariaDB: not yet)
• Identical PARTITION/ORDER BY must share the sort step
• Compatible PARTITION/ORDER BY may share the sort step
– MariaDB: yes (but currently has bugs)
– PostgreSQL: yes, limited
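One way to read "compatible": a window whose sort keys are a prefix of another window's sort keys can reuse the longer sort. The Python sketch below groups window specifications on that rule only. It is a simplification for illustration, not the optimizer's actual algorithm: it ignores sort direction and treats PARTITION BY and ORDER BY columns as one key list. `plan_sorts` is a made-up name.

```python
def plan_sorts(order_by_lists):
    """Group window specifications so that each group needs a single sort.
    A spec can reuse a sort whose key list starts with the spec's own keys."""
    sorts = []   # each entry: [sort_key_tuple, [specs served by that sort]]
    # Visit the longest key lists first so shorter prefixes can attach to them.
    for spec in sorted(set(order_by_lists), key=len, reverse=True):
        for entry in sorts:
            if entry[0][:len(spec)] == spec:
                entry[1].append(spec)
                break
        else:
            sorts.append([spec, [spec]])
    return [(key, served) for key, served in sorts]

# The three window functions from the query above.
plan = plan_sorts([("incidents",), ("incidents",), ("incidents", "join_date")])
print(plan)
```

For the query above a single sort on (incidents, join_date) serves all three functions, because rows sorted that way are also sorted by incidents alone.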
65
Condition pushdown through PARTITION BY
select * from (
select
name,
rank() over (partition by dept
order by incidents desc) as R
from
staff
) as TBL
where dept='Support'
[Diagram: staff → sort of all partitions (Development, Consulting, Support) versus sort of the Support partition only]
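The pushdown is safe because the condition is on the PARTITION BY column: rows from other departments can never affect ranks inside Support. A small Python check of that claim, using hypothetical staff rows and an invented helper `rank_within_dept`:

```python
from itertools import groupby

# Hypothetical rows: (name, dept, incidents).
staff = [
    ("Ann", "Support", 7), ("Bob", "Support", 9), ("Cid", "Development", 4),
    ("Dee", "Consulting", 5), ("Eve", "Development", 8),
]

def rank_within_dept(rows):
    """Emulate rank() over (partition by dept order by incidents desc)."""
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    out = []
    for dept, part in groupby(ordered, key=lambda r: r[1]):
        prev, rank = None, 0
        for pos, (name, _d, incidents) in enumerate(part, start=1):
            if incidents != prev:
                rank, prev = pos, incidents
            out.append((name, dept, rank))
    return out

# Filter after: sort and rank every department, then keep only Support.
filter_after = [r for r in rank_within_dept(staff) if r[1] == "Support"]
# Filter before (pushed down): only the Support rows are sorted and ranked.
filter_before = rank_within_dept([r for r in staff if r[1] == "Support"])
print(filter_after == filter_before)
```

A condition on a non-partition column, such as incidents, could not be moved this way, since removing rows before ranking would change the ranks.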
66
Condition pushdown into PARTITION BY
• Other databases have this
• In MariaDB, requires:
MDEV-9197: Pushdown conditions into non-mergeable views/derived tables
MDEV-7486: Condition pushdown from HAVING into WHERE
• These are 10.2 tasks too
• Considering it
67
Optimizations summary
• Not much need/room for optimizations in many cases
– Window function is a small part of the query
• Optimizations to have
– Share the sort across window functions (have [bugs])
– Condition pushdown through PARTITION BY
● Depends on another 10.2 task
● Want to have it
68
Conclusions
• Window functions coming in MariaDB 10.2!
• Already have ~SQL:2003 level features
• Intend to have ~SQL:2011 features
– Comparable with “big 3” databases
• Work on optimizations is in progress
– Send us your cases.
69
Thanks
Q & A

More Related Content

What's hot (20)

PDF
Optimizer Trace Walkthrough
Sergey Petrunya
 
PDF
Query Optimizer in MariaDB 10.4
Sergey Petrunya
 
PDF
Window functions in MySQL 8.0
Mydbops
 
PDF
M|18 Understanding the Query Optimizer
MariaDB plc
 
PDF
Lessons for the optimizer from running the TPC-DS benchmark
Sergey Petrunya
 
PDF
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
MariaDB plc
 
PDF
M|18 Taking Advantage of Common Table Expressions
MariaDB plc
 
PDF
Mathematica for Physicits
Miroslav Mihaylov
 
PDF
Using Optimizer Hints to Improve MySQL Query Performance
oysteing
 
PDF
MySQL 8.0: Common Table Expressions
oysteing
 
PPT
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Hsien-Hsin Sean Lee, Ph.D.
 
PPTX
Hive in Practice
András Fehér
 
PDF
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
 
PDF
MariaDB: ANALYZE for statements (lightning talk)
Sergey Petrunya
 
PDF
Fulltext engine for non fulltext searches
Adrian Nuta
 
PPTX
Adaptive Query Optimization in 12c
Anju Garg
 
PDF
Advanced fulltext search with Sphinx
Adrian Nuta
 
PDF
Developers' mDay 2017. - Bogdan Kecman Oracle
mCloud
 
PDF
Histograms : Pre-12c and Now
Anju Garg
 
PDF
Scaling PostreSQL with Stado
Jim Mlodgenski
 
Optimizer Trace Walkthrough
Sergey Petrunya
 
Query Optimizer in MariaDB 10.4
Sergey Petrunya
 
Window functions in MySQL 8.0
Mydbops
 
M|18 Understanding the Query Optimizer
MariaDB plc
 
Lessons for the optimizer from running the TPC-DS benchmark
Sergey Petrunya
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
MariaDB plc
 
M|18 Taking Advantage of Common Table Expressions
MariaDB plc
 
Mathematica for Physicits
Miroslav Mihaylov
 
Using Optimizer Hints to Improve MySQL Query Performance
oysteing
 
MySQL 8.0: Common Table Expressions
oysteing
 
Lec2 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Num...
Hsien-Hsin Sean Lee, Ph.D.
 
Hive in Practice
András Fehér
 
MySQL 8.0 EXPLAIN ANALYZE
Norvald Ryeng
 
MariaDB: ANALYZE for statements (lightning talk)
Sergey Petrunya
 
Fulltext engine for non fulltext searches
Adrian Nuta
 
Adaptive Query Optimization in 12c
Anju Garg
 
Advanced fulltext search with Sphinx
Adrian Nuta
 
Developers' mDay 2017. - Bogdan Kecman Oracle
mCloud
 
Histograms : Pre-12c and Now
Anju Garg
 
Scaling PostreSQL with Stado
Jim Mlodgenski
 

Similar to Window functions in MariaDB 10.2 (20)

PDF
Data Love Conference - Window Functions for Database Analytics
Dave Stokes
 
PDF
Building advanced data-driven applications
MariaDB plc
 
PDF
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
PDF
Fun with click house window functions webinar slides 2021-08-19
Altinity Ltd
 
PPT
Enabling Applications with Informix' new OLAP functionality
Ajay Gupte
 
PPTX
SQL Windowing
Sandun Perera
 
PPT
Olap Functions Suport in Informix
Bingjie Miao
 
PPSX
Analytic & Windowing functions in oracle
Logan Palanisamy
 
PPTX
Simplifying SQL with CTE's and windowing functions
Clayton Groom
 
PDF
The Magic of Window Functions in Postgres
EDB
 
PDF
Windowing Functions - Little Rock Tech fest 2019
Dave Stokes
 
PDF
Windowing Functions - Little Rock Tech Fest 2019
Dave Stokes
 
PPTX
Adv.+SQL+PPT+final.pptx
AmitDas125851
 
PDF
Dublin 4x3-final-slideshare
Dag H. Wanvik
 
PDF
Oracle_Analytical_function.pdf
KalyankumarVenkat1
 
PDF
M|18 User Defined Function
MariaDB plc
 
PDF
advance-sqaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal.pdf
traphuong2103
 
PDF
CS121Lec05.pdf
georgejustymirobi1
 
PDF
SQL Queries .pdf
srinathpurushotham
 
PDF
Oracle Advanced SQL and Analytic Functions
Zohar Elkayam
 
Data Love Conference - Window Functions for Database Analytics
Dave Stokes
 
Building advanced data-driven applications
MariaDB plc
 
Fun with ClickHouse Window Functions-2021-08-19.pdf
Altinity Ltd
 
Fun with click house window functions webinar slides 2021-08-19
Altinity Ltd
 
Enabling Applications with Informix' new OLAP functionality
Ajay Gupte
 
SQL Windowing
Sandun Perera
 
Olap Functions Suport in Informix
Bingjie Miao
 
Analytic & Windowing functions in oracle
Logan Palanisamy
 
Simplifying SQL with CTE's and windowing functions
Clayton Groom
 
The Magic of Window Functions in Postgres
EDB
 
Windowing Functions - Little Rock Tech fest 2019
Dave Stokes
 
Windowing Functions - Little Rock Tech Fest 2019
Dave Stokes
 
Adv.+SQL+PPT+final.pptx
AmitDas125851
 
Dublin 4x3-final-slideshare
Dag H. Wanvik
 
Oracle_Analytical_function.pdf
KalyankumarVenkat1
 
M|18 User Defined Function
MariaDB plc
 
advance-sqaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaal.pdf
traphuong2103
 
CS121Lec05.pdf
georgejustymirobi1
 
SQL Queries .pdf
srinathpurushotham
 
Oracle Advanced SQL and Analytic Functions
Zohar Elkayam
 
Ad

More from Sergey Petrunya (20)

PDF
New optimizer features in MariaDB releases before 10.12
Sergey Petrunya
 
PDF
MariaDB's join optimizer: how it works and current fixes
Sergey Petrunya
 
PDF
Improved histograms in MariaDB 10.8
Sergey Petrunya
 
PDF
Improving MariaDB’s Query Optimizer with better selectivity estimates
Sergey Petrunya
 
PDF
JSON Support in MariaDB: News, non-news and the bigger picture
Sergey Petrunya
 
PDF
MariaDB 10.4 - что нового
Sergey Petrunya
 
PDF
MariaDB Optimizer - further down the rabbit hole
Sergey Petrunya
 
PDF
MariaDB 10.3 Optimizer - where does it stand
Sergey Petrunya
 
PDF
MyRocks in MariaDB | M18
Sergey Petrunya
 
PDF
New Query Optimizer features in MariaDB 10.3
Sergey Petrunya
 
PDF
MyRocks in MariaDB
Sergey Petrunya
 
PDF
Histograms in MariaDB, MySQL and PostgreSQL
Sergey Petrunya
 
PDF
Say Hello to MyRocks
Sergey Petrunya
 
PDF
MyRocks in MariaDB: why and how
Sergey Petrunya
 
PDF
Эволюция репликации в MySQL и MariaDB
Sergey Petrunya
 
PDF
MariaDB 10.1 - что нового.
Sergey Petrunya
 
PPTX
MyRocks: табличный движок для MySQL на основе RocksDB
Sergey Petrunya
 
PDF
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
Sergey Petrunya
 
PDF
MariaDB 10.0 Query Optimizer
Sergey Petrunya
 
PDF
How mysql handles ORDER BY, GROUP BY, and DISTINCT
Sergey Petrunya
 
New optimizer features in MariaDB releases before 10.12
Sergey Petrunya
 
MariaDB's join optimizer: how it works and current fixes
Sergey Petrunya
 
Improved histograms in MariaDB 10.8
Sergey Petrunya
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Sergey Petrunya
 
JSON Support in MariaDB: News, non-news and the bigger picture
Sergey Petrunya
 
MariaDB 10.4 - что нового
Sergey Petrunya
 
MariaDB Optimizer - further down the rabbit hole
Sergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
Sergey Petrunya
 
MyRocks in MariaDB | M18
Sergey Petrunya
 
New Query Optimizer features in MariaDB 10.3
Sergey Petrunya
 
MyRocks in MariaDB
Sergey Petrunya
 
Histograms in MariaDB, MySQL and PostgreSQL
Sergey Petrunya
 
Say Hello to MyRocks
Sergey Petrunya
 
MyRocks in MariaDB: why and how
Sergey Petrunya
 
Эволюция репликации в MySQL и MariaDB
Sergey Petrunya
 
MariaDB 10.1 - что нового.
Sergey Petrunya
 
MyRocks: табличный движок для MySQL на основе RocksDB
Sergey Petrunya
 
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
Sergey Petrunya
 
MariaDB 10.0 Query Optimizer
Sergey Petrunya
 
How mysql handles ORDER BY, GROUP BY, and DISTINCT
Sergey Petrunya
 
Ad

Recently uploaded (20)

PPTX
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pptx
Varsha Nayak
 
PPTX
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 
PPTX
Tally_Basic_Operations_Presentation.pptx
AditiBansal54083
 
PDF
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
PDF
Odoo CRM vs Zoho CRM: Honest Comparison 2025
Odiware Technologies Private Limited
 
PPTX
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
PDF
Top Agile Project Management Tools for Teams in 2025
Orangescrum
 
PDF
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
PDF
[Solution] Why Choose the VeryPDF DRM Protector Custom-Built Solution for You...
Lingwen1998
 
PDF
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
PPTX
Empowering Asian Contributions: The Rise of Regional User Groups in Open Sour...
Shane Coughlan
 
PDF
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
PDF
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
PDF
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
PPTX
Transforming Mining & Engineering Operations with Odoo ERP | Streamline Proje...
SatishKumar2651
 
PDF
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
PDF
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
PDF
유니티에서 Burst Compiler+ThreadedJobs+SIMD 적용사례
Seongdae Kim
 
PDF
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
 
PDF
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pptx
Varsha Nayak
 
OpenChain @ OSS NA - In From the Cold: Open Source as Part of Mainstream Soft...
Shane Coughlan
 
Tally_Basic_Operations_Presentation.pptx
AditiBansal54083
 
HiHelloHR – Simplify HR Operations for Modern Workplaces
HiHelloHR
 
Odoo CRM vs Zoho CRM: Honest Comparison 2025
Odiware Technologies Private Limited
 
Agentic Automation: Build & Deploy Your First UiPath Agent
klpathrudu
 
Top Agile Project Management Tools for Teams in 2025
Orangescrum
 
Wondershare PDFelement Pro Crack for MacOS New Version Latest 2025
bashirkhan333g
 
[Solution] Why Choose the VeryPDF DRM Protector Custom-Built Solution for You...
Lingwen1998
 
Download Canva Pro 2025 PC Crack Full Latest Version
bashirkhan333g
 
Empowering Asian Contributions: The Rise of Regional User Groups in Open Sour...
Shane Coughlan
 
Unlock Efficiency with Insurance Policy Administration Systems
Insurance Tech Services
 
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
Transforming Mining & Engineering Operations with Odoo ERP | Streamline Proje...
SatishKumar2651
 
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
유니티에서 Burst Compiler+ThreadedJobs+SIMD 적용사례
Seongdae Kim
 
iTop VPN With Crack Lifetime Activation Key-CODE
utfefguu
 
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 

Window functions in MariaDB 10.2

  • 2. 2 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 3. 3 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations
  • 4. 4 Scalar functions select concat(Name, ' in ', Country) from Cities Peking in CHN Berlin in DEU Moscow in RUS Chicago in USA +------+---------+---------+ | ID | Name | Country | +------+---------+---------+ | 1891 | Peking | CHN | | 3068 | Berlin | DEU | | 3580 | Moscow | RUS | | 3795 | Chicago | USA | +------+---------+---------+ • Compute values based on the current row
  • 5. 5 Aggregate functions
      • Compute summary for the group
      • Group is collapsed into summary row

      select country, sum(Population) as total
      from Cities
      group by country

      +-----------+---------+------------+
      | name      | country | population |
      +-----------+---------+------------+
      | Berlin    | DEU     |    3386667 |
      | Frankfurt | DEU     |     643821 |
      | Moscow    | RUS     |    8389200 |
      | New York  | USA     |    8008278 |
      | Chicago   | USA     |    2896016 |
      | Seattle   | USA     |     563374 |
      +-----------+---------+------------+

      +---------+----------+
      | country | total    |
      +---------+----------+
      | DEU     |  4030488 |
      | RUS     |  8389200 |
      | USA     | 11467668 |
      +---------+----------+
  • 6. 6 Window functions
      • Function is computed over an ordered partition (=group)
      • Groups are not collapsed

      select name,
             rank() over (partition by country
                          order by population desc)
      from cities

      +-----------+---------+------------+
      | name      | country | population |
      +-----------+---------+------------+
      | Berlin    | DEU     |    3386667 |
      | Frankfurt | DEU     |     643821 |
      | Moscow    | RUS     |    8389200 |
      | New York  | USA     |    8008278 |
      | Chicago   | USA     |    2896016 |
      | Seattle   | USA     |     563374 |
      +-----------+---------+------------+

      +-----------+------+
      | name      | rank |
      +-----------+------+
      | Berlin    |    1 |
      | Frankfurt |    2 |
      | Moscow    |    1 |
      | New York  |    1 |
      | Chicago   |    2 |
      | Seattle   |    3 |
      +-----------+------+
  • 9. 9 Basic Window Functions
      select name, incidents
      from support_staff

      +----------+-----------+
      | name     | incidents |
      +----------+-----------+
      | Claudio  |        10 |
      | Valeriy  |         9 |
      | Daniel   |         9 |
      | Geoff    |         9 |
      | Stephane |         8 |
      +----------+-----------+
  • 10. 10 row_number()
      select name, incidents,
             row_number() over (order by incidents desc) as ROW_NUM
      from support_staff

      +----------+-----------+---------+
      | name     | incidents | ROW_NUM |
      +----------+-----------+---------+
      | Claudio  |        10 |       1 |
      | Valeriy  |         9 |       2 |
      | Daniel   |         9 |       3 |
      | Geoff    |         9 |       4 |
      | Stephane |         8 |       5 |
      +----------+-----------+---------+
  • 11. 11 rank()
      select name, incidents,
             row_number() over (order by incidents desc) as ROW_NUM,
             rank() over (order by incidents desc) as RANK
      from support_staff

      +----------+-----------+---------+------+
      | name     | incidents | ROW_NUM | RANK |
      +----------+-----------+---------+------+
      | Claudio  |        10 |       1 |    1 |
      | Valeriy  |         9 |       2 |    2 |
      | Daniel   |         9 |       3 |    2 |
      | Geoff    |         9 |       4 |    2 |
      | Stephane |         8 |       5 |    5 |
      +----------+-----------+---------+------+
  • 12. 12 dense_rank()
      select name, incidents,
             row_number() over (order by incidents desc) as ROW_NUM,
             rank() over (order by incidents desc) as RANK,
             dense_rank() over (order by incidents desc) as DENSE_R
      from support_staff

      +----------+-----------+---------+------+---------+
      | name     | incidents | ROW_NUM | RANK | DENSE_R |
      +----------+-----------+---------+------+---------+
      | Claudio  |        10 |       1 |    1 |       1 |
      | Valeriy  |         9 |       3 |    2 |       2 |
      | Daniel   |         9 |       4 |    2 |       2 |
      | Geoff    |         9 |       2 |    2 |       2 |
      | Stephane |         8 |       5 |    5 |       3 |
      +----------+-----------+---------+------+---------+
  • 13. 13 ntile(n)
      select name, incidents,
             row_number() over (order by incidents desc) as ROW_NUM,
             rank() over (order by incidents desc) as RANK,
             dense_rank() over (order by incidents desc) as DENSE_R,
             ntile(4) over (order by incidents desc) as QUARTILE
      from support_staff

      +----------+-----------+---------+------+---------+----------+
      | name     | incidents | ROW_NUM | RANK | DENSE_R | QUARTILE |
      +----------+-----------+---------+------+---------+----------+
      | Claudio  |        10 |       1 |    1 |       1 |        1 |
      | Valeriy  |         9 |       2 |    2 |       2 |        1 |
      | Daniel   |         9 |       3 |    2 |       2 |        2 |
      | Geoff    |         9 |       4 |    2 |       2 |        3 |
      | Stephane |         8 |       5 |    5 |       3 |        4 |
      +----------+-----------+---------+------+---------+----------+
  • 14. 14 Conclusions so far • Window functions are similar to aggregates • Computed on (current_row, ordered_list(window_rows)) • Can compute relative standing of row wrt other rows • RANK, DENSE_RANK, ...
  • 16. 16 Framed window functions • Some Window Functions use FRAMES – e.g. Aggregates that are used as window functions • Window function is computed on rows in the frame. • Frame is inside PARTITION BY • Frame moves with the current row • There are various frame types
  • 17. 17 Smoothing Noisy Data
      • Noisy data acquisition solution

      SELECT time, raw_data
      FROM sensor_data;
  • 18. 18 Smoothing Noisy Data
      • Noisy data acquisition solution

      SELECT time, raw_data,
             AVG(raw_data) OVER ()
      FROM sensor_data;
  • 19. 19 Smoothing Noisy Data
      • Noisy data acquisition solution

      SELECT time, raw_data,
             AVG(raw_data) OVER (ORDER BY time)
      FROM sensor_data;
  • 20. 20 Smoothing Noisy Data
      • Noisy data acquisition solution

      SELECT time, raw_data,
             AVG(raw_data) OVER (ORDER BY time
                                 ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
      FROM sensor_data;
  • 21. 21 Smoothing Noisy Data
      • Noisy data acquisition solution

      SELECT time, raw_data,
             AVG(raw_data) OVER (ORDER BY time
                                 ROWS BETWEEN 6 PRECEDING AND 6 FOLLOWING)
      FROM sensor_data;
  • 22. 22 Account balance statement
      • Generate balance sheet for bank account.
      • Incoming transactions.
      • Outgoing transactions.

      SELECT tid, date, amount
      FROM transactions
      WHERE account_id = 12345;

      +-----+----------+--------+
      | tid | date     | amount |
      +-----+----------+--------+
      |   1 | 20160401 |   2000 |
      |   2 | 20160402 |  -30.5 |
      |   3 | 20160404 |  -45.5 |
      |   4 | 20160405 | -125.5 |
      |   5 | 20160406 |  100.3 |
      +-----+----------+--------+
  • 24. 24 Account balance statement
      SELECT tid, date, amount,
             ( SELECT SUM(amount)
               FROM transactions t
               WHERE t.date <= o.date AND
                     t.account_id = 12345 ) AS balance
      FROM transactions o
      WHERE o.account_id = 12345;

      +-----+----------+--------+----------+
      | tid | date     | amount | balance  |
      +-----+----------+--------+----------+
      |   1 | 20160401 |   2000 |     2000 |
      |   2 | 20160402 |  -30.5 |   1969.5 |
      |   3 | 20160404 |  -45.5 |     1924 |
      |   4 | 20160405 | -125.5 |   1798.5 |
      |   5 | 20160406 |  100.3 |   1898.8 |
      +-----+----------+--------+----------+
  • 25. 25 Account balance statement
      SELECT tid, date, amount,
             SUM(amount) OVER (ORDER BY date
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND CURRENT ROW) AS balance
      FROM transactions
      WHERE account_id = 12345;

      +-----+----------+--------+----------+
      | tid | date     | amount | balance  |
      +-----+----------+--------+----------+
      |   1 | 20160401 |   2000 |     2000 |
      |   2 | 20160402 |  -30.5 |   1969.5 |
      |   3 | 20160404 |  -45.5 |     1924 |
      |   4 | 20160405 | -125.5 |   1798.5 |
      |   5 | 20160406 |  100.3 |   1898.8 |
      +-----+----------+--------+----------+
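  • 26. 26 Account balance statement
      • How do queries compare?

      +--------+---------------+------------------+
      | # Rows | Regular SQL   | Window Functions |
      +--------+---------------+------------------+
      |    100 | 3.72 sec      | 0.01 sec         |
      |    500 | 30.04 sec     | 0.01 sec         |
      |   1000 | 59.6 sec      | 0.02 sec         |
      |   2000 | 1 min 59 sec  | 0.03 sec         |
      |   4000 | 4 min 1 sec   | 0.04 sec         |
      |  16000 | 18 min 26 sec | 0.18 sec         |
      +--------+---------------+------------------+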
  • 27. 27 RANGE-type frames
      • Useful when interval of interest has multiple/missing rows
      • ORDER BY column: one numeric column
      • RANGE n PRECEDING: rows with R.column >= (current_row.column - n)
      • RANGE n FOLLOWING: rows with R.column <= (current_row.column + n)
      • CURRENT ROW: current row and rows with R.column = current_row.column
  • 28. 28 RANGE-type frames
      • Expenses from today and yesterday:

      select *,
             sum(amount) over (order by exp_date
                               range between 1 preceding
                                         and current row) as sum
      from expenses

      +----------+-------+--------+------+
      | exp_date | name  | amount | sum  |
      +----------+-------+--------+------+
      | 20160407 | bus   |      4 |    4 |
      | 20160409 | beer  |      2 |    2 |
      | 20160410 | wine  |      4 |   18 |
      | 20160410 | snack |     12 |   18 |
      +----------+-------+--------+------+

      The 20160409 row sums only itself because there is no 20160408 row. Both 20160410 rows share the frame 20160409 through 20160410, so each gets 2 + 4 + 12 = 18.
  • 33. 33 Date columns with RANGE-type frames • Date columns and temporal intervals (MDEV-9727) AVG(value) OVER (ORDER BY date_col RANGE BETWEEN INTERVAL 1 MONTH PRECEDING AND INTERVAL 1 MONTH FOLLOWING) • SQL Standard allows this • Not supported by PostgreSQL or MS SQL Server • Intend to support in MariaDB.
  • 34. 34 FRAME syntax • ROWS|RANGE PRECEDING|FOLLOWING:
  • 35. 35 Frames summary • Some window functions use frames – e.g. Aggregate functions used as window functions • Frame moves with the current row • RANGE/ROWS-type frames – MariaDB supports all kinds • Useful for – Cumulative sums – Running averages – Getting aggregates without doing GROUP BY
  • 36. 36 The Island problem • Given a set of ordered integers, find the start and end of sequences that have no missing numbers. Ex: 2, 3, 10, 11, 12, 15, 16, 17 • A common problem, with plenty of use cases: – Used in sales to identify activity periods. – Detecting outages. – Stock market analysis.
  • 37. 37 The Island problem
      SELECT value
      FROM islands
      ORDER BY value;

      +-------+
      | value |
      +-------+
      |     2 |
      |     3 |
      |    10 |
      |    11 |
      |    12 |
      |    15 |
      |    16 |
      |    17 |
      +-------+

      +-------------+-----------+
      | start_range | end_range |
      +-------------+-----------+
      |           2 |         3 |
      |          10 |        12 |
      |          15 |        17 |
      +-------------+-----------+
  • 39. 39 The Island problem
      SELECT value,
             (SELECT ??? ) AS grp
      FROM islands
      ORDER BY value;

      +-------+------+
      | value | grp  |
      +-------+------+
      |     2 | a    |
      |     3 | a    |
      |    10 | b    |
      |    11 | b    |
      |    12 | b    |
      |    15 | c    |
      |    16 | c    |
      |    17 | c    |
      +-------+------+

      SELECT MIN(value) AS start_range,
             MAX(value) AS end_range
      FROM islands
      GROUP BY grp;

      +-------------+-----------+
      | start_range | end_range |
      +-------------+-----------+
      |           2 |         3 |
      |          10 |        12 |
      |          15 |        17 |
      +-------------+-----------+
  • 40. 40 The Island problem – generating the groups
      SELECT value,
             (SELECT ??? ) AS grp
      FROM islands
      ORDER BY value;

      +-------+------+
      | value | grp  |
      +-------+------+
      |     2 |    3 |
      |     3 |    3 |
      |    10 |   12 |
      |    11 |   12 |
      |    12 |   12 |
      |    15 |   17 |
      |    16 |   17 |
      |    17 |   17 |
      +-------+------+
  • 42. 42 The Island problem – generating the groups
      SELECT value,
             ( SELECT MIN(B.value)
               FROM islands AS B
               WHERE B.value >= A.value AND
                     NOT EXISTS ( SELECT *
                                  FROM islands AS C
                                  WHERE C.value = B.value + 1)
             ) AS grp
      FROM islands as A
      ORDER BY value;

      +-------+------+
      | value | grp  |
      +-------+------+
      |     2 |    3 |
      |     3 |    3 |
      |    10 |   12 |
      |    11 |   12 |
      |    12 |   12 |
      |    15 |   17 |
      |    16 |   17 |
      |    17 |   17 |
      +-------+------+
  • 43. 43 The Island problem – generating the groups
      SELECT value,
             ROW_NUMBER() OVER (ORDER BY value) AS grp
      FROM islands as A
      ORDER BY value;

      +-------+------+
      | value | grp  |
      +-------+------+
      |     2 |    1 |
      |     3 |    2 |
      |    10 |    3 |
      |    11 |    4 |
      |    12 |    5 |
      |    15 |    6 |
      |    16 |    7 |
      |    17 |    8 |
      +-------+------+
  • 44. 44 The Island problem – generating the groups
      SELECT value,
             value - ROW_NUMBER() OVER (ORDER BY value) AS grp
      FROM islands as A
      ORDER BY value;

      +-------+------+
      | value | grp  |
      +-------+------+
      |     2 |    1 |
      |     3 |    1 |
      |    10 |    7 |
      |    11 |    7 |
      |    12 |    7 |
      |    15 |    9 |
      |    16 |    9 |
      |    17 |    9 |
      +-------+------+
  • 45. 45 The Island problem – generating the groups
      SELECT value,
             value - ROW_NUMBER() OVER (ORDER BY value) AS grp
      FROM islands as A
      ORDER BY value;

      SELECT value,
             ( SELECT MIN(B.value)
               FROM islands AS B
               WHERE B.value >= A.value AND
                     NOT EXISTS (SELECT *
                                 FROM islands AS C
                                 WHERE C.value = B.value + 1)
             ) AS grp
      FROM islands as A
      ORDER BY value;
  • 46. 46 Plan • What are window functions – Basic window functions – Frames – Window functions and other parts of SQL • Computing window functions • Optimizations SergeyP
  • 47. 47 Window Functions and other SQL constructs
      • Query processing order: Join → Group → Check HAVING → Compute Window Functions → DISTINCT → Sort + Limit
      • Can have WIN_FUNC(AGG_FUNC)
      • Window functions can appear in
        – SELECT list
        – ORDER BY clause
  • 48. 48 Filtering on window function value
      • How to filter for e.g. RANK() < 3? Use a subquery.

      select name, incidents,
             row_number() over (order by incidents desc) as ROW_NUM
      from support_staff
  • 49. 49 Filtering on window function value
      • How to filter for e.g. RANK() < 3? Use a subquery.

      select * from (
        select name, incidents,
               row_number() over (order by incidents desc) as ROW_NUM
        from support_staff
      ) as TBL
      where TBL.ROW_NUM < 3
  • 51. 51 Computing Window functions
      [Diagram: table, table, table → join → group]
      • Join, grouping
      • All partitions are mixed together
  • 52. 52 Computing Window functions
      [Diagram: table, table, table → join → group → sort]
      • Put join output into a temporary table
      • Sort it by (PARTITION BY clause, ORDER BY clause)
  • 53. 53 Computing window function for a row • Can look at – Current row – Rows in the partition, ordered • Can compute the window function • Computing values individually would be expensive – O(#rows_in_partition ^ 2)
  • 54. 54 “Streamable” window functions • ROW_NUMBER, RANK, DENSE_RANK, ... – Can walk down and compute values on the fly • NTILE, CUME_DIST, PERCENT_RANK – Get #rows in the partition – Then walk down and compute values on the fly.
  • 55. 55 Computing framed window functions • window_func(rows_in_the_frame) • Frame moves with the current row
  • 56. 56 Computing framed window functions
      [Diagram: as the frame moves down, a row with value 20 leaves it and a row with value 10 enters it, so $total becomes $total+10-20]
      • window_func(rows_in_the_frame)
      • Frame moves with the current row
      • Some functions allow adding and removing rows
        – SUM, COUNT, AVG, BIT_OR, BIT_*
      • Can compute efficiently
        – Done in MariaDB 10.2.0.
  • 57. 57 Some aggregates make streaming hard
      [Diagram: rows with values 20, 21, 19, 10; MAX=21 at one frame position, MAX=? at the next]
      • MIN, MAX
      • Need to track the whole window
        – Doable for small frames (can also re-calculate)
        – Hard for bigger frames
      • Are big frames used?
      • Not implemented yet.
  • 58. 58 LEAD and LAG issues • LAG(expr, N) – “expr N rows before” – LAG(expr,1) - previous • Non-constant N? • Lookups to arbitrary rows – Expensive – Worth doing at all? LAG(..., 2)
  • 59. 59 Summary for computing window functions • Sort by (partition_by, order_by) • Then walk through and compute window functions • Most functions can be computed on-the-fly • Framed window functions require moving the frame – SUM, COUNT, AVG .. - can update value as frame moves – MIN, MAX – more complex • LEAD, LAG may require random reads
  • 62. 62 A query with window functions
      [Diagram: table, table, table → join → group → sort → window functions]

      select 'web' as channel
            ,web.item
            ,web.return_ratio
            ,web.return_rank
            ,web.currency_rank
      from (
        select item
              ,return_ratio
              ,currency_ratio
              ,rank() over (order by return_ratio) as return_rank
              ,rank() over (order by currency_ratio) as currency_rank
        from (
          select ws.ws_item_sk as item
                ,(cast(sum(coalesce(wr.wr_return_quantity,0)) as decimal(15,4))/
                  cast(sum(coalesce(ws.ws_quantity,0)) as decimal(15,4))) as return_ratio
                ,(cast(sum(coalesce(wr.wr_return_amt,0)) as decimal(15,4))/
                  cast(sum(coalesce(ws.ws_net_paid,0)) as decimal(15,4))) as currency_ratio
          from web_sales ws
               left outer join web_returns wr
                 on (ws.ws_order_number = wr.wr_order_number and
                     ws.ws_item_sk = wr.wr_item_sk)
              ,date_dim
          where wr.wr_return_amt > 10000
            and ws.ws_net_profit > 1
            and ws.ws_net_paid > 0
            and ws.ws_quantity > 0
            and ws_sold_date_sk = d_date_sk
            and ws_sold_date_sk between 2452245 and 2452275
            and d_year = 2001
            and d_moy = 12
          group by ws.ws_item_sk
        ) in_web
      ) web
      where web.return_rank <= 10
         or web.currency_rank <= 10
  • 63. 63 Still, there are optimizations • Doing fewer sorts • Condition pushdown through PARTITION BY
  • 64. 64 Doing fewer sorts
      [Diagram: tbl, tbl, tbl → join → sort]

      select
        rank() over (order by incidents),
        ntile(4) over (order by incidents),
        rank() over (order by incidents, join_date)
      from support_staff

      • Each window function requires a sort
      • Can avoid sorting if using an index (MariaDB: not yet)
      • Identical PARTITION/ORDER BY must share the sort step
      • Compatible PARTITION/ORDER BY may share the sort step
        – MariaDB: yes (but has bugs at the moment)
        – PostgreSQL: yes, limited
  • 65. 65 Condition pushdown through PARTITION BY
      [Diagram: staff → sort, with partitions Development, Consulting, Support]

      select * from (
        select name, dept,
               rank() over (partition by dept
                            order by incidents desc) as R
        from staff
      ) as TBL
      where dept='Support'
  • 66. 66 Condition pushdown into PARTITION BY • Other databases have this • In MariaDB, requires: MDEV-9197: Pushdown conditions into non-mergeable views/ derived tables MDEV-7486: Condition pushdown from HAVING into WHERE • These are 10.2 tasks too • Considering it
  • 67. 67 Optimizations summary
      • Not much need/room for optimizations in many cases
        – Window function is a small part of the query
      • Optimizations to have
        – Share the sort across window functions (implemented, but has bugs)
        – Condition pushdown through PARTITION BY (depends on another 10.2 task; want to have it)
  • 68. 68 Conclusions • Window functions coming in MariaDB 10.2! • Already have ~SQL:2003 level features • Intend to have ~SQL:2011 features – Comparable with “big 3” databases • Work on optimizations is in progress – Send us your cases.
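
The add-and-remove idea from the "Computing framed window functions" slides can be sketched in Python. This is only an illustration under stated assumptions, not MariaDB's implementation: the function names `framed_sum_naive` and `framed_sum_streaming` are hypothetical, the input is a plain list of integers standing for one partition that is already sorted by the ORDER BY column, and the frame is of the form ROWS BETWEEN p PRECEDING AND f FOLLOWING.

```python
def framed_sum_naive(values, preceding, following):
    """Recompute SUM over the frame for every row.

    Cost grows with rows * frame size, which is the expensive
    "compute each value individually" approach from the slides.
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i - preceding)          # first row of the frame
        hi = min(n - 1, i + following)      # last row of the frame
        result.append(sum(values[lo:hi + 1]))
    return result


def framed_sum_streaming(values, preceding, following):
    """Keep a running total and adjust it as the frame moves.

    Rows entering the frame are added and rows leaving it are
    subtracted, so every row is touched at most twice.
    """
    n = len(values)
    result = []
    total = 0
    lo, hi = 0, -1                          # frame is values[lo..hi], empty at start
    for i in range(n):
        new_lo = max(0, i - preceding)
        new_hi = min(n - 1, i + following)
        while hi < new_hi:                  # rows entering at the bottom edge
            hi += 1
            total += values[hi]
        while lo < new_lo:                  # rows leaving at the top edge
            total -= values[lo]
            lo += 1
        result.append(total)
    return result


if __name__ == "__main__":
    data = [20, 21, 19, 10, 5]
    # ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    print(framed_sum_streaming(data, 1, 1))
    # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (a running total)
    print(framed_sum_streaming(data, len(data), 0))
```

The same pattern applies to COUNT and AVG, because a row's contribution can be undone by subtraction. It does not carry over to MIN and MAX: when the current maximum leaves the frame, the new maximum cannot be derived from the old one, which is why the slides treat those separately.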