How to teach an elephant to rock’n’roll
Maxim Boguk
About Data Egret
At Data Egret (formerly known as PostgreSQL-Consulting.com) we
provide database maintenance solutions and system administration
support covering the PostgreSQL database life cycle for large and
small businesses alike.
 24/7 Support
 Consulting
 Health Check
 Monitoring
 Training
How to teach an elephant to rock’n’roll
 Some queries can’t be automatically optimized
 An alternative approach
 Manual fast algorithm – huge performance boost
Really?
 Demonstrate very useful fast alternative algorithms
 Provide methods of converting PL/PgSQL to plain SQL
 Additional: a self-contained learning package
Goals
 Basic query optimization techniques
 PostgreSQL query planner and optimizer limitations
 Comparing PL/PgSQL performance vs plain SQL
(The PL/PgSQL samples are written in a way that is easy to
understand; they are not the fastest possible implementation)
Non goals
 PL/PgSQL knowledge
 PostgreSQL SQL features:
 WITH [RECURSIVE]
 [JOIN] LATERAL
 UNNEST [WITH ORDINALITY]
Prerequisites
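A minimal sketch (not from the talk) that exercises all three features at once: for each array element, count down to 1 while remembering the element’s original position.

WITH RECURSIVE countdown(v, pos) AS (
    --seed: unnest the array, keeping each element's position
    SELECT u.v, u.pos
    FROM UNNEST('{3,1,2}'::int[]) WITH ORDINALITY AS u(v, pos)
  UNION ALL
    --recursive step: decrement via a LATERAL subquery
    SELECT t.next_v, c.pos
    FROM countdown AS c,
    LATERAL (SELECT c.v - 1 AS next_v) AS t
    WHERE c.v > 1
)
SELECT * FROM countdown ORDER BY pos, v DESC;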
 PostgreSQL version 9.6
 Works on 9.5 and 9.4 without (major) changes
 Running on 9.3 (and down to 8.4) is possible, but requires
work-around implementations for the missing features.
PostgreSQL version
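For example, UNNEST ... WITH ORDINALITY only appeared in 9.4 (and LATERAL in 9.3); a common work-around on older releases (an assumption here, not shown in the talk) is row_number() OVER ():

-- hypothetical WITH ORDINALITY substitute for PostgreSQL < 9.4
SELECT v AS _a_id, row_number() OVER () AS _pos
FROM (SELECT unnest('{20,10,30,100,16}'::int[]) AS v) AS u;
-- relies on unnest() in the SELECT list preserving array order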
 Problem description
 Native approach - a simple SQL query
 EXPLAIN ANALYZE
 Alternative algorithm in PL/PgSQL
 The same algorithm implemented in SQL
 EXPLAIN ANALYZE
 Performance results
Structure
01 Initial data preparation
Initial data preparation
Schema
CREATE UNIQUE INDEX blog_post_test_author_id_ctime_ukey
ON b_p_t USING btree (author_id, ctime);
Create blog posts table:
DROP TABLE IF EXISTS b_p_t;
CREATE TABLE b_p_t (
id BIGSERIAL PRIMARY KEY,
ctime TIMESTAMP NOT NULL,
author_id INTEGER NOT NULL,
payload text);
Initial data preparation
Part 1
Populate the table with test data:
-- generate 10,000,000 blog posts from 1000 authors, on average
-- 10,000 posts per author over the last 5 years
-- expect a few hours of run time
INSERT INTO b_p_t (ctime, author_id, payload)
SELECT
-- random in the last 5 years
now()-(random()*365*24*3600*5)*'1 second'::interval AS ctime,
-- 1001 authors
(random()*1000)::int AS author_id,
-- random text-like payload 100-2100 bytes long
(SELECT
string_agg(substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ', (random() * 72)::integer + 1, 1), '')
FROM generate_series(1, 100+i%10+(random() * 2000)::integer)) AS payload
-- 10M posts
FROM generate_series(1, 10000000) AS g(i);
Initial data preparation
Part 2
Initial data preparation
Part 3
Populate the table with test data (continued):
--delete generated duplicates
DELETE FROM b_p_t WHERE (author_id, ctime) IN (SELECT author_id, ctime
FROM b_p_t GROUP BY author_id, ctime HAVING count(*)>1);
CREATE INDEX bpt_ctime_key ON b_p_t(ctime);
CREATE UNIQUE INDEX bpt_a_id_ctime_ukey ON b_p_t(author_id, ctime);
-- create authors table
DROP TABLE IF EXISTS a_t;
CREATE TABLE a_t AS SELECT DISTINCT ON (author_id) author_id AS id,
'author_'||author_id AS name FROM b_p_t;
ALTER TABLE a_t ADD PRIMARY KEY (id);
ALTER TABLE b_p_t ADD CONSTRAINT author_id_fk FOREIGN KEY (author_id)
REFERENCES a_t(id);
ANALYZE a_t;
02 IOS for large offsets
 Queries with a large OFFSET are always slow
 To produce the 1,000,001st row, the database first needs to
iterate over all 1,000,000 preceding rows
 Alternative: use a fast Index Only Scan to skip the first
1,000,000 rows
IOS for large offsets
SELECT
*
FROM b_p_t
ORDER BY id
OFFSET 1000000
LIMIT 10
IOS for large offsets
Native SQL
Limit (actual time=503..503 rows=10 loops=1)
-> Index Scan using b_p_t_pkey on b_p_t
(actual time=0..386 rows=1000010 loops=1)
IOS for large offsets
Native SQL EXPLAIN
CREATE OR REPLACE FUNCTION ios_offset_test
(a_offset BIGINT, a_limit BIGINT) RETURNS SETOF b_p_t LANGUAGE plpgsql
AS $function$
DECLARE start_id b_p_t.id%TYPE;
BEGIN
--find a starting id using IOS to skip OFFSET rows
SELECT id INTO start_id FROM b_p_t
ORDER BY id OFFSET a_offset LIMIT 1;
--return result using normal index scan
RETURN QUERY SELECT * FROM b_p_t WHERE id>=start_id
ORDER BY ID LIMIT a_limit;
END;
$function$;
IOS for large offsets
PL/PgSQL
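A usage sketch (not in the original slides); this call should return the same ten rows as the native query above:
SELECT * FROM ios_offset_test(1000000, 10);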
SELECT bpt.* FROM
(
--find a starting id using IOS to skip OFFSET rows
SELECT id FROM b_p_t
ORDER BY id OFFSET 1000000 LIMIT 1
) AS t,
LATERAL (
--return result using normal index scan
SELECT * FROM b_p_t WHERE id>=t.id
ORDER BY id LIMIT 10
) AS bpt;
IOS for large offsets
Advanced SQL
-> Index Only Scan using blog_post_test_pkey on b_p_t
(actual time=0..236 rows=1000001 loops=1)
-> Index Scan using blog_post_test_pkey on b_p_t
(actual time=0.016..0.026 rows=10 loops=1)
Index Cond: (id >= b_p_t.id)
IOS for large offsets
Advanced SQL EXPLAIN
IOS for large offsets
Performance
OFFSET Native PL/PgSQL Advanced
1000 0.7ms 0.5ms 0.4ms
10000 3.0ms 1.2ms 1.1ms
100000 26.2ms 7.7ms 7.4ms
1000000 273.0ms 71.0ms 70.9ms
03 Pull down LIMIT under JOIN
 A combination of JOIN, ORDER BY and LIMIT
 The database performs the join for every row
(not only for the LIMIT rows)
 In many cases it’s unnecessary
 Alternative: pull down LIMIT+ORDER BY under the JOIN
Pull down LIMIT under JOIN
SELECT
*
FROM b_p_t
JOIN a_t ON a_t.id=author_id
WHERE author_id IN (1,2,3,4,5)
ORDER BY ctime
LIMIT 10
Pull down LIMIT under JOIN
Native SQL
-> Sort (actual time=345..345 rows=10 loops=1)
-> Nested Loop (actual time=0.061..295.832 rows=50194 loops=1)
-> Index Scan using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0..78 rows=50194 loops=1)
Index Cond: (author_id = ANY ('{1,2,3,4,5}'::integer[]))
-> Index Scan using a_t_pkey on a_t
(actual time=0.002 rows=1 loops=50194)
Index Cond: (id = b_p_t.author_id)
Pull down LIMIT under JOIN
Native SQL EXPLAIN
CREATE OR REPLACE FUNCTION join_limit_pulldown_test
(a_authors BIGINT[], a_limit BIGINT)
RETURNS TABLE (id BIGINT, ctime TIMESTAMP, author_id INT,
payload TEXT, name TEXT) LANGUAGE plpgsql
AS $function$
DECLARE t record;
BEGIN
FOR t IN (
-- find ONLY required rows first
SELECT * FROM b_p_t
WHERE b_p_t.author_id=ANY(a_authors)
ORDER BY ctime LIMIT a_limit
) LOOP
-- and only after join with authors
RETURN QUERY SELECT t.*, a_t.name FROM a_t
WHERE a_t.id=t.author_id;
END LOOP;
END;
$function$;
Pull down LIMIT under JOIN
PL/PgSQL
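A usage sketch (not in the original slides); the cast matches the BIGINT[] parameter:
SELECT * FROM join_limit_pulldown_test(ARRAY[1,2,3,4,5]::bigint[], 10);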
SELECT bpt_with_a_name.* FROM
-- find ONLY required rows first
(
SELECT * FROM b_p_t
WHERE author_id IN (1,2,3,4,5)
ORDER BY ctime LIMIT 10
) AS t, LATERAL (
-- and only after join with the authors
SELECT t.*,a_t.name FROM a_t
WHERE a_t.id=t.author_id
) AS bpt_with_a_name
-- second ORDER BY required
ORDER BY ctime LIMIT 10;
Pull down LIMIT under JOIN
Advanced SQL
-> Nested Loop (actual time=68..68 rows=10 loops=1)
-> Sort (actual time=68..68 rows=10 loops=1)
-> Index Scan using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0..49 rows=50194 loops=1)
Index Cond: (author_id = ANY ('{1,2,3,4,5}'::integer[]))
-> Index Scan using a_t_pkey on a_t
(actual time=0.002..0.002 rows=1 loops=10)
Index Cond: (id = b_p_t.author_id)
Pull down LIMIT under JOIN
Advanced SQL EXPLAIN
author_id IN (1,2,3,4,5) / LIMIT 10
Native SQL: 155ms
PL/PgSQL: 52ms
Advanced SQL: 51ms
Pull down LIMIT under JOIN
Performance
04 DISTINCT
 DISTINCT on a large table is always slow
 It scans the whole table
 Even if the number of distinct values is low
 Alternative: creative index use to perform the DISTINCT
 Technique known as LOOSE INDEX SCAN:
https://wiki.postgresql.org/wiki/Loose_indexscan
DISTINCT
SELECT
DISTINCT author_id
FROM b_p_t
DISTINCT
Native SQL
Unique (actual time=0..5235 rows=1001 loops=1)
-> Index Only Scan using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0..3767 rows=9999964 loops=1)
DISTINCT
Native SQL EXPLAIN
CREATE OR REPLACE FUNCTION fast_distinct_test
() RETURNS SETOF INTEGER
LANGUAGE plpgsql
AS $function$
DECLARE _author_id b_p_t.author_id%TYPE;
BEGIN
--start from least author_id
SELECT min(author_id) INTO _author_id FROM b_p_t;
LOOP
--finish if nothing found
EXIT WHEN _author_id IS NULL;
--return found value
RETURN NEXT _author_id;
--find the next author_id > current author_id
SELECT author_id INTO _author_id FROM b_p_t WHERE
author_id>_author_id ORDER BY author_id LIMIT 1;
END LOOP;
END;
$function$;
DISTINCT
PL/PgSQL
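A usage sketch (not in the original slides):
SELECT * FROM fast_distinct_test();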
WITH RECURSIVE t AS (
--start from least author_id
(SELECT author_id AS _author_id FROM b_p_t
ORDER BY author_id LIMIT 1
)
UNION ALL
SELECT author_id AS _author_id
FROM t, LATERAL (
--find the next author_id > current author_id
SELECT author_id FROM b_p_t
WHERE author_id>t._author_id
ORDER BY author_id LIMIT 1
) AS a_id
)
--return found values
SELECT _author_id FROM t;
DISTINCT
Advanced SQL
-> Index Only Scan using b_p_t_author_id_ctime_ukey on b_p_t b_p_t_1
(actual time=0.015..0.015 rows=1 loops=1)
-> Index Only Scan using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0.007..0.007 rows=1 loops=1001)
DISTINCT
Advanced SQL EXPLAIN
Native SQL: 2660ms
PL/PgSQL: 18ms
Advanced SQL: 10ms
DISTINCT
Performance
05 DISTINCT ON with IN ()
 DISTINCT ON with IN() is used when an application needs to
fetch the latest data for SELECTED authors in a single query
 DISTINCT ON queries are usually slow
 Alternative: creative use of an index to calculate the DISTINCT ON
DISTINCT ON with IN ()
SELECT
DISTINCT ON (author_id)
*
FROM b_p_t
WHERE author_id IN (1,2,3,4,5)
ORDER BY author_id, ctime DESC
DISTINCT ON with IN ()
Native SQL
Unique (actual time=160..204 rows=5 loops=1)
-> Sort (actual time=160..195 rows=50194 loops=1)
-> Index Scan using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0..50 rows=50194 loops=1)
Index Cond: (author_id = ANY ('{1,2,3,4,5}'::integer[]))
DISTINCT ON with IN ()
Native SQL EXPLAIN
CREATE OR REPLACE FUNCTION distinct_on_test
(a_authors INT[]) RETURNS SETOF b_p_t
LANGUAGE plpgsql
AS $function$
DECLARE _a b_p_t.author_id%TYPE;
BEGIN
-- loop over authors list
FOREACH _a IN ARRAY a_authors LOOP
-- return the latest post for author
RETURN QUERY SELECT * FROM b_p_t
WHERE author_id=_a ORDER BY ctime DESC LIMIT 1;
END LOOP;
END;
$function$;
DISTINCT ON with IN ()
PL/PgSQL
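A usage sketch (not in the original slides):
SELECT * FROM distinct_on_test(ARRAY[1,2,3,4,5]);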
SELECT bpt.*
-- loop over authors list
FROM unnest(ARRAY[1,2,3,4,5]::INT[]) AS t(_author_id),
LATERAL (
-- return the latest post for author
SELECT * FROM b_p_t
WHERE author_id=t._author_id
ORDER BY ctime DESC LIMIT 1
) AS bpt;
DISTINCT ON with IN ()
Advanced SQL
Nested Loop (actual time=0.02..0.06 rows=5 loops=1)
-> Index Scan Backward using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0.009..0.009 rows=1 loops=5)
Index Cond: (author_id = t._author_id)
DISTINCT ON with IN ()
Advanced SQL EXPLAIN
Native SQL: 110.0ms
PL/PgSQL: 0.5ms
Advanced SQL: 0.4ms
DISTINCT ON with IN ()
Performance
06 DISTINCT ON over all table
 DISTINCT ON is used when an application needs to fetch the
latest data for ALL authors in a single query
 DISTINCT ON on a large table is always a performance killer
 Alternative: again, creative use of indexes and SQL saves
the day
DISTINCT ON over all table
SELECT
DISTINCT ON (author_id)
*
FROM b_p_t
ORDER BY author_id, ctime DESC
DISTINCT ON over all table
Native SQL
Unique (actual time=29938..39450 rows=1001 loops=1)
-> Sort (actual time=29938..37845 rows=9999964 loops=1)
Sort Key: author_id, ctime DESC
Sort Method: external merge Disk: 10002168kB
-> Seq Scan on b_p_t
(actual time=0.004..2472 rows=9999964 loops=1)
DISTINCT ON over all table
Native SQL EXPLAIN
DISTINCT ON over all table
PL/PgSQL
CREATE OR REPLACE FUNCTION fast_distinct_on_test() RETURNS SETOF b_p_t
LANGUAGE plpgsql
AS $function$
DECLARE _b_p_t record;
BEGIN
--start from greatest author_id
SELECT * INTO _b_p_t FROM b_p_t
ORDER BY author_id DESC, ctime DESC LIMIT 1;
LOOP
--finish if nothing found
EXIT WHEN _b_p_t IS NULL;
--return found value
RETURN NEXT _b_p_t;
--latest post from next author_id < current author_id
SELECT * FROM b_p_t INTO _b_p_t
WHERE author_id<_b_p_t.author_id
ORDER BY author_id DESC, ctime DESC LIMIT 1;
END LOOP;
END;
$function$;
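A usage sketch (not in the original slides):
SELECT * FROM fast_distinct_on_test();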
WITH RECURSIVE t AS (
--start from greatest author_id
(SELECT * FROM b_p_t
ORDER BY author_id DESC, ctime DESC LIMIT 1)
UNION ALL
SELECT bpt.* FROM t, LATERAL (
--latest post from the next author_id < current author_id
SELECT * FROM b_p_t
WHERE author_id<t.author_id
ORDER BY author_id DESC, ctime DESC LIMIT 1
) AS bpt
)
--return found values
SELECT * FROM t;
DISTINCT ON over all table
Advanced SQL
-> Index Scan Backward using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0.008..0.008 rows=1 loops=1)
-> Index Scan Backward using b_p_t_author_id_ctime_ukey on b_p_t
(actual time=0.007..0.007 rows=1 loops=1001)
Index Cond: (author_id < t_1.author_id)
DISTINCT ON over all table
Advanced SQL EXPLAIN
--fast distinct(author_id) implementation (from part 4)
WITH RECURSIVE t AS (
--start from least author_id
(SELECT author_id AS _author_id FROM b_p_t
ORDER BY author_id LIMIT 1)
UNION ALL
SELECT author_id AS _author_id
FROM t, LATERAL (
--find the next author_id > current author_id
SELECT author_id FROM b_p_t WHERE author_id>t._author_id
ORDER BY author_id LIMIT 1
) AS a_id
)
SELECT bpt.*
-- loop over authors list (from part 5)
FROM t,
LATERAL (
-- return the latest post for each author
SELECT * FROM b_p_t WHERE b_p_t.author_id=t._author_id
ORDER BY ctime DESC LIMIT 1
) AS bpt;
DISTINCT ON over all table
Alternative advanced SQL
Native SQL: 3755ms
PL/PgSQL: 27ms
Advanced SQL: 13ms
Alternative advanced SQL: 18ms
DISTINCT ON over all table
Performance
07 Let the real fun begin: the basic news feed
The idea of a basic news feed is very simple:
 find the N newest posts from the subscription list authors
 maybe with some offset
 easy to implement
 hard to make run fast
The basic news feed
SELECT *
FROM b_p_t
WHERE
author_id IN (1,2,3,4,5)
ORDER BY ctime DESC
LIMIT 20 OFFSET 20
The basic news feed
Native SQL
Limit (actual time=146..146 rows=20 loops=1)
-> Sort (actual time=146..146 rows=40 loops=1)
Sort Key: ctime DESC
Sort Method: top-N heapsort Memory: 84kB
-> Index Scan using bpt_a_id_ctime_ukey on b_p_t
(actual time=0.015..105 rows=50194 loops=1)
Index Cond: (author_id = ANY ('{1,2,3,4,5}'::integer[]))
The basic news feed
EXPLAIN
 The query fetches all the posts from all listed authors
 It fetches whole rows (with all the payloads)
 The more posts, the slower the query
 The longer the list of authors, the slower the query
The basic news feed
Problems
 An ideal feed query implementation should require no more
than OFFSET+LIMIT index probes on the blog posts table
 Must be fast at least for small OFFSET/LIMIT values (100-1000)
The basic news feed
Requirements
The basic news feed
Alternative idea Step 0
Let’s start with an array of author_ids (_a_ids):
'{20,10,30,100,16}'::int[]
pos author_id
1 20
2 10
3 30
4 100
5 16
The basic news feed
Alternative idea Step 1
Now, using the ideas from part 5, populate the ctime of
the latest post for each author (_a_ctimes):
SELECT
array_agg((
SELECT ctime
FROM b_p_t WHERE author_id=_a_id
ORDER BY ctime DESC LIMIT 1
) ORDER BY _pos)
FROM
UNNEST('{20,10,30,100,16}'::int[])
WITH ORDINALITY AS u(_a_id, _pos)
pos ctime
1 01-02-2017
2 03-02-2017
3 10-02-2017
4 28-02-2017
5 21-02-2017
The basic news feed
Alternative idea Step 2
pos author_id
1 20
2 10
3 30
4 100
5 16
pos ctime
1 01-02-2017
2 03-02-2017
3 10-02-2017
4 28-02-2017
5 21-02-2017
Select the position of the latest post from _a_ctimes. It will
be the first row of the final result set.
SELECT pos FROM
UNNEST(_a_ctimes) WITH ORDINALITY AS u(a_ctime, pos)
ORDER BY a_ctime DESC NULLS LAST LIMIT 1
The basic news feed
Alternative idea Step 3
pos author_id
1 20
2 10
3 30
4 100
5 16
pos ctime
1 01-02-2017
2 03-02-2017
3 10-02-2017
4 28-02-2017
5 21-02-2017
Now, replace row 4 in _a_ctimes with the ctime of the previous post
by the same author.
SELECT ctime AS _a_ctime FROM b_p_t WHERE author_id=_a_ids[pos]
AND ctime<_a_ctimes[pos] ORDER BY ctime DESC LIMIT 1
Found author_id ctime
1 4 28-02-2017
The basic news feed
Alternative idea
pos author_id
1 20
2 10
3 30
4 100
5 16
pos ctime
1 01-02-2017
2 03-02-2017
3 10-02-2017
4 20-02-2017
5 21-02-2017
“Rinse and repeat” steps 2 and 3, collecting the found rows, until the
requested LIMIT of rows is found.
Found author_id ctime
1 4 28-02-2017
2 5 21-02-2017
The basic news feed
PL/PgSQL
CREATE OR REPLACE FUNCTION feed_test
(a_authors INT[], a_limit INT, a_offset INT) RETURNS SETOF b_p_t
LANGUAGE plpgsql
AS $function$
DECLARE _a_ids INT[] := a_authors;
DECLARE _a_ctimes TIMESTAMP[];
DECLARE _rows_found INT := 0;
DECLARE _pos INT;
DECLARE _a_ctime TIMESTAMP;
DECLARE _a_id INT;
BEGIN
-- loop over authors list
FOR _pos IN SELECT generate_subscripts(a_authors, 1) LOOP
--populate the latest post ctime for every author
SELECT ctime INTO _a_ctime FROM b_p_t WHERE
author_id=_a_ids[_pos] ORDER BY ctime DESC LIMIT 1;
_a_ctimes[_pos] := _a_ctime;
END LOOP;
The basic news feed
PL/PgSQL (continued)
WHILE _rows_found<a_limit+a_offset LOOP
--seek position of the latest post in ctime array
SELECT pos INTO _pos
FROM UNNEST(_a_ctimes) WITH ORDINALITY AS u(a_ctime, pos)
ORDER BY a_ctime DESC NULLS LAST LIMIT 1;
--get ctime of previous post of the same author
SELECT ctime INTO _a_ctime FROM b_p_t
WHERE author_id=_a_ids[_pos] AND ctime<_a_ctimes[_pos]
ORDER BY ctime DESC LIMIT 1;
The basic news feed
PL/PgSQL (continued)
--offset rows done, start return results
IF _rows_found>=a_offset THEN
RETURN QUERY SELECT * FROM b_p_t
WHERE author_id=_a_ids[_pos]
AND ctime=_a_ctimes[_pos];
END IF;
--increase found rows count
_rows_found := _rows_found+1;
--replace ctime for author with previous message ctime
_a_ctimes[_pos] := _a_ctime;
END LOOP;
END;
$function$;
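A usage sketch (not in the original slides); with a_limit=10 and a_offset=10 this should return feed rows 11-20:
SELECT * FROM feed_test(ARRAY[1,2,3,4,5], 10, 10);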
WITH RECURSIVE
r AS (
SELECT
--empty result
NULL::b_p_t AS _return,
--zero rows found yet
0::integer AS _rows_found,
--populate author ARRAY
'{1,2,3,4,5}'::int[] AS _a_ids,
--populate author ARRAY of latest blog posts
(SELECT
array_agg((
SELECT ctime FROM b_p_t WHERE author_id=_a_id ORDER BY ctime DESC LIMIT 1
) ORDER BY _pos)
FROM UNNEST('{1,2,3,4,5}'::int[]) WITH ORDINALITY AS u(_a_id, _pos)
) AS _a_ctimes
UNION ALL
SELECT
--return the found row to the result set if we have already done OFFSET or more entries
CASE WHEN _rows_found>=100 THEN (
SELECT b_p_t FROM b_p_t WHERE author_id=_a_ids[_pos] AND ctime=_a_ctimes[_pos]
) ELSE NULL END,
--increase found row count
_rows_found+1,
--pass through the same a_ids array
_a_ids,
--replace current ctime for author with previous message ctime for the same author
_a_ctimes[:_pos-1]||_a_ctime||_a_ctimes[_pos+1:]
FROM r,
LATERAL (
SELECT _pos FROM UNNEST(_a_ctimes) WITH ORDINALITY AS u(_a_ctime, _pos)
ORDER BY _a_ctime DESC NULLS LAST LIMIT 1
) AS t1,
LATERAL (
SELECT ctime AS _a_ctime FROM b_p_t WHERE author_id=_a_ids[_pos] AND ctime<_a_ctimes[_pos]
ORDER BY ctime DESC LIMIT 1
) AS t2
--found the required amount of rows (offset+limit done)
WHERE _rows_found<105
)
SELECT (_return).* FROM r WHERE _return IS NOT NULL ORDER BY _rows_found;
The basic news feed
Advanced SQL
WITH RECURSIVE
r AS (
--initial part of recursive union
SELECT
...
UNION ALL
--main part of recursive union
SELECT
…
--exit condition
WHERE _rows_found<200
)
--produce final ordered result
SELECT (_return).* FROM r WHERE _return IS NOT NULL
ORDER BY _rows_found
Alternative implementation
Advanced SQL (structure overview)
--empty result
NULL::b_p_t AS _return,
--zero rows found so far
0::integer AS _rows_found,
--author ARRAY
'{1,2,3,4,5}'::int[] AS _a_ids,
--populate author ARRAY of latest blog posts (see part 5)
(SELECT
array_agg((
SELECT ctime FROM b_p_t WHERE author_id=_a_id ORDER BY
ctime DESC LIMIT 1
) ORDER BY _pos)
FROM UNNEST('{1,2,3,4,5}'::int[]) WITH ORDINALITY AS
u(_a_id, _pos)
) AS _a_ctimes
Alternative implementation
Advanced SQL (initial part)
--return the found row to the result set if we have already done OFFSET or more entries
CASE WHEN _rows_found>=100 THEN (
SELECT b_p_t FROM b_p_t WHERE author_id=_a_ids[_pos] AND ctime=_a_ctimes[_pos]
) ELSE NULL END,
--increase found row count
_rows_found+1,
--pass through the same a_ids array
_a_ids,
--replace current ctime for author with previous message ctime
_a_ctimes[:_pos-1]||_a_ctime||_a_ctimes[_pos+1:]
FROM r,
LATERAL (
SELECT _pos FROM UNNEST(_a_ctimes) WITH ORDINALITY AS u(_a_ctime, _pos)
ORDER BY _a_ctime DESC NULLS LAST LIMIT 1
) AS t1,
LATERAL (
SELECT ctime AS _a_ctime FROM b_p_t
WHERE author_id=_a_ids[_pos] AND ctime<_a_ctimes[_pos]
ORDER BY ctime DESC LIMIT 1
) AS t2
Alternative implementation
Advanced SQL (main part)
Performance
(with 10 authors)
LIMIT OFFSET Native PL/PgSQL Alternative
1 0 180ms 1.0ms 1.0ms
10 0 182ms 1.8ms 1.3ms
10 10 182ms 1.9ms 1.9ms
10 1000 187ms 53.0ms 24.7ms
100 100 185ms 13.2ms 6.2ms
1000 0 194ms 100.8ms 35.1ms
08 Final notes
 Efficient execution of some popular queries requires
implementing an alternative procedural algorithm
 Implementing custom algorithms is usually easier in PL/PgSQL
 The same algorithm implemented in plain SQL runs faster
 Process:
1. Implement and debug the algorithm in PL/PgSQL
2. Convert it to plain SQL
Final notes
No questions?
Really?
Questions?
Maxim Boguk
Thank you!