PostgreSQL Procedural Languages: Tips, Tricks & Gotchas
Who Am I?
● Jim Mlodgenski
– jimm@openscg.com
– @jim_mlodgenski
● Co-organizer of
– NYC PUG (www.nycpug.org)
– Philly PUG (www.phlpug.org)
● CTO, OpenSCG
– www.openscg.com
Stored procedures/functions
● Code that runs inside of the database
● Used for:
– Performance
– Security
– Convenience
functions=# SELECT airport FROM bird_strikes LIMIT 5;
airport
--------------------------
NEWARK LIBERTY INTL ARPT
UNKNOWN
DENVER INTL AIRPORT
CHICAGO O'HARE INTL ARPT
JOHN F KENNEDY INTL
(5 rows)
Source: http://wildlife.faa.gov/
Sample Data
functions=# SELECT count(*)
functions-# FROM bird_strikes
functions-# WHERE get_iata_code_from_abbr_name(airport) =
'LAX';
count
-------
850
(1 row)
Time: 13490.611 ms
Data Formatting Functions
functions=# EXPLAIN ANALYZE SELECT count(*) FROM bird_strikes ...
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=29418.79..29418.80 rows=1 width=0) (actual
time=13463.628..13463.629 rows=1 loops=1)
-> Seq Scan on bird_strikes (cost=0.00..29417.55 rows=497 width=0)
(actual time=15.721..13463.293 rows=850 loops=1)
Filter: ((get_iata_code_from_abbr_name(airport))::text =
'LAX'::text)
Rows Removed by Filter: 98554
Planning time: 0.124 ms
Execution time: 13463.682 ms
(6 rows)
Check Performance
functions=# set track_functions = 'pl';
SET
functions=# select * from pg_stat_user_functions;
(No rows)
functions=# SELECT count(*) FROM bird_strikes ...
-[ RECORD 1 ]
count | 850
Track Function Usage
functions=# select * from pg_stat_user_functions;
-[ RECORD 1 ]----------------------------
funcid | 41247
schemaname | public
funcname | get_iata_code_from_name
calls | 88547
total_time | 12493.419
self_time | 12493.419
-[ RECORD 2 ]----------------------------
funcid | 41246
schemaname | public
funcname | get_iata_code_from_abbr_name
calls | 99404
total_time | 13977.674
self_time | 1484.255
Isolate Performance Issues
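A minimal sketch of how the same view can be ranked to find the function that is actually slow. It relies only on the columns shown above; the alias self_ms_per_call is a name chosen here for illustration. self_time excludes time spent in functions called from the one listed, so it points at get_iata_code_from_name (roughly 0.14 ms per call in the figures above) rather than at its caller.

```sql
-- Requires track_functions = 'pl' (or 'all') while the workload runs.
-- Times in pg_stat_user_functions are reported in milliseconds.
SELECT funcname,
       calls,
       total_time,
       self_time,
       round((self_time / NULLIF(calls, 0))::numeric, 4) AS self_ms_per_call
FROM pg_stat_user_functions
ORDER BY self_time DESC;
```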
CREATE OR REPLACE FUNCTION get_iata_code_from_abbr_name(abbr_name varchar)
RETURNS varchar AS
$$
DECLARE
working_name varchar;
code varchar := null;
BEGIN
working_name := upper(abbr_name);
IF working_name = 'UNKNOWN' THEN
RETURN null;
END IF;
working_name := replace(working_name, 'INTL', 'INTERNATIONAL');
working_name := replace(working_name, 'ARPT', 'AIRPORT');
working_name := replace(working_name, 'MUNI', 'MUNICIPAL');
working_name := replace(working_name, 'METRO', 'METROPOLITAN');
working_name := replace(working_name, 'NATL', 'NATIONAL');
working_name := replace(working_name, '-', ' ');
working_name := replace(working_name, '/', ' ');
working_name := working_name || '%';
code := get_iata_code_from_name(working_name);
RETURN code;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION get_iata_code_from_name(airport_name varchar)
RETURNS varchar AS
$$
DECLARE
working_name varchar;
code varchar := null;
BEGIN
working_name := upper(airport_name);
EXECUTE $__$ SELECT iata_code
FROM airports
WHERE upper(name) LIKE $1
$__$
INTO code
USING working_name;
RETURN code;
END;
$$ LANGUAGE plpgsql;
Debugger
http://git.postgresql.org/gitweb/?p=pldebugger.git
functions=# select * from pl_profiler ;
func_oid | line_number | line | exec_count | total_time | longest_time
----------+-------------+---------------------------------------------------------------------+------------+------------+--------------
41246 | 1 | | 0 | 0 | 0
41246 | 2 | DECLARE | 0 | 0 | 0
41246 | 3 | working_name varchar; | 0 | 0 | 0
41246 | 4 | code varchar := null; | 0 | 0 | 0
41246 | 5 | BEGIN | 0 | 0 | 0
41246 | 6 | working_name := upper(abbr_name); | 99404 | 210587 | 363
41246 | 7 | | 0 | 0 | 0
41246 | 8 | IF working_name = 'UNKNOWN' THEN | 99404 | 63406 | 97
41246 | 9 | RETURN null; | 10857 | 2744 | 15
41246 | 10 | END IF; | 0 | 0 | 0
41246 | 11 | | 0 | 0 | 0
41246 | 12 | working_name := replace(working_name, 'INTL', 'INTERNATIONAL'); | 88547 | 116474 | 145
41246 | 13 | working_name := replace(working_name, 'ARPT', 'AIRPORT'); | 88547 | 83015 | 91
41246 | 14 | working_name := replace(working_name, 'MUNI', 'MUNICIPAL'); | 88547 | 70676 | 74
41246 | 15 | working_name := replace(working_name, 'METRO', 'METROPOLITAN'); | 88547 | 67392 | 63
41246 | 16 | working_name := replace(working_name, 'NATL', 'NATIONAL'); | 88547 | 64681 | 70
41246 | 17 | | 0 | 0 | 0
41246 | 18 | working_name := replace(working_name, '-', ' '); | 88547 | 66771 | 62
41246 | 19 | working_name := replace(working_name, '/', ' '); | 88547 | 65054 | 66
41246 | 20 | working_name := working_name || '%'; | 88547 | 64892 | 207
41246 | 21 | | 0 | 0 | 0
41246 | 22 | code := get_iata_code_from_name(working_name); | 88547 | 12282997 | 3709
41246 | 23 | | 0 | 0 | 0
41246 | 24 | RETURN code; | 88547 | 33374 | 14
41246 | 25 | END; | 0 | 0 | 0
41247 | 1 | | 0 | 0 | 0
41247 | 2 | DECLARE | 0 | 0 | 0
41247 | 3 | working_name varchar; | 0 | 0 | 0
41247 | 4 | code varchar := null; | 0 | 0 | 0
41247 | 5 | BEGIN | 0 | 0 | 0
41247 | 6 | working_name := upper(airport_name); | 88547 | 170273 | 90
41247 | 7 | | 0 | 0 | 0
41247 | 8 | EXECUTE $__$ SELECT iata_code | 88547 | 11572604 | 3273
41247 | 9 | FROM airports | 0 | 0 | 0
41247 | 10 | WHERE upper(name) LIKE $1 | 0 | 0 | 0
41247 | 11 | $__$ | 0 | 0 | 0
41247 | 12 | INTO code | 0 | 0 | 0
41247 | 13 | USING working_name; | 0 | 0 | 0
41247 | 14 | | 0 | 0 | 0
41247 | 15 | RETURN code; | 88547 | 121574 | 27
41247 | 16 | END; | 0 | 0 | 0
(41 rows)
Profiler
https://bitbucket.org/openscg/plprofiler
● Be careful when one function calls another
– May lead to difficult-to-diagnose performance problems
● Be careful when a function is used in a WHERE clause
– With a sequential scan, it may execute once per row in the table
functions=# SELECT iso_region FROM airports LIMIT 5;
iso_region
------------
US-PA
US-AK
US-AL
US-AR
US-AZ
(5 rows)
Source: http://ourairports.com/data/
Sample Data
CREATE TYPE airport_regions AS (airport_name varchar,
airport_continent varchar,
airport_country varchar,
airport_state varchar);
CREATE OR REPLACE FUNCTION get_airport_regions()
RETURNS SETOF airport_regions AS
$$
BEGIN
RETURN QUERY SELECT name::varchar, continent::varchar,
iso_country::varchar,
split_part(iso_region, '-', 2)::varchar
FROM airports;
END;
$$ LANGUAGE plpgsql;
Set Returning Functions
functions=# SELECT b.num_wildlife_struck
FROM bird_strikes b, state_code s,
get_airport_regions() r
WHERE b.origin_state = s.name
AND s.abbreviation = r.airport_state
AND r.airport_continent = 'NA';
num_wildlife_struck
---------------------
…
Time: 48507.635 ms
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Nested Loop (cost=42.10..318.77 rows=1972 width=2) (actual time=43.468..38467.229 rows=60334427 loops=1)
-> Hash Join (cost=12.81..14.51 rows=1 width=9) (actual time=43.284..58.007 rows=21488 loops=1)
Hash Cond: ((s.abbreviation)::text = (r.airport_state)::text)
-> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.007..0.045 rows=50
loops=1)
-> Hash (cost=12.75..12.75 rows=5 width=32) (actual time=43.264..43.264 rows=25056 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 857kB
-> Function Scan on get_airport_regions r (cost=0.25..12.75 rows=5 width=32) (actual
time=34.050..39.650 rows=25056 loops=1)
Filter: ((airport_continent)::text = 'NA'::text)
Rows Removed by Filter: 21150
-> Bitmap Heap Scan on bird_strikes b (cost=29.29..288.48 rows=1578 width=10) (actual
time=0.445..1.343 rows=2808 loops=21488)
Recheck Cond: ((origin_state)::text = (s.name)::text)
Heap Blocks: exact=31639334
-> Bitmap Index Scan on bird_strikes_state (cost=0.00..28.89 rows=1578 width=0) (actual
time=0.285..0.285 rows=2808 loops=21488)
Index Cond: ((origin_state)::text = (s.name)::text)
Planning time: 0.742 ms
Execution time: 40447.925 ms
(16 rows)
Time: 40449.209 ms
CREATE OR REPLACE FUNCTION get_airport_regions()
RETURNS SETOF airport_regions AS
$$
BEGIN
RETURN QUERY SELECT name::varchar, continent::varchar,
iso_country::varchar,
split_part(iso_region, '-', 2)::varchar
FROM airports;
END;
$$ LANGUAGE plpgsql
ROWS 46206
COST 600000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Hash Join (cost=2081.87..7687.83 rows=91120 width=2) (actual time=51.589..7568.729 rows=60334427 loops=1)
Hash Cond: ((b.origin_state)::text = (s.name)::text)
-> Seq Scan on bird_strikes b (cost=0.00..4318.04 rows=99404 width=10) (actual time=0.006..14.207
rows=99404 loops=1)
-> Hash (cost=2081.15..2081.15 rows=58 width=9) (actual time=51.571..51.571 rows=21488 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 861kB
-> Hash Join (cost=1502.12..2081.15 rows=58 width=9) (actual time=37.574..48.385 rows=21488 loops=1)
Hash Cond: ((r.airport_state)::text = (s.abbreviation)::text)
-> Function Scan on get_airport_regions r (cost=1500.00..2077.57 rows=231 width=32) (actual
time=37.526..42.626 rows=25056 loops=1)
Filter: ((airport_continent)::text = 'NA'::text)
Rows Removed by Filter: 21150
-> Hash (cost=1.50..1.50 rows=50 width=12) (actual time=0.041..0.041 rows=50 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 3kB
-> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.004..0.020
rows=50 loops=1)
Planning time: 0.722 ms
Execution time: 9572.353 ms
(15 rows)
Time: 9573.716 ms
● When set-returning functions are used as tables, the planner's row and cost estimates are usually far off
– Default ROWS: 1000
– Default COST: 100
● Note: COST is in units of cpu_operator_cost, which defaults to 0.0025
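These settings can be checked against the two plans shown earlier. The sketch below assumes cpu_operator_cost is at its default of 0.0025. Under that assumption, COST 100 works out to 0.25 and COST 600000 to 1500, which matches the Function Scan startup costs in the two plans. ALTER FUNCTION is shown as an alternative to recreating the function.

```sql
-- Inspect the planner estimates currently stored for the function.
SELECT proname, procost, prorows
FROM pg_proc
WHERE proname = 'get_airport_regions';

-- Change the estimates without touching the function body.
ALTER FUNCTION get_airport_regions() ROWS 46206 COST 600000;

-- Convert COST units into planner cost using the current setting.
SELECT 100    * current_setting('cpu_operator_cost')::numeric AS default_function_cost,
       600000 * current_setting('cpu_operator_cost')::numeric AS tuned_function_cost;
```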
● Do not use functions to mask a bad data model
● Use functions to help load the data into the correct format
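One way this advice might be applied to the bird_strikes example, as a sketch only: resolve the IATA code once at load time and store it, so that queries filter on a plain indexed column instead of calling the function for every row. The column name iata_code and the index name are hypothetical and do not appear in the original schema.

```sql
-- Hypothetical column holding the code resolved at load time.
ALTER TABLE bird_strikes ADD COLUMN iata_code varchar;

-- Pay the cost of the lookup function once per row, up front.
UPDATE bird_strikes
SET iata_code = get_iata_code_from_abbr_name(airport);

-- Hypothetical index name; lets the planner avoid a sequential scan.
CREATE INDEX bird_strikes_iata_code_idx ON bird_strikes (iata_code);

-- The original question no longer calls a function in the WHERE clause.
SELECT count(*)
FROM bird_strikes
WHERE iata_code = 'LAX';
```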
Table Partitioning
● Usually done for performance
● Uses check constraints and inherited tables
● Triggers are preferred over rules so COPY can be used
● Trigger functions are used to move the data to the correct child table
CREATE UNLOGGED TABLE trigger_test (key serial primary key,
value varchar,
insert_ts timestamp,
update_ts timestamp);
CREATE UNLOGGED TABLE trigger_test_0
(CHECK ( key % 5 = 0)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_1
(CHECK ( key % 5 = 1)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_2
(CHECK ( key % 5 = 2)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_3
(CHECK ( key % 5 = 3)) INHERITS (trigger_test);
CREATE UNLOGGED TABLE trigger_test_4
(CHECK ( key % 5 = 4)) INHERITS (trigger_test);
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$
DECLARE
partition int;
BEGIN
partition = NEW.key % 5;
EXECUTE 'INSERT INTO trigger_test_' || partition || ' VALUES (($1).*)' USING NEW;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER partition_trigger BEFORE INSERT ON trigger_test
FOR EACH ROW EXECUTE PROCEDURE partition_trigger();
Dynamic Trigger
CREATE OR REPLACE FUNCTION partition_trigger() RETURNS trigger AS $$
BEGIN
CASE NEW.key % 5
WHEN 0 THEN
INSERT INTO trigger_test_0 VALUES (NEW.*);
WHEN 1 THEN
INSERT INTO trigger_test_1 VALUES (NEW.*);
WHEN 2 THEN
INSERT INTO trigger_test_2 VALUES (NEW.*);
WHEN 3 THEN
INSERT INTO trigger_test_3 VALUES (NEW.*);
WHEN 4 THEN
INSERT INTO trigger_test_4 VALUES (NEW.*);
END CASE;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
Case Statement
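A short sketch for checking that either version of partition_trigger routes rows as intended. It assumes the tables and trigger defined above are in place; child_table is an alias chosen here for illustration.

```sql
-- The parent reports "INSERT 0 0" because the BEFORE trigger
-- returns NULL after redirecting the row to a child table.
INSERT INTO trigger_test (value) VALUES ('hello');

-- tableoid identifies the table that physically stores each row,
-- so this counts rows per child table.
SELECT tableoid::regclass AS child_table, count(*)
FROM trigger_test
GROUP BY tableoid
ORDER BY tableoid;
```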
● 16% performance gain using CASE Statement
● Tested inserting 100,000 rows
[Bar chart: Performance of Partition Triggers, comparing Dynamic Trigger and Case Trigger]
Trigger Overhead
● Triggers get executed when an event
happens in the database
– INSERT, UPDATE, DELETE
● Event Triggers fire on DDL
– CREATE, DROP, ALTER
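The row-level triggers are shown in the slides that follow. For the DDL case, a minimal event trigger might look like the sketch below. The names log_ddl and log_ddl_trigger are hypothetical, and creating an event trigger normally requires superuser privileges.

```sql
-- Hypothetical function: reports the command tag of each DDL statement.
CREATE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
    RAISE NOTICE 'DDL command: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

-- Fires after any supported DDL command completes.
CREATE EVENT TRIGGER log_ddl_trigger ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();
```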
CREATE UNLOGGED TABLE trigger_test (
key serial primary key,
value varchar,
insert_ts timestamp,
update_ts timestamp
);
INSERTS.pgbench
INSERT INTO trigger_test (value) VALUES ('hello');
pgbench -n -t 100000 -f INSERTS.pgbench functions
Inserts: 5191 TPS
CREATE FUNCTION empty_trigger() RETURNS trigger AS $$
BEGIN
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER empty_trigger BEFORE INSERT OR UPDATE ON
trigger_test
FOR EACH ROW EXECUTE PROCEDURE empty_trigger();
pgbench -n -t 100000 -f INSERTS.pgbench functions
Inserts: 4906 TPS (5.5% overhead)
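The overhead figure follows from the two throughput numbers above: the drop in TPS divided by the TPS without a trigger. As a quick check:

```sql
-- (5191 - 4906) / 5191 is about 0.0549, which rounds to 5.5%.
SELECT round(100.0 * (5191 - 4906) / 5191, 1) AS overhead_pct;
```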
Overhead of PL Languages
● PL/pgSQL
● C
● PL/Perl
● PL/TCL
● PL/Python
● PL/v8
● PL/Lua
● PL/R
● PL/sh
PL/pgSQL
CREATE FUNCTION empty_trigger() RETURNS
trigger AS $$
BEGIN
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
C
#include "postgres.h"
#include "fmgr.h"
#include "commands/trigger.h"
PG_MODULE_MAGIC;
Datum empty_c_trigger(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(empty_c_trigger);
Datum
empty_c_trigger(PG_FUNCTION_ARGS)
{
TriggerData *tg;
HeapTuple ret;
tg = (TriggerData *) (fcinfo->context);
if (TRIGGER_FIRED_BY_UPDATE(tg->tg_event))
ret = tg->tg_newtuple;
else
ret = tg->tg_trigtuple;
return PointerGetDatum(ret);
}
PL/Python
CREATE FUNCTION empty_python_trigger()
RETURNS trigger AS
$$
return
$$ LANGUAGE plpythonu;
PL/Perl
CREATE FUNCTION empty_perl_trigger()
RETURNS trigger AS
$$
return;
$$ LANGUAGE plperl;
PL/TCL
CREATE FUNCTION empty_tcl_trigger()
RETURNS trigger AS
$$
return [array get NEW]
$$ LANGUAGE pltcl;
PL/v8
CREATE FUNCTION empty_v8_trigger()
RETURNS trigger AS
$$
return NEW;
$$
LANGUAGE plv8;
PL/R
CREATE FUNCTION empty_r_trigger()
RETURNS trigger AS
$$
return(pg.tg.new)
$$ LANGUAGE plr;
PL/Lua
CREATE FUNCTION empty_lua_trigger()
RETURNS trigger AS
$$
return
$$ LANGUAGE pllua;
PL/sh
CREATE FUNCTION empty_sh_trigger()
RETURNS trigger AS
$$
#!/bin/sh
exit 0
$$ LANGUAGE plsh;
[Bar chart: Percent overhead of triggers, by language: C, PL/pgSQL, PL/Lua, PL/Python, PL/Perl, PL/v8, PL/TCL, PL/R, PL/sh]
● Think things through before adding server-side code
● Performance test your functions
● Don't use a procedural language just because it's cool
– Use the right tool for the job
Questions?
jimm@openscg.com

PostgreSQL Procedural Languages: Tips, Tricks and Gotchas

  • 2. Who Am I? ● Jim Mlodgenski – [email protected] – @jim_mlodgenski ● Co-organizer of – NYC PUG (www.nycpug.org) – Philly PUG (www.phlpug.org) ● CTO, OpenSCG – www.openscg.com
  • 4. Stored procedures/functions ● Code that runs inside of the database ● Used for: – Performance – Security – Convenience
  • 5. functions=# SELECT airport FROM bird_strikes LIMIT 5; airport -------------------------- NEWARK LIBERTY INTL ARPT UNKNOWN DENVER INTL AIRPORT CHICAGO O'HARE INTL ARPT JOHN F KENNEDY INTL (5 rows) Source: http://wildlife.faa.gov/ Sample Data
  • 6. functions=# SELECT count(*) functions-# FROM bird_strikes functions-# WHERE get_iata_code_from_abbr_name(airport) = 'LAX'; count ------- 850 (1 row) Time: 13490.611 ms Data Formatting Functions
  • 7. functions=# EXPLAIN ANALYZE SELECT count(*) FROM bird_strikes ... QUERY PLAN ------------------------------------------------------------------------ Aggregate (cost=29418.79..29418.80 rows=1 width=0) (actual time=13463.628..13463.629 rows=1 loops=1) -> Seq Scan on bird_strikes (cost=0.00..29417.55 rows=497 width=0) (actual time=15.721..13463.293 rows=850 loops=1) Filter: ((get_iata_code_from_abbr_name(airport))::text = 'LAX'::text) Rows Removed by Filter: 98554 Planning time: 0.124 ms Execution time: 13463.682 ms (6 rows) Check Performance
  • 8. functions=# set track_functions = 'pl'; SET functions=# select * from pg_stat_user_functions; (No rows) functions=# SELECT count(*) FROM bird_strikes ... -[ RECORD 1 ] count | 850 Track Function Usage
  • 9. functions=# select * from pg_stat_user_functions; -[ RECORD 1 ]---------------------------- funcid | 41247 schemaname | public funcname | get_iata_code_from_name calls | 88547 total_time | 12493.419 self_time | 12493.419 -[ RECORD 2 ]---------------------------- funcid | 41246 schemaname | public funcname | get_iata_code_from_abbr_name calls | 99404 total_time | 13977.674 self_time | 1484.255 Isolate Performance Issues
  • 10. CREATE OR REPLACE FUNCTION get_iata_code_from_abbr_name(abbr_name varchar) RETURNS varchar AS $$ DECLARE working_name varchar; code varchar := null; BEGIN working_name := upper(abbr_name); IF working_name = 'UNKNOWN' THEN RETURN null; END IF; working_name := replace(working_name, 'INTL', 'INTERNATIONAL'); working_name := replace(working_name, 'ARPT', 'AIRPORT'); working_name := replace(working_name, 'MUNI', 'MUNICIPAL'); working_name := replace(working_name, 'METRO', 'METROPOLITAN'); working_name := replace(working_name, 'NATL', 'NATIONAL'); working_name := replace(working_name, '-', ' '); working_name := replace(working_name, '/', ' '); working_name := working_name || '%'; code := get_iata_code_from_name(working_name); RETURN code; END; $$ LANGUAGE plpgsql;
  • 11. CREATE OR REPLACE FUNCTION get_iata_code_from_name(airport_name varchar) RETURNS varchar AS $$ DECLARE working_name varchar; code varchar := null; BEGIN working_name := upper(airport_name); EXECUTE $__$ SELECT iata_code FROM airports WHERE upper(name) LIKE $1 $__$ INTO code USING working_name; RETURN code; END; $$ LANGUAGE plpgsql;
  • 13. functions=# select * from pl_profiler ; func_oid | line_number | line | exec_count | total_time | longest_time ----------+-------------+---------------------------------------------------------------------+------------+------------+-------------- 41246 | 1 | | 0 | 0 | 0 41246 | 2 | DECLARE | 0 | 0 | 0 41246 | 3 | working_name varchar; | 0 | 0 | 0 41246 | 4 | code varchar := null; | 0 | 0 | 0 41246 | 5 | BEGIN | 0 | 0 | 0 41246 | 6 | working_name := upper(abbr_name); | 99404 | 210587 | 363 41246 | 7 | | 0 | 0 | 0 41246 | 8 | IF working_name = 'UNKNOWN' THEN | 99404 | 63406 | 97 41246 | 9 | RETURN null; | 10857 | 2744 | 15 41246 | 10 | END IF; | 0 | 0 | 0 41246 | 11 | | 0 | 0 | 0 41246 | 12 | working_name := replace(working_name, 'INTL', 'INTERNATIONAL'); | 88547 | 116474 | 145 41246 | 13 | working_name := replace(working_name, 'ARPT', 'AIRPORT'); | 88547 | 83015 | 91 41246 | 14 | working_name := replace(working_name, 'MUNI', 'MUNICIPAL'); | 88547 | 70676 | 74 41246 | 15 | working_name := replace(working_name, 'METRO', 'METROPOLITAN'); | 88547 | 67392 | 63 41246 | 16 | working_name := replace(working_name, 'NATL', 'NATIONAL'); | 88547 | 64681 | 70 41246 | 17 | | 0 | 0 | 0 41246 | 18 | working_name := replace(working_name, '-', ' '); | 88547 | 66771 | 62 41246 | 19 | working_name := replace(working_name, '/', ' '); | 88547 | 65054 | 66 41246 | 20 | working_name := working_name || '%'; | 88547 | 64892 | 207 41246 | 21 | | 0 | 0 | 0 41246 | 22 | code := get_iata_code_from_name(working_name); | 88547 | 12282997 | 3709 41246 | 23 | | 0 | 0 | 0 41246 | 24 | RETURN code; | 88547 | 33374 | 14 41246 | 25 | END; | 0 | 0 | 0 41247 | 1 | | 0 | 0 | 0 41247 | 2 | DECLARE | 0 | 0 | 0 41247 | 3 | working_name varchar; | 0 | 0 | 0 41247 | 4 | code varchar := null; | 0 | 0 | 0 41247 | 5 | BEGIN | 0 | 0 | 0 41247 | 6 | working_name := upper(airport_name); | 88547 | 170273 | 90 41247 | 7 | | 0 | 0 | 0 41247 | 8 | EXECUTE $__$ SELECT iata_code | 88547 | 11572604 | 3273 41247 | 9 
| FROM airports | 0 | 0 | 0 41247 | 10 | WHERE upper(name) LIKE $1 | 0 | 0 | 0 41247 | 11 | $__$ | 0 | 0 | 0 41247 | 12 | INTO code | 0 | 0 | 0 41247 | 13 | USING working_name; | 0 | 0 | 0 41247 | 14 | | 0 | 0 | 0 41247 | 15 | RETURN code; | 88547 | 121574 | 27 41247 | 16 | END; | 0 | 0 | 0 (41 rows) Profiler https://bitbucket.org/openscg/plprofiler
  • 14. ● Be careful when you have a function call another function – May lead to difficult to diagnose performance problems ● Be careful when a function is used in a WHERE clause – For sequential scans, it may execute once per row in the table
  • 15. functions=# SELECT iso_region FROM airports LIMIT 5; iso_region ------------ US-PA US-AK US-AL US-AR US-AZ (5 rows) Source: http://ourairports.com/data/ Sample Data
  • 16. CREATE TYPE airport_regions AS (airport_name varchar, airport_continent varchar, airport_country varchar, airport_state varchar); CREATE OR REPLACE FUNCTION get_airport_regions() RETURNS SETOF airport_regions AS $$ BEGIN RETURN QUERY SELECT name::varchar, continent::varchar, iso_country::varchar, split_part(iso_region, '-', 2)::varchar FROM airports; END; $$ LANGUAGE plpgsql; Set Returning Functions
  • 17. functions=# SELECT b.num_wildlife_struck FROM bird_strikes b, state_code s, get_airport_regions() r WHERE b.origin_state = s.name AND s.abbreviation = r.airport_state AND r.airport_continent = 'NA'; num_wildlife_struck --------------------- … Time: 48507.635 ms
  • 18. QUERY PLAN ----------------------------------------------------------------------------------------------------------- Nested Loop (cost=42.10..318.77 rows=1972 width=2) (actual time=43.468..38467.229 rows=60334427 loops=1) -> Hash Join (cost=12.81..14.51 rows=1 width=9) (actual time=43.284..58.007 rows=21488 loops=1) Hash Cond: ((s.abbreviation)::text = (r.airport_state)::text) -> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.007..0.045 rows=50 loops=1) -> Hash (cost=12.75..12.75 rows=5 width=32) (actual time=43.264..43.264 rows=25056 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 857kB -> Function Scan on get_airport_regions r (cost=0.25..12.75 rows=5 width=32) (actual time=34.050..39.650 rows=25056 loops=1) Filter: ((airport_continent)::text = 'NA'::text) Rows Removed by Filter: 21150 -> Bitmap Heap Scan on bird_strikes b (cost=29.29..288.48 rows=1578 width=10) (actual time=0.445..1.343 rows=2808 loops=21488) Recheck Cond: ((origin_state)::text = (s.name)::text) Heap Blocks: exact=31639334 -> Bitmap Index Scan on bird_strikes_state (cost=0.00..28.89 rows=1578 width=0) (actual time=0.285..0.285 rows=2808 loops=21488) Index Cond: ((origin_state)::text = (s.name)::text) Planning time: 0.742 ms Execution time: 40447.925 ms (16 rows) Time: 40449.209 ms
  • 19. CREATE OR REPLACE FUNCTION get_airport_regions() RETURNS SETOF airport_regions AS $$ BEGIN RETURN QUERY SELECT name::varchar, continent::varchar, iso_country::varchar, split_part(iso_region, '-', 2)::varchar FROM airports; END; $$ LANGUAGE plpgsql ROWS 46206 COST 600000;
  • 20. QUERY PLAN -------------------------------------------------------------------------------------------------------------- Hash Join (cost=2081.87..7687.83 rows=91120 width=2) (actual time=51.589..7568.729 rows=60334427 loops=1) Hash Cond: ((b.origin_state)::text = (s.name)::text) -> Seq Scan on bird_strikes b (cost=0.00..4318.04 rows=99404 width=10) (actual time=0.006..14.207 rows=99404 loops=1) -> Hash (cost=2081.15..2081.15 rows=58 width=9) (actual time=51.571..51.571 rows=21488 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 861kB -> Hash Join (cost=1502.12..2081.15 rows=58 width=9) (actual time=37.574..48.385 rows=21488 loops=1) Hash Cond: ((r.airport_state)::text = (s.abbreviation)::text) -> Function Scan on get_airport_regions r (cost=1500.00..2077.57 rows=231 width=32) (actual time=37.526..42.626 rows=25056 loops=1) Filter: ((airport_continent)::text = 'NA'::text) Rows Removed by Filter: 21150 -> Hash (cost=1.50..1.50 rows=50 width=12) (actual time=0.041..0.041 rows=50 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 3kB -> Seq Scan on state_code s (cost=0.00..1.50 rows=50 width=12) (actual time=0.004..0.020 rows=50 loops=1) Planning time: 0.722 ms Execution time: 9572.353 ms (15 rows) Time: 9573.716 ms
  • 21. ● When using set-returning functions as tables, the row and cost estimates are usually way off
        – Default ROWS: 1000
        – Default COST: 100
      ● Note: COST is in units of cpu_operator_cost, which defaults to 0.0025
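As a rough check of the note above, the startup cost the planner shows for a function scan is the function's COST multiplied by cpu_operator_cost. The sketch below assumes cpu_operator_cost is still at its default of 0.0025; the ALTER FUNCTION statement is offered as an alternative to redefining the function, not as the approach used in the slides.

```sql
-- Confirm the planner setting (0.0025 unless it has been changed).
SHOW cpu_operator_cost;

-- Default COST 100: lines up with the 0.25 startup cost of the
-- Function Scan in the plan on slide 18.
SELECT 100 * 0.0025 AS default_startup_cost;

-- COST 600000: lines up with the 1500.00 startup cost of the
-- Function Scan in the plan on slide 20.
SELECT 600000 * 0.0025 AS tuned_startup_cost;

-- Adjust the estimates on an existing function without redefining it.
ALTER FUNCTION get_airport_regions() ROWS 46206 COST 600000;

-- Inspect the values the planner will use.
SELECT proname, prorows, procost
  FROM pg_proc
 WHERE proname = 'get_airport_regions';
```

The ROWS value of 46206 is the function's real output size: 25056 rows kept plus 21150 rows removed by the filter in the plans above.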
  • 22. ● Do not use functions to mask a bad data model
      ● Use functions to help load the data into the correct format
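One way to read the second point: call the formatting function once, at load time, and store the result in an indexed column so queries no longer pay for it on every row. This is only a sketch. It assumes the get_iata_code_from_abbr_name function from the earlier slides; the iata_code column, its varchar type, and the index name are hypothetical.

```sql
-- Hypothetical column holding the cleaned-up value.
ALTER TABLE bird_strikes ADD COLUMN iata_code varchar;

-- Pay the per-row function cost once, during the load.
UPDATE bird_strikes
   SET iata_code = get_iata_code_from_abbr_name(airport);

-- Hypothetical index name; lets the planner avoid a sequential scan.
CREATE INDEX bird_strikes_iata_code ON bird_strikes (iata_code);
ANALYZE bird_strikes;

-- The earlier query then filters on stored data instead of a function call.
SELECT count(*)
  FROM bird_strikes
 WHERE iata_code = 'LAX';
```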
  • 23. Table Partitioning
      ● Usually done for performance
      ● Uses check constraints and inherited tables
      ● Triggers are preferred over rules so COPY can be used
      ● Trigger functions are used to move the data to the correct child table
  • 24. CREATE UNLOGGED TABLE trigger_test (
        key serial primary key,
        value varchar,
        insert_ts timestamp,
        update_ts timestamp
      );
      CREATE UNLOGGED TABLE trigger_test_0 (CHECK (key % 5 = 0)) INHERITS (trigger_test);
      CREATE UNLOGGED TABLE trigger_test_1 (CHECK (key % 5 = 1)) INHERITS (trigger_test);
      CREATE UNLOGGED TABLE trigger_test_2 (CHECK (key % 5 = 2)) INHERITS (trigger_test);
      CREATE UNLOGGED TABLE trigger_test_3 (CHECK (key % 5 = 3)) INHERITS (trigger_test);
      CREATE UNLOGGED TABLE trigger_test_4 (CHECK (key % 5 = 4)) INHERITS (trigger_test);
  • 25. Dynamic Trigger
      CREATE OR REPLACE FUNCTION partition_trigger()
      RETURNS trigger AS $$
      DECLARE
        partition int;
      BEGIN
        partition = NEW.key % 5;
        EXECUTE 'INSERT INTO trigger_test_' || partition || ' VALUES (($1).*)'
          USING NEW;
        RETURN NULL;
      END;
      $$ LANGUAGE plpgsql;
      CREATE TRIGGER partition_trigger
        BEFORE INSERT ON trigger_test
        FOR EACH ROW EXECUTE PROCEDURE partition_trigger();
  • 26. Case Statement
      CREATE OR REPLACE FUNCTION partition_trigger()
      RETURNS trigger AS $$
      BEGIN
        CASE NEW.key % 5
          WHEN 0 THEN INSERT INTO trigger_test_0 VALUES (NEW.*);
          WHEN 1 THEN INSERT INTO trigger_test_1 VALUES (NEW.*);
          WHEN 2 THEN INSERT INTO trigger_test_2 VALUES (NEW.*);
          WHEN 3 THEN INSERT INTO trigger_test_3 VALUES (NEW.*);
          WHEN 4 THEN INSERT INTO trigger_test_4 VALUES (NEW.*);
        END CASE;
        RETURN NULL;
      END;
      $$ LANGUAGE plpgsql;
  • 27. Performance of Partition Triggers
      ● 16% performance gain using the CASE statement
      ● Tested inserting 100,000 rows
      [Bar chart: Dynamic Trigger vs. Case Trigger; axis from 3200 to 4400, units not shown]
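A test like the one described above could be driven from psql along these lines. This is a sketch, not the presenter's actual harness: it assumes the trigger_test tables and one of the two partition_trigger() variants from the preceding slides are installed, and it uses generate_series to produce the 100,000 rows.

```sql
-- Start from empty partitions; without ONLY, TRUNCATE on the parent
-- also truncates the inherited child tables.
TRUNCATE trigger_test;

-- Insert 100,000 rows through the parent; the BEFORE INSERT trigger
-- reroutes each row to a child table and returns NULL.
INSERT INTO trigger_test (value)
SELECT 'hello' FROM generate_series(1, 100000);

-- Check where the rows landed. The parent itself should hold none.
SELECT tableoid::regclass AS child_table, count(*)
  FROM trigger_test
 GROUP BY 1
 ORDER BY 1;
```

Because the trigger returns NULL, the INSERT reports zero rows even though the rows were stored in the children. Turning on psql's \timing before the INSERT gives the elapsed time for comparing the two trigger variants.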
  • 28. Trigger Overhead
      ● Triggers get executed when an event happens in the database
        – INSERT, UPDATE, DELETE
      ● Event Triggers fire on DDL
        – CREATE, DROP, ALTER
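For contrast with the row-level triggers used in the rest of this section, a DDL event trigger might look like the following. It is a minimal sketch: the log_ddl name is hypothetical, and creating an event trigger requires superuser privileges.

```sql
-- Function invoked by the event trigger. TG_EVENT and TG_TAG are the
-- special variables PL/pgSQL provides to event trigger functions.
CREATE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'event: %, command: %', TG_EVENT, TG_TAG;
END;
$$ LANGUAGE plpgsql;

-- Fire before any supported DDL command (CREATE, ALTER, DROP, ...) runs.
CREATE EVENT TRIGGER log_ddl ON ddl_command_start
  EXECUTE PROCEDURE log_ddl();
```

Unlike the triggers on trigger_test, this one is not attached to a table; it applies to the whole database until removed with DROP EVENT TRIGGER log_ddl.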
  • 29. CREATE UNLOGGED TABLE trigger_test (
        key serial primary key,
        value varchar,
        insert_ts timestamp,
        update_ts timestamp
      );
      INSERTS.pgbench
      INSERT INTO trigger_test (value) VALUES ('hello');
  • 30. pgbench -n -t 100000 -f INSERTS.pgbench functions
      Inserts: 5191 TPS
  • 31. CREATE FUNCTION empty_trigger()
      RETURNS trigger AS $$
      BEGIN
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
      CREATE TRIGGER empty_trigger
        BEFORE INSERT OR UPDATE ON trigger_test
        FOR EACH ROW EXECUTE PROCEDURE empty_trigger();
  • 32. pgbench -n -t 100000 -f INSERTS.pgbench functions
      Inserts: 4906 TPS (5.5% overhead)
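The 5.5% figure appears to be the relative drop in throughput between the two pgbench runs. Assuming that definition, the arithmetic can be checked directly:

```sql
-- (baseline TPS - TPS with the empty trigger) / baseline TPS, as a percentage
SELECT round(100.0 * (5191 - 4906) / 5191, 1) AS overhead_pct;  -- returns 5.5
```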
  • 33. Overhead of PL Languages
      ● PL/pgSQL
      ● C
      ● PL/Perl
      ● PL/TCL
      ● PL/Python
      ● PL/v8
      ● PL/Lua
      ● PL/R
      ● PL/sh
  • 34. PL/pgSQL
      CREATE FUNCTION empty_trigger()
      RETURNS trigger AS $$
      BEGIN
        RETURN NEW;
      END;
      $$ LANGUAGE plpgsql;
  • 35. C
      #include "postgres.h"
      #include "commands/trigger.h"
      PG_MODULE_MAGIC;
      Datum empty_c_trigger(PG_FUNCTION_ARGS);
      PG_FUNCTION_INFO_V1(empty_c_trigger);
      Datum
      empty_c_trigger(PG_FUNCTION_ARGS)
      {
          TriggerData *tg;
          HeapTuple ret;
          tg = (TriggerData *) (fcinfo->context);
          /* Return the row unchanged: the new tuple on UPDATE, the trigger tuple otherwise. */
          if (TRIGGER_FIRED_BY_UPDATE(tg->tg_event))
              ret = tg->tg_newtuple;
          else
              ret = tg->tg_trigtuple;
          return PointerGetDatum(ret);
      }
  • 36. PL/Python
      CREATE FUNCTION empty_python_trigger()
      RETURNS trigger AS $$
      return
      $$ LANGUAGE plpythonu;
  • 37. PL/Perl
      CREATE FUNCTION empty_perl_trigger()
      RETURNS trigger AS $$
      return;
      $$ LANGUAGE plperl;
  • 38. PL/TCL
      CREATE FUNCTION empty_tcl_trigger()
      RETURNS trigger AS $$
      return [array get NEW]
      $$ LANGUAGE pltcl;
  • 39. PL/v8
      CREATE FUNCTION empty_v8_trigger()
      RETURNS trigger AS $$
      return NEW;
      $$ LANGUAGE plv8;
  • 40. PL/R
      CREATE FUNCTION empty_r_trigger()
      RETURNS trigger AS $$
      return(pg.tg.new)
      $$ LANGUAGE plr;
  • 41. PL/Lua
      CREATE FUNCTION empty_lua_trigger()
      RETURNS trigger AS $$
      return
      $$ LANGUAGE pllua;
  • 42. PL/sh
      CREATE FUNCTION empty_sh_trigger()
      RETURNS trigger AS $$
      #!/bin/sh
      exit 0
      $$ LANGUAGE plsh;
  • 43. Percent overhead of triggers
      [Bar chart: C, PL/pgSQL, PL/Lua, PL/Python, PL/Perl, PL/v8, PL/TCL, PL/R, PL/sh; axis from 0% to 100%]
  • 44. ● Think things through before adding server-side code
      ● Performance test your functions
      ● Don't use a procedural language just because it's cool
        – Use the right tool for the job