PostgreSQL, The Big, The Fast
and The (NOSQL on ) Acid
Federico Campoli
09 Jan 2016
Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 1 / 40
Table of contents
1 The Big
2 The Fast
3 The (NOSQL on) Acid
4 Wrap up
The Big
Image by Caitlin - https://www.flickr.com/photos/lizard queen
PostgreSQL, a history of excellence
Created at Berkeley in 1986 by database legend Prof. Michael Stonebraker
In 1994 Andrew Yu and Jolly Chen added the SQL interpreter
In 1996 it becomes an open source project
The project's name changes to PostgreSQL
Fully ACID compliant
High read/write performance thanks to MVCC
Tablespaces
Runs on almost any Unix flavour
Native on *cough* MS Windows *cough* since version 8.0
HA with hot standby and streaming replication
Heterogeneous federation
Procedural languages (pl/pgsql, pl/python, pl/perl...)
Support for NOSQL features like HSTORE and JSON
Development
Written in old, ugly C
A new development cycle usually starts in June
A new version is usually released by the end of the year
At least 4 long-term supported (LTS) versions at any time
Can be extended using shared libraries
Extensions (since version 9.1)
BSD-like license
Limits
Database size: no limit
Table size: 32 TB
Row size: 1.6 TB
Rows in a table: no limit
Fields in a table: 250 to 1600, depending on data type
Tables in a database: no limit
Data types
Alongside the general purpose data types, PostgreSQL has some exotic types.
Range (integers, date)
Geometric (points, lines etc.)
Network addresses
XML
JSON
HSTORE (extension)
Surprise surprise!
PostgreSQL 9.5: UPSERT, Row Level Security, and Big Data
Release date: 2016-01-07
UPSERT, aka INSERT ... ON CONFLICT DO UPDATE
Row Level Security allows security "policies" which filter which rows
particular users are allowed to update or view.
BIG DATA!
BRIN - Block Range Indices
CUBE, ROLLUP and GROUPING SETS
IMPORT FOREIGN SCHEMA
TABLESAMPLE
The Fast
Image by Hein Waschefort -
http://commons.wikimedia.org/wiki/User:Hein waschefort
Page layout
A PostgreSQL data file is an array of fixed-length blocks called pages. The
default page size is 8 kB.
Figure: Index page
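Because every page has the same length, finding a page inside a data file is plain arithmetic. The sketch below is written in Python, not in anything PostgreSQL itself uses, and the names PAGE_SIZE, SEGMENT_SIZE and locate_page are invented for the illustration. It assumes the default 8 kB page, the default 1 GB segment files, and a 32-bit page number.

```python
PAGE_SIZE = 8192                 # default page size in bytes (8 kB)
SEGMENT_SIZE = 1024 ** 3         # default size of one segment file (1 GB)
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE
MAX_PAGES = 2 ** 32              # assumption: page numbers are 32-bit


def locate_page(page_number):
    """Return (segment file number, byte offset inside that segment)."""
    segment, page_in_segment = divmod(page_number, PAGES_PER_SEGMENT)
    return segment, page_in_segment * PAGE_SIZE


print(locate_page(0))            # (0, 0)
print(locate_page(3))            # (0, 24576)
print(locate_page(131072))       # (1, 0)

# Largest table addressable under these assumptions, in TiB
print(MAX_PAGES * PAGE_SIZE // 1024 ** 4)   # 32
```

Under these assumptions the last figure matches the 32 TB table size quoted in the Limits slide.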
Each page has a header used to enforce durability, which also holds the optional page
checksum. The header contains pointers used to track the free space inside the page.
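A rough way to picture the header is to decode it from raw bytes. The Python sketch below uses a simplified 24-byte layout whose field names follow the PostgreSQL source. Treating the LSN as one 8-byte integer, the little-endian byte order, and the helper names parse_header and free_space are assumptions made for the example. The lsn field is the one that links the page to the write-ahead log, and lower/upper are the pointers that delimit the free space.

```python
import struct
from collections import namedtuple

# Simplified header: lsn(8) checksum(2) flags(2) lower(2) upper(2)
# special(2) pagesize_version(2) prune_xid(4) = 24 bytes.
HEADER_FORMAT = "<QHHHHHHI"
PageHeader = namedtuple(
    "PageHeader",
    "lsn checksum flags lower upper special pagesize_version prune_xid",
)


def parse_header(page):
    """Decode the header found at the start of a raw page."""
    size = struct.calcsize(HEADER_FORMAT)
    return PageHeader(*struct.unpack(HEADER_FORMAT, page[:size]))


def free_space(header):
    """Free space is the gap between the two pointers."""
    return header.upper - header.lower


# Build a fake, empty 8 kB page: lower points just past the header,
# upper points at the end of the page.
page = bytearray(8192)
struct.pack_into(HEADER_FORMAT, page, 0, 0, 0, 0, 24, 8192, 8192, 8192 | 4, 0)

header = parse_header(bytes(page))
print(header.lower, header.upper)   # 24 8192
print(free_space(header))           # 8168
```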
Tuple layout
Just after the header there is a list of pointers to the physical tuples, which are
stored at the end of the page. Each tuple is an array of raw data, called datum. The
nature of this datum is unknown to the postgres process. The datum is interpreted as
its data type only when PostgreSQL loads the page into memory, which requires a system
catalogue lookup.
Figure: Tuple structure
The tuple header is used by MVCC.
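The pointer list growing from the start of the page and the tuples growing from its end can be sketched as a toy "slotted page". This is a Python illustration only: the class Page, the 4-byte pointer size and the 24-byte header size are assumptions for the example, and real tuples carry a header and alignment padding that are ignored here.

```python
PAGE_SIZE = 8192
HEADER_SIZE = 24      # assumed header length
POINTER_SIZE = 4      # assumed size of one tuple pointer


class Page:
    def __init__(self):
        self.data = bytearray(PAGE_SIZE)
        self.pointers = []            # one (offset, length) entry per tuple
        self.lower = HEADER_SIZE      # end of the pointer list
        self.upper = PAGE_SIZE        # start of the tuple area

    def insert(self, raw):
        """Store raw bytes at the end of the free space, return the slot."""
        if self.upper - self.lower < len(raw) + POINTER_SIZE:
            raise ValueError("page full")
        self.upper -= len(raw)
        self.data[self.upper:self.upper + len(raw)] = raw
        self.pointers.append((self.upper, len(raw)))
        self.lower += POINTER_SIZE
        return len(self.pointers)     # slots are numbered from 1

    def read(self, slot):
        offset, length = self.pointers[slot - 1]
        return bytes(self.data[offset:offset + length])


page = Page()
first = page.insert(b"first tuple")
second = page.insert(b"second")
print(first, second)             # 1 2
print(page.read(second))         # b'second'
print(page.lower, page.upper)    # 32 8175
```

The page only knows offsets and lengths. What the bytes mean is decided elsewhere, which mirrors the catalogue lookup mentioned above.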
The magic of the MVCC
Every operation in PostgreSQL happens inside a transaction.
By default, when a single statement completes successfully the database
commits the transaction automatically.
Multiple statements can be wrapped in a single transaction using the
keywords BEGIN; ... COMMIT; or ROLLBACK;
The lowest transaction isolation level available is READ COMMITTED.
Since version 9.2 PostgreSQL supports exporting a snapshot to other sessions.
There's no such thing as an update
Where's the catch?
PostgreSQL actually NEVER updates a row in place.
An UPDATE adds a new row version and keeps the old one for
read consistency.
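A minimal Python sketch of this behaviour, assuming a heavily simplified visibility rule: every transaction id up to the reader's snapshot counts as committed. The names RowVersion, update and visible are invented for the example. xmin and xmax mirror the tuple header fields that record which transaction created and which one replaced a row version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                     # transaction that created this version
    xmax: Optional[int] = None    # transaction that replaced or deleted it


def update(table, old, new_value, xid):
    """An UPDATE marks the old version and appends a new one."""
    old.xmax = xid
    table.append(RowVersion(new_value, xmin=xid))


def visible(version, snapshot_xid):
    """Simplified rule: all xids <= snapshot_xid are treated as committed."""
    created = version.xmin <= snapshot_xid
    removed = version.xmax is not None and version.xmax <= snapshot_xid
    return created and not removed


table = [RowVersion("foo", xmin=100)]
update(table, table[0], "bar", xid=101)

print(len(table))                                    # 2
print([v.value for v in table if visible(v, 100)])   # ['foo']
print([v.value for v in table if visible(v, 101)])   # ['bar']
```

Both versions sit in the table at the same time, and each reader picks the one its snapshot allows.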
Dead tuples and VACUUM
The tuples left in place for read consistency are called dead.
A dead tuple stays in place as long as any transaction may still need to see it. This adds
overhead to every I/O operation.
VACUUM clears the dead tuples
VACUUM is designed to have minimal impact on normal database
activity
VACUUM removes only the dead tuples no longer visible to any open transaction
VACUUM prevents the xid wraparound failure
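The rule "remove only what no open transaction can still see" can be sketched in Python as a filter over row versions. The function vacuum, the dictionary layout and the oldest_active_xid parameter are assumptions for the illustration. The sketch also assumes that every transaction recorded in xmax has committed.

```python
def vacuum(table, oldest_active_xid):
    """Return (surviving versions, number of dead versions removed)."""
    kept = []
    removed = 0
    for version in table:
        xmax = version["xmax"]
        # Dead for everybody: replaced before the oldest open transaction.
        if xmax is not None and xmax < oldest_active_xid:
            removed += 1
        else:
            kept.append(version)
    return kept, removed


table = [
    {"value": "v1", "xmin": 100, "xmax": 101},
    {"value": "v2", "xmin": 101, "xmax": 105},
    {"value": "v3", "xmin": 105, "xmax": None},
]

# A transaction with xid 103 is still open, so v2 must survive for it.
table, removed = vacuum(table, oldest_active_xid=103)
print(removed)                          # 1
print([v["value"] for v in table])      # ['v2', 'v3']
```

A long-running transaction keeps oldest_active_xid low, so dead versions pile up until it ends.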
The (NOSQL on) Acid
JSON
JSON - JavaScript Object Notation
Version 9.2 adds JSON as a native data type
Version 9.3 adds the support functions for JSON
JSON is stored as text
JSON is parsed and validated on the fly
Version 9.4 adds the JSONB (binary) data type
JSON - Examples
From record to JSON
postgres=# SELECT row_to_json(ROW(1,'foo'));
     row_to_json
---------------------
 {"f1":1,"f2":"foo"}
(1 row)
Expanding JSON into key to value elements
postgres=# SELECT * FROM json_each('{"a":"foo", "b":"bar"}');
 key | value
-----+-------
 a   | "foo"
 b   | "bar"
(2 rows)
HSTORE
HSTORE is a custom data type used to store key/value items
It is an extension
Data is stored as text
A shared library does the magic of transforming the datum into HSTORE
It is similar to JSON without nested elements
HSTORE - Examples
From record to HSTORE
postgres=# SELECT hstore(ROW(1,2));
        hstore
----------------------
 "f1"=>"1", "f2"=>"2"
(1 row)
HSTORE expansion to key to value elements
postgres=# SELECT * FROM each('a=>1,b=>2');
 key | value
-----+-------
 a   | 1
 b   | 2
(2 rows)
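To see what "JSON without nested elements" means in practice, here is a small Python sketch that reads the HSTORE text form into a flat dictionary. The function parse_hstore and its regular expression are made up for the example and cover only simple pairs, without escapes or NULL values. It is not the parser the extension uses.

```python
import json
import re

# key=>value pairs, each side either "quoted" or a bare word
PAIR = re.compile(r'\s*("[^"]*"|[^\s=,]+)\s*=>\s*("[^"]*"|[^\s=,]+)\s*(?:,|$)')


def parse_hstore(text):
    """Turn 'a=>1,b=>2' into a flat dict; every value stays a string."""
    return {key.strip('"'): value.strip('"') for key, value in PAIR.findall(text)}


print(parse_hstore('a=>1,b=>2'))                # {'a': '1', 'b': '2'}
print(parse_hstore('"f1"=>"1", "f2"=>"2"'))     # {'f1': '1', 'f2': '2'}

# JSON keeps types and allows nesting, HSTORE does neither
print(json.loads('{"a": 1, "b": {"c": 2}}'))    # {'a': 1, 'b': {'c': 2}}
```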
JSON and HSTORE
There is a subtle difference between HSTORE and JSON: HSTORE is not a
native data type.
JSON is a native data type and the conversion happens inside the postgres
process.
HSTORE requires access to the shared library.
Because the conversion from the raw datum happens for each tuple loaded into the
shared buffer, it can affect overall performance.
JSONB
JSON is parsed and validated on the fly, and this can be a bottleneck.
The JSONB type, introduced with PostgreSQL 9.4, is parsed, validated and
transformed at insert/update time. Access is then faster than with plain
JSON, but the storage cost can be higher.
The functions available for JSON are also available in a JSONB flavour.
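The trade-off can be pictured with two toy classes in Python: one keeps the original text and parses it on every access, the other parses once when the value is stored. JsonValue, JsonbValue and parse_count are names invented for the sketch, and the real JSONB binary format is not modelled.

```python
import json


class JsonValue:
    """Keeps the original text; every access parses it again."""

    def __init__(self, text):
        json.loads(text)              # validation on input
        self.text = text
        self.parse_count = 1

    def get(self, key):
        self.parse_count += 1
        return json.loads(self.text)[key]


class JsonbValue:
    """Parses once on input and keeps the decoded structure."""

    def __init__(self, text):
        self.decoded = json.loads(text)
        self.parse_count = 1

    def get(self, key):
        return self.decoded[key]


text = '{"index": 0, "isActive": false}'
plain, binary = JsonValue(text), JsonbValue(text)
for _ in range(3):
    plain.get("index")
    binary.get("index")

print(plain.parse_count, binary.parse_count)   # 4 1
```

The work moves from read time to write time, which is why the benefit shows up when values are accessed, not when they are merely scanned.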
Some numbers
Let's create three tables with text, json and jsonb fields.
Each record contains the same JSON element, generated on
http://beta.json-generator.com/4kwCt-fwg
[ {
"_id": "56891aba27402de7f551bc91",
"index": 0,
"guid": "b9345045-1222-4f71-9540-6ed7c8d2ccae",
"isActive": false,
............
3,
{
"id": 1,
"name": "Bridgett Shaw"
}
],
"greeting": "Hello, Johnston! You have 8 unread messages.",
"favoriteFruit": "apple"
}
]
DROP TABLE IF EXISTS t_json;
DROP TABLE IF EXISTS t_jsonb;
DROP TABLE IF EXISTS t_text;
CREATE TABLE t_json AS
SELECT
    '<JSON ELEMENT>'::json AS js_value
FROM
    generate_series(1,100000);
Query returned successfully: 100000 rows affected, 14504 ms execution time.
CREATE TABLE t_text AS
SELECT
    '<JSON ELEMENT>'::text AS t_value
FROM
    generate_series(1,100000);
Query returned successfully: 100000 rows affected, 14330 ms execution time.
CREATE TABLE t_jsonb AS
SELECT
    '<JSON ELEMENT>'::jsonb AS jsb_value
FROM
    generate_series(1,100000);
Query returned successfully: 100000 rows affected, 14060 ms execution time.
Table size
SELECT
    pg_size_pretty(pg_total_relation_size(oid)),
    relname
FROM
    pg_class
WHERE
    relname LIKE 't_%'
;
 pg_size_pretty | relname
----------------+---------
 270 MB         | t_json
 322 MB         | t_jsonb
 270 MB         | t_text
(3 rows)
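The figures above are easier to compare per row. The short Python calculation below takes the three sizes and the 100000 rows from the slides. It assumes that the MB reported by pg_size_pretty is 1024 kB and that the rounded sizes are precise enough for a rough estimate.

```python
ROWS = 100_000
sizes_mb = {"t_json": 270, "t_jsonb": 322, "t_text": 270}

# Average space taken by one row, in kB
for name, size in sizes_mb.items():
    print(f"{name}: {size * 1024 / ROWS:.2f} kB per row")
# t_json: 2.76 kB per row
# t_jsonb: 3.30 kB per row
# t_text: 2.76 kB per row

# Extra storage used by jsonb compared with json
overhead = sizes_mb["t_jsonb"] / sizes_mb["t_json"] - 1
print(f"jsonb overhead: {overhead:.1%}")   # jsonb overhead: 19.3%
```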
Sequential scans
TEXT
EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_text;
 Seq Scan on t_text (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.016..17.624 rows=100000 loops=1)
   Buffers: shared hit=637
 Planning time: 0.040 ms
 Execution time: 28.967 ms
(4 rows)
JSON
EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_json;
 Seq Scan on t_json (cost=0.00..1637.09 rows=100009 width=32) (actual time=0.018..15.443 rows=100000 loops=1)
   Buffers: shared hit=637
 Planning time: 0.045 ms
 Execution time: 25.268 ms
(4 rows)
JSONB
EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_jsonb;
 Seq Scan on t_jsonb (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.015..18.943 rows=100000 loops=1)
   Buffers: shared hit=637
 Planning time: 0.043 ms
 Execution time: 31.072 ms
(4 rows)
Sequential scan with json access
TEXT
EXPLAIN (BUFFERS, ANALYZE) SELECT t_value::json->'index' FROM t_text;
 Seq Scan on t_text (cost=0.00..2387.00 rows=100000 width=18) (actual time=0.159..7748.381 rows=100000 loops=1)
   Buffers: shared hit=401729
 Planning time: 0.028 ms
 Execution time: 7760.263 ms
(4 rows)
JSON
EXPLAIN (BUFFERS, ANALYZE) SELECT js_value->'index' FROM t_json;
 Seq Scan on t_json (cost=0.00..1887.11 rows=100009 width=32) (actual time=0.254..5787.267 rows=100000 loops=1)
   Buffers: shared hit=401730
 Planning time: 0.044 ms
 Execution time: 5798.153 ms
(4 rows)
JSONB
EXPLAIN (BUFFERS, ANALYZE) SELECT jsb_value->'index' FROM t_jsonb;
 Seq Scan on t_jsonb (cost=0.00..1887.00 rows=100000 width=18) (actual time=0.138..1678.222 rows=100000 loops=1)
   Buffers: shared hit=421729
 Planning time: 0.048 ms
 Execution time: 1688.752 ms
(4 rows)
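Putting the three execution times of the json access test side by side makes the gap explicit. The Python snippet below only divides the timings reported in the slides by the jsonb one. These are single runs on one machine, so the ratios are indicative at best.

```python
# Execution times in ms for SELECT <value>->'index' on 100000 rows
timings_ms = {"text": 7760.263, "json": 5798.153, "jsonb": 1688.752}

baseline = timings_ms["jsonb"]
ratios = {name: ms / baseline for name, ms in timings_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the jsonb time")
# text: 4.6x the jsonb time
# json: 3.4x the jsonb time
# jsonb: 1.0x the jsonb time
```

The plain sequential scans, by contrast, all finish in roughly 25 to 31 ms, so the difference only appears once the JSON content is actually accessed.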
Wrap up
Schemaless data is useful: it is flexible and powerful.
Never forget the update strategy in PostgreSQL
The lack of horizontal scalability in PostgreSQL can be a serious problem.
An interesting project for a distributed cluster is PostgreSQL XL -
http://www.postgres-xl.org/
CitusDB is a powerful DWH-oriented DBMS with horizontal scaling capabilities
- should become open source in the next release
Never forget PostgreSQL is an RDBMS
Get a DBA on board
Questions
Questions?
Contacts
Twitter: 4thdoctor scarf
Personal blog: http://www.pgdba.co.uk
PostgreSQL Book:
http://www.slideshare.net/FedericoCampoli/postgresql-dba-01
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
License and copyright
This presentation is licensed under the terms of the Creative Commons
Attribution NonCommercial ShareAlike 4.0
http://creativecommons.org/licenses/by-nc-sa/4.0/
The elephant photo is copyright by Caitlin -
https://www.flickr.com/photos/lizard queen
The cheetah photo is copyright by Hein Waschefort -
http://commons.wikimedia.org/wiki/User:Hein waschefort
The elephant logos are copyright of the PostgreSQL Global Development
Group - http://www.postgresql.org/

Pg big fast ugly acid

  • 1. PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid Federico Campoli 09 Jan 2016 Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 1 / 40
  • 2. Table of contents 1 The Big 2 The Fast 3 The (NOSQL on) Acid 4 Wrap up Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 2 / 40
  • 3. Table of contents 1 The Big 2 The Fast 3 The (NOSQL on) Acid 4 Wrap up Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 3 / 40
  • 4. The Big Image by Caitlin - https://www.flickr.com/photos/lizard queen Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 4 / 40
  • 5. PostgreSQL, an history of excellence Created at Berkeley in 1982 by database’s legend Prof. Stonebraker In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter In the 1996 becomes an Open Source project. The project’s name changes in PostgreSQL Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 5 / 40
  • 6. PostgreSQL, an history of excellence Created at Berkeley in 1982 by database’s legend Prof. Stonebraker In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter In the 1996 becomes an Open Source project. The project’s name changes in PostgreSQL Fully ACID compliant High performance in read/write with the MVCC Tablespaces Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 5 / 40
  • 7. PostgreSQL, an history of excellence Created at Berkeley in 1982 by database’s legend Prof. Stonebraker In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter In the 1996 becomes an Open Source project. The project’s name changes in PostgreSQL Fully ACID compliant High performance in read/write with the MVCC Tablespaces Runs on almost any unix flavour From the version 8.0 is native on *cough* MS Windows *cough* HA with hot standby and streaming replication Heterogeneous federation Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 5 / 40
  • 8. PostgreSQL, an history of excellence Created at Berkeley in 1982 by database’s legend Prof. Stonebraker In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter In the 1996 becomes an Open Source project. The project’s name changes in PostgreSQL Fully ACID compliant High performance in read/write with the MVCC Tablespaces Runs on almost any unix flavour From the version 8.0 is native on *cough* MS Windows *cough* HA with hot standby and streaming replication Heterogeneous federation Procedural languages (pl/pgsql, pl/python, pl/perl...) Support for NOSQL features like HSTORE and JSON Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 5 / 40
  • 9. Development Old ugly C language New development cycle starts usually in June New version released usually by the end of the year At least 4 LTS versions Can be extended using shared libraries Extensions (from the version9.1) BSD like license Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 6 / 40
  • 10. Limits Database size. No limits. Table size, 32 TB Row size 1.6 TB Rows in table. No limits. Fields in table 250 - 1600 depending on data type. Tables in a database. No limits. Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 7 / 40
  • 11. Data types Alongside the general purpose data types PostgreSQL have some exotic types. Range (integers, date) Geometric (points, lines etc.) Network addresses XML JSON HSTORE (extension) Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 8 / 40
  • 12. Surprise surprise! PostgreSQL 9.5, UPSERT, Row Level Security, and Big Data Release date: 2016-01-07 UPSERT aka INSERT, ON CONFLICT UPDATE Row Level Security, allows security ”policies” which filter which rows particular users are allowed to update or view. BIG DATA! BRIN - Block Range Indices CUBE, ROLLUP and GROUPING SETS IMPORT FOREIGN SCHEMA TABLESAMPLE Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 9 / 40
  • 13. Table of contents 1 The Big 2 The Fast 3 The (NOSQL on) Acid 4 Wrap up Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 10 / 40
  • 14. The Fast Image by Hein Waschefort - http://commons.wikimedia.org/wiki/User:Hein waschefort Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 11 / 40
  • 15. Page layout A PostgreSQL’s data file is an array of fixed length blocks called pages. The default size is 8kb. Figure : Index page Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 12 / 40
  • 16. Page layout Each page have an header used to enforce the durability, and the optional page’s checksum. There are some pointers used to track the free space inside the page. Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 13 / 40
  • 17. Tuple layout Just after the header there is a list of pointers to the physical tuples stored in the page’s end. Each tuple is and array of raw data, called datum. The nature of this datum is unknown to the postgres process. The datum becomes the data type when PostgreSQL loads the page in memory. This requires a system catalogue look up. Figure : Tuple structure The tuple’s header is used in the MVCC. Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 14 / 40
  • 18. The magic of the MVCC Any operation in PostgreSQL happens through transactions. By default when a single statement is successfully completed the database commits automatically the transaction. It’s possible to wrap multiple statements in a single transaction using the keywords [BEGIN;]....... [COMMIT; ROLLBACK] The minimal possible level the transaction isolation is READ COMMITTED. PostgreSQL from 9.2 supports the snapshot export to other sessions. Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 15 / 40
  • 19. There’s no such thing like an update Where’s the catch? Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 16 / 40
  • 20. There’s no such thing like an update Where’s the catch? PostgreSQL actually NEVER performs an update. The UPDATE behaviour is to add a new row version and to keep the old one for read consistency. Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 16 / 40
  • 21. Dead tuples and VACUUM The tuples left in place for read consistency are called dead. A dead tuple is left in place for any transaction that should see it. This adds overhead to any I/O operation. VACUUM clears the dead tuples VACUUM is designed to have the minimal impact on the database normal activity VACUUM removes only dead tuples no longer visible to the open transactions VACUUM prevents the xid wraparound failure Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 17 / 40
  • 22. Table of contents 1 The Big 2 The Fast 3 The (NOSQL on) Acid 4 Wrap up Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 18 / 40
  • 23. The (NOSQL on) Acid Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 19 / 40
  • 24. JSON JSON - JavaScript Object Notation The version 9.2 adds JSON as native data type The version 9.3 adds the support functions for JSON JSON is stored as text JSON is parsed and validated on the fly The 9.4 adds JSONB (binary) data type Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 20 / 40
  • 25. JSON JSON - Examples From record to JSON postgres=# SELECT row_to_json(ROW(1,’foo’)); row_to_json --------------------- {"f1":1,"f2":"foo"} (1 row) Expanding JSON into key to value elements postgres=# SELECT * from json_each(’{"a":"foo", "b":"bar"}’); key | value -----+------- a | "foo" b | "bar" (2 rows) Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 21 / 40
  • 26. HSTORE HSTORE is a custom data type used to store key to value items Is an extension Data stored as text A shared library does the magic transforming the datum in HSTORE Is similar to JSON without nested elements Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 22 / 40
  • 27. HSTORE HSTORE - Examples From record to HSTORE postgres=# SELECT hstore(ROW(1,2)); hstore ---------------------- "f1"=>"1", "f2"=>"2" (1 row) HSTORE expansion to key to value elements postgres=# SELECT * FROM each(’a=>1,b=>2’); key | value -----+------- a | 1 b | 2 (2 rows) Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 09 Jan 2016 23 / 40
  • 28. JSON and HSTORE: There is a subtle difference between HSTORE and JSON. HSTORE is not a native data type. JSON is a native data type, and the conversion happens inside the postgres process. HSTORE requires access to the shared library. Because the conversion from the raw datum happens for each tuple loaded into the shared buffer, it can affect overall performance.
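As a small illustration of the native versus extension distinction, the sketch below reads the same key from both types. It assumes the `hstore` contrib module is installed on the server; the literal values are made up for the example.

```sql
-- hstore is not built in: the extension must be created in each database.
CREATE EXTENSION IF NOT EXISTS hstore;

-- Fetch the value of key "a" from an hstore and from a json document.
-- hstore values are always text; ->> returns the json value as text too.
SELECT 'a=>1, b=>2'::hstore   ->  'a' AS from_hstore,
       '{"a":1, "b":2}'::json ->> 'a' AS from_json;
```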
  • 29. JSONB: JSON is parsed and validated on the fly, and this can be a bottleneck. The new JSONB type, introduced with PostgreSQL 9.4, is parsed, validated and transformed at insert/update time. Access is then faster than with plain JSON, but the storage cost can be higher. The functions available for JSON are also available in a JSONB flavour.
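The difference between storing text and storing a parsed structure can be seen directly by casting the same literal to both types. This is a sketch with made-up literals, not taken from the original slides.

```sql
-- json keeps the input text verbatim, including whitespace and duplicate keys.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json;

-- jsonb is converted to a binary form at input time: insignificant whitespace
-- is dropped, the last value wins for duplicate keys, key order is not kept.
SELECT '{"b": 1,   "a": 2, "a": 3}'::jsonb;

-- The same access operator is available for both types.
SELECT '{"a": {"b": 1}}'::json  -> 'a' AS from_json,
       '{"a": {"b": 1}}'::jsonb -> 'a' AS from_jsonb;
```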
  • 30. Some numbers: Let's create three tables with text, json and jsonb type fields. Each record contains the same JSON element, generated on http://beta.json-generator.com/4kwCt-fwg
        [ {
            "_id": "56891aba27402de7f551bc91",
            "index": 0,
            "guid": "b9345045-1222-4f71-9540-6ed7c8d2ccae",
            "isActive": false,
            ............
            3, { "id": 1, "name": "Bridgett Shaw" } ],
            "greeting": "Hello, Johnston! You have 8 unread messages.",
            "favoriteFruit": "apple"
        } ]
  • 31. Some numbers
        DROP TABLE IF EXISTS t_json;
        DROP TABLE IF EXISTS t_jsonb;
        DROP TABLE IF EXISTS t_text;

        CREATE TABLE t_json AS SELECT '<JSON ELEMENT>'::json AS js_value FROM generate_series(1, 100000);
        Query returned successfully: 100000 rows affected, 14504 ms execution time.

        CREATE TABLE t_text AS SELECT '<JSON ELEMENT>'::text AS t_value FROM generate_series(1, 100000);
        Query returned successfully: 100000 rows affected, 14330 ms execution time.

        CREATE TABLE t_jsonb AS SELECT '<JSON ELEMENT>'::jsonb AS jsb_value FROM generate_series(1, 100000);
        Query returned successfully: 100000 rows affected, 14060 ms execution time.
  • 32. Table size
        SELECT pg_size_pretty(pg_total_relation_size(oid)), relname FROM pg_class WHERE relname LIKE 't_%';
         pg_size_pretty | relname
        ----------------+---------
         270 MB         | t_json
         322 MB         | t_jsonb
         270 MB         | t_text
        (3 rows)
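`pg_total_relation_size` counts the main heap plus any TOAST data and indexes. A sketch of how the totals above could be split into their parts, using the three table names from the slides; whether most of the space actually sits in TOAST depends on the size of the JSON element, which is an assumption here.

```sql
-- Break the total size down into heap only, heap plus TOAST, and everything.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS heap_only,
       pg_size_pretty(pg_table_size(oid))          AS heap_plus_toast,
       pg_size_pretty(pg_total_relation_size(oid)) AS total
FROM pg_class
WHERE relkind = 'r'
  AND relname IN ('t_json', 't_jsonb', 't_text');
```

Listing the table names explicitly avoids a side effect of `LIKE 't_%'`, where the underscore is itself a single-character wildcard and may match unrelated relations.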
  • 33. Sequential scans - TEXT
        EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_text;
        Seq Scan on t_text (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.016..17.624 rows=100000 loops=1)
          Buffers: shared hit=637
        Planning time: 0.040 ms
        Execution time: 28.967 ms
        (4 rows)
  • 34. Sequential scans - JSON
        EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_json;
        Seq Scan on t_json (cost=0.00..1637.09 rows=100009 width=32) (actual time=0.018..15.443 rows=100000 loops=1)
          Buffers: shared hit=637
        Planning time: 0.045 ms
        Execution time: 25.268 ms
        (4 rows)
  • 35. Sequential scans - JSONB
        EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM t_jsonb;
        Seq Scan on t_jsonb (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.015..18.943 rows=100000 loops=1)
          Buffers: shared hit=637
        Planning time: 0.043 ms
        Execution time: 31.072 ms
        (4 rows)
  • 36. Sequential scan with json access - TEXT
        EXPLAIN (BUFFERS, ANALYZE) SELECT t_value::json->'index' FROM t_text;
        Seq Scan on t_text (cost=0.00..2387.00 rows=100000 width=18) (actual time=0.159..7748.381 rows=100000 loops=1)
          Buffers: shared hit=401729
        Planning time: 0.028 ms
        Execution time: 7760.263 ms
        (4 rows)
  • 37. Sequential scan with json access - JSON
        EXPLAIN (BUFFERS, ANALYZE) SELECT js_value->'index' FROM t_json;
        Seq Scan on t_json (cost=0.00..1887.11 rows=100009 width=32) (actual time=0.254..5787.267 rows=100000 loops=1)
          Buffers: shared hit=401730
        Planning time: 0.044 ms
        Execution time: 5798.153 ms
        (4 rows)
  • 38. Sequential scan with json access - JSONB
        EXPLAIN (BUFFERS, ANALYZE) SELECT jsb_value->'index' FROM t_jsonb;
        Seq Scan on t_jsonb (cost=0.00..1887.00 rows=100000 width=18) (actual time=0.138..1678.222 rows=100000 loops=1)
          Buffers: shared hit=421729
        Planning time: 0.048 ms
        Execution time: 1688.752 ms
        (4 rows)
  • 39. Table of contents: 1 The Big, 2 The Fast, 3 The (NOSQL on) Acid, 4 Wrap up
  • 40. Wrap up: Schemaless data is useful. It is flexible and powerful. Never forget the update strategy in PostgreSQL. The lack of horizontal scalability in PostgreSQL can be a serious problem. An interesting project for a distributed cluster is PostgreSQL XL - http://www.postgres-xl.org/ CitusDB is a powerful DWH-oriented DBMS with horizontal scaling capabilities; it should become open source in the next release. Never forget that PostgreSQL is an RDBMS. Get a DBA on board.
  • 41. Questions?
  • 42. Contacts: Twitter: 4thdoctor scarf. Personal blog: http://www.pgdba.co.uk PostgreSQL Book: http://www.slideshare.net/FedericoCampoli/postgresql-dba-01 Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/
  • 43. License and copyright: This presentation is licensed under the terms of the Creative Commons Attribution NonCommercial ShareAlike 4.0 license, http://creativecommons.org/licenses/by-nc-sa/4.0/ The elephant photo is copyright of Caitlin - https://www.flickr.com/photos/lizard queen The cheetah photo is copyright of Hein Waschefort - http://commons.wikimedia.org/wiki/User:Hein waschefort The elephant logos are copyright of the PostgreSQL Global Development Group - http://www.postgresql.org/
  • 44. PostgreSQL, The Big, The Fast and The (NOSQL on) Acid. Federico Campoli, 09 Jan 2016