Scaling PostgreSQL with GridSQL
Who Am I?
Jim Mlodgenski
Co-organizer of NYCPUG
Founder of Cirrus Technologies
Former Chief Architect of EnterpriseDB
Agenda
What is GridSQL?
Architecture
Query Flow
Scaling
Limitations
What is GridSQL?
“Shared-Nothing”, distributed data architecture
Leverages the power of multiple commodity servers while appearing as a single database to the application
Essentially an open source Greenplum, Netezza or Teradata
GridSQL Details
Designed for Parallel Querying
Not just “Read-Only”, can execute UPDATE, DELETE
Data Loader for parallel loading
Standard connectivity via PostgreSQL compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)
What GridSQL Is Not
A replication solution like Slony or Bucardo
A high availability solution like Streaming Replication in PostgreSQL 9.0
A scalable transactional solution like Postgres-XC
An elastic, eventually consistent NoSQL database
Configuration
Can be configured for multiple logical “nodes” per physical server, to take advantage of multi-core processors
Tables may be either replicated or partitioned
Replicated tables for static lookup data or dimensions
Partitioned tables for large fact tables
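
The difference between replicated and partitioned tables can be sketched in a few lines of Python. This is only an illustration: the node names are hypothetical, and `zlib.crc32` is a stand-in hash, since the slides do not say which hash function GridSQL uses.

```python
import zlib

# Hypothetical logical nodes; several of them could share one physical server.
NODES = ["node1", "node2", "node3", "node4"]


def node_for_key(key, nodes=NODES):
    """Map a partitioning-key value to one node (stand-in hash, not GridSQL's)."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return nodes[digest % len(nodes)]


def place_row(row, key_column=None, nodes=NODES):
    """Return the nodes that store a row.

    key_column=None models a replicated table (a copy on every node);
    otherwise the row goes to the single node chosen by its key.
    """
    if key_column is None:
        return list(nodes)
    return [node_for_key(row[key_column], nodes)]


region_row = {"r_regionkey": 1, "r_name": "AMERICA"}   # small lookup table
order_row = {"o_orderkey": 42, "o_custkey": 7}          # large fact table

print(place_row(region_row))                # stored on every node
print(place_row(order_row, "o_orderkey"))   # stored on exactly one node
```

Replicating a small lookup table costs extra writes but lets every node join against it without moving rows; the large fact table is split so each node scans only its share.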
Partitioning
Tables may simultaneously use GridSQL partitioning and constraint exclusion partitioning
Large queries scan a much smaller subset of data by using subtables
Since each subtable is also partitioned across nodes, the subtables are scanned in parallel
Queries execute much faster
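
A rough sketch of how the two kinds of partitioning combine, under simplifying assumptions: rows are split into monthly subtables (standing in for constraint exclusion partitioning) and each subtable is spread across nodes by key modulo the node count (standing in for GridSQL's hash). The data, node count, and function names are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

NUM_NODES = 3  # hypothetical number of logical nodes

# Sample rows: (order_key, ship_date, quantity)
ROWS = [
    (1, date(1995, 1, 15), 10),
    (2, date(1995, 2, 3), 5),
    (3, date(1995, 2, 20), 7),
    (4, date(1995, 3, 9), 2),
    (5, date(1995, 2, 11), 4),
]


def build_storage(rows):
    """Build storage[node][(year, month)] -> rows held in that subtable."""
    storage = [dict() for _ in range(NUM_NODES)]
    for row in rows:
        key, ship_date, _ = row
        node = key % NUM_NODES                       # partitioning across nodes
        subtable = (ship_date.year, ship_date.month)  # subtable by month
        storage[node].setdefault(subtable, []).append(row)
    return storage


def scan_node(node_storage, wanted_subtables):
    """Scan only the wanted subtables on one node: (quantity sum, rows scanned)."""
    total = scanned = 0
    for subtable in wanted_subtables:
        for _, _, qty in node_storage.get(subtable, []):
            total += qty
            scanned += 1
    return total, scanned


def total_quantity(storage, year, month):
    """Prune to one monthly subtable, then scan every node in parallel."""
    wanted = [(year, month)]
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        results = list(pool.map(lambda s: scan_node(s, wanted), storage))
    return sum(t for t, _ in results), sum(n for _, n in results)


storage = build_storage(ROWS)
print(total_quantity(storage, 1995, 2))  # (16, 3): only 3 of 5 rows are read
```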
Architecture
Loosely coupled, shared-nothing architecture
Data repositories: metadata database, GridSQL database
GridSQL processes: central coordinator, agents
Query Optimization
Cost Based Optimizer
Takes into account Row Shipping (expensive)
Looks for joins with replicated tables, which can be done locally





  • 30. Looks for joins between tables on partitioned columns
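
The optimizer rules above can be read as a simple decision: a join is local when one side is replicated or when both sides are partitioned on the join columns; otherwise rows have to be shipped between nodes. The sketch below encodes that reading. The `Table` class and `join_strategy` function are hypothetical, the partitioning keys for `lineitem` and `customer` are assumed (the slides only give the key for `orders`), and the real cost-based optimizer weighs far more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None models a replicated table


def join_strategy(left, left_col, right, right_col):
    """Classify an equi-join as local or as needing row shipping.

    Assumes every partitioned table uses the same hash function and the
    same set of nodes, so equal key values land on the same node.
    """
    if left.partition_key is None or right.partition_key is None:
        return "local (replicated table)"
    if left.partition_key == left_col and right.partition_key == right_col:
        return "local (joined on partitioned columns)"
    return "row shipping"


nation = Table("nation")                       # replicated lookup table
orders = Table("orders", "o_orderkey")
lineitem = Table("lineitem", "l_orderkey")     # assumed partitioning key
customer = Table("customer", "c_custkey")      # assumed partitioning key

print(join_strategy(customer, "c_nationkey", nation, "n_nationkey"))
print(join_strategy(orders, "o_orderkey", lineitem, "l_orderkey"))
print(join_strategy(customer, "c_custkey", orders, "o_custkey"))
```

The last join is the expensive case: `orders` is partitioned on `o_orderkey`, not `o_custkey`, so matching rows may sit on different nodes.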
  • 31. Aggregation First set of aggregates done in parallel at the nodes
  • 32. Like groups of intermediate results shipped to same target node
  • 33. Second aggregation done in parallel
  • 34. Coordinator streams in node results, combining on the fly and sending to client result set, performing a merge sort if ORDER BY present
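
The coordinator's final step, combining already-sorted node results into one ordered stream, is a k-way merge. A minimal sketch with Python's `heapq.merge`, assuming each node returns its groups sorted by the ORDER BY columns and that no group appears on more than one node (the node result values are made up):

```python
import heapq

# Hypothetical per-node results, each already sorted by
# (l_returnflag, l_linestatus); the third field is an aggregate value.
node_results = [
    [("A", "F", 10), ("N", "O", 7)],
    [("N", "F", 3), ("R", "F", 9)],
]


def order_key(row):
    """Sort key matching ORDER BY l_returnflag, l_linestatus."""
    return (row[0], row[1])


# heapq.merge is lazy: it yields rows one at a time without first
# materializing or re-sorting the complete result set.
merged = heapq.merge(*node_results, key=order_key)

for row in merged:
    print(row)
```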
  • 35. Two Phase Aggregation SUM: SUM(stat1)
  • 37. AVG: SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
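
Two-phase aggregation can be shown with plain Python dictionaries. In this sketch each node first computes a partial SUM and COUNT per group; the second phase adds up the partial values, and the average is the total sum divided by the total count. The sample rows and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows held by two nodes: (group, stat1)
node_rows = [
    [("A", 10), ("A", 20), ("N", 5)],
    [("A", 30), ("N", 15), ("N", 25)],
]


def partial_aggregate(rows):
    """Phase 1, run on each node: per-group [SUM(stat1), COUNT(stat1)]."""
    partial = defaultdict(lambda: [0, 0])
    for group, stat1 in rows:
        partial[group][0] += stat1
        partial[group][1] += 1
    return dict(partial)


def final_aggregate(partials):
    """Phase 2: sum the partial sums and counts, then derive the average."""
    combined = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (part_sum, part_count) in partial.items():
            combined[group][0] += part_sum
            combined[group][1] += part_count
    return {
        group: {"sum": total, "avg": total / count}
        for group, (total, count) in combined.items()
    }


result = final_aggregate(partial_aggregate(rows) for rows in node_rows)
print(result["A"])  # {'sum': 60, 'avg': 20.0}
print(result["N"])  # {'sum': 45, 'avg': 15.0}
```

Averaging the per-node averages would give the wrong answer whenever nodes hold different numbers of rows per group, which is why the count travels with the sum.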
  • 38. Creating Tables
        Tables can be partitioned or replicated

        CREATE TABLE region (
            r_regionkey INTEGER NOT NULL,
            r_name CHAR(25) NOT NULL,
            r_comment VARCHAR(152)
        ) REPLICATED;
  • 39. Creating Tables
        CREATE TABLE orders (
            o_orderkey INTEGER NOT NULL,
            o_custkey INTEGER NOT NULL,
            o_orderstatus CHAR(1) NOT NULL,
            o_totalprice DECIMAL(15,2) NOT NULL,
            o_orderdate DATE NOT NULL,
            o_orderpriority CHAR(15) NOT NULL,
            o_clerk CHAR(15) NOT NULL,
            o_shippriority INTEGER NOT NULL,
            o_comment VARCHAR(79) NOT NULL
        ) PARTITIONING KEY o_orderkey ON ALL;
  • 40. DBT3 : Query 1
        SELECT l_returnflag, l_linestatus,
               sum(l_quantity) as sum_qty,
               sum(l_extendedprice) as sum_base_price,
               sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
               sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
               avg(l_quantity) as avg_qty,
               avg(l_extendedprice) as avg_price,
               avg(l_discount) as avg_disc,
               count(*) as count_order
        FROM lineitem
        WHERE l_shipdate <= date'1998-12-01' - interval '90 days'
        GROUP BY l_returnflag, l_linestatus
        ORDER BY l_returnflag, l_linestatus;

        Results
         l_returnflag | l_linestatus | sum_qty  | sum_base_price | ... | count_order
        --------------+--------------+----------+----------------+ ... +-------------
         A            | F            | 37734104 |    56586654000 | ... |     1478493
         N            | F            |   991417 |     1487505700 | ... |       38854
         N            | O            | 74473520 |   111717540000 | ... |     2920374
         R            | F            | 37719752 |    56567792000 | ... |     1478870
        (4 rows)
  • 41. Query 1 – Execution (no Agents) Go to Animation Slide
  • 42. DBT3 : Query 7
        Results
         supp_nation   | cust_nation   | l_year | revenue
        ---------------+---------------+--------+--------------------
         GERMANY       | UNITED STATES |   1995 | 51883178.038909949
         GERMANY       | UNITED STATES |   1996 | 52528107.076993272
         UNITED STATES | GERMANY       |   1995 | 51546631.033109233
         UNITED STATES | GERMANY       |   1996 | 53108668.056805529
        (4 rows)

        SELECT supp_nation, cust_nation, l_year, sum(volume) as revenue
        FROM (SELECT n1.n_name as supp_nation,
                     n2.n_name as cust_nation,
                     extract(year from l_shipdate) as l_year,
                     l_extendedprice * (1 - l_discount) as volume
              FROM supplier, lineitem, orders, customer, nation n1, nation n2
              WHERE s_suppkey = l_suppkey
                AND o_orderkey = l_orderkey
                AND c_custkey = o_custkey
                AND s_nationkey = n1.n_nationkey
                AND c_nationkey = n2.n_nationkey
                AND ((n1.n_name = 'GERMANY' and n2.n_name = 'UNITED STATES')
                  or (n1.n_name = 'UNITED STATES' and n2.n_name = 'GERMANY'))
                AND l_shipdate between date '1995-01-01' and date '1996-12-31'
             ) AS shipping
        GROUP BY supp_nation, cust_nation, l_year
        ORDER BY supp_nation, cust_nation, l_year;
  • 43. Query 7 – Execution (with Agents) Go to Animation Slide
  • 45. Scalability: a few DBT3 queries on Amazon EC2, using PostgreSQL 9.0
  • 46. Scalability: DBT3 Query 1 (same query text as slide 40)
  • 47. Scalability: DBT3 Query 7 (same query text as slide 42)
  • 48. Limitations: SQL Support. Uses its own parser and optimizer, so: No Window Functions
  • 50. No Full Text Search
  • 52. Transaction Performance: single-row INSERT, UPDATE, or DELETE statements are slow compared to a single PostgreSQL instance. The data must make an additional network trip to be committed
  • 53. All partitioned rows must be hashed to be mapped to the proper node
  • 54. All replicated rows must be committed to all nodes. Use “gs-loader” for bulk loading for better performance
  • 55. High Availability: no heartbeat or fail-over control in the coordinator. High availability for each PostgreSQL node must be configured separately
  • 56. Streaming replication can be ideal for this. Getting a consistent backup of the entire GridSQL database is difficult: you must ensure that no transactions are occurring
  • 57. Back up each node separately
  • 58. Adding Nodes: requires downtime. Data must be manually reloaded to partition the data to the new node. With planning, the process can be fast with no remapping of data: run multiple PostgreSQL instances on each physical server and move the PostgreSQL instances to new hardware as needed
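
The planning trick on this slide works because the mapping from key to logical node never changes; only the mapping from logical node to physical server does. A small sketch of that idea, with hypothetical server names, a made-up node count, and key modulo node count standing in for the real hash:

```python
NUM_LOGICAL_NODES = 8  # fixed when the data is first loaded (hypothetical)


def logical_node(key):
    """Key -> logical node. This never changes, so no data is remapped."""
    return key % NUM_LOGICAL_NODES


def host_for(key, placement):
    """Find the physical server currently hosting the key's logical node."""
    return placement[logical_node(key)]


def move_nodes(placement, nodes, new_host):
    """Return a new placement with the given logical nodes on new hardware."""
    updated = dict(placement)
    for node in nodes:
        updated[node] = new_host
    return updated


# Start with four PostgreSQL instances on each of two servers.
before = {n: ("server-a" if n < 4 else "server-b")
          for n in range(NUM_LOGICAL_NODES)}

# Later, move one instance from each server onto a third machine.
after = move_nodes(before, [3, 7], "server-c")

print(host_for(43, before), "->", host_for(43, after))  # server-a -> server-c
print(host_for(42, before), "->", host_for(42, after))  # server-a -> server-a
```

Moving an instance is a file-level copy of one PostgreSQL data directory rather than a reload through the partitioning hash, which is why it can be fast.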
  • 59. Interesting Side Note: GridSQL scales well in a cloud environment
  • 60. The results are dependent on the cloud vendor
  • 61. Summary: GridSQL can tremendously improve the performance of PostgreSQL queries
  • 62. GridSQL can scale linearly as more nodes are added
  • 63. GridSQL is open source so if the limitations are an issue,
  • 65. Download GridSQL at: http://sourceforge.net/projects/gridsql/ Contact: Jim Mlodgenski, Email: [email_address], Twitter: @jim_mlodgenski