Copyright©2017 NTT corp. All Rights Reserved.
FDW-based Sharding Update and Future
NTT Open Source Software Center
Masahiko Sawada
PGConf Russia 2017 (16th March)

Who am I?
Masahiko Sawada
Twitter: @sawada_masahiko
GitHub: MasahikoSawada
PostgreSQL Contributor
Freeze Map (PG9.6)
Multiple Synchronous Replication (PG9.6)
Quorum-based Synchronous Replication (PG10)
PostgreSQL Technical Support
pg_repack committer

Agenda
1. What is database sharding
2. What is FDW-based sharding
3. Demonstration
4. Use cases
5. Challenges and key techniques
6. Conclusion

Scale-up and Scale-out
• Scale-up
  • Vertical scaling
  • Simple
  • Price
  • Not safe against hardware failure
• Scale-out
  • Horizontal scaling
  • Easier to run fault-tolerantly
  • More complex

What is database sharding
• A scale-out technique
• Tables are divided and distributed across multiple servers
  • Row based
  • Column based
• A database shard can be placed on separate hardware

Pros and Cons
• Pros
  • Write scale-out (horizontal scaling)
  • Reduced I/O on each shard, because the data is split across shards
  • Queries access only the required shards
• Cons
  • Node management
  • Cross-shard transactions can cause slow queries
  • Downtime might be required when changing the sharding layout

Challenges
• Reliability
• Backups of individual database shards
• Replication of database shards
• Automated failover
• Distributed queries
• Avoidance of cross-shard joins
• Auto-increment keys, such as sequences
• Distributed transactions

Well-known Products
• Postgres-XC by NTT, EDB
• Postgres-XL by 2ndQuadrant
• Postgres Cluster by Postgres Professional
• Greenplum by Pivotal
• pg_shard by CitusData
• Other than PostgreSQL:
  • VoltDB
  • MySQL Cluster
  • Spanner
  • etc.

What is FDW-based sharding
• FDW-based sharding is a database sharding technique that mainly uses FDW (Foreign Data Wrapper) and table partitioning
• Our goal is to provide a sharding solution as a built-in feature
[Figure: four clients, one PostgreSQL server, and three further PostgreSQL servers connected through postgres_fdw]

Basic Architecture (PG9.6)
[Figure: a client connects to the coordinator node, a PostgreSQL server holding a heap table (the parent table) with two foreign tables as its children under table partitioning; through postgres_fdw each foreign table maps to a heap table (child) on one of two PostgreSQL data nodes]

Basic Architecture (PG9.6)
[Figure: a client connects to a PostgreSQL server labeled "Coordinator Node & Data Node", which holds heap tables (parent tables) and two foreign tables (children) under table partitioning; through postgres_fdw each foreign table maps to a heap table (child) on one of two PostgreSQL data nodes]

Multiple coordinator nodes (future)
[Figure: four clients connect to several PostgreSQL coordinator nodes holding foreign tables; the coordinators share a set of PostgreSQL data nodes holding heap tables]

PostgreSQL server behaves as both coordinator and data node (future)
[Figure: four clients connect to three PostgreSQL servers, each labeled "Coordinator & Data Node" and holding both heap tables and foreign tables]

Insert data ID=50
INSERT INTO parent_table VALUES(50);
[Figure: a client sends the INSERT to the coordinator node, whose heap table (parent table) has two foreign tables as children under table partitioning; through postgres_fdw they map to heap tables (children) on two data nodes, one holding ID 0 ~ 100 and the other ID 101 ~ 200]

Select data ID=150
SELECT … FROM … WHERE id = 150;
[Figure: a client sends the SELECT to the coordinator node, whose heap table (parent table) has two foreign tables as children under table partitioning; through postgres_fdw they map to heap tables (children) on two data nodes, one holding ID 0 ~ 100 and the other ID 101 ~ 200]
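The two slides above route one row, or one lookup, to the data node whose ID range contains the key. A minimal Python sketch of that idea follows. It does not model postgres_fdw or the PostgreSQL planner; the names `UPPER_BOUNDS`, `shards`, `shard_for`, `insert` and `select` are hypothetical, and the two in-memory lists merely stand in for the child heap tables on the data nodes.

```python
from bisect import bisect_left

# Hypothetical layout taken from the slides: upper bound of each shard's ID range.
UPPER_BOUNDS = [100, 200]          # shard 0: 0..100, shard 1: 101..200
shards = [[], []]                  # stand-ins for the heap tables on the data nodes


def shard_for(row_id: int) -> int:
    """Return the index of the shard whose ID range contains row_id."""
    if row_id < 0 or row_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"no shard covers id {row_id}")
    return bisect_left(UPPER_BOUNDS, row_id)


def insert(row_id: int) -> int:
    """Route one row to its shard and return the shard index."""
    idx = shard_for(row_id)
    shards[idx].append(row_id)
    return idx


def select(row_id: int):
    """Scan only the shard that can contain row_id; the others are pruned."""
    idx = shard_for(row_id)
    return idx, [r for r in shards[idx] if r == row_id]
```

Calling `insert(50)` touches only the first list and `select(150)` scans only the second, which is the behavior the ID ranges in the figures suggest.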

Sort Push Down
=# EXPLAIN (verbose on, costs off) SELECT * FROM p ORDER BY col;
-- 9.5
Sort
  Output: p.col
  Sort Key: p.col
  -> Append
       -> Seq Scan on public.p
            Output: p.col
       -> Foreign Scan on public.s1
            Output: s1.col
            Remote SQL: SELECT col FROM public.s1
       -> Foreign Scan on public.s2
            Output: s2.col
            Remote SQL: SELECT col FROM public.s2
-- 9.6
Merge Append
  Sort Key: p.col
  -> Sort
       Output: p.col
       Sort Key: p.col
       -> Seq Scan on public.p
            Output: p.col
  -> Foreign Scan on public.s1
       Output: s1.col
       Remote SQL: SELECT col FROM public.s1 ORDER BY col ASC NULLS LAST
  -> Foreign Scan on public.s2
       Output: s2.col
       Remote SQL: SELECT col FROM public.s2 ORDER BY col ASC NULLS LAST
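The difference between the two plans can be sketched in Python. This is only an illustration of "sort everything locally" versus "merge inputs that arrive already sorted"; it is not how the executor is implemented. The row values and the function names `plan_9_5` and `plan_9_6` are made up for the example, and `sorted()` on the remote lists stands in for the `ORDER BY` that 9.6 sends to the foreign servers.

```python
import heapq

# Hypothetical rows for the local table p and the foreign tables s1 and s2.
local_p = [7, 3, 9]
remote_s1 = [8, 1, 5]
remote_s2 = [6, 2, 4]


def plan_9_5(local, *remotes):
    """Append every unsorted input, then sort everything on the coordinator."""
    rows = list(local)
    for remote in remotes:
        rows.extend(remote)            # remote SQL carries no ORDER BY
    return sorted(rows)


def plan_9_6(local, *remotes):
    """Sort only the local rows; remote rows arrive sorted and are merged."""
    sorted_remotes = [sorted(r) for r in remotes]   # stands in for remote ORDER BY
    return list(heapq.merge(sorted(local), *sorted_remotes))
```

Both functions return the same ordered rows; in the second one the coordinator sorts only its own table and the remaining work is a streaming merge.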

Demonstration
• Using PostgreSQL 9.6.2
• Insert into a foreign child table
• Partition pruning
[Figure: SQL is sent to the coordinator, a PostgreSQL server holding a parent table and two child tables; through postgres_fdw the child tables map to foreign server 1 (ID 1 ~ 100) and foreign server 2 (ID 101 ~ 200)]

FDW-based Sharding
• Transparent to the user
  • No need to modify application code
• No special DDL for table management
  • Same as local table partitioning
  • Can use multiple partitioning methods: list, range (and hash)
• Horizontal partitioning
• Can support not only PostgreSQL shard nodes but also any other data source for which a corresponding FDW exists
• A coordinator node can be a shard node as well
• All features are implemented as generic features
  • FDW features are useful on their own merits
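The three partitioning methods named on this slide differ only in how a key is mapped to a shard. The sketch below shows that mapping in Python under stated assumptions: the region names, the boundary at 100 and the two-shard count are invented for illustration, and a plain modulo stands in for whatever hash function a real system would use.

```python
def list_partition(region: str) -> int:
    """List partitioning: each shard owns an explicit set of key values."""
    owners = {"tokyo": 0, "osaka": 0, "moscow": 1}   # hypothetical mapping
    return owners[region]


def range_partition(row_id: int) -> int:
    """Range partitioning: each shard owns a contiguous key range."""
    return 0 if row_id <= 100 else 1


def hash_partition(row_id: int, shard_count: int = 2) -> int:
    """Hash partitioning: a plain modulo stands in for a real hash function."""
    return row_id % shard_count
```

Range and list layouts keep related keys together, which helps pruning, while a hash layout spreads keys evenly at the cost of range locality.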

Use cases
• PostgreSQL 9.6 can cover use cases where:
  • Reads are frequent
  • The system requires write scale-out
    • Each transaction writes to a single shard node
    • If you don’t need transactions, you can write to multiple servers

Challenges and Key Techniques of FDW-based sharding
• More push down*
• Distributed query optimization*
• Asynchronous execution*
• Partitioning*
• Transaction support*
• Node registration
• High availability
• etc.

More push down
• Push-down makes distributed query execution more efficient
• What we can and can’t push down
  • Conditionals
  • Data types, operators, functions (including extension-provided ones)
  • Join, Sort, Aggregate (PG10+)
  • Grouping sets and window functions can’t be pushed down yet
• Patches for PostgreSQL 10
  • “Push down more full joins in postgres_fdw” by Etsuro Fujita
  • “Push down more UPDATEs/DELETEs in postgres_fdw” by Etsuro Fujita
  • “postgres_fdw: support parameterized foreign joins” by Etsuro Fujita
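Why push-down helps can be shown with a small Python sketch that counts how many rows cross the network. It is a toy model, not postgres_fdw: `FakeRemote`, `local_aggregate` and `pushed_down_aggregate` are hypothetical names, and the "remote server" is just an in-memory list of (group, value) pairs.

```python
from collections import defaultdict


class FakeRemote:
    """Stand-in for a foreign server; counts rows sent back to the coordinator."""

    def __init__(self, rows):
        self.rows = rows               # list of (group, value) tuples
        self.rows_sent = 0

    def fetch_all(self):
        self.rows_sent += len(self.rows)
        return list(self.rows)

    def fetch_sum_by_group(self):
        totals = defaultdict(int)
        for group, value in self.rows:
            totals[group] += value
        self.rows_sent += len(totals)
        return dict(totals)


def local_aggregate(remote):
    """Without push-down: pull every row, then aggregate on the coordinator."""
    totals = defaultdict(int)
    for group, value in remote.fetch_all():
        totals[group] += value
    return dict(totals)


def pushed_down_aggregate(remote):
    """With push-down: the remote side aggregates and returns one row per group."""
    return remote.fetch_sum_by_group()
```

With five input rows in two groups, the local variant transfers five rows and the pushed-down variant transfers two, while both produce the same sums.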

postgres_fdw and distributed queries
Operation | PostgreSQL 9.5 | PostgreSQL 9.6 | PostgreSQL 10
SELECT | Foreign table pruning | Foreign table pruning | Foreign table pruning
Conditionals | Push down | Push down | Push down
Aggregations | Local | Local | Push down
Sorts | Local | Push down | Push down
Joins | Local | Push down (Left, Right, Full) | Push down* (Left, Right, Full)
UPDATE, DELETE | Tuple based using CURSOR | Direct execution | Direct execution* (with joins)
INSERT | INSERT to remote server using Prepare/Execute | INSERT to remote server using Prepare/Execute | INSERT to remote server using Prepare/Execute

Partitioning
• Declarative partitioning is needed
  • Basic infrastructure and syntax have been committed to PostgreSQL 10!
• Building blocks still missing
  • Tuple routing feature
    • Inserting into a foreign partition is not supported so far
  • Executor improvement
  • Global unique index

Asynchronous Execution
• Executor improvement
  • Data-fetch requests to different sites can be sent asynchronously
  • Improves foreign table scan performance
• Patch
  • Under discussion
  • “Asynchronous execution for postgres_fdw” by Kyotaro Horiguchi
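The effect of sending fetch requests asynchronously can be sketched with Python's `asyncio`. This is an analogy only and says nothing about how the patch changes the executor; `fetch`, `scan_sequential` and `scan_async` are hypothetical names, and `asyncio.sleep` stands in for network and remote scan time.

```python
import asyncio


async def fetch(shard: str, delay: float, log: list) -> str:
    """Pretend to send a fetch request to one shard and wait for its rows."""
    log.append(f"send {shard}")
    await asyncio.sleep(delay)         # stands in for network and remote scan time
    log.append(f"recv {shard}")
    return f"rows from {shard}"


async def scan_sequential(shards, log):
    """Send each request only after the previous shard has answered."""
    return [await fetch(name, delay, log) for name, delay in shards]


async def scan_async(shards, log):
    """Send all requests up front, then wait for the answers together."""
    return await asyncio.gather(*(fetch(name, delay, log) for name, delay in shards))
```

In the sequential variant the log alternates send and receive per shard; in the asynchronous variant every request is sent before any answer arrives, so the remote scans overlap instead of queuing behind each other.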

Distributed Transaction Management
• Provide cluster-wide transactions (ACID)
• Atomic commit
  • Under review
  • A transaction involving multiple foreign servers commits using the two-phase commit protocol
• Patch
  • “Transactions involving multiple postgres foreign servers” by Masahiko Sawada, Ashutosh Bapat

Processing Sequence of 2PC on FDW
[Figure: sequence diagram between the client, the coordinator, foreign server 1 and foreign server 2; two-phase commit is used transparently]
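The two-phase commit protocol itself can be sketched in a few lines of Python. This is a generic textbook version, not the patch's processing sequence, which the slide shows only as a diagram; the `Participant` class and the `two_phase_commit` function are hypothetical. In PostgreSQL the two phases correspond to the `PREPARE TRANSACTION` and `COMMIT PREPARED` commands, but how the patch drives them on each foreign server is not modeled here.

```python
class Participant:
    """Stand-in for one foreign server taking part in a distributed transaction."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: promise to commit later, or refuse.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit on every participant or on none of them."""
    for p in participants:
        if not p.prepare():
            # One refusal aborts the whole transaction, prepared or not.
            for q in participants:
                q.rollback()
            return False
    # Phase 2: everyone promised, so the commit decision is final.
    for p in participants:
        p.commit()
    return True
```

If every participant prepares successfully, all of them commit; if any one refuses, all of them roll back, which is the atomic-commit guarantee the previous slide asks for. A real coordinator also has to survive crashes between the two phases, which this sketch leaves out.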

Use cases with PostgreSQL 10
• Business reports (complex queries analyzing large data)
  • Enabled by aggregate push-down and optimizer improvements
• Updating a partition key across nodes atomically
  • Enabled by atomic commit of distributed transactions

Conclusion - Keep challenging -
• FDW-based sharding brings us a native PostgreSQL scale-out solution
  • Many of the building blocks are still work in progress
• Do we really need it?
  • To expand the applicability to more critical systems
  • Each sharding feature improves PostgreSQL generically
• More details on FDW-based sharding:
  • https://wiki.postgresql.org/wiki/Built-in_Sharding
Thank you
Спасибо
Masahiko Sawada
sawada.mshk@gmail.com

FDW features
• NTT has been developing features related to FDW-based sharding since PostgreSQL 9.3, with the knowledge obtained through the development of Postgres-XC.
• 9.3
  • *Introduce postgres_fdw
  • *Write via FDW
• 9.4
  • Trigger on foreign table
• 9.5
  • *Foreign table inheritance
• 9.6
  • *Join push down
  • *Sort push down
  • *Direct UPDATE and DELETE
  • Extension-provided operator push down
• 10
  • *Partitioning
  • Aggregate push down
  • (*Async execution)
  • (*2PC on FDW)
