Built-in Replication in PostgreSQL


Fujii Masao
NTT OSS Center

09/27/2010

Copyright(c)2010 NTT, Inc. All Rights Reserved.
Who am I?
 • Database engineer at NTT Open Source Software Center

 • PostgreSQL developer since 2008

 • Author of the new built-in replication




Abstract
 • What’s replication?

 • Background

 • How does the built-in replication work?
        – Features
        – Architecture
        – Limitations
        – Future work


What’s replication?
 • Create a replica of the database on
   multiple servers
        – Multiple servers have the same database

[Diagram: a client sends a change to the original database, and the change is propagated to the replicas.]

Why replication?
 • High Availability
        – Reduces system downtime

 • Load Balancing
        – Improves system performance

[Diagram: two panels, "High Availability" and "Load Balancing", each showing clients sending SQL to DBMS servers.]

Background
 • Historical policy
        – Avoid putting replication into core
          PostgreSQL
        – No "one size fits all" replication solution

 • Replication war!
        – Postgres-XC, syncreplicator, rubyrep, PL/Proxy, Londiste,
          Postgres-R, Sequoia, Slony-I, Bucardo, pgpool-II, Mammoth,
          GridSQL, PGCluster, PyReplica, PostgresForest

Road to core
 • No default choice
        – Too complex to install and use for simple cases
        – Low development activity; projects easily become inactive
        – No Japanese documentation
        – Does not run on platforms other than Linux
        – vs. other DBMSs


 • v9.0
        – Simple, reliable basic replication in core




Built-in replication in PostgreSQL 9.0
 • Streaming Replication
        – Capability to stream changes on master to standby
 • Hot Standby
        – Capability to run read-only queries on standby

 • 1+1=3

[Diagram: the client sends read/write SQL to the master and read-only SQL to the standby (Hot Standby); changes flow from the master to the standby via Streaming Replication.]

Master / Standbys
 • One master / Multiple standbys
        – Only the master accepts write queries
        – Both the master and the standbys accept read queries


 • Read scalable
        – Not write scalable

[Diagram: the client sends read/write SQL to the master and read-only SQL to the standbys; changes flow from the master to the standbys.]

Cascading vs. Proxy
[Diagram: cascading (Master → Standby → Standby) is not allowed; a proxy approach (Master → Proxy → Standbys) is allowed.]

Hot Standby

 Allowed
 • Query access
        – SELECT
 • Logical hot backup
        – pg_dump

 Not allowed
 • Data Manipulation Language (DML)
        – INSERT, UPDATE, DELETE
        – SELECT FOR UPDATE
 • Data Definition Language (DDL)
        – CREATE, DROP, ALTER
 • Maintenance
        – VACUUM, ANALYZE
        – (Replicated from master)
 • Physical hot backup
        – pg_start_backup, pg_stop_backup

Log-shipping
 • WAL is shipped from the master to the standby
        – WAL is also known as the transaction log

 • The standby runs in recovery mode
        – Keeps the database current by replaying received WAL

[Diagram: the client sends write queries to the master; the master ships WAL to the standby, which replays it into its database during recovery.]

Limitation by log-shipping
 • Must be the same between master and
   standby
        – H/W architecture
        – PostgreSQL major version

[Diagram: with a master on a 64-bit OS running PostgreSQL v9.0.0, a standby on a 32-bit OS is NG, a standby running v9.1.0 is NG, and a standby on a 64-bit OS running v9.0.2 is OK.]

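The constraint on this slide can be sketched as a small check. This is only an illustration, not part of PostgreSQL: the `Server` type and the `can_ship_wal` function are hypothetical names, the bit width stands in for the H/W architecture, and the code assumes 9.x-style numbering in which the first two components (for example 9.0) identify the major version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical description of a PostgreSQL server (not a real API)."""
    arch_bits: int    # 32 or 64, standing in for the H/W architecture
    pg_version: str   # e.g. "9.0.2"


def major_version(pg_version: str) -> str:
    # Assumption: 9.x-style numbering, where "9.0" is the major version
    # and the third component is the minor release.
    return ".".join(pg_version.split(".")[:2])


def can_ship_wal(master: Server, standby: Server) -> bool:
    """Return True if log-shipping between the two servers is possible."""
    same_arch = master.arch_bits == standby.arch_bits
    same_major = major_version(master.pg_version) == major_version(standby.pg_version)
    return same_arch and same_major


master = Server(arch_bits=64, pg_version="9.0.0")
print(can_ship_wal(master, Server(32, "9.0.0")))  # different architecture
print(can_ship_wal(master, Server(64, "9.1.0")))  # different major version
print(can_ship_wal(master, Server(64, "9.0.2")))  # only the minor version differs
```

Only the last pair passes the check, which matches the OK/NG cases in the diagram.
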
Per database cluster granularity
 • All database objects are replicated
        – Per-table granularity is not allowed



[Diagram: two panels, "Per database cluster" and "Per table", each showing replication from master to standby.]

Easy migration
 • No need to change table definitions
        – cf. Slony-I requires each table to have a primary key

 • No need to rewrite SQL
        – cf. Slony-I doesn't replicate DDL
        – All SQL that PostgreSQL supports is available on the master

 • Easy to use an existing database server as the master

[Diagram: a stand-alone server serving a client becomes the master of a master/standby pair. Easy migration!]

No query distribution
 • Postgres doesn't provide a query distribution capability
        – Implement query distribution logic in the application
        – Or use a query distributor

[Diagram: two panels. "Implement logic": the client itself sends write queries to the master and read queries to the standby. "Use distributor": the client sends queries to a query distributor, which forwards write queries to the master and read queries to the standby.]

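A minimal sketch of the "implement logic in the application" option follows. The `QueryRouter` class, the node names, and the keyword-based classification are all assumptions made for illustration; a real application would need something more robust, since a plain `SELECT` can call a function that writes. `SELECT ... FOR UPDATE` (and `FOR SHARE`, which also takes row locks) is sent to the master because a hot standby rejects it.

```python
import re
from typing import List

# Assumption: statements starting with these keywords are treated as reads.
READ_PREFIX = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
# Row-locking SELECTs are not allowed on a hot standby.
LOCKING_CLAUSE = re.compile(r"\bfor\s+(update|share)\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """Naive classification: a SELECT/SHOW without a locking clause."""
    return bool(READ_PREFIX.match(sql)) and not LOCKING_CLAUSE.search(sql)


class QueryRouter:
    """Hypothetical application-side router; node names are placeholders."""

    def __init__(self, master: str, standbys: List[str]):
        self.master = master
        self.standbys = standbys
        self._next = 0  # round-robin position among the standbys

    def route(self, sql: str) -> str:
        # Writes, and everything when no standby exists, go to the master.
        if not self.standbys or not is_read_only(sql):
            return self.master
        node = self.standbys[self._next % len(self.standbys)]
        self._next += 1
        return node


router = QueryRouter("master", ["standby1", "standby2"])
for sql in ["INSERT INTO t VALUES (1)", "SELECT * FROM t",
            "SELECT * FROM t FOR UPDATE", "SELECT count(*) FROM t"]:
    print(router.route(sql), "<-", sql)
```

Round-robin is just one possible policy; because replication is asynchronous, reads routed to a standby may return slightly stale data.
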
Shared nothing
 • WAL is shipped via network
        – No special H/W required
        – No distance limitation
        – No single point of failure

[Diagram: two panels, "Shared nothing" and "Shared disk", each showing a master and a standby.]

Asynchronous
 • WAL is shipped asynchronously
        – Low performance impact on the master
        – Data loss window on failover
        – Queries on the standby may see slightly outdated data

[Diagram: the master returns "success" to the client, so the client thinks the transaction has been committed, but the transaction's WAL has not been replicated to the standby yet.]

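The data loss window can be illustrated with a toy model. This is a sketch under simplifying assumptions, not PostgreSQL code: the `AsyncPair` class and its methods are hypothetical, and WAL is modelled as a plain list of records.

```python
class AsyncPair:
    """Toy model of a master and one asynchronous standby."""

    def __init__(self):
        self.master_wal = []    # WAL records written on the master
        self.standby_wal = []   # WAL records that have reached the standby

    def commit(self, record: str) -> str:
        # The master writes WAL and answers the client immediately,
        # without waiting for the standby.
        self.master_wal.append(record)
        return "success"

    def ship(self, max_records: int = 1) -> None:
        # Ship up to max_records pending records to the standby, in order.
        start = len(self.standby_wal)
        self.standby_wal.extend(self.master_wal[start:start + max_records])

    def lost_on_failover(self) -> list:
        # Records the client saw as committed but the standby never received.
        return self.master_wal[len(self.standby_wal):]


pair = AsyncPair()
pair.commit("tx1")
pair.commit("tx2")
pair.ship(1)                    # only tx1 reaches the standby
print(pair.lost_on_failover())  # tx2 would be lost if the master failed now
```
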
Failover
 • The standby can be brought up at any time
        – Automatic failover requires clusterware
 • Failover time is relatively short

[Diagram: two failover setups, each with a client, a master, and a standby: one uses Pacemaker with a virtual IP (VIP), the other uses pgpool-II.]

Online standby addition and deletion
 • A standby can be added or deleted without
   downtime of the master or the other
   standbys
        – This is useful for systems that start small

[Diagram: a new standby is attached to the master while the client keeps using it; the master does not need to be stopped while the new standby is added.]

Built-in
 • Easy to install and use
        – Only Postgres needs to be installed
        – Intuitive usage
        – Runs on all the major operating systems


 • Highly active community
        – Volunteers translate the documentation into Japanese
        – Bugs are fixed quickly
        – Continuous improvement and development
        – Many users



Architecture


[Diagram: on the master, backends handle client write queries, change the database, and write WAL; walsender reads the WAL and sends it. On the standby, walreceiver receives the WAL and writes it; the startup process reads the WAL and applies it to the database, while backends serve client read queries against that database.]

Multiple standbys
 • One-to-one relationship between
   walsender and standby
        – WAL is shipped to each standby in parallel

[Diagram: the master's backends write WAL, and the master runs one walsender per standby; each standby has its own walreceiver, WAL, and startup process.]

Walsender and WAL
 • Walsender always reads WAL from disk
        – Prevents the standby from getting ahead of the
          master
        – Avoids loss of consistency between the master
          and the standby

 • WAL is usually read from the file cache
        – WAL is read just after it is written
        – The I/O load caused by walsender is not high
        – However, WAL is read from disk if the standby falls
          far behind the master

Recovery vs. Read-only query


[Diagram: on the standby, backends serving client read queries access the database while the startup process reads WAL and applies it to the same database (recovery).]

Recovery vs. Read-only query
 • Until the conflict has been resolved,
        – Read queries return outdated results
        – Failover is blocked


 • A parameter specifies the maximum delay in
   recovery
        – Increase the delay when running time-consuming
          jobs
        – Decrease the delay to keep the failover time
          short




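The trade-off can be sketched as a decision function. In PostgreSQL 9.0 the parameter is `max_standby_delay`, and once the delay is exceeded the conflicting queries are canceled so that replay can continue. The function below is a hypothetical illustration of that rule, not server code; the name `replay_action` and the return strings are assumptions.

```python
def replay_action(waited_s: float, max_delay_s: float) -> str:
    """Decide what recovery does while a read-only query conflicts with replay.

    Assumption: a negative max_delay_s means "wait forever".
    """
    if max_delay_s < 0 or waited_s < max_delay_s:
        return "wait"          # replay pauses; the query keeps running
    return "cancel_query"      # delay exhausted; replay proceeds


print(replay_action(5, 30))     # short wait so far: recovery keeps waiting
print(replay_action(30, 30))    # limit reached: the query is canceled
print(replay_action(1000, -1))  # no limit: recovery waits indefinitely
```

A large value favors long-running reports on the standby; a small value keeps the standby close to the master so that failover is quick.
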
Recovery vs. Read-only query


[Diagram: the master's backends write WAL, walsender reads and sends it, and the standby's walreceiver receives and writes it. Conflicts between recovery and read-only queries don't interfere with log-shipping.]

Future work - Synchronous
 •    Synchronous replication is essential to avoid data loss on failover
        – Currently under development for 9.1
        – Three synchronization levels: recv, fsync, apply


[Diagram: recv level. The master waits until the standby has received WAL: walsender sends WAL, walreceiver receives it and replies, and only then does the master return "success" to the client.]

[Diagram: fsync level. The master waits until the standby has written WAL before returning "success" to the client.]

[Diagram: apply level. The master waits until the standby has applied WAL, so read queries on the standby see the change, before returning "success" to the client.]

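The three synchronization levels can be sketched as an ordered progress check. The level names follow the slides; everything else (`SyncLevel`, `PROGRESS`, `can_reply_success`) is a hypothetical illustration and not the API of any released PostgreSQL version, since this work was still under development at the time.

```python
from enum import Enum


class SyncLevel(Enum):
    RECV = "recv"    # standby has received the WAL
    FSYNC = "fsync"  # standby has written the WAL
    APPLY = "apply"  # standby has applied the WAL


# Order in which a standby makes progress on one WAL record.
PROGRESS = [SyncLevel.RECV, SyncLevel.FSYNC, SyncLevel.APPLY]


def can_reply_success(level: SyncLevel, standby_reached: SyncLevel) -> bool:
    """True once the standby's reported progress meets the chosen level."""
    return PROGRESS.index(standby_reached) >= PROGRESS.index(level)


# The master may answer the client only when the standby's reply covers
# the configured level.
print(can_reply_success(SyncLevel.RECV, SyncLevel.RECV))
print(can_reply_success(SyncLevel.APPLY, SyncLevel.FSYNC))
```

Stricter levels make the master wait longer per commit in exchange for a stronger guarantee on the standby.
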
Future work - Synchronous
 • Per-transaction control
        – Some transactions are important, others are
          not


 • Quorum commit
        – Master waits for N standbys




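Both ideas on this slide can be expressed with one hypothetical helper: the master counts acknowledgements and lets the commit return once N standbys have replied, and a less important transaction simply asks for N = 0. The function name and the shape of `acks` are assumptions made for this sketch.

```python
def quorum_reached(acks: dict, required: int) -> bool:
    """acks maps a standby name to True once it has acknowledged the commit."""
    return sum(1 for ok in acks.values() if ok) >= required


acks = {"standby1": True, "standby2": False, "standby3": True}
print(quorum_reached(acks, 2))  # two of three standbys have acknowledged
print(quorum_reached(acks, 3))  # still waiting for standby2
print(quorum_reached(acks, 0))  # unimportant transaction: no waiting
```
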
If you find a bug or problem
 • Bug report form
        – http://www.postgresql.org/support/submitbug


 • Mail
        – pgsql-jp@ml.postgresql.jp
        – masao.fujii@gmail.com




Thank you for listening





Built-in Replication in PostgreSQL

  • 1. Built-in Replication in PostgreSQL Fujii Masao NTT OSS Center 09/27/2010 Copyright(c)2010 NTT, Inc. All Rights Reserved.
  • 2. Who am I? • Database engineer in NTT Open Source Software Center • PostgreSQL developer since 2008 • Author of new built-in replication Copyright(c)2010 NTT, Inc. All Rights Reserved. 2
  • 3. Abstract • What’s replication? • Background • How does the built-in replication work? – Features – Architecture – Limitations – Future works Copyright(c)2010 NTT, Inc. All Rights Reserved. 3
  • 4. What’s replication? • Create a replica of the database on multiple servers – Multiple servers have the same database Client Change Change Original Replicas Copyright(c)2010 NTT, Inc. All Rights Reserved. 4
  • 5. Why replication? • High Availability – Reduces the system downtime • Load Balancing – Improve the system performance High Availability Load Balancing Client Client SQL SQL SQL DBMS DBMS Copyright(c)2010 NTT, Inc. All Rights Reserved. 5
  • 6. Background • Historical policy – Avoid putting replication into core PostgreSQL – No "one size fits all" replication solution • Replication war! Postgres-XC syncreplicator rubyrep PL/Proxy Londiste Postgres-R Sequoia Slony-I Bucardo pgpool-II Mammoth GridSQL PGCluster PyReplica PostgresForest Copyright(c)2010 NTT, Inc. All Rights Reserved. 6
  • 7. Road to core • No default choice – Too complex to install and use for simple cases – Low activity, easily-inactive – No Japanese document – Cannot work on other than linux – vs. other dbms • v9.0 – Simple, reliable basic replication in core Copyright(c)2010 NTT, Inc. All Rights Reserved. 7
  • 8. Built-in replication in PostgreSQL 9.0 • Streaming Replication – Capability to stream changes on master to standby • Hot Standby – Capability to run read-only queries on standby • 1+1=3 Client Hot Standby R/W SQL R/O SQL Master Standby Change Streaming Replication Copyright(c)2010 NTT, Inc. All Rights Reserved. 8
  • 9. Master / Standbys • One master / Multiple standbys – Only master accepts write query – Both accepts read query • Read scalable – Not write scalable Client Standbys R/O SQL R/W SQL Change Master Copyright(c)2010 NTT, Inc. All Rights Reserved. 9
  • 10. Cascading vs. Proxy Client Not allow Cascading Master Standby Standby Client Allow Proxy approach Master Proxy Standbys Copyright(c)2010 NTT, Inc. All Rights Reserved. 10
  • 11. Hot Standby Allow • Query access • Logical hot backup – SELECT – pg_dump Not allow • Data Manipulation • Maintenance Language (DML) – VACUUM, ANALYZE – INSERT, UPDATE, DELETE – (Replicated from master) – SELECT FOR UPDATE • Physical hot backup • Data Definition – pg_start/stop_backup Language (DDL) – CREATE, DROP, ALTER Copyright(c)2010 NTT, Inc. All Rights Reserved. 11
  • 12. Log-shipping • WAL is shipped from master to standby – WAL a.k.a transaction log • Standby in recovery mode – Keeps the database current by replaying receved WAL Client Write query Master Standby WAL Recovery WAL WAL Database Copyright(c)2010 NTT, Inc. All Rights Reserved. 12
  • 13. Limitation by log-shipping • Must be the same between master and standby – H/W architecture – PostgreSQL major version [Diagram: master is OS 64bit, PG v9.0.0; a standby with OS 32bit is NG; a standby with PG v9.1.0 is NG; a standby with OS 64bit, PG v9.0.2 is OK]
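
The compatibility rule on this slide can be sketched as a small check. This is an illustration only: the `Server` type and `can_replicate` function are hypothetical names, word size stands in for the full hardware architecture, and the major version is taken to be the first two components of the version string, as in the 9.x numbering used on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical description of a server; not a PostgreSQL API."""
    arch_bits: int   # simplification: word size stands in for H/W architecture
    pg_version: str  # e.g. "9.0.2"


def major_version(pg_version: str) -> str:
    # Assumes 9.x-style numbering, where "9.0" and "9.1" are different major versions.
    return ".".join(pg_version.split(".")[:2])


def can_replicate(master: Server, standby: Server) -> bool:
    """True when both the architecture and the major version match."""
    return (
        master.arch_bits == standby.arch_bits
        and major_version(master.pg_version) == major_version(standby.pg_version)
    )


master = Server(arch_bits=64, pg_version="9.0.0")
for standby in (Server(32, "9.0.0"), Server(64, "9.1.0"), Server(64, "9.0.2")):
    print(standby, "OK" if can_replicate(master, standby) else "NG")
```

The three standbys in the loop mirror the NG, NG, and OK cases drawn on the slide.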
  • 14. Per database cluster granularity • All database objects are replicated – Per-table granularity is not allowed [Diagram: per-database-cluster replication copies everything from master to standby, in contrast to per-table replication, which copies only selected tables]
  • 15. Easy migration • No need to change table definitions – cf. Slony-I forces each table to have a primary key • No need to rewrite SQL – cf. Slony-I doesn’t replicate DDL – All the SQL that PostgreSQL supports is available on the master • Easy to use an existing database server as the master [Diagram: a stand-alone server becomes the master of a master/standby pair]
  • 16. No query distribution • Postgres doesn’t provide a query distribution capability – Implement the query distribution logic in the application – Use a query distributor [Diagram: with application logic, the client sends write queries to the master and read queries to the standby; with a distributor, the client sends every query to the distributor, which forwards write queries to the master and read queries to the standby]
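
A minimal sketch of the "implement logic in the application" option, assuming a deliberately conservative rule: only plain SELECT statements go to the standby, and everything else goes to the master. The `route` function and the `"master"` / `"standby"` labels are hypothetical; a real application would map them to its own connections. Locking reads (`FOR UPDATE`, and `FOR SHARE` as an extra precaution) are sent to the master.

```python
import re

# Locking reads must run on the master; matching too eagerly is harmless,
# because sending a read to the master is always safe.
_LOCKING_CLAUSE = re.compile(r"\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE)


def route(sql: str) -> str:
    """Return "standby" for plain SELECT statements and "master" otherwise."""
    statement = sql.strip()
    first_word = statement.split(None, 1)[0].upper() if statement else ""
    if first_word == "SELECT" and not _LOCKING_CLAUSE.search(statement):
        return "standby"
    # DML, DDL, maintenance commands, and anything unrecognized.
    return "master"


for sql in (
    "SELECT * FROM accounts",
    "SELECT * FROM accounts FOR UPDATE",
    "INSERT INTO accounts VALUES (1)",
):
    print(route(sql), "<-", sql)
```

This sketch inspects only the first keyword, so a SELECT that calls a function with side effects would be misrouted; defaulting to the master for anything uncertain keeps the failure mode limited to lost read scaling rather than rejected writes.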
  • 17. Shared nothing • WAL is shipped via the network – No special H/W required – No distance limitation – No single point of failure [Diagram: in shared nothing, master and standby each have their own storage; in shared disk, master and standby share one disk]
  • 18. Asynchronous • WAL is shipped asynchronously – Low performance impact on the master – Data loss window on failover – Queries on the standby may see slightly outdated data [Diagram: the master reports "success" to the client, so the client thinks the transaction has been committed, but the transaction's WAL has not been replicated to the standby yet]
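
The data loss window can be illustrated with a toy model. The function and the transaction labels below are hypothetical; the sketch only compares what the master has acknowledged to clients with what the standby has received at the moment of failover.

```python
from typing import Iterable, List


def lost_on_failover(
    committed_on_master: Iterable[str],
    received_by_standby: Iterable[str],
) -> List[str]:
    """Return transactions acknowledged to clients but missing on the standby."""
    received = set(received_by_standby)
    return [txn for txn in committed_on_master if txn not in received]


# The master has reported "success" for three transactions,
# but only the first two have reached the standby.
committed = ["txn-1", "txn-2", "txn-3"]
received = ["txn-1", "txn-2"]
print(lost_on_failover(committed, received))
```

If the master fails at this point and the standby is promoted, `txn-3` is the transaction the client believes is committed but that no longer exists.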
  • 19. Failover • The standby can be brought up at any time – Automatic failover requires clusterware • Failover time is relatively short [Diagram: two example setups, one where Pacemaker manages a VIP in front of the master and standby, and one where the client connects through pgpool-II]
  • 20. Online standby addition and deletion • A standby can be added or deleted without downtime of the master and the other standbys – This is useful for systems that start small [Diagram: the master does not need to be stopped while a new standby is added]
  • 21. Built-in • Easy to install and use – Only Postgres needs to be installed – Intuitive usage – Runs on all the major operating systems • Highly active community – Volunteers translate the documentation into Japanese – Bugs are fixed quickly – Continuous improvement and development – Many users
  • 22. Architecture [Diagram: on the master, backends handle client write queries, change the database, and write WAL; walsender reads the WAL and sends it to the standby. On the standby, walreceiver receives the WAL and writes it; the startup process reads the WAL and applies it to the database; backends access the database to serve client read queries]
  • 23. Multiple standbys • One-to-one relationship between walsender and standby – WAL is shipped to each standby in parallel [Diagram: the master runs one walsender per standby; each standby has its own walreceiver, WAL, and startup process]
  • 24. Walsender and WAL • Walsender always reads WAL from disk – Prevents the standby from going ahead of the master – Avoids loss of consistency between master and standby • WAL is basically read from the file cache – WAL is read just after it is written – The I/O load caused by walsender is not high – But WAL is read from disk if the standby falls far behind the master
  • 25. Recovery vs. Read-only query [Diagram: on the standby, backends access the database for client read queries while the startup process reads WAL and applies it to the same database, so recovery and read-only queries can conflict]
  • 26. Recovery vs. Read-only query • Until the conflict has been resolved, – Read queries return outdated results – Failover is blocked • A parameter specifies the maximum delay in recovery – Increase the delay when running time-consuming jobs – Decrease the delay to keep the failover time short
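
The trade-off behind the maximum-delay parameter can be sketched with a simplified model. The slide does not name the parameter, so `max_delay_s` is a hypothetical stand-in, and the model assumes that recovery waits for a conflicting read query up to that limit and cancels the query once the limit is exceeded.

```python
from typing import Tuple


def resolve_conflict(query_remaining_s: float, max_delay_s: float) -> Tuple[str, float]:
    """Return (outcome, seconds that recovery was stalled) for one conflict.

    Simplified model: recovery waits for the conflicting query, but never
    longer than max_delay_s.
    """
    if query_remaining_s <= max_delay_s:
        # The query finishes within the allowed delay; recovery then resumes.
        return ("query finished", query_remaining_s)
    # The limit is hit first; the query is cancelled so recovery can proceed.
    return ("query cancelled", max_delay_s)


print(resolve_conflict(query_remaining_s=5.0, max_delay_s=30.0))
print(resolve_conflict(query_remaining_s=120.0, max_delay_s=30.0))
```

A larger limit lets long-running reports complete but lets the standby fall further behind, which lengthens failover; a smaller limit does the opposite.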
  • 27. Recovery vs. Read-only query • Conflicts don’t interfere with log-shipping [Diagram: client write queries change the master database and write WAL; walsender reads the WAL and sends it; walreceiver on the standby keeps receiving and writing WAL]
  • 28. Future work - Synchronous • Synchronous replication is essential to avoid data loss on failover – Currently under development for 9.1 – Three synchronization levels: recv, fsync, apply [Diagram, recv level: the master waits until the standby has received the WAL, then walreceiver replies and the master reports "success" to the client]
  • 29. Future work - Synchronous • Synchronous replication is essential to avoid data loss on failover – Currently under development for 9.1 – Three synchronization levels: recv, fsync, apply [Diagram, fsync level: the master waits until the standby has written the WAL, then walreceiver replies and the master reports "success" to the client]
  • 30. Future work - Synchronous • Synchronous replication is essential to avoid data loss on failover – Currently under development for 9.1 – Three synchronization levels: recv, fsync, apply [Diagram, apply level: the master waits until the standby's startup process has applied the WAL, so read queries on the standby see the change; then the standby replies and the master reports "success" to the client]
  • 31. Future work - Synchronous • Per-transaction control – Some transactions are important, others are not • Quorum commit – The master waits for N standbys
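
The three synchronization levels, per-transaction control, and quorum commit from the slides above can be combined in one toy decision function. This models the ideas as named on the slides and is not any released PostgreSQL API; `commit_can_return` and its arguments are hypothetical, and the levels are assumed to be ordered so that apply implies fsync, which implies recv.

```python
from typing import Optional, Sequence

# Assumed ordering of the levels named on the slides.
LEVELS = {"recv": 1, "fsync": 2, "apply": 3}


def commit_can_return(
    standby_progress: Sequence[Optional[str]],
    required_level: str,
    quorum: int,
) -> bool:
    """Decide whether the master may report "success" for one transaction.

    standby_progress holds, per standby, the highest level reached for this
    transaction's WAL, or None if the standby has not received it yet.
    required_level and quorum are chosen per transaction.
    """
    needed = LEVELS[required_level]
    acks = sum(
        1
        for level in standby_progress
        if level is not None and LEVELS[level] >= needed
    )
    return acks >= quorum


progress = ["recv", "apply", None]  # three standbys
print(commit_can_return(progress, "fsync", quorum=1))  # one standby is at fsync or beyond
print(commit_can_return(progress, "fsync", quorum=2))  # only one qualifies, two are required
print(commit_can_return(progress, "recv", quorum=0))   # an unimportant transaction does not wait
```

Setting `quorum=0` for a transaction corresponds to treating it as asynchronous, which is one way to express "some transactions are important, others are not".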
  • 32. If you find a bug or problem • Bug report form – http://www.postgresql.org/support/submit bug • Mail – [email protected] – [email protected]
  • 33. Thank you for listening