August 2018
An Introduction to Performance
Monitoring for PostgreSQL
Sebastian Insausti
Presenter
sebastian@severalnines.com
Copyright 2017 Severalnines AB
Your host & some logistics
I'm Jean-Jérôme from the Severalnines Team and
I'm your host for today's webinar!
Feel free to ask any questions in the Questions
section of this application or via the Chat box.
You can also contact me directly via the chat box
or via email: info@severalnines.com during or
after the webinar.
Automation & Management
Deployment
● Deploy a Cluster in Minutes
● On-Prem or Cloud (AWS/Azure/Google)
Monitoring
● Systems View with 1 sec Resolution
● DB / OS stats & Performance Advisors
● Configurable Dashboards
● Query Analyzer
● Real-time / historical
Management
● Backup Management
● Upgrades & Patching
● Security & Compliance
● Operational Reports
● Automatic Recovery & Repair
● Performance Management
● Automatic Performance Advisors
Supported Databases
Our Customers
Copyright 2018 Severalnines AB
Poll 1 - What databases do you currently use?
(select one or more)
● PostgreSQL
● MySQL/MariaDB
● MongoDB
● Oracle and/or MS SQL
● Other
Agenda
● PostgreSQL architecture overview
● Key PostgreSQL metrics and their meaning
○ Troubleshooting performance problems in production
○ Tuning
● Performance monitoring tools
● Impact of monitoring on performance
● How to use ClusterControl to identify performance issues
○ Demo
PostgreSQL architecture overview
Fundamental Parts
● Processes
○ Postgres Server Process
○ Backend Process
○ Background Process
○ Replication Associated Process
○ Background Worker Process
● Memory
○ Local memory area
○ Shared memory area
● Disk
○ Data Files
○ WAL Files
○ Log Files
[Diagram slides: Processes (two slides), Memory, Disk, Parts Working Together]
Key PostgreSQL metrics and their meaning
System Monitoring
● CPU Usage: Percentage of CPU in use (%cpu)
● RAM Usage: Amount of free RAM (mem free)
● Network: Packet loss or high latency (packet time or packet loss)
● Disk Usage: Percentage of disk space in use (use%)
● Disk IOPS: Reads and writes per second, and I/O wait (r/s, w/s, iowait)
● SWAP usage: Amount of free swap space (swap free)
Tuning instance vs workload
● Instance Tuning
○ Instance parameters (OS, Database)
● Workload Tuning
○ Queries, Schema
Types of Instance Metrics
● Caching
● Connections
● Checkpoints
● Commits
● Replication
● Vacuum
Caching (1 of 3)
Cache hits vs disk hits: Disk access is expensive, so we want to fetch most
of the data from memory.
Check queries to confirm whether they are using cache or disk
(EXPLAIN (ANALYZE, BUFFERS)).
Related parameters:
● shared_buffers: The amount of memory that the database server
uses for shared memory buffers. If this value is too low, the
database reads from disk more often, which slows queries down.
Caching (2 of 3)
● work_mem: Amount of memory used by internal sort and hash operations
(for example ORDER BY, DISTINCT and joins) before writing to temporary files on
disk. The limit applies per operation, so a single query can use several times
this amount. If this value is too low, the database would use more disk.
● temp_buffers: Maximum amount of memory used in each session to buffer
access to temporary tables.
Caching (3 of 3)
● maintenance_work_mem: Maximum memory that a maintenance operation such as
VACUUM, CREATE INDEX or adding a foreign key can consume.
● effective_cache_size: The query planner's estimate of how much memory is
available for caching data (shared buffers plus the operating system cache).
It does not allocate any memory; it only influences plan choice. A high value
makes it more probable that index scans are used and a low value makes it more
probable that sequential scans will be used.
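As a rough illustration of the "cache hits vs disk hits" idea, the following sketch computes a per-database hit ratio from the cumulative counters in the standard pg_stat_database view. The column alias cache_hit_pct is only an illustrative name, and what counts as a healthy ratio depends on the workload. Note that blks_read counts blocks requested from the operating system, some of which may still be served from the OS page cache rather than from disk.

```sql
-- Share of block reads served from shared_buffers for the current database.
-- blks_hit:  blocks found in shared_buffers
-- blks_read: blocks requested from the operating system (OS cache or disk)
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();

-- Current values of the caching parameters discussed above.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'temp_buffers',
               'maintenance_work_mem', 'effective_cache_size');
```

The counters accumulate since the last statistics reset, so comparing two samples taken some time apart gives a better picture of current behaviour than a single reading.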
Connections
Number of connections: Create a baseline and check for odd patterns.
○ Increasing: Poor use of connection pooling, locking, or an increase in activity.
○ Decreasing: Application problem or networking issue.
State of connections: Search for queries in a particular state. How we
manage transactions in our applications has an impact here (for example,
sessions left "idle in transaction").
Related parameters:
● max_connections: This parameter determines the maximum number
of simultaneous connections to our database.
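A minimal sketch of how the connection count and connection states could be sampled from pg_stat_activity. It assumes PostgreSQL 10 or later, where the backend_type column exists. On older versions the filter can be dropped.

```sql
-- Client connections grouped by state (active, idle, idle in transaction, ...).
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;

-- Current usage compared with the configured limit.
SELECT count(*) AS current_connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity
WHERE backend_type = 'client backend';

-- Sessions holding a transaction open without doing any work.
SELECT pid, usename, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
```

Recording the first query at regular intervals is one way to build the baseline mentioned above.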
Checkpoints (1 of 2)
Checkpoints are points in the sequence of transactions at which all data files
have been updated with all information written before that checkpoint.
In the event of a crash, the crash recovery procedure looks at the latest
checkpoint record to determine the point in the log (known as the redo
record) from which it should start the REDO operation.
Checkpoint frequency: Frequent checkpoints increase disk I/O, while infrequent
checkpoints lengthen crash recovery.
Checkpoints (2 of 2)
Related parameters:
● checkpoint_timeout: Maximum time between automatic WAL
checkpoints, in seconds.
● max_wal_size: Maximum size that the WAL is allowed to grow to between
automatic checkpoints.
● min_wal_size: As long as WAL disk usage stays below this value, old WAL files
are recycled for future use at a checkpoint, instead of being deleted.
● wal_sync_method: Method used to force WAL updates out to disk.
● wal_buffers: Amount of shared memory used for WAL data that has not
yet been written to disk.
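To see whether checkpoints are mostly triggered by checkpoint_timeout or forced by WAL volume, a query along these lines can help. It is a sketch that assumes a PostgreSQL version older than 17, where these counters are exposed in pg_stat_bgwriter. From PostgreSQL 17 on they have moved to the pg_stat_checkpointer view under different column names.

```sql
-- checkpoints_timed: started because checkpoint_timeout elapsed
-- checkpoints_req:   requested, for example because max_wal_size was reached
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 2) AS requested_pct
FROM pg_stat_bgwriter;

-- Current values of the checkpoint-related parameters.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('checkpoint_timeout', 'max_wal_size', 'min_wal_size',
               'wal_sync_method', 'wal_buffers');
```

A high share of requested checkpoints is often a hint that max_wal_size is small for the write volume, although the right balance depends on the workload.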
Commits (1 of 2)
High number of commits: Can be caused by inefficient bulk loads (for example,
one commit per row). Check the workload and what has changed.
Related parameters:
● synchronous_commit: Specifies whether a transaction commit waits for
the WAL records to be written to disk before the command returns a
"success" indication to the client.
Possible values: on, remote_apply, remote_write, local and off.
Commits (2 of 2)
[root@postgres1 /]# ./pgbench -c50 -N -Upgbtest pgbtest
synchronous_commit   TPS
on (default)         679.942166
off                  913.768318
local                778.297985
remote_write         719.684452
remote_apply         630.358726
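The commit rate itself can be derived from the counters in pg_stat_database, and synchronous_commit can be changed for a single transaction instead of for the whole instance. The sketch below shows both. The table audit_log and its message column are hypothetical names used only for illustration.

```sql
-- Cumulative commits and rollbacks; sample twice and subtract to get a rate.
SELECT datname, xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = current_database();

-- Relax durability for one transaction only (hypothetical table audit_log).
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies until COMMIT or ROLLBACK
INSERT INTO audit_log (message) VALUES ('example row');
COMMIT;
```

With synchronous_commit set to off, a server crash can lose the most recently committed transactions, but it does not leave the database inconsistent, so this trade-off is usually reserved for data that can be reloaded or regenerated.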
Replication
Lag and state: The key metrics to monitor here are the lag and the
replication state.
● Check for networking issues.
● Check for resource shortages or undersized servers.
Related parameters:
● max_wal_senders: Specifies the maximum number of concurrent
connections from standby servers or streaming base backup clients. In
PostgreSQL 11 and earlier, WAL senders count towards the connection total,
so the parameter cannot be set higher than max_connections.
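A sketch of how lag and state could be checked with built-in views and functions. It assumes PostgreSQL 10 or later. In 9.x the equivalent names use "xlog" and "location" instead of "wal" and "lsn".

```sql
-- On the primary: one row per connected standby, with replay lag in bytes.
SELECT application_name,
       client_addr,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On a standby: time since the last replayed transaction was committed.
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

The standby query also grows when the primary is simply idle, so it is best read together with the byte lag measured on the primary.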
Vacuum (1 of 3)
Vacuum process: It is responsible for several maintenance tasks in the database,
one of them being the recovery of storage used by dead tuples. If VACUUM is taking
too much time or resources, it usually means that it should run more frequently,
so that each run has less to clean up.
To monitor the vacuum process, check the number of dead tuples and the time of the
last vacuum run. This information is available in the pg_stat_user_tables view:
SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables;
 relname | n_dead_tup |        last_autovacuum
---------+------------+-------------------------------
 setups  |     343688 | 2018-08-15 05:55:30.309274+00
 users   |     234865 | 2018-08-15 21:46:41.015965+00
Vacuum (2 of 3)
If the autovacuum process is not running:
● Check process on the operating system:
[root@postgres1 /]# ps aux |grep autovacuum
postgres 283 0.0 0.8 435340 8768 ? Ss 00:44 0:01 postgres: autovacuum launcher process
● Check autovacuum status on the database:
SELECT name, setting FROM pg_settings WHERE name='autovacuum';
name | setting
------------+---------
autovacuum | on
(1 row)
Vacuum (3 of 3)
Related parameters:
● autovacuum_work_mem: Specifies the maximum amount of memory
to be used by each autovacuum worker process. It defaults to -1,
which means that the value of maintenance_work_mem is used instead.
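Building on the pg_stat_user_tables query shown earlier, the following sketch ranks tables by dead tuples and then makes autovacuum trigger earlier on one of them. The storage parameter autovacuum_vacuum_scale_factor is not covered in the slides and is brought in here as an assumption about how "vacuum more frequently" might be implemented. The value 0.05 is illustrative (the default is 0.2), and users is simply the table name from the sample output.

```sql
-- Tables with the most dead tuples, with the dead share of all tuples.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Vacuum this table once about 5% of its rows are dead (illustrative value).
ALTER TABLE users SET (autovacuum_vacuum_scale_factor = 0.05);
```

Setting the parameter per table keeps the change local to the tables that accumulate dead tuples fastest.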
Monitoring the Error Log (1 of 2)
Check Error Log: Check your log for errors like ‘FATAL’ or ‘deadlock’, and
review common errors as part of proactive maintenance.
In general, the error messages contain a description of the issue, detailed
information, and a hint.
Examples:
2018-08-19 02:06:28.053 UTC [28856] FATAL: password authentication failed for user "username"
2018-08-19 01:59:02.998 UTC [28789] ERROR: duplicate key value violates unique constraint "sbtest21_pkey"
Monitoring the Error Log (2 of 2)
2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected
2018-08-18 12:56:38.520 -03 [1181] DETAIL: Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
2018-08-18 12:56:38.520 -03 [1181] HINT: See server log for query details.
2018-08-18 12:56:38.520 -03 [1181] CONTEXT: while updating tuple (0,15) in relation "country"
2018-08-18 12:56:38.520 -03 [1181] STATEMENT: UPDATE country SET population=18886001 WHERE code='AUS';
2018-08-18 12:59:50.568 -03 [1181] ERROR: current transaction is aborted, commands ignored until end of transaction block
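A deadlock like the one in this log typically arises when two transactions update the same rows in opposite order. The sketch below is one possible interleaving that would produce it. Only the two UPDATE statements named in the log are taken from the source, and the assumption that each session had first updated the other row is an illustration, not something the log states.

```sql
-- Session A
BEGIN;
UPDATE country SET population = 15864001 WHERE code = 'NLD';  -- A locks the NLD row

-- Session B
BEGIN;
UPDATE country SET population = 18886001 WHERE code = 'AUS';  -- B locks the AUS row

-- Session A
UPDATE country SET population = 18886001 WHERE code = 'AUS';  -- waits for B

-- Session B
UPDATE country SET population = 15864001 WHERE code = 'NLD';  -- waits for A: deadlock

-- After deadlock_timeout, PostgreSQL aborts one of the two transactions.
-- The aborted session must end its transaction before issuing new commands:
ROLLBACK;
```

The last error in the log ("current transaction is aborted") is what the aborted session sees if it keeps sending statements without rolling back first. Updating rows in a consistent order in every transaction is a common way to avoid this pattern.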
Queries
Patterns: Check the patterns of the queries, looking for differences in timing or frequency.
Operation: If you have a lot of reads, consider sending them to a standby (replica) server.
Locks or indexes: Understand how locking works, and check whether there are deadlocks.
Look for unindexed queries or unused indexes.
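As a starting point for finding unused indexes and tables that are mostly read by sequential scans, the statistics views can be queried as sketched below. The counters are cumulative since the last statistics reset, so a short observation window can make a useful index look unused.

```sql
-- Indexes that have never been scanned, largest first.
SELECT schemaname,
       relname,
       indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

-- Tables where sequential scans read the most rows (candidates for indexing).
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
LIMIT 10;
```

Indexes that back primary key or unique constraints can show zero scans and still be required, so the first list is a set of candidates to review rather than a drop list.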
Locks
● There are several types of locks.
● The important thing about them is how they conflict with each other.
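To see which sessions are currently waiting on a lock and who is blocking them, pg_stat_activity can be joined to itself through pg_blocking_pids(). This is a sketch that assumes PostgreSQL 9.6 or later, where that function is available.

```sql
-- Each row pairs a waiting session with one session that blocks it.
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.state AS blocking_state,
       blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
ORDER BY blocked.pid;
```

A blocking session in state "idle in transaction" usually points back to how the application manages its transactions.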
Queries
Slow queries:
● Resources: Check for load somewhere, high CPU, or swapping.
● Inefficient plan: Check for using correct indexes, bloat or out of date
statistics.
● Locks: Check for queries waiting for another query.
Related parameters:
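● default_statistics_target: PostgreSQL collects statistics about each
table to decide how queries on it will be executed. This value sets
the default level of detail that ANALYZE collects for each column. A larger
value samples more rows and can improve the planner's estimates, at the cost
of a slower ANALYZE.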
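The statistics target can also be raised for a single column instead of globally. The sketch below uses the city table and population column from the EXPLAIN examples in these slides. The value 500 is purely illustrative (the default target is 100).

```sql
-- Current global default.
SHOW default_statistics_target;

-- Collect more detailed statistics for one column only, then refresh them.
ALTER TABLE city ALTER COLUMN population SET STATISTICS 500;
ANALYZE city;

-- Check when statistics were last gathered for the table.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'city';
```

Comparing the estimated and actual row counts in EXPLAIN ANALYZE before and after the change shows whether the extra detail actually improved the plan.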
Queries
world=# EXPLAIN SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                QUERY PLAN
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..734.81 rows=50662 width=144)
   ->  Seq Scan on city t1  (cost=0.00..93.19 rows=347 width=31)
         Filter: ((id > 100) AND (population > 700000))
   ->  Materialize  (cost=0.00..8.72 rows=146 width=113)
         ->  Seq Scan on country t2  (cost=0.00..7.99 rows=146 width=113)
               Filter: (population < 7000000)
(6 rows)
Queries
world=# EXPLAIN ANALYZE SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..734.81 rows=50662 width=143) (actual time=0.040..22.066 rows=51100 loops=1)
   ->  Seq Scan on city t1  (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.581 rows=350 loops=1)
         Filter: ((id > 100) AND (population > 700000))
         Rows Removed by Filter: 3729
   ->  Materialize  (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
         ->  Seq Scan on country t2  (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.053 rows=146 loops=1)
               Filter: (population < 7000000)
               Rows Removed by Filter: 93
 Planning time: 0.123 ms
 Execution time: 24.052 ms
(10 rows)
Queries
world=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..734.81 rows=50662 width=143) (actual time=0.034..21.384 rows=51100 loops=1)
   Buffers: shared hit=37
   ->  Seq Scan on city t1  (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.637 rows=350 loops=1)
         Filter: ((id > 100) AND (population > 700000))
         Rows Removed by Filter: 3729
         Buffers: shared hit=32
   ->  Materialize  (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
         Buffers: shared hit=5
         ->  Seq Scan on country t2  (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.054 rows=146 loops=1)
               Filter: (population < 7000000)
               Rows Removed by Filter: 93
               Buffers: shared hit=5
 Planning time: 0.134 ms
 Execution time: 23.881 ms
Performance monitoring tools
Poll 2 - What tools do you use to monitor PostgreSQL?
(select one or more)
● On-prem (Nagios, Zabbix)
● SaaS solution (DataDog, NewRelic)
● Postgres centric (Postgres Enterprise Manager, pgwatch2, …)
● Polyglot (ClusterControl)
● Other
Built-in
● Error Log
Automating some monitoring of the error log, looking
for key words like FATAL, ERROR or DEADLOCK is really
useful.
● Statistics collector
The collector can count accesses to tables and indexes
in both disk-block and individual-row terms, tracks the
total number of rows in each table, and information
about vacuum and analyze actions for each table.
Contributed / External
● pg_stat_statements
It helps us to understand the query profile of the database.
It tracks all the queries that are executed and exposes a
lot of useful statistics in a view called
pg_stat_statements.
● pg_stat_plans
This builds on pg_stat_statements and records query
plans for all executed queries.
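A typical way to use pg_stat_statements is to list the statements that consume the most total execution time. The sketch below assumes the module has been added to shared_preload_libraries (which requires a server restart) and uses the column names of PostgreSQL 12 and earlier. From PostgreSQL 13 on, total_time and mean_time are named total_exec_time and mean_exec_time.

```sql
-- One-time setup in the database where the view should be available.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by cumulative execution time (milliseconds).
SELECT calls,
       round(total_time::numeric, 2) AS total_ms,
       round(mean_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 60)               AS query_start
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
```

Sorting by mean time instead highlights individually slow statements, while sorting by calls shows the most frequent ones.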
Contributed / External
● pgBadger
Analyzes PostgreSQL logs and presents the results
in an HTML report.
pgBadger is able to autodetect your log file format.
It can parse huge log files as well as gzip-compressed files.
Contributed / External
● pg_buffercache
Lets you check what's happening in the shared buffer
cache in real time, showing how many pages are
currently held in the cache.
● pgstattuple
Generates statistics for tables and indexes, shows how
much space used by each table/index is consumed by
live tuples, deleted tuples or how much unused space is
available in each relation.
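The two modules above can be tried with queries along the following lines. This is a sketch that assumes both contrib extensions are installed on the server, and it uses the country table from the earlier examples.

```sql
-- pg_buffercache: relations in the current database holding the most shared buffers.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache AS b
JOIN pg_class AS c
  ON b.relfilenode = pg_relation_filenode(c.oid)
 AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                           WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;

-- pgstattuple: live, dead and free space percentages for one table.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
FROM pgstattuple('country');
```

pgstattuple reads the whole relation, so on large tables it is better run outside peak hours.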
Operating System
● top: Check CPU, Memory, Load and more
● ps: Check processes running
● free: Check memory (RAM & SWAP)
● netstat / ping / ifconfig: Check the network state
● iostat / iotop: Check the Disk access
External Performance Monitoring Tools
Nagios
Nagios is an open source system and network
monitoring application.
You can monitor network services, host resources,
and more.
For monitoring PostgreSQL you can use:
● Plugins
● Your own scripts
Zabbix
Zabbix is software that can monitor both
networks and servers.
It has a flexible notification mechanism and
offers reports and data visualization based on the
stored data.
Zabbix is accessed through a web interface.
ClusterControl
ClusterControl is a polyglot management and
monitoring system that helps to deploy,
manage, monitor and scale different databases.
Supports PostgreSQL, MySQL, MariaDB,
MongoDB, Galera Cluster and more.
More Information
For more information about how to monitor PostgreSQL with an external tool,
you can check the following blog post:
The Best Alert and Notification Tools for PostgreSQL
https://severalnines.com/blog/best-alert-and-notification-tools-postgresql
Impact of monitoring on performance
Logs and Queries
VS
Monitoring Performance
Poll 3 - How are your Postgres databases performing?
(select one)
● Good, they are well tuned
● Poorly, we need to optimize them
● Poorly despite optimizing, we need a new DB architecture
● Good, but we might run into (traffic growth) issues
● Other
Demo
Q & A
Additional Resources
Free PostgreSQL Whitepaper
severalnines.com/resources/whitepapers
● How to Benchmark PostgreSQL Performance
○ https://severalnines.com/blog/how-benchmark-postgresql-performance-using-sysbench
● Tuning Input/Output (I/O) Operations for PostgreSQL
○ https://severalnines.com/blog/tuning-io-operations-postgresql
● A Performance Cheat Sheet for PostgreSQL
○ https://severalnines.com/blog/performance-cheat-sheet-postgresql
● Contact us: info@severalnines.com

More Related Content

What's hot (20)

PPTX
PostGreSQL Performance Tuning
Maven Logix
 
PPTX
Oracle GoldenGate 21c New Features and Best Practices
Bobby Curtis
 
PDF
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Sandesh Rao
 
PDF
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
PPTX
Key-Value NoSQL Database
Heman Hosainpana
 
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
PDF
Introduction to column oriented databases
ArangoDB Database
 
PPTX
Apache Spark Architecture
Alexey Grishchenko
 
PDF
Introduction to Redis
Dvir Volk
 
PDF
Building an open data platform with apache iceberg
Alluxio, Inc.
 
PDF
The Parquet Format and Performance Optimization Opportunities
Databricks
 
PDF
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
PDF
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
PPTX
Azure data platform overview
James Serra
 
PDF
High-speed Database Throughput Using Apache Arrow Flight SQL
ScyllaDB
 
PPTX
Programming in Spark using PySpark
Mostafa
 
PPTX
Snowflake essentials
qureshihamid
 
PPTX
Apache Ranger
Rommel Garcia
 
PPTX
Modernize & Automate Analytics Data Pipelines
Carole Gunst
 
PPTX
Understand oracle real application cluster
Satishbabu Gunukula
 
PostGreSQL Performance Tuning
Maven Logix
 
Oracle GoldenGate 21c New Features and Best Practices
Bobby Curtis
 
Oracle Real Application Clusters 19c- Best Practices and Internals- EMEA Tour...
Sandesh Rao
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
Key-Value NoSQL Database
Heman Hosainpana
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Introduction to column oriented databases
ArangoDB Database
 
Apache Spark Architecture
Alexey Grishchenko
 
Introduction to Redis
Dvir Volk
 
Building an open data platform with apache iceberg
Alluxio, Inc.
 
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Azure data platform overview
James Serra
 
High-speed Database Throughput Using Apache Arrow Flight SQL
ScyllaDB
 
Programming in Spark using PySpark
Mostafa
 
Snowflake essentials
qureshihamid
 
Apache Ranger
Rommel Garcia
 
Modernize & Automate Analytics Data Pipelines
Carole Gunst
 
Understand oracle real application cluster
Satishbabu Gunukula
 

Similar to Webinar slides: An Introduction to Performance Monitoring for PostgreSQL (20)

PDF
The Accidental DBA
PostgreSQL Experts, Inc.
 
PDF
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden
 
PDF
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
PDF
The Essential postgresql.conf
Robert Treat
 
PDF
Introduction to PostgreSQL for System Administrators
Jignesh Shah
 
PPTX
PostgreSQL Performance Problems: Monitoring and Alerting
Grant Fritchey
 
PPTX
Migrating To PostgreSQL
Grant Fritchey
 
PDF
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 
PDF
What’s new in 9.6, by PostgreSQL contributor
Masahiko Sawada
 
PDF
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Citus Data
 
PDF
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
PPTX
How to be a Postgres DBA in a Pinch
ElizabethGarrettChri
 
PDF
PGConf APAC 2018 - Tale from Trenches
PGConf APAC
 
PDF
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
KEY
Grabbing the PostgreSQL Elephant by the Trunk
Harold Giménez
 
PDF
PostgreSQL 9.5 - Major Features
InMobi Technology
 
PDF
Advanced Postgres Monitoring
Denish Patel
 
PDF
PostgreSQL High_Performance_Cheatsheet
Lucian Oprea
 
PDF
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
Equnix Business Solutions
 
PDF
8.4 Upcoming Features
PostgreSQL Experts, Inc.
 
The Accidental DBA
PostgreSQL Experts, Inc.
 
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
PostgreSQL-Consulting
 
The Essential postgresql.conf
Robert Treat
 
Introduction to PostgreSQL for System Administrators
Jignesh Shah
 
PostgreSQL Performance Problems: Monitoring and Alerting
Grant Fritchey
 
Migrating To PostgreSQL
Grant Fritchey
 
Slow things down to make them go faster [FOSDEM 2022]
Jimmy Angelakos
 
What’s new in 9.6, by PostgreSQL contributor
Masahiko Sawada
 
Monitoring Postgres at Scale | PostgresConf US 2018 | Lukas Fittl
Citus Data
 
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
How to be a Postgres DBA in a Pinch
ElizabethGarrettChri
 
PGConf APAC 2018 - Tale from Trenches
PGConf APAC
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 
Grabbing the PostgreSQL Elephant by the Trunk
Harold Giménez
 
PostgreSQL 9.5 - Major Features
InMobi Technology
 
Advanced Postgres Monitoring
Denish Patel
 
PostgreSQL High_Performance_Cheatsheet
Lucian Oprea
 
PGConf.ASIA 2019 Bali - Modern PostgreSQL Monitoring & Diagnostics - Mahadeva...
Equnix Business Solutions
 
8.4 Upcoming Features
PostgreSQL Experts, Inc.
 
Ad

More from Severalnines (20)

PDF
The Long Term Cost of Managed DBaaS vs Sovereign DBaaS
Severalnines
 
PPTX
Sovereign DBaaS_ A Practical Vision for Self-Implementation of DBaaS.pptx
Severalnines
 
PDF
PostgreSQL on AWS Aurora/Azure Cosmos VS EC2/Azure VMs
Severalnines
 
PDF
Localhost Conference 2024_ Building a Flexible and Scalable Database Strategy...
Severalnines
 
PDF
SREDAY London 2024 | Cloud Native Technologies: The Building Blocks of Modern...
Severalnines
 
PDF
Building a Sovereign DBaaS on K8s OpenInfra Summit Asia 2024.pdf
Severalnines
 
PDF
S-DBaaS Community Call | Introduction to Sovereign DBaaS: The why, what and how
Severalnines
 
PDF
WEBINAR SLIDES: CCX for Cloud Service Providers
Severalnines
 
PPTX
LIVE DEMO: CCX for CSPs, a drop-in DBaaS solution
Severalnines
 
PDF
Kubernetes at Scale: Going Multi-Cluster with Istio
Severalnines
 
PDF
DIY DBaaS: A guide to building your own full-featured DBaaS
Severalnines
 
PDF
Cloud's future runs through Sovereign DBaaS
Severalnines
 
PPTX
Tips to drive maria db cluster performance for nextcloud
Severalnines
 
PPTX
Working with the Moodle Database: The Basics
Severalnines
 
PPTX
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB
Severalnines
 
PDF
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...
Severalnines
 
PDF
Webinar slides: How to Migrate from Oracle DB to MariaDB
Severalnines
 
PDF
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
PDF
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Severalnines
 
PDF
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Severalnines
 
The Long Term Cost of Managed DBaaS vs Sovereign DBaaS
Severalnines
 
Sovereign DBaaS_ A Practical Vision for Self-Implementation of DBaaS.pptx
Severalnines
 
PostgreSQL on AWS Aurora/Azure Cosmos VS EC2/Azure VMs
Severalnines
 
Localhost Conference 2024_ Building a Flexible and Scalable Database Strategy...
Severalnines
 
SREDAY London 2024 | Cloud Native Technologies: The Building Blocks of Modern...
Severalnines
 
Building a Sovereign DBaaS on K8s OpenInfra Summit Asia 2024.pdf
Severalnines
 
S-DBaaS Community Call | Introduction to Sovereign DBaaS: The why, what and how
Severalnines
 
WEBINAR SLIDES: CCX for Cloud Service Providers
Severalnines
 
LIVE DEMO: CCX for CSPs, a drop-in DBaaS solution
Severalnines
 
Kubernetes at Scale: Going Multi-Cluster with Istio
Severalnines
 
DIY DBaaS: A guide to building your own full-featured DBaaS
Severalnines
 
Cloud's future runs through Sovereign DBaaS
Severalnines
 
Tips to drive maria db cluster performance for nextcloud
Severalnines
 
Working with the Moodle Database: The Basics
Severalnines
 
SysAdmin Working from Home? Tips to Automate MySQL, MariaDB, Postgres & MongoDB
Severalnines
 
(slides) Polyglot persistence: utilizing open source databases as a Swiss poc...
Severalnines
 
Webinar slides: How to Migrate from Oracle DB to MariaDB
Severalnines
 
Webinar slides: How to Automate & Manage PostgreSQL with ClusterControl
Severalnines
 
Webinar slides: How to Manage Replication Failover Processes for MySQL, Maria...
Severalnines
 
Webinar slides: Backup Management for MySQL, MariaDB, PostgreSQL & MongoDB wi...
Severalnines
 
Ad

Recently uploaded (20)

PPTX
PHIPA-Compliant Web Hosting in Toronto: What Healthcare Providers Must Know
steve198109
 
PDF
Boardroom AI: The Next 10 Moves | Cerebraix Talent Tech
ssuser73bdb11
 
PPTX
西班牙巴利阿里群岛大学电子版毕业证{UIBLetterUIB文凭证书}文凭复刻
Taqyea
 
PPTX
Presentation3gsgsgsgsdfgadgsfgfgsfgagsfgsfgzfdgsdgs.pptx
SUB03
 
PDF
BRKSP-2551 - Introduction to Segment Routing.pdf
fcesargonca
 
PDF
Enhancing Parental Roles in Protecting Children from Online Sexual Exploitati...
ICT Frame Magazine Pvt. Ltd.
 
PPTX
Lec15_Mutability Immutability-converted.pptx
khanjahanzaib1
 
PDF
Cleaning up your RPKI invalids, presented at PacNOG 35
APNIC
 
PPTX
Metaphysics_Presentation_With_Visuals.pptx
erikjohnsales1
 
PPTX
Networking_Essentials_version_3.0_-_Module_3.pptx
ryan622010
 
PDF
BRKAPP-1102 - Proactive Network and Application Monitoring.pdf
fcesargonca
 
PDF
The Internet - By the numbers, presented at npNOG 11
APNIC
 
PPTX
L1A Season 1 ENGLISH made by A hegy fixed
toszolder91
 
PDF
Top 10 Testing Procedures to Ensure Your Magento to Shopify Migration Success...
CartCoders
 
PDF
FutureCon Seattle 2025 Presentation Slides - You Had One Job
Suzanne Aldrich
 
PPTX
法国巴黎第二大学本科毕业证{Paris 2学费发票Paris 2成绩单}办理方法
Taqyea
 
PDF
BRKACI-1003 ACI Brownfield Migration - Real World Experiences and Best Practi...
fcesargonca
 
DOCX
Custom vs. Off-the-Shelf Banking Software
KristenCarter35
 
PDF
Digital burnout toolkit for youth workers and teachers
asociatiastart123
 
PPTX
Networking_Essentials_version_3.0_-_Module_5.pptx
ryan622010
 
PHIPA-Compliant Web Hosting in Toronto: What Healthcare Providers Must Know
steve198109
 
Boardroom AI: The Next 10 Moves | Cerebraix Talent Tech
ssuser73bdb11
 
西班牙巴利阿里群岛大学电子版毕业证{UIBLetterUIB文凭证书}文凭复刻
Taqyea
 
Presentation3gsgsgsgsdfgadgsfgfgsfgagsfgsfgzfdgsdgs.pptx
SUB03
 
BRKSP-2551 - Introduction to Segment Routing.pdf
fcesargonca
 
Enhancing Parental Roles in Protecting Children from Online Sexual Exploitati...
ICT Frame Magazine Pvt. Ltd.
 
Lec15_Mutability Immutability-converted.pptx
khanjahanzaib1
 
Cleaning up your RPKI invalids, presented at PacNOG 35
APNIC
 
Metaphysics_Presentation_With_Visuals.pptx
erikjohnsales1
 
Networking_Essentials_version_3.0_-_Module_3.pptx
ryan622010
 
BRKAPP-1102 - Proactive Network and Application Monitoring.pdf
fcesargonca
 
The Internet - By the numbers, presented at npNOG 11
APNIC
 
L1A Season 1 ENGLISH made by A hegy fixed
toszolder91
 
Top 10 Testing Procedures to Ensure Your Magento to Shopify Migration Success...
CartCoders
 
FutureCon Seattle 2025 Presentation Slides - You Had One Job
Suzanne Aldrich
 
法国巴黎第二大学本科毕业证{Paris 2学费发票Paris 2成绩单}办理方法
Taqyea
 
BRKACI-1003 ACI Brownfield Migration - Real World Experiences and Best Practi...
fcesargonca
 
Custom vs. Off-the-Shelf Banking Software
KristenCarter35
 
Digital burnout toolkit for youth workers and teachers
asociatiastart123
 
Networking_Essentials_version_3.0_-_Module_5.pptx
ryan622010
 

Webinar slides: An Introduction to Performance Monitoring for PostgreSQL

  • 1. August 2018 An Introduction to Performance Monitoring for PostgreSQL Sebastian Insausti Presenter [email protected]
  • 2. Copyright 2017 Severalnines AB I'm Jean-Jérôme from the Severalnines Team and I'm your host for today's webinar! Feel free to ask any questions in the Questions section of this application or via the Chat box. You can also contact me directly via the chat box or via email: [email protected] during or after the webinar. Your host & some logistics
  • 5. Copyright 2017 Severalnines AB Automation & Management Deployment ● Deploy a Cluster in Minutes ● On-Prem or Cloud (AWS/Azure/Google) Monitoring ● Systems View with 1 sec Resolution ● DB / OS stats & Performance Advisors ● Configurable Dashboards ● Query Analyzer ● Real-time / historical Management ● Backup Management ● Upgrades & Patching ● Security & Compliance ● Operational Reports ● Automatic Recovery & Repair ● Performance Management ● Automatic Performance Advisors
  • 6. Copyright 2017 Severalnines AB Supported Databases
  • 7. Copyright 2017 Severalnines AB Our Customers
  • 8. Poll 1 - What databases do you currently use? Copyright 2018 Severalnines AB (select one or more) ● PostgreSQL ● MySQL/MariaDB ● MongoDB ● Oracle and/or MS SQL ● Other
  • 9. August 2018 An Introduction to Performance Monitoring for PostgreSQL Sebastian Insausti Presenter [email protected]
  • 10. Agenda ● PostgreSQL architecture overview ● Key PostgreSQL metrics and their meaning ○ Troubleshooting performance problems in production ○ Tuning ● Performance monitoring tools ● Impact of monitoring on performance ● How to use ClusterControl to identify performance issues ○ Demo
  • 11. Copyright 2017 Severalnines AB Copyright 2018 Severalnines AB PostgreSQL architecture overview
  • 12. Fundamental Parts ● Processes ○ Postgres Server Process ○ Backend Process ○ Background Process ○ Replications Associated Process ○ Background Worker Process ● Memory ○ Local memory area ○ Shared memory area ● Disk ○ Data Files ○ WAL Files ○ Log Files
  • 16. Disk
  • 18. Copyright 2017 Severalnines AB Copyright 2018 Severalnines AB Key PostgreSQL metrics and their meaning
  • 19. System Monitoring ● CPU Usage: Percentage use of CPU (%cpu) ● RAM Usage: Amount of free RAM memory (mem free) ● Network: Packet loss or high latency (packet time or packet loss) ● Disk Usage: Percentage use of disk (use%) ● Disk IOPS: Read or write per second, and IO wait. (r/s, w/s, iowait) ● SWAP usage: Amount of free SWAP memory (swap free)
  • 20. Tuning instance vs workload ● Instance Tuning ○ Instance parameters (OS, Database) ● Workload Tuning ○ Queries, Schema
  • 21. Types of Instance Metrics ● Caching ● Connections ● Checkpoints ● Commits ● Replication ● Vacuum
  • 22. Caching (1 of 3) Cache hits vs disk hits: Disk access is expensive, we want to fetch most of the data in memory. Check queries to confirm if you are using cache or disk (EXPLAIN ANALYZE BUFFER). Related parameters: ● shared_buffers: The amount of memory that the database server uses for shared memory buffers. If this value is too low, the database would use more disk, which would cause more slowness.
  • 23. ● work_mem: Amount of memory used by the internal operations of ORDER BY, DISTINCT and JOIN before writing to the temporary files on disk. If this value is too low, the database would use more disk. ● temp_buffers: Used to store the temporary tables used in each session. This parameter sets the maximum amount of memory for this task. Caching (2 of 3)
  • 24. Caching (3 of 3) ● maintenance_work_mem: Maximum memory that an operation like Vacuuming, adding indexes or foreign keys can consume. ● effective_cache_size: Used by the query planner to take into account plans that may or may not fit in memory. A high value makes it more probable that index scans are used and a low value makes it more probable that sequential scans will be used.
  • 25. Connections Amount of connections: Create a baseline and check for odd patterns. ○ Increasing: Bad use of connection pooling, locking, increase of activity. ○ Decreasing: Application problem , networking issue. State of connections: Search for queries in a particular state. How we manage transactions in our applications can impact here. Related parameters: ● max_connections: This parameter determines the maximum number of simultaneous connections to our database.
  • 26. Checkpoints (1 of 2) Checkpoints are points in the sequence of transactions at which all data files have been updated with all information written before that checkpoint. In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation. Checkpoint frequency: Frequency impacts disk I/O performance. Checkpoints that happen too often increase disk I/O, while checkpoints that happen too rarely lengthen crash recovery.
  • 27. Checkpoints (2 of 2) Related parameters: ● checkpoint_timeout: Maximum time between automatic WAL checkpoints, in seconds. ● max_wal_size: Maximum size that the WAL is allowed to grow to between automatic checkpoints. ● min_wal_size: As long as WAL disk usage stays below this value, old WAL files are recycled for future use at a checkpoint instead of being deleted. ● wal_sync_method: Method used to force WAL updates out to disk. ● wal_buffers: Amount of shared memory used for WAL data that has not yet been written to disk.
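One way to reason about checkpoint frequency is to compare checkpoints triggered by `checkpoint_timeout` with those requested because the WAL grew too fast. The sketch below assumes the two counters come from `checkpoints_timed` and `checkpoints_req` in `pg_stat_bgwriter` (their location in the PostgreSQL versions current at the time of these slides); the function name and sample values are hypothetical.

```python
def requested_checkpoint_share(checkpoints_timed: int, checkpoints_req: int) -> float:
    """Return the fraction of checkpoints that were requested rather than timed."""
    total = checkpoints_timed + checkpoints_req
    if total == 0:
        return 0.0  # no checkpoints recorded yet
    return checkpoints_req / total


# Hypothetical counters: 90 timed checkpoints and 10 requested ones.
share = requested_checkpoint_share(checkpoints_timed=90, checkpoints_req=10)
print(f"requested checkpoints: {share:.0%}")
```

A consistently high share suggests that checkpoints are being forced before `checkpoint_timeout` expires, which is usually a hint to review `max_wal_size`.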
  • 28. Commits (1 of 2) High number of commits: Can be caused by inefficient bulk loads. Check the workload and what has changed. Related parameters: ● synchronous_commit: It specifies whether the transaction commit waits for the WAL records to be written to disk before the command returns a "success" indication to the client. Possible values: on, remote_apply, remote_write, local and off.
  • 29. Commits (2 of 2)
      [root@postgres1 /]# ./pgbench -c50 -N -Upgbtest pgbtest
       synchronous_commit |    TPS
      --------------------+------------
       on (default)       | 679.942166
       off                | 913.768318
       local              | 778.297985
       remote_write       | 719.684452
       remote_apply       | 630.358726
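To make the pgbench figures easier to compare, the sketch below expresses each TPS value relative to the default setting (`on`). The TPS numbers are the ones shown on the slide; the helper name is hypothetical, and the result only describes that particular benchmark run.

```python
# TPS values copied from the pgbench results on the slide.
TPS = {
    "on": 679.942166,
    "off": 913.768318,
    "local": 778.297985,
    "remote_write": 719.684452,
    "remote_apply": 630.358726,
}


def relative_to_default(tps_by_setting, default="on"):
    """Return each setting's TPS change relative to the default, as a fraction."""
    base = tps_by_setting[default]
    return {name: (tps - base) / base for name, tps in tps_by_setting.items()}


rel = relative_to_default(TPS)
for name, change in rel.items():
    print(f"{name:13s} {change:+.1%}")
```

Keep in mind that the faster settings trade durability guarantees for throughput, so the percentages are not a recommendation by themselves.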
  • 30. Replication Lag and state: The key metrics to monitor here are the replication lag and the replication state. ● Check for networking issues. ● Check for resource shortages or undersized servers. Related parameters: ● max_wal_senders: It specifies the maximum number of concurrent connections from standby servers or streaming base backup clients. The parameter cannot be set higher than max_connections.
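Replication lag is often measured as the distance in bytes between the WAL position on the primary and the position replayed on the standby. As a minimal sketch, assuming both positions are available as LSN strings in PostgreSQL's `X/Y` hexadecimal form, the difference can be computed as follows. The function names and sample LSNs are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' into an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Return how many bytes of WAL the standby is behind the primary."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


# Hypothetical positions for a primary and a standby.
print(replication_lag_bytes("0/3000060", "0/3000000"))
```

Inside the database the same subtraction is available through the built-in `pg_wal_lsn_diff` function (PostgreSQL 10 and later).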
  • 31. Vacuum (1 of 3) Vacuum process: It is responsible for several maintenance tasks in the database, one of which is recovering the storage used by dead tuples. If VACUUM takes too much time or too many resources, we should run it more frequently so that each run has less work to do. To monitor the vacuum process, check the number of dead tuples and the time of the last vacuum execution. This information is available in pg_stat_user_tables:
      SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables;
       relname | n_dead_tup |        last_autovacuum
      ---------+------------+-------------------------------
       setups  |     343688 | 2018-08-15 05:55:30.309274+00
       users   |     234865 | 2018-08-15 21:46:41.015965+00
  • 32. Vacuum (2 of 3) If the autovacuum process is not running:
    ● Check the process on the operating system:
      [root@postgres1 /]# ps aux | grep autovacuum
      postgres 283 0.0 0.8 435340 8768 ? Ss 00:44 0:01 postgres: autovacuum launcher process
    ● Check the autovacuum status in the database:
      SELECT name, setting FROM pg_settings WHERE name='autovacuum';
          name    | setting
      ------------+---------
       autovacuum | on
      (1 row)
  • 33. Vacuum (3 of 3) Related parameters: ● autovacuum_work_mem: It specifies the maximum amount of memory to be used by each autovacuum worker process. It defaults to -1, which means that the value of maintenance_work_mem is used instead.
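To put the `n_dead_tup` figures in context, it helps to know when autovacuum considers a table due. Autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples`, with defaults of 50 and 0.2. The sketch below applies that formula; the table size of 1,000,000 rows is a hypothetical value, since the slides do not state it.

```python
def autovacuum_threshold(reltuples: float, base_threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum vacuums a table (PostgreSQL defaults)."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum(n_dead_tup: int, reltuples: float) -> bool:
    """Return True when the dead tuples exceed the autovacuum threshold."""
    return n_dead_tup > autovacuum_threshold(reltuples)


# n_dead_tup for "setups" comes from the slide; the row count is an assumption.
print(needs_vacuum(n_dead_tup=343688, reltuples=1_000_000))
```

On very large tables the default scale factor lets many dead tuples accumulate before a run starts, which is one reason per-table settings are sometimes used.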
  • 34. Monitoring the Error Log (1 of 2) Check Error Log: Check your log for errors like ‘FATAL’ or ‘deadlock’, or even for common errors for proactive maintenance. In general, the error messages contain a description of the issue, detailed information, and a hint. Examples:
      2018-08-19 02:06:28.053 UTC [28856] FATAL: password authentication failed for user "username"
      2018-08-19 01:59:02.998 UTC [28789] ERROR: duplicate key value violates unique constraint "sbtest21_pkey"
  • 35. Monitoring the Error Log (2 of 2)
      2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected
      2018-08-18 12:56:38.520 -03 [1181] DETAIL: Process 1181 waits for ShareLock on transaction 579; blocked by process 1148.
          Process 1148 waits for ShareLock on transaction 578; blocked by process 1181.
          Process 1181: UPDATE country SET population=18886001 WHERE code='AUS';
          Process 1148: UPDATE country SET population=15864001 WHERE code='NLD';
      2018-08-18 12:56:38.520 -03 [1181] HINT: See server log for query details.
      2018-08-18 12:56:38.520 -03 [1181] CONTEXT: while updating tuple (0,15) in relation "country"
      2018-08-18 12:56:38.520 -03 [1181] STATEMENT: UPDATE country SET population=18886001 WHERE code='AUS';
      2018-08-18 12:59:50.568 -03 [1181] ERROR: current transaction is aborted, commands ignored until end of transaction block
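Scanning the error log for severities such as FATAL or for deadlocks is easy to automate. The following is a minimal sketch, assuming log lines shaped like the examples on these slides (timestamp, time zone, process ID in brackets, then the severity); a different `log_line_prefix` would need a different pattern. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: 2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected
LOG_LINE = re.compile(r"^(\S+ \S+ \S+) \[(\d+)\] ([A-Z]+):\s+(.*)$")


def scan_log(lines):
    """Count log entries per severity and collect deadlock reports."""
    severities = Counter()
    deadlocks = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # continuation lines and unknown formats are skipped
        timestamp, pid, severity, message = match.groups()
        severities[severity] += 1
        if "deadlock detected" in message:
            deadlocks.append((timestamp, int(pid)))
    return severities, deadlocks


# Sample lines copied from the slides.
sample = [
    '2018-08-19 02:06:28.053 UTC [28856] FATAL: password authentication failed for user "username"',
    '2018-08-19 01:59:02.998 UTC [28789] ERROR: duplicate key value violates unique constraint "sbtest21_pkey"',
    "2018-08-18 12:56:38.520 -03 [1181] ERROR: deadlock detected",
    "2018-08-18 12:56:38.520 -03 [1181] HINT: See server log for query details.",
]
severities, deadlocks = scan_log(sample)
print(dict(severities), deadlocks)
```

The counts could then be fed into whatever alerting tool is already in place.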
  • 36. Queries Patterns: Check the patterns of the queries and look for differences in execution time or frequency. Operation: If you have a lot of reads, consider sending them to a replica (slave). Locks or indexes: Understand how locking works and whether there are deadlocks. Look for unindexed queries or unused indexes.
  • 37. Locks ● There are several types of locks. ● The important thing about them is how they conflict with each other.
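How lock types conflict can be captured in a small lookup table. The sketch below encodes the conflict rules for PostgreSQL's eight table-level lock modes as documented in the "Explicit Locking" chapter of the manual; row-level locks are not covered, and the function name is hypothetical.

```python
# For each table-level lock mode, the set of modes it conflicts with.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                            "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": {"ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                  "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ACCESS EXCLUSIVE": {"ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE",
                         "SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
                         "EXCLUSIVE", "ACCESS EXCLUSIVE"},
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if a requested lock mode must wait for a held lock mode."""
    return requested in CONFLICTS[held]


# A plain SELECT (ACCESS SHARE) is only blocked by ACCESS EXCLUSIVE.
print(conflicts("ACCESS SHARE", "ACCESS EXCLUSIVE"), conflicts("ACCESS SHARE", "EXCLUSIVE"))
```

For example, two sessions updating different rows both hold ROW EXCLUSIVE on the table, and those two modes do not conflict at the table level.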
  • 38. Queries Slow queries: ● Resources: Check for high load, high CPU usage, or swapping. ● Inefficient plan: Check that the correct indexes are used, and look for bloat or out-of-date statistics. ● Locks: Check for queries waiting on another query. Related parameters: ● default_statistics_target: PostgreSQL collects statistics from each of the tables to decide how queries will be executed on them. This value sets the level of detail of those statistics. A higher value makes ANALYZE sample more rows, which takes longer but can improve the planner's estimates.
  • 39. Queries
      world=# EXPLAIN SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                      QUERY PLAN
      --------------------------------------------------------------------------
       Nested Loop (cost=0.00..734.81 rows=50662 width=144)
         -> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31)
              Filter: ((id > 100) AND (population > 700000))
         -> Materialize (cost=0.00..8.72 rows=146 width=113)
              -> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=113)
                   Filter: (population < 7000000)
      (6 rows)
  • 40. Queries
      world=# EXPLAIN ANALYZE SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                      QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------
       Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.040..22.066 rows=51100 loops=1)
         -> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.581 rows=350 loops=1)
              Filter: ((id > 100) AND (population > 700000))
              Rows Removed by Filter: 3729
         -> Materialize (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
              -> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.053 rows=146 loops=1)
                   Filter: (population < 7000000)
                   Rows Removed by Filter: 93
       Planning time: 0.123 ms
       Execution time: 24.052 ms
      (10 rows)
  • 41. Queries
      world=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM city t1,country t2 WHERE id>100 AND t1.population>700000 AND t2.population<7000000;
                                      QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------
       Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.034..21.384 rows=51100 loops=1)
         Buffers: shared hit=37
         -> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.637 rows=350 loops=1)
              Filter: ((id > 100) AND (population > 700000))
              Rows Removed by Filter: 3729
              Buffers: shared hit=32
         -> Materialize (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
              Buffers: shared hit=5
              -> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.054 rows=146 loops=1)
                   Filter: (population < 7000000)
                   Rows Removed by Filter: 93
                   Buffers: shared hit=5
       Planning time: 0.134 ms
       Execution time: 23.881 ms
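One practical use of EXPLAIN ANALYZE output is to compare the planner's estimated row counts with the actual ones, because large gaps often point at out-of-date statistics. The sketch below extracts both numbers from plan lines in the text format shown on these slides; the regular expression and function name are illustrative assumptions, not part of any PostgreSQL tool.

```python
import re

# Captures estimated rows and actual rows from one EXPLAIN ANALYZE plan node line.
ROWS = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(\d+) width=\d+\) "
    r"\(actual time=[\d.]+\.\.[\d.]+ rows=(\d+) loops=\d+\)"
)


def row_estimates(plan_text):
    """Return (node label, estimated rows, actual rows) for each plan node."""
    results = []
    for line in plan_text.splitlines():
        match = ROWS.search(line)
        if match is None:
            continue  # Filter, Buffers and timing lines carry no row estimates
        label = line[:match.start()].replace("->", "").strip()
        results.append((label, int(match.group(1)), int(match.group(2))))
    return results


# Plan lines copied from the EXPLAIN ANALYZE slide.
PLAN = """\
Nested Loop (cost=0.00..734.81 rows=50662 width=143) (actual time=0.040..22.066 rows=51100 loops=1)
  -> Seq Scan on city t1 (cost=0.00..93.19 rows=347 width=31) (actual time=0.025..0.581 rows=350 loops=1)
       Filter: ((id > 100) AND (population > 700000))
  -> Materialize (cost=0.00..8.72 rows=146 width=112) (actual time=0.000..0.010 rows=146 loops=350)
       -> Seq Scan on country t2 (cost=0.00..7.99 rows=146 width=112) (actual time=0.005..0.053 rows=146 loops=1)
"""
estimates = row_estimates(PLAN)
for label, estimated, actual in estimates:
    print(f"{label}: estimated={estimated} actual={actual}")
```

In this plan the estimates are close to the actual values, so the statistics look healthy for this query.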
  • 42. Performance monitoring tools
  • 43. Poll 2 - What tools do you use to monitor PostgreSQL? (select one or more) ● On-prem (Nagios, Zabbix) ● SaaS solution (DataDog, NewRelic) ● Postgres centric (Postgres Enterprise Manager, pgwatch2, …) ● Polyglot (ClusterControl) ● Other
  • 44. Built-in ● Error Log: Automating some monitoring of the error log, looking for keywords like FATAL, ERROR or DEADLOCK, is really useful. ● Statistics collector: The collector can count accesses to tables and indexes in both disk-block and individual-row terms. It also tracks the total number of rows in each table, and information about vacuum and analyze actions for each table.
  • 45. Contributed / External ● pg_stat_statements: It helps you understand the query profile of your database. It tracks all the queries that are executed and exposes a lot of useful statistics through a view called pg_stat_statements. ● pg_stat_plans: This builds on pg_stat_statements and records query plans for all executed queries.
  • 46. Contributed / External ● pgBadger: Analyzes PostgreSQL logs and presents the results as an HTML report. pgBadger can autodetect your log file format, and it can parse huge log files as well as gzip-compressed files.
  • 47. Contributed / External ● pg_buffercache: Lets you check what is happening in the shared buffer cache in real time, showing how many pages are currently held in the cache. ● pgstattuple: Generates statistics for tables and indexes. It shows how much of the space used by each table or index is taken up by live tuples and by deleted tuples, and how much unused space is available in each relation.
  • 48. Operating System ● top: Check CPU, Memory, Load and more ● ps: Check processes running ● free: Check memory (RAM & SWAP) ● netstat / ping / ifconfig: Check the network state ● iostat / iotop: Check the Disk access
  • 49. External Performance Monitoring Tools
  • 50. Nagios Nagios is an open source system and network monitoring application. You can monitor network services, host resources, and more. To monitor PostgreSQL you can: ● Use plugins ● Create your own script
  • 51. Zabbix Zabbix is software that can monitor both networks and servers. It has a flexible notification mechanism and offers reports and data visualization based on the stored data. Zabbix is accessed through a web interface.
  • 52. ClusterControl ClusterControl is a polyglot management and monitoring system that helps to deploy, manage, monitor and scale different databases. Supports PostgreSQL, MySQL, MariaDB, MongoDB, Galera Cluster and more.
  • 53. More Information For more information about how to monitor PostgreSQL with an external tool, you can check the following blog: The Best Alert and Notification Tools for PostgreSQL https://severalnines.com/blog/best-alert-and-notification-tools-postgresql
  • 54. Impact of monitoring on performance
  • 56. Poll 3 - How are your Postgres databases performing? (select one) ● Good, they are well tuned ● Poorly, we need to optimize them ● Poorly despite optimizing, we need a new DB architecture ● Good, but we might run into (traffic growth) issues ● Other
  • 57. Demo
  • 58. Q & A
  • 60. Additional Resources ● How to Benchmark PostgreSQL Performance ○ https://severalnines.com/blog/how-benchmark-postgresql-performance-using-sysbench ● Tuning Input/Output (I/O) Operations for PostgreSQL ○ https://severalnines.com/blog/tuning-io-operations-postgresql ● A Performance Cheat Sheet for PostgreSQL ○ https://severalnines.com/blog/performance-cheat-sheet-postgresql ● Contact us: [email protected]