Jeremy Schneider, Amazon RDS
Wait! What’s going on inside
my database?
PostgreSQL and the Science of Database Performance
Updated: Sep 19, 2019
About PostgreSQL
1970: Mathematician Edgar F. Codd, working as a researcher for IBM, publishes “A Relational Model of Data for Large Shared Data Banks”
1973: Michael Stonebraker and Eugene Wong at the University of California, Berkeley seek funding and begin development of a relational database called INGRES
1986: Michael Stonebraker and Lawrence A. Rowe at the University of California, Berkeley publish “The Design of POSTGRES”, describing a new database that is the successor to INGRES
1994: Andrew Yu and Jolly Chen at the University of California, Berkeley add support for the SQL language
1996: Transition to a non-university core team of volunteers; official release under the new name PostgreSQL
About Database Performance
1990s Manager:
“Dear DBA: Expert consultants have taught us that if the Buffer Cache Hit Ratio (BCHR) is below 90% then the system immediately needs an expensive tuning engagement. Please report any databases that have BCHR < 90%.”
Delfador Chibi by Peileppe
CC0
Nørgaard, Mogens et al. Oracle Insights:
Tales of the Oak Table. Berkeley, CA:
Apress/OakTable Press, 2004. p76-77.
Millsap, Cary V. Optimizing Oracle Performance. Sebastopol, CA: O’Reilly, 2003. p225, 240, 258-259
R = S + W (response time = service time + wait time)
“How long the SQL takes to run”
See also:
• Shallahamer, Craig.
Forecasting Oracle
Performance. Berkeley,
CA: Apress, 2007.
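The R = S + W relationship can be illustrated with a minimal sketch. The class name, field names, and timing values here are hypothetical and chosen only for illustration; they are not taken from the cited books.

```python
from dataclasses import dataclass


@dataclass
class SqlTiming:
    """Timing for one SQL execution, in seconds (made-up numbers)."""

    service: float  # S: time spent running on the CPU
    wait: float     # W: time spent waiting (I/O, locks, and so on)

    @property
    def response(self) -> float:
        """R = S + W: how long the SQL takes to run."""
        return self.service + self.wait

    @property
    def wait_fraction(self) -> float:
        """Share of the response time that was spent waiting."""
        return self.wait / self.response if self.response else 0.0


timing = SqlTiming(service=0.25, wait=0.75)
print(timing.response, timing.wait_fraction)
```

When the wait fraction dominates, as in this made-up case, adding CPU capacity would not be expected to help much; the waits are the place to look.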
Active Session Sampling
(JB’s notebook, 2004)
Images & Quotes
Used With Permission
What about PostgreSQL?
© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Mariinsky Theatre, St. Petersburg
by Sandra Cohen-Rose and Colin Rose (Montreal, Canada)
CC BY-SA
Wait Events
• 1990s: Database kernel instrumentation:
  • Counters and tools to snapshot/compare them
  • Events (log a message under certain circumstances)
• 1992: Unable to solve a performance problem, Oracle engineers as a last resort added event code in version 7.0.12, capable of emitting log messages when the database waited for something
• First exposed in V$SESSION_WAIT and later in V$SESSION (Oracle’s equivalent of pg_stat_activity)
• PostgreSQL built on concepts that had become standard across the industry
“But why are these events called wait events?
…
In short, when a session is not using the CPU, it may be
waiting for a resource, an action to complete, or simply
more work. Hence, events that are associated with all
such waits are known as wait events.”
Shee, Richmond, Kirtikumar Deshpande, and K. Gopalakrishnan. Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning. New York: McGraw-Hill/Osborne, 2004. p16
High-Level Idea:
The database is WAITING any time when it’s not running on the CPU
Caveats:
• OS scheduling/runqueue
• Measurement overhead, OS kernel CPU time (e.g. I/O)
psql> SELECT…
Session states: Idle, Waiting, CPU
Significant Commits:
Version 9.6
• aa65de0 – 11 Sep 2015 – Autogenerate lwlocknames.[c|h]
• 53be0b1 – 10 Mar 2016 – Heavy/Lightweight Locks, Buffer Pins
Version 10
• 6f3bd98 – 4 Oct 2016 – Latches & Sockets, Clients, Main Loops
• 249cf07 – 18 Mar 2017 – I/O
• fc70a4b – 26 Mar 2017 – Background and Auxiliary Processes
Version 11
• 1804284 – 20 Dec 2017 – Parallel-Aware Hash Joins
src/include/pgstat.h
doxygen.postgresql.org
src/backend/postmaster/pgstat.c
Gaps after migrating to Open Source/Community PostgreSQL
• Wait Event Counters and Cumulative Times
• Wait Event Arguments (object, block, etc)
• Comprehensive tracking of CPU time (POSIX rusage)
• Ability to find previous SQL for COMMIT/ROLLBACK
• Needed to identify which transaction is committing. (Other databases do not
update SQL text for COMMIT statement)
• On-CPU State
• SQL Execution Stage (parse/plan/execute/fetch)
• SQL Execution Plan Identifier in pg_stat_statements
• Current plan node
• Progress on long operations (e.g. large seqscan)
• Better runtime visibility into PLs
I can haz Wait Events?
Solving Problems with Wait Events in PostgreSQL
By Antony Griffiths (Flickr), CC BY
Solving Problems With Wait Events
pid | state | wait_event_type | wait_event | xact_runtime | query_short
--------+-------------+-----------------+---------------------+-------------------------+----------------------------------------------------
8135 | active | | | -00:00:00.000941 | autovacuum: VACUUM pghist.pg_stat_statements_20190
8168 | active | | | 00:00:00 | SELECT col1, col2,
| | | | |
108975 | | Activity | WalWriterMain | |
108976 | | Activity | AutoVacuumMain | |
108973 | | Activity | CheckpointerMain | |
108974 | | Activity | BgWriterMain | |
108979 | | Activity | LogicalLauncherMain | |
8185 | active | | | 00:00:00.07941 | autovacuum: VACUUM pghist.pg_stat_sys_indexes_2019
8212 | active | | | 00:00:00.349238 | autovacuum: VACUUM pghist.pg_stat_statements_20190
115699 | active | Lock | relation | 00:30:01.170404 | SELECT proc('param1')
103268 | active | IO | DataFileRead | 00:46:46.277548 | select count(*) from some_ones_table a , (select c
95936 | active | LWLock | buffer_io | 00:56:57.327904 | SELECT col1 FROM some_ones_table a, (SELECT col1 a
95935 | active | IO | DataFileRead | 00:56:57.328169 | SELECT col1 FROM some_ones_table a, (SELECT col1 a
95921 | active | LWLock | buffer_io | 00:56:57.393765 | SELECT col1 FROM some_ones_table a, (SELECT col1 a
56628 | active | IO | DataFileRead | 01:47:55.333596 | select col1 from some_ones_table WHERE err_id in (
53981 | active | IO | BufFileRead | 01:51:40.986659 | SELECT col1 FROM some_ones_table a, (SELECT asin a
49386 | active | LWLock | buffer_io | 01:58:13.166389 | SELECT count(*) FROM some_ones_table a, (SELECT co
29172 | active | IO | BufFileRead | 02:04:09.108342 | SELECT count(*) FROM some_ones_table a, (SELECT co
43208 | active | LWLock | buffer_io | 02:06:39.296499 | SELECT count(*) FROM some_ones_table a, (SELECT co
43207 | active | IO | DataFileRead | 02:06:39.29666 | SELECT count(*) FROM some_ones_table a, (SELECT co
31401 | active | IPC | MessageQueueReceive | 02:06:39.370239 | SELECT count(*) FROM some_ones_table a, (SELECT co
12387 | active | IO | DataFileRead | 02:46:50.262871 | select count(*) from some_ones_table a , (select c
12386 | active | IO | DataFileRead | 02:46:50.263142 | select count(*) from some_ones_table a , (select c
12385 | active | IO | DataFileRead | 02:46:50.266696 | select count(*) from some_ones_table a , (select c
83681 | active | BufferPin | BufferPin | 15:24:45.260184 | autovacuum: VACUUM schema1.some_ones_table (to prev
23340 | active | LWLock | buffer_io | 1 day 16:39:18.732685 | select column_001,column2,column3,column000004,
24074 | active | LWLock | buffer_io | 1 day 16:41:55.91496 | WITH this_subquery_01 as (select column_001,PIPELI
8110 | active | LWLock | buffer_io | 1 day 17:03:52.767838 | WITH this_subquery_01 as (select column_001,PIPEL
51767 | active | LWLock | buffer_io | 1 day 19:03:47.006302 | WITH this_subquery_01 as (select column_001,PIPEL
9217 | active | LWLock | buffer_io | 1 day 20:01:58.572314 | WITH this_subquery_01 as (select column_001,PIPEL
6086 | active | IO | DataFileRead | 1 day 20:06:08.584313 | WITH this_subquery_01 as (select column_001,PIPEL
115385 | active | LWLock | buffer_io | 1 day 20:35:27.617606 | WITH this_subquery_01 as (select column_001,PIPEL
94256 | idle in trx | Client | ClientRead | 27 days 02:33:48.940102 | select subquery00_.column_001 as COLUMN01_2_0_, a
(33 rows)
Active Session Sampling of Wait Events on PostgreSQL:
• Performance Insights on Amazon RDS
• RDS PostgreSQL 10+
• Aurora PostgreSQL 9.6+ (v10 Wait Events were backported)
• pg_wait_sampling and PoWA
• pgSentinel
• pgCenter
• (what am I missing?)
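-- pg_stat_activity_samples is not a built-in view; it stands for wherever
-- active session samples are stored. problem_start, problem_end and
-- top_problem_sql_statement are values to fill in.

-- Step 1: which SQL statements were sampled most often during the problem window?
SELECT sql_statement, count(*)
FROM pg_stat_activity_samples
WHERE date BETWEEN problem_start AND problem_end
GROUP BY sql_statement
ORDER BY count(*) DESC;

-- Step 2: which wait events was the top statement sampled in most often?
SELECT wait_event, count(*)
FROM pg_stat_activity_samples
WHERE sql_statement = top_problem_sql_statement
AND date BETWEEN problem_start AND problem_end
GROUP BY wait_event
ORDER BY count(*) DESC;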
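The same two-step aggregation can be sketched in memory, which may help when samples are exported out of the database. This is a sketch under stated assumptions: the sample tuples, statement names, and timestamps are made up, and labeling a missing wait event as "CPU" is an assumed convention, not something the queries above do.

```python
from collections import Counter
from datetime import datetime

# Hypothetical samples: (sample time, SQL text, wait event or None).
samples = [
    (datetime(2019, 9, 19, 10, 0, 1), "SELECT a", "DataFileRead"),
    (datetime(2019, 9, 19, 10, 0, 1), "SELECT b", None),
    (datetime(2019, 9, 19, 10, 0, 2), "SELECT a", "DataFileRead"),
    (datetime(2019, 9, 19, 10, 0, 2), "SELECT a", "buffer_io"),
    (datetime(2019, 9, 19, 10, 0, 3), "SELECT b", "relation"),
    (datetime(2019, 9, 19, 11, 0, 0), "SELECT c", None),  # outside the window
]


def top_sql(samples, start, end):
    """Count samples per SQL statement inside the window, most frequent first."""
    return Counter(
        sql for ts, sql, _ in samples if start <= ts <= end
    ).most_common()


def top_waits(samples, sql, start, end):
    """Count samples per wait event for one statement inside the window."""
    return Counter(
        wait or "CPU"  # assumption: no wait event means the session was on CPU
        for ts, stmt, wait in samples
        if stmt == sql and start <= ts <= end
    ).most_common()


problem_start = datetime(2019, 9, 19, 10, 0, 0)
problem_end = datetime(2019, 9, 19, 10, 0, 5)

ranked_sql = top_sql(samples, problem_start, problem_end)
top_statement = ranked_sql[0][0]
ranked_waits = top_waits(samples, top_statement, problem_start, problem_end)
print(ranked_sql)
print(ranked_waits)
```

Because every sample represents roughly one sampling interval of session time, the counts approximate where time went without needing cumulative timers.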
Active Session Summary (Performance Insights, etc)
Top SQL & Top Wait Events
EXPLAIN ANALYZE with Buffers, IO timing, etc
Investigate STEP & WAIT Taking The Most Time
Thank you!
aws.amazon.com/rds/postgresql
Stuffed Elephant Store
A unique service that produces on-demand stuffed elephants
Multiple sizes with long or short fur
Any color the customer wants, as long as it’s blue
Breibeest (Flickr), CC BY
Performance Insights
See the query text and the wait events by query
Look back 7 days or as much as 2 years to find activity
explain.depesz.com
Execution Plans
Explains how Postgres plans to execute a query
Shows the type of operation, the estimated cost, and the estimated number of rows
System Catalogs
Contain the structure of all objects in the database
Statistics views show usage of the objects
Indexes
PostgreSQL has a rich set of index types
Base functionality can be enhanced by specialized extensions
Performance Insights
Drilling down into time periods shows finer-grained detail
The 3 minute view shows 1 second granularity
Locking
Locks are held for the duration of the transaction
Locks can be held on a table, row or other object such as transaction IDs
It’s so simple!
Aurora PostgreSQL:
• AWS Documentation covers Aurora-Specific Wait Events
• Shares Code With Community PostgreSQL (and merges regularly)
DataFileRead, buffer_io
• I/O Read Path: Check SQL execution plans, optimize for fewer block reads.
XactSync, WALWrite
• I/O Write Path: Check commit rate, volume of change.
transactionid, relation, etc.
• Application Design: Check pg_locks during contention.
buffer_content
• Hot Block in Memory: Check foreign keys, reduce contention (e.g. schema redesign, fillfactor, etc.).
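The triage hints above can be kept as a small lookup table, for example next to a monitoring script. This is only a sketch: the `TRIAGE` and `triage` names and the fallback text for unlisted events are hypothetical; the listed hints paraphrase the slide.

```python
# Wait event -> (problem area, first thing to check), paraphrased from the slide.
TRIAGE = {
    "DataFileRead": ("I/O read path", "check SQL execution plans, optimize for fewer block reads"),
    "buffer_io": ("I/O read path", "check SQL execution plans, optimize for fewer block reads"),
    "XactSync": ("I/O write path", "check commit rate and volume of change"),
    "WALWrite": ("I/O write path", "check commit rate and volume of change"),
    "transactionid": ("application design", "check pg_locks during contention"),
    "relation": ("application design", "check pg_locks during contention"),
    "buffer_content": ("hot block in memory", "check foreign keys, schema design, fillfactor"),
}

# Fallback for events the slide does not cover (wording is an assumption).
UNKNOWN = ("unknown", "look the event up in the documentation or source code")


def triage(wait_event: str) -> tuple:
    """Return (problem area, first check) for a wait event name."""
    return TRIAGE.get(wait_event, UNKNOWN)


print(triage("buffer_io"))
print(triage("NoSuchEvent"))
```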
Individual Named LWLocks
Tranches for SLRU
Tranches for Shared Buffers
Individual Named Tranches
Uppercase LWLocks: see lwlocknames.txt, search code directly
Lowercase LWLocks: tranches (arrays of locks for groups of objects)
• SLRUs – see SimpleLruInit() callers on doxygen
• Shared Buffers (buffer_content, buffer_io)
• Other Tranches – see RegisterLWLockTranches() in lwlock.c
(Heavyweight/SQL/Transaction) Locks: LockTagType enum in lock.h
• Strings come from matching structure LockTagTypeNames in lockfuncs.c
BufferPin: Vacuuming - see PG_WAIT_BUFFER_PIN refs on doxygen
Extension: FDWs, BG worker startup, etc - see PG_WAIT_EXTENSION refs on doxygen
Activity, Client, IPC, Timeout and IO: enums, see pgstat.h
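The uppercase/lowercase rule for LWLock wait events can be written as a tiny helper. This is a sketch under assumptions: `lwlock_kind` is a hypothetical name, the rule applies only to events whose wait_event_type is LWLock, and it reflects the naming in the PostgreSQL versions this talk covers; other versions may name these events differently.

```python
def lwlock_kind(wait_event: str) -> str:
    """Guess the kind of LWLock wait event from the case of its first letter."""
    if not wait_event:
        raise ValueError("wait_event must be a non-empty string")
    if wait_event[0].isupper():
        # Individually named lock: look it up in lwlocknames.txt.
        return "individual named LWLock"
    # Lowercase name: a tranche, i.e. an array of locks for a group of objects.
    return "tranche"


print(lwlock_kind("ProcArrayLock"))
print(lwlock_kind("buffer_content"))
```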
Multi-AZ [multiple physical locations]
Physical Backups
• Max allowed retention (35 days in RDS)
• Regular restore testing
Logical Backups
• Scheduled Exports/Dumps and Application Re-Drive
• Logical Replication
Huge Pages
Autovacuum Logging (RDS: need “force” setting)
• Logging Level = INFO
• Minimum duration = 10 seconds
PostgreSQL quarterly updates
• Stable minor releases for security and bug fixes (RDS)
• Some Aurora minors have new development work (Aurora)
• Remember to upgrade extensions; it’s not automatic
Connection Pooling
• Centralized and decentralized (app-tier) architectures exist
• Recycle server connections (e.g. server_lifetime)
Performance Insights [monitor active session waits]
• Keep the history
Enhanced Monitoring [OS monitoring]
• 10 second (or lower) granularity
Preload pg_stat_statements
Limit on temp usage by default (esp. Aurora)
• Log temp usage when close to the limit
Alarms
• Maximum used transaction IDs
• DBLoad [Average Active Sessions]
• Free disk space (RDS) / Free local storage (Aurora)
• Memory / swap
• Replica Lag (RDS)
PostgreSQL Happiness Hints version:
jer_s/2019-09-19
tinyurl.com/waitevents
Thank you!
aws.amazon.com/rds/postgresql
Solving Problems With Wait Events
244 GB DDR4 Memory
16 Physical Intel Xeon E5-2686 v4 (Broadwell) processors

Wait! What’s going on inside my database?

  • 1. Jeremy Schneider, Amazon RDS Wait! What’s going on inside my database? PostgreSQL and the Science of Database Performance Updated: Sep 19, 2019
  • 2. About PostgreSQL 1970: Mathematician Edgar F. Codd, working as researcher for IBM, publishes “A Relational Model of Data for Large Shared Data Banks” 1973: Michael Stonebraker and Eugene Wong at University of California Berkeley seek funding and begin development of a relational database called INGRES 1986: Michael Stonebraker and Lawrence A. Rowe at University of California Berkeley publish “The Design of POSTGRES” – a new database that is the successor to INGRES 1994: Andrew Yu and Jolly Chen at University of California Berkeley add support for the SQL language 1996: Transition to non-university core team of volunteers, official release under new name POSTGRESQL 1985
  • 6. About Database Performance 1990’s Manager: “Dear DBA: Expert consultants have taught us that if the Buffer Cache Hit Ratio (BCHR) is below 90% then the system immediately needs an expensive tuning engagement. Please report any databases that have BCHR < 90%.” Delfador Chibi by Peileppe CC0
  • 7. About Database Performance 1990’s Manager: “Dear DBA: Expert consultants have taught us that if the Buffer Cache Hit Ratio (BCHR) is below 90% then the system immediately needs an expensive tuning engagement. Please report any databases that have BCHR < 90%.” Delfador Chibi by Peileppe CC0
  • 8. Nørgaard, Mogens et al. Oracle Insights: Tales of the Oak Table. Berkeley, CA: Apress/OakTable Press, 2004. p76-77. About Database Performance
  • 9. About Database Performance Millsap, Cary V. Optimizing Oracle Performance. Sebastopol, CA: OReilly, 2003. p225, 240, 258-259 R = S + W “How long the SQL takes to run” See also: • Shallahamer, Craig. Forecasting Oracle Performance. Berkeley, CA: Apress, 2007.
  • 10. About Database Performance Active Session Sampling (JB’s notebook, 2004) Images & Quotes Used With Permission
  • 12. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA
  • 13. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events
  • 14. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events • 1990s: Database kernel instrumentation: • Counters and tools to snapshot/compare them • Events (log a message under certain circumstances) • 1992: Unable to solve a performance problem, as a last resort, engineers added event code in version 7.0.12 capable of emitting log messages when the database waited for something • First exposed in V$SESSION_WAIT and later in V$SESSION (equivalent of pg_stat_activity) • PostgreSQL built on concepts that had become standard across the industry
  • 15. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events Millsap, Cary V. Optimizing Oracle Performance. Sebastopol, CA: OReilly, 2003. p225, 240, 258-259 R = S + W “How long the SQL takes to run” See also: • Shallahamer, Craig. Forecasting Oracle Performance. Berkeley, CA: Apress, 2007.
  • 16. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events Active Session Sampling (JB’s notebook, 2004) Images & Quotes Used With Permission
  • 17. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events “But why are these events called wait events? … In short, when a session is not using the CPU, it may be waiting for a resource, an action to complete, or simply more work. Hence, events that are associated with all such waits are known as wait events.” Shee, Richmond, Kirtikumar Deshpande, and K. Gopalakrishnan. Oracle Wait Interface a Practical Guide to Performance Diagnostics & Tuning. New York: London, 2004. p16
  • 18. Mariinsky Theatre, St. Petersburg by Sandra Cohen-Rose and Colin Rose (Montreal, Canada) CC BY-SA Wait Events High-Level Idea: Caveats: • OS scheduling/runqueue • Measurement overhead, OS kernel CPU time (e.g. I/O) The database is WAITING any time when it’s not running on the CPU
  • 19. Wait Events [Diagram labels: psql> SELECT…; Idle; Waiting; CPU]
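The three states in the diagram (Idle, Waiting, CPU) can be approximated from the `state` and `wait_event` columns of pg_stat_activity. The sketch below is a minimal illustration, not the presenter's code: the function name `classify` and the sample rows are hypothetical, and "cpu" really means "active and not in an instrumented wait", which is subject to the caveats about OS scheduling and kernel CPU time.

```python
from collections import Counter

def classify(state, wait_event):
    """Map one pg_stat_activity row to 'idle', 'waiting', or 'cpu'.

    Assumption: an active backend with a NULL wait_event is treated as
    running on the CPU, which is only an approximation.
    """
    if state != "active":
        return "idle"
    return "waiting" if wait_event is not None else "cpu"

# Hypothetical (state, wait_event) pairs, as one sample might return them.
rows = [
    ("active", None),
    ("active", "DataFileRead"),
    ("active", "buffer_io"),
    ("idle in transaction", "ClientRead"),
    ("idle", "ClientRead"),
]

summary = Counter(classify(state, wait_event) for state, wait_event in rows)
print(dict(summary))
```

Sampling this repeatedly and summing the counts gives the raw material for an active-session view.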
  • 24. Wait Events: Significant Commits
      Version 9.6
      • aa65de0 – 11 Sep 2015 – Autogenerate lwlocknames.[c|h]
      • 53be0b1 – 10 Mar 2016 – Heavy/Lightweight Locks, Buffer Pins
      Version 10
      • 6f3bd98 – 4 Oct 2016 – Latches & Sockets, Clients, Main Loops
      • 249cf07 – 18 Mar 2017 – I/O
      • fc70a4b – 26 Mar 2017 – Background and Auxiliary Processes
      Version 11
      • 1804284 – 20 Dec 2017 – Parallel-Aware Hash Joins
  • 25. Wait Events: src/include/pgstat.h
  • 26. Wait Events: doxygen.postgresql.org
  • 27. Wait Events: src/backend/postmaster/pgstat.c
  • 29. Wait Events: Gaps after migrating to Open Source/Community PostgreSQL
      • Wait Event Counters and Cumulative Times
      • Wait Event Arguments (object, block, etc)
      • Comprehensive tracking of CPU time (POSIX rusage)
      • Ability to find previous SQL for COMMIT/ROLLBACK
          • Needed to identify which transaction is committing. (Other databases do not update SQL text for COMMIT statement)
      • On-CPU State
      • SQL Execution Stage (parse/plan/execute/fetch)
      • SQL Execution Plan Identifier in pg_stat_statements
      • Current plan node
      • Progress on long operations (e.g. large seqscan)
      • Better runtime visibility into PLs
  • 30. I can haz Wait Events? Solving Problems with Wait Events in PostgreSQL By Antony Griffiths (Flickr), CC BY
  • 31. Solving Problems With Wait Events
       pid | state       | wait_event_type | wait_event          | xact_runtime            | query_short
    -------+-------------+-----------------+---------------------+-------------------------+----------------------------------------------------
      8135 | active      |                 |                     | -00:00:00.000941        | autovacuum: VACUUM pghist.pg_stat_statements_20190
      8168 | active      |                 |                     | 00:00:00                | SELECT col1, col2,
    108975 |             | Activity        | WalWriterMain       |                         |
    108976 |             | Activity        | AutoVacuumMain      |                         |
    108973 |             | Activity        | CheckpointerMain    |                         |
    108974 |             | Activity        | BgWriterMain        |                         |
    108979 |             | Activity        | LogicalLauncherMain |                         |
      8185 | active      |                 |                     | 00:00:00.07941          | autovacuum: VACUUM pghist.pg_stat_sys_indexes_2019
      8212 | active      |                 |                     | 00:00:00.349238         | autovacuum: VACUUM pghist.pg_stat_statements_20190
    115699 | active      | Lock            | relation            | 00:30:01.170404         | SELECT proc('param1')
    103268 | active      | IO              | DataFileRead        | 00:46:46.277548         | select count(*) from some_ones_table a , (select c
     95936 | active      | LWLock          | buffer_io           | 00:56:57.327904         | SELECT col1 FROM some_ones_table a, (SELECT col1 a
     95935 | active      | IO              | DataFileRead        | 00:56:57.328169         | SELECT col1 FROM some_ones_table a, (SELECT col1 a
     95921 | active      | LWLock          | buffer_io           | 00:56:57.393765         | SELECT col1 FROM some_ones_table a, (SELECT col1 a
     56628 | active      | IO              | DataFileRead        | 01:47:55.333596         | select col1 from some_ones_table WHERE err_id in (
     53981 | active      | IO              | BufFileRead         | 01:51:40.986659         | SELECT col1 FROM some_ones_table a, (SELECT asin a
     49386 | active      | LWLock          | buffer_io           | 01:58:13.166389         | SELECT count(*) FROM some_ones_table a, (SELECT co
     29172 | active      | IO              | BufFileRead         | 02:04:09.108342         | SELECT count(*) FROM some_ones_table a, (SELECT co
     43208 | active      | LWLock          | buffer_io           | 02:06:39.296499         | SELECT count(*) FROM some_ones_table a, (SELECT co
     43207 | active      | IO              | DataFileRead        | 02:06:39.29666          | SELECT count(*) FROM some_ones_table a, (SELECT co
     31401 | active      | IPC             | MessageQueueReceive | 02:06:39.370239         | SELECT count(*) FROM some_ones_table a, (SELECT co
     12387 | active      | IO              | DataFileRead        | 02:46:50.262871         | select count(*) from some_ones_table a , (select c
     12386 | active      | IO              | DataFileRead        | 02:46:50.263142         | select count(*) from some_ones_table a , (select c
     12385 | active      | IO              | DataFileRead        | 02:46:50.266696         | select count(*) from some_ones_table a , (select c
     83681 | active      | BufferPin       | BufferPin           | 15:24:45.260184         | autovacuum: VACUUM schema1.some_ones_table (to prev
     23340 | active      | LWLock          | buffer_io           | 1 day 16:39:18.732685   | select column_001,column2,column3,column000004,
     24074 | active      | LWLock          | buffer_io           | 1 day 16:41:55.91496    | WITH this_subquery_01 as (select column_001,PIPELI
      8110 | active      | LWLock          | buffer_io           | 1 day 17:03:52.767838   | WITH this_subquery_01 as (select column_001,PIPEL
     51767 | active      | LWLock          | buffer_io           | 1 day 19:03:47.006302   | WITH this_subquery_01 as (select column_001,PIPEL
      9217 | active      | LWLock          | buffer_io           | 1 day 20:01:58.572314   | WITH this_subquery_01 as (select column_001,PIPEL
      6086 | active      | IO              | DataFileRead        | 1 day 20:06:08.584313   | WITH this_subquery_01 as (select column_001,PIPEL
    115385 | active      | LWLock          | buffer_io           | 1 day 20:35:27.617606   | WITH this_subquery_01 as (select column_001,PIPEL
     94256 | idle in trx | Client          | ClientRead          | 27 days 02:33:48.940102 | select subquery00_.column_001 as COLUMN01_2_0_, a
    (33 rows)
  • 33. Solving Problems With Wait Events: Active Session Sampling of Wait Events on PostgreSQL
      • Performance Insights on Amazon RDS
          • RDS PostgreSQL 10+
          • Aurora PostgreSQL 9.6+ (v10 Wait Events were backported)
      • pg_wait_sampling and PoWA
      • pgSentinel
      • pgCenter
      • (what am I missing?)
  • 34. Solving Problems With Wait Events
      SELECT sql_statement, count(*)
        FROM pg_stat_activity_samples
       WHERE date BETWEEN problem_start AND problem_end
       GROUP BY sql_statement
       ORDER BY count(*) DESC;

      SELECT wait_event, count(*)
        FROM pg_stat_activity_samples
       WHERE sql_statement=top_problem_sql_statement
         AND date BETWEEN problem_start AND problem_end
       GROUP BY wait_event
       ORDER BY count(*) DESC;
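The two queries on this slide can be tried end to end on a toy dataset. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, the table pg_stat_activity_samples is modelled with just the three columns the slide mentions, and every sample row, timestamp, and SQL text is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pg_stat_activity_samples "
    "(date TEXT, sql_statement TEXT, wait_event TEXT)"
)

# Invented samples: one row per active session per sampling instant.
# A NULL wait_event stands for "active and not waiting".
samples = [
    ("2019-09-19 10:00:01", "SELECT count(*) FROM t1", "DataFileRead"),
    ("2019-09-19 10:00:01", "UPDATE t2 SET c = 1", "transactionid"),
    ("2019-09-19 10:00:02", "SELECT count(*) FROM t1", "DataFileRead"),
    ("2019-09-19 10:00:02", "SELECT count(*) FROM t1", "buffer_io"),
    ("2019-09-19 10:00:03", "SELECT count(*) FROM t1", "DataFileRead"),
    ("2019-09-19 10:00:03", "UPDATE t2 SET c = 1", "transactionid"),
    ("2019-09-19 10:00:04", "SELECT count(*) FROM t1", None),
    ("2019-09-19 10:09:00", "UPDATE t2 SET c = 1", "transactionid"),
]
conn.executemany("INSERT INTO pg_stat_activity_samples VALUES (?, ?, ?)", samples)

problem_start, problem_end = "2019-09-19 10:00:00", "2019-09-19 10:00:05"

# Query 1: which statement appears in the most samples during the problem window?
top_sql = conn.execute(
    "SELECT sql_statement, count(*) FROM pg_stat_activity_samples "
    "WHERE date BETWEEN ? AND ? "
    "GROUP BY sql_statement ORDER BY count(*) DESC",
    (problem_start, problem_end),
).fetchall()
top_problem_sql_statement = top_sql[0][0]

# Query 2: for that statement, which wait event appears in the most samples?
top_waits = conn.execute(
    "SELECT wait_event, count(*) FROM pg_stat_activity_samples "
    "WHERE sql_statement = ? AND date BETWEEN ? AND ? "
    "GROUP BY wait_event ORDER BY count(*) DESC",
    (top_problem_sql_statement, problem_start, problem_end),
).fetchall()

print(top_sql[0], top_waits[0])
```

Because each sample is one session at one instant, the counts are proportional to time spent, which is why a plain count(*) is enough to rank statements and waits.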
  • 35. Solving Problems With Wait Events
      • Active Session Summary (Performance Insights, etc): Top SQL & Top Wait Events
      • EXPLAIN ANALYZE with Buffers, IO timing, etc: Investigate STEP & WAIT Taking The Most Time
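One way to find the step taking the most time is to compute each plan node's exclusive time from the JSON form of EXPLAIN ANALYZE output. The sketch below is a simplified illustration rather than the presenter's method: the helper `exclusive_times` and the sample plan are hypothetical, the keys follow the JSON plan format ("Node Type", "Actual Total Time", "Actual Loops", "Plans"), and it ignores complications such as parallel workers and InitPlans.

```python
def exclusive_times(node):
    """Yield (node type, exclusive milliseconds) for a plan node and all its descendants."""
    children = node.get("Plans", [])
    # "Actual Total Time" is a per-loop average, so multiply by the loop count.
    inclusive = node["Actual Total Time"] * node["Actual Loops"]
    in_children = sum(c["Actual Total Time"] * c["Actual Loops"] for c in children)
    yield node["Node Type"], inclusive - in_children
    for child in children:
        yield from exclusive_times(child)

# Hand-written stand-in for the "Plan" object of an EXPLAIN (ANALYZE, FORMAT JSON) result.
plan = {
    "Node Type": "Aggregate", "Actual Total Time": 1200.0, "Actual Loops": 1,
    "Plans": [{
        "Node Type": "Hash Join", "Actual Total Time": 1150.0, "Actual Loops": 1,
        "Plans": [
            {"Node Type": "Seq Scan", "Actual Total Time": 900.0, "Actual Loops": 1},
            {"Node Type": "Hash", "Actual Total Time": 50.0, "Actual Loops": 1,
             "Plans": [
                 {"Node Type": "Index Scan", "Actual Total Time": 40.0, "Actual Loops": 1},
             ]},
        ],
    }],
}

slowest_step = max(exclusive_times(plan), key=lambda pair: pair[1])
print(slowest_step)
```

Sites such as explain.depesz.com present a similar per-node breakdown, so this is mainly useful for scripting or for understanding what those tools compute.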
  • 38. Stuffed Elephant Store
      • A unique service that produces on-demand stuffed elephants
      • Multiple sizes with long or short fur
      • Any color the customer wants, as long as it's blue
      • Image: Breibeest (Flickr), CC BY
  • 40. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 46. Performance Insights
      • See the query text and the wait events by query
      • Look back 7 days or as much as 2 years to find activity
  • 49. explain.depesz.com
  • 52. Execution Plans
      • Explain how Postgres plans to execute a query
      • Show the type of operation, the estimated cost, and the estimated number of rows
  • 54. System Catalogs
      • Contain the structure of all objects in the database
      • Statistics views show usage of the objects
  • 60. Indexes
      • PostgreSQL has a rich set of index types
      • Base functionality can be enhanced by specialized extensions
  • 71. Performance Insights
      • Drilling down into time periods shows finer-grained detail
      • The 3-minute view shows 1-second granularity
  • 82. Locking
      • Locks are held for the duration of the transaction
      • Locks can be held on a table, row or other object such as transaction IDs
  • 83. It’s so simple! Solving Problems With Wait Events
  • 84. Solving Problems With Wait Events: Aurora PostgreSQL
      • AWS Documentation covers Aurora-Specific Wait Events
      • Shares Code With Community PostgreSQL (and merges regularly)
  • 85. Solving Problems With Wait Events
      • DataFileRead, buffer_io
          • I/O Read Path: Check SQL execution plans, optimize for fewer block reads.
      • XactSync, WALWrite
          • I/O Write Path: Check commit rate, volume of change.
      • transactionid, relation, etc.
          • Application Design: check pg_locks during contention.
      • buffer_content
          • Hot Block in Memory: check foreign keys, reduce contention (e.g. schema redesign, fillfactor, etc).
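The mapping on this slide lends itself to a simple lookup table. The sketch below is a hypothetical helper, not part of any tool: the names `TRIAGE_HINTS` and `triage` are invented, and the table contains only the wait events and advice listed above.

```python
# (problem area, first thing to check), keyed by wait event name.
READ_PATH = ("I/O read path", "check SQL execution plans; optimize for fewer block reads")
WRITE_PATH = ("I/O write path", "check commit rate and volume of change")
APP_DESIGN = ("application design", "check pg_locks during contention")
HOT_BLOCK = ("hot block in memory",
             "check foreign keys; reduce contention (e.g. schema redesign, fillfactor)")

TRIAGE_HINTS = {
    "DataFileRead": READ_PATH,
    "buffer_io": READ_PATH,
    "XactSync": WRITE_PATH,
    "WALWrite": WRITE_PATH,
    "transactionid": APP_DESIGN,
    "relation": APP_DESIGN,
    "buffer_content": HOT_BLOCK,
}

def triage(wait_event):
    """Return (problem area, hint) for a wait event, or a fallback for unlisted events."""
    fallback = ("unknown", "not covered here; look the event up in the documentation or source")
    return TRIAGE_HINTS.get(wait_event, fallback)

area, hint = triage("WALWrite")
print(f"{area}: {hint}")
```

Feeding the top wait event from an active-session summary into a table like this gives a starting point, not a diagnosis.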
  • 86. [Figure labels: Individual Named LWLocks; Tranches for SLRU; Tranches for Shared Buffers; Individual Named Tranches]
  • 87. Solving Problems With Wait Events
      • Uppercase LWLocks: see lwlocknames.txt, search code directly
      • Lowercase LWLocks: tranches (arrays of locks for groups of objects)
          • SLRUs – see SimpleLruInit() callers on doxygen
          • Shared Buffers (buffer_content, buffer_io)
          • Other Tranches – see RegisterLWLockTranches() in lwlock.c
      • (Heavyweight/SQL/Transaction) Locks: LockTagType enum in lock.h
          • Strings come from matching structure LockTagTypeNames in lockfuncs.c
      • BufferPin: Vacuuming - see PG_WAIT_BUFFER_PIN refs on doxygen
      • Extension: FDWs, BG worker startup, etc - see PG_WAIT_EXTENSION refs on doxygen
      • Activity, Client, IPC, Timeout and IO: enums, see pgstat.h
  • 88. PostgreSQL Happiness Hints (version: jer_s/2019-09-19, tinyurl.com/waitevents)
      • Multi-AZ [multiple physical locations]
      • Physical Backups
          • Max allowed retention (35 days in RDS)
          • Regular restore testing
      • Logical Backups
          • Scheduled Exports/Dumps and Application Re-Drive
          • Logical Replication
      • Huge Pages
      • Autovacuum Logging (RDS: need “force” setting)
          • Logging Level = INFO
          • Minimum duration = 10 seconds
      • PostgreSQL quarterly updates
          • Stable minor releases for security and bug fixes (RDS)
          • Some Aurora minors have new development work (Aurora)
          • Remember to upgrade extensions; it’s not automatic
      • Connection Pooling
          • Centralized and decentralized (app-tier) architectures exist
          • Recycle server connections (e.g. server_lifetime)
      • Performance Insights [monitor active session waits]
          • Keep the history
      • Enhanced Monitoring [OS monitoring]
          • 10 second (or lower) granularity
      • Preload pg_stat_statements
      • Limit on temp usage by default (esp. Aurora)
          • Log temp usage when close to the limit
      • Alarms
          • Maximum used transaction IDs
          • DBLoad [Average Active Sessions]
          • Free disk space (RDS) / Free local storage (Aurora)
          • Memory / swap
          • Replica Lag (RDS)
  • 90. Solving Problems With Wait Events
      • 244 GB DDR4 Memory
      • 16 Physical Intel Xeon E5-2686 v4 (Broadwell) processors