Reducing Risk When Upgrading
Your MySQL Environment
 
Kenny Gryp
MySQL Practice Manager
My Experience as MySQL
Consultant On Upgrading MySQL
it's quite complex...
Table of Contents
The Official Documentation
Make Your Own Documentation
Potential Risks
Establish Upgrade Method For A Single Server
Rollback Scenario Testing
Test Writes
Test Individual Reads
Workload Testing
Establish (& Test) Migration Process
Migration In Production
(Rollback)
Post-Migration Assessment
3 / 77
The Official Documentation
4 / 77
Oracle's Recommended Process
Backup your data
Read all release notes and assess
https://dev.mysql.com/doc/relnotes/mysql/5.7/en/
Read Changes Affecting Upgrades to MySQL 5.7
https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html
5 / 77
Oracle's Recommended Process
Upgrade Slaves First
In-Place Upgrade:
Clean shutdown (innodb_fast_shutdown=0)
Run mysql_upgrade
Logical Upgrade:
mysqldump data
Import data again
Run mysql_upgrade to fix the mysql schema
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
7 / 77
Oracle's Recommended Process (cont.)
A Lot of Risk:
No guarantee queries will execute the same
No guarantee queries will be same speed or faster
No guarantee all your queries will still work (new default
stricter sql_mode)
There is no official support for upgrading directly from <5.6 to 5.7,
but we might actually be able to do that
8 / 77
do-it-yourself
Documenting The Process
9 / 77
Documenting The Process
PEBKAC: Human errors happen and create issues
importing data using the wrong character set
setting up a replica using the wrong binlog file/pos
...
Document every step: we need to repeat it multiple times
10 / 77
making you afraid to upgrade by describing
Potential Risks
11 / 77
Optimizer Changes
Example: index_merge_intersection
Often seen during migrations to MySQL 5.6
Affects environments with sub-optimal indexing
Queries with c1='a' AND c2='b' when composite index
(c1,c2) is missing
Is often slower when the selectivity of one of the two columns is
bad (and that happens frequently)
Result: many queries were slower in the new environment
Need SELECT performance tests between versions
https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/
12 / 77
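The next example shows why index_merge_intersection can hurt when one column has poor selectivity. It is a toy model in Python and makes simplified assumptions, so it is not MySQL's cost model. It counts the index entries read for c1 = 'a' AND c2 = 'b' in two cases. An intersection must scan both single-column ranges. A composite (c1, c2) index reads only the matching entries. The row data is hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the list of row ids having it (a toy secondary index)."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[key(row)].append(row_id)
    return index


def intersection_cost(rows, v1, v2):
    """Toy index_merge intersection: scan both single-column ranges, then intersect."""
    ids1 = build_index(rows, lambda r: r["c1"]).get(v1, [])
    ids2 = build_index(rows, lambda r: r["c2"]).get(v2, [])
    matches = sorted(set(ids1) & set(ids2))
    return len(ids1) + len(ids2), matches


def composite_cost(rows, v1, v2):
    """Lookup in a composite (c1, c2) index: only matching entries are read."""
    matches = build_index(rows, lambda r: (r["c1"], r["c2"])).get((v1, v2), [])
    return len(matches), matches


# Hypothetical data: c1 = 'a' matches 90% of rows (bad selectivity),
# c2 = 'b' matches only 1% of rows.
rows = [{"c1": "a" if i < 900 else "z", "c2": "b" if i % 100 == 0 else "y"}
        for i in range(1000)]

merge_reads, merge_rows = intersection_cost(rows, "a", "b")
composite_reads, composite_rows = composite_cost(rows, "a", "b")
print(merge_reads, composite_reads)  # 910 9
```

The usual remedies are to add the missing composite index or to disable the optimization with SET optimizer_switch='index_merge_intersection=off'. Measure both options with your real queries.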
New Defaults In MySQL 5.7
The new defaults in MySQL 5.7 make a lot of sense:
More use of available features and performance
enhancements out of the box
More strictness with data/query validation
New Reserved words
Applications might not be ready for it.
Drupal 7 - https://www.drupal.org/node/2545480
These defaults may break applications more easily:
sql_mode=ONLY_FULL_GROUP_BY,
STRICT_TRANS_TABLES, NO_ZERO_IN_DATE,
NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO,
NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION
innodb_strict_mode=1
Needs SELECT & DML query validity tests between versions
13 / 77
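To see which of these modes an application will meet for the first time, diff its current sql_mode against the 5.7 default list above. This is a minimal Python sketch. The example 5.6 setting is hypothetical; read your own value with SELECT @@GLOBAL.sql_mode.

```python
# MySQL 5.7 default sql_mode, as listed above.
MYSQL57_DEFAULT_SQL_MODE = (
    "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,"
    "ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
)


def parse_sql_mode(value):
    """Split a comma-separated sql_mode string into a set of upper-case flags."""
    return {flag.strip().upper() for flag in value.split(",") if flag.strip()}


def newly_enforced_modes(current_sql_mode, target=MYSQL57_DEFAULT_SQL_MODE):
    """Modes the application will run under for the first time after the upgrade."""
    return sorted(parse_sql_mode(target) - parse_sql_mode(current_sql_mode))


# Hypothetical 5.6 server configured with only NO_ENGINE_SUBSTITUTION
for mode in newly_enforced_modes("NO_ENGINE_SUBSTITUTION"):
    print(mode)
```

Each mode printed needs its own SELECT and DML validity tests.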
Other Changes in MySQL 5.7
Support for passwords that use the older pre-4.1 password hashing format
has been removed.
14 / 77
Minor Versions Also At Risk
CREATE TABLE date (d DATE);
INSERT INTO date VALUES ('2017-04-19');
SELECT COUNT(*) FROM date
WHERE d < NOW()-INTERVAL 1 DAY;
MySQL 5.0.37
+-------+
|     0 |
+-------+
MySQL 5.0.45
+-------+
|     1 |
+-------+
Seen with DELETE FROM date WHERE d < NOW()-INTERVAL 1 DAY
in binlog_format=STATEMENT environments.
Needs SELECT & DML query result tests between versions
15 / 77
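The slide does not say what changed between 5.0.37 and 5.0.45. One plausible explanation, offered here only as an assumption, is a change in how the DATE column and the DATETIME expression are compared. The DATE could be promoted to a DATETIME, or the DATETIME could be truncated to a DATE. This Python sketch shows that the two rules give the two counts seen above. The run time of the query is hypothetical.

```python
from datetime import date, datetime, timedelta

stored = date(2017, 4, 19)               # value inserted into the DATE column
now = datetime(2017, 4, 20, 10, 30)      # hypothetical moment the query runs
cutoff = now - timedelta(days=1)         # NOW()-INTERVAL 1 DAY -> 2017-04-19 10:30


def compare_as_datetime(d, cutoff):
    """Promote the DATE to midnight: 2017-04-19 00:00 < 2017-04-19 10:30."""
    return datetime.combine(d, datetime.min.time()) < cutoff


def compare_as_date(d, cutoff):
    """Truncate the DATETIME to a DATE: 2017-04-19 < 2017-04-19."""
    return d < cutoff.date()


count_datetime = int(compare_as_datetime(stored, cutoff))
count_date = int(compare_as_date(stored, cutoff))
print(count_datetime, count_date)  # 1 0
```

Under STATEMENT replication, a DELETE with this predicate would remove different rows on servers that use different rules. That is why result tests matter even for minor upgrades.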
Workload
sync_binlog=1 is the default in MySQL 5.7
Can impact certain environments; might not be noticed when
looking at a single query
InnoDB LRU Flushing changes require tuning for heavy
workloads in 5.6
(innodb_lru_scan_depth)
When switching to MySQL 8.0 with the new data dictionary
...
Need to do Workload Testing between versions
http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html
http://lefred.be/content/sync_binlog-1-in-5-7/
16 / 77
How Do We Reduce All This Risk?
17 / 77
Testing!
18 / 77
establish
Upgrade Method For A Single Server
19 / 77
Upgrade Method For A Single Server
Follow MySQL documentation:
http://dev.mysql.com/doc/refman/5.7/en/upgrading.html
Document every command
Restore from backup
Or take a replica you can spare
20 / 77
Upgrade Method For A Single Server
21 / 77
Upgrade Method For A Single Server
22 / 77
Replication Consistency
Testing Writes
23 / 77
Writes - Replication Consistency
pt-table-checksum:
validate consistency in a replication topology
Identify problems caused by PEBKAC
Ensure events replicate properly
(binlog_format=STATEMENT)
Upgrade a replica, or add a replica running the modified
version.
Do it in production: test/staging lacks the real write traffic, so it yields no results
https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html
24 / 77
Writes - Replicate Test Server
25 / 77
often left behind is
Rollback Scenario Testing
26 / 77
Rollback Scenario Testing
Possibility to fall back in case something went wrong during
migration
Can be done using replication, but has to be tested!
27 / 77
Writes - Rollback Testing
28 / 77
Rollback Scenario Testing
You might need to change some settings in your new my.cnf to
support replicating back to the old version.
Example:
binlog_checksum = NONE
binlog_row_image = FULL
binlog_rows_query_log_events = OFF
log_bin_use_v1_row_events = 1
gtid_mode = OFF
log_slave_updates=1
skip-slave-start
29 / 77
Writes - Checksums - GTID
30 / 77
Writes - Checksums - Non-GTID
31 / 77
Writes - Checksums - ROW
32 / 77
Writes - Checksums - ROW
33 / 77
Where To Run pt-table-checksum?
GTID:
pt-table-checksum can only be run on the master
(running it on a replica creates errant transactions)
Or scratch the pt-table-checksum host after tests
non-GTID:
pt-table-checksum can be run on intermediate master
binlog_format=ROW:
only 1 tier below can be checksummed
run on every tier that has a replica (for rollback)
pt-table-checksum can add overhead in production when run on the
active master
Let replication run for a while before checksumming
34 / 77
pt-table-checksum results
On every replica (including rollback):
SELECT db, tbl, SUM(this_cnt) AS total_rows,
COUNT(*) AS chunks
FROM percona.checksums
WHERE (master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
+----+-----------------+------------+--------+
| db | tbl             | total_rows | chunks |
+----+-----------------+------------+--------+
| db | telephone_debit |      44342 |      1 |
| db | orderline       |      21451 |      3 |
| db | orders          |   25125215 |     12 |
+----+-----------------+------------+--------+
35 / 77
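If you export the checksum rows from several replicas for scripting, you can reproduce the query above in Python. This is a sketch, and the rows are hypothetical: the first reuses the values of the telephone_debit chunk analysed next, and the second is invented. Python's != treats None like an ordinary value, so NULL handling is close to the SQL predicate but not identical.

```python
from collections import defaultdict


def is_mismatch(chunk):
    """Python version of the WHERE clause: count, CRC or NULL-ness differs."""
    return (
        chunk["master_cnt"] != chunk["this_cnt"]
        or chunk["master_crc"] != chunk["this_crc"]
        or (chunk["master_crc"] is None) != (chunk["this_crc"] is None)
    )


def summarize(chunks):
    """Like GROUP BY db, tbl: SUM(this_cnt) and COUNT(*) over mismatching chunks."""
    summary = defaultdict(lambda: {"total_rows": 0, "chunks": 0})
    for chunk in chunks:
        if is_mismatch(chunk):
            entry = summary[(chunk["db"], chunk["tbl"])]
            entry["total_rows"] += chunk["this_cnt"]
            entry["chunks"] += 1
    return dict(summary)


chunks = [
    # same values as the failing telephone_debit chunk in this deck
    {"db": "db", "tbl": "telephone_debit", "this_cnt": 44342, "master_cnt": 44342,
     "this_crc": "7fd37eb9", "master_crc": "b7babd94"},
    # hypothetical consistent chunk: filtered out
    {"db": "db", "tbl": "orders", "this_cnt": 1000, "master_cnt": 1000,
     "this_crc": "0a1b2c3d", "master_crc": "0a1b2c3d"},
]
print(summarize(chunks))
```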
pt-table-checksum - Analysis
Troubleshooting starts now...
What went wrong?
36 / 77
pt-table-checksum - Analysis
Which chunks failed?
db: db
tbl: telephone_debit
chunk: 100
chunk_time: 0.4956125
chunk_index: PRIMARY
lower_boundary: 5014733
upper_boundary: 5059074
this_crc: 7fd37eb9
this_cnt: 44342
master_crc: b7babd94
master_cnt: 44342
ts: 2013-02-05 01:59:48
37 / 77
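In this chunk this_cnt equals master_cnt but the CRCs differ. No rows are missing; the row contents differ. The next step is to export the chunk's primary-key range on both servers. This Python sketch classifies the chunk and builds the export statement. It assumes a single-column integer primary key named id, as in the queries that follow. The ORDER BY is an addition not on the slide: it keeps the line order identical on both servers so that diff -u compares like with like.

```python
def classify(chunk):
    """Tell whether rows are missing/extra or whether row contents differ."""
    if chunk["this_cnt"] != chunk["master_cnt"]:
        return "row count differs"
    if chunk["this_crc"] != chunk["master_crc"]:
        return "row contents differ"
    return "consistent"


def export_query(chunk, outfile, pk_column="id"):
    """Build a range export for one chunk (assumes a single-column integer PK)."""
    return (
        f"SELECT * INTO OUTFILE '{outfile}' FROM {chunk['db']}.{chunk['tbl']} "
        f"WHERE {pk_column} BETWEEN {chunk['lower_boundary']} AND {chunk['upper_boundary']} "
        f"ORDER BY {pk_column};"
    )


failed = {"db": "db", "tbl": "telephone_debit", "lower_boundary": 5014733,
          "upper_boundary": 5059074, "this_cnt": 44342, "master_cnt": 44342,
          "this_crc": "7fd37eb9", "master_crc": "b7babd94"}
print(classify(failed))
print(export_query(failed, "/tmp/telephone_debit_mysql56"))
```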
pt-table-checksum - Analysis
SELECT *
INTO outfile '/tmp/telephone_debit_mysql56'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
SELECT *
INTO outfile '/tmp/telephone_debit_mysql57'
FROM db.telephone_debit
WHERE id BETWEEN 5014733 AND 5059074;
# diff -u /tmp/telephone_debit_mysql5{6,7}
Use twindb_table_compare!
https://github.com/twindb/twindb_table_compare
40 / 77
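For small chunks, Python's standard difflib can do the same comparison as diff -u. This sketch loads both exports into memory, so it is not suited to huge ranges; twindb_table_compare automates this step. The tab-separated sample rows are hypothetical.

```python
import difflib


def diff_exports(old_lines, new_lines, old_name="telephone_debit_mysql56",
                 new_name="telephone_debit_mysql57"):
    """Unified diff of two chunk exports, similar to `diff -u`."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name, lineterm=""))


def changed_rows(diff):
    """Keep only removed (-) and added (+) data lines, dropping the file headers."""
    return [line for line in diff
            if line[:1] in ("-", "+") and not line.startswith(("---", "+++"))]


# Hypothetical tab-separated rows as written by SELECT ... INTO OUTFILE
old = ["5014733\t10.10\t2013-02-05", "5014734\t3.30\t2013-02-05"]
new = ["5014733\t10.1\t2013-02-05", "5014734\t3.30\t2013-02-05"]
for line in changed_rows(diff_exports(old, new)):
    print(line)
```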
pt-table-checksum - Analysis
Wrong upgrade method
backups
wrong replication file/pos
...
binlog_format=STATEMENT with non-deterministic functions (UUID()...)
Commonly seen issues when replicating from older versions:
Floating point differences: storing currencies in a DOUBLE
Temporal data types
Invalid dates converted to zero dates
Trailing spaces in CHAR fields
41 / 77
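This example shows why storing currencies in a DOUBLE is fragile. Binary floating point cannot represent most decimal amounts exactly, so sums and printed values pick up rounding noise. A change in how a server rounds or formats such values can then appear as a checksum difference. The slide does not describe the exact mechanism inside MySQL. This Python sketch only contrasts binary floating point (like DOUBLE) with exact decimal arithmetic (like DECIMAL), using hypothetical amounts.

```python
from decimal import Decimal

# Three payments of 0.10 stored as binary floating point (like a DOUBLE column)
double_total = sum([0.10] * 3)
# The same payments stored as exact base-10 values (like a DECIMAL(10,2) column)
decimal_total = sum([Decimal("0.10")] * 3)

print(repr(double_total))   # 0.30000000000000004
print(repr(decimal_total))  # Decimal('0.30')
```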
Testing Writes
Consistency Checks Process:
Checksum
Check for differences
On new environment
On rollback environment
For each inconsistency
Analyze diff
Find root cause
Fix problem
Document problem & solution
Repeat checksum again
42 / 77
Testing Individual Reads
43 / 77
Testing Reads - Collect Queries
44 / 77
Testing Reads - Collect Queries
Collection Techniques:
Slow Query Log
long_query_time=0
Be careful at ~10,000+ QPS
Percona Server: log_slow_rate_limit
tcpdump
'packets lost' in libpcap
Application/Load Balancer queries
Ensure:
Get the full workload (long enough)
Get data from Master & Replicas
Collect batchjob queries running at night
https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html
45 / 77
Testing Reads - Setup 2 Environments
46 / 77
Testing Reads - Setup 2 Environments
Need 2 Test Servers:
Reuse servers from checksum + rollback
Ensure they have the same data
(stop replication at the same position)
Same HW specifications
Similar configurations for buffer pool, flatc...
Fast enough to more or less resemble production
Optionally can be done using 1 machine
(pt-upgrade --save-results)
47 / 77
Testing Reads - pt-upgrade
48 / 77
Testing Reads - pt-upgrade
pt-upgrade:
runs one query at a time on both test environments
compares differences:
warnings/errors
result sets (even if the row order differs)
query response time
Run pt-upgrade on a third host with similar network latency to both
Run it twice to warm up the buffer pools first (both need to be equally warm)
Can also compare writes for execution time & warnings
Filter the slow log first to limit the number of similar queries:
pt-query-digest --no-report --output slowlog --sample 20
https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
49 / 77
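A result comparison has to tell three cases apart: identical output, the same rows in a different order, and genuinely different rows. The second case is common for queries without ORDER BY. This Python sketch shows one way to classify the cases; it is not pt-upgrade's implementation. It compares result sets as multisets, so duplicate rows still count.

```python
from collections import Counter


def compare_resultsets(rows_a, rows_b):
    """Classify two query results: identical, reordered, or different."""
    if rows_a == rows_b:
        return "identical"
    # Compare as multisets: duplicates must match, but order does not matter
    if Counter(map(tuple, rows_a)) == Counter(map(tuple, rows_b)):
        return "same rows, different order"
    return "different rows"


print(compare_resultsets([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))
```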
Testing Reads - pt-upgrade
Reporting class because there are 1000 row diffs.
Total queries 10
Unique queries 10
Discarded queries 0
select ... from ...
##
## Row diffs: 10
##
-- 1.
@ row 2
< 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
> 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
...
50 / 77
Testing Reads - pt-upgrade
Reporting class because it has diffs,
but hasn't been reported yet.
SELECT * FROM `database`.table
WHERE treeid = '' AND productid='0'
## Warning diffs: 2
Code: 1366
Level: Warning
Message: Incorrect integer value: ''
for column 'treeid' at row 1
vs.
No warning 1366
51 / 77
Testing Reads - pt-upgrade
SELECT *
FROM `database`.client_orders
WHERE client=?
AND blacklist=? LIMIT ?
## Query time diffs: 1
-- 1.
0.000513 vs. 0.036395 seconds (70.9x increase)
SELECT *
FROM `database`.client_orders
WHERE client=57450
AND blacklist=1 LIMIT 1
52 / 77
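The reported factor is simply the new time divided by the old: 0.036395 / 0.000513 ≈ 70.9. When triaging many entries, it can help to ignore queries that remain fast in absolute terms. This Python sketch does that. Its thresholds (factor=10, min_seconds=0.01) are hypothetical and should be tuned per application.

```python
def time_increase(old_seconds, new_seconds):
    """Factor by which the new version is slower (>1) or faster (<1)."""
    return new_seconds / old_seconds


def is_regression(old_seconds, new_seconds, factor=10.0, min_seconds=0.01):
    """Flag large slowdowns, skipping queries that stay fast in absolute terms."""
    return (new_seconds >= min_seconds
            and time_increase(old_seconds, new_seconds) >= factor)


ratio = time_increase(0.000513, 0.036395)
print(f"{ratio:.1f}x increase")           # 70.9x increase
print(is_regression(0.000513, 0.036395))  # True
```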
Testing Reads Process
Collect queries
Run pt-upgrade (twice)
For each entry in report
Figure out why it is reported
Deploy fix in Prod Application
Make schema changes
Document analysis
Run pt-upgrade again
53 / 77
one of the most challenging is
Testing Workload
54 / 77
Workload Testing - Percona Playback
55 / 77
Workload Testing - Query Playback
Uses slowlog to replay queries
Needs long_query_time=0 - challenging on busy servers
Enough data during peak workload
Tries to execute workload as realistically as possible
same connections, same transactions, same delays between
queries
Run against both environments, compare speed
Preload the buffer pool on both environments the same way
Active development by Marius Wachtler (ex-Dropbox)! Thank
you!
(unofficial Percona product, no support)
56 / 77
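"Same connections, same delays between queries" means keeping each connection's own sequence and idle time. This Python sketch shows the idea; it is not Percona Playback's implementation. It assumes each captured event carries a start time, a duration, a connection id and the query text; deriving those from the slow log is not shown. The events are hypothetical.

```python
from collections import defaultdict


def replay_schedule(events):
    """Turn (start, duration, connection_id, query) events into per-connection
    lists of (idle_seconds_before_query, query), preserving think time."""
    if not events:
        return {}
    events = sorted(events, key=lambda e: e[0])
    capture_start = events[0][0]
    last_end = {}
    schedule = defaultdict(list)
    for start, duration, conn, query in events:
        # idle time since this connection's previous query finished
        idle = max(0.0, start - last_end.get(conn, capture_start))
        schedule[conn].append((idle, query))
        last_end[conn] = start + duration
    return dict(schedule)


events = [(100.0, 0.25, 1, "BEGIN"), (100.5, 0.125, 2, "SELECT 1"),
          (101.0, 0.5, 1, "COMMIT")]
print(replay_schedule(events))
```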
Workload Testing - ProxySQL Mirroring
57 / 77
Workload Testing - ProxySQL Mirroring
Mirror queries from Load Balancer to test environment
Good Blogpost:
https://www.pythian.com/blog/using-proxysql-validate-mysql-updates/
58 / 77
establish (& test)
Migration Process
59 / 77
Migration Process
Create Migration Plan
Different for every environment/application
Upgrade a replica first for a couple of days/weeks?
How to switch masters?
How is failover handled today?
MHA, Orchestrator, manual, GTID/mysqlrpladmin...?
Test in staging!
60 / 77
the actual
Migration In Production
61 / 77
Migration - Create Slave Environments
62 / 77
Migration - Redirect Read Traffic
63 / 77
Migration - Application Switchover - 1
64 / 77
Migration - Application Switchover - 2
65 / 77
you (think you) will never need to do a
Rollback
66 / 77
Rollback
67 / 77
Rollback
What went wrong?
I did not follow the full process! (or I forgot to document it)
Do consistency checks again!
68 / 77
after all that testing, it's ok to spend time doing
Post-Migration Assessment
69 / 77
Post-Migration
Check trending for different behavior
more cpu load?
more disk IO?
higher innodb_rows_* and handler_* counters?
threads_running stability?
do some query optimization
If all looks good, scratch the 5.6 rollback server & upgrade it to 5.7
Remove the rollback-specific configuration options
70 / 77
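Counters such as Innodb_rows_read and Handler_read_next only ever increase. Trending them means converting two SHOW GLOBAL STATUS snapshots into per-second rates and comparing the rates from before and after the switch. This Python sketch uses hypothetical snapshot values and a hypothetical 1.2x tolerance. Threads_running is a gauge, not a counter, so compare it directly instead.

```python
def per_second(before, after, seconds):
    """Turn two SHOW GLOBAL STATUS snapshots of counters into per-second rates."""
    return {name: (after[name] - before[name]) / seconds
            for name in before if name in after}


def grown_counters(old_rates, new_rates, tolerance=1.2):
    """Counters whose rate grew by more than `tolerance` times after the migration."""
    return sorted(name for name in old_rates
                  if name in new_rates and old_rates[name] > 0
                  and new_rates[name] / old_rates[name] > tolerance)


# Hypothetical 60-second snapshots taken before and after the migration
old = per_second({"Innodb_rows_read": 1000, "Handler_read_next": 500},
                 {"Innodb_rows_read": 61000, "Handler_read_next": 6500}, 60)
new = per_second({"Innodb_rows_read": 0, "Handler_read_next": 0},
                 {"Innodb_rows_read": 120000, "Handler_read_next": 6000}, 60)
print(grown_counters(old, new))  # ['Innodb_rows_read']
```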
Post Migration Cleanup
71 / 77
small recap
Summary
72 / 77
Multi-Use
(Minor MySQL version upgrades)
Major MySQL version upgrades
Switching Hardware from Intel -> AMD architecture
Using a new kernel/libc/memory allocator
Switching storage engines
MariaDB/Percona Server/MySQL
...
73 / 77
Do I really have to go through this?
Many success stories:
Have done several MySQL upgrades from 4.1 -> 5.5
without intermediate slaves
Upgraded environments with major schema changes in the mix
(mssql-style environments using stored procedures only)
Found numerous application bugs using this process
Optimized many customers' schemas/queries along the way
As long as you follow this process completely,
the risk of running into problems is quite small.
74 / 77
Do I really have to go through this?
It Depends:
Your business might be risk-averse:
every change has to be thoroughly tested
Other companies just upgrade a replica in production and see
how it goes
My suggestion: do this at least for:
Major MySQL version upgrades
Switching storage engines
75 / 77
Summary
Test Step                        Skip?
Document Upgrade Single Server   Really? Why?
Rollback Scenarios               Not Recommended
Consistency Checks               Required, No Debate!
Read Tests                       Strongly Suggested
Workload Tests                   Possible (Early Adopter Alert)
Migration Tests                  Not Recommended To Skip
76 / 77
Reducing Risk When Upgrading
Your MySQL Environment
Q&A!
Kenny Gryp
MySQL Practice Manager

More Related Content

What's hot (20)

PDF
MySQL Router REST API
Frederic Descamps
 
PDF
Redo log improvements MYSQL 8.0
Mydbops
 
PDF
Upgrade from MySQL 5.7 to MySQL 8.0
Olivier DASINI
 
PDF
Vacuum徹底解説
Masahiko Sawada
 
PDF
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
Kenny Gryp
 
PDF
Wait! What’s going on inside my database?
Jeremy Schneider
 
PDF
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Miguel Araújo
 
PDF
統計情報のリセットによるautovacuumへの影響について(第39回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
PDF
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
PPTX
オンライン物理バックアップの排他モードと非排他モードについて ~PostgreSQLバージョン15対応版~(第34回PostgreSQLアンカンファレンス...
NTT DATA Technology & Innovation
 
PDF
MySQL InnoDB Cluster - Group Replication
Frederic Descamps
 
PDF
PostgreSQL 15 開発最新情報
Masahiko Sawada
 
PDF
FOSDEM 2022 MySQL Devroom: MySQL 8.0 - Logical Backups, Snapshots and Point-...
Frederic Descamps
 
PDF
NTT DATA と PostgreSQL が挑んだ総力戦
NTT DATA OSS Professional Services
 
PDF
PostgreSQL Deep Internal
EXEM
 
PDF
使ってみませんか?pg_hint_plan
NTT DATA OSS Professional Services
 
PDF
MariaDB MaxScale
MariaDB plc
 
PDF
PGCon 2023 参加報告(第42回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
PDF
MySQL Shell for DBAs
Frederic Descamps
 
PDF
[2018] MySQL 이중화 진화기
NHN FORWARD
 
MySQL Router REST API
Frederic Descamps
 
Redo log improvements MYSQL 8.0
Mydbops
 
Upgrade from MySQL 5.7 to MySQL 8.0
Olivier DASINI
 
Vacuum徹底解説
Masahiko Sawada
 
MySQL InnoDB Cluster - New Features in 8.0 Releases - Best Practices
Kenny Gryp
 
Wait! What’s going on inside my database?
Jeremy Schneider
 
Disaster Recovery with MySQL InnoDB ClusterSet - What is it and how do I use it?
Miguel Araújo
 
統計情報のリセットによるautovacuumへの影響について(第39回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
オンライン物理バックアップの排他モードと非排他モードについて ~PostgreSQLバージョン15対応版~(第34回PostgreSQLアンカンファレンス...
NTT DATA Technology & Innovation
 
MySQL InnoDB Cluster - Group Replication
Frederic Descamps
 
PostgreSQL 15 開発最新情報
Masahiko Sawada
 
FOSDEM 2022 MySQL Devroom: MySQL 8.0 - Logical Backups, Snapshots and Point-...
Frederic Descamps
 
NTT DATA と PostgreSQL が挑んだ総力戦
NTT DATA OSS Professional Services
 
PostgreSQL Deep Internal
EXEM
 
使ってみませんか?pg_hint_plan
NTT DATA OSS Professional Services
 
MariaDB MaxScale
MariaDB plc
 
PGCon 2023 参加報告(第42回PostgreSQLアンカンファレンス@オンライン 発表資料)
NTT DATA Technology & Innovation
 
MySQL Shell for DBAs
Frederic Descamps
 
[2018] MySQL 이중화 진화기
NHN FORWARD
 

Viewers also liked (20)

PDF
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Kenny Gryp
 
PDF
10x Performance Improvements - A Case Study
Ronald Bradford
 
PDF
MySQL High Availability with Group Replication
Nuno Carvalho
 
PDF
Inno db internals innodb file formats and source code structure
zhaolinjnu
 
PDF
SQL Outer Joins for Fun and Profit
Karwin Software Solutions LLC
 
PPTX
Hbase源码初探
zhaolinjnu
 
PDF
What you wanted to know about MySQL, but could not find using inernal instrum...
Sveta Smirnova
 
PDF
Capturing, Analyzing and Optimizing MySQL
Ronald Bradford
 
ODP
Mysql For Developers
Carol McDonald
 
PDF
Advanced Percona XtraDB Cluster in a nutshell... la suite
Kenny Gryp
 
PDF
MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?
Sveta Smirnova
 
PDF
MySQL Group Replication
Kenny Gryp
 
PDF
MySQL Best Practices - OTN LAD Tour
Ronald Bradford
 
PPT
淘宝数据库架构演进历程
zhaolinjnu
 
PDF
Extensible Data Modeling
Karwin Software Solutions LLC
 
PDF
Multi Source Replication With MySQL 5.7 @ Verisure
Kenny Gryp
 
PDF
Java MySQL Connector & Connection Pool Features & Optimization
Kenny Gryp
 
PPTX
MySQL aio
zhaolinjnu
 
PDF
Group Replication: A Journey to the Group Communication Core
Alfranio Júnior
 
PDF
MySQL Group Replication
Manish Kumar
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Kenny Gryp
 
10x Performance Improvements - A Case Study
Ronald Bradford
 
MySQL High Availability with Group Replication
Nuno Carvalho
 
Inno db internals innodb file formats and source code structure
zhaolinjnu
 
SQL Outer Joins for Fun and Profit
Karwin Software Solutions LLC
 
Hbase源码初探
zhaolinjnu
 
What you wanted to know about MySQL, but could not find using inernal instrum...
Sveta Smirnova
 
Capturing, Analyzing and Optimizing MySQL
Ronald Bradford
 
Mysql For Developers
Carol McDonald
 
Advanced Percona XtraDB Cluster in a nutshell... la suite
Kenny Gryp
 
MySQL Storage Engines - which do you use? TokuDB? MyRocks? InnoDB?
Sveta Smirnova
 
MySQL Group Replication
Kenny Gryp
 
MySQL Best Practices - OTN LAD Tour
Ronald Bradford
 
淘宝数据库架构演进历程
zhaolinjnu
 
Extensible Data Modeling
Karwin Software Solutions LLC
 
Multi Source Replication With MySQL 5.7 @ Verisure
Kenny Gryp
 
Java MySQL Connector & Connection Pool Features & Optimization
Kenny Gryp
 
MySQL aio
zhaolinjnu
 
Group Replication: A Journey to the Group Communication Core
Alfranio Júnior
 
MySQL Group Replication
Manish Kumar
 
Ad

Similar to Reducing Risk When Upgrading MySQL (20)

PDF
Getting Modern With MySQL
All Things Open
 
PDF
Getting modern with my sql
Jakob Lorberblatt
 
PPTX
How to upgrade like a boss to my sql 8.0?
Alkin Tezuysal
 
PDF
How to upgrade like a boss to MySQL 8.0 - PLE19
Alkin Tezuysal
 
PDF
Upgrade to MySQL 5.6 without downtime
Olivier DASINI
 
PDF
Quick Wins
HighLoad2009
 
PDF
MySQL Performance Tuning. Part 1: MySQL Configuration (includes MySQL 5.7)
Aurimas Mikalauskas
 
PDF
My sql 5.7-upcoming-changes-v2
Morgan Tocker
 
PDF
Best practices for MySQL High Availability
Colin Charles
 
PDF
Mysql 57-upcoming-changes
Morgan Tocker
 
PPTX
Infrastructure review - Shining a light on the Black Box
Miklos Szel
 
PDF
From crash to testcase
Roel Van de Paar
 
PDF
Managing MariaDB Server operations with Percona Toolkit
Sveta Smirnova
 
PDF
Best practices for MySQL/MariaDB Server/Percona Server High Availability
Colin Charles
 
PDF
介绍 Percona 服务器 XtraDB 和 Xtrabackup
YUCHENG HU
 
PDF
Scaling MySQL Strategies for Developers
Jonathan Levin
 
PDF
OSDC 2018 | Scaling & High Availability MySQL learnings from the past decade+...
NETWAYS
 
PDF
MySQL Parallel Replication by Booking.com
Jean-François Gagné
 
PDF
SDPHP - Percona Toolkit (It's Basically Magic)
Robert Swisher
 
PPTX
Wildcard13 - warmup slides for the "Roundtable discussion with Oracle Profess...
Maris Elsins
 
Getting Modern With MySQL
All Things Open
 
Getting modern with my sql
Jakob Lorberblatt
 
How to upgrade like a boss to my sql 8.0?
Alkin Tezuysal
 
How to upgrade like a boss to MySQL 8.0 - PLE19
Alkin Tezuysal
 
Upgrade to MySQL 5.6 without downtime
Olivier DASINI
 
Quick Wins
HighLoad2009
 
MySQL Performance Tuning. Part 1: MySQL Configuration (includes MySQL 5.7)
Aurimas Mikalauskas
 
My sql 5.7-upcoming-changes-v2
Morgan Tocker
 
Best practices for MySQL High Availability
Colin Charles
 
Mysql 57-upcoming-changes
Morgan Tocker
 
Infrastructure review - Shining a light on the Black Box
Miklos Szel
 
From crash to testcase
Roel Van de Paar
 
Managing MariaDB Server operations with Percona Toolkit
Sveta Smirnova
 
Best practices for MySQL/MariaDB Server/Percona Server High Availability
Colin Charles
 
介绍 Percona 服务器 XtraDB 和 Xtrabackup
YUCHENG HU
 
Scaling MySQL Strategies for Developers
Jonathan Levin
 
OSDC 2018 | Scaling & High Availability MySQL learnings from the past decade+...
NETWAYS
 
MySQL Parallel Replication by Booking.com
Jean-François Gagné
 
SDPHP - Percona Toolkit (It's Basically Magic)
Robert Swisher
 
Wildcard13 - warmup slides for the "Roundtable discussion with Oracle Profess...
Maris Elsins
 
Ad

More from Kenny Gryp (9)

PDF
MySQL Database Architectures - 2022-08
Kenny Gryp
 
PDF
MySQL Operator for Kubernetes
Kenny Gryp
 
PDF
MySQL InnoDB Cluster / ReplicaSet - Tutorial
Kenny Gryp
 
PDF
MySQL Connectors 8.0.19 & DNS SRV
Kenny Gryp
 
PDF
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
Kenny Gryp
 
PDF
MySQL Group Replication - Ready For Production? (2018-04)
Kenny Gryp
 
PDF
MySQL Group Replication - HandsOn Tutorial
Kenny Gryp
 
PDF
Online MySQL Backups with Percona XtraBackup
Kenny Gryp
 
PDF
Percona XtraDB Cluster
Kenny Gryp
 
MySQL Database Architectures - 2022-08
Kenny Gryp
 
MySQL Operator for Kubernetes
Kenny Gryp
 
MySQL InnoDB Cluster / ReplicaSet - Tutorial
Kenny Gryp
 
MySQL Connectors 8.0.19 & DNS SRV
Kenny Gryp
 
MySQL Database Architectures - InnoDB ReplicaSet & Cluster
Kenny Gryp
 
MySQL Group Replication - Ready For Production? (2018-04)
Kenny Gryp
 
MySQL Group Replication - HandsOn Tutorial
Kenny Gryp
 
Online MySQL Backups with Percona XtraBackup
Kenny Gryp
 
Percona XtraDB Cluster
Kenny Gryp
 

Recently uploaded (20)

PPTX
Softuni - Psychology of entrepreneurship
Kalin Karakehayov
 
PPTX
L1A Season 1 ENGLISH made by A hegy fixed
toszolder91
 
PPTX
法国巴黎第二大学本科毕业证{Paris 2学费发票Paris 2成绩单}办理方法
Taqyea
 
PDF
BRKSP-2551 - Introduction to Segment Routing.pdf
fcesargonca
 
PPTX
PHIPA-Compliant Web Hosting in Toronto: What Healthcare Providers Must Know
steve198109
 
PDF
BRKAPP-1102 - Proactive Network and Application Monitoring.pdf
fcesargonca
 
PDF
BRKACI-1003 ACI Brownfield Migration - Real World Experiences and Best Practi...
fcesargonca
 
PDF
Digital burnout toolkit for youth workers and teachers
asociatiastart123
 
PPTX
04 Output 1 Instruments & Tools (3).pptx
GEDYIONGebre
 
PDF
Top 10 Testing Procedures to Ensure Your Magento to Shopify Migration Success...
CartCoders
 
PDF
The Internet - By the numbers, presented at npNOG 11
APNIC
 
PPTX
Orchestrating things in Angular application
Peter Abraham
 
DOCX
Custom vs. Off-the-Shelf Banking Software
KristenCarter35
 
PDF
FutureCon Seattle 2025 Presentation Slides - You Had One Job
Suzanne Aldrich
 
PPTX
Networking_Essentials_version_3.0_-_Module_3.pptx
ryan622010
 
PDF
Cleaning up your RPKI invalids, presented at PacNOG 35
APNIC
 
PPTX
Presentation3gsgsgsgsdfgadgsfgfgsfgagsfgsfgzfdgsdgs.pptx
SUB03
 
PDF
Enhancing Parental Roles in Protecting Children from Online Sexual Exploitati...
ICT Frame Magazine Pvt. Ltd.
 
PPTX
Networking_Essentials_version_3.0_-_Module_5.pptx
ryan622010
 
PDF
BRKACI-1001 - Your First 7 Days of ACI.pdf
fcesargonca
 
Softuni - Psychology of entrepreneurship
Kalin Karakehayov
 
L1A Season 1 ENGLISH made by A hegy fixed
toszolder91
 
法国巴黎第二大学本科毕业证{Paris 2学费发票Paris 2成绩单}办理方法
Taqyea
 
BRKSP-2551 - Introduction to Segment Routing.pdf
fcesargonca
 
PHIPA-Compliant Web Hosting in Toronto: What Healthcare Providers Must Know
steve198109
 
BRKAPP-1102 - Proactive Network and Application Monitoring.pdf
fcesargonca
 
BRKACI-1003 ACI Brownfield Migration - Real World Experiences and Best Practi...
fcesargonca
 
Digital burnout toolkit for youth workers and teachers
asociatiastart123
 
04 Output 1 Instruments & Tools (3).pptx
GEDYIONGebre
 
Top 10 Testing Procedures to Ensure Your Magento to Shopify Migration Success...
CartCoders
 
The Internet - By the numbers, presented at npNOG 11
APNIC
 
Orchestrating things in Angular application
Peter Abraham
 
Custom vs. Off-the-Shelf Banking Software
KristenCarter35
 
FutureCon Seattle 2025 Presentation Slides - You Had One Job
Suzanne Aldrich
 
Networking_Essentials_version_3.0_-_Module_3.pptx
ryan622010
 
Cleaning up your RPKI invalids, presented at PacNOG 35
APNIC
 
Presentation3gsgsgsgsdfgadgsfgfgsfgagsfgsfgzfdgsdgs.pptx
SUB03
 
Enhancing Parental Roles in Protecting Children from Online Sexual Exploitati...
ICT Frame Magazine Pvt. Ltd.
 
Networking_Essentials_version_3.0_-_Module_5.pptx
ryan622010
 
BRKACI-1001 - Your First 7 Days of ACI.pdf
fcesargonca
 

Reducing Risk When Upgrading MySQL

  • 1. Reducing Risk When Upgrading Your MySQL Environment   Kenny Gryp MySQL Practice Manager
  • 2. My Experience as MySQL Consultant On Upgrading MySQL it's quite complex... Kenny Gryp MySQL Practice Manager
  • 3. Table of Contents The O cial Documentation Make Your Own Documentation Potential Risks Establish Upgrade Method For A Single Server Rollback Scenario Testing Test Writes Test Individual Reads Workload Testing Establish (& Test) Migration Process Migration In Production (Rollback) Post-Migration Assessment 3 / 77
  • 4. The O cial Documentation 4 / 77
  • 5. Oracle's Recommended Process Backup your data Read all release notes and assess https://dev.mysql.com/doc/relnotes/mysql/5.7/en/ Read Changes Affecting Upgrades to MySQL 5.7 https://dev.mysql.com/doc/refman/5.7/en/upgrading-from- previous-series.html 5 / 77
  • 7. Oracle's Recommended Process Upgrade Slaves First In-Place Upgrade: Clean shutdown (innodb_fast_shutdown=0) Run mysql_upgrade Logical Upgrade: mysqldump data Import data again Run mysql_upgrade to x mysql schema http://dev.mysql.com/doc/refman/5.7/en/upgrading.html 7 / 77
  • 8. Oracle's Recommended Process (cont.) A Lot of Risk: No guarantee queries will execute the same No guarantee queries will be same speed or faster No guarantee all your queries will still work (new default stricter sql_mode) There is no o cial support to upgrade from <5.6 to 5.7 but we might actually be able to do that 8 / 77
  • 10. Documenting The Process PEBKAC: Human errors happen and create issues import data using wrong character set setting up replica using wrong binlog le/pos ... Document every step, we need to repeat it multiple times 10 / 77
  • 11. making you afraid to upgrade by describing Potential Risks 11 / 77
  • 12. Optimizer Changes Example: index_merge_intersection Often seen during migrations to MySQL 5.6 Affects environments with sub-optimal indexing Queries with c1='a' AND c2='b' when composite index (c1,c2) is missing Is often slower when selectivity with 1 of the 2 columns is bad (and it happens frequently) Result: a lot of queries were slower in new environment Need SELECT performance tests between versions https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/ 12 / 77
  • 13. New Defaults In MySQL 5.7 The new defaults in MySQL 5.7 make a lot of sense: More use of available features and performance enhancements out of the box More strictness with data/query validation New Reserved words Applications might not be ready for it. Drupal 7 - https://www.drupal.org/node/2545480 They will/might break the application more easily: sql_mode=ONLY_FULL_GROUP_BY, STRICT_TRANS_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION innodb_strict_mode=1 Needs SELECT & DML query validity tests between versions 13 / 77
  • 14. Other Changes in MySQL 5.7 Passwords that use the older pre-4.1 password hashing format is removed. 14 / 77
  • 15. MySQL 5.0.37 +-------+ | 0 | +-------+ MySQL 5.0.45 +-------+ | 1 | +-------+ Minor Versions Also At Risk CREATE TABLE date (d DATE); INSERT INTO date VALUES ('2017-04-19'); SELECT COUNT(*) FROM date WHERE d < NOW()-INTERVAL 1 DAY; Seen with DELETE FROM date WHERE d < NOW()- INTERVAL 1 DAY in binlog_format=STATEMENT environments. Needs SELECT & DML query result tests between versions 15 / 77
  • 16. Workload SYNC_BINLOG=1 in MySQL 5.7 Can impact certain environments, might not be noticed when looking at a single query InnoDB LRU Flushing changes require tuning for heavy workloads in 5.6 (innodb_lru_scan_depth) When switching to MySQL 8.0 with the new data dictionary ... Need to do Workload Testing between versions http://mysqlentomologist.blogspot.com/2015/10/fun-with-bugs-38-regression-bugs-in.html http://lefred.be/content/sync_binlog-1-in-5-7/ 16 / 77
  • 17. How Do We Reduce All This Risk? 17 / 77
  • 19. establish Upgrade Method For A Single Server 19 / 77
  • 20. Upgrade Method For A Single Server Follow MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/upgrading.html Ensure to document every command Restore from backup Or take a replica you can miss 20 / 77
  • 21. Upgrade Method For A Single Server 21 / 77
  • 22. Upgrade Method For A Single Server 22 / 77
  • 24. Writes - Replication Consistency pt-table-checksum: validate consistency in a replication topology Identify problems caused by PEBKAC Ensure events replicate properly (binlog_format=STATEMENT) Upgrade a replica or add a replica which is using the modi ed version. Do it on production, will have no result in test/staging https://www.percona.com/doc/percona-toolkit/3.0/pt-table-checksum.html 24 / 77
  • 25. Writes - Replicate Test Server 25 / 77
  • 26. often left behind is Rollback Scenario Testing 26 / 77
  • 27. Rollback Scenario Testing Possibility to fall back in case something went wrong during migration Can be done using replication, but has to be tested! 27 / 77
  • 28. Writes - Rollback Testing 28 / 77
• 29. Rollback Scenario Testing
    You might need to add some settings to your new my.cnf to support replicating back to the old version. Example:
    binlog_checksum = NONE
    binlog_row_image = FULL
    binlog_rows_query_log_events = OFF
    log_bin_use_v1_row_events = 1
    gtid_mode = OFF
    log_slave_updates = 1
    skip-slave-start
• 30. Writes - Checksums - GTID
• 31. Writes - Checksums - Non-GTID
• 32. Writes - Checksums - ROW
• 33. Writes - Checksums - ROW
• 34. Where To Run pt-table-checksum?
    GTID: pt-table-checksum can only be run on the master (running it elsewhere creates errant transactions)
        Or discard the host you ran pt-table-checksum on after the tests
    Non-GTID: pt-table-checksum can be run on an intermediate master
    binlog_format=ROW: only 1 tier below can be checksummed
        Run it on every tier that has a replica (for rollback)
    pt-table-checksum can add overhead in production when run on an active master
    Let replication run for a while before checksumming
• 35. pt-table-checksum results
    On every replica (including the rollback):
    SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
    FROM percona.checksums
    WHERE (master_cnt <> this_cnt
        OR master_crc <> this_crc
        OR ISNULL(master_crc) <> ISNULL(this_crc))
    GROUP BY db, tbl;
    +----+-----------------+------------+--------+
    | db | tbl             | total_rows | chunks |
    +----+-----------------+------------+--------+
    | db | telephone_debit |      44342 |      1 |
    | db | orderline       |      21451 |      3 |
    | db | orders          |   25125215 |     12 |
    +----+-----------------+------------+--------+
• 36. pt-table-checksum - Analysis
    Troubleshooting starts now... What went wrong?
• 37. pt-table-checksum - Analysis
    Which chunks failed?
    db: db
    tbl: telephone_debit
    chunk: 100
    chunk_time: 0.4956125
    chunk_index: PRIMARY
    lower_boundary: 5014733
    upper_boundary: 5059074
    this_crc: 7fd37eb9
    this_cnt: 44342
    master_crc: b7babd94
    master_cnt: 44342
    ts: 2013-02-05 01:59:48
    Same row count (44342) but different CRCs: the rows exist on both sides, but their contents differ
• 40. pt-table-checksum - Analysis
    -- on the MySQL 5.6 server:
    SELECT * INTO OUTFILE '/tmp/telephone_debit_mysql56'
      FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074 ORDER BY id;
    -- on the MySQL 5.7 server:
    SELECT * INTO OUTFILE '/tmp/telephone_debit_mysql57'
      FROM db.telephone_debit WHERE id BETWEEN 5014733 AND 5059074 ORDER BY id;
    # diff -u /tmp/telephone_debit_mysql5{6,7}
    Use twindb_table_compare! https://github.com/twindb/twindb_table_compare
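The slide recommends twindb_table_compare for this step. Without it, a few lines of Python can narrow a failing chunk down to individual rows and columns. This sketch assumes the dumps use INTO OUTFILE's default format (tab-separated fields, one row per line, UTF-8 text), that the first column is the integer primary key `id`, and that no value contains an embedded tab or newline; the function names are hypothetical:

```python
def load_dump(path):
    """Read a default-format INTO OUTFILE dump into {id: [fields...]}."""
    rows = {}
    with open(path, encoding="utf-8") as dump:
        for line in dump:
            fields = line.rstrip("\n").split("\t")
            rows[int(fields[0])] = fields
    return rows


def compare_dumps(old, new):
    """Return ids only in `old`, ids only in `new`, and {id: [differing column indexes]}."""
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = {}
    for pk in sorted(old.keys() & new.keys()):
        cols = [i for i, (a, b) in enumerate(zip(old[pk], new[pk])) if a != b]
        if cols or len(old[pk]) != len(new[pk]):
            changed[pk] = cols
    return only_old, only_new, changed
```

Because INTO OUTFILE writes to each server's own filesystem, both files have to be copied to one host before calling, for example, `compare_dumps(load_dump("/tmp/telephone_debit_mysql56"), load_dump("/tmp/telephone_debit_mysql57"))`.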
• 41. pt-table-checksum - Analysis
    Wrong upgrade method
        backups
        wrong replication file/pos
        ...
    binlog_format=STATEMENT with non-deterministic functions (UUID()...)
    Commonly seen issues when replicating from older versions:
        Floating point differences: storing currencies in a DOUBLE
        Temporal data types
        Invalid dates converted to zero dates
        Trailing spaces in CHAR fields
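Why DOUBLE is a poor fit for currencies: binary floating point cannot represent most decimal fractions exactly, so stored amounts carry tiny errors that surface as soon as something rounds or formats them differently. The Python sketch below is not MySQL-specific; it only contrasts binary floating point (as in a DOUBLE column) with exact decimal arithmetic (as in a DECIMAL column):

```python
from decimal import Decimal


def total_as_double(amounts):
    """Sum amounts as binary floating point values, like a DOUBLE column."""
    return sum(float(a) for a in amounts)


def total_as_decimal(amounts):
    """Sum amounts with exact base-10 arithmetic, like a DECIMAL column."""
    return sum(Decimal(a) for a in amounts)


prices = ["0.10"] * 10
print(total_as_double(prices))   # 0.9999999999999999
print(total_as_decimal(prices))  # 1.00
```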
• 42. Testing Writes
    Consistency checks process:
        Checksum
        Check for differences
            on the new environment
            on the rollback environment
        For each inconsistency:
            Analyze the diff
            Find the root cause
            Fix the problem
            Document the problem & solution
        Repeat the checksum
• 44. Testing Reads - Collect Queries
• 45. Testing Reads - Collect Queries
    Collection techniques:
        Slow query log with long_query_time=0
            Careful above ~10,000 QPS
            Percona Server: log_slow_rate_limit
        tcpdump (watch for 'packets lost' in libpcap)
        Application/load balancer queries
    Ensure you:
        Get the full workload (collect long enough)
        Get data from the master & the replicas
        Collect the batch job queries that run at night
    https://www.percona.com/doc/percona-server/5.7/diagnostics/slow_extended.html
• 46. Testing Reads - Set Up 2 Environments
• 47. Testing Reads - Set Up 2 Environments
    Need 2 test servers:
        Reuse the servers from the checksum + rollback tests
        Ensure they have the same data (stop replication at the same position)
        Same hardware specifications
        Similar configurations (buffer pool, etc.)
        Fast enough to more or less resemble production
    Optionally can be done using 1 machine (pt-upgrade --save-results)
• 48. Testing Reads - pt-upgrade
• 49. Testing Reads - pt-upgrade
    pt-upgrade:
        runs one query at a time on both test environments
        compares differences in:
            warnings/errors
            result sets (even a different row order)
            query response time
    Run pt-upgrade from a third host with similar network latency to both servers
    Run it twice to warm up the buffer pools first (both need to be equally warm)
    Can also compare writes for execution time & warnings
    Filter the slow log first to limit similar queries:
        pt-query-digest --no-report --output slowlog --samples 20
    https://www.percona.com/doc/percona-toolkit/3.0/pt-upgrade.html
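The pt-query-digest filtering step keeps only a limited number of sample queries per query class, so pt-upgrade does not spend its time on thousands of near-identical statements. As a rough illustration of that idea, and not pt-query-digest's actual fingerprinting (which handles many more cases such as IN lists, comments and identifier quoting), a Python sketch that replaces literals with `?` and keeps at most `limit` queries per resulting class:

```python
import re
from collections import defaultdict


def fingerprint(query):
    """Rough query class: lowercase, literals replaced by '?', whitespace collapsed."""
    q = query.strip().lower()
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", q)   # single-quoted strings
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)    # integers and decimals
    return re.sub(r"\s+", " ", q)


def sample_per_class(queries, limit):
    """Keep at most `limit` queries per fingerprint, in their original order."""
    kept, seen = [], defaultdict(int)
    for query in queries:
        fp = fingerprint(query)
        if seen[fp] < limit:
            seen[fp] += 1
            kept.append(query)
    return kept
```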
• 50. Testing Reads - pt-upgrade
    Reporting class because there are 1000 row diffs.
    Total queries      10
    Unique queries     10
    Discarded queries   0
    select ... from ...
    ##
    ## Row diffs: 10
    ##
    -- 1.
    @ row 2
    < 13178,"dim0",37,2,21,,,0,0,0,1,NULL,NULL
    > 13178,"dimø",37,2,21,,,0,0,0,1,NULL,NULL
    ...
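pt-upgrade distinguishes real row differences from mere ordering differences. A minimal Python sketch of that distinction, treating each result set as a list of hashable row tuples and comparing them as multisets; the sample rows reuse the `dim0`/`dimø` values from the report, cut down to two columns:

```python
from collections import Counter


def compare_results(rows_a, rows_b):
    """Classify two result sets as 'identical', 'order differs' or 'rows differ'."""
    if rows_a == rows_b:
        return "identical"
    if Counter(rows_a) == Counter(rows_b):
        return "order differs"
    return "rows differ"


def row_diffs(rows_a, rows_b):
    """Rows (with multiplicity) found only in rows_a, and only in rows_b."""
    a, b = Counter(rows_a), Counter(rows_b)
    return list((a - b).elements()), list((b - a).elements())
```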
• 51. Testing Reads - pt-upgrade
    Reporting class because it has diffs, but hasn't been reported yet.
    SELECT * FROM `database`.table WHERE treeid = '' AND productid='0'
    ## Warning diffs: 2
    Code: 1366
    Level: Warning
    Message: Incorrect integer value: '' for column 'treeid' at row 1
    vs.
    No warning 1366
• 52. Testing Reads - pt-upgrade
    SELECT * FROM `database`.client_orders WHERE client=? AND blacklist=? LIMIT ?
    ## Query time diffs: 1
    -- 1.
    0.000513 vs. 0.036395 seconds (70.9x increase)
    SELECT * FROM `database`.client_orders WHERE client=57450 AND blacklist=1 LIMIT 1
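The "70.9x increase" is the new response time divided by the old one: 0.036395 / 0.000513 ≈ 70.9. When triaging many such entries it can help to apply a cut-off; the threshold below is a hypothetical value for illustration, not a pt-upgrade setting:

```python
def time_increase(old_seconds, new_seconds):
    """How many times slower the new server answered (new time / old time)."""
    if old_seconds <= 0:
        raise ValueError("old_seconds must be positive")
    return new_seconds / old_seconds


def is_regression(old_seconds, new_seconds, threshold=10.0):
    """True when the query got at least `threshold` times slower (hypothetical cut-off)."""
    return time_increase(old_seconds, new_seconds) >= threshold


print(f"{time_increase(0.000513, 0.036395):.1f}x increase")  # 70.9x increase
```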
• 53. Testing Reads
    Process:
        Collect queries
        Run pt-upgrade (twice)
        For each entry in the report:
            Figure out why it is reported
            Deploy the fix in the production application
            Make schema changes
            Document the analysis
        Run pt-upgrade again
• 54. one of the most challenging is Testing Workload
• 55. Workload Testing - Percona Playback
• 56. Workload Testing - Query Playback
    Uses the slow log to replay queries
        Needs long_query_time=0, which is challenging on busy servers
        Needs enough data from the peak workload
    Tries to execute the workload as realistically as possible:
        same connections, same transactions, same delays between queries
    Run it against both environments and compare speed
    Think about preloading the buffer pool on both servers the same way
    Active development by Marius Wachtler, (ex-)Dropbox! Thank you!
    (unofficial Percona product, no support)
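The key difference from pt-upgrade's one-query-at-a-time approach is that a replay preserves per-connection ordering and the delays between queries. A Python sketch of just that scheduling logic, not how Percona Playback or query-playback is implemented: it assumes the slow log has already been parsed into (connection id, start time in seconds, query) tuples, and `execute` is a hypothetical callback that sends a query over the matching test-server connection. A `speed` above 1 compresses the delays:

```python
import threading
import time
from collections import defaultdict


def replay_plan(events):
    """Group (connection_id, start_seconds, query) events by connection.

    Each query gets its offset from the first captured event, so the replay can keep
    the original order within a connection and the gaps between queries.
    """
    if not events:
        return {}
    t0 = min(ts for _, ts, _ in events)
    plan = defaultdict(list)
    for conn, ts, query in sorted(events, key=lambda e: e[1]):
        plan[conn].append((round(ts - t0, 6), query))
    return dict(plan)


def replay(plan, execute, speed=1.0):
    """Run one thread per captured connection; `execute(conn, query)` sends one query."""
    start = time.monotonic()

    def worker(conn, items):
        for offset, query in items:
            wait = offset / speed - (time.monotonic() - start)
            if wait > 0:
                time.sleep(wait)  # reproduce the original delay before this query
            execute(conn, query)

    threads = [threading.Thread(target=worker, args=(conn, items))
               for conn, items in plan.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```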
• 57. Workload Testing - ProxySQL Mirroring
• 58. Workload Testing - ProxySQL Mirroring
    Mirror queries from the load balancer to the test environment
    Good blog post: https://www.pythian.com/blog/using-proxysql-validate-mysql-updates/
• 59. establish (& test) Migration Process
• 60. Migration Process
    Create a migration plan
        Different for every environment/application
        Upgrade a replica first for a couple of days/weeks?
        How to switch masters?
        How is failover being handled nowadays? MHA, Orchestrator, manual, GTID/mysqlrpladmin...?
    Test in staging!
• 61. the actual Migration In Production
• 62. Migration - Create Slave Environments
• 63. Migration - Redirect Read Traffic
• 64. Migration - Application Switchover - 1
• 65. Migration - Application Switchover - 2
• 66. you (think you) will never need to do a Rollback
• 68. Rollback
    What went wrong?
    I did not follow the full process! (or I forgot to document it)
    Do consistency checks again!
• 69. after all that testing, it's ok to spend time doing Post-Migration Assessment
• 70. Post-Migration
    Check trending for different behavior:
        more CPU load? more disk IO?
        higher amounts of innodb_rows_* and handler_*?
        threads_running stability?
    Do some query optimization
    If all looks good, scratch the 5.6 rollback & make it 5.7
    Remove the rollback-specific configuration options
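Most counters in SHOW GLOBAL STATUS (for example Innodb_rows_read or Handler_read_next) are cumulative since server start, so trends have to be compared as rates over equal intervals. A Python sketch under that assumption; the snapshots are plain dicts you would fill from SHOW GLOBAL STATUS, and the 20% tolerance is a hypothetical choice. Threads_running is a gauge rather than a counter, so it should be sampled directly instead of being passed through `rates`:

```python
def rates(before, after, seconds):
    """Per-second rate of each cumulative counter present in both snapshots.

    Counters reset on restart, so both snapshots must come from the same server uptime.
    """
    return {name: (after[name] - before[name]) / seconds
            for name in before.keys() & after.keys()}


def changed_rates(baseline, current, tolerance=0.2):
    """Counters whose rate moved by more than `tolerance` (0.2 = 20%) versus the baseline.

    Values are relative changes; a counter that was 0 before and is now non-zero
    is reported as float("inf").
    """
    flagged = {}
    for name in sorted(baseline.keys() & current.keys()):
        base, now = baseline[name], current[name]
        if base == 0:
            if now != 0:
                flagged[name] = float("inf")
        elif abs(now - base) / base > tolerance:
            flagged[name] = (now - base) / base
    return flagged
```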
• 73. Multi-Use
    (Minor MySQL version upgrades)
    Major MySQL version upgrades
    Switching hardware from Intel -> AMD architecture
    Using a new kernel/libc/memory allocator
    Switching storage engines
    MariaDB/Percona Server/MySQL
    ...
• 74. Do I really have to go through this?
    Many success stories:
        Have done several MySQL upgrades from 4.1 -> 5.5 without intermediate slaves
        Upgraded environments with major schema changes in the mix (mssql-style environments using stored procedures only)
        Found numerous application bugs using this process
        Optimized many customers' schemas/queries along the way
    As long as you follow this process completely, the risk of running into problems is quite small.
• 75. Do I really have to go through this?
    It depends:
        Your business might be risk-averse: every change has to be thoroughly tested
        Other companies just upgrade a replica in production and see how it goes
    My suggestion is to do this at least for:
        Major MySQL version upgrades
        Switching storage engines
• 76. Summary
    Test Step                Skip?
    Document                 Really?
    Upgrade Single Server    Why?
    Rollback Scenarios       Not Recommended
    Consistency Checks       Required, No Debate!
    Read Tests               Strongly Suggested
    Workload Tests           Possible (Early Adopter Alert)
    Migration Tests          Not Recommended To Skip
• 77. Reducing Risk When Upgrading Your MySQL Environment
    Q&A!
    Kenny Gryp, MySQL Practice Manager