ZFS and MySQL on Linux,
the Sweet Spots
ZFS User Conference 2018
Jervin Real
1 / 50
MySQL
The World's Most Popular Open Source Database
2 / 50
ZFS
Is MySQL for storage.
3 / 50
ZFS + MySQL
MySQL Needs
Reliable, durable, performant storage
At the same time, users demand:
At-rest encryption (a number of choices available)
Compression (InnoDB compression is a bit complex for many)
Reliable and sane backups (except when you leave your dataset to grow)
ZFS Provides
Reliable, durable, performant(?) storage
Compression options, encryption, sane backups
6 / 50
ZFS + MySQL
It's a compromise, and they should meet somewhere in between ...
7 / 50
WARNING: You will see graphs ...
8 / 50
Use Case: Large MySQL Dataset
9 / 50
Large MySQL Dataset
Probably not what you should be doing, but:
Single large dataset, in the TB range
Backup and recovery challenges
Storage space constraints
Long data retention periods (fintech/healthcare)
Mixed storage engines (yes, MySQL allows it)
10 / 50
Large MySQL Dataset
Switching is not as straightforward however:
Thick performance bands on ZFS
TPS down to 0 on write-heavy tests
11 / 50
Large MySQL Dataset
Switching is not as straightforward however:
Gets worse with a 32-thread, write-only workload
12 / 50
Large MySQL Dataset
Switching is not as straightforward however:
Transactions/sec drops to zero many times!
13 / 50
Large MySQL Dataset
Switching is not as straightforward however:
HP DL380 G7, 6x SAS disks, P410 controller (with battery-backed cache)
Disks configured as JBOD, ZFS pool 2x3 mirrored stripes
ZFS
logbias=throughput
zfs_arc_max=1073741824
zfs_prefetch_disable=1
atime=12
recordsize=16k/128k
MySQL
sync_binlog=1
innodb_flush_log_at_trx_commit=1
innodb_doublewrite=0
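The zfs_arc_max value on this slide is 1 GiB expressed in bytes. A minimal sketch of deriving that number and the matching module option line; the helper name arc_max_option is hypothetical and not from the talk:

```shell
# Hypothetical helper: print the ZFS module option line for an ARC cap in GiB.
arc_max_option() {
  # $1: ARC cap in GiB; shell arithmetic converts it to bytes
  echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
}

arc_max_option 1   # prints: options zfs zfs_arc_max=1073741824
```

On ZFS on Linux, lines like this usually live in a file under /etc/modprobe.d/, and the running value can be read from /sys/module/zfs/parameters/zfs_arc_max.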
14 / 50
Large MySQL Dataset
Trying to optimize without shelling out extra
cash:
Minimum TPS drops just by tuning zfs_dirty_data_max=128M:
./1-no-slog/sb-no-slog-ui32.csv                                    383
./7-no-slog-no4-txgtimeout5/sb-no-slog-no4-txgtimeout5-ui32.csv     18
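The slide does not say how the per-run numbers were computed. One plausible way to get such a figure, assuming it counts samples where TPS fell to zero and assuming a CSV with a header row and TPS in the second column (both assumptions, as is the function name):

```shell
# Hypothetical helper: count samples with zero TPS in a benchmark CSV.
# Assumed layout: header row first, then "time,tps" rows.
count_zero_tps() {
  # $1: path to the CSV file
  awk -F',' 'NR > 1 && $2 + 0 == 0 { n++ } END { print n + 0 }' "$1"
}
```

Running it over each run's CSV would give one number per file, comparable across tuning attempts.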
15 / 50
Large MySQL Dataset
Trying to optimize without shelling out extra
cash:
Still quite far off from EXT4 numbers, but we know our ceiling
16 / 50
Large MySQL Dataset
Biting the bullet, adding an NVMe SLOG:
Raised further, but not as far as we expected
zfs_nocacheflush=1 helps too, since we are using a RAID controller with battery-backed cache
17 / 50
Large MySQL Dataset
Biting the bullet, adding an NVMe SLOG:
Limited by how much throughput and IOPS combined our main pool disks can deliver.
18 / 50
Large MySQL Dataset
Well, let's test with SSDs as well:
Same hardware, 6x Samsung 860 SSD drives
Same ZFS config without NVMe SLOG
Big gap from EXT4, just a matter of tuning for capacity with SSD
19 / 50
Large MySQL Dataset
Unfortunately, just as we were having fun:
¯\_(ツ)_/¯
Watch this space:
https://github.com/dotmanila/zfs-mysql
20 / 50
Large MySQL Dataset
What we learned so far:
Know your pool capability and capacity
Performance depends on the slowest component (pool disks vs SLOG)
21 / 50
Use Case: Percona XtraDB Cluster
22 / 50
Percona XtraDB Cluster
A group of MySQL servers with:
Synchronous Replication
Multi Master*, True Parallel Replication
Automatic node provisioning
*Requires compatible workload type
23 / 50
Percona XtraDB Cluster
How Galera replication works:
Image Credit: http://galeracluster.com/documentation-webpages/certificationbasedreplication.html#how-certification-based-replication-works
24 / 50
Percona XtraDB Cluster
Why ZFS fits:
Writeset certification within the cluster allows minimal per-node durability.
sync_binlog = 0
innodb_flush_log_at_trx_commit = 0
25 / 50
Percona XtraDB Cluster
Why ZFS fits:
We should be able to tune ZFS for minimal durability as well.
sync=disabled
zfs_nocacheflush=1
Plus PXC settings:
innodb_doublewrite=0
innodb_log_checksums=OFF
innodb_checksum_algorithm=none
26 / 50
Percona XtraDB Cluster
Exploring ZFS on PXC, i3.2xlarge
27 / 50
Percona XtraDB Cluster
Exploring ZFS on PXC, i3.2xlarge
MySQL
[mysqld]
server-id=1
datadir=/mysql/data
wsrep_provider=/usr/lib/galera3/libgalera_smm.so
wsrep_cluster_address=gcomm://
binlog_format=ROW
default_storage_engine=InnoDB
innodb_doublewrite=0
innodb_log_group_home_dir=/mysql/logs
innodb_io_capacity=5000
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=48G
innodb_log_file_size=8G
innodb_log_checksums=OFF
innodb_checksum_algorithm=none
pxc_strict_mode=ENFORCING
wsrep_node_name=zfs01
wsrep_slave_threads=8
wsrep_node_address=10.1.2.117
wsrep_cluster_name=zfs
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="msandbox:msandbox"
ZFS
options zfs zfs_arc_max=1073741824
options zfs zfs_prefetch_disable=1
options zfs zfs_nocacheflush=1
sudo zpool create -o ashift=12 -f mysql /dev/nvme0n1
sudo zfs set recordsize=16k mysql
sudo zfs set atime=off mysql
sudo zfs set logbias=latency mysql
sudo zfs set primarycache=metadata mysql
sudo zfs set compression=lz4 mysql
sudo zfs set sync=disabled mysql
sudo zfs create -o recordsize=128K mysql/logs
sudo zfs create -o recordsize=16K mysql/data
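The data dataset uses recordsize=16K, which is the same size as InnoDB's default page (16384 bytes), while the logs dataset uses 128K. A small sketch for sanity-checking a recordsize against the page size; both function names are hypothetical, and only plain byte counts and K/M suffixes are handled:

```shell
# Hypothetical helper: convert a size such as 16k, 128K or 1M to bytes.
size_to_bytes() {
  case "$1" in
    *[kK]) echo $(( ${1%[kK]} * 1024 )) ;;
    *[mM]) echo $(( ${1%[mM]} * 1024 * 1024 )) ;;
    *)     echo "$1" ;;
  esac
}

# Hypothetical helper: succeed when a recordsize equals the InnoDB page size.
# $1: recordsize; $2: page size in bytes (defaults to InnoDB's 16384)
matches_innodb_page() {
  [ "$(size_to_bytes "$1")" -eq "${2:-16384}" ]
}

matches_innodb_page 16K && echo "16K matches the default InnoDB page size"
```

The current value for a dataset can be read with zfs get recordsize mysql/data and passed to the check.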
28 / 50
Percona XtraDB Cluster
Why ZFS fits:
And still take advantage of ZFS features:
ZFS snapshots for SST
Reading records/blocks from source multithreaded
Writing records to destination is also multithreaded
Encryption
Compression
29 / 50
Percona XtraDB Cluster
State Snapshot Transfer:
XtraBackup
Donor becomes unavailable: on a 3-node cluster, you lose 2 of 3 nodes when one needs SST.
Potentially slow for large uncompressed datasets
On the fly compression and decompression
It is however parallel
Depends on compressibility
ZFS Snapshot
Donor remains available
Raw SEND/RECV
30 / 50
Percona XtraDB Cluster
State Snapshot Transfer:
Taking ZFS Snapshot with Percona Server/Percona XtraDB Cluster
mysql> LOCK TABLES FOR BACKUP;
mysql> LOCK BINLOG FOR BACKUP;
mysql> SHOW MASTER STATUS;
mysql> -- take ZFS snapshot here
mysql> UNLOCK TABLES;
mysql> UNLOCK BINLOG;
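The backup locks are released when the session ends, so the snapshot has to be taken while the same session is still open. A minimal sketch that generates one script for a single mysql client session, using the client's system command to run the snapshot; the function name and arguments are hypothetical, and the statements follow Percona Server's backup-lock syntax:

```shell
# Hypothetical helper: print a script for ONE mysql client session that
# holds the backup locks while the ZFS snapshot is taken.
snapshot_sql() {
  # $1: pool or dataset to snapshot, $2: snapshot tag
  cat <<EOF
LOCK TABLES FOR BACKUP;
LOCK BINLOG FOR BACKUP;
SHOW MASTER STATUS;
system sudo zfs snapshot -r ${1}@${2}
UNLOCK TABLES;
UNLOCK BINLOG;
EOF
}
```

It could be used as snapshot_sql mysql "$(date +%Y%m%d%H%M)" | mysql, assuming the client can authenticate non-interactively and that the system command is honoured when the script is piped in.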
31 / 50
Percona XtraDB Cluster
State Snapshot Transfer:
Streaming encrypted and compressed:
ubuntu@ip-10-1-2-126:~$ mbuffer -s 128k -m 1G -I 9999 | sudo zfs recv -F mysql
in @ 77.9 MiB/s, out @ 77.9 MiB/s,  156 GiB total, buffer   0% full
summary:  156 GiByte in 23min 24.7sec - average of  114 MiB/s
ubuntu@ip-10-1-2-126:~$ sudo zfs load-key -a
2 / 2 key(s) successfully loaded
ubuntu@ip-10-1-2-126:~$ sudo zfs mount -a
ubuntu@ip-10-1-2-117:~$ time sudo zfs send -wcR mysql@201804192210 | mbuffer -s 128
in @ 22.2 MiB/s, out @ 76.4 MiB/s,  156 GiB total, buffer   0% full
summary:  156 GiByte in 23min 22.1sec - average of  114 MiB/s
real  23m24.961s
user  0m9.052s
sys 6m39.036s
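The average in the mbuffer summary can be checked by hand: 156 GiB is 159744 MiB, moved in 1404.7 seconds on the receiving side. A small sketch of that arithmetic; the function name is hypothetical:

```shell
# Hypothetical helper: average throughput in MiB/s, rounded to a whole number.
avg_mib_per_sec() {
  # $1: GiB transferred, $2: minutes, $3: seconds
  awk -v gib="$1" -v min="$2" -v sec="$3" \
    'BEGIN { printf "%.0f\n", (gib * 1024) / (min * 60 + sec) }'
}

avg_mib_per_sec 156 23 24.7   # prints 114, matching the receiver's summary line
```

The same calculation with the sender's 23min 22.1sec also rounds to 114 MiB/s.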
32 / 50
Percona XtraDB Cluster
State Snapshot Transfer: ZFS POC Tests
r4.4xlarge, single EBS 4000 PIOPS
PXC 5.7.19
Idle cluster, no traffic during snapshot transfers
33 / 50
Use Case: MySQL Dedicated
Backup Nodes
34 / 50
MySQL Dedicated Backup Nodes
Backups are fun:
Performance
Restore Time Objective
Restore Point Objective
Security
35 / 50
MySQL Dedicated Backup Nodes
Backup options for InnoDB:
Percona XtraBackup
Delayed Replica
36 / 50
MySQL Dedicated Backup Nodes
Backups are fun: XtraBackup (1/2)
Performance
Fast parallel copy, fast on the fly compression with qpress
Decompress in parallel too
Restore Time Objective
Depends on where backups are stored relative to restore target
How fast to transfer backups from stored state to usable state
PITR time also adds up and depends on how frequently backups are made
Full + incremental backups are possible
37 / 50
MySQL Dedicated Backup Nodes
Backups are fun: XtraBackup (2/2)
Restore Point Objective
PITR with full + incremental + binary logs roll forward
Gap between incremental and binary log target is crucial
Security
Encryption on the fly, decryption separate process
Decompression and decryption can be done at the same time in parallel with some shell-fu
cat table.ibd.xbcrypt.qp | xbcrypt -d | qpress -di table.ibd
38 / 50
MySQL Dedicated Backup Nodes
Backups are fun: Delayed MySQL Replica (1/2)
Performance
Full dataset immediately available
PITR speed depends on configured delay and SQL thread speed
Restore Time Objective
Depends on configured delay and SQL thread speed
39 / 50
MySQL Dedicated Backup Nodes
Backups are fun: Delayed MySQL Replica (2/2)
Restore Point Objective
PITR range depends on configured delay (1hr vs 1day)
Security
Scoped within instance security (at rest, in transit)
40 / 50
MySQL Dedicated Backup Nodes
Testing Snapshots
Run sysbench oltp_update_index test, 32 threads on an i3.2xlarge
Take snapshots every 5 minutes, copy the last snapshot every hour (send to /dev/null)
# Snapshot loop: take a recursive snapshot every 5 minutes
while true; do
  echo "$(date +%Y-%m-%d_%H:%M) sleeping ...";
  sleep 300;
  sudo zfs snapshot -r mysql@$(date +%Y%m%d%H%M);
done
# Copy loop: stream the most recent snapshot to /dev/null every hour
while true; do
  snap=$(zfs list -t snap | egrep 'mysql@' | tail -n1 | awk '{print $1}');
  time sudo zfs send -R $snap | cat - > /dev/null;
  sleep 3600;
done
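The copy loop picks the last line of the default zfs list output. A slightly more explicit variant, as a sketch only: sort by creation time and keep the newest snapshot of the pool's top-level dataset. The function name is hypothetical; it reads snapshot names on stdin so it can be tried without a pool:

```shell
# Hypothetical helper: from snapshot names on stdin (oldest first), print the
# newest one that belongs directly to the given dataset.
latest_snapshot() {
  # $1: dataset name, e.g. mysql (child datasets such as mysql/data are skipped)
  awk -v ds="$1" 'index($0, ds "@") == 1 { last = $0 }
                  END { if (last != "") print last }'
}

# Intended use (needs a real pool, so left as a comment):
#   zfs list -H -t snapshot -o name -s creation | latest_snapshot mysql
```

The -H flag drops the header and -o name prints only the name column, so no egrep or awk column picking is needed.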
41 / 50
MySQL Dedicated Backup Nodes
Testing Snapshots
Transactions/sec looks to be even and non-degrading
42 / 50
MySQL Dedicated Backup Nodes
Testing Snapshots
Running the graph through Loess smoothing reveals something interesting
Transactions/sec degrades over time
43 / 50
MySQL Dedicated Backup Nodes
Testing Snapshots
Reads saturating the disk
Binary logs size increased
44 / 50
MySQL Dedicated Backup Nodes
Backups are fun: ZFS Snapshots (1/2)
Performance
Compressed optimized SEND/RECV
Restore Time Objective
Same as XtraBackup (full + incremental + binlogs speed)
In-place snapshots allow near-instantaneous rollback!
45 / 50
MySQL Dedicated Backup Nodes
Backups are fun: ZFS Snapshots (2/2)
Restore Point Objective
In-place snapshots allow rollback as far as space allows
Combined with replication, essentially also a delayed replica but faster!
Security
Encrypted datasets can remain encrypted in transit and at the destination
46 / 50
Looking Forward
47 / 50
Looking Forward
Exciting things up ahead:
ZSTD compression!
More MySQL-related potential tuning:
Separate undo log directory - InnoDB undo logs are best suited for SSDs
PXC state snapshot transfer (automatic provisioning) with ZFS snapshots
48 / 50
Questions!
49 / 50
Percona Live Call 2018 is Next Week!
https://www.percona.com/live/18/
50 / 50
