MySQL Replication Performance in the Cloud
Vítor Oliveira
Performance Architect, Oracle/MySQL/Replication
pre-FOSDEM 2020 MySQL Days
Jan 31, 2020

Safe harbor statement

The following is intended to outline our general product direction. It is intended for information
purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in making purchasing decisions. The
development, release, timing, and pricing of any features or functionality described for Oracle’s
products may change and remains at the sole discretion of Oracle Corporation.
Copyright © 2020, Oracle and/or its affiliates

Agenda

1. Introduction
2. Can we estimate the behavior?
3. Benchmarking replication
4. A transaction simulator
5. Conclusion

Introduction

Evaluating the performance of a system properly is difficult, error prone and expensive, both in terms of
time and of computing resources.
• Performance benchmarks are a good tool for improving the software and for helping to design and size
the infrastructure on which it is deployed.
• However, performance benchmarks alone are insufficient: we need to understand why the results
are the way they are, so that the information is useful in other scenarios.

Introduction

A problem for performance evaluation is knowing what is reasonable to expect from a particular
workload in a particular deployment scenario.
• The limits of each component of the system, and how they are being reached, are unknown;
• The cumulative effects of multiple changes may produce unexpected results;
• And the software features to test may not have been built yet…
In this presentation I will show some benchmarks related to running MySQL replication on the Oracle
Cloud Infrastructure (OCI), and then try to provide some inner details that might help with the problems
above.

Benchmarking replication

We will start by presenting a performance evaluation of the MySQL Replication technologies running
on OCI, varying the shape type and some configuration parameters.
The following testing setup was built:
• Topologies with 3 servers on OCI, one in each availability domain (AD), and a single client machine
in the same AD as the primary server;
• The benchmark selected was the usual Sysbench, in particular OLTP RW, OLTP Write-only and
Update Index;
• The machines selected were a set of three OCI BM.Standard.E2.64 (64 cores, 128 HW
threads) and a set of VM.Standard.E2.8 (8 cores, 16 HW threads).

Overview of replication performance
Benchmarking replication

Comparison between a single server and three servers in
different ADs, using semi-sync and group replication.
Two different types of servers: a VM with 8 cores and a full
bare metal machine with 64 cores.
[Figure: three charts of Sysbench throughput (transactions/second) versus number of client threads (1 to 512) on OCI BM.E2 and VM.Standard.E2.8, comparing Standalone, Semi-sync and Group Replication. Panels: Update Index, Read-Write and Write-only.]

Impact of durability on throughput
Benchmarking replication

Impact of relaxing the durability on the disks, transferring
it to the network.
The gain is significant, because the distributed storage system is
more complex and has higher latency than the Paxos protocol
in Group Replication.
[Figure: three charts of Sysbench throughput (transactions/second) versus number of client threads (1 to 512) on OCI BM.Standard.E2, comparing Standalone, Semisync, Semi-nosync, GR and GR-nosync (durability settings). Panels: Read-Write, Write-only and Update Index.]

What do we gain with these performance evaluation results?
Benchmarking replication

These benchmarks are representative of a particular workload in a particular
testing scenario.
When evaluating such benchmarks many questions pop up:
• What explains those results?
• Can I expect the same behavior for my workload in the same setup?
• How does the performance compare to other cloud providers?
• How does the performance compare to on-premises deployments?
• Would other parameters be more effective for a particular purpose?
• Were the measurements properly taken?
• What can I reliably estimate using these results?

Can we estimate the behavior?

The proper way to benchmark a system is to deploy it in a real environment and apply
the actual user workload to it, even if that is hard to do in many cases.
Benchmarks can capture three different things (often hard to distinguish):
1. The performance of the software implementation of certain features tested by a
benchmark in a particular way;
2. The noise in the environment and other test setup limitations;
3. The fundamental architectural and configuration details of the computing system
used in the tests.
Wouldn't it be great if we had a model that represents different deployment scenarios
and calculates a rough estimate of the throughput to expect?

Building a simple model
Can we estimate the behavior?

While the software implementation is hard to model, the fundamental
aspects of the computing systems that are used are not, at least in a very
simplified way.
This model can be built on information such as:
• The architecture of MySQL and how it uses the computing resources;
• The characteristics of the workload: number of queries in the workload,
size of the active dataset compared to the memory size;
• And the computing system characteristics: network latency and
bandwidth, computing resources, number of network hops.

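As a rough illustration of what such a model could look like, the sketch below bounds the throughput by two limits: how fast the client threads can cycle through transactions, and how much CPU the cores can deliver. The function name, its parameters and the sample values are assumptions made for this example; they are not taken from the measurements or from the simulator discussed in this talk.

```python
def estimate_tps(threads, cores, cpu_ms_per_txn, wait_ms_per_txn):
    """Rough upper bound on transactions/second (illustrative sketch only).

    cpu_ms_per_txn  -- CPU time one transaction consumes (assumed input)
    wait_ms_per_txn -- time spent waiting on network and storage (assumed input)
    """
    # Each client thread completes one transaction per (cpu + wait) interval.
    latency_bound = threads * 1000.0 / (cpu_ms_per_txn + wait_ms_per_txn)
    # The cores cannot execute more than this many transactions per second.
    cpu_bound = cores * 1000.0 / cpu_ms_per_txn
    return min(latency_bound, cpu_bound)


if __name__ == "__main__":
    # Hypothetical inputs: 8 cores, 0.5 ms of CPU and 1.5 ms of waiting per transaction.
    for threads in (1, 8, 64, 512):
        print(threads, estimate_tps(threads, 8, 0.5, 1.5))
```

With few threads the waiting time dominates and throughput grows linearly with the thread count; with many threads the CPU bound takes over and the curve flattens.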
Building a simple model
Can we estimate the behavior?

On the right is a really simple model
for the execution of a transaction.
It demonstrates the goal, but we
need more details…

Asynchronous Replication
MySQL

Lifecycle of a transaction:
• Execute the queries as they are sent, read any data from the database that isn’t already in the buffer pool, write the changes to the redo log, prepare the transaction, fsync (phase 1);
• Write the full transaction to the binary log, fsync (phase 2);
• Send the binary log to the slave and apply the same changes there.
[Figure: An Asynchronous Replication transaction. Timeline of one user application transaction (BEGIN, READ, WRITE, COMMIT). Components shown on the master: database, redo log, binary log, dump; on the slave: IO, relay logs, apply, redo log, binary log; each side has its own storage system. Legend: R - Read from disk, W - Write to disk, P - Prepare, S - Sync to disk.]

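The lifecycle above can be turned into a simple latency sum for the part the client waits for. This is a minimal sketch under stated assumptions: the `TxnCosts` fields, the function name and the sample numbers are hypothetical, the statements are assumed to run one after another, and the transfer to the slave is left out because asynchronous replication does not delay the commit.

```python
from dataclasses import dataclass


@dataclass
class TxnCosts:
    queries: int              # statements sent by the client, one round trip each
    client_rtt_ms: float      # client <-> server network round trip
    exec_ms_per_query: float  # server-side execution time per statement
    read_miss_ms: float       # time reading data that is not in the buffer pool
    redo_fsync_ms: float      # phase 1: write + fsync of the redo log
    binlog_fsync_ms: float    # phase 2: write + fsync of the binary log


def async_commit_latency_ms(c: TxnCosts) -> float:
    """Client-visible latency of one transaction (illustrative sketch)."""
    query_phase = c.queries * (c.client_rtt_ms + c.exec_ms_per_query)
    return query_phase + c.read_miss_ms + c.redo_fsync_ms + c.binlog_fsync_ms


if __name__ == "__main__":
    costs = TxnCosts(queries=3, client_rtt_ms=0.2, exec_ms_per_query=0.1,
                     read_miss_ms=0.0, redo_fsync_ms=1.0, binlog_fsync_ms=1.0)
    print(async_commit_latency_ms(costs))
```

In this example the two fsyncs account for most of the latency, which is why the durability settings matter so much at low thread counts.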
Group Replication
MySQL

Changes to the lifecycle:
• Group replication intercepts the transactions before they are committed and broadcasts them to the network;
• The Paxos protocol in GR requires at least 1 or 1.5 RTT to propagate the messages, although 2 RTT is more common;
• After that the certification takes over and the rest of the pipeline is similar to asynchronous replication.
[Figure: A Group Replication transaction with consistency=EVENTUAL. Timeline of one user application transaction (BEGIN, READ, WRITE, COMMIT). Components shown on the master: database, redo log, throttle (fc), certify, binary log; Group Replication (Paxos) in between; on the slave: relay log, certify, apply, redo log, binary log; each with its own storage system. Legend: R - Read from disk, W - Write to disk, P - Prepare, S - Sync to disk, B - Broadcast.]

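The extra cost that Group Replication adds at commit time can be sketched as a number of round trips between members on top of the local commit work. The function and parameter names below are assumptions for illustration; the default of 2 round trips follows the "more common" case mentioned above, and 1 or 1.5 can be passed for the best cases.

```python
def gr_commit_latency_ms(local_commit_ms, member_rtt_ms, paxos_round_trips=2.0):
    """Commit latency with Group Replication (illustrative sketch).

    local_commit_ms   -- local work at commit, e.g. redo log and binary log writes
    member_rtt_ms     -- network round trip time between group members
    paxos_round_trips -- round trips needed to propagate the messages
    """
    return local_commit_ms + paxos_round_trips * member_rtt_ms


if __name__ == "__main__":
    # Hypothetical values: 2 ms of local commit work, 0.5 ms between members.
    print(gr_commit_latency_ms(2.0, 0.5))        # common case, 2 RTT
    print(gr_commit_latency_ms(2.0, 0.5, 1.0))   # best case, 1 RTT
```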
GR + Synchronous writes
MySQL

Changes to the lifecycle:
• With transaction consistency = AFTER, the commit is only finished after all members have signaled that they have the transaction prepared on disk;
• That extends the transaction latency, but the apply is really synchronous.
[Figure: A Group Replication transaction with consistency=AFTER. Timeline of one user application transaction (BEGIN, READ, WRITE, COMMIT). Components shown on the master: database, redo log, throttle (fc), certify, binary log; Group Replication (Paxos) in between; on the slave: relay log, certify, apply, redo log, binary log; each with its own storage system. Legend: R - Read from disk, W - Write to disk, P - Prepare, S - Sync to disk, B - Broadcast.]

A sample write pattern
Sysbench workloads

MySQL disk throughput while running Sysbench
RW and Update Index on a bare metal machine
with local SSD.
Main observations:
• At load, the redo log takes 69% of disk write
bandwidth, the binary log takes 21% and the
database takes 10%;
• The total consumed write bandwidth is very
similar in both, at 180MB/s (RW) and
190MB/s (UI).

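To make the shares above concrete, the following sketch converts them into absolute write bandwidth. It assumes that the same 69/21/10 split applies to both workloads, which the slide does not state explicitly; the helper name is hypothetical.

```python
def split_bandwidth(total_mb_s, shares):
    """Split a total write bandwidth (MB/s) according to fractional shares."""
    return {name: round(total_mb_s * share, 1) for name, share in shares.items()}


# Shares of disk write bandwidth at load, as listed above.
SHARES = {"redo log": 0.69, "binary log": 0.21, "database": 0.10}

if __name__ == "__main__":
    print("RW:", split_bandwidth(180, SHARES))  # Sysbench Read-Write total
    print("UI:", split_bandwidth(190, SHARES))  # Sysbench Update Index total
```

Under that assumption the redo log alone writes roughly 124 MB/s in the Read-Write case, against about 38 MB/s for the binary log and 18 MB/s for the database files.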
Data written per transaction
Sysbench workloads

Main observations:
• The transaction size on the redo log and on the binary log is similar, with the redo log being larger by 7% to 10%;
• The additional write bandwidth for the redo log is for rewriting blocks as additional data is logged, but only part of these writes reach the disk;
• However, fsync’ing frequently forces the intermediate writes to be sent to disk, and that can happen multiple times per transaction.

Cloud Infrastructure

Deploying a database system in any Cloud infrastructure introduces a few
performance problems one must be aware of.
Some are actually new incarnations of old problems that the availability of
cheap, highly multi-core machines with flash storage was making us forget:
• storage is slow - network storage must always pay the network latency
penalty;
• computing power is expensive - every core used is now billed;
• network bandwidth is limited - it is split between the VMs in the server
and usually even storage needs to go through it.
But here we will focus more on network latency.

Latency is very important
Cloud Infrastructure

The main factor affecting the performance of
MySQL in a Cloud deployment is latency.
That latency affects the connections
between:
• the client and the server,
• the server and the storage,
• the server and its replicas in a replication
topology.
Transaction execution is mostly affected by the
latency to fsync the redo log and the binlog to disk.

Storage is also network
Cloud Infrastructure

Clouds usually support nodes with networked
storage and/or nodes with local storage.
• Local storage can provide very low latency,
but it is expensive and has a number of
limitations.
• Networked storage is less expensive, has
more features and is easier to manage, both
for users and for infrastructure
management.
Networked storage is much more common, and
it allows deployments to be simpler: there is
only one interface in and out of the computing
nodes, the network.

Bandwidth is also latency
Cloud Infrastructure

The available network bandwidth is also relevant
to the actual latency of a request:
• a reply can only leave the remote node when
the request message is fully received, so the
size of the packets must be considered;
• there is nothing similar to cut-through
routing for services.
Sharing the network bandwidth by multiple
services, particularly between serving client
requests and accessing storage, amplifies this
issue.

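The point about message size can be sketched as a transfer time added to the round trip: the remote service only starts replying once the whole request has arrived. The function name and the numbers are assumptions for illustration, and the sketch ignores protocol overhead and competing traffic.

```python
def request_latency_ms(rtt_ms, payload_bytes, bandwidth_gbit_s):
    """Latency of one request when the payload must be fully received first.

    rtt_ms           -- network round trip time for an empty message
    payload_bytes    -- size of the request message
    bandwidth_gbit_s -- bandwidth available to this request, in Gbit/s
    """
    transfer_ms = payload_bytes * 8 / (bandwidth_gbit_s * 1e9) * 1000.0
    return rtt_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical case: a 1 MB write over 1 Gbit/s with a 0.5 ms round trip.
    print(request_latency_ms(0.5, 1_000_000, 1.0))
    # The same write when only a quarter of the bandwidth is available.
    print(request_latency_ms(0.5, 1_000_000, 0.25))
```

When the link is shared and the available bandwidth drops to a quarter, the transfer part grows fourfold while the round trip stays the same, so bandwidth pressure shows up directly as latency.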
Estimating MySQL throughput
A transaction simulator

To help estimate the behavior, a small
simulator was built as a spreadsheet. It
includes parameters of the workload and of
the underlying computing infrastructure,
and generates simulations of how the
system could behave.
The charts that follow, marked with SIMULATION in the title, are an exercise around estimating the upper
bound on performance taking into account some characteristics of the underlying computing system.
The following charts do not result from actual measurements!

Different Sysbench Benchmarks
A transaction simulator

Observations:
• Different workloads imply different behaviors
from the system, so it is important to know
the characteristics of the workload;
• The Sysbench benchmarks presented
represent a read-write workload, a write-only
workload and a single-update workload, and the shape
is similar to the behavior found in practice.

Number of processor cores
A transaction simulator

Observations:
• When the computing effort per transaction is
low, the gain from going to larger machines
becomes smaller;
• As the latency from other factors dominates,
the gain is only realized when the number of
threads is very high.

Context switching impact
A transaction simulator

Observations:
• The context switching between threads is
expensive, and many threads become less
effective if the workload is CPU bound;
• As the number of threads grows the
performance may drop, and there is a
parameter to mimic some interference
between threads.

Impact of the distance to client
A transaction simulator

Observations:
• The distance between the clients and the
server can have a dramatic impact on the
throughput;
• That is mainly due to the synchronous
request/response model of the MySQL client
protocol.

Impact of the storage technology
A transaction simulator

Observations:
• The performance of the storage system is
critical to the throughput of MySQL;
• While the throughput can grow to the point
where the difference is less visible, at lower
thread counts the latency, in particular, is a
major factor.

Impact of the durability settings
A transaction simulator

Observations:
• The impact of durability is hidden if the
number of threads is high;
• But at lower thread counts, the impact is
larger, as the throughput curve is shifted to
the right;
• Again, this is similar to what is observed in
real systems.

Impact of Group Replication
A transaction simulator

Observations:
• Group Replication can be used to avoid using
fsyncs in the local storage, in practice
replacing local durability with network durability;
• Since the GR protocol depends only on two
RTTs, it may bring better performance than
writing to a distributed storage system.

Impact of GR member distance
A transaction simulator

Observations:
• The distance between members is very
significant;
• However, it applies once per transaction, and
only for those that write to disk, so the impact
is less than that of the client distance.

WAN latency: GR vs client
A transaction simulator

Observations:
• Comparing the impact of latency in the client
to server connection with the impact of having
group replication members with similar
latency, one can see that it is better to use GR;
• GR only adds latency as a single message at
commit time, instead of having the latency
impact all queries.

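The comparison above can be sketched by placing the same WAN latency either between the client and the server or between the group members. All names and numbers here are hypothetical, and the commit-time cost is modeled as 2 round trips between members, following the common case mentioned earlier for the Paxos protocol.

```python
def txn_latency_ms(queries, client_rtt_ms, member_rtt_ms,
                   local_ms=1.0, paxos_round_trips=2.0):
    """Transaction latency with Group Replication (illustrative sketch).

    Every statement pays the client round trip, while the distance between
    group members is only paid once, at commit time.
    """
    return queries * client_rtt_ms + paxos_round_trips * member_rtt_ms + local_ms


if __name__ == "__main__":
    # 10 statements per transaction, 10 ms WAN latency, 0.5 ms local latency.
    far_client = txn_latency_ms(queries=10, client_rtt_ms=10.0, member_rtt_ms=0.5)
    far_members = txn_latency_ms(queries=10, client_rtt_ms=0.5, member_rtt_ms=10.0)
    print(far_client, far_members)
```

With these inputs the remote client case takes 102 ms per transaction and the remote members case 26 ms, and the gap widens as the number of statements per transaction grows.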
Impact of the binary log
A transaction simulator

Observations:
• The impact of the binary log is two-fold: it
adds latency due to the write to disk and due
to the fsync to disk;
• Having lower durability allows the throughput
to follow the performance of the server
more closely, something that is even more relevant
in the Cloud, as sync time can be much larger.

Impact of the buffer pool size
A transaction simulator

Observations:
• The read operations cannot be hidden in the
transaction execution;
• If the size of the buffer pool is smaller than
the active data set, the probability of needing
to pay the penalty becomes larger;
• With a small buffer pool the effective use of
the computing resources is impaired.

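One simple way to picture the buffer pool effect is as an expected read penalty per transaction. The sketch below assumes uniformly random access over the active data set, so the hit ratio is just the fraction of the data set that fits in the buffer pool; real workloads are usually skewed and will behave better than this. The function name and inputs are hypothetical.

```python
def expected_read_penalty_ms(pages_per_txn, buffer_pool_gb, active_set_gb,
                             read_latency_ms):
    """Expected time per transaction spent reading pages from storage.

    Assumes uniform access, so hit ratio = buffer pool size / active set size,
    capped at 1.0 when the whole active set fits in memory.
    """
    hit_ratio = min(1.0, buffer_pool_gb / active_set_gb)
    return pages_per_txn * (1.0 - hit_ratio) * read_latency_ms


if __name__ == "__main__":
    # Hypothetical case: 10 pages per transaction, 0.5 ms per storage read.
    print(expected_read_penalty_ms(10, 32, 64, 0.5))  # half of the data fits
    print(expected_read_penalty_ms(10, 64, 32, 0.5))  # everything fits
```

Because these reads happen inside the transaction, the penalty adds directly to the transaction latency and leaves the cores idle while waiting.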
Impact of an intermediate router
A transaction simulator

Observations:
• Using an intermediate router adds latency to
all queries, so the effect is the same as
having a farther away client;
• While that impact can be reduced at higher
thread counts, the impact is significant if
there are not enough threads to hide the
latency.

Thank you

For more information check out
www.mysqlhighavailability.com
MySQL Replication Performance in the Cloud

  • 1. Performance Architect Oracle/MySQL/Replication Jan 31, 2020 Vítor Oliveira pre-FOSDEM 2020 MySQL Days MySQL Replication Performance in the Cloud
  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. Safe harbor statement Copyright © 2020, Oracle and/or its affiliates2
  • 3. Copyright © 2020, Oracle and/or its affiliates3 5 4 3 2 1 Conclusion A transaction simulator Benchmarking replication Can we estimate the behavior? Introduction Agenda
  • 4. Copyright © 2020, Oracle and/or its affiliates4 Evaluating the performance of a system properly is difficult, error prone and expensive, both in terms of time and of computing resources. • Performance benchmarks are a good tool to improve the software, to help design and size infrastructures where to deploy the software. • However, performance benchmarks alone are insufficient, we need to understand why the results are in such a way, so that information is of any use in different scenarios. Introduction
  • 5. Copyright © 2020, Oracle and/or its affiliates5 A problem for performance evaluation is knowing what is reasonable to expect from a particular workload in a particular deployment scenarios. • The limits of each component of the system, and how they are being reached, is unknown; • The cumulative effects of multiple changes may produce unexpected results; • And the software features to test may have not been built yet… In the presentation I will show some benchmarks related to running MySQL replication on the Oracle Cloud Infrastructure (OCI), and them try to provide some inner details that might help on the problems above. Introduction
  • 6. Copyright © 2020, Oracle and/or its affiliates6 We will start by presenting a performance evaluation of the MySQL Replication technologies running on the OCI, varying the shape type and some configuration parameters. The following testing setup was built: • Topologies with 3 servers on OCI, one in each AD, and a single client machine in the same AD as the primary server; • The benchmark selected was the usual Sysbench, in particular OLTP RW, OLTP Write-only and Update Index; • The machines selected where both a set of three OCI’s BM.Standard.E2.64 (64 core, 128 HW threads), and a set of VM.Standard.E2.8 (8 core, 128 HW threads). Benchmarking replication
  • 7. Copyright © 2020, Oracle and/or its affiliates7 Comparison between a single server and three servers in different ADs, using semi-sync and group replication. Two different type of servers: 1 VM with 8 cores and a full bare metal machine with 64 cores. Overview of replication performance Benchmarking replication 0 2000 4000 6000 8000 10000 12000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Update Index Throughput on OCI BM.E2 and VM.Standard.E2.8 Standalone vs Semi-sync vs Group Replication BM Standalone BM Semi-sync BM Group Replication E2.8 Standalone E2.8 Semi-sync E2.8 Group Replication 0 1000 2000 3000 4000 5000 6000 7000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Read-Write Throughput on OCI BM.E2 and VM.Standard.E2.8 Standalone vs Semi-sync vs Group Replication BM Standalone BM Semi-sync BM Group Replication E2.8 Standalone E2.8 Semi-sync E2.8 Group Replication 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Write-only Throughput on OCI BM.E2 and VM.Standard.E2.8 Standalone vs Semi-sync vs Group Replication BM Standalone BM Semi-sync BM Group Replication E2.8 Standalone E2.8 Semi-sync E2.8 Group Replication
  • 8. Copyright © 2020, Oracle and/or its affiliates8 Impact of relaxing the durability on the disks, transferring it to the network. The gain is significant, as the distributed storage system is more complex and with a higher latency than the Paxos from Group Replication. Impact of durability on throughput Benchmarking replication 0 1000 2000 3000 4000 5000 6000 7000 8000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Read-Write Throughput on OCI BM.Standard.E2 Standalone vs Semi-sync vs Group Replication, Durability Standalone Semisync Semi-nosync GR GR-nosync 0 2000 4000 6000 8000 10000 12000 14000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Write-only Throughput on OCI BM.Standard.E2 Standalone vs Semi-sync vs Group Replication, Durability Standalone Semisync Semi-nosync GR GR-nosync 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 1 2 4 8 16 32 48 64 96 128 192 256 512 Throughput(transactions/second) Number of client threads Sysbench Update Index Throughput on OCI BM.Standard.E2 Standalone vs Semi-sync vs Group Replication, Durability Standalone Semisync Semi-nosync GR GR-nosync
  • 9. What do we gain with these performance evaluation results? (Benchmarking replication). These benchmarks are representative of a particular workload in a particular testing scenario. When evaluating such benchmarks many questions pop up: • What explains those results? • Can I expect the same behavior for my workload in the same setup? • How does the performance compare to other cloud providers? • How does the performance compare to on-premises deployments? • Would other parameters be more effective for a particular purpose? • Were the measurements properly taken? • What can I reliably estimate using these results?
  • 10. Can we estimate the behavior? The proper way to benchmark a system is to deploy it in a real environment and apply the actual user workload to it, even if that is hard to do in many cases. Benchmarks can be used to capture three different things (often hard to distinguish): 1. The performance of the software implementation of certain features, as tested by a benchmark in a particular way; 2. The noise in the environment and other limitations of the test setup; 3. The fundamental architectural and configuration details of the computing system used in the tests. Wouldn't it be great if we had a model that represents different deployment scenarios and calculates a rough estimate of the throughput to expect?
  • 11. Building a simple model (Can we estimate the behavior?). While the software implementation is hard to model, the fundamental aspects of the computing systems that are used are not, at least in a very simplified way. This model can be built on information such as: • The architecture of MySQL and how it uses the computing resources; • The characteristics of the workload: number of queries in the workload, size of the active dataset compared to the memory size; • And the characteristics of the computing system: network latency and bandwidth, computing resources, number of network hops.
  • 12. Building a simple model (Can we estimate the behavior?). On the right is a really simple model for the execution of a transaction. It demonstrates the goal, but we need more details… [Diagram not included in this transcript.]
  • 13. Asynchronous Replication (MySQL). Lifecycle of a transaction: • Execute the queries as they are sent, read any data from the database that isn’t already in the buffer pool, write the changes to the redo log, prepare the transaction, fsync (phase 1); • Write the full transaction to the binary log, fsync (phase 2); • Send the binary log to the slave and apply the same changes there. [Diagram: an Asynchronous Replication transaction over time. Components shown: user application (one transaction: BEGIN, READ, WRITE, COMMIT); master (database, redo log, binary log, dump); slave (IO, relay logs, apply, redo log, binary log); one storage system per server. Legend: R = read from disk, W = write to disk, P = prepare, S = sync to disk.]
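The lifecycle above can be turned into a back-of-the-envelope estimate. The following is a minimal sketch, not the spreadsheet simulator from the talk: the class, the function names and every number are hypothetical, and it assumes that each query pays one client round trip, that both fsyncs sit on the critical path, and that group commit and thread interference are ignored.

```python
from dataclasses import dataclass


@dataclass
class TxnCosts:
    """Per-transaction costs in milliseconds (all values are hypothetical)."""
    client_rtt_ms: float    # round trip between client and server
    queries: int            # statements per transaction, one round trip each
    cpu_ms: float           # CPU time needed to execute the transaction
    read_ms: float          # time reading pages that are not in the buffer pool
    redo_fsync_ms: float    # phase 1: prepare and fsync of the redo log
    binlog_fsync_ms: float  # phase 2: write and fsync of the binary log


def txn_latency_ms(c: TxnCosts) -> float:
    """Latency of one transaction as the sum of its sequential phases."""
    return (c.queries * c.client_rtt_ms + c.cpu_ms + c.read_ms
            + c.redo_fsync_ms + c.binlog_fsync_ms)


def throughput_upper_bound(c: TxnCosts, threads: int, cores: int) -> float:
    """Upper bound in transactions/second: latency-bound or CPU-bound."""
    latency_bound = threads * 1000.0 / txn_latency_ms(c)
    cpu_bound = cores * 1000.0 / c.cpu_ms
    return min(latency_bound, cpu_bound)


if __name__ == "__main__":
    costs = TxnCosts(client_rtt_ms=0.1, queries=10, cpu_ms=0.5,
                     read_ms=0.0, redo_fsync_ms=1.0, binlog_fsync_ms=1.0)
    for threads in (1, 8, 64, 512):
        tps = throughput_upper_bound(costs, threads, cores=8)
        print(threads, round(tps))
```

With few threads the estimate is limited by latency, and with many threads it flattens at the CPU bound, which is the general shape the benchmark charts show.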
  • 14. Group Replication (MySQL). Changes to the lifecycle: • Group Replication intercepts the transactions before they are committed and broadcasts them to the network; • The Paxos protocol in GR requires at least 1 or 1.5 RTT to propagate the messages, although 2 RTT is more common; • After that the certification takes over and the rest of the pipeline is similar to asynchronous replication. [Diagram: a Group Replication transaction with consistency=EVENTUAL over time. Components shown: user application (one transaction: BEGIN, READ, WRITE, COMMIT); master (database, redo log, throttle (fc), certify, binary log); Group Replication (Paxos); slave (relay log, certify, apply, redo log, binary log); storage systems. Legend: R = read from disk, W = write to disk, P = prepare, S = sync to disk, B = broadcast.]
  • 15. GR + Synchronous writes (MySQL). Changes to the lifecycle: • With transaction consistency = AFTER, the commit is only finished after all members have signaled that they have the transaction prepared on disk; • That extends the transaction latency, but the apply is really synchronous. [Diagram: a Group Replication transaction with consistency=AFTER over time. Components shown: user application (one transaction: BEGIN, READ, WRITE, COMMIT); master (database, redo log, throttle (fc), certify, binary log); Group Replication (Paxos); slave (relay log, certify, apply, redo log, binary log); storage systems. Legend: R = read from disk, W = write to disk, P = prepare, S = sync to disk, B = broadcast.]
  • 16. A sample write pattern (Sysbench workloads). MySQL disk throughput while running Sysbench Read-Write (RW) and Update Index (UI) on a bare metal machine with local SSD. Main observations: • At load, the redo log takes 69% of the disk write bandwidth, the binary log takes 21% and the database takes 10%; • The total consumed write bandwidth is very similar in both workloads, at 180 MB/s (RW) and 190 MB/s (UI).
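As a worked example of the figures above, the percentages can be converted into absolute write bandwidth per component. This is only arithmetic on the numbers quoted on the slide; the function name is hypothetical, and applying the same split to both workloads is an assumption.

```python
# Share of disk write bandwidth per component, as quoted on the slide.
SHARES = {"redo log": 0.69, "binary log": 0.21, "database": 0.10}


def write_bandwidth_split(total_mb_s: float, shares=SHARES) -> dict:
    """Split a total write bandwidth (MB/s) according to the given shares."""
    return {name: total_mb_s * share for name, share in shares.items()}


if __name__ == "__main__":
    # Assumption: the same split applies to both workloads.
    for workload, total in (("RW", 180.0), ("UI", 190.0)):
        split = write_bandwidth_split(total)
        print(workload, {name: round(mb_s, 1) for name, mb_s in split.items()})
```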
  • 17. Data written per transaction (Sysbench workloads). Main observations: • The transaction size on the redo log and on the binary log is similar, with the redo log being larger by 7% to 10%; • The additional write bandwidth for the redo log comes from rewriting blocks as additional data is logged, but only part of these writes reach the disk; • However, fsync’ing frequently forces the intermediate writes to be sent to disk, and that can happen multiple times per transaction.
  • 18. Cloud Infrastructure. Deploying a database system in any Cloud infrastructure introduces a few performance problems one must be aware of. Some are actually new incarnations of old problems, which the availability of cheap, highly multi-core machines with flash storage was making us forget: • storage is slow - network storage must always pay the network latency penalty; • computing power is expensive - every core used is now billed; • network bandwidth is limited - it is split between the VMs in the server and usually even storage needs to go through it. But here we will focus more on network latency.
  • 19. Latency is very important (Cloud Infrastructure). The main factor affecting the performance of MySQL in a Cloud deployment is latency. That latency affects the connections between: • the client and the server, • the server and the storage, • the server and its replicas in a replication topology. Transaction execution is mostly affected by the latency to fsync the redo log and the binary log to disk.
  • 20. Storage is also network (Cloud Infrastructure). Clouds usually support nodes with networked storage and/or nodes with local storage. • Local storage can provide very low latency, but it is expensive and has a number of limitations. • Networked storage is less expensive, has more features and is easier to manage, both for users and for the infrastructure operators. Networked storage is much more common, and it makes deployments simpler: there is only one interface in and out of the computing nodes, the network.
  • 21. Bandwidth is also latency (Cloud Infrastructure). The available network bandwidth is also relevant to the actual latency of a request: • a reply can only leave the remote node when the request message is fully received, so the size of the packets must be considered; • there is nothing similar to cut-through routing for services. Sharing the network bandwidth between multiple services, particularly between serving client requests and accessing storage, amplifies this issue.
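The point that bandwidth turns into latency can be illustrated with a small calculation. This is a sketch under simplifying assumptions: one hop, store-and-forward of the whole message, no protocol overhead, and a hypothetical function name and parameter values.

```python
def request_latency_ms(rtt_ms: float, request_bytes: int, reply_bytes: int,
                       bandwidth_mbit_s: float) -> float:
    """Round-trip latency plus the time to push both messages onto the wire."""
    bytes_per_ms = bandwidth_mbit_s * 1e6 / 8 / 1000  # Mbit/s -> bytes per ms
    transfer_ms = (request_bytes + reply_bytes) / bytes_per_ms
    return rtt_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical case: writing one 16 KiB page to networked storage.
    for mbit in (10_000, 1_000, 100):
        latency = request_latency_ms(0.5, 16 * 1024, 0, mbit)
        print(f"{mbit} Mbit/s -> {latency:.3f} ms")
```

When the same link is shared between client traffic and storage traffic, the bandwidth available to each request is lower, so the transfer term grows even though the round-trip time is unchanged.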
  • 22. Estimating MySQL throughput (A transaction simulator). To help estimate the behavior, a small simulator was built as a spreadsheet that includes parameters of the workload and of the underlying computing infrastructure, and generates simulations of how the system could behave. The charts that follow, marked with SIMULATION in the title, are an exercise in estimating the upper bound on performance, taking into account some characteristics of the underlying computing system. The following charts do not result from actual measurements!
  • 23. Different Sysbench Benchmarks (A transaction simulator). Observations: • Different workloads imply different behaviors from the system, so it is important to know the characteristics of the workload; • The Sysbench benchmarks presented represent a read-write workload, a write-only workload and a single-update workload, and the shape of the simulated curves is similar to the behavior found in practice.
  • 24. Number of processor cores (A transaction simulator). Observations: • When the computing effort per transaction is low, the gain from going to larger machines becomes smaller; • As the latency from other factors dominates, the gain is only realized when the number of threads is very high.
  • 25. Context switching impact (A transaction simulator). Observations: • Context switching between threads is expensive, and adding many threads becomes less effective if the workload is CPU bound; • As the number of threads grows the performance may drop, and the simulator has a parameter to mimic some interference between threads.
  • 26. Impact of the distance to client (A transaction simulator). Observations: • The distance between the clients and the server can have a dramatic impact on the throughput; • That is mainly due to the synchronous request/response model of the MySQL client protocol.
  • 27. Impact of the storage technology (A transaction simulator). Observations: • The performance of the storage system is critical to the throughput of MySQL; • While the throughput can grow to the point where the difference is less visible, at lower thread counts the storage latency in particular is a major factor.
  • 28. Impact of the durability settings (A transaction simulator). Observations: • The impact of durability is hidden if the number of threads is high; • But at lower thread counts, the impact is larger, as the throughput curve is shifted to the right; • Again, this is similar to what is observed in real systems.
  • 29. Impact of Group Replication (A transaction simulator). Observations: • Group Replication can be used to avoid fsyncs to the local storage, in practice replacing local durability with network durability; • Since the GR protocol depends on only two RTT, it may bring better performance than writing to a distributed storage system.
  • 30. Impact of GR member distance (A transaction simulator). Observations: • The distance between members is very significant; • However, it applies once per transaction, and only to transactions that write to disk, so the impact is smaller than that of the client distance.
  • 31. WAN latency: GR vs client (A transaction simulator). Observations: • Comparing the impact of latency on the client-to-server connection with the impact of having Group Replication members at a similar latency, one can see that it is better to use GR; • GR only adds latency as a single message at commit time, instead of having the latency impact all queries.
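The comparison above can be sketched numerically. The function names and values below are hypothetical. The sketch assumes that a distant client pays the WAN round trip once per statement, and that distant GR members pay it once per transaction, multiplied by the number of Paxos round trips (defaulting to the 2 RTT that the Group Replication slide calls more common).

```python
def client_wan_penalty_ms(wan_rtt_ms: float, round_trips_per_txn: int) -> float:
    """Extra latency per transaction when the client is across the WAN."""
    return wan_rtt_ms * round_trips_per_txn


def gr_wan_penalty_ms(wan_rtt_ms: float, paxos_rtts: float = 2.0) -> float:
    """Extra latency per transaction when the GR members are across the WAN."""
    return wan_rtt_ms * paxos_rtts


def single_thread_tps(base_latency_ms: float, penalty_ms: float) -> float:
    """Transactions/second that one client thread can achieve."""
    return 1000.0 / (base_latency_ms + penalty_ms)


if __name__ == "__main__":
    wan_rtt, base = 20.0, 5.0  # hypothetical: 20 ms WAN RTT, 5 ms local latency
    far_client = client_wan_penalty_ms(wan_rtt, round_trips_per_txn=10)
    far_members = gr_wan_penalty_ms(wan_rtt)
    print("client across WAN:", round(single_thread_tps(base, far_client), 1), "tps")
    print("GR members across WAN:", round(single_thread_tps(base, far_members), 1), "tps")
```

Under these assumptions the GR penalty does not grow with the number of statements in the transaction, while the client penalty does.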
  • 32. Impact of the binary log (A transaction simulator). Observations: • The impact of the binary log is two-fold: it adds latency due to the write to disk and due to the fsync to disk; • Having lower durability allows the throughput to follow the performance of the server more closely, something that is even more relevant in the Cloud, as the sync time can be much larger.
  • 33. Impact of the buffer pool size (A transaction simulator). Observations: • The read operations cannot be hidden in the transaction execution; • If the size of the buffer pool is smaller than the active data set, the probability of needing to pay the read penalty becomes larger; • With a small buffer pool the effective use of the computing resources is impaired.
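The buffer pool effect can be approximated with an expected-value calculation. This is a rough sketch with hypothetical names and values. It assumes that pages of the active data set are accessed uniformly, so the miss probability is simply the fraction of the active set that does not fit in the buffer pool. Real workloads are usually skewed, which would lower the miss rate.

```python
def miss_probability(buffer_pool_gb: float, active_set_gb: float) -> float:
    """Chance that a page access misses the buffer pool (uniform access assumed)."""
    if active_set_gb <= 0:
        return 0.0
    return max(0.0, 1.0 - buffer_pool_gb / active_set_gb)


def expected_read_ms(pages_per_txn: int, buffer_pool_gb: float,
                     active_set_gb: float, read_latency_ms: float) -> float:
    """Expected time per transaction spent reading pages from storage."""
    p_miss = miss_probability(buffer_pool_gb, active_set_gb)
    return pages_per_txn * p_miss * read_latency_ms


if __name__ == "__main__":
    # Hypothetical: 16 GB active set, 10 page accesses per transaction,
    # 0.2 ms per read from storage.
    for pool_gb in (4, 8, 16, 32):
        print(pool_gb, "GB ->", round(expected_read_ms(10, pool_gb, 16, 0.2), 3), "ms")
```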
  • 34. Impact of an intermediate router (A transaction simulator). Observations: • Using an intermediate router adds latency to all queries, so the effect is the same as having a client that is farther away; • While that impact can be reduced at higher thread counts, it is significant if there are not enough threads to hide the latency.
  • 35. Thank you. For more information check out www.mysqlhighavailability.com