Hello
Managing MySQL Scale Through Consolidation
Percona Live 04/15/15
Chris Merz, @merzdba
DB Systems Architect, SolidFire
Enterprise Scale MySQL Challenges
• Many MySQL instances (10s-100s-1000s)
• Often 100s of GB or multi-TB range
• Capacity planning and resource management
• Quickly respond to changing requirements
• Ability to quickly scale in real-time
Confidently Planning for the Future
• Understand application data storage profile
• Predict growth trajectory for MySQL platform
• Capacity planning and resource management
– Compute Resources
– Memory Allocation
– Storage Resources
• Growth and scaling plan for application
• Public cloud TCO tipping-point plan
Planning for the Future: Compute
• Compute Resources
– What is contributing to total CPU consumption?
(query processing, i/o wait, virtualized steal?, etc)
– Is CPU utilization truly driven from mysqld
churning and processing data (rather than wait,
steal, network wait, etc)?
– What is the CPU growth rate for your systems?
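One way to answer the "what is contributing to CPU consumption?" question on Linux is to compare two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below is only illustrative: it assumes a kernel that reports the `steal` column, and the sample values are made up.

```python
# Field order of the aggregate "cpu" line in Linux /proc/stat.
# Values are cumulative ticks since boot.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return a dict of cumulative tick counters from a /proc/stat 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("line does not report all expected fields")
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time per category between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {k: 100.0 * v / total for k, v in delta.items()}


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    s1 = "cpu 1000 0 500 8000 400 0 0 100 0 0"
    s2 = "cpu 1600 0 700 8300 600 0 0 200 0 0"
    for name, pct in cpu_breakdown(s1, s2).items():
        print(f"{name:8s} {pct:5.1f}%")
```

A high `iowait` or `steal` share suggests the host is waiting on storage or the hypervisor rather than on mysqld doing useful work.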
Planning for the Future: Memory
• Memory Resources
– Memory bound?
– ‘Hot set’ of data fit in memory?
– Are queries optimized to only pull required
data?
– Indexes? Over indexed? Under indexed?
– When is the next RAM increase required?
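A rough check of whether the hot set fits in memory is the InnoDB buffer pool miss rate, computed from the `Innodb_buffer_pool_read_requests` and `Innodb_buffer_pool_reads` counters in `SHOW GLOBAL STATUS`. The sketch below assumes the two status samples were already fetched into dictionaries. The sample numbers are invented, and what counts as an acceptable miss rate depends on the workload.

```python
def buffer_pool_miss_rate(before, after):
    """Fraction of logical reads that went to disk between two status samples.

    Both arguments map status variable names to values, as returned by
    SHOW GLOBAL STATUS. The counters are cumulative since server start,
    so the difference between two samples is used.
    """
    requests = (int(after["Innodb_buffer_pool_read_requests"])
                - int(before["Innodb_buffer_pool_read_requests"]))
    disk_reads = (int(after["Innodb_buffer_pool_reads"])
                  - int(before["Innodb_buffer_pool_reads"]))
    if requests <= 0:
        return 0.0  # no read activity in the interval
    return disk_reads / requests


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    t0 = {"Innodb_buffer_pool_read_requests": "1000000",
          "Innodb_buffer_pool_reads": "1000"}
    t1 = {"Innodb_buffer_pool_read_requests": "2000000",
          "Innodb_buffer_pool_reads": "21000"}
    print(f"miss rate: {buffer_pool_miss_rate(t0, t1):.2%}")
```

A miss rate that climbs as the dataset grows is one signal that the next RAM increase is approaching.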
Planning for the Future: Storage
• Storage Resources
– Disk resource heavy?
– IOPS bound?
– Max IOPS ceiling for disk configuration?
– Disk latencies within acceptable tolerance?
– Sequential reads, large chunk processing?
– More random in nature? Disk heads thrashing?
– Is flash memory required?
– Disk usage consumption rate?
– When is the next storage addition required?
– How will you add that capacity?
– Extend existing filesystem natively? (xfs_growfs)
– Filesystem-per-database strategy (symlinks)?
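The "when is the next storage addition required?" question can be estimated from collected disk usage samples. The sketch below fits a straight line through (day, used GB) points and projects when usage will reach a chosen fraction of capacity. Linear growth and the 80% threshold are assumptions for illustration, not recommendations from the talk.

```python
def growth_rate_per_day(samples):
    """Least-squares slope (GB/day) through (day, used_gb) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    return sxy / sxx


def days_until_full(samples, capacity_gb, headroom=0.8):
    """Days from the last sample until usage reaches headroom * capacity.

    Returns None when usage is flat or shrinking.
    """
    rate = growth_rate_per_day(samples)
    if rate <= 0:
        return None
    _, last_used = samples[-1]
    remaining = capacity_gb * headroom - last_used
    return max(0.0, remaining / rate)


if __name__ == "__main__":
    # Invented usage history: (day number, GB used).
    history = [(0, 500), (30, 560), (60, 620)]
    print(days_until_full(history, capacity_gb=1000))
```

The same projection works for RAM or IOPS if those metrics are collected over time.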
Planning for the Future: Topology
• MySQL topology scaling plan
– Master cluster servers (Percona XtraDB Cluster, Galera)
– Master shard servers (Homegrown, ClustrixDB, etc)
– Slave servers (read-heavy environments)
– DevTest instances that require prod db copies
– Reporting and Data Warehouse instances
– Public vs Private Cloud
– Orchestration, OpenStack, DBaaS: Trove
Instrument and Gather
• System Performance Data is Essential
– Quantify change rate over time
– Monitor every layer, from app to storage
– Zabbix, Zenoss, Munin, Cacti, Nagios, Graphite
– Critical to real-time troubleshooting
“Trust in God, all others bring Data”
Larger Data Sets == Larger Challenges
• Automation increasingly important
• Leverage software defined infrastructure
• Quickly react to increase performance
• Deploy additional slaves for read scaling
• Refresh DevTest, QA, Business copies
• Improve Backup and Restore times
Leverage Point: Pivot on the Storage Layer
• Invert the Dominant Paradigm
• Data Gravity and Storage Jiu-jitsu
• Move the platform around the data
• Modern shared storage capabilities
– Snap/clone, writable snapshots
– De-duplication, QoS allocation, Scale-out
• Efficient use of storage resources
Shared Storage: MySQL Ops Secret Weapon
• Real-time resource allocation
– Extend volume capacity
– Designate Min/Max/Burst IOPS
• Dev/Test secondary copies
– Deployment for growing teams
– Refreshes for faster iterations
• Replication slave creation
– MySQL read arrays
– HA warm standby copies
• Decrease/eliminate backup windows
• Accelerate restore scenarios
Real-time resource allocation
• Modern storage virtualizes resources
• Allows for dynamic allocation
• Increase capacity on the fly
• Change QoS settings (Min/Max/Burst IOPS)
• Scale ‘up’ instantly without lead time
• Scale out – horizontal storage growth
Deploy Secondary Copies in Seconds
• Snapshot the prod MySQL storage volume
• Create a volume clone from the snapshot
• Attach the cloned volume to a MySQL VM
• service mysqld start
• For a 1TB dataset: ~6 hrs -> ~90 seconds
Deploy Replication Slaves in Seconds
• Flush the target master, SHOW MASTER STATUS
• Snapshot the prod MySQL storage volume
• Create a volume clone from the snapshot
• Mount the cloned volume to a MySQL instance
• Increment server_id; remove auto.cnf
• Script in the CHANGE MASTER config
• service mysqld start
• start slave
• For a 1TB dataset: ~9+ hrs -> ~100 seconds
Decrease or Eliminate Backup Windows
• Leverage instant snapshots
• Creates crash consistent backups
• Zero impact to production performance
• Multiple volumes? Group snapshot
• Suitable for many use cases
• Efficient storage utilization (metadata only)
Accelerate Restore Scenarios
• Snapshot Backups are crash consistent
• Ideal for time-sensitive restore operations
• Snapshots applied to volumes instantly
• Revert in seconds:
– Stop mysqld
– Unmount /var/lib/mysql
– Restore storage volume from snapshot (instant)
– Remount /var/lib/mysql
– Start mysqld
• Key: block storage metadata manipulation
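The revert sequence above can be sketched as an ordered plan that is printed by default and only executed on request. The snapshot rollback itself is vendor-specific, so `storage-cli rollback ...` below is a hypothetical placeholder, not a real command. The `mount` and `umount` calls assume `/var/lib/mysql` has an `/etc/fstab` entry.

```python
import shlex
import subprocess

# Hypothetical placeholder: replace with your storage system's real rollback call.
PLACEHOLDER_RESTORE = ["storage-cli", "rollback", "--volume", "VOLUME",
                       "--snapshot", "SNAPSHOT"]


def revert_plan(datadir="/var/lib/mysql", restore_cmd=None):
    """Return the ordered commands for reverting a MySQL volume to a snapshot."""
    restore = list(restore_cmd) if restore_cmd else list(PLACEHOLDER_RESTORE)
    return [
        ["service", "mysqld", "stop"],   # stop writes before touching the volume
        ["umount", datadir],             # release the filesystem
        restore,                         # storage-side snapshot rollback
        ["mount", datadir],              # remount (mount point looked up in fstab)
        ["service", "mysqld", "start"],  # InnoDB runs crash recovery on startup
    ]


def run(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # abort on the first failure


if __name__ == "__main__":
    run(revert_plan())  # dry run: prints the steps without executing them
```

Keeping the plan as data makes it straightforward to review the steps or log them before anything touches production.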
Managing Scale Through Consolidation
• Enterprise/Web Scale MySQL: new set of challenges
• Understanding growth patterns is essential
• Virtualization and Orchestration ecosystems
• Data Gravity requires a capable storage layer
• Pivot on storage to avoid data transfer
Thank You
Come visit us at Booth #211
We’re also hiring…

Editor's Notes

  • #3: When you’re using MySQL in production, at scale in the enterprise, you’re usually not dealing with 100 MB databases (unless you’re a service provider of some sort, maybe SaaS such as WordPress). When datasets get large, into the 100s of GB or multi-TB range, you end up managing a new class of challenges, which broadly include 1) planning for the future (scaling your MySQL platform resources over time), and 2) quickly responding to the needs of the present (scaling and re-allocating your system resources in real time).
  • #4: Planning confidently requires understanding your application data storage profile and being able to predict, as accurately as possible, the growth trajectory for your MySQL systems. This requires knowing your resource consumption profiles for compute, memory, and storage.
  • #5: Compute resources: What is contributing to CPU consumption (query processing, I/O wait, virtualized steal, etc.)? Is CPU consumption truly from mysqld churning and processing data (rather than wait, steal, network wait, etc.)? What is the CPU growth rate for your systems?
  • #6: Are you memory bound? Does the hot set of data fit in memory? Are your application queries optimized to accurately target the required information, rather than pulling unnecessary data into memory (select * vs select 1)? How about indexes? Are you over indexed? Under indexed? Given the data growth rate, when is the next RAM increase required to avoid serious performance degradation?
  • #7: Are you disk resource heavy? Are you IOPS bound? What is the max IOPS ceiling for your disk configuration? Are disk latencies within an acceptable range for the required query and application response time? Does your application lean on sequential reads, such as range queries or large chunk processing? Is the query profile more random in nature? If so, are your disk heads thrashing (if on spinning disk)? Is flash memory indicated for your dataset and application needs? What is your disk usage consumption rate? When is the next storage resource addition required? How will you add that capacity? Are you going to extend the existing filesystem via LVM? Can you extend the existing filesystem natively (xfs_growfs)? Do you use a filesystem-per-database strategy (symlinks)?
  • #8: At a higher level, do you have a growth rate and topology scaling plan for:
    – Master cluster servers (via technologies such as Percona XtraDB Cluster or Galera)
    – Master shard servers (via homegrown or MySQL extension technologies such as ClustrixDB)
    – Slave servers (used in read-heavy environments)
    – DevTest instances that require copies of production datasets
    – Reporting instances for consolidated warehouse querying by various departments
    Even if you’re running in the public cloud, such as AWS, there comes a point in the growth curve where you have to evaluate the cost of not bringing servers back in-house and creating your own private cloud to control costs. True dedicated performance in the public cloud understandably costs considerably more than oversubscribed, best-effort shared resource pools (example: pIOPS database storage volumes).
  • #9: All of these things require you to understand your systems’ current resource consumption profiles and to be able to quantify the change rate over time, both in relation to other growth factors such as application end-users (for every million users of our platform we add, we’ll require X additional resources for the MySQL layer) and in relation to internal company growth (for every 10 developers we add, we’ll require Y additional MySQL resource pools). Gathering and accessing this data absolutely requires the ability to monitor every layer of your platform stack, and there are a great number of monitoring tools that you can use to instrument your environment in order to collect, graph, and analyze this data over time: Nagios + Graphite, Zabbix, Zenoss, Munin, Cacti, and many more. The most important point is that you capture this data over time and reserve time to analyze and understand growth trends. These tools are also invaluable for real-time troubleshooting, allowing you to zero in on and pinpoint root cause during incidents. Incident data can also be used to set up predictive and preventative alerting to avoid outages before they happen. This type of tooling, data, and insight, in turn, enables you to plan confidently and to be responsive to the needs of the present, but as datasets get larger, so do the operational challenges.
  • #10: DBAs and DevOps teams are faced with a specific class of problems when MySQL datasets move toward the TB and multi-TB range, or when the scale of various copies of the system exceeds the cost-benefit of manual operations. Automation becomes an increasingly important dimension of the MySQL landscape. When very large databases (VLDBs) are in play, the time it takes to copy or manipulate the corresponding data volumes becomes a major operational hindrance, and a whole host of strategies and tactics need to be employed to reduce or eliminate the performance impact and downtime that would otherwise be required during maintenance operations. Operations staff touch-time requirements can also be minimized with various tools and techniques.
    – Quickly respond to requirements to increase performance. In virtualized environments, CPU and RAM allocations can be added quickly (though a restart is likely required). Storage capacity can be added, or storage allocations can be reconfigured to provide additional throughput. In advanced storage systems, you can even dial up the IOPS allocations for a MySQL data volume on the fly, allowing more throughput to the database in real time, no restart required.
    – Deploy additional read-slave copies quickly
    – Refresh Dev/Test, QA, and business copies of databases
    – Improve backup and restore times
    As your enterprise grows, your data gravity increases. This comes along with additional customers, end-users, and organizational growth. This increased data mass can mean decreasing agility and drag on your business in the form of increased MySQL platform management complexity, longer response times for provisioning database systems, and delays in refreshing business reporting and data warehouse systems, among other challenges.
  • #11: As compute platforms get more flexible, both in the public cloud and in private clouds, it becomes increasingly possible to turn the traditional model of database platform management on its head. Rather than seeing the data layer as the child or property of the compute layer, make the compute layer an attribute of the dataset. With data gravity and ‘stickiness’ being the operational bottleneck, you can pivot on your storage layer: the capabilities of modern shared storage systems, combined with virtualization, allow you to flip that model. I refer to this as the technique of Storage Jiu-jitsu. Some of the capabilities include instant snapshots and quick volume cloning, or writable snapshots; data de-duplication for very efficient use of high-performance shared storage platforms; and QoS (Quality of Service) allocation to segregate performance resources between storage volumes, allowing you to put production MySQL database systems on the same storage array as DevTest and other secondary systems without fear of resource conflict (commonly known as the noisy neighbor effect). Scale-out architectures are also a great development in storage, allowing for seamless storage growth without the need for huge migrations and massive capital poured into system refreshes. These systems are all designed to make efficient use of storage resources and avoid the need to over-provision. The key here is to avoid using the network to copy and transfer data and instead lean on the native capabilities of the underlying storage layer.
  • #12: Let’s take a look at some examples of how this can be done in practice. Here are some of the operational models we use at SolidFire for managing our Percona Server MySQL platforms that have been completely reimagined and retooled to leverage next-generation storage capabilities: Dev/test secondary copies, replication slave creation, backup windows, and restore scenarios.