Sharding MySQL with Vitess
Harun Küçük
What is Sharding?
Sharding is a type of database partitioning that separates very large databases
into smaller, faster and more easily managed parts called data shards.
Why do we need Sharding?
Sample Traditional MySQL Replication
• Scalable App Layer
• Non-Scalable Master
• Scalable Replicas
Vitess
• Started in 2010 at YouTube (https://vitess.io/)
• Open source since 2011 (https://github.com/vitessio/vitess)
• Incubating project in CNCF (https://www.cncf.io/projects/)
Vitess Architecture: VTGate
• Lightweight proxy server
• Routes traffic to correct vttablet
• Returns consolidated results back to the clients
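A minimal Python sketch of the routing idea above. It is not Vitess code: the Gate and Tablet classes, the shard names and the modulo placement rule are all made up for illustration. When the sharding key is known the query goes to one tablet, otherwise it goes to every tablet and the rows are merged before they are returned.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class Tablet:
    """Hypothetical stand-in for one vttablet and the MySQL behind it."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def select(self, where: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.rows if where(row)]


class Gate:
    """Hypothetical stand-in for the proxy: pick tablets, merge results."""

    def __init__(self, tablets: Dict[str, Tablet],
                 shard_for: Callable[[int], str]) -> None:
        self.tablets = tablets
        self.shard_for = shard_for

    def select(self, where: Callable[[Row], bool],
               customer_id: Optional[int] = None) -> List[Row]:
        if customer_id is not None:
            # Sharding key known: only one tablet has to answer.
            targets = [self.tablets[self.shard_for(customer_id)]]
        else:
            # Sharding key unknown: ask every tablet.
            targets = list(self.tablets.values())
        merged: List[Row] = []
        for tablet in targets:
            merged.extend(tablet.select(where))
        return merged


gate = Gate(
    tablets={
        "s0": Tablet([{"customer_id": 2, "name": "Ann"}]),
        "s1": Tablet([{"customer_id": 1, "name": "Bob"}]),
    },
    shard_for=lambda customer_id: f"s{customer_id % 2}",  # toy placement rule
)
print(gate.select(lambda r: r["customer_id"] == 1, customer_id=1))
print(len(gate.select(lambda r: True)))
```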
Vitess Architecture: VTTablet
• Proxy server that sits in front of a MySQL instance
• Protects MySQL from harmful queries
• Connection pooling
• Query rewriting
• Hot row protection
Vitess Architecture: Topology
• Stores metadata (running servers, sharding schema, replication graph)
• etcd, Apache ZooKeeper or Consul can be used for the topology
Vitess Architecture: vtctl and vtctld
• vtctl is a command-line tool; vtctld is an HTTP server that lets
you browse the information stored in the topology.
Vitess Architecture: Tablet Types
• Replica tablets: candidates for the master tablet
• Rdonly tablets: for batch jobs, resharding, big data, backups, etc.
Vitess Key Adopters
• Started in 2010 at YouTube and has been serving all YouTube
database traffic since 2011. YouTube had 256 shards, and each
shard had between 80 and 120 replicas across 20 datacenters
all around the world (approx. 25,600 instances at 100 replicas per shard).
• JD.com is the second-largest retailer in China. JD.com has
more than 10,000 instances (masters, replicas) in Vitess on a
Kubernetes cluster.
• Square Cash App runs fully on Vitess. Square has more than 64
shards.
• Slack has migrated 40% of its database traffic to Vitess, and the goal is
100%.
• All of Pinterest's advertising campaign management runs
on Vitess.
Example: Sakila DVD Rental Company Database
Let's suppose we have a DVD rental company and our database diagram looks like the one below.
[Figure: table row counts and the query on Day 1, 30 days later, 6 months later and 2 years later]
Entity Group for Sharding
The Payment, Rental and Customer tables all have a customer_id column,
so these three tables should be sharded horizontally by customer_id.
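A small Python sketch of why the entity group matters. The rows and the modulo placement rule are invented for illustration (Vitess uses a vindex, not modulo). Because all three tables are placed by the same customer_id, every row that belongs to one customer ends up on the same shard, so joins within the group stay local to that shard.

```python
from collections import defaultdict

SHARD_COUNT = 2


def shard_for(customer_id: int) -> int:
    # Toy placement rule for illustration; Vitess uses a vindex instead.
    return customer_id % SHARD_COUNT


# Hypothetical sample rows, keyed by the table they belong to.
rows = [
    ("customer", {"customer_id": 7}),
    ("rental", {"rental_id": 100, "customer_id": 7}),
    ("payment", {"payment_id": 500, "customer_id": 7}),
    ("customer", {"customer_id": 8}),
    ("rental", {"rental_id": 101, "customer_id": 8}),
]

# shard number -> table name -> rows stored there
shards = defaultdict(lambda: defaultdict(list))
for table, row in rows:
    shards[shard_for(row["customer_id"])][table].append(row)

for shard, tables in sorted(shards.items()):
    print(shard, {table: len(found) for table, found in tables.items()})
```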
Step 1: Vertical Sharding
Step 2: Horizontal Sharding
Step 3: Resharding
Prerequisites for Demo
• Vitess (https://github.com/vitessio/vitess)
• Kubernetes cluster or Minikube
• Etcd operator for the topology cluster
• Helm for the Vitess Helm charts (vitess/helm/vitess)
• NFS client provisioner for persistent NFS volumes
(https://github.com/helm/charts/tree/master/stable/nfs-client-provisioner)
1.) Initiate New Cluster: Concepts
File: initial_cluster.yaml
• Cell: zone, availability zone, datacenter
• Keyspace: logical database
• VSchema: Vitess schema, contains
metadata about how tables are organized
across keyspaces and shards
• Vindex: index used to find the shard for a row
2.) Create New Keyspace
File: create_keyspace.yaml
3.) Split Keyspace Schema
File: split_keyspace_schema.yaml
4.) Vertical Split Clone
File: vertical_split_clone.yaml
4.) Migrate
Files: migrate_readonly_replica.yaml, migrate_master.yaml
4.) Migrate ReadOnly and Replica
4.) Migrate Master
5.) Drop Blacklisted Tables
VTGate Query Routing
Horizontal Sharding
6.) Primary Vindex for Horizontal Sharding
File: create_vindex.yaml
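The contents of create_vindex.yaml are not reproduced here. As a hedged sketch, a VSchema that gives the three entity-group tables a hash-based primary vindex on customer_id commonly looks like the structure below. The vindex name "hash" and the lower-case table names are assumptions, and the Python code only builds and prints the JSON.

```python
import json

# Assumed table names, taken from the entity group (Payment, Rental, Customer).
ENTITY_GROUP_TABLES = ("customer", "rental", "payment")

vschema = {
    "sharded": True,
    # One vindex definition, here named "hash", of the built-in type "hash".
    "vindexes": {"hash": {"type": "hash"}},
    # Every table in the entity group uses customer_id as its primary vindex column.
    "tables": {
        table: {"column_vindexes": [{"column": "customer_id", "name": "hash"}]}
        for table in ENTITY_GROUP_TABLES
    },
}

print(json.dumps(vschema, indent=2))
```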
Sharding by Hash Function
[Figure: sample IDs 1000 to 1007 distributed across shards by a hash function]
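A short Python sketch of sharding by a hash function, using the sample IDs 1000 to 1007. The MD5-based hash and the shard count of 4 are stand-ins chosen for illustration; this is not the hash Vitess uses.

```python
import hashlib


def shard_index(row_id: int, shard_count: int) -> int:
    """Map an ID to a shard number with a stand-in hash (not the Vitess hash)."""
    digest = hashlib.md5(str(row_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 4  # assumed for the example
buckets = {shard: [] for shard in range(SHARD_COUNT)}
for row_id in range(1000, 1008):
    buckets[shard_index(row_id, SHARD_COUNT)].append(row_id)

for shard, ids in buckets.items():
    print(f"shard {shard}: {ids}")
```

Neighbouring IDs do not have to land on neighbouring shards, which is what spreads sequential inserts across the cluster.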
Vitess Hash Algorithm (3des hash)
4000 = 4000000000000000
8000 = 8000000000000000
Vitess Hash Based Sharding (3des hash)
2^64 combinations, which in theory allows a practically unlimited number
of shards
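A Python sketch of range-based routing on a 64-bit hash value. Shard bounds such as 4000 and 8000 are hex prefixes that are right-padded to 8 bytes, as in "4000 = 4000000000000000". The three-shard layout and the helper names are assumptions made for the example.

```python
from typing import List, Tuple


def pad_bound(hex_prefix: str) -> int:
    """Right-pad a hex shard bound to 8 bytes: '4000' -> 0x4000000000000000."""
    return int(hex_prefix.ljust(16, "0"), 16)


def parse_shard(name: str) -> Tuple[int, int]:
    """Turn a shard name like '-4000' or '4000-8000' into [low, high)."""
    start, end = name.split("-")
    low = pad_bound(start) if start else 0
    high = pad_bound(end) if end else 2**64
    return low, high


def shard_for(hashed_id: int, shard_names: List[str]) -> str:
    """Return the shard whose range contains the 64-bit hashed ID."""
    for name in shard_names:
        low, high = parse_shard(name)
        if low <= hashed_id < high:
            return name
    raise ValueError("no shard covers this value")


SHARDS = ["-4000", "4000-8000", "8000-"]  # assumed layout
print(hex(pad_bound("4000")))
print(shard_for(0x4000000000000000, SHARDS))
```

Splitting a shard only means choosing a new bound inside its range, so rows in the other ranges do not move.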
7.) Create Horizontal Shards
File: create_horizontal_shards.yaml
7.) Horizontal SplitClone
File: horizontal_split_clone.yaml
8.) Migrate RdOnly, Replica and Master
Files: 13_migrate_readonly_replica.yaml, 14_migrate_master.yaml
Querying Vitess Cluster - Counts
Querying Vitess Cluster - Routing
Querying Vitess Cluster – @replica, @rdonly
Querying Vitess Cluster – Cross Shard Join
Query 1 = Query 2 + N * Query 3
N = row count of the 2nd query's result
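A Python sketch of the "one query plus N more" pattern behind a cross-shard join. The queries from the original slides are not reproduced here, so the rental and inventory rows, the class and the method names are hypothetical. The outer query is sent once, and the inner query is sent once for every row that the outer query returns.

```python
from typing import Dict, List


class CountingBackend:
    """Hypothetical backend that counts how many queries it receives."""

    def __init__(self, rentals: List[dict], inventory: Dict[int, dict]) -> None:
        self.rentals = rentals
        self.inventory = inventory
        self.queries_sent = 0

    def rentals_of(self, customer_id: int) -> List[dict]:
        self.queries_sent += 1  # the outer query, sent once
        return [r for r in self.rentals if r["customer_id"] == customer_id]

    def inventory_row(self, inventory_id: int) -> dict:
        self.queries_sent += 1  # the inner query, sent once per outer row
        return self.inventory[inventory_id]


def nested_loop_join(db: CountingBackend, customer_id: int) -> List[dict]:
    joined = []
    for rental in db.rentals_of(customer_id):
        joined.append({**rental, **db.inventory_row(rental["inventory_id"])})
    return joined


db = CountingBackend(
    rentals=[
        {"rental_id": 1, "customer_id": 7, "inventory_id": 10},
        {"rental_id": 2, "customer_id": 7, "inventory_id": 11},
        {"rental_id": 3, "customer_id": 7, "inventory_id": 12},
        {"rental_id": 4, "customer_id": 8, "inventory_id": 10},
    ],
    inventory={
        10: {"inventory_id": 10, "film_id": 100},
        11: {"inventory_id": 11, "film_id": 101},
        12: {"inventory_id": 12, "film_id": 102},
    },
)
result = nested_loop_join(db, customer_id=7)
print(len(result), db.queries_sent)
```

With N = 3 rentals for the customer, the backend receives 1 + 3 queries, so the cost grows with the size of the first result.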
VReplication
Files: copy_schema_shard.sh, vreplication.sh, apply_routing_rules.sh, add_reference_table.sh
Querying Vitess Cluster – Cross Shard Join, VReplication
Querying Vitess Cluster – Scatter Query
LookupVindex
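A Python sketch of what a lookup vindex buys over a scatter query. The idea is a side table that maps a non-sharding column to the 64-bit value that decides the shard, so a query on that column can target one shard. The email column, the lookup contents, the two-shard layout and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

ALL_SHARDS = ["-80", "80-"]  # assumed two-shard layout


def shard_of(keyspace_id: int) -> str:
    """Lower half of the 64-bit space goes to '-80', upper half to '80-'."""
    return "-80" if keyspace_id < 0x8000000000000000 else "80-"


def target_shards(email: str,
                  lookup: Optional[Dict[str, int]] = None) -> List[str]:
    if lookup is None:
        # No lookup available for this column: scatter to every shard.
        return list(ALL_SHARDS)
    keyspace_id = lookup.get(email)
    if keyspace_id is None:
        # Sketch choice: an unknown value targets no shard at all.
        return []
    return [shard_of(keyspace_id)]


# Hypothetical lookup table with arbitrary sample values.
email_lookup = {
    "ann@example.com": 0x166B40B44ABA4BD6,
    "bob@example.com": 0xD2FD8867D50D2DFE,
}

print(target_shards("ann@example.com"))                # scatter
print(target_shards("ann@example.com", email_lookup))  # single shard
```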
Drawbacks: Application Transactions
• Best Effort Commit -> Consistency Problems
• 2 Phase Commit -> Performance Cost
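A toy Python simulation of the trade-off above. It is not how Vitess implements either mode: the Shard class, its states and the healthy flag are invented, and a real two-phase commit also has to handle failures during the commit phase. Best effort commits shard by shard, so a failure part-way leaves earlier shards committed. Two-phase commit adds a prepare round to every shard before anything is committed.

```python
from typing import List


class Shard:
    """Hypothetical shard that fails any request when it is unhealthy."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.state = "open"

    def prepare(self) -> None:
        if not self.healthy:
            raise RuntimeError(f"{self.name}: prepare failed")
        self.state = "prepared"

    def commit(self) -> None:
        if not self.healthy:
            raise RuntimeError(f"{self.name}: commit failed")
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def best_effort_commit(shards: List[Shard]) -> None:
    for shard in shards:
        shard.commit()  # a failure here leaves earlier shards committed


def two_phase_commit(shards: List[Shard]) -> bool:
    try:
        for shard in shards:
            shard.prepare()  # extra round trip: the performance cost
    except RuntimeError:
        for shard in shards:
            shard.rollback()
        return False
    for shard in shards:
        shard.commit()
    return True


a, b = Shard("a"), Shard("b", healthy=False)
try:
    best_effort_commit([a, b])
except RuntimeError as err:
    print("best effort:", err, "->", a.state, b.state)

c, d = Shard("c"), Shard("d", healthy=False)
print("2PC committed:", two_phase_commit([c, d]), "->", c.state, d.state)
```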
Drawbacks: Distributed Deadlocks
Shard Level Deadlock
Drawbacks: Distributed Deadlocks
Database Level Deadlock
Application Transactions
Q&A
github.com/vitessio/vitess
vitess.io
vitess.slack.com
Thank You
