How people build software!
MySQL Infrastructure Testing Automation @ GitHub
Ike Walker
GitHub
Boston MySQL Meetup
December 11, 2017
Agenda
• About
• MySQL @ GitHub
• Automation
• Backup/restores
• Failovers
• Schema migrations
About me
• Database Architect
• Working with MySQL since 2006
• Organizer of Boston MySQL Meetup
github.com/ikewalker
@iowalker
GitHub
• The world’s largest Octocat T-shirt and stickers store
• And water bottles
• And hoodies
• We also do stuff related to things
• Word is new swag is coming up
GitHub
• 66M repositories
• 24M developers
• 117K businesses
• More than a million teams
• World’s largest open source hosting
• Alexa top 100
• Critical path in build flows
MySQL at GitHub
• GitHub stores repositories in git and uses MySQL as the backend database for all related metadata:
• Repository metadata, users, issues, pull requests, comments, etc.
• Website/API/Auth/more all use MySQL.
• We run a few (and a growing number of) clusters, totaling over 100 MySQL servers.
• The setup isn’t very large, but it is very busy.
MySQL at GitHub
• Our MySQL servers must be available, responsive, and in a good state
• GitHub has a 99.95% SLA
• Availability issues must be handled quickly and as automatically as possible.
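To put the 99.95% figure in perspective, here is a small arithmetic sketch of the downtime budget it implies. The slides do not say over which period the SLA is measured, so the yearly and 30-day windows below are assumptions, and `allowed_downtime_minutes` is a hypothetical helper name.

```python
def allowed_downtime_minutes(sla_percent, period_hours):
    """Minutes of downtime permitted by an availability target over a period."""
    return (1 - sla_percent / 100) * period_hours * 60


# Assumed measurement windows: one 365-day year and one 30-day month.
per_year = allowed_downtime_minutes(99.95, 365 * 24)
per_month = allowed_downtime_minutes(99.95, 30 * 24)

print(round(per_year / 60, 2), "hours per year")
print(round(per_month, 1), "minutes per 30-day month")
```

That works out to roughly 4.38 hours per year, or about 21.6 minutes per 30-day month, which leaves little room for slow manual recovery.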
github/database-infrastructure
• @ggunson, @jessbreckenridge, @jonahberquist,
@shlomi-noach, @tomkrouper, @gtowey
• Concerned with:
• Data availability
• Data integrity
Testing
Backups/restores
that ^
Your data
It’s important
Restores
• Dedicated restore servers.
• One per cluster.
• Continuously restores, catches up with replication,
restores, catches up with replication, restores, …
• Sending a “success” event at the end of each cycle.
• We monitor for number of “success” events in past
24-ish hours, per cluster.
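The monitoring rule above can be sketched as a count of recent "success" events per cluster. This is a minimal illustration, not GitHub's actual monitoring: the event format, the cluster names, the function name `clusters_missing_restores`, and the threshold of one success per window are all assumptions.

```python
from datetime import datetime, timedelta


def clusters_missing_restores(events, clusters, now,
                              window=timedelta(hours=24), minimum=1):
    """Return clusters with fewer than `minimum` success events in the window.

    `events` is an iterable of (cluster_name, finished_at) pairs, one per
    successful restore cycle.
    """
    cutoff = now - window
    counts = {cluster: 0 for cluster in clusters}
    for cluster, finished_at in events:
        if cluster in counts and finished_at >= cutoff:
            counts[cluster] += 1
    return sorted(c for c, n in counts.items() if n < minimum)


now = datetime(2017, 12, 11, 12, 0)
events = [
    ("cluster_a", now - timedelta(hours=3)),
    ("cluster_a", now - timedelta(hours=15)),
    ("cluster_b", now - timedelta(hours=30)),  # outside the window
]
stale = clusters_missing_restores(events, ["cluster_a", "cluster_b"], now)
print(stale)
```

Alerting on the absence of success events, rather than on explicit failures, also catches a restore server that has silently stopped running.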
Diagram: auto-restore replicas (labels: master, production replicas, backup replica, auto-restore replica)
Restores
• New host provisioning uses same flow as restore.
• A human may kick a restore/reclone manually.
• Chatops: .mysql backup-restore -H restore.this.host -r
Restore failure
• A specific backup/restore may fail because
computers.
• No reason for panic.
• Previous backup/restores proven to be working
• At most we lose time
• Two sequential failures, or failures across clusters
are incidents to be investigated
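One way to read the escalation rule above is as a simple predicate over recent restore outcomes. The sketch below is an interpretation, not the team's actual logic: it assumes "failures across clusters" means the latest restore failed on two or more clusters, and the name `needs_investigation` and the history format are hypothetical.

```python
def needs_investigation(history):
    """Decide whether restore failures should be treated as an incident.

    `history` maps a cluster name to its restore outcomes, oldest first,
    where True means success and False means failure.
    """
    # Clusters whose most recent restore failed.
    failing_now = [c for c, runs in history.items() if runs and not runs[-1]]
    # Clusters whose last two restores both failed.
    repeated = [
        c for c, runs in history.items()
        if len(runs) >= 2 and not runs[-1] and not runs[-2]
    ]
    return bool(repeated) or len(failing_now) >= 2


# A single isolated failure is tolerated.
print(needs_investigation({"a": [True, False], "b": [True, True]}))
# Two sequential failures on one cluster are not.
print(needs_investigation({"a": [False, False], "b": [True, True]}))
```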
Restore: delayed replica
• One delayed replica per cluster
• Lagging at 4 hours
• Chatops: .mysql panic
Failovers
^ that, too
MySQL setup @ GitHub
• Plain old single-writer, master-replicas asynchronous replication.
• Not yet semi-sync
• Cross DC, multiple data centers
• MySQL 5.7, row-based replication (RBR)
• Servers with special roles: production replica, backup, auto-restore, migration-test, analytics, …
• 2-3 tiers of replication
• Occasional cluster split (functional sharding)
• Very dynamic, always changing
Points of failure
• Master failure, sev1
• Intermediate master failure
orchestrator
• Topology discovery
• Refactoring
• Failovers for masters and intermediate masters
• Open source, Apache 2 license
• github.com/github/orchestrator
orchestrator failovers @ GitHub
• Automated master & intermediate master failovers
for all clusters.
• On failover, runs GitHub-specific hooks
• Grabbing VIP/DNS
• Updating server role
• Kicking services (e.g. pt-heartbeat)
• Notifying chat
• Running puppet
Testing cluster
• Dedicated testing cluster in production
• Does not take production traffic
• “load-test” traffic
• Resembles a production topology:
• OS, MySQL Versions
• Data centers
• Server roles
• DNS
• Proxy
• Used for many of our deployment tests
Failover testing
• Multiple times per day:
• Set up the cluster in desired topology layout
• Inject failure (kill/block/reject)
• Wait, expect recovery
• Check topology:
• Expect new master, correct DNS changes,
replica capacity, …
• Restore old master from backup
• (an implicit backup/restore test)
• “success/failure” event
23
!
How people build software!
Failover in production
• We expect < 30s failover
• Intermediate master failover has low impact on
subset of users, depending on cluster/DC/server
• Master failover implies outage
• Planned master switchover takes a few seconds
A moment of reflection
What builds trust in failovers?
A testing environment?
Chaos testing in production
• First steps into regular testing
• Manual
• Supported by our peers
• Learning, understanding impact
27
!
How people build software!
Tests that go wrong
• Many things can go wrong
• Corrupt replication
• Invalidated servers
• Unassigned DNS
• Cleanups
Schema migrations
Is your data correct?
The data you see is merely a ghost of your original data
gh-ost
• Young: 16 months old.
• In production at GitHub since birth.
• Software
• Bugs
• Development
• Bugs
gh-ost testing
• gh-ost works perfectly well on our data
• Tested, re-tested, and tested again
• Full coverage of production tables
gh-ost testing servers
• Dedicated servers that run continuous tests
Diagram: gh-ost testing replicas (labels: master, production replicas, testing replica)
gh-ost testing
• Trivial ENGINE=INNODB migration
• Stop replication
• Cut-over, cut-back
• Checksum both tables, compare
• Checksum failure: stop the world, alert
• Success/failure: event
• Drop ghost table
• Catch up
• Next table
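The "checksum both tables, compare" step can be illustrated with a row-by-row hash over two tables that should hold identical data. This is a sketch under assumptions, not gh-ost's own test code: it uses an in-memory SQLite database as a stand-in for a MySQL testing replica, and the table names `users` and `users_ghost` and the helper `table_checksum` are hypothetical.

```python
import hashlib
import sqlite3


def table_checksum(conn, table, order_by="id"):
    """Hash every row of `table` in a stable order.

    Table and column names are interpolated directly, so they must come
    from trusted input.
    """
    digest = hashlib.sha256()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


# SQLite stands in for the testing replica; replication is assumed stopped,
# so neither table changes while it is being read.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE users_ghost (id INTEGER PRIMARY KEY, login TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO users_ghost VALUES (1, 'alice'), (2, 'bob');
""")

match = table_checksum(conn, "users") == table_checksum(conn, "users_ghost")
print("success" if match else "failure: stop the world, alert")

# Simulate a migration bug that corrupts one row in the ghost table.
conn.execute("UPDATE users_ghost SET login = 'mallory' WHERE id = 2")
mismatch = table_checksum(conn, "users") != table_checksum(conn, "users_ghost")
print("mismatch detected" if mismatch else "no difference found")
```

Stopping replication before comparing matters: with writes still flowing, the two tables could legitimately differ at the moment each one is read.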
gh-ost development cycle
• Work on branch
• .deploy gh-ost/mybranch to prod/mysql_role=ghost_testing
• Let continuous tests run
• Depending on nature of change, observe hours/days/more.
• Merge
• Tests run regardless of deployed branch
Conclusion
• Backup & restore
• Failovers
• Schema migrations
Thank you!
Questions?
github.com/ikewalker
@iowalker