Managing 10,000 Node Storage Clusters at Twitter
Boaz Avital
Tech Lead, Real Time Storage, Twitter
@bx
Operations are a burden
What’s hard about managing Stateful Services?
Availability
Correctness
Scale
Manhattan
What’s hard about managing Stateful Services?
Availability
Correctness
Scale
Requests can only be satisfied by specific nodes
Knowing Where Data Lives
#MyData
Requests can only be satisfied by specific nodes
Knowing Where Data Lives
Topology
Requests can only be satisfied by specific nodes
Knowing Where Data Lives
$ manifest rolling-restart
What’s hard about managing Stateful Services?
Availability
Correctness
Scale
Requests must be sent to the right place at the right time
Changing Where Data Lives
#MyData
Changing Where Data Lives
Placing data
Shards
Nodes
Changing Where Data Lives
Consistent Hashing
Placing data
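As a rough illustration of the consistent hashing idea named on this slide, here is a minimal Python sketch. It is not Manhattan's placement code: the `HashRing` class, the virtual-node count, and the short host names are all hypothetical, chosen only to show that adding a node moves a limited share of keys.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (illustration only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"key{i}" for i in range(1000)]
before = HashRing(["host1", "host2", "host3"])
after = HashRing(["host1", "host2", "host3", "host4"])

# Keys whose owner changed when host4 joined the ring.
moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys moved")
print(all(after.owner(k) == "host4" for k in moved))
```

In this sketch every key that changes owner lands on the newly added node, and all other keys stay where they were. That is the property that makes ring-based placement attractive when membership changes.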
Changing Where Data Lives
Placing data
Static Topology
[Diagram labels: A, B, C, D, A, F, E, B, E]
Changing Where Data Lives
Topology transitions
Snapshot and
stream
Direct writes to
both sets
Direct reads and
writes to new set
Normal
Changing Where Data Lives
Topology transitions
Snapshot and stream
Normal
Prepare to receive writes
Direct writes to both sets
Stop writing to old set
Direct reads to new set
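The six steps on this slide can be sketched as a small routing table. The slide text does not preserve the order of the steps in the diagram, so the ordering below is an assumption: dual writes begin before the snapshot is streamed, and the old set keeps receiving writes until reads have moved. The `Step` enum, the `plan` function, and the host names are hypothetical.

```python
from enum import Enum


class Step(Enum):
    NORMAL = "Normal"
    PREPARE = "Prepare to receive writes"
    DUAL_WRITE = "Direct writes to both sets"
    SNAPSHOT = "Snapshot and stream"
    READ_NEW = "Direct reads to new set"
    STOP_OLD = "Stop writing to old set"


def plan(old_set, new_set):
    """Return (step, read_targets, write_targets) for one shard's transition.

    ASSUMED ordering; the slide labels the steps but not their sequence.
    """
    both = old_set + [n for n in new_set if n not in old_set]
    return [
        (Step.NORMAL, old_set, old_set),
        (Step.PREPARE, old_set, old_set),    # new nodes get ready, no traffic yet
        (Step.DUAL_WRITE, old_set, both),    # new writes reach both sets
        (Step.SNAPSHOT, old_set, both),      # older data is copied to new nodes
        (Step.READ_NEW, new_set, both),      # reads move once new nodes hold all data
        (Step.STOP_OLD, new_set, new_set),   # old set is retired
        (Step.NORMAL, new_set, new_set),
    ]


old = ["host1", "host2", "host3"]
new = ["host2", "host3", "host4"]
for step, reads, writes in plan(old, new):
    print(f"{step.value:28} reads={reads} writes={writes}")
```

Under this ordering, reads are only ever sent to nodes that are also receiving every write, which is one way to state "the right place at the right time".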
Changing Where Data Lives
Requests must be sent to the right place at the right time
$ manifest topology-checkout > topology
$ vi topology
...
replica_set37:
- host1.twitter.com
- host2.twitter.com
- host3.twitter.com
:w
$ screen
$ manifest topology-transition topology
Changing Where Data Lives
Strong Consistency
Topology transitions
Changing Where Data Lives
Strong Consistency
Topology transitions
Snapshot and stream
Normal
Prepare to receive writes
Direct writes to both sets
Stop writing to old set
Direct reads to new set
Changing Where Data Lives
Strong Consistency
Topology transitions
Snapshot and stream
Normal
Prepare to receive writes
Direct writes to both sets
Stop writing to old set
Direct reads to new set
Count new node responses
New nodes are caught up
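The slide adds two steps for strongly consistent data, "Count new node responses" and "New nodes are caught up", without saying how responses are counted. One plausible reading is sketched below. It assumes a joint-majority rule borrowed from joint-consensus style membership changes: once new nodes are counted, a request needs a majority of the old set and a majority of the new set. The `quorum_met` function and host names are hypothetical, and none of this is confirmed as Manhattan's behavior.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a majority of n."""
    return n // 2 + 1


def quorum_met(acks, old_set, new_set, count_new: bool) -> bool:
    """Decide whether a request has enough acknowledgements.

    ASSUMPTION: before new nodes are counted, only the old set's majority
    matters; afterwards both sets must independently reach a majority.
    """
    acks = set(acks)
    old_ok = len(acks & set(old_set)) >= majority(len(old_set))
    if not count_new:
        return old_ok
    new_ok = len(acks & set(new_set)) >= majority(len(new_set))
    return old_ok and new_ok


old = ["host1", "host2", "host3"]
new = ["host2", "host3", "host4"]

print(quorum_met(["host1", "host2"], old, new, count_new=False))  # True
print(quorum_met(["host1", "host2"], old, new, count_new=True))   # False
print(quorum_met(["host2", "host3"], old, new, count_new=True))   # True
```

The point of the sketch is only that the same set of acknowledgements can be sufficient before new nodes are counted and insufficient afterwards, so the switch has to happen at a well-defined step of the transition.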
Changing Where Data Lives
Topology transitions
Snapshot and stream
Normal
Prepare to receive writes
Direct writes to both sets
Stop writing to old set
Direct reads to new set
Count new node responses
New nodes are caught up
HDFS
Read Only
What’s hard about managing Stateful Services?
Availability
Correctness
Scale
Humans can reason about actions
Small Scale Operations
Many simultaneous actions are required
Large Scale Operations
Goal: Minimize operator burden
Operators shouldn’t have to think
1. Ensure operations are always safe to execute
2. Make operations fire-and-forget with no babysitting
3. Minimize domain knowledge to increase effectiveness
Building an Operations Service
Tooling that thinks so your operators don’t have to
Zookeeper
Ops Genie
Building an Operations Service
Tooling that thinks so your operators don’t have to
Goal-oriented architecture
Safe operational concurrency
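One way to read "goal-oriented" together with "safe operational concurrency" is that operators edit a single desired end state and the tooling derives the steps, so two requests made at the same time simply merge. The sketch below illustrates that reading only. `merge_goals`, `reconcile`, and the host names are hypothetical and do not describe the internals of Ops Genie.

```python
def merge_goals(goal: set, add=(), remove=()) -> set:
    """Fold one operator request into the single shared goal membership."""
    return (goal | set(add)) - set(remove)


def reconcile(current: set, goal: set) -> list:
    """Return the actions needed to move the cluster from current to goal."""
    actions = [("add", node) for node in sorted(goal - current)]
    actions += [("remove", node) for node in sorted(current - goal)]
    return actions


current = {"host1", "host2", "host3"}
goal = set(current)

# Two operators submit requests without coordinating with each other.
goal = merge_goals(goal, add=["host4"])
goal = merge_goals(goal, remove=["host1"])

print(reconcile(current, goal))
# [('add', 'host4'), ('remove', 'host1')]

# Once the cluster matches the goal there is nothing left to do.
print(reconcile(goal, goal))
# []
```

Because the service works from the difference between current and goal state, re-running it is harmless, which fits the "fire-and-forget" aim stated earlier in the deck.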
Building an Operations Service
Tooling that thinks so your operators don’t have to
Old view New view
Add nodes Restart nodes Remove nodes
Building an Operations Service
Tooling that thinks so your operators don’t have to
Goal-oriented architecture
Safe operational concurrency
Incremental progress
Continuous data rebalancing
Building an Operations Service
Tooling that thinks so your operators don’t have to
Add host1 Add host2
Building an Operations Service
Tooling that thinks so your operators don’t have to
Goal-oriented architecture
Safe operational concurrency
Incremental progress
Continuous data rebalancing
Restart management
Liveness detection
Building an Operations Service
Tooling that thinks so your operators don’t have to
Zookeeper
Building an Operations Service
Tooling that thinks so your operators don’t have to
Goal-oriented architecture
Safe operational concurrency
Incremental progress
Continuous data rebalancing
Restart management
Liveness detection
Operation throttling
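Restart management and operation throttling can be pictured as a batching problem: take down only a bounded number of nodes at once, and never two members of the same replica set together. The sketch below is a greedy illustration under those two assumed rules. `restart_batches`, the replica set names, and the host names are hypothetical, and the deck does not say which rules the real service enforces.

```python
def restart_batches(replica_sets: dict, max_concurrent: int) -> list:
    """Group nodes into restart batches.

    ASSUMED safety rules: a batch never contains two nodes from the same
    replica set, and never more than `max_concurrent` nodes.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    pending = sorted({n for members in replica_sets.values() for n in members})
    sets_of = {
        n: {rs for rs, members in replica_sets.items() if n in members}
        for n in pending
    }
    batches = []
    while pending:
        batch, busy = [], set()
        for node in pending:
            # Take the node only if the batch has room and none of its
            # replica sets already has a member in this batch.
            if len(batch) < max_concurrent and not (sets_of[node] & busy):
                batch.append(node)
                busy |= sets_of[node]
        pending = [n for n in pending if n not in batch]
        batches.append(batch)
    return batches


replica_sets = {
    "rs1": ["host1", "host2", "host3"],
    "rs2": ["host3", "host4", "host5"],
    "rs3": ["host5", "host6", "host1"],
}
print(restart_batches(replica_sets, max_concurrent=2))
# [['host1', 'host4'], ['host2', 'host5'], ['host3', 'host6']]
```

Each batch would be restarted and confirmed live before the next one starts. That is the kind of bookkeeping a human can do for a handful of nodes but not for thousands.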
Rolling restarts
$ genieclient rolling-restart
Handling node failures
$ genieclient node-mark-dead --nodes=host1.twitter.com
Adding nodes to a cluster
$ genieclient node-add --nodes=host2,host3,host4
Pausing all operations
$ genieclient freeze-on
Automate everything
Scylla Summit 2017: Managing 10,000 Node Storage Clusters at Twitter
Future Work
More interesting data placement algorithms
Per-dataset backends based on usage patterns
Shard splitting that reacts to load
Generalized cluster manager
Questions
@bx
