Scylla on Kubernetes: Introducing the Scylla Operator
Yannis Zarkadas, Software Engineer @ Arrikto
Presenter
Yannis Zarkadas, Software Engineer
■ Storage, DevOps, ML-Engineering
■ Open Source Enthusiast:
● Scylla Operator
● Cassandra Operator in rook.io
● Kubeflow
Problem Statement
Scylla:
● Great database
● Requires operational expertise
Kubernetes:
● Great workload management platform
Can we leverage Kubernetes to write a great management layer for Scylla?
[Diagram: Kubernetes architecture. `kubectl apply -f` sends an object to the API Server on the Master, which saves it in etcd. The Controllers write a new Pod and the Scheduler schedules it onto Node 4. Each of Nodes 1 to 4 runs a kubelet and hosts Pods such as nginx, MySQL and tomcat.]
StatefulSet
Deploys and scales stateful software.
Provides guarantees for:
■ Pod uniqueness
● At most 1 of each Pod exists at any given time
■ Pod ordering
● Rolling Update and Deployment
■ Persistent network and storage identity
● DNS record and own Persistent Volume
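The naming scheme behind these guarantees can be sketched in a few lines. This is only an illustration of the pattern seen in the Pod names used later in the deck (`scylla-0.scylla.default.svc.cluster.local`); the function name `statefulset_identities` is made up for this example and is not part of any Kubernetes client.

```python
def statefulset_identities(name, service, namespace, replicas):
    """Return (pod_name, dns_name) pairs in the order the Pods are created."""
    return [
        # Pod names are <statefulset>-<ordinal>; the DNS record lives under
        # the Headless Service that governs the StatefulSet.
        (f"{name}-{i}", f"{name}-{i}.{service}.{namespace}.svc.cluster.local")
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for pod, dns in statefulset_identities("scylla", "scylla", "default", 3):
        print(pod, dns)
```

Because the ordinal is part of the name, a Pod that is recreated comes back with the same name and the same DNS record.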
StatefulSet Controller
[Diagram: `kubectl apply -f` saves a StatefulSet with `spec.replicas: 3` and a Headless Service to the API Server. The StatefulSet Controller writes the Pods one at a time, in order: scylla-0 (scylla-0.scylla.default.svc.cluster.local), scylla-1 (scylla-1.scylla.default.svc.cluster.local), scylla-2 (scylla-2.scylla.default.svc.cluster.local). `status.replicas` and `status.readyReplicas` each climb from 0 to 3 as the Pods come up.]
Controller Pattern
Used everywhere in Kubernetes
[Diagram: a Controller runs a loop of Observe, Calculate, Reconcile. It reads the Spec (desired) and Status (real) of Kubernetes Objects and writes changes that affect the Physical Resources.]
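The loop above can be sketched as a toy program. Everything here is hypothetical: `Obj`, `reconcile` and `run_until_converged` are names invented for this illustration, and a real controller would talk to the API Server instead of mutating an in-memory object.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    spec_replicas: int        # desired state, set by the user
    status_replicas: int = 0  # real state, as last observed


def reconcile(obj):
    """One pass of Observe -> Calculate -> Reconcile. Returns the action taken."""
    desired, actual = obj.spec_replicas, obj.status_replicas  # Observe
    diff = desired - actual                                   # Calculate
    if diff > 0:                                              # Reconcile
        obj.status_replicas += 1
        return "create"
    if diff < 0:
        obj.status_replicas -= 1
        return "delete"
    return "noop"


def run_until_converged(obj, max_steps=100):
    """Call reconcile repeatedly until there is nothing left to do."""
    actions = []
    for _ in range(max_steps):
        action = reconcile(obj)
        if action == "noop":
            break
        actions.append(action)
    return actions


if __name__ == "__main__":
    print(run_until_converged(Obj(spec_replicas=3)))
```

Each pass takes one small step toward the desired state, so the loop can be interrupted and restarted at any point without losing track of what remains to be done.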
Custom Resource Definition
■ Store Custom Objects
■ Compatible with kubectl
● kubectl get clusters
The Operator Pattern
[Diagram: a Controller running the Observe, Calculate, Reconcile loop and writing changes.]
Operator = Controller(s) + CRD(s)
Why the StatefulSet is not enough
StatefulSet: Confined to 1 Rack
[Diagram: a Scylla Cluster contains Datacenters, a Datacenter contains Racks, and a Rack contains Members. A Member maps to a Pod, and one StatefulSet covers only one Rack.]
Multiple Racks? Multiple Datacenters?
Safe Scale Down
● Want to leave
○ nodetool decommission
● Stream data
● Leave
[Diagram: Scylla Ring with six Members at ring positions 0, 44, 88, 132, 176 and 220. member-0 through member-4 are Up; member-5 changes from Up to Leaving.]
StatefulSet: Unsafe Scale Down
[Diagram: `kubectl apply -f` saves a StatefulSet change from `spec.replicas: 3` to `spec.replicas: 2`. The StatefulSet Controller removes Pod scylla-2 (scylla-2.scylla.default.svc.cluster.local) without any decommission step, while scylla-0 and scylla-1 keep running. In the Scylla Ring, scylla-0 and scylla-1 stay Up and scylla-2 changes from Up to Down.]
Scale Down? Data not streamed! Potential Data Loss!
StatefulSet: Cannot track Member identity
[Diagram: Pods scylla-0, scylla-1 and scylla-2 run under the StatefulSet Controller, each with its own DNS name (scylla-N.scylla.default.svc.cluster.local). A node fails and a Member joins the ring again.]
Member Joining: Replace Member? Add new Member?
Must know Member identity beforehand!
Vanilla Solution: StatefulSet
Problems with:
■ Seeds
■ Multi-zone deployment
■ Scale Down
■ Loss of Persistence
■ Backups/Restores
■ Extensibility
What if we could create management software in the image of Kubernetes Controllers?
Design
Our goal
Operator = Controller(s) + CRD(s)
[Diagram: a Controller running the Observe, Calculate, Reconcile loop and writing changes.]
[Diagram: operator architecture. The Controller watches the Cluster Custom Resource and writes one StatefulSet per rack (Rack 1, Datacenter 1; Rack 1, Datacenter 2; ... Rack N, Datacenter M) plus Member Services, each with a static IP. Every Pod runs a Sidecar that talks to Scylla over JMX/HTTP. Communication goes through Labels / Annotations.]
Mapping of Abstractions
Member -> Pod
Rack -> StatefulSet
Datacenter -> StatefulSets
Cluster -> Cluster Custom Resource
Sidecar
CRD + Controller + Sidecar
[Diagram: a Member Pod with a Sidecar attached over JMX/HTTP.]
Sidecar needed to:
■ Set up config files
■ Install plugins at startup
■ Backup and Restore functionality
■ Future extensibility
An Alternative to DNS Records
Much Requested Feature -> What if we could have static IPs?
Services already have a static IP, called ClusterIP.
Solution: ClusterIP Service per Pod
Drawbacks?
■ Performance: iptables can handle a few hundred Members; IPVS can handle thousands with no problem.
■ ClusterIP CIDR Depletion: usually a /12 IP block, so plenty of addresses.
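The CIDR depletion point comes down to simple arithmetic. The snippet below is a quick check using Python's standard `ipaddress` module; the range `10.96.0.0/12` is an assumed example chosen to match the 10.96.0.x Service addresses shown in the diagrams, and an actual cluster may be configured differently.

```python
import ipaddress

# Assumed example Service range; a /12 leaves 32 - 12 = 20 bits for addresses.
service_cidr = ipaddress.ip_network("10.96.0.0/12")

if __name__ == "__main__":
    print(service_cidr.num_addresses)  # 2**20 = 1048576
    print(ipaddress.ip_address("10.96.0.4") in service_cidr)
```

With roughly a million addresses available, one ClusterIP Service per Member uses a negligible share of the range, even for clusters with thousands of Members.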
Implementation
Cluster Creation & Scale Up
[Diagram: `kubectl apply` saves a new Scylla Cluster object with Spec `eu-west1-b: 1 Members` and `eu-west1-c: 2 Members`. The Scylla Operator sees the new Cluster and writes one StatefulSet per rack, raising replicas one step at a time: eu-west1-b from 0 to 1, eu-west1-c from 0 to 1 to 2. Each Pod gets its own Member Service with a static IP: scylla-eu-west1-b-0 (10.96.0.1), scylla-eu-west1-c-0 (10.96.0.3), scylla-eu-west1-c-1 (10.96.0.4). Status moves from 0 Members / 0 ReadyMembers to 1 / 1 for eu-west1-b and 2 / 2 for eu-west1-c.]
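A minimal sketch of how a Cluster spec could be expanded into StatefulSets and Pod names, following the names in the diagram above. The function `desired_members` and the `<cluster>-<rack>` StatefulSet naming are assumptions made for this example, inferred from Pod names such as `scylla-eu-west1-b-0`; they are not taken from the operator's source.

```python
def desired_members(cluster, racks):
    """Map each rack to a StatefulSet name and the Pod names it should hold."""
    out = {}
    for rack, members in racks.items():
        sts = f"{cluster}-{rack}"  # assumed naming: <cluster>-<rack>
        out[sts] = [f"{sts}-{i}" for i in range(members)]
    return out


if __name__ == "__main__":
    spec = {"eu-west1-b": 1, "eu-west1-c": 2}
    for sts, pods in desired_members("scylla", spec).items():
        print(sts, pods)
```

Keeping one StatefulSet per rack lets each rack scale on its own while the Cluster object remains the single place where the user declares intent.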
Scale Down
[Diagram: `kubectl apply` saves a Cluster change that scales eu-west1-c down from 2 Members to 1. The Scylla Operator sees the change and marks the Member Service of scylla-eu-west1-c-1 (10.96.0.4) with `decommissioned: false`. The Sidecar in that Pod runs `nodetool decommission`; in the Scylla Ring, scylla-eu-west1-b-0 and scylla-eu-west1-c-0 stay Up while scylla-eu-west1-c-1 changes from Up to Leaving and streams its data. The Member Service is then marked `decommissioned: true`, and the eu-west1-c StatefulSet goes from replicas 2 to 1.]
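The handshake in this slide can be sketched as two small functions, one for the operator and one for the sidecar. This is a simplified model under stated assumptions: the `Member` class, the function names and the in-memory `labels` dict are all hypothetical stand-ins, and the line that flips `in_ring` is a placeholder for actually running `nodetool decommission`.

```python
class Member:
    def __init__(self, name):
        self.name = name
        self.labels = {}     # stands in for labels on the Member Service
        self.in_ring = True


def operator_scale_down(rack, desired):
    """Operator side: shrink the rack only after the last Member has left the ring."""
    if len(rack) <= desired:
        return "noop"
    last = rack[-1]          # the highest ordinal is removed first
    if "decommissioned" not in last.labels:
        last.labels["decommissioned"] = "false"  # ask the Sidecar to decommission
        return "requested"
    if last.labels["decommissioned"] == "true":
        rack.pop()           # now it is safe to lower replicas
        return "scaled"
    return "waiting"


def sidecar_sync(member):
    """Sidecar side: decommission when asked, then report completion."""
    if member.labels.get("decommissioned") == "false":
        member.in_ring = False  # placeholder for `nodetool decommission`
        member.labels["decommissioned"] = "true"


if __name__ == "__main__":
    rack = [Member("scylla-eu-west1-c-0"), Member("scylla-eu-west1-c-1")]
    print(operator_scale_down(rack, 1))
    sidecar_sync(rack[-1])
    print(operator_scale_down(rack, 1))
```

The key design point is ordering: the replica count drops only after the label reports that streaming has finished, which is the step a plain StatefulSet scale down skips.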
Local Storage vs Network Attached
Local NVMe SSD
■ Fast
■ Ephemeral
Network Attached Storage (AWS EBS, Google Persistent Disk)
■ Slow
■ Fault-tolerant
Scylla handles replication => Use Local Storage!
Kubernetes v1.10: Local Persistent Volumes in Beta
Local Storage Failure Scenarios
■ Disk Misbehaves
● Block errors
● Deteriorating performance
● Pod still runs. Unhandled by K8s.
■ Disk Fails
● Mount Point Disappears
● Pod fails to start. Unhandled by K8s.
■ Node Fails
● With Disk on it
● Pod fails to be scheduled. Unhandled by K8s.
Common in the Cloud!
Node Fail
[Diagram: member-0 (10.96.0.1), member-1 (10.96.0.3) and member-2 (10.96.0.4) each run on a node with a local disk mounted at /mnt/ssd1, and each has its own Member Service. Node 3, which hosts member-1, fails. An admin or fencing software deletes Node 3. The Scylla Operator sees that the StatefulSet changed and recreates the PVC. member-1 then starts on Node 2 with an empty disk, keeping its Member Service IP 10.96.0.3.]
Algorithm:
■ Cluster Member? (search with IP)
■ If yes: Empty Disk?
■ If yes: Stream Missing Data (replace_address_first_boot option)
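The two checks above can be written as one small decision function. This is a sketch only: `startup_options` is a hypothetical helper, and returning a dict of options is an illustrative choice. The one name taken from the slide is the `replace_address_first_boot` option.

```python
def startup_options(my_ip, ring_ips, disk_is_empty):
    """Decide how a starting Member should join, following the two checks above."""
    if my_ip in ring_ips and disk_is_empty:
        # Known Member that lost its data: replace itself and stream the data back.
        return {"replace_address_first_boot": my_ip}
    # Either a brand-new Member or a restart with its data still in place.
    return {}


if __name__ == "__main__":
    ring = {"10.96.0.1", "10.96.0.3", "10.96.0.4"}
    print(startup_options("10.96.0.3", ring, disk_is_empty=True))
```

The lookup by IP works because each Member keeps the static ClusterIP of its Member Service across rescheduling, so the ring still knows the Member under the same address.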
Demo
Take away
Kubernetes helps to manage Scylla, but has some limitations:
■ CPU Pinning
● Huge performance gains.
● Must be enabled in the kubelet.
● Many managed solutions don’t enable it.
■ Local Storage
● Supported but still needs improvement.
● Some vendors don’t offer high storage machines for K8s.
■ Multi-Region Clusters
● Still an unsolved problem.
“Cost of Containerization” by Moreno Garcia:
https://www.scylladb.com/2018/08/09/cost-containerization-scylla/
Future Work
Scylla Operator
■ Repairs with Scylla Manager
■ Multi-Region Clusters
● Very early support in Kubernetes
● LoadBalancer per Pod is a possible workaround
■ Backups and Restores
■ File your own issue:
● https://github.com/scylladb/scylla-operator
Kubernetes
■ Better Support for Local Storage
● Monitoring, scheduling
Thank you Stay in touch
Any questions?
Yannis Zarkadas
yanniszark@arrikto.com
@yanniszark


Editor's Notes

  • #5: Overview of distributed nature of Scylla
  • #6: Overview: each member stores a different portion of the data
  • #7: Intro to Kubernetes. Smallest unit of processing: the Pod. Declarative nature: the user declares the desired state and Kubernetes works to satisfy it.
  • #8: Kubernetes’ solution for running DBs: StatefulSet
  • #9: Example of how the StatefulSet works
  • #10: Controller pattern that appears everywhere in K8s: 1. Observe desired state 2. Calculate actual state 3. Diff and take action
  • #11: What is missing to enable us to build our own controller? Custom Objects. CRDs enable us to store custom objects in etcd.
  • #12: Operator pattern. Controller acts as a human operator would.
  • #25: Examples of how our design addresses each of the StatefulSet’s shortcomings.