Rebuild the Ceph Distributed Storage with Seastar
Kefu Chai
Software Engineer, Red Hat
Kefu Chai
He is a software engineer who has been focusing on storage
systems for over a decade.
Currently, he is working on Ceph, an open source, distributed
storage solution.
What is Ceph anyway?
About Ceph
▪ A platform
• Object, block and filesystem
▪ Scalable
• from 10 to 10,000 nodes (OSDs)
▪ No single point of failure
▪ Free and Open Source (LGPL)
Ceph Components
CRUSH — the “partitioner” of objects
CRUSH (cont.)
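The sketch below illustrates what a "partitioner" of objects does. Ceph first hashes an object name to a placement group (PG), and CRUSH then maps that PG to an ordered list of OSDs.

This is a minimal sketch under stated assumptions:
▪ It mimics only the weighted draw of CRUSH's straw2 bucket.
▪ It uses a flat OSD list, FNV-1a/splitmix64 hashes, and a plain modulo.
▪ Real CRUSH walks a bucket hierarchy (hosts, racks) to separate replicas across failure domains, and uses its own hash and stable-mod functions.
▪ `Osd`, `object_to_pg`, `pick_osd` and `place` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model: not Ceph's data structures or hash functions.
struct Osd {
  int id;
  double weight;  // relative capacity
};

// splitmix64 finalizer: a cheap, well-mixed 64-bit hash.
std::uint64_t mix64(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// FNV-1a over the object name.
std::uint64_t hash_name(const std::string& name) {
  std::uint64_t h = 1469598103934665603ULL;
  for (unsigned char c : name) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Step 1: object name -> placement group (plain modulo in this sketch).
std::uint32_t object_to_pg(const std::string& name, std::uint32_t pg_num) {
  assert(pg_num > 0);
  return static_cast<std::uint32_t>(hash_name(name) % pg_num);
}

// Step 2: straw2-style draw. Each eligible OSD draws ln(u) / weight, with
// u in (0, 1] derived from (pg, osd, replica); the largest draw wins, so an
// OSD wins with probability proportional to its weight.
int pick_osd(std::uint32_t pg, unsigned replica, const std::vector<Osd>& osds,
             const std::vector<int>& taken) {
  int best = -1;
  double best_draw = -INFINITY;
  for (const Osd& osd : osds) {
    bool used = false;
    for (int t : taken) used = used || t == osd.id;
    if (used || osd.weight <= 0) continue;
    std::uint64_t h = mix64((std::uint64_t{pg} << 32) ^
                            (static_cast<std::uint64_t>(osd.id) << 8) ^ replica);
    double u = static_cast<double>((h >> 11) + 1) / 9007199254740992.0;  // 2^53
    double draw = std::log(u) / osd.weight;
    if (draw > best_draw) {
      best_draw = draw;
      best = osd.id;
    }
  }
  return best;  // -1 if no eligible OSD is left
}

// Object -> PG -> ordered list of distinct OSDs (primary first).
std::vector<int> place(const std::string& name, std::uint32_t pg_num,
                       const std::vector<Osd>& osds, unsigned replicas) {
  std::uint32_t pg = object_to_pg(name, pg_num);
  std::vector<int> acting;
  for (unsigned r = 0; r < replicas; ++r) {
    int osd = pick_osd(pg, r, osds, acting);
    if (osd < 0) break;  // fewer eligible OSDs than requested replicas
    acting.push_back(osd);
  }
  return acting;
}
```

Placement is computed rather than looked up. Any client holding the same OSD list and weights therefore arrives at the same OSDs for an object, without consulting a central directory.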
Good, but not good enough
Latency analysis of the write path
courtesy of Myoungwon Oh et al., ALCA: Locality-Aware Lock Contention
Avoidance for NVMe-Based Scale-out Storage System
[Diagram: Storage Backend, Network Messenger, Thread Pool serving requests]
A closer look at the thread pool
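Here is a minimal sketch of the "thread pool + lock + shared state" model. It is generic C++ using std::thread, not Ceph's OSD code, whose request queue is more elaborate; the `ThreadPool` class is a hypothetical name. Every request passes through one mutex-guarded queue. Producers and consumers contend on that lock, and a request runs on whichever worker happens to dequeue it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// N worker threads pull requests from one shared queue guarded by one mutex.
class ThreadPool {
public:
  explicit ThreadPool(unsigned n) {
    assert(n > 0);
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] { run(); });
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // workers drain the queue, then exit
  }

  void submit(std::function<void()> op) {
    {
      std::lock_guard<std::mutex> lk(mu_);  // every producer contends here
      queue_.push(std::move(op));
    }
    cv_.notify_one();  // wake a sleeping worker: a potential context switch
  }

private:
  void run() {
    for (;;) {
      std::function<void()> op;
      {
        std::unique_lock<std::mutex> lk(mu_);  // ...and every consumer here
        cv_.wait(lk, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stopping and fully drained
        op = std::move(queue_.front());
        queue_.pop();
      }
      op();  // runs on whichever thread won the race, so cache locality is poor
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};
```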
Random R/W performance
Random I/O performance is bound by the CPU
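A CPU-bound claim can be made concrete with a cycle budget: clock rate times cores, divided by the cycles the software stack spends per I/O. The sketch below assumes a fixed per-I/O cost. All inputs are hypothetical placeholders, not measurements from the talk.

```cpp
#include <cassert>

// Upper bound on IOPS when every I/O costs a fixed number of CPU cycles.
// All inputs are hypothetical; plug in measured values.
double max_iops(unsigned cores, double clock_ghz, double cycles_per_io) {
  assert(cores > 0 && clock_ghz > 0 && cycles_per_io > 0);
  return cores * clock_ghz * 1e9 / cycles_per_io;
}

// Cycles each I/O may spend if a target IOPS rate is to be reached.
double cycle_budget(unsigned cores, double clock_ghz, double target_iops) {
  assert(cores > 0 && clock_ghz > 0 && target_iops > 0);
  return cores * clock_ghz * 1e9 / target_iops;
}
```

For example, one 3 GHz core serving 100,000 IOPS leaves 30,000 cycles per I/O. That budget must cover the messenger, locking, and the storage backend together. Fast NVMe devices shrink the budget further, which is why per-request software overhead becomes the limit.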
Latencies in the I/O path
▪ TCP/IP stack in kernel space
▪ Coarse-grained locks
▪ Internal fragmentation
▪ Memory copies for direct I/O
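The last two items interact. Direct I/O (O_DIRECT) requires block-aligned buffer addresses and lengths, so small writes are padded to whole blocks (internal fragmentation). A payload sitting in an arbitrary buffer must also be copied into an aligned "bounce" buffer first.

A minimal sketch follows. It assumes a 4096-byte block and an unaligned source buffer. `align_up`, `padding_waste` and `make_dio_buffer` are hypothetical helpers, not Ceph functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

constexpr std::size_t kBlock = 4096;  // assumed device block size

// Round n up to a multiple of a (a must be a power of two).
std::size_t align_up(std::size_t n, std::size_t a) {
  assert(a != 0 && (a & (a - 1)) == 0);
  return (n + a - 1) & ~(a - 1);
}

// Internal fragmentation: bytes of padding when `len` bytes occupy whole blocks.
std::size_t padding_waste(std::size_t len) {
  return align_up(len, kBlock) - len;
}

struct FreeDeleter {
  void operator()(char* p) const { std::free(p); }
};
using AlignedBuf = std::unique_ptr<char, FreeDeleter>;

// O_DIRECT wants block-aligned addresses and lengths, so an unaligned payload
// is copied into a fresh aligned "bounce" buffer: the extra memory copy.
AlignedBuf make_dio_buffer(const char* data, std::size_t len, std::size_t& padded) {
  assert(len > 0);
  padded = align_up(len, kBlock);
  char* p = static_cast<char*>(std::aligned_alloc(kBlock, padded));
  if (p == nullptr) return AlignedBuf{};
  std::memcpy(p, data, len);              // copy the payload
  std::memset(p + len, 0, padded - len);  // zero the tail padding
  return AlignedBuf(p);
}
```

Writing 100 bytes this way still touches a full 4096-byte block, 3996 bytes of which are padding. The payload is also copied once before it ever reaches the device.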
Seastar to the Rescue
Non-Seastar → Seastar
▪ Thread pool → Fiber
▪ Lock → Lockless
▪ Shared state → Message passing
▪ TCP/IP in kernel → TCP/IP in userspace (0-copy)
▪ Callbacks → future/promise
▪ Multi-threaded → Single-threaded
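"Lockless" and "message passing" fit together in the following way. Each shard is the only thread that touches its own state, and other threads reach it only by sending messages.

The sketch below is in the spirit of Seastar's cross-core queues, but it is not Seastar code. It assumes one single-producer/single-consumer ring per sender/receiver pair, and the only shared memory is two atomic indices. `SpscQueue`, `CounterShard` and `run_demo` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Single-producer/single-consumer ring buffer: no mutex, two atomic indices.
template <typename T, std::size_t N>
class SpscQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
  bool push(const T& v) {  // called only by the producer thread
    std::size_t h = head_.load(std::memory_order_relaxed);
    if (h - tail_.load(std::memory_order_acquire) == N) return false;  // full
    buf_[h & (N - 1)] = v;
    head_.store(h + 1, std::memory_order_release);  // publish the slot
    return true;
  }

  std::optional<T> pop() {  // called only by the consumer (owning) thread
    std::size_t t = tail_.load(std::memory_order_relaxed);
    if (t == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
    T v = buf_[t & (N - 1)];
    tail_.store(t + 1, std::memory_order_release);  // hand the slot back
    return v;
  }

private:
  std::array<T, N> buf_{};
  std::atomic<std::size_t> head_{0};  // written by the producer only
  std::atomic<std::size_t> tail_{0};  // written by the consumer only
};

// The owning shard drains its inbox and updates private state without locks.
struct CounterShard {
  SpscQueue<int, 1024> inbox;
  long long sum = 0;  // touched only by the owning thread
  void poll() {
    while (auto m = inbox.pop()) sum += *m;
  }
};

// Demo: one producer thread sends 1..n to a shard polled on the calling thread.
long long run_demo(int n) {
  CounterShard shard;
  std::thread producer([&shard, n] {
    for (int i = 1; i <= n; ++i)
      while (!shard.inbox.push(i)) std::this_thread::yield();  // full: retry
  });
  const long long expected = 1LL * n * (n + 1) / 2;
  while (shard.sum != expected) shard.poll();  // reactor-style polling loop
  producer.join();
  return shard.sum;
}
```

The design trades a mutex for ownership: contention disappears because no two threads ever write the same index or the same state.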
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Seastar
A message from the alien
▪ std::future versus seastar::future
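// in RocksDB: runs on an "alien" (non-Seastar) thread
Status SequentialFile::read(size_t n, char* buf)
{
  // ship the read to the Seastar shard io_shard; alien::submit_to()
  // returns a std::future, which this thread can block on
  return alien::submit_to(io_shard,
    [n, buf, this] {
      // runs on io_shard and may return a seastar::future<Status>
      return do_dma_read(n, buf);
    }).get();  // std::future::wait() returns void; get() yields the Status
}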
▪ Dedicated cores for the alien (non-Seastar) threads
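The bridge in the RocksDB snippet can be imitated with the standard library alone. This is a minimal sketch:
▪ A hypothetical `Reactor` owns one thread.
▪ Its `submit_to()`, called from any other thread, wraps the function in a std::packaged_task and returns the matching std::future.
▪ The alien caller blocks in `get()` while the work runs on the reactor's thread.

This only mimics the shape of Seastar's alien submission. Seastar's reactor is poll-driven rather than sleeping on a condition variable, and its actual signature differs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a Seastar shard: one thread that runs submitted tasks.
class Reactor {
public:
  Reactor() : thread_([this] { loop(); }) {}

  ~Reactor() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // Called from a foreign ("alien") thread; returns a std::future to block on.
  template <typename Func>
  auto submit_to(Func func) -> std::future<std::invoke_result_t<Func>> {
    using R = std::invoke_result_t<Func>;
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(func));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.emplace_back([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  std::thread::id id() const { return thread_.get_id(); }

private:
  void loop() {
    for (;;) {
      std::function<void()> t;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and drained
        t = std::move(tasks_.front());
        tasks_.pop_front();
      }
      t();  // the work runs on the reactor thread, never on the caller's
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread thread_;  // declared last: started after the members above exist
};
```

Blocking in `get()` is acceptable only on the alien side; code running on the reactor itself must never block. That constraint is one reason the alien threads are kept on their own dedicated cores.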
Backfill with priority
▪ Critical path first
• Basic I/O paths
▪ Supporting features later
• Recovery
• Metric reporting
• …
Thank You
Any questions?
Please stay in touch
kchai@redhat.com
@tchaik0v
