Building a petabyte-scale storage system based on free software
Martin Palma / Infrastructure Engineer / Eurac Research
Setting some context
What we want…
scalability
economically
reliability
persistence
performance
So, what to choose?
Ceph: a unified, distributed storage system designed for excellent performance, reliability and scalability.
• Created by Sage Weil for his PhD dissertation at the University of California, Santa Cruz, in 2007
• From fall 2007 he worked full-time on Ceph at DreamHost (he is one of DreamHost's co-founders)
• In 2012 he founded Inktank Storage for professional services and support of Ceph
• First release (Argonaut) on July 2, 2012
• April 2014: Red Hat purchased Inktank
• October 2015: the Ceph Community Advisory Board was formed (Canonical, CERN, Cisco, Fujitsu, Intel, Red Hat, SanDisk, and SUSE)
• November 2018: the Ceph Foundation was founded under the Linux Foundation
• 13 releases to date (Mimic is the latest)
Principles…
• All components must scale horizontally
• There can be no single point of failure
• The solution must be hardware agnostic
• Should use commodity hardware
• Self-managed whenever possible
• Open Source (LGPL)
Consumers: APP, HOST/VM, CLIENT
RGW
A web services gateway for object storage, compatible with S3 and Swift
RBD
A reliable, fully-distributed block device with cloud platform integration
CEPHFS
A distributed file system with POSIX semantics and scale-out metadata management
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
Monitors
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small and odd number
• Not part of the data path
Object Storage Daemon (OSD)
• Store and serve data to clients
• One per disk (HDD, SSD, NVMe)
• 10s to 1000s in a cluster
• Intelligently peer for replication & recovery
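Why a small and odd number of monitors? Monitors reach consensus by majority, so a short sketch of the arithmetic may help. The function names below are illustrative only and are not part of any Ceph API.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    # Compare odd and even monitor counts side by side.
    for n in (3, 4, 5, 6):
        print(f"{n} monitors: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 monitors (or from 5 to 6) raises the quorum size without raising the number of failures that can be tolerated, which is why an even count adds cost but no extra resilience.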
Data Placement
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
Pseudo-random placement algorithm
• Fast calculation, no lookup
• Repeatable, deterministic
Statistically uniform distribution
Rule-based configuration
• Infrastructure topology aware (CRUSH map)
• Adjustable replication
• Weighted devices (different sizes)
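The idea of computing placement instead of looking it up can be illustrated with a minimal sketch. This is not the actual CRUSH implementation: it is a simplified weighted hashing scheme that ignores the topology and rules of a real CRUSH map, and the device names, weights, and function names are hypothetical.

```python
import hashlib
import math


def _unit_hash(object_name: str, device: str) -> float:
    """Map (object, device) to a repeatable pseudo-random number in (0, 1]."""
    digest = hashlib.sha256(f"{object_name}/{device}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def place(object_name: str, weights: dict, replicas: int = 3) -> list:
    """Pick `replicas` distinct devices; heavier devices are chosen more often."""
    # Score = ln(u) / weight. Scores are <= 0 and tend to be closer to 0
    # for larger weights, so the best scores favour larger devices.
    ranked = sorted(
        weights,
        key=lambda dev: math.log(_unit_hash(object_name, dev)) / weights[dev],
        reverse=True,
    )
    return ranked[:replicas]


if __name__ == "__main__":
    # Hypothetical devices: osd.2 is twice as large as the others.
    devices = {"osd.0": 4.0, "osd.1": 4.0, "osd.2": 8.0, "osd.3": 4.0}
    print(place("my-object", devices))
```

Any client that knows the device list and weights computes the same answer for the same object name, so no central lookup table is needed, and a device with twice the weight is expected to receive roughly twice the share of first placements.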
Why Ceph?
Why Ceph?
scalability
economically
reliability
persistence
performance
Why Ceph and not X?
scalability
economically
reliability
persistence
performance
Our current implementation
• 39 Storage nodes (12 * 4 TB SATA, 4 * 400 GB SSD)
• 546 Object Storage Daemons
• 5 Monitor nodes
• 2 Metadata nodes
• Dedicated public and cluster network
• Storage nodes with 2 x 10 Gbit/s for public & 2 x 40 Gbit/s for cluster connectivity
• Monitor & Metadata have 2 x 10 Gbit/s public connectivity
• Hardware: Supermicro, Mellanox
• OS: CentOS 7.4
• Ceph version: Luminous 12.2.4
SFScon18 - Martin Palma - Building a petabyte-scale storage system based on free software
• Mainly used through CephFS and RBD
• Integrated into internal Kubernetes and LXD infrastructure
• Started in 2014 with Hammer LTS
• Completed 3 major version upgrades (Hammer -> Jewel -> Luminous)
• Scaled from an initial 600 TB to 1.7 PB
• Raw used: ~1 PB
• Usage: CephFS 77%, RBD 23%
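As a rough cross-check of the figures listed for this cluster, the arithmetic can be sketched as follows. The node count, drive counts, drive sizes, and OSD count are taken from the slides. The replica count of 3 is a hypothetical value used only for illustration, since the slides do not state the pool configuration, and reading "1.7 PB" as HDD capacity in binary units is likewise an assumption.

```python
NODES = 39
HDD_PER_NODE, HDD_TB = 12, 4      # 12 * 4 TB SATA per node
SSD_PER_NODE, SSD_TB = 4, 0.4     # 4 * 400 GB SSD per node
OSDS = 546

hdd_raw_tb = NODES * HDD_PER_NODE * HDD_TB     # raw HDD capacity in TB
ssd_raw_tb = NODES * SSD_PER_NODE * SSD_TB     # raw SSD capacity in TB
hdd_raw_pib = hdd_raw_tb * 10**12 / 2**50      # same HDD capacity in PiB
osds_per_node = OSDS / NODES                   # average OSDs per storage node


def usable_tb(raw_tb: float, replicas: int) -> float:
    """Capacity left for data when every object is stored `replicas` times."""
    return raw_tb / replicas


if __name__ == "__main__":
    print(f"HDD raw: {hdd_raw_tb} TB ({hdd_raw_pib:.2f} PiB)")
    print(f"SSD raw: {ssd_raw_tb:.1f} TB")
    print(f"OSDs per node: {osds_per_node:.0f}")
    # HYPOTHETICAL replica count; not stated in the slides.
    print(f"Usable with 3 replicas: {usable_tb(hdd_raw_tb, 3):.0f} TB")
```

The HDDs alone give 1872 TB, or about 1.66 PiB, which is in line with the quoted 1.7 PB. The OSD count works out to 14 per node, while 16 drives per node are listed; the slides do not say how the remaining drives are used.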
Challenges
experience
backup
operation
documentation
Conclusion
Thank you
Building a petabyte-scale storage system based on free
software
Martin Palma
Eurac Research
martin.palma@eurac.ed
http://www.eurac.edu
