RED HAT CONFIDENTIAL | NDA ONLY
CACHE TIERING AND ERASURE CODING
#ceph-devel
shinobu
AGENDA
■ CEPH MOTIVATING PRINCIPLES
■ CEPH COMPONENTS
■ ARCHITECTURE COMPONENTS
■ RADOS
■ LIBRADOS
■ RADOS COMPONENTS
■ DATA PLACEMENT
■ CACHE TIERING
■ ERASURE CODING
CEPH MOTIVATING PRINCIPLES
■ All components must scale horizontally
■ There can be no single point of failure
■ The solution must be hardware agnostic
■ Should use commodity hardware
■ Self-manage whenever possible
■ Open source (LGPL)
■ Move beyond legacy approaches
  ■ Client / cluster instead of client / server
  ■ Avoid ad hoc HA
CEPH COMPONENTS
■ RGW (used by APP): A web services gateway for object storage, compatible with S3 and Swift
■ RBD (used by HOST/VM): A reliable, fully-distributed block device with cloud platform integration
■ CephFS (used by CLIENT): A distributed file system with POSIX semantics and scale-out metadata management
■ LIBRADOS: A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
■ RADOS: A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
ARCHITECTURE COMPONENTS
THE RADOS GATEWAY
[Diagram: each APPLICATION talks to a RADOSGW instance, which uses LIBRADOS to reach the RADOS CLUSTER (M = monitor)]
STORING VIRTUAL DISK: LIBRBD
[Diagram: VM on a HYPERVISOR, which uses LIBRBD to reach the RADOS CLUSTER (M = monitor)]
KERNEL MODULE: KRBD
[Diagram: LINUX HOST using KRBD to reach the RADOS CLUSTER (M = monitor)]
RBD FEATURES
■ Stripe images across entire cluster (pool)
■ Read-only snapshots
■ Copy-on-Write clones
■ Broad integration
  ■ QEMU
  ■ Linux kernel
  ■ iSCSI (STGT, LIO)
  ■ OpenStack, CloudStack, Nebula, Ganeti, Proxmox
■ Incremental backup (relative to snapshot)
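The striping bullet can be made concrete with a small sketch. It assumes the simplest layout, where an image is cut into fixed-size objects of 4 MiB (the RBD default object size) and no custom stripe unit or stripe count is set. `ObjectExtent` and `locate` are hypothetical names for illustration, not part of librbd.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of librbd.
struct ObjectExtent {
    std::uint64_t object_index;   // which backing object holds the byte
    std::uint64_t object_offset;  // position of the byte inside that object
};

// Assumption: 4 MiB objects and plain striping.
const std::uint64_t kObjectSize = 4ULL * 1024 * 1024;

// Map a byte offset within the image to the object that stores it.
inline ObjectExtent locate(std::uint64_t image_offset,
                           std::uint64_t object_size = kObjectSize) {
    return ObjectExtent{image_offset / object_size, image_offset % object_size};
}
```

Because consecutive objects are placed independently, a large sequential read or write of the image ends up touching many objects, and therefore many OSDs in the pool.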
RBD FEATURES
■ Image mirroring
  ■ Asynchronous replication to another cluster
  ■ Replica(s) crash consistent
  ■ Replication is per-image
  ■ Each image has a data journal
  ■ RBD mirror daemon does the work
[Diagram: CLUSTER A (HYPERVISOR, LIBRBD, Journal) replicated by rbd-mirror to CLUSTER B (HYPERVISOR, LIBRBD)]
SEPARATE METADATA SERVER
[Diagram: LINUX HOST with KERNEL MODULE sends metadata and data separately to the RADOS CLUSTER (M = monitor)]
SCALABLE METADATA SERVERS
MDS
■ Manages metadata for a POSIX-compliant shared filesystem
  ■ Directory hierarchy
  ■ File metadata (owner, timestamps, mode, etc.)
  ■ Snapshots on any directory
■ Clients stripe file data in RADOS
  ■ MDS not in data path
■ MDS stores metadata in RADOS
■ Dynamic MDS cluster scales to 10s or 100s
■ Only required for shared file system
LIBRADOS API
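#include <rados/librados.hpp>

int main() {
  librados::Rados rados;
  librados::IoCtx io_ctx;
  rados.init("admin");             // connect as client.admin
  rados.conf_read_file(nullptr);   // read ceph.conf from the default locations
  rados.connect();
  rados.pool_create("swimming_pool");
  rados.ioctx_create("swimming_pool", io_ctx);

  // Synchronous write: replace the whole object "octopus"
  librados::bufferlist bl;
  bl.append("water");
  io_ctx.write_full("octopus", bl);

  // Asynchronous read of up to 4 MiB starting at offset 0
  librados::bufferlist rbl;
  librados::AioCompletion *read_completion1 = librados::Rados::aio_create_completion();
  io_ctx.aio_read("octopus", read_completion1, &rbl, 4194304, 0);
  read_completion1->wait_for_complete();
  int ret = read_completion1->get_return_value();  // bytes read, or a negative errno
  read_completion1->release();

  // Compound write operation: set the "version" xattr on the object
  librados::ObjectWriteOperation write_op;
  librados::bufferlist xbl;
  xbl.append('2');
  write_op.setxattr("version", xbl);
  io_ctx.operate("octopus", &write_op);

  rados.shutdown();
  return ret < 0 ? 1 : 0;
}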
RADOS COMPONENTS
OSD:
■ 10s to 1000s in a cluster
■ One per disk (or one per SSD, RAID group…)
■ Serve stored objects to clients
■ Intelligently peer for replication & recovery
OBJECT STORAGE DAEMON
[Diagram: RADOS cluster with four OSDs, each with its own FS and DISK, plus three monitors (M)]
RADOS COMPONENTS
MON:
■ Maintain cluster membership and state
■ Provide consensus for distributed decision-making
■ Small, odd number (e.g., 5)
■ Not part of data path
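The "small, odd number" advice follows from majority arithmetic: monitors agree through a majority quorum, so what matters is how many can fail while a majority survives. The sketch below shows only that arithmetic; the function names are hypothetical and nothing here calls into Ceph.

```cpp
#include <cassert>

// Smallest number of monitors that forms a strict majority of n.
inline int quorum_size(int n) { return n / 2 + 1; }

// Number of monitors that can fail while a majority is still reachable.
inline int tolerated_failures(int n) { return n - quorum_size(n); }
```

With these definitions, 3 monitors tolerate 1 failure and 5 tolerate 2, while 4 still tolerate only 1 and 6 only 2. An even count adds a daemon without adding fault tolerance.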
CRUSH:
■ Pseudo-random placement algorithm
  ■ Fast calculation, no lookup
  ■ Repeatable, deterministic
■ Statistically uniform distribution
■ Stable mapping
  ■ Limited data migration on change
■ Rule-based configuration
  ■ Infrastructure topology aware
  ■ Adjustable replication
  ■ Weighting
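To give a feel for "fast calculation, no lookup" and "limited data migration on change", here is a minimal sketch of hash-based placement. It is not CRUSH: it uses rendezvous (highest-random-weight) hashing as a stand-in, with FNV-1a instead of CRUSH's rjenkins hash, and it ignores weights, rules and topology. `fnv1a` and `place` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a: a simple deterministic 64-bit string hash (stand-in, not rjenkins).
inline std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Score every OSD by hash(object, osd) and keep the `replicas` best scores.
// Any client that knows the OSD list computes the same answer on its own.
inline std::vector<int> place(const std::string& object,
                              const std::vector<int>& osds,
                              std::size_t replicas) {
    std::vector<std::pair<std::uint64_t, int> > scored;
    for (int osd : osds) {
        scored.emplace_back(fnv1a(object + "#" + std::to_string(osd)), osd);
    }
    // Sorting the reversed range ascending leaves the vector in descending order.
    std::sort(scored.rbegin(), scored.rend());
    std::vector<int> out;
    for (std::size_t i = 0; i < scored.size() && i < replicas; ++i) {
        out.push_back(scored[i].second);
    }
    return out;
}
```

Calling `place("octopus", osds, 3)` twice gives the same three OSDs, and removing an OSD from the list only changes the placement of objects that had a replica on it. CRUSH provides the same kind of properties while also honouring weights and failure domains.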
DATA PLACEMENT
[Figure sequence: objects, drawn as bit patterns, distributed across the RADOS cluster; the last frame shows additional objects]
CACHE TIERING
TWO WAYS TO CACHE
■ Within each OSD
  ■ Combine SSD and HDD under each OSD
  ■ Make localized promote / demote decisions
  ■ Leverage existing tools
    ■ dm-cache, bcache, flashcache
    ■ Variety of caching controllers
  ■ We can help with hints
[Diagram: OSD on FS on a BLOCKDEV that combines two DISKs]
[Diagram: caching within the OSD using dm-cache: OSD on FS on a BLOCKDEV that dm-cache builds from Data, Cache and Metadata devices]
■ Cache on separate devices / nodes
  ■ Different hardware for devices / nodes
    ■ Slow nodes for cold data
    ■ High performance nodes for hot data
  ■ Add, remove, scale each tier independently
    ■ Unlikely to choose right ratios at procurement time
[Diagram: OSD on FS on BLOCKDEV on a single DISK]
TIERED STORAGE
[Diagram: APPLICATION on top of RADOS, which holds a CACHE POOL (REPLICATED) in front of a BACKING POOL (ERASURE CODED)]
RADOS TIERING PRINCIPLES
■ Each tier is a RADOS pool
  ■ Replicated or erasure coded
■ Tiers are durable
  ■ Replicate across OSDs in multiple hosts
■ Each tier has its own CRUSH policy
  ■ Map to SSD devices / hosts only
■ librados clients adapt to tiering topology
  ■ Transparently direct requests accordingly
  ■ No changes to RBD, RGW, CephFS, etc.
[Diagram: the Client's Objecter talks to RADOS; the CACHE TIER holds the Promotion logic and the Tiering agent and sits in front of the BASE TIER]
CACHE TIERING: I/O PATTERN
[Each diagram: APPLICATION on top of RADOS, with a CACHE POOL (SSD) in front of a BACKING POOL (HDD)]
WRITE HIT (cache pool in WRITEBACK mode)
■ WRITE into cache pool
■ ACK to application
WRITE MISS (WRITEBACK)
■ WRITE arrives at cache pool
■ PROMOTE object from backing pool
■ ACK to application
PROXY WRITE (WRITEBACK)
■ WRITE arrives at cache pool
■ PROXY WRITE from cache pool to backing pool
■ ACK to application
READ: CACHE HIT (WRITEBACK)
■ READ arrives at cache pool
■ READ REPLY from cache pool
READ: CACHE MISS (WRITEBACK)
■ READ arrives at cache pool
■ PROMOTE object from backing pool
■ READ REPLY to application
READFORWARD
■ READ arrives at cache pool
■ REDIRECT returned to application
■ READ sent to backing pool
■ READ REPLY from backing pool
FLUSH AND EVICT (WRITEBACK)
■ FLUSH and/or EVICT cold data
■ FLUSH to backing pool, ACK, then EVICT from cache pool
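The writeback patterns above (hit, miss with promote, flush, evict) can be sketched as a toy in-memory model. This is an illustration under simplifying assumptions, not how the OSD implements tiering: both pools are plain maps, objects are whole strings, and proxy writes and readforward are left out. `TieredStore` and `CachedObject` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a writeback cache tier; all names are hypothetical.
struct CachedObject {
    std::string data;
    bool dirty;  // true if newer than the copy in the backing pool
};

class TieredStore {
public:
    std::map<std::string, CachedObject> cache;   // cache pool (SSD)
    std::map<std::string, std::string> backing;  // backing pool (HDD)

    // WRITE HIT / WRITE MISS: the write always lands in the cache pool.
    void write(const std::string& oid, const std::string& data) {
        promote(oid);  // no-op on a hit
        cache[oid] = CachedObject{data, true};
    }

    // READ HIT / READ MISS: a miss promotes the object, then serves it.
    // Returns false if the object exists in neither pool.
    bool read(const std::string& oid, std::string* out) {
        promote(oid);
        auto it = cache.find(oid);
        if (it == cache.end()) return false;
        *out = it->second.data;
        return true;
    }

    // FLUSH: copy a dirty object down to the backing pool and mark it clean.
    void flush(const std::string& oid) {
        auto it = cache.find(oid);
        if (it != cache.end() && it->second.dirty) {
            backing[oid] = it->second.data;
            it->second.dirty = false;
        }
    }

    // EVICT: drop a clean object from the cache; refuses dirty objects.
    bool evict(const std::string& oid) {
        auto it = cache.find(oid);
        if (it == cache.end() || it->second.dirty) return false;
        cache.erase(it);
        return true;
    }

private:
    // PROMOTE: copy an object up from the backing pool if it is not cached.
    void promote(const std::string& oid) {
        if (cache.count(oid)) return;
        auto it = backing.find(oid);
        if (it != backing.end()) cache[oid] = CachedObject{it->second, false};
    }
};
```

The model makes one ordering constraint visible: a dirty object has to be flushed before it can be evicted, otherwise the only up-to-date copy would be lost.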
ERASURE CODING
REPLICATED POOL (OBJECT stored as COPY, COPY, COPY)
■ Full copy of stored objects
■ Very high durability
■ 3x (200% overhead)
■ Quick recovery
ERASURE CODED POOL (OBJECT stored as chunks 1 2 3 4 5 6)
■ One copy plus parity
■ Cost-effective durability
■ 1.5x (50% overhead)
■ Expensive recovery
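The 3x and 1.5x figures come from simple ratios, sketched below. The split of the six chunks into k = 4 data chunks and m = 2 coding chunks is an assumption chosen because it yields the slide's 1.5x; the slide itself does not state k and m. The function names are hypothetical.

```cpp
#include <cassert>

// Raw bytes stored per byte of user data with n-way replication.
inline double replication_factor(int copies) {
    return static_cast<double>(copies);
}

// Raw bytes stored per byte of user data with k data and m coding chunks.
inline double erasure_factor(int k, int m) {
    return static_cast<double>(k + m) / k;
}

// Extra space consumed, as a percentage of the user data.
inline double overhead_percent(double factor) {
    return (factor - 1.0) * 100.0;
}
```

Under that assumption `erasure_factor(4, 2)` is 1.5 (50% overhead) against `replication_factor(3)` at 3.0 (200% overhead). A k = 3, m = 1 profile would store about 1.33x.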
[Figure sequence: an ERASURE CODED POOL in RADOS spread over six OSDs (1 to 6); successive frames label the DATA CHUNKS and the CODING CHUNKS of an OBJECT]
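How data chunks and coding chunks relate can be shown with the simplest possible code: k data chunks plus one XOR parity chunk, which matches the k=3 m=1 shape of the profile in the configuration example. This is a sketch only. Ceph's erasure-code plugins use more general codes (for example Reed-Solomon) that also handle m greater than 1, and `encode` and `recover` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split `object` into k equally sized data chunks (zero-padded) and append
// one XOR coding chunk. Assumes k >= 1.
inline std::vector<std::string> encode(const std::string& object, std::size_t k) {
    const std::size_t chunk_len = (object.size() + k - 1) / k;
    std::string padded = object;
    padded.resize(chunk_len * k, '\0');
    std::vector<std::string> chunks;
    std::string parity(chunk_len, '\0');
    for (std::size_t i = 0; i < k; ++i) {
        chunks.push_back(padded.substr(i * chunk_len, chunk_len));
        for (std::size_t j = 0; j < chunk_len; ++j) parity[j] ^= chunks[i][j];
    }
    chunks.push_back(parity);
    return chunks;
}

// Rebuild the chunk at index `lost` by XOR-ing all surviving chunks.
// Assumes at least two chunks, all of the same length.
inline std::string recover(const std::vector<std::string>& chunks, std::size_t lost) {
    std::string out(chunks[lost == 0 ? 1 : 0].size(), '\0');
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (i == lost) continue;
        for (std::size_t j = 0; j < out.size(); ++j) out[j] ^= chunks[i][j];
    }
    return out;
}
```

With `encode("octopus!!", 3)` the data chunks are "oct", "opu" and "s!!", and any single chunk, data or parity, can be rebuilt from the other three. Recovery has to read every surviving chunk, which is the "expensive recovery" noted for erasure coded pools.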
ERASURE CODING: I/O PATTERN
EC READ
[Figure sequence: CLIENT sends READ to the ERASURE CODED POOL (OSDs 1 to 6); READS are issued among the OSDs; READ REPLY returns to the CLIENT]
EC WRITE
[Figure sequence: CLIENT sends WRITE to the ERASURE CODED POOL; WRITES are issued among the OSDs; WRITE ACK returns to the CLIENT]
EC WRITE: DEGRADED
[Figure: CLIENT sends WRITE; WRITES are issued among the OSDs]
EC WRITE: PARTIAL FAILURE
[Figure sequence: CLIENT sends WRITE; WRITES reach only some of the OSDs, leaving chunk versions B B B A A A across the six OSDs]
CONFIGURATION EXAMPLE
# Create pools
sudo ceph osd erasure-code-profile set myecprofile ruleset-failure-domain=osd k=3 m=1
sudo ceph osd pool create myecpool 12 12 erasure myecprofile
sudo ceph osd pool create mycache 64 64
sudo ceph osd pool set mycache crush_ruleset 3
# Set up a read/write cache pool mycache for pool myecpool
sudo ceph osd tier add myecpool mycache
sudo ceph osd tier cache-mode mycache writeback
sudo ceph osd tier set-overlay myecpool mycache
# Set the target size and enable the tiering agent
sudo ceph osd pool set mycache hit_set_type bloom
sudo ceph osd pool set mycache hit_set_count 1
sudo ceph osd pool set mycache hit_set_period 3600
sudo ceph osd pool set mycache target_max_objects 250
sudo ceph osd pool set mycache target_max_bytes 1000000000000 # 1 TB
sudo ceph osd pool set mycache min_read_recency_for_promote 1
sudo ceph osd pool set mycache min_write_recency_for_promote 1
# CRUSH rule
root ssd {
        id -6
        # weight 8.000
        alg straw
        hash 0  # rjenkins1
        item octopus01-ssd weight 1.000
        item octopus02-ssd weight 1.000
        item octopus03-ssd weight 1.000
}
rule cacher {
        ruleset 3
        type replicated
        min_size 3
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}
CONTRIBUTION
http://docs.ceph.com/docs/master/dev/
IRC AND MAILING LIST
http://ceph.com/resources/mailing-list-irc/
BUG REPORT
http://tracker.ceph.com/projects/ceph/issues/
BENCHMARKING
Cache Tiering
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Zhang.pdf
Erasure Coding
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Roy.pdf
THANK YOU!
Shinobu Kinjo
Red Hat
shinobu@redhat.com

More Related Content

PDF
XenSummit - 08/28/2012
Ceph Community
 
PDF
Ceph Overview for Distributed Computing Denver Meetup
ktdreyer
 
PDF
SF Ceph Users Jan. 2014
Kyle Bader
 
ODP
Block Storage For VMs With Ceph
The Linux Foundation
 
PDF
librados
Patrick McGarry
 
PDF
CephFS update February 2016
John Spray
 
PDF
Storage tiering and erasure coding in Ceph (SCaLE13x)
Sage Weil
 
PDF
ceph openstack dream team
Udo Seidel
 
XenSummit - 08/28/2012
Ceph Community
 
Ceph Overview for Distributed Computing Denver Meetup
ktdreyer
 
SF Ceph Users Jan. 2014
Kyle Bader
 
Block Storage For VMs With Ceph
The Linux Foundation
 
librados
Patrick McGarry
 
CephFS update February 2016
John Spray
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Sage Weil
 
ceph openstack dream team
Udo Seidel
 

What's hot (19)

PDF
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Danny Al-Gaaf
 
PDF
Ceph data services in a multi- and hybrid cloud world
Sage Weil
 
PPTX
Ceph Introduction 2017
Karan Singh
 
PPTX
What you need to know about ceph
Emma Haruka Iwao
 
PDF
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 
PDF
The State of Ceph, Manila, and Containers in OpenStack
Sage Weil
 
PDF
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 
PDF
Scalable POSIX File Systems in the Cloud
Red_Hat_Storage
 
PPTX
QCT Ceph Solution - Design Consideration and Reference Architecture
Patrick McGarry
 
PDF
Ceph - A distributed storage system
Italo Santos
 
PDF
Ceph and RocksDB
Sage Weil
 
PDF
Keeping OpenStack storage trendy with Ceph and containers
Sage Weil
 
PDF
Reliable Storage for High Availability, Disaster Recovery, Clouds and Contain...
Celia Chase
 
PDF
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Sage Weil
 
PDF
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Sage Weil
 
PDF
Community Update at OpenStack Summit Boston
Sage Weil
 
PDF
LINSTOR - Linux Block storage management tool (march 2019)
Sebastian Schinhammer
 
PDF
A crash course in CRUSH
Sage Weil
 
PDF
Ceph as software define storage
Mahmoud Shiri Varamini
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Danny Al-Gaaf
 
Ceph data services in a multi- and hybrid cloud world
Sage Weil
 
Ceph Introduction 2017
Karan Singh
 
What you need to know about ceph
Emma Haruka Iwao
 
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 
The State of Ceph, Manila, and Containers in OpenStack
Sage Weil
 
BlueStore: a new, faster storage backend for Ceph
Sage Weil
 
Scalable POSIX File Systems in the Cloud
Red_Hat_Storage
 
QCT Ceph Solution - Design Consideration and Reference Architecture
Patrick McGarry
 
Ceph - A distributed storage system
Italo Santos
 
Ceph and RocksDB
Sage Weil
 
Keeping OpenStack storage trendy with Ceph and containers
Sage Weil
 
Reliable Storage for High Availability, Disaster Recovery, Clouds and Contain...
Celia Chase
 
Ceph, Now and Later: Our Plan for Open Unified Cloud Storage
Sage Weil
 
Distributed Storage and Compute With Ceph's librados (Vault 2015)
Sage Weil
 
Community Update at OpenStack Summit Boston
Sage Weil
 
LINSTOR - Linux Block storage management tool (march 2019)
Sebastian Schinhammer
 
A crash course in CRUSH
Sage Weil
 
Ceph as software define storage
Mahmoud Shiri Varamini
 
Ad

Similar to Cache Tiering and Erasure Coding (20)

PDF
Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Community
 
PDF
Ceph Block Devices: A Deep Dive
joshdurgin
 
PDF
Ceph Block Devices: A Deep Dive
Red_Hat_Storage
 
PDF
OSDC 2015: John Spray | The Ceph Storage System
NETWAYS
 
PPTX
Tendências e Evoluções em Armazemamento de Dados
Jefferson Alcantara
 
PDF
Ceph Day LA - RBD: A deep dive
Ceph Community
 
ODP
Ceph Day Santa Clara: The Future of CephFS + Developing with Librados
Ceph Community
 
PDF
Ceph Day New York: Ceph: one decade in
Ceph Community
 
PDF
New use cases for Ceph, beyond OpenStack, Luis Rico
Ceph Community
 
ODP
Ceph: A decade in the making and still going strong
Patrick McGarry
 
ODP
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Community
 
ODP
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Community
 
PPTX
Ceph Intro and Architectural Overview by Ross Turk
buildacloud
 
ODP
London Ceph Day Keynote: Building Tomorrow's Ceph
Ceph Community
 
PDF
Introduction into Ceph storage for OpenStack
OpenStack_Online
 
PPTX
ASAUDIT April 2016 New
Stefan Coetzee
 
PPTX
Ceph Day Santa Clara: Ceph Fundamentals
Ceph Community
 
PDF
2019.06.27 Intro to Ceph
Ceph Community
 
PPTX
Ceph Day NYC: Ceph Fundamentals
Ceph Community
 
PDF
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
OpenStack
 
Ceph Day London 2014 - Ceph Ecosystem Overview
Ceph Community
 
Ceph Block Devices: A Deep Dive
joshdurgin
 
Ceph Block Devices: A Deep Dive
Red_Hat_Storage
 
OSDC 2015: John Spray | The Ceph Storage System
NETWAYS
 
Tendências e Evoluções em Armazemamento de Dados
Jefferson Alcantara
 
Ceph Day LA - RBD: A deep dive
Ceph Community
 
Ceph Day Santa Clara: The Future of CephFS + Developing with Librados
Ceph Community
 
Ceph Day New York: Ceph: one decade in
Ceph Community
 
New use cases for Ceph, beyond OpenStack, Luis Rico
Ceph Community
 
Ceph: A decade in the making and still going strong
Patrick McGarry
 
Ceph Day Santa Clara: Keynote: Building Tomorrow's Ceph
Ceph Community
 
Ceph Day NYC: Building Tomorrow's Ceph
Ceph Community
 
Ceph Intro and Architectural Overview by Ross Turk
buildacloud
 
London Ceph Day Keynote: Building Tomorrow's Ceph
Ceph Community
 
Introduction into Ceph storage for OpenStack
OpenStack_Online
 
ASAUDIT April 2016 New
Stefan Coetzee
 
Ceph Day Santa Clara: Ceph Fundamentals
Ceph Community
 
2019.06.27 Intro to Ceph
Ceph Community
 
Ceph Day NYC: Ceph Fundamentals
Ceph Community
 
The Future of Cloud Software Defined Storage with Ceph: Andrew Hatfield, Red Hat
OpenStack
 
Ad

Recently uploaded (20)

PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
Doc9.....................................
SofiaCollazos
 
PDF
REPORT: Heating appliances market in Poland 2024
SPIUG
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PDF
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
Doc9.....................................
SofiaCollazos
 
REPORT: Heating appliances market in Poland 2024
SPIUG
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 

Cache Tiering and Erasure Coding

  • 1. RED HAT CONFIDENTIAL | NDA ONLY CACHE TIERING AND ERASURE CODING #ceph-devel shinobu
  • 2. RED HAT CONFIDENTIAL | NDA ONLY ■ CEPH MOTIVATING PRINCIPLES ■ CEPH COMPONENTS ■ ARCHITECTURE COMPONENT ■ RADOS ■ LIBRADOS ■ RADOS COMPONENTS ■ DATA PLACEMENT ■ CACHE TIERING ■ ERASURE CODING AGENDA 1
  • 3. RED HAT CONFIDENTIAL | NDA ONLY ■ All components must scale horizontally ■ There can be no single point of failure ■ The solution must be hardware agnostic ■ Should use commodity hardware ■ Self-manage whenever possible ■ Open source (LGPL) ■ Move beyond legacy approaches ■ Client / cluster instead of client / server ■ Ad hoc HA CEPH MOTIVATING PRINCIPLES 2
  • 4. RED HAT CONFIDENTIAL | NDA ONLY RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT CEPH COMPONENTS 3
  • 5. RED HAT CONFIDENTIAL | NDA ONLY RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT ARCHITECTURE COMPONENTS 4
  • 6. RED HAT CONFIDENTIAL | NDA ONLY THE RADOS GATEWAY APPLICATION RADOSGW LIBRADOS APPLICATION RADOSGW LIBRADOS RADOS CLUSTER M M M 5
  • 7. RED HAT CONFIDENTIAL | NDA ONLY RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT ARCHITECTURE COMPONENTS 6
  • 8. RED HAT CONFIDENTIAL | NDA ONLY RADOS CLUSTER M M STORING VIRTUAL DISK: LIBRBD VM HYPERVISOR LIBRBD 7
  • 9. RED HAT CONFIDENTIAL | NDA ONLY RADOS CLUSTER M M KERNEL MODULE: KRBD LINUX HOST KRBD 8
  • 10. RED HAT CONFIDENTIAL | NDA ONLY RBD FEATURES ■ Stripe images across entire cluster (pool) ■ Read-only snapshots ■ Copy-on-Write clones ■ Broad integration ■ Qemu ■ Linux kernel ■ iSCSI (STGT, LIO) ■ OpenStack, CloudStack, Nebula, Geneti, Proxmox ■ Incremental backup (relative to snapshot) 9
  • 11. RED HAT CONFIDENTIAL | NDA ONLY RBD FEATURES ■ image mirroring ■ Asynchronous replication to another cluster ■ Replica(s) crash consistent ■ Replication is per-image ■ Each image has a data journal ■ RBD mirror daemon does the work CLUSTER A HYPERVISOR LIBRBD Journal CLUSTER B HYPERVISOR LIBRBD rbd-mirror 10
  • 12. RED HAT CONFIDENTIAL | NDA ONLY ARCHITECTURE COMPONENTS RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT 11
  • 13. RED HAT CONFIDENTIAL | NDA ONLY SEPARATE METADATA SERVER LINUX HOST KERNEL MODULE RADOS CLUSTER M M M 01 10metadata data 12
  • 14. RED HAT CONFIDENTIAL | NDA ONLY SCALABLE METADATA SERVERS MDS ■ Manages metadata for a POSIX-compliant shared filesystem ■ Directory hierarchy ■ File metadata (owner, timestamps, mode, etc) ■ Snapshots on any directory ■ Clients stripe file data in RADOS ■ MDS not in data path ■ MDS stores metadata in RADOS ■ Dynamic MDS cluster scales to 10s or 100s ■ Only required for shared file system 13
  • 15. RED HAT CONFIDENTIAL | NDA ONLY LIBRADOS RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) 14
  • 16. RED HAT CONFIDENTIAL | NDA ONLY LIBRADOS API #include <rados/librados.hpp> librados::IoCtx io_ctx; librados::Rados rados; rados.init("admin"); rados.connect(); rados.pool_create("swimming_pool"); rados.ioctx_create("swimming_pool", io_ctx); librados::bufferlist bl; bl.append("water"); io_ctx.write_full("octopus", bl) librados::bufferlist rbl; librados::AioCompletion *read_completion1 = librados::Rados::aio_create_completion(); io_ctx.aio_read("octopus", read_completion1, &rbl, 4193404, 0); read_completion1->wait_for_safe(); read_completion1->get_return_value() librados::ObjectWriteOperation write_op; librados::bufferlist xbl; xbl.append('2'); write_op.setxattr("version", xbl); 15
  • 17. RED HAT CONFIDENTIAL | NDA ONLY RADOS LIBRADOS A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP) RGW A web services gateway for object storage, compatible with S3 and Swift RBD A reliable, fully- distributed block device with cloud platform integration CephFS A distributed file system with POSIX semantics and scale- out metadata management APP HOST/VM CLIENT RADOS A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors 16
  • 18. RED HAT CONFIDENTIAL | NDA ONLY RADOS COMPONENTS OSD: ■ 10s to 1000s in a cluster ■ One per disk (or one per SSD, RAID group…) ■ Server stored objects to clients ■ Intelligently peer for replication & recovery 17
  • 19. RED HAT CONFIDENTIAL | NDA ONLY RADOS M M M OSD DISK FS OSD DISK FS OSD DISK FS OSD DISK FS OBJECT STORAGE DAEMON 18
  • 20. RED HAT CONFIDENTIAL | NDA ONLY M RADOS COMPONENTS MON: ■ Maintain cluster membership and state ■ Provide consensus of distributed decision making ■ Small, odd number (e.g., 5) ■ Not part of data path 19
  • 21. RED HAT CONFIDENTIAL | NDA ONLY CRUSH CRUSH: ■ Pseudo-random placement algorithm ■ Fast calculation, no lookup ■ Repeatable, deterministic ■ Statically uniform distribution ■ Stable mapping ■ Limited data migration on change ■ Rule-based configuration ■ Infrastructure topology aware ■ Adjustable replication ■ Weighting 20
  • 22. RED HAT CONFIDENTIAL | NDA ONLY DATA PLACEMENT 21
  • 23. RED HAT CONFIDENTIAL | NDA ONLY DATA PLACEMENT RADOS 10 01 01 11 10 01 01 11 11 11 11 10 10 01 10 01 0110 10 10 1101 01 01 22
  • 24. RED HAT CONFIDENTIAL | NDA ONLY DATA PLACEMENT RADOS 10 01 01 11 10 01 01 11 11 11 11 10 10 01 10 01 0110 10 10 1101 01 01 23
  • 25. RED HAT CONFIDENTIAL | NDA ONLY DATA PLACEMENT RADOS 10 01 01 11 10 01 01 11 11 11 11 10 10 01 10 01 0110 10 10 11 01 01 10 01 01 11 10 01 01 11 01 01 24
  • 26. RED HAT CONFIDENTIAL | NDA ONLY 25 CACHE TIERING
  • 27. RED HAT CONFIDENTIAL | NDA ONLY 26 TWO WAYS TO CACHE
  • 28. TWO WAYS TO CACHE
      ■ Within each OSD
      ■ Combine SSD and HDD under each OSD
      ■ Make localized promote / demote decisions
      ■ Leverage existing tools
      ■ dm-cache, bcache, flashcache
      ■ Variety of caching controllers
      ■ We can help with hints
      (diagram labels: OSD, FS, BLOCKDEV, DISK, DISK)
  • 29. TWO WAYS TO CACHE (diagram labels: OSD, FS, BLOCKDEV, dm-cache, Data, Cache, Metadata)
  • 30. TWO WAYS TO CACHE
      ■ Cache on separate devices / nodes
      ■ Different hardware for devices / nodes
      ■ Slow nodes for cold data
      ■ High performance nodes for hot data
      ■ Add, remove, scale each tier independently
      ■ Unlikely to choose right ratios at procurement time
      (diagram labels: OSD, FS, BLOCKDEV, DISK)
  • 31. TIERED STORAGE (diagram: APPLICATION; RADOS with CACHE POOL (REPLICATED) and BACKING POOL (ERASURE CODED))
  • 32. RADOS TIERING PRINCIPLES
      ■ Each tier is a RADOS pool
      ■ Replicated or erasure coded
      ■ Tiers are durable
      ■ replicate across OSDs in multiple hosts
      ■ Each tier has its own CRUSH policy
      ■ map to SSD devices / hosts only
      ■ librados clients adapt to tiering topology
      ■ Transparently direct requests accordingly
      ■ No changes to RBD, RGW, CephFS, etc.
      (diagram labels: Client, Objecter; RADOS with CACHE TIER (Promotion logic, Tiering agent) and BASE TIER)
  • 33. I/O PATTERN (CACHE TIERING)
  • 34. WRITE HIT (CACHE TIERING)
  • 35. WRITE INTO CACHE POOL (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: WRITE, ACK)
  • 36. WRITE MISS (CACHE TIERING)
  • 37. WRITE MISS (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: WRITE, PROMOTE, ACK)
  • 38. PROXY WRITE (CACHE TIERING)
  • 39. PROXY WRITE (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: WRITE, PROXY WRITE, ACK)
  • 40. READ: CACHE HIT (CACHE TIERING)
  • 41. READ: CACHE HIT (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: READ, READ REPLY)
  • 42. READ: CACHE MISS (CACHE TIERING)
  • 43. READ: CACHE MISS (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: READ, READ REPLY, PROMOTE)
  • 44. READFORWARD (CACHE TIERING)
  • 45. READFORWARD (diagram: APPLICATION; RADOS with CACHE POOL (SSD) and BACKING POOL (HDD); arrow labels: READ, REDIRECT, READ, READ REPLY)
  • 46. FLUSH AND EVICT (CACHE TIERING)
  • 47. FLUSH AND/OR EVICT COLD DATA (diagram: APPLICATION; RADOS with CACHE POOL (SSD), WRITEBACK, and BACKING POOL (HDD); arrow labels: EVICT, ACK, FLUSH)
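The write, read, flush and evict flows in the preceding slides can be summarized with a toy model of a writeback cache tier. This is a simplified sketch for intuition only: the class `ToyCacheTier` is hypothetical, it keeps both pools in plain dictionaries, and it does not model proxying, read forwarding, or the tiering agent's thresholds.

```python
class ToyCacheTier:
    """Toy writeback cache pool sitting in front of a backing pool."""

    def __init__(self):
        self.cache = {}     # cache pool (fast tier)
        self.base = {}      # backing pool (slow tier)
        self.dirty = set()  # objects changed in cache, not yet flushed

    def _promote(self, name):
        # On a miss, copy the object up from the backing pool if it exists.
        if name not in self.cache and name in self.base:
            self.cache[name] = self.base[name]

    def write(self, name, data):
        self._promote(name)       # write miss: promote first
        self.cache[name] = data   # write lands in the cache pool
        self.dirty.add(name)
        return "ack"              # acknowledged before reaching the base tier

    def read(self, name):
        self._promote(name)       # read miss: promote, then serve from cache
        return self.cache[name]

    def flush(self, name):
        # Write a dirty object back to the backing pool.
        if name in self.dirty:
            self.base[name] = self.cache[name]
            self.dirty.discard(name)

    def evict(self, name):
        # Only clean objects may be dropped from the cache.
        if name in self.dirty:
            raise ValueError("flush before evicting a dirty object")
        self.cache.pop(name, None)


tier = ToyCacheTier()
tier.write("obj", b"v1")   # stored in the cache pool only
tier.flush("obj")          # now also in the backing pool
tier.evict("obj")          # cache copy dropped
print(tier.read("obj"))    # read miss promotes the object again
```

The ordering constraint worth noticing is that eviction is safe only after a flush, since a dirty object exists nowhere but the cache pool.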
  • 48. ERASURE CODING
  • 49. ERASURE CODING
      RADOS REPLICATED POOL (diagram: OBJECT stored as COPY, COPY, COPY)
      ■ Full copy of stored objects
      ■ Very high durability
      ■ 3x (200% overhead)
      ■ Quick recovery
      RADOS ERASURE CODED POOL (diagram: OBJECT stored as chunks 1 to 6)
      ■ One copy plus parity
      ■ Cost-effective durability
      ■ 1.5x (50% overhead)
      ■ Expensive recovery
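The 3x and 1.5x figures follow from the ratio of stored chunks to data chunks. A short sketch of the arithmetic, assuming k data chunks and m coding chunks (k=4, m=2 is the split consistent with six chunks at 1.5x; replication is modeled here as k=1 with m extra copies):

```python
def storage_factor(k, m):
    """Raw bytes stored per byte of user data."""
    return (k + m) / k


def overhead_percent(k, m):
    """Extra raw capacity beyond the user data, as a percentage."""
    return 100 * m / k


print("3 replicas:", storage_factor(1, 2), overhead_percent(1, 2))
print("EC k=4 m=2:", storage_factor(4, 2), overhead_percent(4, 2))
print("EC k=3 m=1:", storage_factor(3, 1), overhead_percent(3, 1))
```

Both the 3-replica pool and the k=4, m=2 pool survive the loss of any two OSDs holding a given object, but the erasure coded pool does so at half the raw capacity.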
  • 50. ERASURE CODING (diagram: RADOS ERASURE CODED POOL with OSD 1 to OSD 6)
  • 51. ERASURE CODING: DATA CHUNKS (diagram: RADOS ERASURE CODED POOL with OSD 1 to OSD 6)
  • 52. ERASURE CODING: CODING CHUNKS (diagram: RADOS ERASURE CODED POOL with OSD 1 to OSD 6)
  • 53. ERASURE CODING (diagram: OBJECT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6)
  • 54. I/O PATTERN (ERASURE CODING)
  • 55. EC READ (ERASURE CODING)
  • 56. EC READ (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: READ)
  • 57. EC READ (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: READ, READS)
  • 58. EC READ (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: READ REPLY)
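The split into data chunks and coding chunks, and a read that survives a missing chunk, can be illustrated with the simplest possible erasure code: k data chunks plus a single XOR parity chunk (m=1). This is a deliberately simplified stand-in; production erasure codes tolerate more than one lost chunk and are not implemented this way. The function names are hypothetical.

```python
def xor_bytes(left, right):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(left, right))


def encode(obj, k):
    """Split obj into k padded data chunks and append one XOR parity chunk."""
    chunk_len = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = bytes(chunk_len)
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def decode(chunks, k, length):
    """Rebuild the object; at most one chunk may be None (lost)."""
    missing = [i for i, chunk in enumerate(chunks) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    chunks = list(chunks)
    if missing:
        present = [chunk for chunk in chunks if chunk is not None]
        rebuilt = bytes(len(present[0]))
        for chunk in present:
            rebuilt = xor_bytes(rebuilt, chunk)
        chunks[missing[0]] = rebuilt
    return b"".join(chunks[:k])[:length]


payload = b"erasure coded object"
shards = encode(payload, k=3)   # 3 data chunks + 1 coding chunk
shards[1] = None                # simulate one unavailable OSD
print(decode(shards, k=3, length=len(payload)))
```

The reconstruction path reads every surviving chunk and recomputes the missing one, which hints at why recovery is labeled expensive for erasure coded pools compared with copying a replica.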
  • 59. EC WRITE (ERASURE CODING)
  • 60. EC WRITE (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITE)
  • 61. EC WRITE (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITE, WRITES)
  • 62. EC WRITE (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITE ACK)
  • 63. EC WRITE: DEGRADED (ERASURE CODING)
  • 64. EC WRITE: DEGRADED (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITE, WRITES)
  • 65. EC WRITE: PARTIAL FAILURE (ERASURE CODING)
  • 66. EC WRITE: PARTIAL FAILURE (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITE, WRITES)
  • 67. EC WRITE: PARTIAL FAILURE (diagram: CLIENT; RADOS ERASURE CODED POOL with OSD 1 to OSD 6; arrow labels: WRITES; chunks labeled B, B, B, A, A, A)
  • 68. CONFIGURATION EXAMPLE
      # Create pools
      sudo ceph osd erasure-code-profile set myecprofile ruleset-failure-domain=osd k=3 m=1
      sudo ceph osd pool create myecpool 12 12 erasure myecprofile
      sudo ceph osd pool create mycache 64 64
      sudo ceph osd pool set mycache crush_ruleset 3

      # Set up a read/write cache pool mycache for pool myecpool
      sudo ceph osd tier add myecpool mycache
      sudo ceph osd tier cache-mode mycache writeback
      sudo ceph osd tier set-overlay myecpool mycache

      # Set the target size and enable the tiering agent
      sudo ceph osd pool set mycache hit_set_type bloom
      sudo ceph osd pool set mycache hit_set_count 1
      sudo ceph osd pool set mycache hit_set_period 3600
      sudo ceph osd pool set mycache target_max_objects 250
      sudo ceph osd pool set mycache target_max_bytes 1000000000000   # 1 TB
      sudo ceph osd pool set mycache min_read_recency_for_promote 1
      sudo ceph osd pool set mycache min_write_recency_for_promote 1

      # CRUSH rule
      root ssd {
          id -6
          # weight 8.000
          alg straw
          hash 0   # rjenkins1
          item octopus01-ssd weight 1.000
          item octopus02-ssd weight 1.000
          item octopus03-ssd weight 1.000
      }
      rule cacher {
          ruleset 3
          type replicated
          min_size 3
          max_size 10
          step take ssd
          step choose firstn 0 type host
          step emit
      }
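The `hit_set_type bloom` setting refers to a Bloom filter, a compact structure that records which objects were accessed during a period, with possible false positives but no false negatives. The sketch below is a generic Bloom filter for illustration only; the class name `BloomHitSet`, the bit count, and the hash construction are assumptions and do not reflect Ceph's internal implementation.

```python
import hashlib


class BloomHitSet:
    """Generic Bloom filter used as a stand-in for a cache tier hit set."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, name):
        # Derive `hashes` bit positions from independent-looking digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def insert(self, name):
        for pos in self._positions(name):
            self.array[pos // 8] |= 1 << (pos % 8)

    def contains(self, name):
        # True may be a false positive; False is always correct.
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))


hit_set = BloomHitSet()
hit_set.insert("rbd_data.1234")
print(hit_set.contains("rbd_data.1234"))
```

With a recency setting of 1, as in the example configuration, the promotion decision amounts roughly to asking whether the object appears in the current hit set, so an occasional false positive only causes an unnecessary promotion rather than a missed one.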
  • 69. CONFIGURATION EXAMPLE
      CONTRIBUTION
      http://docs.ceph.com/docs/master/dev/
      IRC AND MAILING LIST
      http://ceph.com/resources/mailing-list-irc/
      BUG REPORT
      http://tracker.ceph.com/projects/ceph/issues/
      BENCHMARKING
      Cache Tiering: http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Zhang.pdf
      Erasure Coding: http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Roy.pdf
  • 70. THANK YOU! Shinobu Kinjo, Red Hat