Global deduplication for Ceph
Myoungwon Oh
SW-Defined Storage Lab
SK Telecom
Agenda
1. Why do we need global dedup?
2. Ceph deduplication design
3. Ceph extensible tier (implementation)
4. Upstream
5. Plan & issues
SK with software-defined storage
[Figure] Workload drivers (5G, UHD/4K, flash devices) demand high performance, low latency, and SLA; Ceph provides a scalable, available, reliable, unified-interface open platform. SK Telecom's answer: all-flash Ceph, with contributions such as QoS and deduplication, as the storage solution for a private cloud for developers and virtual desktop infrastructure.
Why do we need global dedup?
A) Design comparison
[Figure] 1) Local deduplication vs. 2) global deduplication: the same data ("Data A") written to OSDs A, B, C, D is deduplicated only within each node in the local case, but collapses to a single cluster-wide copy in the global case.
B) FIO workload with a deduplication ratio of 50% (32KB block size)
                 4 OSD    8 OSD    12 OSD   16 OSD
Local Dedup      15.5%    8.1%     5.5%     4.1%
Global Dedup     50%      50%      50%      50%
• Up to 40% of total storage space can be saved via deduplication (in our private cloud)
• Local dedup (at the block-device level) cannot achieve the full data reduction available cluster-wide
Design challenges
• Which implementation is the most appropriate for shared-nothing scale-out storage?
  § Applicable to the existing source
  § Transparent to the application
  § Efficient metadata management
• How do we manage dedup metadata?
• What is the most appropriate dedup method (e.g., inline or post-processing)?
  § Performance
  § I/O cost
Design 1: Double distribution hash
• Do we need a new MDS (metadata server) for dedup?
  § A shared-nothing filesystem is scalable because there is no MDS.
  § An MDS does not fit the shared-nothing design:
    § It needs additional I/Os to complete I/O requests (e.g., a metadata query to the MDS)
    § How do we rebalance if we add an MDS?
    § Synchronization between MDSs
Design 1: Double distribution hash
• Can we implement dedup without an MDS?
  § Yes, with a CAS (content-addressable storage) pool and a double distribution hash!
  § (OID, Data) - chunking and fingerprinting -> (OID, Offsets[], FPs[]) - each FP becomes a new OID -> (FP, Data); see the sketch after this list
  § Pros
    § No central MDS
    § Applicable to the existing source without major modifications
    § Transparent to the application
    § Efficient metadata management
    § Reuses the existing architecture (DH, recovery, rebalance, data placement)
  § Cons
    § I/O redirection (needs a translation layer)
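A minimal sketch of the fingerprint-as-OID idea, independent of Ceph internals (the FNV-1a hash and the names chunk_object, cas_pool, and manifest are illustrative stand-ins; a real implementation would use a cryptographic fingerprint such as SHA-1/SHA-256 and CRUSH for placement):

// Conceptual sketch: (OID, Data) -> chunks + fingerprints -> (FP, Data) in a CAS pool.
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <map>
#include <string>

struct chunk_object {                       // (FP, Data) stored in the CAS (chunk) pool
  std::string data;
  uint32_t    refcount = 0;
};

static std::string fingerprint(const std::string& chunk) {
  uint64_t h = 1469598103934665603ULL;      // FNV-1a: stand-in for a real fingerprint
  for (unsigned char c : chunk) { h ^= c; h *= 1099511628211ULL; }
  char buf[17];
  std::snprintf(buf, sizeof(buf), "%016llx", (unsigned long long)h);
  return buf;
}

int main() {
  const size_t chunk_size = 32 * 1024;           // fixed-size 32 KB chunks
  std::string object_data(128 * 1024, 'A');      // (OID, Data) as written by a client

  std::map<uint64_t, std::string> manifest;      // offset -> FP (the chunk's new OID)
  std::map<std::string, chunk_object> cas_pool;  // FP -> chunk object

  for (size_t off = 0; off < object_data.size(); off += chunk_size) {
    std::string chunk = object_data.substr(off, chunk_size);
    std::string fp = fingerprint(chunk);         // the FP is the OID in the chunk pool
    manifest[off] = fp;
    chunk_object& entry = cas_pool[fp];          // duplicate chunks collapse here
    if (entry.refcount++ == 0) entry.data = std::move(chunk);
  }

  std::cout << "chunks written: " << manifest.size()
            << ", unique chunks stored: " << cas_pool.size() << "\n";
}

On this 128 KB object of identical data, four chunks map to a single stored copy: the cluster-wide collapse that per-node dedup cannot provide, with no central MDS needed because the FP alone determines placement.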
Design 2: Self-contained object for deduplication
• An external metadata structure needs additional, complex linking between the deduplication metadata and the existing scale-out storage system
• A self-contained object can be the answer
  § Dedup metadata is included in the original object
[Figure] A client writes object foo to the base tier (metadata pool); the dedup tier (chunk pool) stores the chunked objects. Example with object size = 4MB, chunk size = 1MB. The foo-object carries its own manifest { 0 – 32K: fxc039, 32 – 64K: Dxc045, 64 – 128K: fZc0y9 }, and the chunk pool holds the chunked objects with their reference counts: fxc039 (1), Dxc045 (2), fZc0y9 (4).
Post-processing
1. Find a dirty metadata object (one that contains dirty chunks) from the dirty object ID list.
2. Find the dirty chunk IDs from the dirty metadata object's chunk map.
3. The deduplication engine generates a chunk object and sends it to the chunk pool.
4. In the chunk pool, the chunk object generated in step 3 is placed.
5. Add reference count information to the object.
6. When the chunk write at the chunk pool ends, update the metadata object's chunk map (see the sketch below).
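The six steps above can be condensed into a short conceptual sketch (plain C++, no Ceph APIs; the types metadata_object and chunk_pool_t are illustrative stand-ins, not the real data structures):

// Conceptual post-processing pass over self-contained (manifest-carrying) objects.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct chunk_info { std::string fingerprint; std::string data; bool dirty = true; };

struct metadata_object {                        // base-tier object carrying its own manifest
  std::string oid;
  std::map<uint64_t, chunk_info> chunk_map;     // offset -> chunk metadata
};

struct chunk_pool_t {                           // dedup-tier pool
  std::map<std::string, uint32_t> refcount;     // fingerprint -> reference count
  std::map<std::string, std::string> store;     // fingerprint -> chunk data
};

void post_process(std::vector<metadata_object*>& dirty_objects, chunk_pool_t& pool) {
  for (metadata_object* obj : dirty_objects) {          // 1. dirty metadata objects
    for (auto& [offset, ci] : obj->chunk_map) {         // 2. dirty chunks in its chunk map
      if (!ci.dirty) continue;
      pool.store[ci.fingerprint] = ci.data;             // 3-4. generate and place the chunk object
      pool.refcount[ci.fingerprint] += 1;               // 5. add reference count information
      ci.dirty = false;                                  // 6. chunk write done: mark the entry clean
      ci.data.clear();                                   //    the base tier no longer keeps the bytes
    }
  }
  dirty_objects.clear();
}

Because the manifest lives inside the object itself, this pass needs no external metadata server: each dirty object already knows which of its chunks must be written to the chunk pool.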
Implementation: Extensible tier
• The key structure for the extensible tier

struct object_manifest_t {
  enum {
    TYPE_NONE = 0,
    TYPE_REDIRECT = 1,
    TYPE_CHUNKED = 2,
    TYPE_DEDUP = 3,
  };
  uint8_t type;                             // redirect, chunked, ...
  ghobject_t redirect_target;
  map<uint64_t, chunk_info_t> chunk_map;    // key: offset within the object
};
• Operations
§ Proxy	read,	write
§ Flush,	promote
object_info_t
obj
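To make the chunk_map concrete, here is a small self-contained sketch (not Ceph code; chunk_info_t below is a simplified stand-in for the real type) of how an offset-keyed map resolves a read to the chunk object that actually holds the bytes, i.e., the lookup a proxy read must perform:

// Simplified stand-in: resolve a read offset through a chunked manifest.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

struct chunk_info_t { uint64_t length; std::string oid; };   // stand-in for the real type

// Return the chunk OID and intra-chunk offset serving byte `off` of the object.
std::pair<std::string, uint64_t>
resolve(const std::map<uint64_t, chunk_info_t>& chunk_map, uint64_t off) {
  auto it = chunk_map.upper_bound(off);   // first chunk starting after `off`
  --it;                                   // step back to the chunk containing `off`
  return { it->second.oid, off - it->first };
}

int main() {
  // Chunk map of the foo-object example from the previous slide.
  std::map<uint64_t, chunk_info_t> chunk_map = {
    { 0,         { 32 * 1024, "fxc039" } },
    { 32 * 1024, { 32 * 1024, "Dxc045" } },
    { 64 * 1024, { 64 * 1024, "fZc0y9" } },
  };
  auto [oid, rel] = resolve(chunk_map, 40 * 1024);   // lands inside the second chunk
  std::cout << "proxy read goes to chunk " << oid << " at offset " << rel << "\n";
}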
Implementation: write path
[Figure: write path across tiers]

RBD client
  Write(foo, offset, size)

Base tier: handle a write request
  if the eviction limit is reached (chunk case: all chunks are dirty)
    the write request is blocked until the dirty object is flushed
  else
    handle the write request:
      1. Chunk and write the object
      2. Update the chunk_map (clean → dirty)

Base tier: post-processing
  1. Get the dirty chunk list
  2. while (Chunklist.dirty_chunks()) {
       if (has_old_reference)
         decrement the old chunk's reference
       objecter->write(Chunklist[i]);
       i++;
     }
  3. Receive all of the acks and update the chunks' state (dirty → clean)

Base tier: set-redirect or set-chunk
  if chunk object:     set_chunk(source, target)
  else if redirect:    set_redirect(source, target)

Lower tier
  Write the chunk data and increment its reference count
  Handle set_chunk or set_redirect
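For orientation, a rough client-side sketch of the manifest operations named above, written against the librados C++ API that the pull requests on the next slide eventually added (set_redirect, set_chunk, tier_promote). This is an assumption about the post-merge API rather than code from the talk: exact signatures, required flags (e.g., for reference handling), and pool preparation vary by Ceph release, and the pool and object names are hypothetical.

// Assumed librados usage of the manifest/tier operations; names are hypothetical.
#include <rados/librados.hpp>

int main() {
  librados::Rados cluster;
  cluster.init("admin");                       // connect as client.admin
  cluster.conf_read_file(nullptr);             // default ceph.conf search path
  cluster.connect();

  librados::IoCtx base, chunks;
  cluster.ioctx_create("base-pool", base);     // base tier
  cluster.ioctx_create("chunk-pool", chunks);  // dedup / lower tier

  // Redirect case: point the whole object "foo" at a copy in the lower tier.
  librados::ObjectWriteOperation redirect_op;
  redirect_op.set_redirect("foo-lower", chunks, 0 /* tgt_version */);
  base.operate("foo", &redirect_op);

  // Chunk case: map one 32 KB region of "bar" onto a chunk object in the chunk pool.
  librados::ObjectWriteOperation chunk_op;
  chunk_op.set_chunk(0 /* src_offset */, 32 * 1024 /* src_length */,
                     chunks, "fxc039" /* chunk OID */, 0 /* tgt_offset */);
  base.operate("bar", &chunk_op);

  // Promote: pull the data back into the base tier before latency-sensitive reads.
  librados::ObjectWriteOperation promote_op;
  promote_op.tier_promote();
  base.operate("bar", &promote_op);

  cluster.shutdown();
  return 0;
}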
Upstream
• Proposal
  § http://marc.info/?l=ceph-devel&m=148172886923985&w=2
• Design
  § http://marc.info/?l=ceph-devel&m=148646542200947&w=2
  § Pad documents (with Sage Weil)
    • http://pad.ceph.com/p/deduplication_how_dedup_manifists
    • http://pad.ceph.com/p/deduplication_how_do_we_store_chunk
    • http://pad.ceph.com/p/deduplication_how_do_we_chunk
    • http://pad.ceph.com/p/deduplication_how_to_drive_dedup_process
• Progress
  § osd,librados: add manifest, redirect https://github.com/ceph/ceph/pull/14894
  § osd,librados: add manifest, operations for chunked object https://github.com/ceph/ceph/pull/15482
  § osd: flush operations for chunked objects https://github.com/ceph/ceph/pull/19294
  § osd, librados: add a rados op (TIER_PROMOTE) https://github.com/ceph/ceph/pull/19362
  § WIP: osd: refcount for manifest object (redirect, chunked) https://github.com/ceph/ceph/pull/19935
Plan & Issues
• Plan (to do)
  § Reference counting methods and data types for redirect and chunk
  § Offline fingerprinting, then storing the dedup chunked manifest (whole object or parts of it)
  § Dedup processing
  § Background dedup worker
  § Refcount manager and methods for dedup (http://pad.ceph.com/p/deduplication_how_do_we_store_chunk)
  § Fixed-size backpointers
  § Scrub
  § Test cases
• Issues
  § Small chunks (< 64 KB)
  § Minimizing performance degradation
  § Dedup methods (inline?)
  § CDC (content-defined chunking); see the sketch below
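Since CDC is an open item, a tiny illustration of what content-defined chunking means: chunk boundaries come from a rolling hash over the bytes, so an insertion early in a stream does not shift every later boundary the way fixed-size chunking does. The gear table, mask, and minimum chunk size below are arbitrary values chosen only to show the idea, not a tuned chunker.

// Minimal content-defined chunking (CDC) illustration.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

std::vector<size_t> cdc_boundaries(const std::string& data) {
  uint64_t gear[256];                       // per-byte random values (a fixed table in real CDC)
  uint64_t seed = 0x9e3779b97f4a7c15ULL;
  for (uint64_t& g : gear) { seed ^= seed << 13; seed ^= seed >> 7; seed ^= seed << 17; g = seed; }

  const uint64_t mask = (1u << 13) - 1;     // expected chunk size around 8 KB
  const size_t   min_chunk = 2 * 1024;      // avoid pathologically small chunks

  std::vector<size_t> cuts;
  uint64_t h = 0;
  size_t last = 0;
  for (size_t i = 0; i < data.size(); ++i) {
    h = (h << 1) + gear[(unsigned char)data[i]];
    if (i - last >= min_chunk && (h & mask) == 0) {   // the content decides the cut point
      cuts.push_back(i + 1);
      last = i + 1;
      h = 0;
    }
  }
  cuts.push_back(data.size());              // final (possibly short) chunk
  return cuts;
}

int main() {
  std::string data;                         // 1 MB of pseudo-random bytes
  uint64_t s = 42;
  for (int i = 0; i < (1 << 20); ++i) { s = s * 6364136223846793005ULL + 1; data.push_back(char(s >> 56)); }
  std::cout << "chunks: " << cdc_boundaries(data).size() << "\n";
}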