
Edelta: A Word-Enlarging Based Fast Delta Compression Approach
Wen Xia, Chunguang Li, Hong Jiang†, Dan Feng∗, Yu Hua, Leihua Qin, Yucheng Zhang
School of Computer, Huazhong University of Science and Technology, Wuhan, China
Wuhan National Laboratory for Optoelectronics, Wuhan, China
†Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
Abstract
Delta compression, a promising data reduction approach
capable of finding the small differences (i.e., delta)
among very similar files and chunks, is widely used for
optimizing replica synchronization, backup/archival
storage, cache compression, etc. However, delta com-
pression is costly because of its time-consuming word-
matching operations for delta calculation. Our in-
depth examination suggests that there exists strong word-
content locality for delta compression, which means that
contiguous duplicate words appear in approximately the
same order in their similar versions. This observation
motivates us to propose Edelta, a fast delta compression
approach based on a word-enlarging process that exploits
word-content locality. Specifically, Edelta will first ten-
tatively find a matched (duplicate) word, and then greed-
ily stretch the matched word boundary to find a likely
much longer (enlarged) duplicate word. Hence, Edelta
effectively reduces a potentially large number of the tra-
ditional time-consuming word-matching operations to a
single word-enlarging operation, which significantly ac-
celerates the delta compression process. Our evaluation
based on two case studies shows that Edelta achieves an
encoding speedup of 3X∼10X over the state-of-the-art
Ddelta, Xdelta, and Zdelta approaches without notice-
ably sacrificing the compression ratio.
1 Introduction
Delta compression is gaining increasing attention as a
promising technology that effectively eliminates redun-
dancy among the non-duplicate but very similar data
chunks and files in storage systems. Most recently, Dif-
ference Engine [2] combines delta compression, dedu-
plication, and LZ compression to reduce memory us-
age in VM environments, where delta compression de-
livers about 2X more memory savings than VMware
ESX server’s deduplication-only approach. Shilane et
al. [4] implement delta compression on top of dedu-
plication to further eliminate redundancy among simi-
lar data to accelerate the WAN replication of backup
datasets, which obtains an additional compression factor
of 2X-3X. Dropbox [1] implements delta compression to
reduce the bandwidth requirement of uploading the up-
dated files by calculating the small differences between
two revisions and sending only the delta updates.
Although delta compression has been applied in
many areas for space saving, challenges facing high-
performance delta compression remain. One of the main
challenges is its time-consuming word-matching process
for delta calculation, which tries to first find the possible
duplicate words and then the delta between two similar
chunks or files. As suggested by the state-of-the-art ap-
proaches [5, 9], delta compression only offers speeds of
about 25 MB/s (Zdelta), 60 MB/s (Xdelta), 150 MB/s
(Ddelta), a worsening problem in the face of steadily
increasing storage bandwidth and speed, for example,
about 100,000 IOPS and a sequential I/O speed of about
500 MB/s offered by the Samsung 850 PRO SSD.
Our examination of delta compression suggests that
contiguous duplicate words appear in approximately the
same order among the similar chunks and files. We call
this phenomenon the word-content locality, which is sim-
ilar to the chunk data locality observed in many dedupli-
cation based storage systems [4]. This observation moti-
vates us to propose Edelta, a fast delta compression ap-
proach based on a word-enlarging process that exploits
the word-content locality to reduce the conventional
time-consuming word-matching operations. Specifi-
cally, if Edelta finds a matched word between two sim-
ilar chunks (or files) A and B, it directly uses a byte-
wise comparison in the remaining regions immediately
after the matched word in chunks A and B to find the
potentially much longer (i.e., enlarged) duplicate words.
This word-enlarging method helps avoid most of the tra-
ditional duplicate-checking operations, such as hashing,
indexing, etc., and thus significantly speeds up the
delta compression process.
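The word-enlarging idea described above can be illustrated with a minimal
Python sketch. This is not the actual Edelta implementation (which, following
Ddelta, derives words by content-defined chunking with Gear hashing); the
fixed word size, the dictionary-based word index, and the function names here
are simplifying assumptions made purely for illustration.

```python
def edelta_like_scan(base: bytes, target: bytes, w: int = 8):
    """Find duplicate regions of `target` in `base` by matching a short
    word and then greedily enlarging it byte by byte (a sketch of the
    word-enlarging idea; w is an illustrative fixed word size)."""
    # Index every w-byte word of the base chunk (first occurrence wins).
    index = {}
    for i in range(len(base) - w + 1):
        index.setdefault(base[i:i + w], i)

    matches = []  # list of (offset_in_base, offset_in_target, length)
    j = 0
    while j + w <= len(target):
        word = target[j:j + w]
        if word in index:
            i = index[word]
            # Word-enlarging: stretch the matched word boundary with a
            # byte-wise comparison of the regions right after the match.
            n = w
            while (i + n < len(base) and j + n < len(target)
                   and base[i + n] == target[j + n]):
                n += 1
            matches.append((i, j, n))
            # Skip the whole enlarged region: no hashing/indexing inside it.
            j += n
        else:
            j += 1
    return matches

# One matched 8-byte word is enlarged to an 18-byte duplicate region,
# replacing ten further word-matching steps with a single enlarging pass.
print(edelta_like_scan(b"abcdefgh0123456789XYZ", b"abcdefgh0123456789QQQ"))
```

Note how the `j += n` step embodies the speedup: once a word is enlarged, the
entire duplicate region is consumed at once, so the costly per-word hashing
and index lookups are skipped for all words inside it.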
The Edelta research makes the following three key
contributions: (1) The observation of the word-content
locality existing in delta compression of the similar
chunks and files, which suggests that the size of an ac-
tual duplicate segment is usually much larger than that
of a word conventionally used for duplicate checking in
delta compression. (2) A novel word-enlarging based
delta compression approach, Edelta, to accelerate the
duplicate-word checking process by directly enlarging
each matched word into a much longer one and thus
avoiding the word-matching operations in the enlarged
regions. (3) Experimental results on two case studies
demonstrating Edelta’s very high encoding speed that is
3X-10X faster than the state-of-the-art approaches with-