
Edelta: A Word-Enlarging Based Fast Delta Compression Approach
Wen Xia, Chunguang Li, Hong Jiang†, Dan Feng∗, Yu Hua, Leihua Qin, Yucheng Zhang
School of Computer, Huazhong University of Science and Technology, Wuhan, China
Wuhan National Laboratory for Optoelectronics, Wuhan, China
†Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
Abstract
Delta compression, a promising data reduction approach
capable of finding the small differences (i.e., delta)
among very similar files and chunks, is widely used for
optimizing replica synchronization, backup/archival
storage, cache compression, etc. However, delta com-
pression is costly because of its time-consuming word-
matching operations for delta calculation. Our in-
depth examination suggests that there exists strong word-
content locality for delta compression, which means that
contiguous duplicate words appear in approximately the
same order in their similar versions. This observation
motivates us to propose Edelta, a fast delta compression
approach based on a word-enlarging process that exploits
word-content locality. Specifically, Edelta will first ten-
tatively find a matched (duplicate) word, and then greed-
ily stretch the matched word boundary to find a likely
much longer (enlarged) duplicate word. Hence, Edelta
effectively reduces a potentially large number of the tra-
ditional time-consuming word-matching operations to a
single word-enlarging operation, which significantly ac-
celerates the delta compression process. Our evaluation
based on two case studies shows that Edelta achieves an
encoding speedup of 3X∼10X over the state-of-the-art
Ddelta, Xdelta, and Zdelta approaches without notice-
ably sacrificing the compression ratio.
1 Introduction
Delta compression is gaining increasing attention as a
promising technology that effectively eliminates redun-
dancy among the non-duplicate but very similar data
chunks and files in storage systems. Most recently, Dif-
ference Engine [2] combines delta compression, dedu-
plication, and LZ compression to reduce memory us-
age in VM environments, where delta compression de-
livers about 2X more memory savings than VMware
ESX server’s deduplication-only approach. Shilane et
al. [4] implement delta compression on top of dedu-
plication to further eliminate redundancy among simi-
lar data to accelerate the WAN replication of backup
datasets, which obtains an additional compression factor
of 2X-3X. Dropbox [1] implements delta compression to
reduce the bandwidth requirement of uploading the up-
dated files by calculating the small differences between
two revisions and sending only the delta updates.
Although delta compression has been applied in
many areas for space saving, challenges facing high-
performance delta compression remain. One of the main
challenges is its time-consuming word-matching process
for delta calculation, which tries to first find the possible
duplicate words and then the delta between two similar
chunks or files. As suggested by the state-of-the-art ap-
proaches [5, 9], delta compression only offers speeds of
about 25 MB/s (Zdelta), 60 MB/s (Xdelta), 150 MB/s
(Ddelta), a worsening problem in the face of steadily
increasing storage bandwidth and speed, for example,
about 100,000 IOPS and a sequential I/O speed of about
500 MB/s offered by the Samsung 850 PRO SSD.
Our examination of delta compression suggests that
contiguous duplicate words appear in approximately the
same order among the similar chunks and files. We call
this phenomenon the word-content locality, which is sim-
ilar to the chunk data locality observed in many dedupli-
cation based storage systems [4]. This observation moti-
vates us to propose Edelta, a fast delta compression ap-
proach based on a word-enlarging process that exploits
the word-content locality to reduce the conventional
time-consuming word-matching operations. Specifi-
cally, if Edelta finds a matched word between two sim-
ilar chunks (or files) A and B, it directly uses a byte-
wise comparison in the remaining regions immediately
after the matched word in chunks A and B to find the
potentially much longer (i.e., enlarged) duplicate words.
This word-enlarging method helps avoid most of the tra-
ditional duplicate-checking operations, such as hashing,
indexing, etc., and thus significantly speeds up the
delta compression process.
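The word-enlarging idea described above can be illustrated with a minimal
Python sketch. This is not the actual Edelta implementation (which, following
Ddelta, derives words by content-defined chunking with Gear hashing); the
fixed word size, the dictionary-based word index, and the function names here
are simplifying assumptions made purely for illustration.

```python
def edelta_like_scan(base: bytes, target: bytes, w: int = 8):
    """Find duplicate regions of `target` in `base` by matching a short
    word and then greedily enlarging it byte by byte (a sketch of the
    word-enlarging idea; w is an illustrative fixed word size)."""
    # Index every w-byte word of the base chunk (first occurrence wins).
    index = {}
    for i in range(len(base) - w + 1):
        index.setdefault(base[i:i + w], i)

    matches = []  # list of (offset_in_base, offset_in_target, length)
    j = 0
    while j + w <= len(target):
        word = target[j:j + w]
        if word in index:
            i = index[word]
            # Word-enlarging: stretch the matched word boundary with a
            # byte-wise comparison of the regions right after the match.
            n = w
            while (i + n < len(base) and j + n < len(target)
                   and base[i + n] == target[j + n]):
                n += 1
            matches.append((i, j, n))
            # Skip the whole enlarged region: no hashing/indexing inside it.
            j += n
        else:
            j += 1
    return matches

# One matched 8-byte word is enlarged to an 18-byte duplicate region,
# replacing ten further word-matching steps with a single enlarging pass.
print(edelta_like_scan(b"abcdefgh0123456789XYZ", b"abcdefgh0123456789QQQ"))
```

Note how the `j += n` step embodies the speedup: once a word is enlarged, the
entire duplicate region is consumed at once, so the costly per-word hashing
and index lookups are skipped for all words inside it.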
The Edelta research makes the following three key
contributions: (1) The observation of the word-content
locality existing in delta compression of the similar
chunks and files, which suggests that the size of an ac-
tual duplicate segment is usually much larger than that
of a word conventionally used for duplicate checking in
delta compression. (2) A novel word-enlarging based
delta compression approach, Edelta, to accelerate the
duplicate-word checking process by directly enlarging
each matched word into a much longer one and thus
avoiding the word-matching operations in the enlarged
regions. (3) Experimental results on two case studies
demonstrating Edelta’s very high encoding speed that is
3X-10X faster than the state-of-the-art approaches with-