WalB: A Fast and Low Latency
Backup System for Block Devices
Cybozu Meetup #8 SRE WalB
Kota Uchida
September 25, 2017
1
2
About me
▌Kota Uchida
▌SRE team at Cybozu, Inc.
▌A WalB developer
3
About Cybozu
▌A large cloud service vendor in Japan.
▌Largest market share in the field of collaborative software.
▌We serve web applications on our own cloud platform.
 kintone: a low-code business app platform
 and more
#customer companies: 20,000+
#accesses / day: 210 million
write IOs / day: 24.5 TiB
4
5
Service Level Objective
▌24/7 nonstop service
▌99.99% availability (4 min / month)
▌Daily backup (retention period is 14 days)
▌Disaster recovery: copy data to a remote site once a day
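As a rough check of the "4 min / month" figure, the availability target can be turned into a downtime budget. This is a minimal sketch that assumes a 30-day month; the function name and the month length are illustrative assumptions, not taken from the slides.

```python
def allowed_downtime_minutes(availability, days_in_month=30):
    """Downtime budget per month, in minutes, for a given availability ratio.

    days_in_month=30 is an assumption made for this illustration.
    """
    return (1.0 - availability) * days_in_month * 24 * 60


budget = allowed_downtime_minutes(0.9999)
print(f"{budget:.2f} minutes per month")  # 4.32 minutes per month
```

Under that assumption the budget is a little over four minutes, which matches the rounded figure on the slide.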
Architecture of our platform
6
[Diagram: L7LB, Application Server, Database Server, and Blob Server in front of two Storage Servers (each with dm-snap, paired by RAID 1). Diffs flow from the storage servers to a Backup Server and on to a Remote Site. One region is labelled "The scope of this talk".]
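Snapshot Management with dm-snap
7
[Diagram: a 5-block volume (blocks 0 to 4) holding A and B. Logical structure: after Write A' and Write B', the latest image holds A' and B' while the snapshot image still holds A and B. Physical structure: (1) CoW copies the old blocks from the original volume area to the snapshot area, tracked by the mapping info; (2) Write then stores A' and B' in the original volume area.]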
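The copy-on-write sequence on this slide can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the snapshot is taken at construction time, and nothing here reflects dm-snap's real on-disk format or API.

```python
class CowSnapshotVolume:
    """Toy model of an origin volume with one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.origin = list(blocks)   # original volume area (latest image)
        self.snapshot_area = {}      # mapping info: block index -> preserved old data

    def write(self, index, data):
        # (1) CoW: preserve the old block the first time it is overwritten.
        if index not in self.snapshot_area:
            self.snapshot_area[index] = self.origin[index]
        # (2) Write: update the original volume area in place.
        self.origin[index] = data

    def read_latest(self, index):
        return self.origin[index]

    def read_snapshot(self, index):
        # Blocks never overwritten are still read from the original volume area.
        return self.snapshot_area.get(index, self.origin[index])


vol = CowSnapshotVolume([None, "A", None, "B", None])
vol.write(1, "A'")
vol.write(3, "B'")
```

Each first write to a block costs an extra read and write for the copy, which is the overhead the talk attributes to CoW.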
Backup using dm-snap
8
[Diagram, logical structure: Snapshot0 holds A and B; Snapshot1 holds A' and B'. (1) Full-scan an old snapshot. (2) Full-scan a new snapshot. (3) Generate a diff image (A', B') by comparing two snapshots.]
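The three steps above can be sketched as a block-by-block comparison. This is a minimal illustration in which snapshots are plain Python lists; the function name and the read counter are assumptions added to show why the approach is read-heavy.

```python
def diff_by_full_scan(old_snapshot, new_snapshot):
    """Compare two equally sized snapshots block by block.

    Returns (diff, reads): diff maps block index -> new data for changed
    blocks, and reads counts how many block reads the scan needed.
    """
    if len(old_snapshot) != len(new_snapshot):
        raise ValueError("snapshots must have the same number of blocks")
    diff = {}
    reads = 0
    for index, (old, new) in enumerate(zip(old_snapshot, new_snapshot)):
        reads += 2  # one read from each snapshot, changed or not
        if old != new:
            diff[index] = new
    return diff, reads


snapshot0 = [None, "A", None, "B", None]
snapshot1 = [None, "A'", None, "B'", None]
diff, reads = diff_by_full_scan(snapshot0, snapshot1)
```

The number of reads depends on the volume size, not on how much data changed, so a mostly idle volume costs as much to scan as a busy one.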
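Full-scan at night
9
[Graphs by hour of day (x-axis: o'clock): read I/O throughput and number of user requests, with "Daytime" and "Backup processing time" marked.]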
UX degradation during a full-scan
10
[Graph: user response time over time, with the "Full-scanning" period marked.]
11
We have no more “nights”
▌Until now:
A full scan is allowed only when the access rate is low, i.e., at night.
▌From now on:
We have to handle accesses from multiple time zones.
▌We must be able to back up at any time without UX degradation.
12
New Solution
▌We need a new solution with:
 No IO spikes
 Short backup time
▌We compared dm-thin with WalB
13
What is dm-thin?
▌dm-thin provides thin-provisioning volume management to
 share the same data among volumes
 reduce disk usage using snapshots
▌In the mainline Linux kernel
Snapshot Management with dm-thin
[Diagram: logical structure: the latest image holds block A. Physical structure: the latest tree points to block A.]
Snapshot Management with dm-thin
15
[Diagram: logical structure: a snapshot and the latest image both hold A. Physical structure: the snapshot tree and the latest tree both point to the same block A.]
Snapshot Management with dm-thin
16
[Diagram: on Write A', (1) CoW copies the affected parts of the latest tree and the data block, and (2) the write and the tree update store A' under the latest tree. The snapshot still holds A; the latest image holds A'.]
Backup using dm-thin
17
[Diagram: logical structure: Snapshot0 holds A and B; Snapshot1 holds A' and B'. Physical structure: the two snapshot trees point to A, B and to A', B' respectively. Generate a diff image using dm-thin metadata.]
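Finding changed blocks from metadata alone can be sketched as follows. This is a simplified illustration: each snapshot's tree is flattened into a hypothetical dictionary from logical block to physical block, which is an assumption made here and not dm-thin's actual metadata layout. Blocks that exist only in the older snapshot are ignored for brevity.

```python
def changed_blocks(old_mapping, new_mapping):
    """Logical blocks whose physical location differs between two snapshots."""
    return sorted(
        logical
        for logical, physical in new_mapping.items()
        if old_mapping.get(logical) != physical
    )


# Physical blocks on the shared pool (hypothetical addresses 10..13).
physical_pool = {10: "A", 11: "B", 12: "A'", 13: "B'"}
snapshot0 = {1: 10, 3: 11}   # logical block -> physical block
snapshot1 = {1: 12, 3: 13}   # CoW moved both blocks to new locations

changed = changed_blocks(snapshot0, snapshot1)
# Only the changed blocks are read to build the diff image.
diff_image = {logical: physical_pool[snapshot1[logical]] for logical in changed}
```

Unlike the full scan, the amount of data read here scales with the number of changed blocks, at the cost of walking the metadata first.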
18
What is WalB?
▌A real-time and incremental backup system
 developed at Cybozu Labs
▌Can back up block devices without IO spikes
[Graphs: dm-snap shows spikes during full scanning; WalB shows no spikes.]
Special Block Devices for WalB
19
[Diagram: any application (file system, DBMS, etc.) reads from and writes to a WalB device, which sits on a data device (linearly mapped) and a log device (ring buffer).]
Write IO Logging and Backup with WalB
20
[Diagram: the data device has blocks 0 to 4 holding A and B; the log device is empty. A time axis shows the time series of write I/Os.]
Write IO Logging and Backup with WalB
21
[Diagram: Write A' goes to both the data device (A becomes A') and the log device (entry: block 1, A'). Scan the log device and generate a diff image. A time axis shows the time series of write I/Os.]
Write IO Logging and Backup with WalB
22
[Diagram: Write A' and Write B' go to both the data device and the log device (entries: block 1, A'; block 4, B'). Scan the log device and generate a diff image containing A' and B'. A time axis shows the time series of write I/Os.]
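The logging and diff extraction shown on these slides can be sketched with a toy model. This is an illustration under stated assumptions: the class, the `seq` counter, and `extract_diff` are hypothetical names, the log is an unbounded list instead of a ring buffer, and none of this is WalB's real driver or tool interface.

```python
class ToyWalbDevice:
    """Toy model: every write goes to a data device and to a write log."""

    def __init__(self, blocks):
        self.data = list(blocks)   # data device, linearly mapped
        self.log = []              # log device: (seq, address, data) records
        self.next_seq = 0          # hypothetical sequence number per write

    def write(self, address, data):
        self.log.append((self.next_seq, address, data))
        self.next_seq += 1
        self.data[address] = data

    def read(self, address):
        # Reads are served from the data device only.
        return self.data[address]


def extract_diff(device, from_seq):
    """Scan log records with seq >= from_seq and fold them into a diff.

    Returns (diff, next_seq); pass next_seq as from_seq on the next scan.
    """
    diff = {}
    for seq, address, data in device.log:
        if seq >= from_seq:
            diff[address] = data   # a later write to the same block wins
    return diff, device.next_seq


dev = ToyWalbDevice([None, "A", None, None, "B"])
dev.write(1, "A'")
dev.write(4, "B'")
first_diff, position = extract_diff(dev, 0)
```

Because the diff comes from the log, no snapshot scan and no copy-on-write is involved; the work is proportional to the writes that actually happened.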
23
Performance test
▌Compared dm-snap, dm-thin, and WalB
▌Executed a workload during a backup
 The workload & the backup will affect each other
▌Measured the following metrics:
 Latencies of the workload
 Backup time
24
Environment & Settings
▌Test environment:
 CPU:2.40 GHz x 12 cores
 MEM:192 GiB
 HDD:4 TB HDD, RAID 6 (8D2P)
 NIC:10 Gbps x 2
 Kernel:4.11 (latest upstream)
▌Test settings:
 100 GiB volumes
 Workload: 4 KiB Random writes for a 5 GiB range
25
Measuring the Backup Time
(dm-snap, dm-thin)
▌dm-snap:take a snapshot & scan full image
▌dm-thin:get a structure of snapshot trees & find modified
blocks & read these blocks
[Diagram: 4 KiB random writes hit a 5 GiB range of the volume; the other 95 GiB is unchanged. dm-snap: scan full image. dm-thin: scan changed chunks (tree traversal).]
26
Measuring the Backup Time
(WalB)
▌WalB:scan logs from a log device & send them to a backup
server continuously
[Diagram: 4 KiB random writes hit a 5 GiB range of the WalB device; the other 95 GiB is unchanged. Write IO logs go to the log device. WalB: scan logs, then send diffs over the network to the backup server.]
Write I/O latency
27
[Graph: write I/O latency for dm-thin, dm-snap, WalB, and no-backup. Annotations: "IO spikes due to CoW, worse than dm-snap!"; "Small overhead"; "large due to CoW".]
Backup time
28
[Bar chart: backup time. Values shown: 1146, 2260, 1.2. Annotations: "slower than dm-snap"; "so fast!".]
29
Conclusion
▌dm-snap & dm-thin
 High I/O latency during a backup
 Long backup time
▌WalB
 Stable and low I/O latency (no spikes)
 Short backup time
WalB satisfies our requirements for production use.
30
Try WalB!
▌Project page
 https://walb-linux.github.io/
▌Tutorial
 https://github.com/walb-linux/walb-tools/tree/master/misc/vagrant/
 Vagrantfile for Ubuntu 16.04 and CentOS 7
31
Incremental backup
▌Daily backup (retention period is 14 days)
▌A WalB worker daemon selects diff files older than 14 days and applies them to a base image.
[Diagram: a volume, a backup host holding a base image and diff files for 14 days (applied every day), and a remote host.]
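The daily consolidation described above can be sketched as follows. This is a minimal illustration: images and diffs are modelled as dictionaries from block index to data, and the function name and the exact cutoff rule (strictly older than 14 days) are assumptions, not the WalB worker daemon's documented behaviour.

```python
from datetime import date, timedelta

RETENTION_DAYS = 14


def consolidate(base, diffs, today):
    """Apply diffs older than the retention period to a copy of base.

    diffs is a list of (day, diff) pairs. Returns (new_base, kept_diffs).
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    new_base = dict(base)
    kept = []
    for day, diff in sorted(diffs, key=lambda item: item[0]):
        if day < cutoff:
            new_base.update(diff)     # fold the old diff into the base image
        else:
            kept.append((day, diff))  # still within the retention period
    return new_base, kept


base = {1: "A", 3: "B"}
diffs = [
    (date(2017, 9, 10), {1: "A'"}),
    (date(2017, 9, 20), {1: "A''"}),
    (date(2017, 9, 5), {3: "B'"}),
]
new_base, kept = consolidate(base, diffs, today=date(2017, 9, 25))
```

Old diffs are applied in chronological order so that the base always represents the oldest restorable point in time.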
32
Restoring a volume
▌To restore the latest state of a volume:
 take a snapshot of a base image, and
 apply all diff files to it.
[Diagram (Remote Host): a writable snapshot Base' is taken from Base, and all diffs are applied to it.]
33
Make restoration faster 1/2
▌Fast restoration by preparing read-only snapshots for each day
[Diagram (Remote Host): Base, the diff files, and dm-thin snapshots for each day.]
34
Make restoration faster 2/2
▌Apply some diffs to the appropriate snapshot.
▌At most 24 hours of diffs need to be applied.
[Diagram (Remote Host): diffs are applied to the appropriate daily snapshot instead of to Base. Faster!]
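Restoring from the appropriate daily snapshot can be sketched like this. It is an illustration only: timestamps are plain integers (hours), images and diffs are dictionaries, and `restore` is a hypothetical helper, not part of WalB's tooling.

```python
def restore(snapshots, diffs, target):
    """Rebuild the image at time `target`.

    snapshots maps time -> image; diffs is a list of (time, diff) pairs.
    Starts from the latest snapshot not newer than target and applies only
    the diffs after it. Returns (image, number_of_diffs_applied).
    """
    candidates = [t for t in snapshots if t <= target]
    if not candidates:
        raise ValueError("no snapshot at or before the target time")
    start = max(candidates)
    image = dict(snapshots[start])   # writable copy of the read-only snapshot
    applied = 0
    for t, diff in sorted(diffs, key=lambda item: item[0]):
        if start < t <= target:
            image.update(diff)
            applied += 1
    return image, applied


snapshots = {0: {1: "A"}, 24: {1: "A'"}}          # one snapshot per day
diffs = [(6, {1: "A'"}), (30, {2: "C"}), (40, {1: "A''"}), (50, {2: "D"})]
image, applied = restore(snapshots, diffs, target=45)
```

With only the base image available, every diff up to the target would have to be applied; with daily snapshots the loop touches at most one day of diffs.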
35
Worldline: restoring a whole
environment
▌"Worldline" means a parallel world.
▌We back up configurations in addition to user data.
 Configurations:
definitions for each customer (ID, FQDN, Apps, …),
application version definition,
host definition, etc.
▌It is important to use applications whose versions are
consistent with user data backed up before.
36
Worldline: restoring a whole
environment
▌A daily script takes a snapshot of a whole environment.
▌A weekly script restores the latest backup, so we can use it for investigating failures or for developing our services.
[Diagram: user data (diffs) and the config DB (snapshot) are backed up; both are restored onto spare hosts to form a worldline.]
Q&A
email: kota-uchida@cybozu.co.jp
twitter: @uchan_nos
37


Editor's Notes

  • #2: Thank you for attending this presentation. I’ll talk about WalB, a fast and low-latency backup system for block devices. OK, let’s start.
  • #3: My name is Kota Uchida. I’m a site reliability engineer at Cybozu, Inc. I’m a WalB developer. I have deployed the WalB backup system in our production environment.
  • #4: Do you know Cybozu? Cybozu is a large cloud service vendor in Japan. We have the largest market share in the field of “groupware” or “collaborative software”, such as online calendars, workflow, bulletin board systems, and so on. Our mission is to enhance teamwork all over the world. We serve web applications on our own cloud platform, not on a public cloud like AWS. One of our applications is “kintone”, a low-code business application platform. You can create business applications with little or no code.
  • #5: Over nineteen thousand companies use our services. We receive one hundred ninety million accesses per day. About twenty-five tebibytes of data are written to storage every day.
  • #6: Let me explain our service level objective. Our services are twenty-four seven, nonstop services. We target, and almost achieve, four-nines availability. We back up user data every day and keep it for fourteen days. We also send the data to a remote site once a day for disaster recovery.
  • #7: This diagram shows the architecture of our cloud platform. It is a basic architecture. A user request goes through several components, such as layer-seven load balancers, application servers, a database server, and a blob server. User data is written to two storage servers. They replicate data to each other by software RAID 1. Data written to the storage servers is backed up to a backup server and copied to a remote site once a day. In a storage server, we use dm-snap to create snapshots and back them up. I’ll show how we do that in the next slide.
  • #8: This is the detailed architecture of dm-snap. The upper figures, labelled logical structure, show how dm-snap volumes and snapshots look from the point of view of users and applications. Consider a disk that consists of 5 blocks, numbered zero to four. Assume we took a snapshot when the content of block 1 was A and block 3 was B. Then write requests came to blocks 1 and 3. As a result these blocks were overwritten, while the snapshot image was not changed. The lower figures show the physical structure of a volume with a snapshot. When you create a snapshot, dm-snap prepares a snapshot area with initialized mapping information. When you submit a write IO request to block 1 with data A-dash, dm-snap first copies block 1 to the snapshot area, and then the block is overwritten with A-dash. In the next slide, I’ll show you how to back up data using dm-snap.
  • #9: There are two snapshots, 0 and 1. We assume snapshot 1 is newer than snapshot 0. An incremental backup with dm-snap requires two snapshots. We scan all blocks of the two snapshots, compare them block by block, and finally get a diff image. An incremental backup using dm-snap therefore needs a massive amount of reads. Q: As with dm-thin, is it possible to extract diff images from dm-snap using the mapping indexes, without a full scan? A: Yes, it is possible, but it would be very slow because it requires two snapshots to exist at the same time. The overhead of copy-on-write is much larger than with dm-thin.
  • #10: So we back up user data at night, when accesses decrease. The upper graph shows the read I/O throughput of one volume. There is a large spike, about seven hundred MB per second, at midnight. The lower graph shows the number of user requests on our platform. Many people don’t use our services at night. # The graph data: read throughput is the sum of tyss-221 and tyss-222; responses cover all of serviceset:ty13. # Q. Why is there almost no read I/O during the daytime? # A. Because almost all of the data is in cache memory.
  • #11: This graph shows a time series of user response time in milliseconds. The data consists of several update operations on a storage server in production. During a full scan, the user experience appears worse than usual, because the ninetieth percentile of response time exceeds 1 second. # In the end, this graph covers only kintone add.json on ty22. Data at 10-minute intervals, 8:00 JST to 19:00 JST.
  • #12: And now, we have no more nights. Because we are trying to provide our services to customers worldwide, we have to handle accesses from multiple time zones. So we must be able to back up user data at any time without affecting user experience. This cannot be achieved with dm-snap. We need a new solution.
  • #13: We researched other backup solutions that satisfy two requirements: no I/O spikes and short backup time. There are two candidates, dm-thin and WalB. We compared them.
  • #14: dm-thin provides thin-provisioning volume management to share the same data among volumes and reduce disk usage using snapshots. dm-thin is included in the mainline kernel. In the following slides, I will explain how dm-thin implements its snapshot feature and how to use it for backup.
  • #15: These figures show how dm-thin provides its snapshot management feature. Please look at the upper figure. From the user’s point of view, a dm-thin snapshot can be treated as a normal volume. Next, look at the lower figure. This is a little bit complicated. Since dm-thin is not the essential part of this talk, you may ignore this diagram. At first, there is one tree representing the latest image. Intermediate nodes hold only metadata, and leaf nodes hold user data.
  • #16: When you take a snapshot of a volume, dm-thin copies the root node, giving one root for the latest tree and one for the snapshot tree. At this point, the “latest tree” root still refers to the original intermediate nodes.
  • #17: When a write request comes, dm-thin copies the corresponding leaf and its ancestor nodes, and modifies the link from the root node. The latest tree is updated while the snapshot tree remains unchanged.
  • #18: In this slide, I’ll explain how to back up a volume using dm-thin. In the upper figure, there are two snapshots. We assume snapshot 1 is newer than snapshot 0. The difference between snapshots 0 and 1 is two blocks, A and B. The physical structure is pictured in the lower figure. There are two trees representing the two snapshots. You can get a diff image for incremental backup by comparing the structure of the two snapshot trees. In this example, the diff image consists of A’ and B’.
  • #19: So far I have explained dm-thin. From here, let me introduce WalB. WalB is a real-time and incremental backup system. It has been developed at Cybozu Labs. Using WalB, there seems to be no I/O spikes.
  • #20: This slide explains architecture of a WalB device. WalB backup system uses special block devices, called WalB devices. A WalB device, shown in this picture, is a virtual block device that consists of two ordinary block devices, a data device and a log device. A data device stores user data. Its block addresses are mapped linearly to WalB device. A log device stores write IO logs. This area is used as a ring buffer. Write I/O requests are handled at both the data and log devices. Read I/O requests are handled at the data device only. WalB device driver preserves consistency appropriately.
  • #21: In this slide, I will explain how WalB takes a backup continuously. Imagine there is a block device with 5 blocks. Block 1 has data A, and block 4 has data B.
  • #22: When a write request comes, the WalB device writes it to both the data device and the log device. The WalB tools read the log device and generate a diff image.
  • #23: Another write request will be treated in the same way. Diff images generated by WalB tools will be sent to a backup server.
  • #24: We conducted an experiment to compare the performance of dm-snap, dm-thin, and WalB. In the experiment, we ran a workload during a backup. The workload and the backup affect each other, so we ran them concurrently. We measured two metrics: the I/O latency of the workload and the backup time.
  • #25: We used server machines with the same spec as the ones in our production environment. The CPUs have twelve cores in total. The machines have 192 GiB of memory, RAID 6 storage, 10 Gigabit Ethernet, and the latest upstream kernel. We created a 100 GiB volume for each backup solution: dm-snap, dm-thin, and WalB. We then executed 4 KiB random writes on the volumes. While the workload was running, we also executed an incremental backup.
  • #26: Now let’s check again how to back up with dm-snap, dm-thin, and WalB. With dm-snap, we have to scan the full area of the snapshot to create a diff image for an incremental backup. Those read I/Os are nearly sequential. With dm-thin, first we get information about the structures of the snapshot trees; then we calculate which blocks are modified; finally we read the modified blocks to create a diff image. Tree traversal tends to cause random reads. In fact, we emulated the backup for dm-snap and dm-thin. A real backup has to read a volume, calculate differences, and send them to a backup server. In this experiment, for dm-snap, we just scanned the whole of the latest snapshot. For dm-thin, we just read the modified blocks using metadata information. These steps are the dominant part of the backup time, so we treated their duration as the backup time.
  • #27: For WalB, we employed the real backup system that is used in our production environment. It’s not an emulation. The WalB tools extract write I/O logs from the log device and send them to the backup server. The extracted logs are converted to diff images at the backup server. Then the corresponding snapshot becomes restorable. Because the WalB tools continuously extract and send I/O logs, written data is backed up at almost the same time as it is written to a volume. Backup time is defined as the latency from when a snapshot is set at the WalB device, as a mark on the latest I/O log, to when the snapshot becomes restorable at the backup server. Our 10 gigabit datacenter network does not become a bottleneck, so the backup time with WalB will be a few seconds or less.
  • #28: This is the result of our experiment. This graph shows the I/O latencies of the workload, 4 KiB random writes, during a backup. The baseline is a normal logical volume without backup, labelled no-backup. Its latency is stable at 5 milliseconds. The latency of dm-snap is about six times that of no-backup: thirty milliseconds. The existence of a snapshot and a lot of read I/Os affect write I/O latency. The latency of dm-thin is much worse than that of dm-snap! In the first forty seconds, we can see a large spike for dm-thin. We can see an overhead for WalB, but it is very small: about 0.5 milliseconds. WalB does not generate spikes. These characteristics are important for a 24/7 cloud service. Now let’s look at another aspect of this experiment.
  • #29: This graph shows the backup times for the three solutions. The backups are executed under a random write workload. dm-snap takes about one thousand seconds. dm-snap is the solution we were using before WalB. dm-thin takes twice as long as dm-snap. If our user data grows larger, dm-thin may not be able to back it up within one day. That cannot be tolerated because our objective declares a daily backup. WalB takes only 1.2 seconds. This is short enough to satisfy the daily backup objective. In addition, this duration, 1.2 seconds, is independent of the amount of data. It depends on the amount of write I/O. Our service will grow much more, but backup time will no longer be a problem. Let me conclude my talk.
  • #30: Taking a backup with dm-snap takes a long time and makes the user experience worse. So I compared two solutions: dm-thin and WalB. dm-thin also takes a long time for each backup and causes spikes in I/O latency, while WalB takes a few seconds and has no spikes. So we chose WalB for our backup system.
  • #31: WalB is, of course, open source software, and it is developed on GitHub. There are Vagrantfiles for Ubuntu and CentOS. Detailed tutorial documentation is linked from the README of the Vagrantfile. Please try WalB.
  • #38: Thank you for coming to this presentation. Questions are welcome.
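
As an illustration of the WalB write path described in the notes for slides 20 to 23, here is a minimal Python sketch. It is a toy model, not WalB's actual implementation: the class `ToyWalbDevice` and the helpers `logs_to_diff` and `apply_diff` are hypothetical names introduced only for this example, and the behaviour when the log is full is a simplifying assumption. It models a data device with linear block mapping, a bounded log of write records, and the conversion of extracted logs into a diff image.

```python
from collections import deque


class ToyWalbDevice:
    """Toy model (hypothetical) of a WalB device: data device + log device."""

    def __init__(self, num_blocks, log_capacity):
        self.data = [None] * num_blocks   # data device, addresses mapped linearly
        self.log = deque()                # log device, oldest record first
        self.log_capacity = log_capacity  # bounded, like a ring buffer

    def write(self, addr, payload):
        # Assumption of this sketch: refuse writes when the log is full.
        # Real ordering and consistency handling are not modeled here.
        if len(self.log) >= self.log_capacity:
            raise RuntimeError("log device full: extract logs first")
        self.log.append((addr, payload))  # write I/O goes to the log device
        self.data[addr] = payload         # ... and to the data device

    def read(self, addr):
        # Read I/O is served from the data device only.
        return self.data[addr]

    def extract_logs(self):
        # Hand the accumulated write records to the backup side and free the log.
        records = list(self.log)
        self.log.clear()
        return records


def logs_to_diff(records):
    """Collapse write records into a diff image: the last write per block wins."""
    diff = {}
    for addr, payload in records:
        diff[addr] = payload
    return diff


def apply_diff(image, diff):
    """Apply a diff image to a copy of an older full image."""
    restored = list(image)
    for addr, payload in diff.items():
        restored[addr] = payload
    return restored


# A block device with 5 blocks: block 1 holds A and block 4 holds B.
dev = ToyWalbDevice(num_blocks=5, log_capacity=8)
dev.data[1], dev.data[4] = "A", "B"
base_image = list(dev.data)               # image already held by the backup server

dev.write(1, "A'")
dev.write(4, "B'")

diff = logs_to_diff(dev.extract_logs())
restored = apply_diff(base_image, diff)
```

In this model the work done by a backup is proportional to the number of logged writes, not to the size of the volume, which matches the point made in the notes for slide 29.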