RocksDB Storage Engine
Igor Canadi | Facebook
Overview
•  Story of RocksDB
•  Architecture
•  Performance tuning
•  Next steps
Story of RocksDB
Pre-2011
•  FB infrastructure – many custom-built key-value stores
•  LevelDB released
Experimentation (2011 – 2013)
•  First use-cases
•  Not designed for server workloads – many bottlenecks and stalls
•  Optimization
•  New features
Explosion (2013 – 2015)
•  Open sourced RocksDB
•  Big success within Facebook
•  External traction – LinkedIn, Yahoo, CockroachDB, …
New Challenges (2015 – )
•  Bring RocksDB to databases
MongoRocks
•  Running in production at Parse for 6 months
•  Huge storage savings (5TB à 285GB)
•  Document-level locking
MyRocks
[Chart: Database size (relative), InnoDB vs RocksDB]
[Chart: Bytes written (relative), InnoDB vs RocksDB]
Architecture
Log Structured Merge Trees
[Diagram: Memtable (64MB) → Level 0 (256MB) → Level 1 (512MB) → Level 2 (5GB) → Level 3 (50GB) → Level 4 (500GB)]
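Below Level 0, each level's target in the diagram is 10× the one above it. A minimal Python sketch, assuming a 512MB Level 1 target and a fixed multiplier of 10 (the helpers `level_targets` and `human` are hypothetical; the memtable and Level 0 sizes are not modeled), reproduces Levels 1 to 4:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def level_targets(base_bytes: int, multiplier: int, num_levels: int) -> list[int]:
    """Target size of Level 1..Level N, each `multiplier` times the one above it."""
    targets = []
    size = base_bytes
    for _ in range(num_levels):
        targets.append(size)
        size *= multiplier
    return targets


def human(n: int) -> str:
    """Format a byte count as whole GB or MB, like the diagram labels."""
    return f"{n // GB}GB" if n >= GB else f"{n // MB}MB"


# Levels 1 to 4 of the diagram
print([human(t) for t in level_targets(512 * MB, 10, 4)])
# ['512MB', '5GB', '50GB', '500GB']
```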
Log Structured Merge Trees – write
[Diagram: a (key,value) pair is written to the Memtable (64MB); Level 0 (256MB) is shown below it]
Log Structured Merge Trees – flush
[Diagram: the Memtable (64MB) is flushed to Level 0 (256MB)]
Log Structured Merge Trees – compaction
[Diagram: compaction from Level 2 (5GB) into Level 3 (50GB)]
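Compaction merges sorted runs from one level into the next. A simplified sketch, assuming each run is a key-sorted list of `(key, seqno, value)` tuples where a higher sequence number is newer and `None` marks a tombstone (an illustrative encoding, not RocksDB's file format, and ignoring snapshots, which can keep older versions alive): keep only the newest version of each key, and drop a tombstone only when the output is the bottom level. Level 2 to Level 3 in the diagram is not the bottom (Level 4 is), so there the tombstone has to be kept.

```python
import heapq
from typing import Optional

# (key, sequence number, value); a higher seqno is newer, value None is a tombstone
Entry = tuple[str, int, Optional[str]]


def compact(runs: list[list[Entry]], bottom_level: bool) -> list[Entry]:
    """Merge key-sorted runs into one sorted run, keeping the newest version of each key."""
    # Key ascending, then seqno descending, so the newest version of a key comes first.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    out: list[Entry] = []
    last_key = None
    for key, seqno, value in merged:
        if key == last_key:
            continue  # older version, shadowed by the newer one seen just before
        last_key = key
        if value is None and bottom_level:
            continue  # nothing older exists below the bottom level, so drop the tombstone
        out.append((key, seqno, value))
    return out


newer = [("a", 5, "new-a"), ("c", 7, None)]  # newer data, including a delete of "c"
older = [("a", 1, "old-a"), ("b", 2, "b"), ("c", 3, "c")]
print(compact([newer, older], bottom_level=False))
# [('a', 5, 'new-a'), ('b', 2, 'b'), ('c', 7, None)]
print(compact([newer, older], bottom_level=True))
# [('a', 5, 'new-a'), ('b', 2, 'b')]
```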
Writes
•  Foreground:
   •  Writes go to the memtable (a skiplist) and the write-ahead log
•  Background:
   •  When the memtable is full, it is flushed to Level 0
   •  When a level is full, compaction runs
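The write path can be modeled as a toy. This sketch is not RocksDB's implementation: a Python dict stands in for the skiplist (keys are sorted only at flush time), a list stands in for the write-ahead log, the memtable limit counts entries instead of bytes, and the flush runs inline instead of in the background. `ToyLSM` and its fields are hypothetical names.

```python
class ToyLSM:
    """Toy LSM write path: append to the WAL, insert into the memtable, flush when full."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable_limit = memtable_limit  # entries, standing in for the 64MB byte limit
        self.wal: list[tuple[str, str]] = []  # writes not yet persisted in a table file
        self.memtable: dict[str, str] = {}    # stands in for the skiplist
        self.level0: list[list[tuple[str, str]]] = []  # each flush adds one sorted file

    def put(self, key: str, value: str) -> None:
        # Foreground: log first for durability, then update the in-memory table.
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Background in RocksDB, inline here: write one sorted, immutable Level 0 file.
        self.level0.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()  # the flushed writes no longer need the log


db = ToyLSM(memtable_limit=3)
for k, v in [("b", "1"), ("a", "2"), ("c", "3"), ("d", "4")]:
    db.put(k, v)
print(db.level0)    # [[('a', '2'), ('b', '1'), ('c', '3')]]
print(db.memtable)  # {'d': '4'}
```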
Reads
[Diagram: a read checks the Memtable (64MB) first, then Level 0 (256MB), Level 1 (512MB), Level 2 (5GB), Level 3 (50GB) and Level 4 (500GB)]
•  Point queries
   •  Bloom filters reduce reads from storage
   •  Usually only 1 read IO
•  Range scans
   •  Bloom filters don’t help
   •  1-2 IOs, depending on the amount of memory
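A Bloom filter answers “definitely absent” or “maybe present” for an exact key, which is why it helps point queries but not range scans. A minimal sketch, assuming one filter per table file and bit positions derived from SHA-256 by double hashing (RocksDB uses its own hash functions and filter format; `BloomFilter` and `point_lookup` are hypothetical names):

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: may return false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 6):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def point_lookup(key: str, files: list) -> tuple:
    """Search table files newest first; count the files actually read (the IOs)."""
    reads = 0
    for bloom, data in files:
        if not bloom.may_contain(key):
            continue  # definitely not in this file: skip the read
        reads += 1
        if key in data:
            return data[key], reads
    return None, reads


# Three table files, newest first, each with its own filter (dicts stand in for file contents).
files = []
for contents in ({"k1": "v1"}, {"k2": "v2"}, {"k3": "v3"}):
    bloom = BloomFilter()
    for k in contents:
        bloom.add(k)
    files.append((bloom, contents))

value, reads = point_lookup("k3", files)
print(value, reads)  # value is 'v3'; reads is usually 1, since false positives are rare
```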
RocksDB Files
rocksdb/> ls
MANIFEST-000032
000024.log
000031.log
000025.sst
000028.sst
000029.sst
000033.sst
000034.sst
LOG
LOG.old.1441234029851978
...
RocksDB Files – MANIFEST
(initial state) Add file 1, Add file 2, Add file 3, Add file 4, …
(flush) Add file 9, Mark log 6 persisted
(compaction) Add file 10, Add file 11, Remove file 9, Remove file 8
Add new column family “system”
•  Atomic updates to database metadata
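Since the MANIFEST is a log of edits, replaying it in order rebuilds the current set of live files. A sketch using only the records visible on the slide (the “…” records are left out), with a made-up tuple encoding in place of RocksDB's binary record format, and assuming the database starts with a column family named “default”, as RocksDB databases do; `replay_manifest` is a hypothetical name:

```python
def replay_manifest(records: list[tuple]) -> tuple[set, set, set]:
    """Apply edits in order to rebuild live files, persisted logs and column families."""
    live_files: set[int] = set()
    persisted_logs: set[int] = set()
    column_families = {"default"}
    for op, arg in records:
        if op == "add_file":
            live_files.add(arg)
        elif op == "remove_file":
            live_files.discard(arg)
        elif op == "log_persisted":
            persisted_logs.add(arg)
        elif op == "add_column_family":
            column_families.add(arg)
        else:
            raise ValueError(f"unknown record: {op}")
    return live_files, persisted_logs, column_families


records = [
    ("add_file", 1), ("add_file", 2), ("add_file", 3), ("add_file", 4),
    ("add_file", 9), ("log_persisted", 6),                     # flush
    ("add_file", 10), ("add_file", 11),
    ("remove_file", 9), ("remove_file", 8),                    # compaction
    ("add_column_family", "system"),
]
files, logs, cfs = replay_manifest(records)
# The "…" records are omitted here, so removing file 8 is a no-op in this sketch.
print(sorted(files), sorted(logs), sorted(cfs))
# [1, 2, 3, 4, 10, 11] [6] ['default', 'system']
```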
RocksDB Files – Write-ahead log
Log records: Write(A, B), Write(C, D), Write(E, F), Delete(A), Write(X, Y), Delete(C)
•  Persisted copy of the memtable state, replayed on recovery
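On recovery, the write-ahead log is replayed to rebuild the memtable. A sketch, assuming Write(A, B) means “write key A with value B” and encoding the slide's records as tuples (a hypothetical format, not RocksDB's log format). Deletes are kept as tombstones because older values for the same key may still sit in table files:

```python
from typing import Optional


def replay_wal(records: list[tuple]) -> dict[str, Optional[str]]:
    """Rebuild memtable contents by applying WAL records in log order."""
    memtable: dict[str, Optional[str]] = {}
    for record in records:
        if record[0] == "write":
            _, key, value = record
            memtable[key] = value
        elif record[0] == "delete":
            _, key = record
            memtable[key] = None  # tombstone: must still hide older values in table files
        else:
            raise ValueError(f"unknown record type: {record[0]}")
    return memtable


def visible(memtable: dict[str, Optional[str]]) -> dict[str, str]:
    """Key-value pairs a reader would see from the memtable alone."""
    return {k: v for k, v in memtable.items() if v is not None}


log = [("write", "A", "B"), ("write", "C", "D"), ("write", "E", "F"),
       ("delete", "A"), ("write", "X", "Y"), ("delete", "C")]
print(visible(replay_wal(log)))  # {'E': 'F', 'X': 'Y'}
```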
RocksDB Files – Table files
•  Data blocks: <key, value> pairs, compressed and prefix encoded
•  Index block: <key, block> entries, one per data block
•  Filter block, Statistics, Meta index block (pointers to blocks)
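Prefix encoding stores each key in a data block as the length of the prefix it shares with the previous key plus the remaining suffix, which pays off because keys within a block are sorted. The index block then lets a reader binary-search for the single data block that may hold a key. A simplified sketch: it assumes each index entry stores its block's last key (RocksDB may store a shorter separator key), ignores restart points and compression, and all names are hypothetical.

```python
import bisect


def prefix_encode(sorted_keys: list[str]) -> list[tuple[int, str]]:
    """Encode each key as (length of prefix shared with the previous key, rest of key)."""
    encoded = []
    prev = ""
    for key in sorted_keys:
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: list[tuple[int, str]]) -> list[str]:
    """Rebuild full keys by reusing the shared prefix of the previous key."""
    keys = []
    prev = ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


def find_block(index: list[tuple[str, int]], key: str):
    """Index entries are (last key of block, block number); return the block that may hold key."""
    last_keys = [k for k, _ in index]
    i = bisect.bisect_left(last_keys, key)  # first block whose last key >= key
    return index[i][1] if i < len(index) else None


keys = ["user:1001", "user:1002", "user:1010"]
print(prefix_encode(keys))  # [(0, 'user:1001'), (8, '2'), (7, '10')]
index = [("apple", 0), ("mango", 1), ("zebra", 2)]
print(find_block(index, "banana"))  # 1
```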
RocksDB Files – LOG files
•  Debugging output
•  Tuning options
•  Information about flushes and compactions
•  Performance statistics
Backups
•  Table files are immutable
•  Other files are append-only
•  Easy and fast incremental backups
•  Open sourced Rocks-Strata
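Immutability is what makes incremental backups cheap: a table file already in an earlier backup never has to be copied again. A sketch of the file selection only, using the file names from the listing and a hypothetical set of previously backed-up files (this is not Rocks-Strata's code; `files_to_copy` is a hypothetical name):

```python
def files_to_copy(live_files: list[str], already_backed_up: set[str]) -> list[str]:
    """Pick the files a new incremental backup must copy."""
    to_copy = []
    for name in sorted(live_files):
        if name.endswith(".sst") and name in already_backed_up:
            continue  # table files never change, so the earlier copy is still valid
        # Logs and the MANIFEST may have grown since the last backup; since they are
        # append-only, a real tool could copy only the new tail. This sketch copies them whole.
        to_copy.append(name)
    return to_copy


live = ["MANIFEST-000032", "000024.log", "000031.log", "000025.sst",
        "000028.sst", "000029.sst", "000033.sst", "000034.sst"]
previous = {"000025.sst", "000028.sst", "000029.sst"}
print(files_to_copy(live, previous))
# ['000024.log', '000031.log', '000033.sst', '000034.sst', 'MANIFEST-000032']
```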
Performance tuning
Tombstones
•  Deletions are deferred: a delete writes a tombstone, and the old data is removed only during compaction
•  May cause higher P99 latencies
•  Be careful with pathological workloads, e.g. queues
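Queues are pathological because the consumer keeps deleting from the head, so every read of the head has to step over all the tombstones that compaction has not yet removed. A toy model, assuming a key-sorted list of `(key, value)` pairs with `None` as a tombstone (not RocksDB's iterator; `first_live` is a hypothetical name):

```python
from typing import Optional


def first_live(entries: list[tuple[str, Optional[str]]]) -> tuple[Optional[str], int]:
    """Scan key-sorted entries from the start; return the first live key and tombstones skipped."""
    skipped = 0
    for key, value in entries:
        if value is None:
            skipped += 1  # deleted, but the tombstone stays until compaction removes it
            continue
        return key, skipped
    return None, skipped


# Queue workload: the producer appends keys in order, the consumer deletes from the head.
entries = [(f"job{i:05d}", None) for i in range(10_000)]  # 10,000 consumed jobs
entries += [(f"job{i:05d}", "payload") for i in range(10_000, 10_005)]
print(first_live(entries))  # ('job10000', 10000)
```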
Caching
Block cache
•  Managed by RocksDB
•  Uncompressed data
•  Defaults to 1/3 of RAM
Page cache
•  Managed by kernel
•  Compressed data
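RocksDB's default block cache evicts by LRU. A single-threaded toy version with a byte budget and hypothetical names (RocksDB's cache is sharded and thread-safe) looks like this:

```python
from collections import OrderedDict
from typing import Optional


class LRUBlockCache:
    """Byte-bounded cache of uncompressed blocks that evicts the least recently used."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def get(self, block_id: str) -> Optional[bytes]:
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # now the most recently used
        return data

    def put(self, block_id: str, data: bytes) -> None:
        if block_id in self.blocks:
            self.used -= len(self.blocks.pop(block_id))
        self.blocks[block_id] = data
        self.used += len(data)
        while self.used > self.capacity:
            _, evicted = self.blocks.popitem(last=False)  # drop the oldest entry
            self.used -= len(evicted)


cache = LRUBlockCache(capacity_bytes=8)
cache.put("block-a", b"aaaa")
cache.put("block-b", b"bbbb")
cache.get("block-a")           # touch a, so b becomes the oldest
cache.put("block-c", b"cccc")  # over capacity: evicts block-b
print(cache.get("block-b"))    # None
```

A block evicted here may still be in the page cache in compressed form, so a block cache miss does not always mean a storage read.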
Memory usage
•  Block cache
•  Index and filter blocks (0.5 – 2% of the database)
•  Memtables
•  Blocks pinned by iterators
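A back-of-the-envelope memory budget adds up these components. The sketch takes the slide's 1/3-of-RAM block cache default and an index/filter share inside the slide's 0.5-2% range; the RAM, database size, memtable size and count are hypothetical inputs, and blocks pinned by iterators are left out because they depend on the workload:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def estimate_memory(ram_bytes: int, db_bytes: int, memtable_bytes: int,
                    max_memtables: int, index_filter_fraction: float) -> dict[str, int]:
    """Rough memory budget; blocks pinned by iterators are workload-dependent and omitted."""
    parts = {
        "block_cache": ram_bytes // 3,  # the slide's default: 1/3 of RAM
        "index_and_filter": round(db_bytes * index_filter_fraction),  # 0.5-2% of the database
        "memtables": memtable_bytes * max_memtables,
    }
    parts["total"] = sum(parts.values())  # summed before "total" is inserted
    return parts


budget = estimate_memory(64 * GB, 1024 * GB, 64 * MB, 2, 0.01)
for name, size in budget.items():
    print(f"{name:>16}: {size / GB:.2f} GB")
```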
Reduce memory usage
•  Reduce block cache size – will increase CPU
•  Increase block size – will decrease index size
•  Turn off bloom filters on bottom level
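Two of these levers can be sized with simple arithmetic. This sketch assumes roughly one index entry per data block, and bloom filter memory proportional to the number of keys with keys spread evenly across bytes; the helper names are hypothetical. With the level targets from the LSM diagram, the bottom level holds about 90% of the data, so turning off its filters saves most of the filter memory and mainly costs extra reads for lookups of keys that do not exist.

```python
from fractions import Fraction

KB = 1024
MB = 1024 ** 2
GB = 1024 ** 3


def index_entries(data_bytes: int, block_bytes: int) -> int:
    """About one index entry per data block (ceiling division)."""
    return -(-data_bytes // block_bytes)


def bottom_level_share(level_sizes: list[int]) -> Fraction:
    """Share of data, and so of filter memory, that sits in the last level."""
    return Fraction(level_sizes[-1], sum(level_sizes))


print(index_entries(100 * GB, 4 * KB), index_entries(100 * GB, 16 * KB))
# 26214400 6553600
levels = [512 * MB, 5 * GB, 50 * GB, 500 * GB]
print(float(bottom_level_share(levels)))  # about 0.9
```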
Reduce CPU
•  Profile the CPU usage
•  Increase block cache size – will increase memory usage
•  Turn off compression
•  High CPU usage might be caused by tombstones
Reduce write amplification
•  Write amplification ≈ 5 * num_levels (rule of thumb)
•  Increase memtable and level 1 size
•  Stronger (zlib, zstd) compression for bottom levels
•  Try universal compaction
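The tuning advice follows from the rule of thumb: a larger Level 1 means fewer levels for the same database, so each byte is rewritten fewer times. A minimal sketch, assuming a level multiplier of 10 and counting levels from Level 1 (whether num_levels includes Level 0 is not stated on the slide); the function names and the 500GB database are hypothetical:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def num_levels_needed(db_bytes: int, level1_bytes: int, multiplier: int = 10) -> int:
    """Levels (from Level 1) needed until the last level's target can hold the database."""
    levels, target = 1, level1_bytes
    while target < db_bytes:
        target *= multiplier
        levels += 1
    return levels


def write_amp_estimate(num_levels: int) -> int:
    """The slide's rule of thumb: about 5 bytes written per level per byte of user data."""
    return 5 * num_levels


for level1 in (512 * MB, 5 * GB):
    n = num_levels_needed(500 * GB, level1)
    print(f"Level 1 = {level1 // MB}MB: {n} levels, write amplification ~{write_amp_estimate(n)}")
# Level 1 = 512MB: 4 levels, write amplification ~20
# Level 1 = 5120MB: 3 levels, write amplification ~15
```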
Next steps
•  Increase performance & stability
•  Deploy MyRocks at Facebook
•  External adoption of MyRocks and MongoRocks
•  Build an ecosystem
Thank you

RocksDB storage engine for MySQL and MongoDB