SlideShare a Scribd company logo
Data Storage Systems
Dmitry Alimov
2018
● Dmitry Alimov (@delimitry)
● Software Engineer
● SPbPython and PiterPy active member & speaker
● SPbPython drinkups co-organizer
● CTF player with SiBears team
● Python Software Foundation (PSF) contributing member
2
$ whoami
Outline
● Storage data structures
○ B-tree
○ LSM-tree
○ Other indices
● RUM conjecture
● OLTP, OLAP, HTAP
● SQL, NoSQL, NewSQL
● DB in CPython
● Books and references *
* References are in brackets “[ref num]”
3
Intro
- The amounts of data are constantly growing
4
Intro
- The amounts of data are constantly growing
- Every year new databases appear, existing ones are improved
5
Intro
- The amounts of data are constantly growing
- Every year new databases appear, existing ones are improved
- Each database has its own trade-offs
6
Intro
- The amounts of data are constantly growing
- Every year new databases appear, existing ones are improved
- Each database has its own trade-offs
- Understanding them helps to choose the right one
7
Intro
- The amounts of data are constantly growing
- Every year new databases appear, existing ones are improved
- Each database has its own trade-offs
- Understanding them helps to choose the right one
- Knowing and understanding of storage internals helps to make better design
decisions, troubleshoot problems, tune database
8
Storage Data Structures
9
“Wer Ordnung hält, ist nur zu faul zum Suchen”
(He who keeps order is just too lazy
to spend his time searching)
German proverb
10
Simple datastore
def set(key, value):
with open('main.db', 'a') as db_file:
db_file.write('{},{}n'.format(key, value))
def get(key):
value = None
with open('main.db', 'r') as db_file:
for line in db_file:
k, v = line.split(',')
if k == key:
value = v.rstrip()
return value
11
Simple datastore
>>> set('a', 'one')
>>> set('b', 'two')
>>> set('c', 'three')
>>> set('b', 'four')
>>> print(get('a'))
one
>>> print(get('b'))
four
>>> print(get('z'))
None
12
$ cat main.db
a,one
b,two
c,three
b,four
Questions
- Escaping
13
>>> set('a,b', 'oops')
$ cat main.db
...
a,b,oops # ???
...
Questions
- Escaping
- Deleting
14
>>> delete('c')
$ cat main.db
...
c,<tombstone>
...
Questions
- Escaping
- Deleting
- Concurrency
15
More questions:
Locks
MVCC
Single-writer, multiple-reader
...
main.db
client 1
client 2
client N
...
Questions
- Escaping
- Deleting
- Concurrency
- Compaction
16
$ cat main.db
a,one
b,two
c,three
b,four
c,<tombstone>
$ cat main.db
a,one
b,four
Compact
Questions
- Escaping
- Deleting
- Concurrency
- Compaction
- Performance
17
$ cat main.db
a,one
b,two
c,three
b,four
Insert: O(1), Search: O(n)
Indices
18
Database indices
- Hash indices [50]
19
Database indices
- Hash indices [50]
- B-tree [50]
20
Database indices
- Hash indices [50]
- B-tree [50]
- LSM-tree [54]
21
Database indices
- Hash indices [50]
- B-tree [50]
- LSM-tree [54]
- Other (Spatial indices (R-trees), BRIN, Log-Structured Hash Table, etc)
22
Database indices
- Hash indices [50]
- B-tree [50]
- LSM-tree [54]
- Other (Spatial indices (R-trees), BRIN, Log-Structured Hash Table, etc)
Indices speed up read queries, but slow down writes
23
Databases
24
B-tree LSM-tree
LevelDB
B-tree
25
26
“B-trees are by far the most important access
path structure in database and file systems”
Gray and Reuter, 1992 [31]
“It could be said that the world’s information
is at our fingertips because of B-trees”
Goetz Graefe, 2011 [32]
B-tree
27
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
B-tree
28
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
The most widely used indexing structure [1]
B-tree
29
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
The most widely used indexing structure [1]
Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM
DB2, CouchDB, Couchbase, etc
B-tree
30
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
The most widely used indexing structure [1]
Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM
DB2, CouchDB, Couchbase, etc
Optimized for paged data access [7]
B-tree
31
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
The most widely used indexing structure [1]
Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM
DB2, CouchDB, Couchbase, etc
Optimized for paged data access [7]
Branching factor between 50 and 2000 is often used [50]
B-tree
32
Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
The most widely used indexing structure [1]
Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM
DB2, CouchDB, Couchbase, etc
Optimized for paged data access [7]
Branching factor between 50 and 2000 is often used [50]
Typically faster for reads [39]
nn
B-tree
33
24
14 19
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
33 52
2
8
14
...
aa
be
ff
...
key value
aa be ok z k2 val foo top $ ret 123 u aa n bar
ff
abc
tt py
nn
B-tree durability
34
24
14 19
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
33 52
Write-ahead log (WAL)
…………
…………
…………
2
8
14
...
aa
be
ff
...
key value
aa be ok z k2 val foo top $ ret 123 u aa n bar
ff
abc
tt py
B-tree point query
35
24
14 19
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
select value with key 16
33 52
Insert: O(logB
N)
Search: O(logB
N)
16 < 24
16 < 14
16 < 15 16 = 16
16 < 19
B-tree range query
36
24
14 19
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
select values with keys in [15...27]
33 52
Non-optimal :(
B+ tree
37
27
15 20
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
42 55
aa be ok z k2 val foo top $ ret 123 u aa n bar
B+ tree range query
38
27
15 20
2 8 15 16 20 21 22 27 30 42 50 55 56 57 62
42 55
aa be ok z k2 val foo top $ ret 123 u aa n bar
Insert: O(logB
N)
Search: O(logB
N)
RangeQuery: O(logB
N + k)
select values with keys in [15...27]
Memory vs Disk
39
40
Memory Prices [34]
PriceperMB($)
Year
41
Memory vs Disk [56, 57, 58, 59]
Operation Time, ns * Comment
Memory access 100
SSD random read 16 000 (16 µs)
HDD seek 4 000 000 (4 ms)
SSD I/O 50 000 - 150 000 (50 - 150 )
HDD I/O 1 000 000 - 10 000 000 (1 - 10 ms)
Read 1 MB sequentially from memory 9 000 (9 µs)
Read 1 MB sequentially from SSD 200 000 (200 µs) 22x memory
Read 1 MB sequentially from HDD 2 000 000 (2 ms) 10x SSD, 220x memory
* Numbers for 2015
LSM-tree
42
Patrick O'Neil et al., introduced in 1996 [54]
LSM-tree (Log-structured merge-tree)
43
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
LSM-tree (Log-structured merge-tree)
44
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
Google’s Bigtable paper in 2006 [1, 53]
LSM-tree (Log-structured merge-tree)
45
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
Google’s Bigtable paper in 2006 [1, 53]
Memtable (B-tree, Skip List, etc)
LSM-tree (Log-structured merge-tree)
46
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
Google’s Bigtable paper in 2006 [1, 53]
Memtable (B-tree, Skip List, etc)
LSM-tree (Log-structured merge-tree)
47
https://www.themarysue.com/periodic-meme-table/
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
Google’s Bigtable paper in 2006 [1, 53]
Memtable (B-tree, Skip List, etc)
Sorted String Table (SSTable) - immutable
LSM-tree (Log-structured merge-tree)
48
Patrick O'Neil et al., introduced in 1996 [54]
Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB,
SQLite4, Tarantool, MongoDB (WiredTiger), etc.
Google’s Bigtable paper in 2006 [1, 53]
Memtable (B-tree, Skip List, etc)
Sorted String Table (SSTable) - immutable
Typically faster for writes [39]
LSM-tree (Log-structured merge-tree)
49
50
Memory
Disk
memtable
(B-tree, Skip List, etc)
SSTable data
aaaa
bbb
...
zzzz
124
7351
...
7
key file offset
aaaa 0
SSTable index
... ...
key value
0 1 0 1 1
Bloom filter
LSM-tree
51
Memory
Disk
memtable
(B-tree, Skip List, etc)
SSTable data
aaaa
bbb
...
zzzz
Commit log
124
7351
...
7
…………
…………
…………
…………
key file offset
aaaa 0
SSTable index
... ...
key value
0 1 0 1 1
Bloom filter
LSM-tree durability
52
SSTables compact & merge
aaaa
foo
...
zzzz
124
7351
...
<deleted>
key value
aaaa
aab
...
zzzz
123
1
...
7
key value
aaaa
aab
...
yy
124
1
...
222
key value
foo 7351
SSTable (merged)
SSTable (old)
SSTable (new)
Leveled & Size-tiered
compaction
Bloom filter [73]
53
Created by Burton Howard Bloom in 1970
Bloom filter [73]
54
Created by Burton Howard Bloom in 1970
Space-efficient probabilistic data structure
Bloom filter [73]
55
Created by Burton Howard Bloom in 1970
Space-efficient probabilistic data structure
Used by:
Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79]
Bloom filter [73]
56
Created by Burton Howard Bloom in 1970
Space-efficient probabilistic data structure
Used by:
Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79]
Akamai - to prevent "one-hit-wonders" from being stored in its disk caches
Bloom filter [73]
57
Created by Burton Howard Bloom in 1970
Space-efficient probabilistic data structure
Used by:
Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79]
Akamai - to prevent "one-hit-wonders" from being stored in its disk caches
The Google Chrome - to identify malicious URLs
Bloom filter [73]
58
Created by Burton Howard Bloom in 1970
Space-efficient probabilistic data structure
Used by:
Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79]
Akamai - to prevent "one-hit-wonders" from being stored in its disk caches
The Google Chrome - to identify malicious URLs
Medium - to avoid recommending articles a user has previously read
Bloom filter
59
0 1 0 0 1 0 1 1 0 0 1 0
abc foo
key “foo” probably exists in SSTable
key “abc” definitely not
Bit array
Bloom filter
60
0 1 0 0 1 0 1 1 0 0 1 0
abc foo
SSTable
SSTable data
foo 7351
key file offset
foo 3584
SSTable index
key value
key “foo” probably exists in SSTable
try to get its value
Bit array
R-tree
61
R-tree
62
Proposed by Antonin Guttman in 1984 [82]
R-tree
63
Proposed by Antonin Guttman in 1984 [82]
Tree data structure for indexing spatial information such as geographical
coordinates, rectangles or polygons [82]
R-tree
64
Proposed by Antonin Guttman in 1984 [82]
Tree data structure for indexing spatial information such as geographical
coordinates, rectangles or polygons [82]
Common operation on spatial data is a search for all objects in an area [83],
e.g.: “Find all shops within 1 km of my current location”
R6
R-tree
65
R1 R2
R3 R4 R6 R7
R3
R4
R5
R5
R8 R9
R7
R10 R11 R12 R13 R14 R15
R8
R9
R10
R11
R12
R13
R14
R15
R1
R2
BRIN
66
Block Range Index (BRIN) [80]
67
Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81]
Block Range Index (BRIN) [80]
68
Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81]
Designed for large tables (best for ordered set)
Block Range Index (BRIN) [80]
69
Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81]
Designed for large tables (best for ordered set)
Used in: PostgreSQL
Block Range Index (BRIN) [80]
70
Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81]
Designed for large tables (best for ordered set)
Used in: PostgreSQL
Other vendors have similar features: Oracle "storage indexes", Netezza “zone
maps”, Infobright “data packs”, MonetDB, Apache Hive, ORC, Parquet [80, 81]
B-tree vs BRIN
71
24
2 11 33 52
abc
18 22
55
57 70
2 22 24 55 33 52 57 70
foo
block range min value max value
1 2 22
2 24 55
3 33 52
4 57 70
B-tree
BRIN
Log-Structured Hash Table
72
Log-Structured Hash Table
73
key file offset
car 54
droid 475
current data file
older data fileBitcask (the default storage engine in Riak) [51]
older data file
car
Memory Disk
engine number 3710 ...
... droid model name S21 ...
...
Log-Structured Hash Table
74
key file offset
car 54
droid 475
current data file
older data file
older data file
car
Memory Disk
engine number 3710 ...
... droid model name S21 ...
Limitations:
1) must fit in memory
2) ranges not efficient
...
Bitcask (the default storage engine in Riak) [51]
RUM Conjecture
75
RUM Conjecture [38, 39]
76
Read Optimized
Update Optimized Memory Optimized
LSM
Hash
B-tree
Trie
Skip List
Sparse Index
Bloom filter
Bitmap
Cracking
Merging
Point & Tree
Indexes
Compressible/Approximate
Indexes
Differential
Structures
Adaptive
Structures
Amplifications
77
- Read amplification — amount of work done per logical read operation [49]
78
Amplifications
- Read amplification — amount of work done per logical read operation [49]
- Write amplification — amount of work done per write operation [49]
Writing 1 byte -> writing a page (up to 16 KB for some models) [36]
79
Amplifications
- Read amplification — amount of work done per logical read operation [49]
- Write amplification — amount of work done per write operation [49]
Writing 1 byte -> writing a page (up to 16 KB for some models) [36]
- Space amplification — ratio of the size of DB to the size of the data in DB [49]
80
Amplifications
- Read amplification — amount of work done per logical read operation [49]
- Write amplification — amount of work done per write operation [49]
Writing 1 byte -> writing a page (up to 16 KB for some models) [36]
- Space amplification — ratio of the size of DB to the size of the data in DB [49]
The SPAce, Read Or Write theorem (SPARROW) [46]
RA is inversely related to WA, and WA is inversely related to SA
81
Amplifications
- Read amplification — amount of work done per logical read operation [49]
- Write amplification — amount of work done per write operation [49]
Writing 1 byte -> writing a page (up to 16 KB for some models) [36]
- Space amplification — ratio of the size of DB to the size of the data in DB [49]
The SPAce, Read Or Write theorem (SPARROW) [46]
RA is inversely related to WA, and WA is inversely related to SA
Amplification and other issues are heavily dependent on workload, configuration of
the engine, and the specific implementation [48]
82
Amplifications
Interesting projects
The periodic table of data structures [41]
83
Interesting projects
The periodic table of data structures [41]
Data calculator [42, 43]
Interactive, semi-automated design of data
structures
84
Interesting projects
The periodic table of data structures [41]
Data calculator [42, 43]
Interactive, semi-automated design of data
structures
CrimsonDB [45]
A self-designing key-value store
85
OLTP/OLAP
86
OLTP vs OLAP
87
In the early days of business data processing, a write to the database typically
corresponded to a commercial transactions [1]
OLTP vs OLAP
88
In the early days of business data processing, a write to the database typically
corresponded to a commercial transactions [1]
Databases started being used for many different kinds of applications. Because
applications are interactive, the access pattern became known as online transaction
processing (OLTP)
OLTP vs OLAP
89
In the early days of business data processing, a write to the database typically
corresponded to a commercial transactions [1]
Databases started being used for many different kinds of applications. Because
applications are interactive, the access pattern became known as online transaction
processing (OLTP)
Databases also started being increasingly used for data analytics
OLTP vs OLAP
90
In the early days of business data processing, a write to the database typically
corresponded to a commercial transactions [1]
Databases started being used for many different kinds of applications. Because
applications are interactive, the access pattern became known as online transaction
processing (OLTP)
Databases also started being increasingly used for data analytics
Databases for online analytical processing (OLAP) was called a Data Warehouse
OLTP vs OLAP
91
In the early days of business data processing, a write to the database typically
corresponded to a commercial transactions [1]
Databases started being used for many different kinds of applications. Because
applications are interactive, the access pattern became known as online transaction
processing (OLTP)
Databases also started being increasingly used for data analytics
Databases for online analytical processing (OLAP) was called a Data Warehouse
Hybrid transaction/analytical processing (HTAP) [84]
Row oriented vs Column oriented DBMS
92
name age
John 54
Alice 22
datetime
10/10/2018 12:01:23
10/10/2018 12:01:24
name age
John 54
Alice 22
datetime
10/10/2018 12:01:23
10/10/2018 12:01:24
name age
John 54
Alice 22
datetime
10/10/2018 12:01:23
10/10/2018 12:01:24
Compression:
RLE, LZW, etc
Columns
Rows
Column oriented & time series DBs
93
Apache Parquet, ClickHouse, C-Store,
Greenplum, MonetDB, Vertica, etc.
Time series databases (TSDB):
Druid, Akumuli, InfluxDB, Riak TS, etc.
C-Store Akumuli
SQL, NoSQL, NewSQL
94
95
RDBMS/SQL, NoSQL, NewSQL [72]
RDBMS/SQL NoSQL NewSQL
Relational Yes No Yes
ACID transactions Yes No Yes
SQL support Yes No Yes
Horizontal scalability No Yes Yes
Schemaless No Yes No
96
Matthew Aslett, The 451 Group [55]
DB in CPython
97
Python DB API Specification
PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74])
98
Python DB API Specification
PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74])
PEP 249 - v2.0 (Release-Date: 07 Apr 1999 [75])
99
Python DB API Specification
PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74])
PEP 249 - v2.0 (Release-Date: 07 Apr 1999 [75])
Implementations are available for:
- PostgreSQL (psycopg2, txpostgres, ...)
- MySQL (mysql-python, PyMySQL, ...)
- MS SQL Server (adodbapi, pymssql, mxODBC, pyodbc, ...)
- Oracle (cx_Oracle, mxODBC, pyodbc, ...)
- etc.
100
DBs in Python
dbm, gdbm or bsddb
dbm — interfaces to Unix “databases” [76]
101
DBs in Python
dbm, gdbm or bsddb
dbm — interfaces to Unix “databases” [76]
shelve
“shelf” — persistent, dictionary-like object
The values can be arbitrary Python objects — anything that the pickle module can
handle, but the keys are strings [77]
102
DBs in Python
dbm, gdbm or bsddb
dbm — interfaces to Unix “databases” [76]
shelve
“shelf” — persistent, dictionary-like object
The values can be arbitrary Python objects — anything that the pickle module can
handle, but the keys are strings [77]
103
https://pixnio.com/food-and-drink/bell-pepper-jar-carfiol-veg
etable-food-diet-glass-organic
DBs in Python
dbm, gdbm or bsddb
dbm — interfaces to Unix “databases” [76]
shelve
“shelf” — persistent, dictionary-like object
The values can be arbitrary Python objects — anything that the pickle module can
handle, but the keys are strings [77]
sqlite3
sqlite3 — DB-API 2.0 interface for SQLite databases [78]
104
● OLTP and OLAP
● OLTP:
○ B-tree
○ LSM-tree
○ Other indices
○ RAM, SSD
● OLAP
○ Column-oriented storage
● RUM Conjecture
● Amplifications
Summary
105
106
Books [1, 28]
107
Thank you!
Happy databasing!
References
1. Martin Kleppmann: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and
Maintainable Systems, 1st edition. O'Reilly Media, 2017. ISBN: 978-1-449-37332-0 (https://dataintensive.net)
2. Alex Petrov: On Disk IO, Part 1: Flavors of IO, medium.com, September 3, 2017.
(https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017)
3. Alex Petrov: On Disk IO, Part 2: More Flavours of IO, medium.com, September 11, 2017.
(https://medium.com/databasss/on-disk-io-part-2-more-flavours-of-io-c945db3edb13)
4. Alex Petrov: On Disk IO, Part 3: LSM Trees, medium.com, September 27, 2017.
(https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f)
5. Alex Petrov: On Disk IO, Part 4: B-Trees and RUM Conjecture, medium.com, October 4, 2017.
(https://medium.com/databasss/on-disk-storage-part-4-b-trees-30791060741)
6. Alex Petrov: On Disk IO, Part 5: Access Patterns in LSM Trees, medium.com, October 30, 2017.
(https://medium.com/databasss/on-disk-io-access-patterns-in-lsm-trees-2ba8dffc05f9)
7. Alex Petrov: Algorithms Behind Modern Storage Systems. Communications of the ACM, volume 61, number 8,
pages 38-44, August 2018, doi:10.1145/3209210 (https://queue.acm.org/detail.cfm?id=3220266)
108
References
8. PostgreSQL 9.2.24 Documentation: Chapter 11. Indexes
(https://www.postgresql.org/docs/9.2/static/indexes-types.html)
9. MySQL 8.0 Reference Manual: 15.8.2.2 The Physical Structure of an InnoDB Index
(https://dev.mysql.com/doc/refman/8.0/en/innodb-physical-structure.html)
10. Oracle Database Concepts: Indexes and Index-Organized Tables
(https://docs.oracle.com/cd/E11882_01/server.112/e40540/indexiot.htm#CNCPT1170)
11. SQL Server Index Design Guide (https://technet.microsoft.com/en-us/library/jj835095(v=sql.110).aspx)
12. IBM Knowledge Center: Table and index management for standard tables
(https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c
0005424.html)
13. MariaDB Knowledge Base: Storage Engine Index Types
(https://mariadb.com/kb/en/library/storage-engine-index-types/)
14. Wiredtiger: Btree vs LSM
(https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM/1cae5a2c73e938fa2095d900f8c25a9ee9a05412)
15. CouchDB The Definitive Guide: The Power of B-trees (http://guide.couchdb.org/draft/btree.html)
109
References
16. Architecture of SQLite (https://www.sqlite.org/arch.html)
17. The Couchbase Blog: Compaction magic in Couchbase Server 2.0
(https://blog.couchbase.com/compaction-magic-couchbase-server-20/)
18. Apache Cassandra 3.0: Storage engine
(https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlManageOndisk.html)
19. Apache HBase: Accordion: HBase Breathes with In-Memory Compaction, blogs.apache.org, April 09, 2017.
(https://blogs.apache.org/hbase/entry/accordion-hbase-breathes-with-in)
20. InfluxData Documentation: In-memory indexing and the Time-Structured Merge Tree (TSM)
(https://docs.influxdata.com/influxdb/v1.6/concepts/storage_engine/#the-influxdb-storage-engine-and-the-tim
e-structured-merge-tree-tsm)
21. Ilya Grigorik: SSTable and Log Structured Storage: LevelDB, igvita.com, February 06, 2012.
(https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/)
22. RocksDB Basics
(https://github.com/facebook/rocksdb/wiki/RocksDB-Basics/8e2e3f69e163fbc370b13c3d2baf8ecf798f85e5)
23. SSTable compaction and compaction strategies
(https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies/419412878eea8a9f9
775fb718eda2fed2c1d551b)
110
References
24. Nadav Har'El: Scylla’s Compaction Strategies Series: Write Amplification in Leveled Compaction, scylladb.com,
January 31, 2018. (https://www.scylladb.com/2018/01/31/compaction-series-leveled-compaction/)
25. Vinyl Architecture
(https://github.com/tarantool/tarantool/wiki/Vinyl-Architecture/c83dec9b0719478ef24d6407ba6583faf6ae4547)
26. Tarantool: Storage engines (https://www.tarantool.io/en/doc/1.9/book/box/engines/)
27. SQLite4: LSM Users Guide (https://sqlite.org/src4/doc/trunk/www/lsmusr.wiki)
28. Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker: Readings in Database Systems, 5th Edition, 2015.
(http://www.redbook.io/)
29. Douglas Comer: The Ubiquitous B-Tree, ACM Computing Surveys, volume 11, number 2, pages 121–137, June 1979.
doi:10.1145/356770.356776 (http://www.ezdoum.com/upload/14/20020512204603/TheUbiquitousB-Tree.pdf)
30. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: Tree Indexing on Solid State Drives, Proceedings of the VLDB
Endowment, volume 3, number 1, pages 1195–1206, September 2010.
(http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf)
111
References
31. Jim Gray, Andreas Reuter: Transaction processing: concepts and techniques, Morgan Kaufmann, 1992. ISBN:
978-1-55860-190-1
32. Goetz Graefe: Modern B-Tree Techniques, Now Publishers Inc, 2011. ISBN: 978-1-60198-482-1
33. Niv Dayan: Log-Structured-Merge Trees, Comp115 guest lecture, February 23, 2017.
(http://manos.athanassoulis.net/classes/Comp115-Spring2017/slides/Comp115-Guest-Lecture-LSM-Trees.pdf)
34. Memory Prices 1957 to 2018 (https://jcmit.net/memoryprice.htm)
35. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: Tree Indexing on Solid State Drives, Proceedings of the VLDB
Endowment, volume 3, number 1, pages 1195–1206, September 2010.
(http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf)
36. Emmanuel Goossaert: Coding for SSDs, codecapsule.com, February 12, 2014.
(http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/)
37. Write amplification, en.wikipedia.org. https://en.wikipedia.org/wiki/Write_amplification
38. The RUM Conjecture, daslab.seas.harvard.edu, 2016. http://daslab.seas.harvard.edu/rum-conjecture/
39. Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, et al.: Designing Access Methods: The RUM Conjecture, at
19th International Conference on Extending Database Technology (EDBT), March 2016.
doi:10.5441/002/edbt.2016.42 (http://openproceedings.org/2016/conf/edbt/paper-12.pdf) 112
References
40. Гипотеза RUM, delimitry.blogspot.com, March 13, 2018. (http://delimitry.blogspot.com/2018/03/rum.html)
41. S. Idreos, et al., The Periodic Table of Data Structures, Bulletin of the IEEE Computer Society Technical
Committee on Data Engineering, vol. 41, no. 3, pp. 64-75, 2018.
(http://sites.computer.org/debull/A18sept/p64.pdf)
42. S. Idreos, K. Zoumpatianos, B. Hentschel, M. Kester, and D. Guo: The Data Calculator: Data Structure Design and
Cost Synthesis from First Principles and Learned Cost Models, SIGMOD, 2018.
43. Data Calculator, daslab.seas.harvard.edu, 2018. (http://daslab.seas.harvard.edu/datacalculator/)
44. Периодическая таблица структур данных, delimitry.blogspot.com, October 14, 2018.
(http://delimitry.blogspot.com/2018/10/blog-post.html)
45. CrimsonDB: A Self-Designing Key-Value Store, daslab.seas.harvard.edu, 2018.
(http://daslab.seas.harvard.edu/projects/crimsondb/)
46. The SPARROW Theorem for performance of storage systems, rocksdb.blogspot.com, October 16, 2013.
(http://rocksdb.blogspot.com/2013/10/SparrowTheorem.html)
47. Goetz Graefe: Modern B-Tree Techniques, Foundations and Trends in Databases, volume 3, number 4, pages
203–402, August 2011. doi:10.1561/1900000028
(http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf) 113
References
48. Baron Schwartz: The State of the Storage Engine, dzone.com, March 16, 2015.
(https://dzone.com/articles/state-storage-engine)
49. Mark Callaghan: Read, write & space amplification - pick 2, smalldatum.blogspot.com, November 23, 2015.
(http://smalldatum.blogspot.com/2015/11/read-write-space-amplification-pick-2_23.html)
50. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: Introduction to Algorithms, 3rd
edition. MIT Press, 2009. ISBN: 978-0-262-53305-8
51. Justin Sheehy and David Smith: Bitcask: A Log-Structured Hash Table for Fast Key/Value Data, Basho
Technologies, April 2010. (http://basho.com/wp-content/uploads/2015/05/bitcask-intro.pdf)
52. D. Severance/G. Lohman: Differential Files: Their Applications to the Maintenance of Large Databases, ACM
Transactions on Database Systems, volume 1, number 3, pages 256–367, September 1976.
doi:10.1145/320473.320484
53. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: Bigtable: A Distributed Storage System for Structured Data, at
7th USENIX Symposium on Operating System Design and Implementation (OSDI), November 2006.
(https://ai.google/research/pubs/pub27898)
114
References
54. Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil: The Log-Structured Merge-Tree (LSM-Tree),
Acta Informatica, volume 33, number 4, pages 351–385, June 1996. doi:10.1007/s002360050048
(https://www.cs.umb.edu/~poneil/lsmtree.pdf)
55. Matthew Aslett: NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases,
blogs.the451group.com, April 15th, 2011.
(https://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/)
56. Brendan Gregg: Systems Performance: Enterprise and the Cloud, Prentice Hall, October 2013. ISBN:
978-0-13-339009-4
57. Latency Numbers Every Programmer Should Know (https://gist.github.com/jboner/2841832)
58. Peter Norvig: Teach Yourself Programming in Ten Years (http://norvig.com/21-days.html#answers)
59. Latency Numbers Every Programmer Should Know (Interactive Latency)
(https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html)
60. Apache Parquet (https://parquet.apache.org/)
61. ClickHouse (https://clickhouse.yandex/)
62. C-Store (http://db.csail.mit.edu/projects/cstore/)
115
References
63. Apache Druid (http://druid.io/)
64. Greenplum (https://greenplum.org/)
65. MonetDB (https://www.monetdb.org/)
66. Vertica (https://www.vertica.com/overview/)
67. Akumuli (https://akumuli.org/)
68. InfluxDB (https://www.influxdata.com/)
69. Riak TS (http://basho.com/products/riak-ts/)
70. Eric Evans: NOSQL 2009, blog.sym-link.com, May 12, 2009.
(http://blog.sym-link.com/2009/05/12/nosql_2009.html)
71. Eric Evans: NoSQL: What’s in a Name?, blog.sym-link.com, October 30, 2009.
(http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html)
72. Laurent Guérin: NewSQL: what’s this?, labs.sogeti.com, January 22, 2014.
(http://labs.sogeti.com/newsql-whats/)
73. Burton H. Bloom: Space/Time Trade-offs in Hash Coding with Allowable Errors, Communications of the ACM,
volume 13, number 7, pages 422–426, July 1970. doi:10.1145/362686.362692
(http://www.cs.upc.edu/~diaz/p422-bloom.pdf) 116
References
74. http://svn.python.org/projects/peps/tags/date2001-07-16/pep-0248.txt
(https://www.python.org/dev/peps/pep-0249/)
75. http://svn.python.org/projects/peps/tags/date2001-07-16/pep-0249.txt
(https://www.python.org/dev/peps/pep-0249/)
76. Python 3.7.1rc1 documentation: dbm — Interfaces to Unix “databases”
(https://docs.python.org/3/library/dbm.html)
77. Python 3.7.1rc1 documentation: shelve — Python object persistence
(https://docs.python.org/3/library/shelve.html)
78. Python 3.7.1rc1 documentation: sqlite3 — DB-API 2.0 interface for SQLite databases
(https://docs.python.org/3/library/sqlite3.html)
79. Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter)
80. Block Range Index (https://en.wikipedia.org/wiki/Block_Range_Index)
81. PostgreSQL: [RFC] Minmax indexes
(https://www.postgresql.org/message-id/20130614222805.GZ5491@eldon.alvh.no-ip.org)
117
References
82. R-tree (https://en.wikipedia.org/wiki/R-tree)
83. Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD '84 Proceedings of the
1984 ACM SIGMOD international conference on Management of data, pp. 47-57, 1984.
doi:10.1145/602259.602266. ISBN: 0-89791-128-8
(http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf)
84. Hybrid transactional/analytical processing (HTAP)
(https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing_(HTAP))
118
119
Questions

More Related Content

What's hot (20)

PDF
Fractal Tree Indexes : From Theory to Practice
Tim Callaghan
 
PPTX
Redis/Lessons learned
Tit Petric
 
PDF
Native erasure coding support inside hdfs presentation
lin bao
 
PPTX
การจัดการฐานข้อมูล
ABELE Snvip
 
PPTX
Webinar: Getting Started with MongoDB - Back to Basics
MongoDB
 
PPTX
Ceph Day Chicago - Supermicro Ceph - Open SolutionsDefined by Workload
Ceph Community
 
PPT
The NoSQL Way in Postgres
EDB
 
PPTX
Back to Basics, webinar 2: La tua prima applicazione MongoDB
MongoDB
 
PPTX
SQLRally Amsterdam 2013 - Hadoop
Jan Pieter Posthuma
 
PDF
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 
PPTX
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Keshav Murthy
 
PPT
Lecture 40 1
patib5
 
PDF
Neo4j: Graph-like power
Roman Rodomansky
 
KEY
Python Development (MongoSF)
Mike Dirolf
 
PDF
PgREST: Node.js in the Database
Audrey Tang
 
KEY
R for Pirates. ESCCONF October 27, 2011
Mandi Walls
 
ODP
Introduction to PostgreSQL
Jim Mlodgenski
 
PPT
Tthornton code4lib
trevorthornton
 
PPT
Mongodb
Manav Prasad
 
PPTX
Mongo db queries
ssuser6d5faa
 
Fractal Tree Indexes : From Theory to Practice
Tim Callaghan
 
Redis/Lessons learned
Tit Petric
 
Native erasure coding support inside hdfs presentation
lin bao
 
การจัดการฐานข้อมูล
ABELE Snvip
 
Webinar: Getting Started with MongoDB - Back to Basics
MongoDB
 
Ceph Day Chicago - Supermicro Ceph - Open SolutionsDefined by Workload
Ceph Community
 
The NoSQL Way in Postgres
EDB
 
Back to Basics, webinar 2: La tua prima applicazione MongoDB
MongoDB
 
SQLRally Amsterdam 2013 - Hadoop
Jan Pieter Posthuma
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Keshav Murthy
 
Lecture 40 1
patib5
 
Neo4j: Graph-like power
Roman Rodomansky
 
Python Development (MongoSF)
Mike Dirolf
 
PgREST: Node.js in the Database
Audrey Tang
 
R for Pirates. ESCCONF October 27, 2011
Mandi Walls
 
Introduction to PostgreSQL
Jim Mlodgenski
 
Tthornton code4lib
trevorthornton
 
Mongodb
Manav Prasad
 
Mongo db queries
ssuser6d5faa
 

Similar to Data storage systems (20)

PDF
Read- and Write-Optimization in Modern Database Infrastructures by Dzejla Med...
ScyllaDB
 
PPTX
Database Storage Engine Internals
AdewumiSunkanmi
 
PPTX
Database storage engine internals.pptx
AdewumiSunkanmi
 
PPTX
Designing data intensive applications
Hemchander Sannidhanam
 
PDF
Write intensive workloads and lsm trees
Tilak Patidar
 
PPT
[Www.pkbulk.blogspot.com]file and indexing
AnusAhmad
 
PDF
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
Insight Technology, Inc.
 
PPT
The No SQL Principles and Basic Application Of Casandra Model
Rishikese MR
 
PPT
9910559 jjjgjgjfs lke lwmerfml lew we.ppt
abduganiyevbekzod011
 
PPTX
Presentation on b trees [autosaved]
AKASHKUMAR1542
 
PPT
3620121datastructures.ppt
SheejamolMathew
 
PPTX
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
PPTX
B tree
svishalsingh01
 
PPT
Indexing_DATA STRUCTURE FOR ENGINEERING STUDENTS ppt
ssuser99ca78
 
PPT
Indexing.pptvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
shesnasuneer
 
PPTX
The last mile from db to disk
Bartosz Sypytkowski
 
PPT
Indexing.ppt mmmmmmmmmmmmmmmmmmmmmmmmmmmmm
RAtna29
 
PPT
Indexing.ppt
KalsoomTahir2
 
PPTX
File Structures(Part 2)
Dr. SURBHI SAROHA
 
ODP
Vote NO for MySQL
Ulf Wendel
 
Read- and Write-Optimization in Modern Database Infrastructures by Dzejla Med...
ScyllaDB
 
Database Storage Engine Internals
AdewumiSunkanmi
 
Database storage engine internals.pptx
AdewumiSunkanmi
 
Designing data intensive applications
Hemchander Sannidhanam
 
Write intensive workloads and lsm trees
Tilak Patidar
 
[Www.pkbulk.blogspot.com]file and indexing
AnusAhmad
 
[db tech showcase Tokyo 2017] C23: Lessons from SQLite4 by SQLite.org - Richa...
Insight Technology, Inc.
 
The No SQL Principles and Basic Application Of Casandra Model
Rishikese MR
 
9910559 jjjgjgjfs lke lwmerfml lew we.ppt
abduganiyevbekzod011
 
Presentation on b trees [autosaved]
AKASHKUMAR1542
 
3620121datastructures.ppt
SheejamolMathew
 
DBMS-Unit5-PPT.pptx important for revision
yuvivarmaa
 
Indexing_DATA STRUCTURE FOR ENGINEERING STUDENTS ppt
ssuser99ca78
 
Indexing.pptvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
shesnasuneer
 
The last mile from db to disk
Bartosz Sypytkowski
 
Indexing.ppt mmmmmmmmmmmmmmmmmmmmmmmmmmmmm
RAtna29
 
Indexing.ppt
KalsoomTahir2
 
File Structures(Part 2)
Dr. SURBHI SAROHA
 
Vote NO for MySQL
Ulf Wendel
 
Ad

More from delimitry (19)

PDF
Python Hashlib & A True Story of One Bug
delimitry
 
PDF
JIT compilation for CPython
delimitry
 
PDF
Fuzzing python modules
delimitry
 
PDF
Writing file system in CPython
delimitry
 
PDF
CPython logo
delimitry
 
PDF
Contribute to CPython
delimitry
 
PDF
Buzzword poem generator in Python
delimitry
 
PDF
True stories on the analysis of network activity using Python
delimitry
 
PDF
Numbers obfuscation in Python
delimitry
 
PDF
ITGM #9 - Коварный CodeType, или от segfault'а к работающему коду
delimitry
 
PDF
Python dictionary : past, present, future
delimitry
 
PDF
Python dict: прошлое, настоящее, будущее
delimitry
 
PPTX
Разработка фреймворка на Python для автоматизации тестирования STB боксов
delimitry
 
PPTX
SchoolCTF 2012 - Tpircsavaj
delimitry
 
PPTX
SchoolCTF 2012 - See Shark
delimitry
 
PPTX
SchoolCTF 2012 - Rings
delimitry
 
PPTX
SchoolCTF 2012 - Bin Pix
delimitry
 
PPTX
SchoolCTF 2012 - Acid
delimitry
 
PPTX
Python GC
delimitry
 
Python Hashlib & A True Story of One Bug
delimitry
 
JIT compilation for CPython
delimitry
 
Fuzzing python modules
delimitry
 
Writing file system in CPython
delimitry
 
CPython logo
delimitry
 
Contribute to CPython
delimitry
 
Buzzword poem generator in Python
delimitry
 
True stories on the analysis of network activity using Python
delimitry
 
Numbers obfuscation in Python
delimitry
 
ITGM #9 - Коварный CodeType, или от segfault'а к работающему коду
delimitry
 
Python dictionary : past, present, future
delimitry
 
Python dict: прошлое, настоящее, будущее
delimitry
 
Разработка фреймворка на Python для автоматизации тестирования STB боксов
delimitry
 
SchoolCTF 2012 - Tpircsavaj
delimitry
 
SchoolCTF 2012 - See Shark
delimitry
 
SchoolCTF 2012 - Rings
delimitry
 
SchoolCTF 2012 - Bin Pix
delimitry
 
SchoolCTF 2012 - Acid
delimitry
 
Python GC
delimitry
 
Ad

Recently uploaded (20)

PPTX
Server Side Web Development Unit 1 of Nodejs.pptx
sneha852132
 
PPTX
Pharmaceuticals and fine chemicals.pptxx
jaypa242004
 
PPTX
Introduction to Neural Networks and Perceptron Learning Algorithm.pptx
Kayalvizhi A
 
PDF
Statistical Data Analysis Using SPSS Software
shrikrishna kesharwani
 
PPTX
Destructive Tests corrosion engineer (1).pptx
zeidali3
 
PPTX
Hashing Introduction , hash functions and techniques
sailajam21
 
PPTX
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
PPTX
UNIT DAA PPT cover all topics 2021 regulation
archu26
 
PDF
6th International Conference on Machine Learning Techniques and Data Science ...
ijistjournal
 
PPTX
site survey architecture student B.arch.
sri02032006
 
PPTX
Innowell Capability B0425 - Commercial Buildings.pptx
regobertroza
 
PPTX
MPMC_Module-2 xxxxxxxxxxxxxxxxxxxxx.pptx
ShivanshVaidya5
 
PPTX
EC3551-Transmission lines Demo class .pptx
Mahalakshmiprasannag
 
PPTX
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
PDF
Set Relation Function Practice session 24.05.2025.pdf
DrStephenStrange4
 
PPTX
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
PPTX
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
DOC
MRRS Strength and Durability of Concrete
CivilMythili
 
PPTX
Thermal runway and thermal stability.pptx
godow93766
 
PDF
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 
Server Side Web Development Unit 1 of Nodejs.pptx
sneha852132
 
Pharmaceuticals and fine chemicals.pptxx
jaypa242004
 
Introduction to Neural Networks and Perceptron Learning Algorithm.pptx
Kayalvizhi A
 
Statistical Data Analysis Using SPSS Software
shrikrishna kesharwani
 
Destructive Tests corrosion engineer (1).pptx
zeidali3
 
Hashing Introduction , hash functions and techniques
sailajam21
 
The Role of Information Technology in Environmental Protectio....pptx
nallamillisriram
 
UNIT DAA PPT cover all topics 2021 regulation
archu26
 
6th International Conference on Machine Learning Techniques and Data Science ...
ijistjournal
 
site survey architecture student B.arch.
sri02032006
 
Innowell Capability B0425 - Commercial Buildings.pptx
regobertroza
 
MPMC_Module-2 xxxxxxxxxxxxxxxxxxxxx.pptx
ShivanshVaidya5
 
EC3551-Transmission lines Demo class .pptx
Mahalakshmiprasannag
 
Benefits_^0_Challigi😙🏡💐8fenges[1].pptx
akghostmaker
 
Set Relation Function Practice session 24.05.2025.pdf
DrStephenStrange4
 
原版一样(Acadia毕业证书)加拿大阿卡迪亚大学毕业证办理方法
Taqyea
 
MobileComputingMANET2023 MobileComputingMANET2023.pptx
masterfake98765
 
MRRS Strength and Durability of Concrete
CivilMythili
 
Thermal runway and thermal stability.pptx
godow93766
 
MAD Unit - 2 Activity and Fragment Management in Android (Diploma IT)
JappanMavani
 

Data storage systems

  • 2. ● Dmitry Alimov (@delimitry) ● Software Engineer ● SPbPython and PiterPy active member & speaker ● SPbPython drinkups co-organizer ● CTF player with SiBears team ● Python Software Foundation (PSF) contributing member 2 $ whoami
  • 3. Outline ● Storage data structures ○ B-tree ○ LSM-tree ○ Other indices ● RUM conjecture ● OLTP, OLAP, HTAP ● SQL, NoSQL, NewSQL ● DB in CPython ● Books and references * * References are in brackets “[ref num]” 3
  • 4. Intro - The amounts of data are constantly growing 4
  • 5. Intro - The amounts of data are constantly growing - Every year new databases appear, existing ones are improved 5
  • 6. Intro - The amounts of data are constantly growing - Every year new databases appear, existing ones are improved - Each database has its own trade-offs 6
  • 7. Intro - The amounts of data are constantly growing - Every year new databases appear, existing ones are improved - Each database has its own trade-offs - Understanding them helps to choose the right one 7
  • 8. Intro - The amounts of data are constantly growing - Every year new databases appear, existing ones are improved - Each database has its own trade-offs - Understanding them helps to choose the right one - Knowing and understanding of storage internals helps to make better design decisions, troubleshoot problems, tune database 8
  • 10. “Wer Ordnung hält, ist nur zu faul zum Suchen” (He who keeps order is just too lazy to spend his time searching) German proverb 10
  • 11. Simple datastore def set(key, value): with open('main.db', 'a') as db_file: db_file.write('{},{}n'.format(key, value)) def get(key): value = None with open('main.db', 'r') as db_file: for line in db_file: k, v = line.split(',') if k == key: value = v.rstrip() return value 11
  • 12. Simple datastore >>> set('a', 'one') >>> set('b', 'two') >>> set('c', 'three') >>> set('b', 'four') >>> print(get('a')) one >>> print(get('b')) four >>> print(get('z')) None 12 $ cat main.db a,one b,two c,three b,four
  • 13. Questions - Escaping 13 >>> set('a,b', 'oops') $ cat main.db ... a,b,oops # ??? ...
  • 14. Questions - Escaping - Deleting 14 >>> delete('c') $ cat main.db ... c,<tombstone> ...
  • 15. Questions - Escaping - Deleting - Concurrency 15 More questions: Locks MVCC Single-writer, multiple-reader ... main.db client 1 client 2 client N ...
  • 16. Questions - Escaping - Deleting - Concurrency - Compaction 16 $ cat main.db a,one b,two c,three b,four c,<tombstone> $ cat main.db a,one b,four Compact
  • 17. Questions - Escaping - Deleting - Concurrency - Compaction - Performance 17 $ cat main.db a,one b,two c,three b,four Insert: O(1), Search: O(n)
  • 19. Database indices - Hash indices [50] 19
  • 20. Database indices - Hash indices [50] - B-tree [50] 20
  • 21. Database indices - Hash indices [50] - B-tree [50] - LSM-tree [54] 21
  • 22. Database indices - Hash indices [50] - B-tree [50] - LSM-tree [54] - Other (Spatial indices (R-trees), BRIN, Log-Structured Hash Table, etc) 22
  • 23. Database indices - Hash indices [50] - B-tree [50] - LSM-tree [54] - Other (Spatial indices (R-trees), BRIN, Log-Structured Hash Table, etc) Indices speed up read queries, but slow down writes 23
  • 26. 26 “B-trees are by far the most important access path structure in database and file systems” Gray and Reuter, 1992 [31] “It could be said that the world’s information is at our fingertips because of B-trees” Goetz Graefe, 2011 [32]
  • 27. B-tree 27 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight
  • 28. B-tree 28 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight The most widely used indexing structure [1]
  • 29. B-tree 29 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight The most widely used indexing structure [1] Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM DB2, CouchDB, Couchbase, etc
  • 30. B-tree 30 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight The most widely used indexing structure [1] Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM DB2, CouchDB, Couchbase, etc Optimized for paged data access [7]
  • 31. B-tree 31 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight The most widely used indexing structure [1] Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM DB2, CouchDB, Couchbase, etc Optimized for paged data access [7] Branching factor between 50 and 2000 is often used [50]
  • 32. B-tree 32 Self-balancing tree structure, invented in 1971 by Rudolf Bayer and Ed McCreight The most widely used indexing structure [1] Used in: MySQL (InnoDB), PostgreSQL, MongoDB, Oracle DB, MS SQL Server, IBM DB2, CouchDB, Couchbase, etc Optimized for paged data access [7] Branching factor between 50 and 2000 is often used [50] Typically faster for reads [39]
  • 33. nn B-tree 33 24 14 19 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 33 52 2 8 14 ... aa be ff ... key value aa be ok z k2 val foo top $ ret 123 u aa n bar ff abc tt py
  • 34. nn B-tree durability 34 24 14 19 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 33 52 Write-ahead log (WAL) ………… ………… ………… 2 8 14 ... aa be ff ... key value aa be ok z k2 val foo top $ ret 123 u aa n bar ff abc tt py
  • 35. B-tree point query 35 24 14 19 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 select value with key 16 33 52 Insert: O(logB N) Search: O(logB N) 16 < 24 16 < 14 16 < 15 16 = 16 16 < 19
  • 36. B-tree range query 36 24 14 19 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 select values with keys in [15...27] 33 52 Non-optimal :(
  • 37. B+ tree 37 27 15 20 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 42 55 aa be ok z k2 val foo top $ ret 123 u aa n bar
  • 38. B+ tree range query 38 27 15 20 2 8 15 16 20 21 22 27 30 42 50 55 56 57 62 42 55 aa be ok z k2 val foo top $ ret 123 u aa n bar Insert: O(logB N) Search: O(logB N) RangeQuery: O(logB N + k) select values with keys in [15...27]
  • 41. 41 Memory vs Disk [56, 57, 58, 59] Operation Time, ns * Comment Memory access 100 SSD random read 16 000 (16 µs) HDD seek 4 000 000 (4 ms) SSD I/O 50 000 - 150 000 (50 - 150 ) HDD I/O 1 000 000 - 10 000 000 (1 - 10 ms) Read 1 MB sequentially from memory 9 000 (9 µs) Read 1 MB sequentially from SSD 200 000 (200 µs) 22x memory Read 1 MB sequentially from HDD 2 000 000 (2 ms) 10x SSD, 220x memory * Numbers for 2015
  • 43. Patrick O'Neil et al., introduced in 1996 [54] LSM-tree (Log-structured merge-tree) 43
  • 44. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. LSM-tree (Log-structured merge-tree) 44
  • 45. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. Google’s Bigtable paper in 2006 [1, 53] LSM-tree (Log-structured merge-tree) 45
  • 46. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. Google’s Bigtable paper in 2006 [1, 53] Memtable (B-tree, Skip List, etc) LSM-tree (Log-structured merge-tree) 46
  • 47. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. Google’s Bigtable paper in 2006 [1, 53] Memtable (B-tree, Skip List, etc) LSM-tree (Log-structured merge-tree) 47 https://www.themarysue.com/periodic-meme-table/
  • 48. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. Google’s Bigtable paper in 2006 [1, 53] Memtable (B-tree, Skip List, etc) Sorted String Table (SSTable) - immutable LSM-tree (Log-structured merge-tree) 48
  • 49. Patrick O'Neil et al., introduced in 1996 [54] Used in: LevelDB, RocksDB, Cassandra, HBase, BigTable, InfluxDB, ScyllaDB, SQLite4, Tarantool, MongoDB (WiredTiger), etc. Google’s Bigtable paper in 2006 [1, 53] Memtable (B-tree, Skip List, etc) Sorted String Table (SSTable) - immutable Typically faster for writes [39] LSM-tree (Log-structured merge-tree) 49
  • 50. 50 Memory Disk memtable (B-tree, Skip List, etc) SSTable data aaaa bbb ... zzzz 124 7351 ... 7 key file offset aaaa 0 SSTable index ... ... key value 0 1 0 1 1 Bloom filter LSM-tree
  • 51. 51 Memory Disk memtable (B-tree, Skip List, etc) SSTable data aaaa bbb ... zzzz Commit log 124 7351 ... 7 ………… ………… ………… ………… key file offset aaaa 0 SSTable index ... ... key value 0 1 0 1 1 Bloom filter LSM-tree durability
  • 52. 52 SSTables compact & merge aaaa foo ... zzzz 124 7351 ... <deleted> key value aaaa aab ... zzzz 123 1 ... 7 key value aaaa aab ... yy 124 1 ... 222 key value foo 7351 SSTable (merged) SSTable (old) SSTable (new) Leveled & Size-tiered compaction
  • 53. Bloom filter [73] 53 Created by Burton Howard Bloom in 1970
  • 54. Bloom filter [73] 54 Created by Burton Howard Bloom in 1970 Space-efficient probabilistic data structure
  • 55. Bloom filter [73] 55 Created by Burton Howard Bloom in 1970 Space-efficient probabilistic data structure Used by: Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79]
  • 56. Bloom filter [73] 56 Created by Burton Howard Bloom in 1970 Space-efficient probabilistic data structure Used by: Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79] Akamai - to prevent "one-hit-wonders" from being stored in its disk caches
  • 57. Bloom filter [73] 57 Created by Burton Howard Bloom in 1970 Space-efficient probabilistic data structure Used by: Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79] Akamai - to prevent "one-hit-wonders" from being stored in its disk caches The Google Chrome - to identify malicious URLs
  • 58. Bloom filter [73] 58 Created by Burton Howard Bloom in 1970 Space-efficient probabilistic data structure Used by: Google Bigtable, Apache HBase, Cassandra, and PostgreSQL [79] Akamai - to prevent "one-hit-wonders" from being stored in its disk caches The Google Chrome - to identify malicious URLs Medium - to avoid recommending articles a user has previously read
  • 59. Bloom filter 59 0 1 0 0 1 0 1 1 0 0 1 0 abc foo key “foo” probably exists in SSTable key “abc” definitely not Bit array
  • 60. Bloom filter 60 0 1 0 0 1 0 1 1 0 0 1 0 abc foo SSTable SSTable data foo 7351 key file offset foo 3584 SSTable index key value key “foo” probably exists in SSTable try to get its value Bit array
  • 62. R-tree 62 Proposed by Antonin Guttman in 1984 [82]
  • 63. R-tree 63 Proposed by Antonin Guttman in 1984 [82] Tree data structure for indexing spatial information such as geographical coordinates, rectangles or polygons [82]
  • 64. R-tree 64 Proposed by Antonin Guttman in 1984 [82] Tree data structure for indexing spatial information such as geographical coordinates, rectangles or polygons [82] Common operation on spatial data is a search for all objects in an area [83], e.g.: “Find all shops within 1 km of my current location”
  • 65. R6 R-tree 65 R1 R2 R3 R4 R6 R7 R3 R4 R5 R5 R8 R9 R7 R10 R11 R12 R13 R14 R15 R8 R9 R10 R11 R12 R13 R14 R15 R1 R2
  • 67. Block Range Index (BRIN) [80] 67 Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81]
  • 68. Block Range Index (BRIN) [80] 68 Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81] Designed for large tables (best for ordered set)
  • 69. Block Range Index (BRIN) [80] 69 Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81] Designed for large tables (best for ordered set) Used in: PostgreSQL
  • 70. Block Range Index (BRIN) [80] 70 Proposed by Alvaro Herrera of 2ndQuadrant in 2013 as Minmax index [80, 81] Designed for large tables (best for ordered set) Used in: PostgreSQL Other vendors have similar features: Oracle "storage indexes", Netezza “zone maps”, Infobright “data packs”, MonetDB, Apache Hive, ORC, Parquet [80, 81]
  • 71. B-tree vs BRIN 71 24 2 11 33 52 abc 18 22 55 57 70 2 22 24 55 33 52 57 70 foo block range min value max value 1 2 22 2 24 55 3 33 52 4 57 70 B-tree BRIN
  • 73. Log-Structured Hash Table 73 key file offset car 54 droid 475 current data file older data fileBitcask (the default storage engine in Riak) [51] older data file car Memory Disk engine number 3710 ... ... droid model name S21 ... ...
  • 74. Log-Structured Hash Table 74 key file offset car 54 droid 475 current data file older data file older data file car Memory Disk engine number 3710 ... ... droid model name S21 ... Limitations: 1) must fit in memory 2) ranges not efficient ... Bitcask (the default storage engine in Riak) [51]
  • 76. RUM Conjecture [38, 39] 76 Read Optimized Update Optimized Memory Optimized LSM Hash B-tree Trie Skip List Sparse Index Bloom filter Bitmap Cracking Merging Point & Tree Indexes Compressible/Approximate Indexes Differential Structures Adaptive Structures
  • 78. - Read amplification — amount of work done per logical read operation [49] 78 Amplifications
  • 79. - Read amplification — amount of work done per logical read operation [49] - Write amplification — amount of work done per write operation [49] Writing 1 byte -> writing a page (up to 16 KB for some models) [36] 79 Amplifications
  • 80. - Read amplification — amount of work done per logical read operation [49] - Write amplification — amount of work done per write operation [49] Writing 1 byte -> writing a page (up to 16 KB for some models) [36] - Space amplification — ratio of the size of DB to the size of the data in DB [49] 80 Amplifications
  • 81. - Read amplification — amount of work done per logical read operation [49] - Write amplification — amount of work done per write operation [49] Writing 1 byte -> writing a page (up to 16 KB for some models) [36] - Space amplification — ratio of the size of DB to the size of the data in DB [49] The SPAce, Read Or Write theorem (SPARROW) [46] RA is inversely related to WA, and WA is inversely related to SA 81 Amplifications
  • 82. - Read amplification — amount of work done per logical read operation [49] - Write amplification — amount of work done per write operation [49] Writing 1 byte -> writing a page (up to 16 KB for some models) [36] - Space amplification — ratio of the size of DB to the size of the data in DB [49] The SPAce, Read Or Write theorem (SPARROW) [46] RA is inversely related to WA, and WA is inversely related to SA Amplification and other issues are heavily dependent on workload, configuration of the engine, and the specific implementation [48] 82 Amplifications
  • 83. Interesting projects The periodic table of data structures [41] 83
  • 84. Interesting projects The periodic table of data structures [41] Data calculator [42, 43] Interactive, semi-automated design of data structures 84
  • 85. Interesting projects The periodic table of data structures [41] Data calculator [42, 43] Interactive, semi-automated design of data structures CrimsonDB [45] A self-designing key-value store 85
  • 87. OLTP vs OLAP 87 In the early days of business data processing, a write to the database typically corresponded to a commercial transactions [1]
  • 88. OLTP vs OLAP 88 In the early days of business data processing, a write to the database typically corresponded to a commercial transactions [1] Databases started being used for many different kinds of applications. Because applications are interactive, the access pattern became known as online transaction processing (OLTP)
  • 89. OLTP vs OLAP 89 In the early days of business data processing, a write to the database typically corresponded to a commercial transactions [1] Databases started being used for many different kinds of applications. Because applications are interactive, the access pattern became known as online transaction processing (OLTP) Databases also started being increasingly used for data analytics
  • 90. OLTP vs OLAP 90 In the early days of business data processing, a write to the database typically corresponded to a commercial transactions [1] Databases started being used for many different kinds of applications. Because applications are interactive, the access pattern became known as online transaction processing (OLTP) Databases also started being increasingly used for data analytics Databases for online analytical processing (OLAP) was called a Data Warehouse
  • 91. OLTP vs OLAP 91 In the early days of business data processing, a write to the database typically corresponded to a commercial transactions [1] Databases started being used for many different kinds of applications. Because applications are interactive, the access pattern became known as online transaction processing (OLTP) Databases also started being increasingly used for data analytics Databases for online analytical processing (OLAP) was called a Data Warehouse Hybrid transaction/analytical processing (HTAP) [84]
  • 92. Row oriented vs Column oriented DBMS 92 name age John 54 Alice 22 datetime 10/10/2018 12:01:23 10/10/2018 12:01:24 name age John 54 Alice 22 datetime 10/10/2018 12:01:23 10/10/2018 12:01:24 name age John 54 Alice 22 datetime 10/10/2018 12:01:23 10/10/2018 12:01:24 Compression: RLE, LZW, etc Columns Rows
  • 93. Column oriented & time series DBs 93 Apache Parquet, ClickHouse, C-Store, Greenplum, MonetDB, Vertica, etc. Time series databases (TSDB): Druid, Akumuli, InfluxDB, Riak TS, etc. C-Store Akumuli
  • 95. 95 RDBMS/SQL, NoSQL, NewSQL [72] RDBMS/SQL NoSQL NewSQL Relational Yes No Yes ACID transactions Yes No Yes SQL support Yes No Yes Horizontal scalability No Yes Yes Schemaless No Yes No
  • 96. 96 Matthew Aslett, The 451 Group [55]
  • 98. Python DB API Specification PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74]) 98
  • 99. Python DB API Specification PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74]) PEP 249 - v2.0 (Release-Date: 07 Apr 1999 [75]) 99
  • 100. Python DB API Specification PEP 248 - v1.0 (Release-Date: 09 Apr 1996 [74]) PEP 249 - v2.0 (Release-Date: 07 Apr 1999 [75]) Implementations are available for: - PostgreSQL (psycopg2, txpostgres, ...) - MySQL (mysql-python, PyMySQL, ...) - MS SQL Server (adodbapi, pymssql, mxODBC, pyodbc, ...) - Oracle (cx_Oracle, mxODBC, pyodbc, ...) - etc. 100
  • 101. DBs in Python dbm, gdbm or bsddb dbm — interfaces to Unix “databases” [76] 101
  • 102. DBs in Python dbm, gdbm or bsddb dbm — interfaces to Unix “databases” [76] shelve “shelf” — persistent, dictionary-like object The values can be arbitrary Python objects — anything that the pickle module can handle, but the keys are strings [77] 102
  • 103. DBs in Python dbm, gdbm or bsddb dbm — interfaces to Unix “databases” [76] shelve “shelf” — persistent, dictionary-like object The values can be arbitrary Python objects — anything that the pickle module can handle, but the keys are strings [77] 103 https://pixnio.com/food-and-drink/bell-pepper-jar-carfiol-veg etable-food-diet-glass-organic
  • 104. DBs in Python dbm, gdbm or bsddb dbm — interfaces to Unix “databases” [76] shelve “shelf” — persistent, dictionary-like object The values can be arbitrary Python objects — anything that the pickle module can handle, but the keys are strings [77] sqlite3 sqlite3 — DB-API 2.0 interface for SQLite databases [78] 104
  • 105. ● OLTP and OLAP ● OLTP: ○ B-tree ○ LSM-tree ○ Other indices ○ RAM, SSD ● OLAP ○ Column-oriented storage ● RUM Conjecture ● Amplifications Summary 105
  • 108. References 1. Martin Kleppmann: Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems, 1st edition. O'Reilly Media, 2017. ISBN: 978-1-449-37332-0 (https://dataintensive.net) 2. Alex Petrov: On Disk IO, Part 1: Flavors of IO, medium.com, September 3, 2017. (https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017) 3. Alex Petrov: On Disk IO, Part 2: More Flavours of IO, medium.com, September 11, 2017. (https://medium.com/databasss/on-disk-io-part-2-more-flavours-of-io-c945db3edb13) 4. Alex Petrov: On Disk IO, Part 3: LSM Trees, medium.com, September 27, 2017. (https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f) 5. Alex Petrov: On Disk IO, Part 4: B-Trees and RUM Conjecture, medium.com, October 4, 2017. (https://medium.com/databasss/on-disk-storage-part-4-b-trees-30791060741) 6. Alex Petrov: On Disk IO, Part 5: Access Patterns in LSM Trees, medium.com, October 30, 2017. (https://medium.com/databasss/on-disk-io-access-patterns-in-lsm-trees-2ba8dffc05f9) 7. Alex Petrov: Algorithms Behind Modern Storage Systems. Communications of the ACM, volume 61, number 8, pages 38-44, August 2018, doi:10.1145/3209210 (https://queue.acm.org/detail.cfm?id=3220266) 108
  • 109. References 8. PostgreSQL 9.2.24 Documentation: Chapter 11. Indexes (https://www.postgresql.org/docs/9.2/static/indexes-types.html) 9. MySQL 8.0 Reference Manual: 15.8.2.2 The Physical Structure of an InnoDB Index (https://dev.mysql.com/doc/refman/8.0/en/innodb-physical-structure.html) 10. Oracle Database Concepts: Indexes and Index-Organized Tables (https://docs.oracle.com/cd/E11882_01/server.112/e40540/indexiot.htm#CNCPT1170) 11. SQL Server Index Design Guide (https://technet.microsoft.com/en-us/library/jj835095(v=sql.110).aspx) 12. IBM Knowledge Center: Table and index management for standard tables (https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.perf.doc/doc/c 0005424.html) 13. MariaDB Knowledge Base: Storage Engine Index Types (https://mariadb.com/kb/en/library/storage-engine-index-types/) 14. Wiredtiger: Btree vs LSM (https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM/1cae5a2c73e938fa2095d900f8c25a9ee9a05412) 15. CouchDB The Definitive Guide: The Power of B-trees (http://guide.couchdb.org/draft/btree.html) 109
  • 110. References 16. Architecture of SQLite (https://www.sqlite.org/arch.html) 17. The Couchbase Blog: Compaction magic in Couchbase Server 2.0 (https://blog.couchbase.com/compaction-magic-couchbase-server-20/) 18. Apache Cassandra 3.0: Storage engine (https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlManageOndisk.html) 19. Apache HBase: Accordion: HBase Breathes with In-Memory Compaction, blogs.apache.org, April 09, 2017. (https://blogs.apache.org/hbase/entry/accordion-hbase-breathes-with-in) 20. InfluxData Documentation: In-memory indexing and the Time-Structured Merge Tree (TSM) (https://docs.influxdata.com/influxdb/v1.6/concepts/storage_engine/#the-influxdb-storage-engine-and-the-tim e-structured-merge-tree-tsm) 21. Ilya Grigorik: SSTable and Log Structured Storage: LevelDB, igvita.com, February 06, 2012. (https://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/) 22. RocksDB Basics (https://github.com/facebook/rocksdb/wiki/RocksDB-Basics/8e2e3f69e163fbc370b13c3d2baf8ecf798f85e5) 23. SSTable compaction and compaction strategies (https://github.com/scylladb/scylla/wiki/SSTable-compaction-and-compaction-strategies/419412878eea8a9f9 775fb718eda2fed2c1d551b) 110
  • 111. References 24. Nadav Har'El: Scylla’s Compaction Strategies Series: Write Amplification in Leveled Compaction, scylladb.com, January 31, 2018. (https://www.scylladb.com/2018/01/31/compaction-series-leveled-compaction/) 25. Vinyl Architecture (https://github.com/tarantool/tarantool/wiki/Vinyl-Architecture/c83dec9b0719478ef24d6407ba6583faf6ae4547) 26. Tarantool: Storage engines (https://www.tarantool.io/en/doc/1.9/book/box/engines/) 27. SQLite4: LSM Users Guide (https://sqlite.org/src4/doc/trunk/www/lsmusr.wiki) 28. Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker: Readings in Database Systems, 5th Edition, 2015. (http://www.redbook.io/) 29. Douglas Comer: The Ubiquitous B-Tree, ACM Computing Surveys, volume 11, number 2, pages 121–137, June 1979. doi:10.1145/356770.356776 (http://www.ezdoum.com/upload/14/20020512204603/TheUbiquitousB-Tree.pdf) 30. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: Tree Indexing on Solid State Drives, Proceedings of the VLDB Endowment, volume 3, number 1, pages 1195–1206, September 2010. (http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf) 111
  • 112. References 31. Jim Gray, Andreas Reuter: Transaction processing: concepts and techniques, Morgan Kaufmann, 1992. ISBN: 978-1-55860-190-1 32. Goetz Graefe: Modern B-Tree Techniques, Now Publishers Inc, 2011. ISBN: 978-1-60198-482-1 33. Niv Dayan: Log-Structured-Merge Trees, Comp115 guest lecture, February 23, 2017. (http://manos.athanassoulis.net/classes/Comp115-Spring2017/slides/Comp115-Guest-Lecture-LSM-Trees.pdf) 34. Memory Prices 1957 to 2018 (https://jcmit.net/memoryprice.htm) 35. Yinan Li, Bingsheng He, Robin Jun Yang, et al.: Tree Indexing on Solid State Drives, Proceedings of the VLDB Endowment, volume 3, number 1, pages 1195–1206, September 2010. (http://www.vldb.org/pvldb/vldb2010/papers/R106.pdf) 36. Emmanuel Goossaert: Coding for SSDs, codecapsule.com, February 12, 2014. (http://codecapsule.com/2014/02/12/coding-for-ssds-part-1-introduction-and-table-of-contents/) 37. Write amplification, en.wikipedia.org. https://en.wikipedia.org/wiki/Write_amplification 38. The RUM Conjecture, daslab.seas.harvard.edu, 2016. http://daslab.seas.harvard.edu/rum-conjecture/ 39. Manos Athanassoulis, Michael S. Kester, Lukas M. Maas, et al.: Designing Access Methods: The RUM Conjecture, at 19th International Conference on Extending Database Technology (EDBT), March 2016. doi:10.5441/002/edbt.2016.42 (http://openproceedings.org/2016/conf/edbt/paper-12.pdf) 112
  • 113. References 40. Гипотеза RUM, delimitry.blogspot.com, March 13, 2018. (http://delimitry.blogspot.com/2018/03/rum.html) 41. S. Idreos, et al., The Periodic Table of Data Structures, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 41, no. 3, pp. 64-75, 2018. (http://sites.computer.org/debull/A18sept/p64.pdf) 42. S. Idreos, K. Zoumpatianos, B. Hentschel, M. Kester, and D. Guo: The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models, SIGMOD, 2018. 43. Data Calculator, daslab.seas.harvard.edu, 2018. (http://daslab.seas.harvard.edu/datacalculator/) 44. Периодическая таблица структур данных, delimitry.blogspot.com, October 14, 2018. (http://delimitry.blogspot.com/2018/10/blog-post.html) 45. CrimsonDB: A Self-Designing Key-Value Store, daslab.seas.harvard.edu, 2018. (http://daslab.seas.harvard.edu/projects/crimsondb/) 46. The SPARROW Theorem for performance of storage systems, rocksdb.blogspot.com, October 16, 2013. (http://rocksdb.blogspot.com/2013/10/SparrowTheorem.html) 47. Goetz Graefe: Modern B-Tree Techniques, Foundations and Trends in Databases, volume 3, number 4, pages 203–402, August 2011. doi:10.1561/1900000028 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.219.7269&rep=rep1&type=pdf) 113
  • 114. References 48. Baron Schwartz: The State of the Storage Engine, dzone.com, March 16, 2015. (https://dzone.com/articles/state-storage-engine) 49. Mark Callaghan: Read, write & space amplification - pick 2, smalldatum.blogspot.com, November 23, 2015. (http://smalldatum.blogspot.com/2015/11/read-write-space-amplification-pick-2_23.html) 50. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein: Introduction to Algorithms, 3rd edition. MIT Press, 2009. ISBN: 978-0-262-53305-8 51. Justin Sheehy and David Smith: Bitcask: A Log-Structured Hash Table for Fast Key/Value Data, Basho Technologies, April 2010. (http://basho.com/wp-content/uploads/2015/05/bitcask-intro.pdf) 52. D. Severance/G. Lohman: Differential Files: Their Applications to the Maintenance of Large Databases, ACM Transactions on Database Systems, volume 1, number 3, pages 256–367, September 1976. doi:10.1145/320473.320484 53. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, et al.: Bigtable: A Distributed Storage System for Structured Data, at 7th USENIX Symposium on Operating System Design and Implementation (OSDI), November 2006. (https://ai.google/research/pubs/pub27898) 114
  • 115. References 54. Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil: The Log-Structured Merge-Tree (LSM-Tree), Acta Informatica, volume 33, number 4, pages 351–385, June 1996. doi:10.1007/s002360050048 (https://www.cs.umb.edu/~poneil/lsmtree.pdf) 55. Matthew Aslett: NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases, blogs.the451group.com, April 15th, 2011. (https://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/) 56. Brendan Gregg: Systems Performance: Enterprise and the Cloud, Prentice Hall, October 2013. ISBN: 978-0-13-339009-4 57. Latency Numbers Every Programmer Should Know (https://gist.github.com/jboner/2841832) 58. Peter Norvig: Teach Yourself Programming in Ten Years (http://norvig.com/21-days.html#answers) 59. Latency Numbers Every Programmer Should Know (Interactive Latency) (https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html) 60. Apache Parquet (https://parquet.apache.org/) 61. ClickHouse (https://clickhouse.yandex/) 62. C-Store (http://db.csail.mit.edu/projects/cstore/) 115
  • 116. References 63. Apache Druid (http://druid.io/) 64. Greenplum (https://greenplum.org/) 65. MonetDB (https://www.monetdb.org/) 66. Vertica (https://www.vertica.com/overview/) 67. Akumuli (https://akumuli.org/) 68. InfluxDB (https://www.influxdata.com/) 69. Riak TS (http://basho.com/products/riak-ts/) 70. Eric Evans: NOSQL 2009, blog.sym-link.com, May 12, 2009. (http://blog.sym-link.com/2009/05/12/nosql_2009.html) 71. Eric Evans: NoSQL: What’s in a Name?, blog.sym-link.com, October 30, 2009. (http://blog.sym-link.com/2009/10/30/nosql_whats_in_a_name.html) 72. Laurent Guérin: NewSQL: what’s this?, labs.sogeti.com, January 22, 2014. (http://labs.sogeti.com/newsql-whats/) 73. Burton H. Bloom: Space/Time Trade-offs in Hash Coding with Allowable Errors, Communications of the ACM, volume 13, number 7, pages 422–426, July 1970. doi:10.1145/362686.362692 (http://www.cs.upc.edu/~diaz/p422-bloom.pdf) 116
  • 117. References 74. http://svn.python.org/projects/peps/tags/date2001-07-16/pep-0248.txt (https://www.python.org/dev/peps/pep-0249/) 75. http://svn.python.org/projects/peps/tags/date2001-07-16/pep-0249.txt (https://www.python.org/dev/peps/pep-0249/) 76. Python 3.7.1rc1 documentation: dbm — Interfaces to Unix “databases” (https://docs.python.org/3/library/dbm.html) 77. Python 3.7.1rc1 documentation: shelve — Python object persistence (https://docs.python.org/3/library/shelve.html) 78. Python 3.7.1rc1 documentation: sqlite3 — DB-API 2.0 interface for SQLite databases (https://docs.python.org/3/library/sqlite3.html) 79. Bloom filter (https://en.wikipedia.org/wiki/Bloom_filter) 80. Block Range Index (https://en.wikipedia.org/wiki/Block_Range_Index) 81. PostgreSQL: [RFC] Minmax indexes (https://www.postgresql.org/message-id/[email protected]) 117
  • 118. References 82. R-tree (https://en.wikipedia.org/wiki/R-tree) 83. Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data, pp. 47-57, 1984. doi:10.1145/602259.602266. ISBN: 0-89791-128-8 (http://www-db.deis.unibo.it/courses/SI-LS/papers/Gut84.pdf) 84. Hybrid transactional/analytical processing (HTAP) (https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing_(HTAP)) 118