Nilesh Salpe
A collection of computers that appears to its users as a single computer.
Characteristics
 The computers operate concurrently.
 The computers fail independently.
 The computers do not share a global clock.
 Examples
 Amazon.com
 Cassandra database
 Is a multi-core processor a distributed system?
 Is a single-core computer with peripherals (Wi-Fi, printer, multiple displays, etc.) a distributed system?
 Distributed Storage
 Relational, MongoDB, Cassandra, HDFS (Hadoop Distributed File System), HBase, Redis
 Distributed Computation
 Hadoop, Spark, Storm, Akka, Apache Flink
 Distributed Synchronization
 NTP (Network Time Protocol), vector clocks
 Distributed Consensus
 Paxos, ZooKeeper
 Distributed Messaging
 Apache Kafka, RabbitMQ
 Load Balancers
 Round robin, weighted round robin, min load, weighted load, session-aware, etc.
 Serialization
 Protocol Buffers, Thrift, Avro, etc.
 Single-master storage
 One powerful machine; scaling up (vertical scaling).
 Types of load
Read-heavy load
Write-heavy load
Mixed read/write load
 Scaling Strategies / Data distribution
 Read Replication (scaling out)
 Sharding (scaling out)
Master node – updates must pass through the master node.
Follower nodes – receive asynchronous replication (data propagation) from the master node.
Read requests can be served by follower nodes, which increases the overall read I/O of the system.
Problems with this design:
 Increased complexity of replication.
 No guarantee of strong consistency.
 Read-after-write scenarios do not guarantee the latest value.
 The master node can become a bottleneck for write requests.
The model is suitable for read-heavy workloads.
Examples: Google search engine, relational database with a read-replica cluster.
[Diagram: the master holds X = 5 and is asynchronously propagating the update (F1: X = 5) to followers that still hold the stale value X = 3.]
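The read-after-write gap in this design can be sketched in a few lines of Python (class and method names are illustrative, not any real database's API): a write lands on the master, and a follower read returns the stale value until replication catches up.

```python
# Illustrative sketch of asynchronous master-follower replication.

class Node:
    def __init__(self):
        self.data = {}

class Cluster:
    def __init__(self, followers=2):
        self.master = Node()
        self.followers = [Node() for _ in range(followers)]
        self.pending = []                      # replication log not yet applied

    def write(self, key, value):
        self.master.data[key] = value          # master applies immediately
        self.pending.append((key, value))      # followers get it later

    def replicate(self):
        # asynchronous replication catching up
        for key, value in self.pending:
            for f in self.followers:
                f.data[key] = value
        self.pending.clear()

    def read_from_follower(self, key):
        return self.followers[0].data.get(key)

cluster = Cluster()
cluster.write("X", 5)
stale = cluster.read_from_follower("X")   # replication has not run yet
cluster.replicate()
fresh = cluster.read_from_follower("X")
print(stale, fresh)                       # -> None 5 (the follower lagged)
```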
Data distribution techniques
Sharding
 Used in relational databases and distributed databases.
 Can range from manual to completely automated, depending on the scheme.
Consistent hashing
 Used in distributed databases.
 It is automated.
 Used to partition data across multiple nodes based on some key or attribute.
 Techniques range from manual to automated sharding.
 Functional partitioning (burden on the client)
Example: store all user data on one node and all transaction data on another node.
 Horizontal partitioning (popular)
 Ranges
 Hashes
 Directory
 Vertical partitioning (less popular)
 Data belonging to the same set/table/relation is distributed across nodes.
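Hash-based horizontal partitioning can be sketched as follows (the shard count and function names are illustrative). Note that this is the modular scheme whose resharding weakness motivates consistent hashing later in the deck.

```python
# Minimal sketch of hash-based horizontal partitioning: each record is
# routed to a shard by hashing its key modulo the shard count.

import hashlib

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Stable hash (unlike Python's built-in hash(), which is salted per run)
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

shards = {i: {} for i in range(NUM_SHARDS)}

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    return shards[shard_for(key)].get(key)

for user in ["john", "jane", "alice", "bob"]:
    put(user, {"name": user})

print(get("jane"))   # -> {'name': 'jane'}
```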
[Diagram: a shard router receives a write (X = 10) and routes it to the owning shard, N2, among nodes N1–N4.]
Basics of Distributed Systems - Distributed Storage
 A shard routing layer distributes reads/writes.
 More complexity
 Routing layer, awareness of network topology, handling a dynamic cluster.
 Limited data model
 Every data model must have a key, which is used for routing.
 Limited data access patterns
 All read/write/update/delete queries must include the key.
 Redundant data for additional access patterns.
 To access data by more than one key, the data must also be stored under those keys, i.e. multiple copies or de-normalized data.
 Data access patterns need to be considered before designing the models.
 Too much scatter-gather for aggregations; reads might slow down the system.
 OLTP will slow down the system.
 The number of shards needs to be decided early in system design.
 In an in-memory hash map, when the load factor crosses a certain threshold we must re-hash all keys. So a modular hash function does not work when the number of buckets changes dynamically.
 Consistent hashing is a technique used to limit the reshuffling of keys when a hash table structure is rebalanced (e.g. when the number of buckets changes dynamically).
 The hash space is shared by the key hash space and the virtual-node hash space.
 Keys and virtual nodes hash to the same values regardless of the number of physical nodes; the only difference is which physical node they are stored on.
 Advantages
 Avoids re-hashing all keys when nodes leave or join.
 Example: in Cassandra, each physical node is mapped to many virtual nodes (e.g. 128).
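A toy consistent-hash ring, assuming MD5 as the hash and 8 virtual nodes per physical node (an illustration, not Cassandra's actual implementation). It shows that when a node joins, only a fraction of keys move, unlike the modular scheme, which would move most of them.

```python
# Toy consistent-hash ring: each physical node owns several virtual
# nodes, and a key is stored on the first virtual node at or after the
# key's position, moving clockwise around the ring.

import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, vnodes_per_node=8):
        self.vnodes = vnodes_per_node
        self.tokens = []          # sorted virtual-node positions
        self.owner = {}           # token -> physical node

    def add_node(self, node):
        for i in range(self.vnodes):
            token = h(f"{node}#{i}")
            bisect.insort(self.tokens, token)
            self.owner[token] = node

    def node_for(self, key):
        token = h(key)
        idx = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return self.owner[self.tokens[idx]]

ring = Ring()
for n in ["A", "B"]:
    ring.add_node(n)

before = {k: ring.node_for(k) for k in (f"key{i}" for i in range(1000))}
ring.add_node("C")                       # one node joins the ring
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
# Only a fraction of keys move; a modular hash would move most of them.
print(f"{moved / 1000:.0%} of keys moved")
```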
With 2 physical nodes (A, B) mapped to 8 virtual nodes:

Node  Hash ranges
A     (0-8), (16-24), (32-40), (48-56)
B     (8-16), (24-32), (40-48), (56-64)

With 4 physical nodes (A, B, C, D) mapped to 8 virtual nodes:

Node  Hash ranges
A     (0-8), (32-40)
B     (8-16), (40-48)
C     (16-24), (48-56)
D     (24-32), (56-64)

Removing nodes C and D: 50% of keys are affected.
Adding nodes C and D: 50% of keys are affected.
Say we have a hash function that gives a 6-bit hash, so the hash space is 2^6 = 64.
hash(John) = 111100 (60), hash(Jane) = 011000 (24). Each hash is assigned to the virtual-node segment immediately ahead of it in the clockwise direction (CWD) on the ring.
 Eventual Consistency
 Consistency Tuning
R + W > N
R – number of replicas that must respond for a successful read
W – number of replicas that must respond for a successful write/update
N – total number of replicas
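The R + W > N rule is just arithmetic: it forces every read quorum to overlap every write quorum in at least one replica, so a read always sees at least one copy of the latest write. A minimal sketch:

```python
# Quorum overlap check for tunable consistency with N replicas.

def overlaps(r: int, w: int, n: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n

# Typical settings for N = 3 replicas:
print(overlaps(2, 2, 3))   # quorum reads + quorum writes -> True (strong)
print(overlaps(1, 1, 3))   # read ONE, write ONE -> False (eventual only)
print(overlaps(1, 3, 3))   # read ONE, write ALL -> True (slow writes)
```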
 Failures
 Node offline
 Network latency
 A GC-like process making a node unresponsive
 Hinted handoffs, read repairs
 Huge impact on design (write-then-read scenarios)
 CAP
 Consistency – every read gets the most recent value after a write/update/delete.
 Availability – every request receives a non-error response (with no guarantee it is the most recent value).
 Partition tolerance – the system continues to respond despite an arbitrary number of node failures (nodes cannot communicate with other nodes, temporarily or permanently, due to network partition, congestion, communication delays, or a GC pause in the case of the JVM).
 Ground reality
 Partition tolerance is a must; nobody wants data loss.
 The practical choice is always between consistency and availability.
 Example:
 Amazon S3 chooses availability over consistency, so it is an AP system.
 In relational databases
 Two-phase commit in distributed relational databases (throughput suffers).
 ACID properties of transactions in relational databases:
A – Atomicity: a transaction (a bundle of statements) completes entirely or not at all.
C – Consistency: the database is kept in a valid state before and after the transaction.
I – Isolation: each transaction acts as if it were working on the data alone (serializable, repeatable reads, read committed, read uncommitted (dirty reads, phantom reads)).
D – Durability: once a transaction is committed, the changes are permanent.
 A transaction can be rolled back as if it never happened.
 Options in distributed storage systems
 Lighter transactions are supported, such as update-if-present.
 Write-off (no money back; no guarantee of delivery)
 Retry (with exponentially increasing intervals)
 Compensating actions (say, reverting a credit-card payment)
 Distributed transactions (2PC) (slow you down)
 The main reason to sacrifice transactions is availability.
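The retry option above can be sketched with exponentially increasing delays between attempts (the function names here are illustrative):

```python
# Retry a flaky operation with exponential backoff between attempts.

import time

def retry(operation, attempts=5, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                               # give up after the last try
            time.sleep(base_delay * 2 ** attempt)   # 0.01s, 0.02s, 0.04s, ...

calls = {"count": 0}

def flaky_write():
    # Fails twice, then succeeds - simulating a briefly unavailable replica.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("replica unavailable")
    return "ok"

print(retry(flaky_write))   # -> ok (after two failed attempts)
```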
 Impact on the design of applications using distributed storage.
 Aspects to consider
 Scale
 Transactional needs
 High availability
 Designing for failure
Storage options and scenarios
 Relational databases
 Strong transactional requirements (OLTP systems)
 NoSQL
 A giant distributed hash table (one that cannot fit on a single machine) with nested keys.
 Key-value stores: Map<K,V>
 Document databases: Map<K, {k1:v1, k2:v2, …}>, where the value is generally JSON or some other serializable/de-serializable format, or a binary file.
 Columnar databases: SortedMap<<K1,K2,K3,…>, V>
 Graph databases: AdjacencyMap<K, [K1,K2,K3,…]> – lots of small relations or links
 Search engines: lots of indexes based on search requirements – Map<K1,K>, Map<K2,K>, Map<K3,K>, … plus the actual raw document storage Map<K,V>
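The map analogies above can be written out as plain Python structures (all of the data here is made up for illustration):

```python
# Key-value store: Map<K, V> - the value is an opaque blob
kv = {"user:42": b"...opaque blob..."}

# Document database: Map<K, {k1: v1, ...}> - a JSON-like value
doc = {"user:42": {"name": "Jane", "city": "Pune"}}

# Columnar database: SortedMap<(K1, K2, ...), V> - composite sorted key
from collections import OrderedDict
col = OrderedDict(sorted({
    ("user:42", "2021-01-01", "login"): 1,
    ("user:42", "2021-01-02", "login"): 2,
}.items()))

# Graph database: AdjacencyMap<K, [K1, K2, ...]> - many small links
graph = {"user:42": ["user:7", "user:13"]}

# Search engine: secondary indexes pointing back at the primary key
by_city = {"Pune": ["user:42"]}

print(doc["user:42"]["name"], list(col)[0][2], graph["user:42"][0])
```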
 Distributed Computation …
Editor's Notes
  • #3: The computers operate concurrently – no shared resources such as CPU, GPU, or memory. The computers fail independently – computers might fail due to hardware failures, power outages, etc.
  • #4: Rough overview of all of these in the next session. Messaging is a key part of the observer/pub-sub design pattern. Paxos – a protocol to resolve conflicting values between multiple machines.
  • #5: Let's start with our own e-commerce site with a small customer base.
  • #10: By range: easy, but no even distribution – it depends on the data's characteristics. By hash: better distribution if the hash function is good, but re-hashing requires transferring most keys, which increases internal network traffic. By directory: create a virtual directory mapping, say, servers to objects, but management becomes challenging as data grows.
  • #12: Used for tabular or relational data. Less common, as relations grow in row count, not column count. Binary-object columns might be vertically partitioned.
  • #15: Example: Riak uses the 160-bit SHA-1 hash function, so the hash space can be as large as 2^160 − 1.
  • #19: Companies avoid two-phase commit for speed and throughput. XA transactions are an implementation of distributed transactions, with a coordinator node and participant nodes. Commit request (voting/prepare phase): send the query to all participants; each executes it but does not commit, then votes yes or no. Commit phase: if all vote yes, commit; otherwise abort. Example transaction: receive order, process payment, enqueue order, process order, deliver the order – parallel, separate systems. Failures: payment failure, out of stock, hardware failures, service failures. Global locks hold a critical resource, affecting the whole chain. Lightweight transactions – like update-if-exists, etc.