A Decentralized Structured Storage System
- Avinash Lakshman & Prashant Malik
- Presented by
Srinidhi Katla
CASSANDRA
Topics covered:
 What Is Cassandra
 Motive
 Data Model
 Architecture
 The After Story
 Applications
Features of Cassandra
 Distributed Storage system
 Manages very large amounts of Data
 Highly available
 No Single point of failure
 Simple data model
 Dynamic control over Data layout and format
 Designed to run on cheap commodity hardware
 Handles high write throughput without sacrificing
read efficiency
Motives behind Cassandra
 Storage needs of the Inbox Search problem
o High write throughput
o Growing number of users
o Search latencies that grow when users and data are geographically distributed
 Operational requirements:
o Scalability
o Tolerance of hardware failures
 Inbox Search was launched in 2008 for around 100 million users.
 Cassandra is deployed as the backend storage system for multiple services within Facebook.
Data Model
 Is based on Amazon’s Dynamo and Google’s Bigtable.
 Table: a distributed multi-dimensional map indexed by a key
 Consists of: row key, column, column family, super column family
 Row key: roughly equivalent to the primary key of an RDBMS table.
 Column: a (name, value, timestamp) triple (e.g., name “color”, value “red”).
 Column family: a set of columns grouped together
o Simple column family
o Super column family: a column family within a column family
Column Family
Image courtesy: http://www.ebaytechblog.com/author/jhpatel/#.VSPslfnF8SM
Column Family (Contd.)
 A column in a simple column family is accessed using the convention:
column_family:column
 A column in a super column family is accessed using:
column_family:supercolumn:column
Facebook super column abstraction
 Term search:
User ID = row key;
Search terms (words in the messages) = super columns;
Identifiers of the messages containing the word = columns
 Interactions:
User ID = row key;
Recipients’ IDs = super columns;
Individual message identifiers = columns
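The term-search layout above can be pictured as a map of maps. The following is a minimal sketch, not Facebook's code: the function names and the whitespace tokenization are assumptions made for illustration.

```python
from collections import defaultdict


def make_table():
    # table[row_key][super_column][column_name] = (value, timestamp)
    return defaultdict(lambda: defaultdict(dict))


def index_message(table, user_id, message_id, text, timestamp):
    """Term search layout: row key = user, super column = word, column = message id."""
    for word in set(text.lower().split()):
        table[user_id][word][message_id] = ("", timestamp)


def search(table, user_id, term):
    """Return the ids of the user's messages containing `term`, newest first."""
    columns = table.get(user_id, {}).get(term.lower(), {})
    return sorted(columns, key=lambda message_id: columns[message_id][1], reverse=True)
```

The column value is left empty here because the column name (the message identifier) already carries the information the search needs.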
API
 Cassandra exposes a Thrift-based query API:
insert(table, key, rowMutation)
get(table, key, columnName)
delete(table, key, columnName)
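To make the three calls concrete, here is a toy single-process stand-in. It is a sketch under assumptions: the class name `TinyStore`, the explicit `timestamp` argument, last-write-wins resolution, and the use of `None` as a delete marker are illustrative choices, not the real Thrift interface.

```python
import time


class TinyStore:
    """In-memory stand-in for insert/get/delete (hypothetical, single process)."""

    def __init__(self):
        self._tables = {}  # table -> key -> column_name -> (value, timestamp)

    def insert(self, table, key, row_mutation, timestamp=None):
        """Apply a row mutation: a dict mapping column names to new values."""
        ts = time.time() if timestamp is None else timestamp
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        for column_name, value in row_mutation.items():
            current = row.get(column_name)
            # Assumption: the write with the newest timestamp wins.
            if current is None or ts >= current[1]:
                row[column_name] = (value, ts)

    def get(self, table, key, column_name):
        cell = self._tables.get(table, {}).get(key, {}).get(column_name)
        if cell is None or cell[0] is None:
            return None
        return cell[0]

    def delete(self, table, key, column_name, timestamp=None):
        # A delete is recorded as a write of a marker value (None).
        self.insert(table, key, {column_name: None}, timestamp)
```

Column names follow the `column_family:column` convention from the earlier slide, for example `store.insert("users", "u1", {"profile:color": "red"})`.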
Architecture
 Partitioning
 Replication
 Membership and Failure Detection
 Bootstrapping
 Scaling the cluster
 Local Persistence
Partitioning
 Data is partitioned dynamically across the nodes so that the cluster can scale incrementally.
 Uses consistent hashing (CH) with an order-preserving hash function.
 Consistent hashing determines the coordinator node for each data key.
 Advantage of CH: the departure or arrival of a node affects only its immediate neighbours.
 Disadvantages of CH: random placement of nodes on the ring leads to non-uniform data and load distribution, and hashing is unaware of the heterogeneity in node performance.
 Solution adopted by Cassandra: lightly loaded nodes move on the ring to relieve heavily loaded nodes.
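The ring lookup described above can be sketched in a few lines. This is an illustration only: it hashes with MD5 rather than an order-preserving function, and the `Ring` class, the 32-bit ring size, and explicit `token` arguments are assumptions.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(name):
    """Map a string onto the ring (MD5-based; NOT order preserving)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    def __init__(self):
        self._tokens = []  # sorted node positions
        self._owner = {}   # token -> node name

    def add_node(self, node, token=None):
        token = ring_position(node) if token is None else token
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def coordinator(self, key):
        """Walk clockwise to the first node whose token is larger than the key's position."""
        position = ring_position(key)
        index = bisect.bisect_right(self._tokens, position) % len(self._tokens)
        return self._owner[self._tokens[index]]
```

Adding a node at some token takes over only the keys between its predecessor's token and its own; every other key keeps its coordinator, which is the "only neighbours are affected" property.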
Replication
 Required for high availability and
durability
 Replication factor “N”: each data item is stored on N nodes
 The coordinator node is responsible for replicating its
data at N-1 other nodes.
 Replication policies:
 Rack Unaware: data is replicated to the N-1 successors of the
coordinator on the ring
 Rack Aware and Data Center Aware: a leader is elected through
ZooKeeper and informs the nodes which ranges they are replicas for
• Metadata about the ranges a node is
responsible for is stored in ZooKeeper as well as
on the node itself.
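The Rack Unaware policy is simple enough to sketch directly. The function name `preference_list` and the list-of-pairs ring representation are assumptions for illustration.

```python
import bisect


def preference_list(ring, key_position, n):
    """Rack Unaware placement: the coordinator plus its N-1 successors.

    ring: list of (token, node) pairs sorted by token.
    """
    tokens = [token for token, _ in ring]
    start = bisect.bisect_right(tokens, key_position) % len(ring)
    count = min(n, len(ring))  # cannot place more replicas than there are nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

For a four-node ring with tokens 10, 20, 30, 40 and N = 3, a key at position 25 is coordinated by the node at token 30 and replicated to the next two nodes clockwise, wrapping around the end of the ring.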
Membership and Failure
Detection
 Cluster membership is based on Scuttlebutt,
an anti-entropy, gossip-based mechanism.
 Efficient CPU utilization
 Efficient utilization of the gossip channel
 Gossip is used for membership and to disseminate other system-related
control state
 Failure detection: determines whether a node is
up, so that the system avoids attempts to communicate
with unreachable nodes. Cassandra uses a modified version of the Φ
Accrual Failure Detector.
 The detector emits a suspicion level, Φ, for each monitored node
instead of a Boolean up/down value.
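A rough sketch of how a suspicion level can be computed from heartbeat arrival times follows. It assumes inter-arrival times are approximated by an exponential distribution; the class name, window size, and method names are illustrative and not taken from the Cassandra code base.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level from heartbeat gaps, assuming exponential inter-arrival times."""

    def __init__(self, window_size=1000):
        self._intervals = deque(maxlen=window_size)  # sliding window of gaps
        self._last_arrival = None

    def heartbeat(self, now):
        if self._last_arrival is not None:
            self._intervals.append(now - self._last_arrival)
        self._last_arrival = now

    def phi(self, now):
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        elapsed = now - self._last_arrival
        if mean <= 0:
            return math.inf if elapsed > 0 else 0.0
        # P(next heartbeat arrives later than `elapsed`) = exp(-elapsed / mean)
        # phi = -log10(P) = elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))
```

Under this model, Φ = 1 means there is roughly a 10% chance the node is still alive and merely late, Φ = 2 roughly 1%, and so on; the application picks the threshold at which it treats the node as down.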
Bootstrapping & Scaling
 The token assigned to a new node is gossiped to all
the nodes.
 A new node is assigned a token chosen so as to relieve a
heavily loaded node.
 On startup a node reads its configuration file, which lists a few
seed contact points; seeds can also come from ZooKeeper.
 Node outages are usually transient, so an outage should not trigger
rebalancing of partition assignments or repair of the unreachable
replicas.
 Changes in node membership are therefore made manually by an administrator.
 The heavily loaded node splits its range and hands part of its data and
responsibility to the new node.
 Operational experience shows that data can be
transferred at a rate of about 40 MB/sec from a single node.
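One way to picture token assignment for a joining node is to split the range of the most heavily loaded node. The sketch below assumes a 32-bit ring, a per-node load figure, and a midpoint split; none of these details come from the slides.

```python
RING_SIZE = 2 ** 32


def token_for_new_node(ring, load):
    """Pick a token that splits the range of the most heavily loaded node.

    ring: list of (token, node) pairs sorted by token.
    load: dict mapping node name to its current load (any comparable number).
    """
    heaviest = max(range(len(ring)), key=lambda i: load[ring[i][1]])
    token = ring[heaviest][0]
    previous = ring[heaviest - 1][0]       # index -1 wraps to the last node
    span = (token - previous) % RING_SIZE  # size of the heaviest node's range
    if span == 0:                          # single-node ring owns everything
        span = RING_SIZE
    # Assumption: keys are spread evenly, so the midpoint halves the load.
    return (previous + span // 2) % RING_SIZE
```

With the returned token, the new node becomes responsible for the lower half of the range and the heavily loaded node keeps the upper half.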
Local Persistence
 Relies on the local file system
 A dedicated disk on each machine holds the commit log, to
maximise disk throughput
 Write: data is first written to the commit log and then to an
in-memory data structure.
 Once the in-memory data structure exceeds a size threshold,
it is dumped to disk.
 - An index is written with each file for efficient lookup.
 Many such files accumulate on disk over time. A merge process
collates them into one file, similar to the
compaction process in Bigtable.
 An index is generated at every 256K chunk boundary for efficient
lookup of columns
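The write path can be sketched as follows. Everything here is held in Python lists and dicts as a stand-in for real files, and the names and the entry-count threshold are assumptions; the real system flushes based on data size and object count.

```python
class WritePath:
    """Toy write path: commit log first, then memtable, flush when full."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only; stands in for the dedicated log disk
        self.memtable = {}     # in-memory data structure
        self.data_files = []   # oldest first; each file is an immutable sorted tuple
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory structure
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Dump the memtable to a new immutable file, sorted by key."""
        self.data_files.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # logged entries are now persisted in a data file

    def compact(self):
        """Merge all files into one; values from newer files win."""
        if not self.data_files:
            return
        merged = {}
        for data_file in self.data_files:  # oldest first, so later updates overwrite
            merged.update(data_file)
        self.data_files = [tuple(sorted(merged.items()))]
```

Because flushed files are never modified, both the flush and the merge are purely sequential writes.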
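Local Persistence (Contd.)
 Read: query the in-memory data structure first, then look up
the files on disk.
 Files are examined from newest to oldest.
 A Bloom filter per file checks whether the key may exist in that file,
so files that cannot contain the key are skipped.
 Column indices allow jumping to the right chunk on disk for a column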
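The read path with a per-file Bloom filter might look like the sketch below. The filter size, the SHA-256-based hash positions, and the `DataFile` class are illustrative assumptions, and a dict stands in for the on-disk file.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        return all((self.bits >> position) & 1 for position in self._positions(key))


class DataFile:
    """Immutable on-disk file, modelled as a dict plus its Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts lookups that actually touch the file

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the disk access
        self.disk_reads += 1
        return self.rows.get(key)


def read(key, memtable, data_files):
    """Check memory first, then files from newest to oldest (list is oldest first)."""
    if key in memtable:
        return memtable[key]
    for data_file in reversed(data_files):
        value = data_file.get(key)
        if value is not None:
            return value
    return None
```

Searching newest to oldest means the first hit is the most recent version, so older files are never touched once a value is found.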
Reads and Writes
 A request for a key can be routed to any node in the
cluster.
 That node determines the replicas for the key and routes the request to them.
 The request fails if the replies are not received within a
configured timeout.
 For writes: the request is routed to the replicas, and the node waits
for a quorum of replicas to acknowledge
completion of the write.
 For reads: depending on the consistency guarantee
required by the client, the request is routed either to the
closest replica, or to all replicas,
waiting for a quorum of responses.
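The quorum rule can be sketched with replicas modelled as plain dicts, where `None` marks a replica that is down. Treating a majority as the quorum and returning the value with the newest timestamp are assumptions for illustration.

```python
def quorum_size(replication_factor):
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def quorum_write(replicas, key, value, timestamp):
    """Send the write to every reachable replica; succeed on a quorum of acks."""
    acks = 0
    for replica in replicas:
        if replica is None:  # unreachable replica: no acknowledgement
            continue
        replica[key] = (value, timestamp)
        acks += 1
    return acks >= quorum_size(len(replicas))


def quorum_read(replicas, key):
    """Ask all replicas; fail without a quorum, otherwise return the newest value."""
    responses = [replica.get(key) for replica in replicas if replica is not None]
    if len(responses) < quorum_size(len(replicas)):
        raise TimeoutError("quorum of replicas did not respond")
    found = [cell for cell in responses if cell is not None]
    if not found:
        return None
    return max(found, key=lambda cell: cell[1])[0]
```

With N = 3, both operations tolerate one unreachable replica, and any read quorum overlaps any write quorum in at least one replica.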
Implementation
 The Cassandra process on each machine consists of a partitioning
module, a cluster membership and failure detection module, and a
storage engine
 Implemented from the ground up in Java
 Rolling commit log: a new log is rolled out once the current one
exceeds 128 MB, and old log entries are purged after their data is persisted.
 One in-memory data structure and set of data files per column
family
 All writes to disk are sequential, to maximise
throughput
 No locks are needed, since files dumped to disk are
never mutated.
The After Story
 Cassandra was released as an open source project on Google
Code in July 2008 and is now developed under the Apache Software
Foundation as Apache Cassandra
(referred to simply as Cassandra in these slides).
 Apache Cassandra has moved away from super columns
because of performance issues; composite columns
were introduced instead.
 The Cassandra Query Language (CQL) presents a data model
familiar to relational database users.
 Partitioning is still based on consistent
hashing, but the load-balancing approach of moving nodes on the ring
has been replaced by virtual nodes.
 The order-preserving hash function was dropped in favor of
a true OrderPreservingPartitioner (later superseded by
ByteOrderedPartitioner).
The After Story (Contd.)
 In modern Cassandra terminology, the
coordinator is the node that processes a given
client’s request and routes it to the appropriate
replicas; it is not necessarily itself a replica.
 ZooKeeper usage was restricted to Facebook’s
in-house Cassandra branch.
 Modern Cassandra management tools
include DataStax’s OpsCenter and Netflix’s
Priam.
Big Players
 Facebook’s Inbox Search feature was implemented
on Cassandra, with each user ID as a row key and
the recipients and messages stored as super columns and
columns. At the time the paper was written, the system stored more than
50 TB of data on a 150-node cluster with a
median search latency of approximately 15 ms.
 Netflix, a video streaming company, stores 95% of its
data in Cassandra.
 eBay uses Cassandra for
features such as the “own”, “want”, and “like” counts on
its web pages.
 Coursera, an online learning service, uses
Cassandra for its mobile
applications.
References:
 http://www.ebaytechblog.com/author/jhpatel/#.VSPslfnF8SM
 http://www.divconq.com/2010/cassandra-columns-and-supercolumns-and-rows/
 http://docs.datastax.com/en/articles/cassandra/cassandrathenandnow.html
QUESTIONS?


Editor's Notes

  • #8: Row keys and super column keys do not have values of their own. Column keys and super column keys are indexed and sorted by a specific type. Super column keys in different rows do not have to match, and often will not.
  • #10: With Cassandra you need to think ahead of time about which queries you want to support efficiently, and model accordingly. Since there are no automatically provided indexes, you will end up much closer to one ColumnFamily per query than you would with tables and queries in a relational design. Don't be afraid to denormalize accordingly; Cassandra is much, much faster at writes than relational systems.