Apache Cassandra, part 2 – data model example, machinery
V. Data model example - Twissandra
Twissandra Use Cases
  • Get the friends of a username
  • Get the followers of a username
  • Get a timeline of a specific user’s tweets
  • Create a tweet
  • Create a user
  • Add friends to a user
Twissandra – DB User
  • User: id, user_name, password
Twissandra – DB Followers
  • User: id, user_name, password
  • Followers: user_id, follower_id (links User to User)
Twissandra – DB Following
  • User: id, user_name, password
  • Following: user_id, following_id (links User to User)
Twissandra – DB Tweets
  • User: id, user_name, password
  • Tweet: id, user_id, body, timestamp
Twissandra column families
  • User
  • Username
  • Friends, Followers
  • Tweet
  • Userline
  • Timeline
Twissandra – Users CF
  • <<CF>> User: <<RowKey>> userid; columns: username, password
  • <<CF>> Username: <<RowKey>> username; columns: userid
Twissandra – Friends and Followers CFs
  • <<CF>> Friends: <<RowKey>> userid; column name: friendid, column value: timestamp
  • <<CF>> Followers: <<RowKey>> userid; column name: followerid, column value: timestamp
Twissandra – Tweet CF
  • <<CF>> Tweet: <<RowKey>> tweetid; columns: userid, body, timestamp
Twissandra – Userline and Timeline CFs
  • <<CF>> Userline: <<RowKey>> userid; column name: timestamp, column value: tweetid
  • <<CF>> Timeline: <<RowKey>> userid; column name: timestamp, column value: tweetid
Cassandra QL – User creation
    BEGIN BATCH
      INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
      INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
    APPLY BATCH
Cassandra QL – Following a friend
    BEGIN BATCH
      INSERT INTO Friends (KEY, friendid) VALUES ('userid', 'friendid')
      INSERT INTO Followers (KEY, userid) VALUES ('friendid', 'userid')
    APPLY BATCH
Cassandra QL – Tweet creation
    BEGIN BATCH
      INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
      INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
      …
      INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
      …
    APPLY BATCH
Cassandra QL – Getting user tweets
    SELECT * FROM Userline WHERE KEY = 'userid'
    SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
Cassandra QL – Getting user timeline
    SELECT * FROM Timeline WHERE KEY = 'userid'
    SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …, 'tweetidn')
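The Twissandra statements can be mimicked with plain dictionaries to show how the column families work together. This is only an in-memory sketch: each column family is assumed to be a dict of row key to columns, and the function names (create_user, follow, create_tweet, get_timeline) are hypothetical helpers, not part of Twissandra or any Cassandra client.

```python
from collections import defaultdict

# Each "column family" is modelled as: row key -> {column name: value}.
user = {}
username = {}
friends = defaultdict(dict)
followers = defaultdict(dict)
tweet = {}
userline = defaultdict(dict)
timeline = defaultdict(dict)


def create_user(user_id, name, password):
    # Two writes, like the batch on the "User creation" slide.
    user[user_id] = {"username": name, "password": password}
    username[name] = {"userid": user_id}


def follow(user_id, friend_id, ts):
    # Both directions are stored so each lookup is a single row read.
    friends[user_id][friend_id] = ts
    followers[friend_id][user_id] = ts


def create_tweet(tweet_id, user_id, body, ts):
    tweet[tweet_id] = {"userid": user_id, "body": body, "timestamp": ts}
    userline[user_id][ts] = tweet_id
    timeline[user_id][ts] = tweet_id
    # Fan out: one Timeline column per follower.
    for follower_id in followers[user_id]:
        timeline[follower_id][ts] = tweet_id


def get_timeline(user_id, limit=20):
    # First read the row of tweet ids, then fetch the tweets by key.
    row = timeline[user_id]
    newest_first = sorted(row, reverse=True)[:limit]
    return [tweet[row[ts]]["body"] for ts in newest_first]
```

The write path does more work (one Timeline column per follower) so that reading a timeline needs only one row plus a multi-key fetch. Two tweets with the same timestamp would overwrite each other in this sketch.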
Design patterns
  • Materialized View: create a second column family to represent additional queries
  • Valueless Column: use column names for values
  • Aggregate Key: if you need to find a sub-item, use a composite key
Indexes
  • <<CF>> Item_Properties: <<RowKey>> item_id; column name: property_name, column value: property_value
  • <<CF>> Container_Items: <<RowKey>> container_id; column name: item_id, column value: insertion_timestamp
Indexes
  • <<CF>> Container_Items_Property_Index: <<RowKey>> container_id + property_name
  • Column name: composite(property_value, item_id, entry_timestamp), column value: item_id
  • Comparator: compositecomparer.CompositeType
Problem with eventual consistency
  • When we update a value, we should add the new value to the index and remove the old one.
  • However, eventual consistency and the lack of transactions make it impossible to do both reliably in one step.
Solution
  • <<CF>> Container_Item_Property_Index_Entries: <<RowKey>> container_id + item_id + property_name
  • Column name: entry_timestamp, column value: property_value
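The slide does not spell out how the entries column family is used, so the following is one possible reading: every value ever written to the index is also recorded under the item, and an update looks up those recorded values to know which index columns to remove. The dict layout and the helper names set_property and find_items are assumptions made for illustration.

```python
# Container_Items_Property_Index:
#   (container_id, property_name) -> {(value, item_id, ts): item_id}
index = {}
# Container_Item_Property_Index_Entries:
#   (container_id, item_id, property_name) -> {ts: value}
index_entries = {}


def set_property(container_id, item_id, name, value, ts):
    entries = index_entries.setdefault((container_id, item_id, name), {})
    row = index.setdefault((container_id, name), {})
    # Remove the index columns for every previously recorded value.
    for old_ts, old_value in list(entries.items()):
        row.pop((old_value, item_id, old_ts), None)
        del entries[old_ts]
    # Add the new value to the index and remember that we did so.
    row[(value, item_id, ts)] = item_id
    entries[ts] = value


def find_items(container_id, name, value):
    row = index.get((container_id, name), {})
    return sorted({col[1] for col in row if col[0] == value})
```

Because the timestamp is part of the composite column name, two concurrent updates create two distinct index columns instead of overwriting each other, and a later update can still find and remove both.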
VI. Architecture
Partitioners
  • Partitioners decide where a key maps onto the ring.
  • [Diagram: Key 1, Key 2, Key 3 and Key 4 placed on the ring]
Partitioners
  • RandomPartitioner
  • OrderPreservingPartitioner
  • ByteOrderedPartitioner
  • CollatingOrderPreservingPartitioner
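A rough sketch of how a hash-based partitioner maps a key onto the ring. It assumes the ring is a sorted list of (token, node) pairs and that a node owns every token up to and including its own; md5_token and owner are hypothetical names, and the token range is simplified compared with the real RandomPartitioner.

```python
import bisect
import hashlib


def md5_token(key):
    # Hash the key to a large integer, in the spirit of RandomPartitioner.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner(ring, token):
    # ring: list of (token, node) sorted by token.
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    i = bisect.bisect_left(tokens, token) % len(ring)
    return ring[i][1]
```

An order-preserving partitioner would use the key itself as the token, which keeps range scans possible but can leave the ring unevenly loaded.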
Replication
  • Replication is controlled by the replication_factor setting in the keyspace definition.
  • The actual placement of replicas in the cluster is determined by the replica placement strategy.
Placement Strategies
  • SimpleStrategy: returns the nodes that are next to each other on the ring.
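SimpleStrategy can be pictured as walking the ring clockwise from the node that owns the key. The sketch below assumes a ring given as a sorted list of (token, node) pairs; simple_strategy is a hypothetical helper, not Cassandra's implementation.

```python
import bisect


def simple_strategy(ring, token, replication_factor):
    # ring: list of (token, node) sorted by token.
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    # Never return more replicas than there are nodes.
    count = min(replication_factor, len(ring))
    # The owner plus the following nodes, wrapping around the ring.
    return [ring[(start + i) % len(ring)][1] for i in range(count)]
```

This placement ignores racks and data centers entirely, which is why the topology-aware strategies exist.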
Placement Strategies
  • OldNetworkTopologyStrategy: places one replica in a different data center while placing the others on different racks in the current data center.
Placement Strategies
  • NetworkTopologyStrategy: allows you to configure the number of replicas per data center, as specified in strategy_options.
Snitches
  • Give Cassandra information about the network topology of the cluster
  • Endpoint snitch: gives information about the network topology
  • Dynamic snitch: monitors read latencies
Endpoint Snitch Implementations
  • SimpleSnitch (default): can be efficient for locating nodes in clusters limited to a single data center.
Endpoint Snitch Implementations
  • RackInferringSnitch: extrapolates the topology of the network by analyzing IP addresses.
  • In the same rack: 192.168.191.71 and 192.168.191.21
  • In the same data center: 192.168.191.71 and 192.168.171.21
  • In different data centers: 192.78.19.71 and 192.18.11.21
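The three address pairs on the RackInferringSnitch slide follow from a simple rule: the second octet of the IP address is taken as the data center and the third octet as the rack. A small sketch of that rule, with hypothetical function names:

```python
def infer_location(ip):
    octets = ip.split(".")
    # Second octet -> data center, third octet -> rack.
    return octets[1], octets[2]


def proximity(ip_a, ip_b):
    dc_a, rack_a = infer_location(ip_a)
    dc_b, rack_b = infer_location(ip_b)
    if dc_a != dc_b:
        return "different data centers"
    if rack_a != rack_b:
        return "same data center"
    return "same rack"
```

The rule only gives sensible answers when the addressing scheme of the network really encodes data center and rack in those octets.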
Endpoint Snitch Implementations
  • PropertyFileSnitch: determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
Memtables, SSTables, Commit Logs
  • Commit Log: durability, sequential writes only
  • Memtable: no disk access, batched writes
  • SSTable: becomes read-only, indexes
Write/Read properties
  • Write properties: no reads, no seeks, fast, atomic within ColumnFamily, always writable
Write/Read properties
  • Read properties: read multiple SSTables, slower than writes (but still fast), seeks can be mitigated with more RAM, scales to billions of rows
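The write and read properties fall out of the storage layout: a write is an append plus an in-memory update, while a read may have to consult several SSTables. The class below is a toy model of that layout, not Cassandra code; TinyStore and memtable_limit are made-up names, and real Cassandra reconciles versions by column timestamp rather than by SSTable order.

```python
class TinyStore:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, sequential writes
        self.memtable = {}     # in memory, no disk access
        self.sstables = []     # immutable once flushed, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write never reads or seeks: append to the log, update memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes a sorted, read-only SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        # A read may touch the memtable and then several SSTables.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None
```

The more SSTables accumulate, the more places a read has to look, which is the cost that compaction and Bloom filters reduce.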
Commit Log durability
  • Durability settings resemble those of PostgreSQL.
  • Periodic sync of the commit log: carries a risk of data loss.
  • Batch sync of the commit log: a write is acknowledged only once the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this case.
Gossip protocol
  • Intra-ring communication
  • Runs periodically
  • Failure detection, hinted handoffs and node exchange
Gossip protocol
  • Implemented in org.apache.cassandra.gms.Gossiper
  • Has the list of nodes that are alive and dead
  • Chooses a random node and starts a “chat” with it. One gossip round requires three messages
  • Failure detection uses a suspicion level to decide whether a node is alive or dead
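A suspicion level can be computed from how late a heartbeat is compared with the usual gap between heartbeats. The sketch below assumes heartbeat intervals follow an exponential distribution, which gives phi = elapsed / (mean * ln 10). The class name and the threshold value are illustrative assumptions, not a copy of Cassandra's failure detector.

```python
import math


class SuspicionDetector:
    def __init__(self, threshold=8.0):
        self.threshold = threshold      # assumed value, tune per cluster
        self.intervals = []
        self.last_heartbeat = None

    def heartbeat(self, now):
        # Record the gap since the previous heartbeat.
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        # phi = -log10(P(silence lasts longer than elapsed)), with
        # P = exp(-elapsed / mean) under the exponential assumption.
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_alive(self, now):
        return self.phi(now) < self.threshold
```

Unlike a fixed timeout, the suspicion level grows continuously with silence and adapts to the heartbeat rate actually observed for each node.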
Hinted handoff
  • [Diagram: a write and a hint]
  • Cassandra is always available for writes
Consistency level
Tombstones
  • Data is not deleted immediately
  • Deleted values are marked with a tombstone
  • Tombstones are removed by a compaction that runs after GCGraceSeconds has elapsed
  • GCGraceSeconds: the number of seconds the server waits before garbage-collecting a tombstone
Compaction
  • Merging SSTables into one: merging keys, combining columns, creating a new index
  • Main aims: free up space, reduce the number of required seeks
Compaction
  • Minor: triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default); merges SSTables of similar size
  • Major: merges all SSTables; done manually through nodetool compact; discards tombstones
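Compaction and tombstones can be illustrated together. In this sketch each SSTable is assumed to be a dict of key to (value, timestamp), a deletion is a special TOMBSTONE marker, and compact is a hypothetical function that keeps the newest version of every key and drops tombstones older than the grace period.

```python
TOMBSTONE = object()  # marker written instead of deleting data in place


def compact(sstables, now, gc_grace_seconds):
    merged = {}
    # Merge keys: the version with the highest timestamp wins.
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    # Discard tombstones only after the grace period has passed.
    return {
        key: (value, ts)
        for key, (value, ts) in sorted(merged.items())
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }
```

Keeping a tombstone for the grace period gives replicas that missed the delete time to learn about it; dropping it too early could let the deleted value reappear.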
Replica synchronization
  • Anti-entropy
  • Read repair
Anti-entropy
  • During major compaction, the node exchanges Merkle trees (hashes of its data) with other nodes
  • If the trees don’t match, the data is repaired
  • Nodes maintain a timestamp index and exchange only the most recent updates
Read repair
  • During a read operation, replicas with stale values are brought up to date
  • Weak consistency level (ONE): after the data is returned
  • Strong consistency level (QUORUM, ALL): before the data is returned
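Read repair in miniature: the coordinator compares the versions returned by the replicas, answers with the newest one and writes it back to any replica that is behind. Replicas are assumed to be dicts of key to (value, timestamp), and read_with_repair is a hypothetical name. The sketch repairs synchronously, which corresponds to the strong consistency case.

```python
def read_with_repair(replicas, key):
    # Collect each replica's version of the key (None if it has none).
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    # The version with the highest timestamp wins.
    newest = max(present, key=lambda cell: cell[1])
    # Bring stale or missing replicas up to date.
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest
    return newest[0]
```

At consistency level ONE the same write-back would happen in the background after the first replica's answer has already been returned to the client.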
Bloom filters
  • A bit array
  • Tests whether a value is a member of a set
  • Reduces disk access (improves performance)
Bloom filters
  • On write:
      • several hashes are generated per key
      • the bit for each hash is set
  • On read:
      • hashes are generated for the key
      • if all the bits for these hashes are set, the key may exist in the SSTable
      • if at least one bit is not set, the key has never been written to the SSTable
Bloom filters
  • [Diagram: on write and on read, Key1 and Key2 are each passed through Hash1, Hash2 and Hash3 to select bits in the SSTable’s bit array]
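A minimal Bloom filter matching the write and read steps described on the Bloom filter slides. The sizes and the use of seeded SHA-256 as the hash family are arbitrary choices for the sketch; Cassandra's own filter uses different hashing and sizing.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several hash values per key by seeding one hash function.
        for seed in range(self.hash_count):
            data = f"{seed}:{key}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        # On write: set the bit for each hash.
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # On read: any unset bit proves the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))
```

A False answer is definite, so the SSTable can be skipped without touching the disk; a True answer can be a false positive, so the SSTable still has to be checked.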


Editor's Notes

  • #31: Endpoint snitch can be wrapped with a dynamic snitch, which will monitor read latencies and avoid reading from hosts that have slowed (due to compaction, for instance)