Apache Cassandra: An Introduction for Java Developers. Nate McCall (@zznate)
What is Apache Cassandra?
CAP Theorem
Consistency, Availability, Partition Tolerance: “Thou shalt have but 2”
- Conjecture made by Eric Brewer in 2000
- Published as formal proof in 2002
- See: http://en.wikipedia.org/wiki/CAP_theorem for more
Apache Cassandra Concepts
- Explicit choice of partition tolerance and availability. Consistency is tunable.
- No read before write
- Merge on read
- Idempotent
- Schema Optional
- All nodes share the same role
- Still performs well with larger-than-memory data sets
Generally complements other systems (not intended to be one-size-fits-all) *** You should always use the right tool for the right job anyway
How does this differ from an RDBMS? Substantially.
vs. RDBMS - No Joins  Unless:  - you do them on the client  - you do them via Map/Reduce
vs. RDBMS - Schema Optional  (Though you can add meta information for validation and type checking)  *** Supports secondary indexes too: “ …  WHERE state = 'TX' ”
vs. RDBMS - Prematerialized and Transaction-less - No ACID transactions  - Limited support for ad-hoc queries *** You are going to give up both of these anyway when you shard an RDBMS ***
vs. RDBMS - Facilitates Consolidation
It can be your caching layer
* Off-heap cache (provided you install JNA)
It can be your analytics infrastructure
* true map/reduce
* pig driver
* hive driver coming soon
vs. RDBMS - Shared-Nothing Architecture Every node plays the same role: no masters, no slaves, no special nodes *** No single point of failure
vs. RDBMS - Real Linear Scalability Want 2x performance? Add 2x nodes. *** 'No downtime' included!
vs. RDBMS - Performance Reads on par with writes
Clustering
Clustering Single node cluster (easy development setup) - one node owns the whole hash range
Clustering Two node cluster - Key range divided between nodes
Clustering Consistent Hashing: md5(“zznate”) = “C”
Clustering Consistent Hashing FTW: - Ring ownership continuously “gossiped” between nodes - Any node can act as a “coordinator” to service client requests for any key * requests forwarded to the appropriate nodes by coordinator transparently to the client
Clustering Client Read:  get(“zznate”) md5 = “C”
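The key-to-node lookup described on the consistent hashing slides can be sketched as a sorted map of tokens. This is a simplified illustration, not Cassandra's actual partitioner: the class name TokenRing, the evenly spaced tokens, and the node names A to D are assumptions made for the example, and replication is left out entirely.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring (hypothetical): each node owns the hash range ending at its token. */
final class TokenRing {
    // token -> node name, kept sorted so the ring can be walked clockwise
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    /** Hash a key to a non-negative 128-bit token using MD5. */
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    void addNode(String name, BigInteger nodeToken) {
        ring.put(nodeToken, name);
    }

    /** First node whose token is >= the key's token, wrapping around to the lowest token. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        BigInteger space = BigInteger.ONE.shiftLeft(128); // size of the MD5 token space
        String[] names = {"A", "B", "C", "D"};            // illustrative node names
        for (int i = 0; i < names.length; i++) {
            // Evenly spaced tokens: 1/4, 2/4, 3/4 and 4/4 of the token space (minus one).
            BigInteger t = space.multiply(BigInteger.valueOf(i + 1))
                                .divide(BigInteger.valueOf(names.length))
                                .subtract(BigInteger.ONE);
            ring.addNode(names[i], t);
        }
        System.out.println("zznate is owned by node " + ring.ownerOf("zznate"));
    }
}
```

Because every node can compute the same lookup from the gossiped ring state, any node can act as coordinator and forward a request to the owner.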
Clustering – Scale Out
Clustering - Multi-DC
Clustering - Reliability
Clustering - Multi-Datacenter
Clustering – Multi-DC Reliability
Storage (Briefly)
Storage (Briefly)  Understanding the on-disk format is extremely helpful in designing your data model correctly
Storage - SSTable - SSTables are immutable (“Merge on read”) - Newest timestamp wins
Storage – Compaction
- Merges SSTables, keeping their count down so that merge on read stays efficient
- Discards tombstones (more on this later!)
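A minimal sketch of the "merge on read, newest timestamp wins" rule, assuming each SSTable can be modeled as an in-memory map from column name to a timestamped cell. The names MergeOnRead and Cell are hypothetical; the real on-disk format and read path are considerably more involved.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MergeOnRead {
    /** A column value plus the write timestamp used to resolve conflicts. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return value + "@" + timestamp;
        }
    }

    /** Merge one row's columns from several immutable SSTables; the newest timestamp wins. */
    static TreeMap<String, Cell> merge(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>(); // columns stay ordered by name
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (kept, candidate) -> candidate.timestamp > kept.timestamp ? candidate : kept);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two "SSTables" holding different versions of the same row (made-up values).
        Map<String, Cell> older = Map.of(
                "email", new Cell("nate@old.example", 1L),
                "name", new Cell("Nate", 1L));
        Map<String, Cell> newer = Map.of(
                "email", new Cell("nate@new.example", 2L));
        System.out.println(merge(List.of(older, newer)));
    }
}
```

The result does not depend on the order in which the SSTables are visited, which is why fewer SSTables (after compaction) means less work per read without changing the answer.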
Data Model
Data Model "...sparse, persistent, distributed, multi-dimensional sorted map." (The “Bigtable” paper)
Data Model Keyspace - Collection of Column Families
- Controls replication
Column Family
- Similar to a table
- Columns ordered by name
Data Model – Column Family Static Column Family - Model my object data
Dynamic Column Family
- Pre-calculated query results
Nothing stopping you from mixing them!
Data Model – Static CF
Users
zznate: name: Nate, password: *
driftx: name: Brandon, password: *
thobbs: name: Tyler, password: *
jbellis: name: Jonathan, password: *, site: datastax.com
Data Model – Prematerialized Query Following zznate driftx thobbs jbellis driftx: thobbs: driftx: thobbs: mdennis: zznate zznate: pcmanus xedin:
Data Model – Prematerialized Query
Additional examples:
- Timeline of tweets by a user
- Timeline of tweets by all of the people a user is following
- List of comments sorted by score
- List of friends grouped by state
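The static and dynamic column family styles can both be pictured as a map of sorted maps, in the spirit of the Bigtable quote above. The sketch below is an in-memory illustration only: ColumnFamilySketch is a hypothetical name, and the follow relationships in main are made-up sample data rather than a reconstruction of the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy column family (hypothetical): row key -> (column name -> value), columns sorted by name. */
final class ColumnFamilySketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void insert(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Reading a row is a single lookup; the "query result" was built at write time. */
    SortedMap<String, String> row(String rowKey) {
        SortedMap<String, String> columns = rows.get(rowKey);
        if (columns == null) {
            return new TreeMap<>();
        }
        return columns;
    }

    public static void main(String[] args) {
        // Static column family: a small, fixed set of column names per row.
        ColumnFamilySketch users = new ColumnFamilySketch();
        users.insert("zznate", "name", "Nate");
        users.insert("zznate", "password", "*");

        // Dynamic column family: the column names carry the data, the values are empty.
        ColumnFamilySketch following = new ColumnFamilySketch();
        following.insert("zznate", "thobbs", "");
        following.insert("zznate", "driftx", "");

        System.out.println(users.row("zznate"));
        System.out.println(following.row("zznate").keySet());
    }
}
```

Since columns are kept ordered by name, choosing the column name (a username, a timestamp, a score) is how a prematerialized result gets its sort order.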
API Operations
General categories:
- Retrieving
- Writing/Updating/Removing (all the same op!)
- Increment counters
- Meta Information
- Schema Manipulation
- CQL Execution
Using a Client
Hector Client: http://hector-client.org
- Most popular Java client
- In use at very large installations
- A number of tools and utilities built on top
- Very active community
- MIT Licensed
*** Like any open source project fully dependent on another open source project, it has its warts
Sample Project for Experimenting
https://github.com/zznate/cassandra-tutorial
https://github.com/zznate/hector-examples
- Built using Hector
- Really basic: designed to be beginner level w/ very few moving parts
- Modify/abuse/alter as needed
*** Descriptions of what is going on and how to run each example are in the Javadoc comments.
ColumnFamilyTemplate
Familiar, type-safe approach
- based on template-method design pattern
- generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type)
    ColumnFamilyTemplate template =
        new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
                                       StringSerializer.get(), StringSerializer.get());
*** (generics omitted for clarity)
ColumnFamilyTemplate
    new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName,
                                   StringSerializer.get(),   // key format
                                   StringSerializer.get());  // column name format
Column name format:
- Cassandra calls this a “comparator”
- Remember: defines column order in on-disk format
ColumnFamilyTemplate
    // type parameters: key format, column name format
    ColumnFamilyResult<String, String> res = cft.queryColumns("zznate");
    String value = res.getString("email");
    Date startDate = res.getDate("startDate");
ColumnFamilyTemplate
Querying multiple rows and iterating over results
    ColumnFamilyResult wrapper = template.queryColumns("zznate", "patricioe", "thobbs");
    String nateEmail = wrapper.getString("email");
    wrapper.next();
    String patoEmail = wrapper.getString("email");
    wrapper.next();
    String tylerEmail = wrapper.getString("email");
ColumnFamilyTemplate
Inserting data with ColumnFamilyUpdater
    ColumnFamilyUpdater updater = template.createUpdater("zznate");
    updater.setString("companyName", "DataStax");
    updater.addKey("sergek");
    updater.setString("companyName", "PrestoSports");
    template.update(updater);
ColumnFamilyTemplate
Deleting data with ColumnFamilyTemplate
    template.deleteColumn("zznate", "notNeededStuff");
    template.deleteColumn("zznate", "somethingElse");
    template.deleteColumn("patricioe", "aDifferentColumnName");
    ...
    template.deleteRow("someuser");
    template.executeBatch();
Deletion
Deletion Again: Every mutation is an insert!
- Merge on read
- SSTables are immutable
- Highest timestamp wins
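One way to picture "every mutation is an insert" is to treat a delete as writing a timestamped marker (a tombstone) that competes with ordinary values under the same highest-timestamp-wins rule. The sketch below assumes a single in-memory row; TombstoneSketch and its methods are hypothetical names, and real tombstones also involve compaction and grace periods that are not modeled here.

```java
import java.util.TreeMap;

/** Toy row (hypothetical) in which a delete is just another timestamped write. */
final class TombstoneSketch {
    static final class Cell {
        final String value;   // null marks a tombstone
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    private final TreeMap<String, Cell> columns = new TreeMap<>();

    /** Shared write path: keep whichever cell carries the higher timestamp. */
    private void write(String name, Cell cell) {
        columns.merge(name, cell,
                (kept, candidate) -> candidate.timestamp > kept.timestamp ? candidate : kept);
    }

    void insert(String name, String value, long timestamp) {
        if (value == null) {
            throw new IllegalArgumentException("use delete() to remove a column");
        }
        write(name, new Cell(value, timestamp));
    }

    void delete(String name, long timestamp) {
        write(name, new Cell(null, timestamp));
    }

    /** A column is visible only if its winning cell is not a tombstone. */
    String read(String name) {
        Cell cell = columns.get(name);
        return (cell == null || cell.isTombstone()) ? null : cell.value;
    }

    public static void main(String[] args) {
        TombstoneSketch row = new TombstoneSketch();
        row.insert("650", "37.57x122.34", 1L);
        row.delete("650", 2L);
        row.insert("650", "late write with an old timestamp", 1L);
        System.out.println(row.read("650"));
    }
}
```

The tombstone has to be kept until older copies of the column can no longer resurface, which is why removing it is left to compaction.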
Deletion – As Seen by CLI
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
=> (column=650, value=33372e3537783132322e3334, timestamp=1310340410528000)
-------------------
RowKey: TX Austin

Introduction to Apache Cassandra for Java Developers (JavaOne)