Cassandra
Introduction & Key Features
Meetup Vienna Cassandra Users
13th of January 2014
philipp.potisk@geroba.com
Definition
Apache Cassandra is an open source, distributed,
decentralized, elastically scalable, highly available,
fault-tolerant, tuneably consistent, column-oriented
database that bases its distribution design on Amazon’s
Dynamo and its data model on Google’s Bigtable.
Created at Facebook, it is now used at some of the most
popular sites on the Web [The Definitive Guide, Eben
Hewitt, 2010]
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

2
History
Dynamo, 2007

Bigtable, 2006

OpenSource, 2008

Key Features

• Distributed and Decentralized
• Elastic Scalability
• High Availability and Fault Tolerance
• Tunable Consistency
• Column-oriented Key-Value store
• CQL – an SQL-like query interface
• High Performance
Distributed and Decentralized
• Distributed: capable of running on multiple machines
• Decentralized: no single point of failure
• No master-slave issues, thanks to the peer-to-peer architecture (nodes communicate via the "gossip" protocol)
• A single Cassandra cluster may run across geographically dispersed data centers
• Read and write requests can go to any node
[Figure: nodes 1 to 12 spread over Datacenter 1 and Datacenter 2]
Elastic Scalability

• Cassandra scales horizontally: capacity grows by adding more machines, each holding all or some of the data
• Adding nodes increases throughput linearly
• Increasing or decreasing the node count happens seamlessly
• Scales linearly to terabytes and petabytes of data
[Figure: a 4-node ring with performance throughput = N next to an 8-node ring with performance throughput = N x 2]
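The slides do not say how nodes can join without reshuffling all data. A common explanation is consistent hashing on a token ring, which is the idea Cassandra inherits from Dynamo. The following is only a minimal sketch of that idea: the class `Ring`, the node names, the use of MD5 as the hash and the 64 tokens per node are all assumptions made for illustration, not Cassandra's actual partitioner code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is only a stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring; each node owns several tokens."""

    def __init__(self, tokens_per_node: int = 64):
        self.tokens_per_node = tokens_per_node
        self._tokens = []   # sorted ring positions
        self._owner = {}    # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.tokens_per_node):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def node_for(self, key: str) -> str:
        # First token clockwise from the key's position, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


def moved_keys(ring: Ring, keys, new_node: str):
    """Return the keys whose owner changes when new_node joins."""
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node(new_node)
    return [k for k in keys if ring.node_for(k) != before[k]]


# Hypothetical cluster: four nodes, then a fifth one joins.
ring = Ring()
for name in ("node1", "node2", "node3", "node4"):
    ring.add_node(name)
keys = [f"user{i}" for i in range(1000)]
moved = moved_keys(ring, keys, "node5")
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

In this sketch only the keys that now belong to the new node change owner; every other key stays where it was, which is what makes adding or removing nodes cheap.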
Scaling Benchmark By Netflix*
• Tested on 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively
• Each client generated ~20,000 writes/s of 400 bytes each

“Cassandra scales linearly far beyond our current capacity requirements, and very rapid deployment automation makes it easy to manage. In particular, benchmarking in the cloud is fast, cheap and scalable, …”

* http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
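To see why this setup tests linear scaling, it helps to work out the offered load per instance. The sketch below takes the slide's approximate figure of 20,000 writes/s per client at face value; it computes the load sent to the cluster, not the throughput Netflix measured.

```python
WRITES_PER_CLIENT = 20_000  # approximate per-client rate quoted on the slide

# (instances, clients) pairs from the slide
runs = [(48, 10), (96, 20), (144, 30), (288, 60)]

for instances, clients in runs:
    offered = clients * WRITES_PER_CLIENT   # total writes/s sent to the cluster
    per_node = offered / instances          # client writes/s per instance
    print(f"{instances:>3} instances: {offered:>9,} writes/s offered, "
          f"{per_node:,.0f} per instance")
```

The load per instance is the same in all four runs (about 4,167 writes/s), so the cluster size and the total load grow in the same proportion. Sustaining that load at every size is what the linear-scaling claim amounts to.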
High Availability and Fault Tolerance
• What does high availability require?
  - Multiple networked computers operating in a cluster
  - A facility for recognizing node failures
  - Failing over: forwarding requests to another part of the system
• Cassandra is highly available: no single point of failure, due to the peer-to-peer architecture
[Figure: ring of nodes 1 to 6]
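The failover requirement can be sketched in a few lines. This is a deliberately simplified illustration: the function `read_with_failover`, the `is_up` dictionary standing in for a failure detector, and the node names are all hypothetical, and real Cassandra coordinators do considerably more.

```python
from typing import Dict, List


class NoReplicaAvailable(Exception):
    """Raised when every replica for a key is marked down."""


def read_with_failover(key: str, replicas: List[str],
                       is_up: Dict[str, bool],
                       data: Dict[str, Dict[str, str]]) -> str:
    """Try replicas in order, skipping nodes the failure detector marked down."""
    for node in replicas:
        if not is_up.get(node, False):
            continue  # node is considered failed; forward to the next replica
        return data[node][key]
    raise NoReplicaAvailable(key)


# Hypothetical three-replica setup where the first replica has failed.
data = {n: {"jsmith": "profile-v1"} for n in ("node1", "node2", "node3")}
is_up = {"node1": False, "node2": True, "node3": True}
value = read_with_failover("jsmith", ["node1", "node2", "node3"], is_up, data)
print(value)
```

Because every replica can serve the request, the loss of one node only changes which node answers.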
Tunable Consistency
• Choose between strong and eventual consistency, depending on the use case
• Adjustable separately for read and write operations
• Conflicts are resolved during reads, because the focus lies on write performance
[Figure: "TUNABLE" trade-off between Available and Consistency]
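Resolving conflicts at read time can be pictured as "the newest timestamp wins". The sketch below assumes each replica returns a value together with its write timestamp; the type `Versioned`, the function `reconcile` and the sample values are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: int  # write timestamp; larger means more recent


def reconcile(responses: List[Versioned]) -> Versioned:
    """Pick the most recent version among the replica responses."""
    return max(responses, key=lambda v: v.timestamp)


# Two replicas answer; one has not yet seen the write made at t2.
responses = [Versioned("jsmith@old.example", 1), Versioned("jsmith@new.example", 2)]
winner = reconcile(responses)
print(winner.value)
```

Writes stay cheap because no coordination is needed when they arrive; the comparison happens only when the data is read.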
When do we have strong consistency?
• Simple formula: (nodes_written + nodes_read) > replication_factor
• This ensures that a read always reflects the most recent write
• If not: weak consistency → eventually consistent
• Example from the figure: NW: 2, NR: 2, RF: 3, so 2 + 2 > 3 and reads are strongly consistent
[Figure: replicas of the row "jsmith" holding versions t1 and t2]
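The formula is easy to check in code. In this sketch the function names are illustrative, and `quorum` (a majority of the replicas) is added as background knowledge rather than taken from the slide.

```python
def is_strongly_consistent(nodes_written: int, nodes_read: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return nodes_written + nodes_read > replication_factor


def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


rf = 3
print(is_strongly_consistent(2, 2, rf))   # the slide's example: NW=2, NR=2, RF=3
print(is_strongly_consistent(1, 1, rf))   # write one, read one
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))
```

The first and third calls return True and the second returns False. With one node written and one node read, the read may hit a replica that has not received the write yet.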
Column-oriented Key-Value Store
[Figure: Row Key1 → (Column Key1: Column Value1), (Column Key2: Column Value2), (Column Key3: Column Value3), …; rows are stored sorted by row key*, columns are stored sorted by column key/value]
• Data is stored in sparse multidimensional hash tables
• A row can have multiple columns, and rows need not have the same number of columns
• Each row has a unique key, which also determines partitioning
• No relations!
• Conceptually: Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
* Row keys (partition keys) should be hashed, in order to distribute data across the cluster evenly
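The nested-map view of the data model can be tried out directly. The sketch below approximates Map<RowKey, SortedMap<ColumnKey, ColumnValue>> with plain dictionaries and sorts the columns on read; the helper names and the sample rows are made up for illustration.

```python
from typing import Dict

# Map<RowKey, SortedMap<ColumnKey, ColumnValue>> approximated with plain dicts.
Row = Dict[str, str]
table: Dict[str, Row] = {}


def put(row_key: str, column_key: str, value: str) -> None:
    """Set one column; the row is created on first write."""
    table.setdefault(row_key, {})[column_key] = value


def get_row(row_key: str):
    """Return the row's columns as (key, value) pairs sorted by column key."""
    return sorted(table.get(row_key, {}).items())


# Sparse rows: each row stores only the columns it actually has.
put("jsmith", "email", "jsmith@example.com")
put("jsmith", "city", "Vienna")
put("adoe", "email", "adoe@example.com")

print(get_row("jsmith"))
print(get_row("adoe"))
```

The two rows hold different sets of columns, and nothing is stored for a column a row does not have.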
CQL – An SQL-like query interface
• “CQL 3 is the default and primary interface into the Cassandra DBMS” *
• Familiar SQL-like syntax that maps to Cassandra's storage engine and simplifies data modelling
• “SQL-like”, but NOT relational SQL

CREATE TABLE songs (
  id uuid PRIMARY KEY,
  title text,
  album text,
  artist text,
  data blob,
  tags set<text>
);

INSERT INTO songs (id, title, artist, album, tags)
VALUES (
  a3e64f8f...,  -- uuid literals are written without quotes
  'La Grange',
  'ZZ Top',
  'Tres Hombres',
  {'cool', 'hot'}
);

SELECT *
FROM songs
WHERE id = a3e64f8f...;

* http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf
High Performance
• Optimized from the ground up for high throughput
• All disk writes are sequential, append-only operations
• No reading before writing
• Cassandra's threading concept is optimized for multiprocessor/multicore machines
• Optimized for writing, but fast reads are possible as well
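The append-only write path can be illustrated with a toy store. This sketch keeps everything in memory and uses the usual Cassandra terms (commit log, memtable, SSTable) only as labels; the class `ToyStore` and its flush threshold are assumptions for illustration, not the real storage engine.

```python
class ToyStore:
    """Append-only write path: commit log + memtable, flushed to sorted segments."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # stands in for a sequential on-disk log
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted segments (newest last)
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        # No read-before-write: append to the log and update the buffer.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Written once, in key order, and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key: str):
        # Reads may consult several places, newest data first.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.sstables):
            for k, v in segment:
                if k == key:
                    return v
        return None


store = ToyStore()
store.write("a", "1")
store.write("b", "2")   # reaches the limit and triggers a flush
store.write("a", "3")   # newer value lives only in the memtable
print(store.read("a"), store.read("b"), len(store.sstables))
```

A write never looks at existing data, while a read may have to check the buffer and several segments. That asymmetry is why the design favours writes.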
Benchmark from 2011 (Cassandra 0.7.4)*
• Cassandra showed outstanding throughput in the “INSERT-only” test, with 20,000 ops
• Insert: enter 50 million 1K-sized records
• Read: search key for a one-hour period, plus optional update
• Hardware: Nehalem 6 Core x 2 CPU, 16 GB memory
[Chart: throughput in ops]
* NoSQL Benchmarking by CUBRID, http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
Benchmark from 2013 (Cassandra 1.1.6)*

* Benchmarking Top NoSQL Databases by End Point Corporation, http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB
When do we need these features?
• Lots of writes, statistics, and analysis
• Geographical distribution
• Large deployments
• Evolving applications
Who is using Cassandra?

eBay Data Infrastructure*
[Figure: one box per data store; the logos that labelled the boxes are not preserved]

• Thousands of nodes
• > 2K sharded logical hosts
• > 16K tables
• > 27K indexes
• > 140 billion SQLs/day
• > 5 PB provisioned

• 10+ clusters
• 100+ nodes
• > 250 TB provisioned (local HDD + shared SSD)
• > 9 billion writes/day
• > 5 billion reads/day

• Hundreds of nodes
• Persistent & in-memory
• > 40 billion SQLs/day

• Hundreds of nodes
• > 50 TB
• > 2 billion ops/day

• Thousands of nodes
• The world's largest cluster with 2K+ nodes

Not replacing RDBMS but complementing!

*by Jay Patel, Cassandra Summit June 2013 San Francisco
Cassandra Use Case at eBay
Application/Use Case
• Time-series data and real-time insights
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & Shipment Tracking
• …
• Server metrics collection
• Taste graph-based next-gen recommendation system
• Social Signals on eBay Product & Item pages

Why Cassandra?
• Multi-Datacenter (active-active)
• No SPOF
• Easy to scale
• Write performance
• Distributed Counters
Cassandra/Hadoop Deployment

Summary
• History
• Key features of Cassandra
  - Distributed and Decentralized
  - Elastic Scalability
  - High Availability and Fault Tolerance
  - Tunable Consistency
  - Column-oriented key-value store
  - CQL interface
  - High Performance
• eBay Use Case

Apache project: http://cassandra.apache.org
Community portal: http://planetcassandra.org
Documentation: http://www.datastax.com/docs
