Introduction to Apache Cassandra
1
Me 
Robert Stupp 
Freelancer, Coder, Architect 
@snazy snazy@snazy.de 
Contributor to Apache Cassandra, 
3.0 UDFs (CASSANDRA-7395 + related) 
Databases, Network, Backend 
2
Agenda 
Apache Cassandra History 
Design Principles 
Outstanding differences 
CQL Intro 
Access C* 
Clusters 
Cassandra Future 
3
Apache Cassandra 
History 
4
Apache Cassandra 
started at Facebook
inspired by Amazon Dynamo and Google Bigtable
Note: Facebook initially had two data centers.
5
Cassandra 2.1 was released in September 2014
6
Apache Cassandra 
Design Principles 
7
Hardware failures 
can and will occur! 
Cassandra handles failures. 
From single node to whole data center. 
From client to server. 
8
The complicated part when learning Cassandra
is to understand Cassandra’s simplicity
9
Keep it simple 
all nodes are equal 
master-less architecture 
no name nodes 
no SPOF (single point of failure) 
no read before modify 
(prevent race conditions) 
10
Keep it running 
No need to take cluster down … e.g. 
during maintenance 
during software update 
Rolling restart is your friend 
11
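Why a rolling restart can keep the cluster usable is easiest to see with a small model. The sketch below is only an illustration under stated assumptions: nodes sit on a ring, each row is stored on its primary node plus the following nodes (replication is covered later in the deck), and requests need a majority of replicas. The function names are hypothetical and not part of Cassandra.

```python
from typing import List


def replicas(primary: int, replication_factor: int, node_count: int) -> List[int]:
    """Replicas of a row: its primary node plus the next nodes on the ring."""
    return [(primary + i) % node_count for i in range(replication_factor)]


def rolling_restart_keeps_quorum(replication_factor: int, node_count: int) -> bool:
    """True if a majority of replicas stays up while any single node restarts."""
    needed = replication_factor // 2 + 1
    for down_node in range(node_count):
        for primary in range(node_count):
            alive = [n for n in replicas(primary, replication_factor, node_count)
                     if n != down_node]
            if len(alive) < needed:
                return False
    return True


print(rolling_restart_keeps_quorum(3, 8))  # True
print(rolling_restart_keeps_quorum(2, 8))  # False
```

With three replicas per row, restarting one node at a time always leaves two replicas reachable; with two replicas, a majority request would fail while either replica is down.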
Outstanding 
Differences 
12
Cassandra 
Highly scalable 
runs with a few nodes,
up to clusters of 1000+ nodes!
Linear scalability (proven!) 
Multi datacenter aware (world-wide!) 
No SPOF 
13
Cassandra @ Apple 
14
Linear Scalability 
15
Scaling Cassandra 
More data? 
-> add more nodes 
Faster access? 
-> add more nodes 
16
Read / Write 
performance 
Reads are fast 
Writes are even faster 
17
Durability 
Writes are durable - period. 
18
Availability @ Netflix
Chaos Monkey
kills nodes randomly
19
Availability @ Netflix
Chaos Gorilla
kills availability zones randomly
20
Availability @ Netflix
Chaos Kong
kills whole regions
21
Availability @ Netflix
http://de.slideshare.net/planetcassandra/active-active-c-behind-the-scenes-at-netflix
22
32-node cluster (Raspberry Pis)
@ DataStax
23
Most outstanding 
Great documentation 
Many blog posts 
Many presentations 
Many videos 
Regular webinars 
Huge, active and healthy community 
24
Data Distribution 
25
DHT 
Data is organized in a 
"Distributed Hash Table"
(hash over row key) 
26
DHT
[Figure: nodes 0 to 7]
27
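The idea of hashing a row key to find its node can be sketched in a few lines. This is a simplified model, not Cassandra's implementation: it assumes eight nodes with evenly spaced tokens on a ring, and it uses MD5 only as a stand-in for the partitioner's hash function. All names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 64          # assumed token space for this sketch
NODES = list(range(8))       # nodes 0 to 7, as in the figure
# Each node owns the token range that ends at its own token.
TOKENS = [(i + 1) * RING_SIZE // len(NODES) - 1 for i in NODES]


def token_for(row_key: str) -> int:
    """Hash a row key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_of(row_key: str) -> int:
    """Return the node whose token range contains the key's token."""
    idx = bisect.bisect_left(TOKENS, token_for(row_key))
    return NODES[idx % len(NODES)]


for key in ("Row A", "Row B"):
    print(key, "-> node", owner_of(key))
```

Because the hash spreads keys evenly over the token space, each node ends up with roughly the same share of rows, and any node can compute where a row lives without asking a master.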
Replication 
28
Replication Factor 2
[Figure: nodes 0 to 7, with Row A and Row B]
29
Replication Factor 3
[Figure: nodes 0 to 7, with Row A and Row B]
30
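A minimal sketch of how a replication factor maps a row to several nodes, assuming the simplest placement rule: the row's primary node plus the next nodes clockwise on the ring. Real placement strategies can also take racks and data centers into account; the function name and the node count of eight are assumptions for illustration.

```python
from typing import List

NODE_COUNT = 8  # assumed from the figures (nodes 0 to 7)


def replicas(primary: int, replication_factor: int,
             node_count: int = NODE_COUNT) -> List[int]:
    """Return the nodes holding a row whose primary replica is `primary`."""
    if not 1 <= replication_factor <= node_count:
        raise ValueError("replication factor must be between 1 and the node count")
    return [(primary + i) % node_count for i in range(replication_factor)]


print(replicas(2, 2))  # [2, 3]
print(replicas(6, 3))  # [6, 7, 0]
```

The second call shows the wrap-around at the end of the ring: a row whose primary is node 6 is also stored on nodes 7 and 0.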
Consistency 
Consistency defined per request 
Several consistency levels (CLs) 
for different needs 
31
Eventual consistency 
is not 
hopefully consistent 
EC means there’s a time gap until updates 
are consistently readable 
32
Consistency Levels 
ANY (only for writes) 
ONE, LOCAL_ONE
TWO, THREE (not recommended)
ALL (not recommended)
QUORUM, LOCAL_QUORUM, EACH_QUORUM 
SERIAL, LOCAL_SERIAL 
33
Consistency 
Data is always replicated 
CL defines how many replicas must 
fulfill the request 
34
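The relationship between replication factor and consistency level comes down to simple arithmetic. The sketch below covers only a subset of the levels (ONE, TWO, THREE, QUORUM, ALL) in a single data center; the helper names are hypothetical. A read is guaranteed to overlap the latest acknowledged write when the replicas required for the write plus those required for the read exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def required_replicas(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def read_sees_latest_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True if every read replica set must overlap every write replica set."""
    needed = (required_replicas(write_cl, replication_factor)
              + required_replicas(read_cl, replication_factor))
    return needed > replication_factor


print(read_sees_latest_write("QUORUM", "QUORUM", 3))  # True
print(read_sees_latest_write("ONE", "ONE", 3))        # False
```

With a replication factor of 3, QUORUM writes and QUORUM reads always share at least one replica, while ONE/ONE leaves a window in which a read may hit a replica that has not yet received the update.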
Write
[Figure: nodes 0 to 7, with a write request]
35
Write
[Figure: nodes 0 to 7, with a write request]
36
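Multi DC setup
[Figure: DC 1 and DC 2]
37
Multi DC replication
[Figure: DC 1 and DC 2, with a write request]
38
Multi DC replication
[Figure: DC 1 and DC 2, with a write request]
39
Multi DC replication
[Figure: DC 1 and DC 2, with a write request]
40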
Replication & 
Consistency 
Define # of replicas 
using replication factor 
Define required consistency 
per request 
41
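For a multi-DC cluster the same arithmetic applies per data center. The sketch below assumes a replication factor of 3 in each of two data centers (the names DC1 and DC2 and the helper function are hypothetical) and shows how many replica acknowledgements QUORUM, LOCAL_QUORUM and EACH_QUORUM would require.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def acks_needed(consistency_level: str, rf_per_dc: Dict[str, int],
                local_dc: str) -> Dict[str, int]:
    """Map each scope (a data center, or 'any' for the whole cluster) to required acks."""
    if consistency_level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if consistency_level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    if consistency_level == "QUORUM":
        return {"any": quorum(sum(rf_per_dc.values()))}
    raise ValueError("consistency level not covered by this sketch")


rf = {"DC1": 3, "DC2": 3}  # assumed replication factors
print(acks_needed("LOCAL_QUORUM", rf, "DC1"))  # {'DC1': 2}
print(acks_needed("EACH_QUORUM", rf, "DC1"))   # {'DC1': 2, 'DC2': 2}
print(acks_needed("QUORUM", rf, "DC1"))        # {'any': 4}
```

LOCAL_QUORUM waits only for replicas in the coordinator's own data center, which is why it is a common choice when the other data center is far away.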
CQL Introduction 
CQL = Cassandra Query Language
42
“CQL is SQL 
minus joins, 
minus subqueries, 
plus collections” 
(plus user types, 
plus tuple types) 
43
Why CQL? 
Introduces a schema to Cassandra 
Familiar syntax 
Easy to understand 
DML operations are atomic 
44
Data model 
(hierarchical view) 
Keyspace (schema) 
Table (column family) 
Row 
partition key (part of primary key) 
static columns 
clustering key (part of primary key) 
columns 
45
CQL / DDL 
Similar to SQL 
CREATE TABLE … 
ALTER TABLE … 
DROP TABLE … 
46
CQL / DML 
Similar to SQL 
INSERT … 
UPDATE … 
DELETE … 
SELECT … 
47
CQL / BATCH 
Group related modifications 
(INSERT, UPDATE, DELETE) 
Atomic operation 
48
CQL types 
boolean, int (32bit), bigint (64bit), 
float, double, 
decimal ("BigDecimal"), 
varint ("BigInteger"), 
ascii, text (= varchar), blob, 
inet, timestamp, uuid, timeuuid 
49
CQL collection 
types 
list<foo>
set<foo>
map<foo, bar>
Since C* 2.1 collections can contain 
any type - even other collections. 
50
CQL composite 
types 
user types (C* 2.1) 
are composite types with named fields 
tuple types (C* 2.1) 
are unstructured lists of values 
51
CQL / user types 
CREATE TYPE address (
    street text,
    zip int,
    city text);
CREATE TABLE users (
    username text,
    addresses map<text, frozen<address>>,
    ...
52
Cassandra 
Data Modeling 
Access by key 
no access by arbitrary WHERE clause 
Duplicate data (it’s ok!) 
Aggregate data 
Build application maintained indexes 
53
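The "access by key, duplicate data, maintain your own indexes" approach can be illustrated without a database. In the sketch below plain Python dictionaries stand in for two tables, each keyed for exactly one access pattern; the table and function names are hypothetical, and the example only models the idea, not Cassandra's storage.

```python
from typing import Dict, List

# One "table" per access pattern. The user row is deliberately duplicated.
users_by_username: Dict[str, dict] = {}
users_by_city: Dict[str, Dict[str, dict]] = {}  # application-maintained index


def add_user(username: str, city: str) -> None:
    """Write the same row to every table that serves a query."""
    row = {"username": username, "city": city}
    users_by_username[username] = row
    users_by_city.setdefault(city, {})[username] = dict(row)


def user(username: str) -> dict:
    """Access pattern 1: look up a user by username (a key lookup)."""
    return users_by_username[username]


def users_in_city(city: str) -> List[dict]:
    """Access pattern 2: list users of a city (also a key lookup, no scan)."""
    return list(users_by_city.get(city, {}).values())


add_user("alice", "Cologne")
add_user("bob", "Cologne")
print(user("alice")["city"])
print(sorted(u["username"] for u in users_in_city("Cologne")))
```

Both questions are answered by a single key lookup. The price is that every write has to update both structures, which is the trade-off the slide describes as "Duplicate data (it's ok!)".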
RDBMS modeling 
54
C* modeling 
55
Data Modeling 
with RDBMS 
Driven by 
"How can I store 
something right?" 
"What answers 
do I have?" 
56
Data Modeling 
with NoSQL 
Driven by 
"How can I access 
something right?" 
"What questions 
do I have?" 
57
Data Modeling 
Basics 
Work top-down. Think about: 
What does the application do? 
What are the access patterns? 
Now design data model 
58
Data Modeling 
http://de.slideshare.net/planetcassandra/cassandra-day-sv-2014-fundamentals-of-apache-cassandra-data-modeling
http://de.slideshare.net/planetcassandra/data-modeling-with-travis-price
59
Accessing 
Cassandra 
60
Command Line 
cqlsh 
CQL shell 
nodetool 
node/cluster administration 
61
GUI: DevCenter 
Visual query tool 
62
Stress test? 
Cassandra 2.1 comes with an improved
stress tool
Simulate read+write workload 
Uses configurable data 
Works against older C* versions, too 
63
DataStax APLv2 
Open Source Drivers 
for Java 
for Python 
for C# 
for Scala / Spark 
https://github.com/datastax/
or http://www.datastax.com/download
64
Native protocol 
C*’s own network protocol for clients
Request multiplexing 
Schema change notifications 
Cluster change notifications 
65
Third Party Drivers 
for huge number of languages 
66
Mappers 
High level mappers exist at least for 
Java 
Special case: Scala 
due to its strong+complex type 
model (DataStax OSS Spark driver) 
67
Spark + Hadoop 
Yes - works really well
Note: Spark is about 100x faster than Hadoop
68
Clusters 
69
Cluster sizes 
C* works with a few nodes 
C* works with several hundred / 
thousand nodes 
70
Cluster setup 
Configure for multiple data centers 
Plan for multi-DC setup :) 
71
Cluster experience 
Remember: A single Cassandra
cluster works over multiple data
centers all over the world
"Disaster proven"
Hurricanes 
Amazon DC outages 
72
Apache Cassandra 
Future 
73
Cassandra 3.0 
(in development) 
User Defined Functions 
Aggregate functions 
Functional indexes 
Workload recording + playback 
Better SSTables, Fully off-heap row cache, Better 
serial consistency 
Indexes w/ high cardinality 
Subject to change!!!
74
Get active!
75
Cassandra Community 
http://cassandra.apache.org/
http://planetcassandra.org/ - Blog
http://www.slideshare.net/planetcassandra/presentations
http://de.slideshare.net/DataStax/presentations
76
Cassandra Community 
https://www.youtube.com/user/PlanetCassandra
https://www.youtube.com/user/DataStax
http://www.datastax.com/dev/blog/
http://www.datastax.com/docs/
Users Mailing List
user@cassandra.apache.org
77
Free C* Training! 
http://planetcassandra.org/cassandra-training/
78
Get involved! 
Ask questions, 
submit RFEs or experiences to 
user mailing list 
user@cassandra.apache.org 
Answers arrive quickly! 
79
Live Demo 
User Defined Functions 
80
C* 3.0 UDFs 
Users create functions using 
CREATE FUNCTION … 
LANGUAGE … 
AS … 
Java, JavaScript, Scala, Groovy, 
JRuby, Jython 
Functions work on all nodes 
81
C* 3.0 UDFs 
Example 
CREATE FUNCTION sin(input double) 
RETURNS double 
LANGUAGE javascript 
AS 'Math.sin(input)'; 
This is JavaScript!
82
UDFs for what? 
Own aggregation code - e.g. 
SELECT sum(value) FROM table 
WHERE …; 
Functional indexes - e.g. 
CREATE INDEX idx 
ON table ( myFunction(colname) ); 
Targeted for C* 3.0
83
Thanks 
for your attention 
Download Apache Cassandra at 
http://cassandra.apache.org/
Robert Stupp 
@snazy 
snazy@snazy.de 
de.slideshare.net/RobertStupp 
84
Q & A 
85
BACKUP SLIDES 
User-Defined-Functions 
Demo 
87
