Cassandra : Introduction
Patrick McFadin
Chief Evangelist/Solution Architect - DataStax
@PatrickMcFadin
©2013 DataStax Confidential. Do not distribute without consent.
Who I am

• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more: @PatrickMcFadin

Dude.
Uptime == $$

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
Five Years of Cassandra

[Timeline figure, starting Jul-08 and spanning years 0 through 5: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, DSE, 2.0]
Why Cassandra?
The Best Persistence Tier For Your Application
Cassandra - An introduction
Cassandra - Roots
• Based on the Amazon Dynamo and Google BigTable papers
• Shared nothing
• Data as safe as possible
• Predictable scaling
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server

[Diagram: four-node ring. Each node owns 25% of the data]
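The "each node owns 25%" picture can be sketched with a toy token ring. This is a minimal illustration only: the class name TokenRing, its methods, and the CRC32 hash are assumptions made for the example, not Cassandra internals (a real cluster uses its configured partitioner).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Toy token ring. Class and method names are hypothetical, not Cassandra APIs. */
final class TokenRing {
    // CRC32 yields values in [0, 2^32), so that is the size of this toy ring.
    private static final long RING_SIZE = 1L << 32;

    // token -> node that owns the range ending at that token
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Places nodes at evenly spaced tokens so each owns an equal slice. */
    TokenRing(String... nodes) {
        if (nodes.length == 0) {
            throw new IllegalArgumentException("at least one node is required");
        }
        for (int i = 0; i < nodes.length; i++) {
            ring.put(RING_SIZE / nodes.length * (i + 1) - 1, nodes[i]);
        }
    }

    /** Stand-in hash for illustration; a real cluster uses its configured partitioner. */
    static long token(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** A key belongs to the first node whose token is >= the key's token, wrapping around. */
    String ownerOf(String partitionKey) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(token(partitionKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Fraction of the ring each node owns. */
    Map<String, Double> ownership() {
        Map<String, Double> shares = new LinkedHashMap<>();
        long previous = -1;
        for (Map.Entry<Long, String> entry : ring.entrySet()) {
            shares.put(entry.getValue(), (double) (entry.getKey() - previous) / RING_SIZE);
            previous = entry.getKey();
        }
        // Tokens past the last node wrap around to the first node.
        long wrap = RING_SIZE - 1 - previous;
        if (wrap > 0) {
            shares.merge(ring.firstEntry().getValue(), (double) wrap / RING_SIZE, Double::sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing("node1", "node2", "node3", "node4");
        System.out.println(ring.ownership());
        System.out.println("pmcfadin -> " + ring.ownerOf("pmcfadin"));
    }
}
```

Adding a fifth node name to the constructor shrinks every share to 20%, which is the "More capacity? Add a server" idea in miniature.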
Core Concepts Write path

<row,column>

Compacted later
Core Concepts Read Path

Real user story
• New app
• SSDs
• 2.5 m requests
• Client P99: 3.17ms!
Cassandra - Locally Distributed
• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor: How many copies of your data?
• RF = 3 here
Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write

• ALL = All replicas ack
• QUORUM = A majority (more than 50%) of replicas ack
• LOCAL_QUORUM = A majority (more than 50%) of replicas in the local DC ack
• ONE = Only one replica acks
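To make the ack counts concrete, here is a small sketch of the arithmetic behind each consistency level. The class ConsistencyMath and its method names are hypothetical helpers for this example and are not part of any driver; the data center names are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Ack counts per consistency level. Names are hypothetical, not driver APIs. */
final class ConsistencyMath {
    enum Level { ONE, QUORUM, LOCAL_QUORUM, ALL }

    /** A quorum is a strict majority: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Number of replica acks needed, given the replication factor of each data center. */
    static int requiredAcks(Level level, Map<String, Integer> rfPerDc, String localDc) {
        int totalReplicas = 0;
        for (int rf : rfPerDc.values()) {
            totalReplicas += rf;
        }
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return quorum(totalReplicas);          // majority across all DCs
            case LOCAL_QUORUM:
                return quorum(rfPerDc.get(localDc));   // majority in the local DC only
            case ALL:
                return totalReplicas;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    /** Replicas that can be down while a QUORUM request still succeeds. */
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        Map<String, Integer> rfPerDc = new HashMap<>();
        rfPerDc.put("dc-west", 3);   // hypothetical data center names
        rfPerDc.put("dc-east", 3);
        for (Level level : Level.values()) {
            System.out.println(level + " needs " + requiredAcks(level, rfPerDc, "dc-west") + " acks");
        }
        System.out.println("RF=3 tolerates " + toleratedFailures(3) + " replica down at QUORUM");
    }
}
```

With RF = 3 a quorum is 2 replicas, so one replica can be down and QUORUM reads and writes still succeed.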
Cassandra - Transparent to the application
• A single node failure shouldn’t cause an application failure
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM

2 of 3 replicas ack (a majority), so we are good!
My favorite feature.

Ever!

Cassandra - Geographically Distributed
• Client writes locally
• Data syncs across WAN
• Replication Factor per DC
Cassandra Applications - Drivers
• DataStax Drivers for Cassandra
• Java
• C#
• Python
• more on the way

Cassandra Applications - Connecting
• Create a pool of local servers
• Client just uses session to interact with Cassandra
contactPoints = {"10.0.0.1", "10.0.0.2"}
keyspace = "videodb"

public VideoDbBasicImpl(List<String> contactPoints, String keyspace) {
    // Build the cluster object from the list of contact points
    cluster = Cluster
        .builder()
        .addContactPoints(
            contactPoints.toArray(new String[contactPoints.size()]))
        .withLoadBalancingPolicy(Policies.defaultLoadBalancingPolicy())
        .withRetryPolicy(Policies.defaultRetryPolicy())
        .build();

    // The client uses this session for all interaction with Cassandra
    session = cluster.connect(keyspace);
}
CQL Intro
• Cassandra Query Language
• SQL-like language to query Cassandra
• Limited predicates. Attempts to prevent bad queries
• But still offers enough leeway to get into trouble
Data Model Logical containers
Cluster - Contains all nodes. Even across WAN
Keyspace - Contains all tables. Specifies replication
Table (Column Family) - Contains rows
CQL Intro
• CREATE / DROP / ALTER TABLE
• SELECT

• BUT
• INSERT and UPDATE are similar to each other
• If a row doesn’t exist, UPDATE will insert it, and if it exists, INSERT will replace it.
• Think of it as an UPSERT
• Therefore we never get a key violation
• For updates, Cassandra never reads (no col = col + 1)
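The upsert behavior can be modeled with a plain in-memory map. This is a sketch under stated assumptions: UpsertTable and its methods are hypothetical names for the example, and the map stands in for a table keyed by primary key. INSERT and UPDATE share one code path, and only the columns named in the statement are overwritten.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of upsert semantics. Names are hypothetical, not Cassandra APIs. */
final class UpsertTable {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    /** Shared write path: set the given columns, creating the row if it is missing. */
    private void write(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    /** INSERT never fails on an existing key; it overwrites the given columns. */
    void insert(String key, Map<String, String> columns) {
        write(key, columns);
    }

    /** UPDATE never fails on a missing key; it creates the row. */
    void update(String key, Map<String, String> columns) {
        write(key, columns);
    }

    /** Returns the row's columns, or null when the key has never been written. */
    Map<String, String> select(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        UpsertTable users = new UpsertTable();

        Map<String, String> first = new HashMap<>();
        first.put("firstname", "Patrick");
        users.update("pmcfadin", first);      // row does not exist yet: UPDATE creates it

        Map<String, String> second = new HashMap<>();
        second.put("firstname", "Pat");
        second.put("lastname", "McFadin");
        users.insert("pmcfadin", second);     // row exists: INSERT overwrites, no key violation

        System.out.println(users.select("pmcfadin"));
    }
}
```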
Data Modeling Creating Tables
CREATE TABLE user (
    username varchar,
    firstname varchar,
    lastname varchar,
    shopping_carts set<varchar>,    -- Collection
    PRIMARY KEY (username)
);

CREATE TABLE shopping_cart (
    username varchar,
    cart_name text,
    item_id int,
    item_name varchar,
    description varchar,
    price float,
    item_detail map<varchar,varchar>,
    PRIMARY KEY ((username, cart_name), item_id)    -- Creates compound partition row key
);
CQL Inserts
• Insert will always overwrite

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');
CQL Selects
• No joins
• Data is returned in row/column format
SELECT username, firstname, lastname,
    email, password, created_date
FROM users
WHERE username = 'pmcfadin';

 username | firstname | lastname | email                    | password                         | created_date
----------+-----------+----------+--------------------------+----------------------------------+--------------------------
 pmcfadin | Patrick   | McFadin  | ['patrick@datastax.com'] | ba27e03fd95e507daf2937c937d499ab | 2011-06-20 13:50:00-0700
Cassandra and Time Series
Time Series Taming the beast
• Peter Higgs and François Englert. Nobel Prize in Physics
• Theorized the existence of the Higgs boson
• Found using ATLAS
• Data stored in P-BEAST
• Time series running on Cassandra
Use Cassandra for time series

Get a Nobel prize
Time Series Why
• Storage model from BigTable is perfect
• One row key and tons of (variable) columns
• Single layout on disk

Row Key | Column Name  | Column Name
        | Column Value | Column Value
Time Series Example
• Storing weather data
• One weather station
• Temperature measurements every minute

Row key: WeatherStation ID
    2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
    72 Degrees          | 72 Degrees          | 65 Degrees
Time Series Example
• Query data
• Weather Station ID = Locality of single node
Date query:
    weatherStationID = 100 AND
    date = 2013-10-09 10:00 AM

Row key: WeatherStation ID = 100
    2013-10-09 10:00 AM | 2013-10-09 10:00 AM | 2013-10-10 11:00 AM
    72 Degrees          | 72 Degrees          | 65 Degrees

OR

Date range:
    weatherStationID = 100 AND
    date > 2013-10-09 10:00 AM AND
    date < 2013-10-10 11:01 AM
Time Series How
• CQL expresses this well
• Data partitioned by weather station ID, ordered by time within each partition
CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id, event_time)
);

• Easy to insert data
INSERT INTO temperature (weatherstation_id, event_time, temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

• Easy to query
SELECT temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
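Why the range query is cheap can be sketched with a sorted map per partition. This is an illustration only: TemperatureStore is a hypothetical in-memory stand-in, and it assumes event times are strings in "yyyy-MM-dd HH:mm:ss" form so that string order matches time order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory model of one table: partition key -> rows sorted by clustering column. */
final class TemperatureStore {
    // weatherstation_id -> (event_time -> temperature), kept sorted by event_time
    private final Map<String, NavigableMap<String, String>> partitions = new HashMap<>();

    /** Mirrors the INSERT: a second write to the same key and time overwrites the first. */
    void insert(String stationId, String eventTime, String temperature) {
        partitions.computeIfAbsent(stationId, k -> new TreeMap<>()).put(eventTime, temperature);
    }

    /** Mirrors the SELECT: event_time > from AND event_time < to, both bounds exclusive. */
    SortedMap<String, String> range(String stationId, String from, String to) {
        NavigableMap<String, String> rows = partitions.get(stationId);
        if (rows == null) {
            return new TreeMap<>();
        }
        // One partition, one contiguous slice of already-sorted rows.
        return rows.subMap(from, false, to, false);
    }

    public static void main(String[] args) {
        TemperatureStore store = new TemperatureStore();
        store.insert("1234ABCD", "2013-04-03 07:01:00", "72F");
        store.insert("1234ABCD", "2013-04-03 07:02:00", "73F");
        store.insert("1234ABCD", "2013-04-03 07:03:00", "73F");
        store.insert("1234ABCD", "2013-04-03 07:04:00", "74F");
        System.out.println(store.range("1234ABCD", "2013-04-03 07:01:00", "2013-04-03 07:04:00"));
    }
}
```

The query touches a single partition and reads a contiguous run of rows, which is the property the table layout is designed to give.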
Time Series Further partitioning
• At one measurement every minute, a single storage row will eventually run out of columns
• 2 billion columns per storage row
• Data partitioned by weather station ID and day
• Use the partition key to split things up
CREATE TABLE temperature_by_day (
    weatherstation_id text,
    date text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY ((weatherstation_id, date), event_time)
);
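The application has to supply the date bucket on every write and read. A minimal sketch of deriving it from the event time follows; DayBucket and its method names are hypothetical, and the "yyyy-MM-dd HH:mm:ss" input format is an assumption taken from the literals used on these slides.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Derives the per-day partition bucket. Names are hypothetical helpers for this example. */
final class DayBucket {
    private static final DateTimeFormatter EVENT_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // At one reading per minute, a day-sized partition holds this many rows.
    static final int READINGS_PER_DAY = 24 * 60;

    /** Returns the value for the `date` column, e.g. the calendar day of the event. */
    static String dateBucket(String eventTime) {
        return LocalDateTime.parse(eventTime, EVENT_TIME).toLocalDate().toString();
    }

    /** Human-readable form of the compound partition key (weatherstation_id, date). */
    static String partitionKey(String stationId, String eventTime) {
        return stationId + ":" + dateBucket(eventTime);
    }

    public static void main(String[] args) {
        System.out.println(partitionKey("1234ABCD", "2013-04-03 07:01:00"));
        System.out.println(READINGS_PER_DAY + " readings per partition");
    }
}
```

Bucketing by day caps each partition at 1,440 readings per station, at the cost of one query per day when a range spans several days.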
Time Series Further Partitioning
• Still easy to insert
INSERT INTO temperature_by_day (weatherstation_id, date, event_time, temperature)
VALUES ('1234ABCD','2013-04-03','2013-04-03 07:01:00','72F');

• Still easy to query
SELECT temperature
FROM temperature_by_day
WHERE weatherstation_id='1234ABCD'
AND date='2013-04-03'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
Time Series Use cases
• Logging
• Thing Tracking (IoT)
• Sensor Data
• User Tracking
• Fraud Detection
• Nobel prizes!
Application Example - Layout
• Active-Active
• Service based DNS routing

Cassandra Replication

Application Example - Uptime
• Normal server maintenance
• Application is unaware

Cassandra Replication

Application Example - Failure
• Data center failure
• Data is safe. Route traffic.

Another happy user!
Cassandra Users and Use Cases
Netflix
• If you haven’t heard their story… where have you been?
• $18B market cap, runs on Cassandra
• User accounts
• Playlists
• Payments
• Statistics
Spotify
• Millions of songs. Millions of users.
• Playlists
• 1 billion playlists
• 30+ Cassandra clusters
• 50+ TB of data
• 40k req/sec peak
http://www.slideshare.net/noaresare/cassandra-nyc
Instagram (Facebook)
• Loads and loads of photos. (Probably yours)
• All in AWS
• Security audits
• News feed
• 20k writes/sec. 15k reads/sec.
DataStax Ac*demy for Apache Cassandra
Content
• First four sessions available, with a weekly roll-out of 7 sessions total
• Based on DataStax Community Edition
• CQL, Schema Design and Data Modeling
• Introduction to Cassandra Objects
• First Java, then Python, C# and .NET

Goals
• 100,000 Registrations by the end of 2014
• 25,000 Certifications by the end of 2014
https://datastaxacademy.elogiclearning.com/
Cassandra Community Webinar | Getting Started with Apache Cassandra with Patrick McFadin
