©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist for Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
My Background
…ran into this problem
Gave it my best shot
[Diagram: client, router, shard 1, shard 2, shard 3, shard 4]
Patrick, all your wildest dreams will come true.
Just add complexity!
A new plan
ACID vs CAP
ACID
Atomic - All or none
Consistency - Only valid data is written
Isolation - One operation at a time
Durability - Once committed, it stays that way
CAP - Pick two
Consistency - All nodes in the cluster return the same data
Availability - Cluster always accepts writes
Partition tolerance - Cluster keeps working when nodes can’t talk to each other
Cassandra lets you tune this
Relational Data Models
• 5 normal forms
• Foreign Keys
• Joins
Employees
deptId  First    Last
1       Edgar    Codd
2       Raymond  Boyce
Department
id  Dept
1   Engineering
2   Math
Relational Modeling
[Diagram: Data, Models, Application]
Cassandra Modeling
[Diagram: Data, Models, Application]
CQL vs SQL
• No joins
• No aggregations
Employees
deptId  First    Last
1       Edgar    Codd
2       Raymond  Boyce
Department
id  Dept
1   Engineering
2   Math
SELECT e.First, e.Last, d.Dept
FROM Department d, Employees e
WHERE 'Codd' = e.Last
AND e.deptId = d.id
Denormalization
• Combine table columns into a single view
• No joins
SELECT First, Last, Dept
FROM employees
WHERE id = '1'
Employees
id  First    Last   Dept
1   Edgar    Codd   Engineering
2   Raymond  Boyce  Math
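The denormalization step above can be sketched in Python. This is only an illustration with in-memory dictionaries standing in for tables; the function name `denormalize` and the row ids assigned by `enumerate` are assumptions made for the example, not part of the deck.

```python
# Normalized form: two structures that must be joined at read time.
employees = [
    {"deptId": 1, "First": "Edgar", "Last": "Codd"},
    {"deptId": 2, "First": "Raymond", "Last": "Boyce"},
]
department = {1: "Engineering", 2: "Math"}


def denormalize(employees, department):
    """Copy the department name into each employee row, keyed by a row id."""
    return {
        row_id: {
            "First": e["First"],
            "Last": e["Last"],
            "Dept": department[e["deptId"]],
        }
        for row_id, e in enumerate(employees, start=1)
    }


# Denormalized form: a single lookup by id answers the query, no join needed.
employees_denormalized = denormalize(employees, department)
print(employees_denormalized[1])
```

The join cost is paid once at write time, so every read becomes a single key lookup.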
No more sequences
• Great for auto-creation of Ids
• Guaranteed unique
• Needs ACID to work. (Sorry. No sharding)
INSERT INTO user (id, firstName, LastName)
VALUES (seq.nextVal(), 'Ted', 'Codd')
No sequences???
• Almost impossible in a distributed system
• Couple of great choices
• Natural Key - Unique values like email
• Surrogate Key - UUID
• Universally Unique Identifier
• 128-bit number represented in character form
• Easily generated on the client
• Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
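As a rough illustration of "easily generated on the client", the sketch below uses Python's standard `uuid` module. The helper name `new_surrogate_key` is hypothetical, and choosing a random (version 4) UUID is an assumption; the deck does not say which UUID version to use.

```python
import uuid


def new_surrogate_key() -> uuid.UUID:
    """Return a random (version 4) UUID generated entirely on the client."""
    return uuid.uuid4()


key = new_surrogate_key()
print(key)                 # 36-character text form: 8-4-4-4-12 hex digits
print(len(key.bytes) * 8)  # 128
```

No coordination with the database is needed, which is why this approach works where a sequence does not.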
KillrVideo.com
• Hosted on Azure
• Code on GitHub
• Also on your USB
• Data Model for examples
Entity Table
• Simple view of a single user
• UUID used for ID
• Simple primary key
// Users keyed by id
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
SELECT firstname, lastname
FROM users
WHERE userid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
CQL Collections
• Meant to be a dynamic part of a table
• Update syntax is very different from insert
• Reading a collection reads the whole collection
CQL Set
• Set is sorted by CQL type comparator
INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});
set_example set<text>
(Collection name: set_example. Collection type: set. CQL type: text.)
CQL Set Operations
• Adding an element to the set
UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;
• After adding this element, it will sort to the beginning.
UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;
• Removing an element from the set
UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;
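The three set operations above can be mimicked in plain Python to show the resulting order. This is a sketch of the semantics only, not of how Cassandra stores sets; the helper names `set_add` and `set_remove` are made up for the example, and Python's string ordering is assumed to match the text comparator for these ASCII values.

```python
def set_add(stored, new_elements):
    """Union the elements and return them sorted, as the set would read back."""
    return sorted(set(stored) | set(new_elements))


def set_remove(stored, gone):
    """Drop the given elements and return the remainder sorted."""
    return sorted(set(stored) - set(gone))


s = ['1-one', '2-two']
s = set_add(s, {'3-three'})     # appended in sort order
s = set_add(s, {'0-zero'})      # sorts to the beginning
s = set_remove(s, {'3-three'})
print(s)  # ['0-zero', '1-one', '2-two']
```

Adding the same element twice leaves the set unchanged, so set updates are safe to retry.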
CQL List
• Ordered by insertion
• Use with caution
list_example list<text>
(Collection name: list_example. Collection type: list. CQL type: text.)
INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);
CQL List Operations
• Adding an element to the end of a list
UPDATE collections_example
SET list_example = list_example + ['3-three']
WHERE id = 1;
• Adding an element to the beginning of a list
UPDATE collections_example
SET list_example = ['0-zero'] + list_example
WHERE id = 1;
• Deleting an element from a list
UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
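A small Python model of these list operations follows. It is a sketch of the semantics under the assumption that subtraction removes every occurrence of the listed values; the helper names are hypothetical. It also shows one commonly cited reason for the "use with caution" advice: unlike a set add, a retried append leaves a duplicate.

```python
def list_append(stored, new):
    """Model of list_example + [...]: new elements go to the end."""
    return stored + new


def list_prepend(stored, new):
    """Model of [...] + list_example: new elements go to the front."""
    return new + stored


def list_remove(stored, gone):
    """Model of list_example - [...]: drop every occurrence of the values."""
    return [x for x in stored if x not in gone]


items = ['1-one', '2-two']
items = list_append(items, ['3-three'])
items = list_prepend(items, ['0-zero'])

# A client that retries the same append ends up with the element twice.
retried = list_append(items, ['3-three'])
print(retried.count('3-three'))  # 2

items = list_remove(items, ['3-three'])
print(items)  # ['0-zero', '1-one', '2-two']
```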
CQL Map
• Key and value
• Key is sorted by CQL type comparator
INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });
map_example map<int,text>
(Collection name: map_example. Collection type: map. Key CQL type: int. Value CQL type: text.)
CQL Map Operations
• Add an element to the map
UPDATE collections_example
SET map_example[3] = 'three'
WHERE id = 1;
• Update an existing element in the map
UPDATE collections_example
SET map_example[3] = 'tres'
WHERE id = 1;
• Delete an element in the map
DELETE map_example[3]
FROM collections_example
WHERE id = 1;
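The map operations behave much like assignments on a Python dictionary, which the sketch below uses as a stand-in. Reading the entries back through `sorted` is an assumption made here to mirror the "key is sorted" bullet; the variable names are illustrative only.

```python
map_example = {1: 'one', 2: 'two'}

map_example[3] = 'three'   # add an element
map_example[3] = 'tres'    # same key again: the value is overwritten
after_update = sorted(map_example.items())

del map_example[3]         # delete one element by key
after_delete = sorted(map_example.items())

print(after_update)  # [(1, 'one'), (2, 'two'), (3, 'tres')]
print(after_delete)  # [(1, 'one'), (2, 'two')]
```

Add and update share one syntax because writing to an existing key simply replaces its value.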
Entity with collections
• Same type of entity
• SET type for dynamic data
• tags for each video
// Videos by id
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Index (or lookup) tables
• Table arranged to find data
• Denormalized for speed
• Find videos for a user
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Primary Key
• First column name is the Partition Key
• Subsequent are the Clustering Columns
• Videos will be ordered by added_date and videoId per user
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Primary key relationship
PRIMARY KEY (userId,added_date,videoId)
Partition Key: userId
Clustering Columns: added_date, videoId
[Diagram: partition A12378E55F5A32 holding cells for added_date 2005:12:1:10, 2005:12:1:9, 2005:12:1:8, 2005:12:1:7, with videoId values 5F22A0BC, F2B3652C, FFB3652D, 7AB3652C]
SELECT videoId FROM user_videos
WHERE userId = A12378E55F5A32
AND added_date = '2005-12-1'
AND videoId = 5F22A0BC;
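One way to picture the relationship is a two-level lookup: the partition key selects a partition, and the clustering columns select a row inside it. The Python sketch below is a mental model only, not Cassandra's storage engine; the `insert` and `select` helpers and the video name are hypothetical, and the key values are the abbreviated placeholders from the slide.

```python
# partition key -> {(clustering columns) -> remaining columns}
user_videos = {}


def insert(userid, added_date, videoid, name):
    """Place the row in its partition, addressed by the clustering columns."""
    partition = user_videos.setdefault(userid, {})
    partition[(added_date, videoid)] = {"name": name}


def select(userid, added_date, videoid):
    """Find the partition first, then the row inside it; None if absent."""
    return user_videos.get(userid, {}).get((added_date, videoid))


insert("A12378E55F5A32", "2005-12-1", "5F22A0BC", "example video")
print(select("A12378E55F5A32", "2005-12-1", "5F22A0BC"))
```

All rows for one user live in the same partition, which is what makes "find videos for a user" a single-partition read.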
Clustering Order
• Clustering columns default to ascending order
• Use CLUSTERING ORDER BY to specify a different order
• Bonus: Sorts on disk for speed
// One-to-many from user point of view (lookup table)
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
preview_image_location text,
PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
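The ordering requested by `CLUSTERING ORDER BY (added_date DESC, videoid ASC)` can be reproduced in Python with two stable sorts. The rows below are invented sample data (the ids `v1` to `v4` are not real video ids), so treat this as a sketch of the ordering rule rather than of the database itself.

```python
from datetime import datetime

# (added_date, videoid) pairs within one user's partition, in arrival order.
rows = [
    (datetime(2005, 12, 1, 8), "v2"),
    (datetime(2005, 12, 1, 9), "v3"),
    (datetime(2005, 12, 1, 9), "v1"),
    (datetime(2005, 12, 1, 7), "v4"),
]

# Sort by the secondary key first (videoid ASC), then by the primary key
# (added_date DESC). Python's sort is stable, so ties on added_date keep
# their videoid order.
by_videoid = sorted(rows, key=lambda r: r[1])
ordered = sorted(by_videoid, key=lambda r: r[0], reverse=True)

print([videoid for _, videoid in ordered])  # ['v1', 'v3', 'v2', 'v4']
```

Newest videos come first, and videos added at the same instant are ordered by id.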
Multiple Lookups
• Same data
• Different lookup pattern
// Index for tag keywords
CREATE TABLE videos_by_tag (
tag text,
videoid uuid,
added_date timestamp,
name text,
preview_image_location text,
tagged_date timestamp,
PRIMARY KEY (tag, videoid)
);
// Index for tags by first letter in the tag
CREATE TABLE tags_by_letter (
first_letter text,
tag text,
PRIMARY KEY (first_letter, tag)
);
Many to Many Relationships
• Two views
• Different directions
• Insert data in a batch
// Comments for a given video
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
// Comments for a given user
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
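The "two views" idea means every comment is written twice, once per lookup direction. The sketch below models that with two Python dictionaries; in CQL the two inserts would typically be wrapped in `BEGIN BATCH ... APPLY BATCH`. The `add_comment` helper is hypothetical, and `uuid.uuid1()` is used as a stand-in for a timeuuid.

```python
import uuid
from collections import defaultdict

comments_by_video = defaultdict(list)  # videoid -> newest-first comments
comments_by_user = defaultdict(list)   # userid  -> newest-first comments


def add_comment(videoid, userid, comment):
    """Write the same comment into both views, newest first."""
    commentid = uuid.uuid1()  # time-based UUID
    comments_by_video[videoid].insert(0, (commentid, userid, comment))
    comments_by_user[userid].insert(0, (commentid, videoid, comment))
    return commentid


video, user = uuid.uuid4(), uuid.uuid4()
first = add_comment(video, user, "First!")
second = add_comment(video, user, "Nice video")

print([c for _, _, c in comments_by_video[video]])
print([c for _, _, c in comments_by_user[user]])
```

Each read then touches only the view that matches its question, at the cost of keeping the two writes together.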
Use Case Example
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
weather_station text,
year int,
month int,
day int,
hour int,
temperature double,
PRIMARY KEY (weather_station,year,month,day,hour)
);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,7,-5.6);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,8,-5.1);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,9,-4.9);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,10,-5.3);
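To see how this key supports the needed queries, the sketch below loads the same four readings into a Python dictionary keyed first by station and then by (year, month, day, hour). It is a model of the data layout only; the helper names `insert`, `all_readings` and `reading_at` are assumptions for the example.

```python
temperature = {}  # weather_station -> {(year, month, day, hour) -> temperature}


def insert(station, year, month, day, hour, temp):
    """Station plus time identifies one reading; the same key overwrites."""
    temperature.setdefault(station, {})[(year, month, day, hour)] = temp


def all_readings(station):
    """All data for one weather station, in time order."""
    return sorted(temperature.get(station, {}).items())


def reading_at(station, year, month, day, hour):
    """Data for a single date and time, or None if nothing was stored."""
    return temperature.get(station, {}).get((year, month, day, hour))


for hour, temp in [(7, -5.6), (8, -5.1), (9, -4.9), (10, -5.3)]:
    insert("10010:99999", 2005, 12, 1, hour, temp)

print(all_readings("10010:99999"))
print(reading_at("10010:99999", 2005, 12, 1, 9))  # -4.9
```

Because station and time together form the key, a partition can hold as many readings as the station produces.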
Storage Model - Logical View
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999';
weather_station  hour          temperature
10010:99999      2005:12:1:7   -5.6
10010:99999      2005:12:1:8   -5.1
10010:99999      2005:12:1:9   -4.9
10010:99999      2005:12:1:10  -5.3
Storage Model - Disk Layout
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999';
Merged, Sorted and Stored Sequentially
10010:99999 | 2005:12:1:7 -5.6 | 2005:12:1:8 -5.1 | 2005:12:1:9 -4.9 | 2005:12:1:10 -5.3 | 2005:12:1:11 -4.9 | 2005:12:1:12 -5.4
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
Single seek on disk
Partition key for locality
10010:99999 | 2005:12:1:7 -5.6 | 2005:12:1:8 -5.1 | 2005:12:1:9 -4.9 | 2005:12:1:10 -5.3 | 2005:12:1:11 -4.9 | 2005:12:1:12 -5.4
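Because the readings in a partition are kept sorted by their clustering columns, a range query reduces to finding two positions and returning everything between them. The Python sketch below imitates that "slice" with the standard `bisect` module over the six readings shown in the diagram; the function name `slice_hours` is hypothetical, and this models the idea rather than Cassandra's on-disk format.

```python
from bisect import bisect_left, bisect_right

# One partition (station 10010:99999), already sorted by (year, month, day, hour).
keys = [(2005, 12, 1, h) for h in (7, 8, 9, 10, 11, 12)]
temps = [-5.6, -5.1, -4.9, -5.3, -4.9, -5.4]


def slice_hours(first_hour, last_hour):
    """Return the contiguous run of readings with first_hour <= hour <= last_hour."""
    lo = bisect_left(keys, (2005, 12, 1, first_hour))
    hi = bisect_right(keys, (2005, 12, 1, last_hour))
    return list(zip(keys[lo:hi], temps[lo:hi]))


for key, temp in slice_hours(7, 10):
    print(key, temp)
```

The result is one contiguous run of the sorted data, which is why the slide describes it as a single seek.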
Query patterns
• Range queries
• “Slice” operation on disk
Programmers like this
Sorted by event_time
weather_station  hour          temperature
10010:99999      2005:12:1:7   -5.6
10010:99999      2005:12:1:8   -5.1
10010:99999      2005:12:1:9   -4.9
10010:99999      2005:12:1:10  -5.3
SELECT weather_station,hour,temperature
FROM temperature
WHERE weather_station='10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
Thank you!
Bring the questions
Follow me on Twitter
@PatrickMcFadin
