Data Modeling Basics for the Cloud
Robert Stupp
Solutions Architect @ DataStax – Committer to Apache Cassandra
Data Modeling for the Cloud
DSE is the database for the cloud
1. Always On
2. Instantaneously Responsive
3. Numerous Endpoints
4. Geographically Distributed
5. Predictively Scalable
© 2016 DataStax, All Rights Reserved. 2
CC BY 2.0, by Blake Patterson on Flickr
100000 transactions per second
200000 transactions per second
Eventual Consistency
… is not hopefully consistent
[Diagram: Application, Consistency Level: ONE, Replication Factor 3; three replicas each holding "Some data"]
Quorum Consistency
[Diagram: Application, Consistency Level: QUORUM, Replication Factor 3; three replicas holding "Some data", with node states labelled UP and DOWN]
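The arithmetic behind QUORUM can be sketched in a few lines of Python. This is an illustrative calculation only, not driver code; the function names `quorum` and `overlaps` are made up for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """True if every read set shares at least one replica with every write set."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)
print(q)                    # majority of the 3 replicas
print(rf - q)               # replicas that may be DOWN while QUORUM still succeeds
print(overlaps(q, q, rf))   # QUORUM writes combined with QUORUM reads
print(overlaps(1, 1, rf))   # ONE writes combined with ONE reads
```

With Replication Factor 3, a QUORUM read and a QUORUM write always share at least one replica, which is why one node can be DOWN without the application seeing stale data.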
Write Path
[Diagram: Application writes "Some data" to a DSE / Cassandra node; inside the node: Memtable and Commit Log; on disk (Files): several SSTables]
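A toy model of this write path, as a rough sketch: every write is appended to a commit log and stored in the memtable, and a full memtable is flushed to an immutable SSTable. The class `ToyNode` and its `flush_threshold` parameter are hypothetical, and real nodes flush based on memory usage rather than a key count.

```python
class ToyNode:
    """Toy model of the write path; not how Cassandra is implemented internally."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N keys
        self.commit_log = []   # append-only, replayed after a node restart
        self.memtable = {}     # in memory, newest value per key
        self.sstables = []     # immutable "files", oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))   # survives a node restart
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data needs no replay


node = ToyNode(flush_threshold=3)
for i in range(7):
    node.write(f"key-{i}", f"some data {i}")

print(len(node.sstables))   # number of flushed SSTables
print(node.memtable)        # writes that are not flushed yet
```

Writes never modify an existing SSTable in this model; new data only ever produces more SSTables, which is what makes compaction necessary.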
Compaction
[Diagram: four SSTables are compacted into one SSTable]
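Compaction can be pictured as a merge that keeps only the newest version of each key. The sketch below assumes a simplified SSTable model (a dict of key to `(write_timestamp, value)`); the function name `compact` and the sample data are illustrative, and tombstones and TTLs are left out.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest cell per key.

    Each SSTable is modelled as a dict: key -> (write_timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return dict(sorted(merged.items()))


sstables = [
    {"a": (1, "old a"), "b": (1, "b")},
    {"a": (2, "new a"), "c": (2, "c")},
    {"d": (3, "d")},
    {"b": (4, "newer b")},
]
compacted = compact(sstables)
print(compacted)
```

Four input tables become one, and the overwritten versions of `a` and `b` disappear from disk.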
Compaction Strategies
• Size Tiered
• Leveled
• Date Tiered
Data Organization in DSE / Cassandra

Device ID        Timestamp          Temperature  Humidity
01-32483-17383   2016-04-19 14:00   22           70
01-32483-17383   2016-04-19 15:00   21.5         65
01-32483-17383   2016-04-19 16:00   23.0         70

Partition Key: Device ID
Clustering Key: Timestamp
Primary Key: Device ID + Timestamp (Partition Key + Clustering Key)
Columns: Temperature, Humidity
Partition: all rows that share the same Device ID
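One way to picture this layout is a map from partition key to rows ordered by clustering key. The following is a toy in-memory model, not the storage engine; `insert`, `select_by_device`, and the second device ID are assumptions made for illustration.

```python
from collections import defaultdict

# partition key (device_id) -> {clustering key (timestamp) -> other columns}
partitions = defaultdict(dict)


def insert(device_id, timestamp, temperature, humidity):
    # Same primary key (device_id, timestamp) overwrites the existing row.
    partitions[device_id][timestamp] = (temperature, humidity)


def select_by_device(device_id):
    """Single-partition read: rows come back ordered by clustering key."""
    rows = partitions.get(device_id, {})
    return [(ts, *cols) for ts, cols in sorted(rows.items())]


insert("01-32483-17383", "2016-04-19 16:00", 23.0, 70)
insert("01-32483-17383", "2016-04-19 14:00", 22.0, 70)
insert("01-32483-17383", "2016-04-19 15:00", 21.5, 65)
insert("02-00000-00001", "2016-04-19 14:00", 19.0, 55)   # hypothetical second device

rows = select_by_device("01-32483-17383")
for row in rows:
    print(row)
```

A query that names the partition key touches exactly one entry of `partitions`; anything else would have to walk all of them.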
Data Modeling 101
1. Understand your data
Conceptual data modeling
2. Collect queries
Understand your application
3. Model according to queries
Logical data modeling
4. Apply optimizations
Physical data modeling
Query driven modeling
1. Collect your use cases
2. Extract queries
3. Model your tables
Queries, yes
SELECT timestamp, temperature, humidity
FROM sensor_data
WHERE sensor_id = '01-32483-17383'
Always include the Partition Key
Some standard use-cases
• Customer registration
• Customer login
• Delivery addresses
Customer registration
1. Check if customer exists
query by username
CREATE TABLE customers (
username text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
email text
);
SELECT username FROM customers WHERE username = ?
Customer login by username
1. Check if user exists and password matches
query by username
CREATE TABLE customers (
username text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
email text
);
SELECT password_hash FROM customers WHERE username = ?
Customer login by email
1. Check if user exists and password matches
query by email
CREATE TABLE customers (
username text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
email text
);
SELECT password_hash FROM customers WHERE email = ?
InvalidRequest: code=2200 [Invalid query]
message="Cannot execute this query as it might
involve data filtering and
thus may have unpredictable performance.
Customer login by email
1. Check if user exists and password matches
query by email
CREATE TABLE customers_by_email (
email text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
username text
);
SELECT password_hash FROM customers_by_email WHERE email = ?
This works
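The trade-off of a second table keyed by email can be sketched with two dictionaries: every registration writes twice, and the lookup by email becomes a direct key access instead of a scan. This is a toy model; the function names, the email address, and the hash value are placeholders.

```python
customers = {}            # keyed by username
customers_by_email = {}   # keyed by email, same data duplicated


def register(username, email, password_hash, first_name, last_name):
    row = {
        "username": username,
        "email": email,
        "password_hash": password_hash,
        "first_name": first_name,
        "last_name": last_name,
    }
    customers[username] = row                 # write 1
    customers_by_email[email] = dict(row)     # write 2


def password_hash_by_email(email):
    """Direct lookup by the second table's partition key."""
    row = customers_by_email.get(email)
    return row["password_hash"] if row else None


def password_hash_by_email_scan(email):
    """What the rejected query would need: inspect every customer."""
    for row in customers.values():
        if row["email"] == email:
            return row["password_hash"]
    return None


register("snazy", "robert@example.com", "not-a-real-hash", "Robert", "Stupp")
print(password_hash_by_email("robert@example.com"))
```

Both functions return the same answer, but only the first one does a constant amount of work as the number of customers grows.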
Modeling delivery addresses
CREATE TABLE customer_addresses (
username text,
address_type text,
street text,
zip text,
city text,
PRIMARY KEY ( username, address_type )
);
SELECT street,zip,city FROM customer_addresses WHERE username = ?;
SELECT street,zip,city FROM customer_addresses
WHERE username = ? AND address_type = ?;
Modeling delivery addresses
1. Print delivery address label
query user by username
query delivery address by user and type
SELECT first_name, last_name FROM customers WHERE username = ?;
SELECT street,zip,city FROM customer_addresses
WHERE username = ? AND address_type = ?;
This works,
But it’s not great.
Modeling delivery addresses
CREATE TYPE delivery_address (
street text,
zip text,
city text
);
CREATE TABLE customers (
username text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
email text,
delivery_addrs map<text, frozen<delivery_address>>
);
SELECT first_name, last_name, delivery_addrs
FROM customers WHERE username = ?;
Just 1 read
Customer registration – the problem
Client 1:
SELECT username FROM customers
WHERE username = ?
(no results)
INSERT INTO customers
(username, first_name, last_name)
VALUES
('snazy', 'Robert', 'Stupp')
(success)
Client 2:
SELECT username FROM customers
WHERE username = ?
(no results)
INSERT INTO customers
(username, first_name, last_name)
VALUES
('snazy', 'Not', 'Robert')
(success)
Both inserts succeed: one wins, the other gets overwritten.
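The race can be reproduced with a toy in-memory table in which a plain insert is an upsert. The interleaving is written out sequentially here; in this sketch the second insert overwrites the first, whereas in Cassandra the write with the higher timestamp wins. All function names are illustrative.

```python
customers = {}


def exists(username):
    return username in customers


def insert(username, first_name, last_name):
    customers[username] = (first_name, last_name)   # a plain INSERT is an upsert
    return "success"


# Interleaving from the slide: both clients check first, then both insert.
client_1_sees_free = not exists("snazy")
client_2_sees_free = not exists("snazy")

results = []
if client_1_sees_free:
    results.append(insert("snazy", "Robert", "Stupp"))
if client_2_sees_free:
    results.append(insert("snazy", "Not", "Robert"))

print(results)
print(customers["snazy"])
```

Both clients are told their registration succeeded, yet only one set of names survives.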
Customer registration – the solution
Client 1:
SELECT username FROM customers
WHERE username = ?
(no results)
INSERT INTO customers …
IF NOT EXISTS
→ [applied] = true: OK
Client 2:
SELECT username FROM customers
WHERE username = ?
(no results)
INSERT INTO customers …
IF NOT EXISTS
→ [applied] = false: Sorry, dude
Customer registration – the even better solution
Client 1:
INSERT INTO customers …
IF NOT EXISTS
→ [applied] = true: OK
Client 2:
INSERT INTO customers …
IF NOT EXISTS
→ [applied] = false: Sorry, dude
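The semantics of a conditional insert can be sketched as an atomic check-and-write that reports whether it was applied. In this toy model a `threading.Lock` stands in for the coordination a lightweight transaction performs across replicas; `insert_if_not_exists` is a made-up name, not a driver call.

```python
import threading

customers = {}
_lock = threading.Lock()   # stand-in for the cluster-wide agreement of an LWT


def insert_if_not_exists(username, first_name, last_name):
    """Return True ([applied] = true) only for the first writer of a username."""
    with _lock:
        if username in customers:
            return False
        customers[username] = (first_name, last_name)
        return True


applied_1 = insert_if_not_exists("snazy", "Robert", "Stupp")
applied_2 = insert_if_not_exists("snazy", "Not", "Robert")
print(applied_1, applied_2)
print(customers["snazy"])
```

Because the check and the write happen in one step, the separate SELECT before the insert adds nothing and can be dropped.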
Customer login by email – w/ DSE 5.0
1. Check if user exists and password matches
query by email
CREATE TABLE customers (
username text PRIMARY KEY,
password_hash text,
first_name text,
last_name text,
email text
);
CREATE MATERIALIZED VIEW customers_by_email AS
SELECT email, username, first_name, last_name, password_hash
FROM customers
WHERE email IS NOT NULL
PRIMARY KEY ( email, username );
SELECT password_hash FROM customers_by_email WHERE email = ?;
May the node be with you!
Robert Stupp
Solutions Architect @ DataStax, Committer to Apache Cassandra
robert.stupp@datastax.com
@snazy

Editor's Notes

  • #3: Frankly, "the cloud" started with... the iPhone. Think the "cloud way". Nothing is worse than customers not reaching your service: you lose money. Users' apps are always on, and so should your database be. Answers must come really quickly (https://www.nngroup.com/articles/website-response-times/): 0.1 seconds gives the feeling of instantaneous response; 1 second keeps the user's flow of thought seamless; 10 seconds keeps the user's attention. Number of different devices: Netflix example. Network latency, bring the data to the users (http://www.verizonenterprise.com/about/network/latency/): 45 ms within the US; 30 ms within Europe; 90 ms London – New York; 160 ms trans-Pacific; 250 ms Europe – Asia. CANNOT BEAT THAT: IT'S BARE PHYSICS. Add more nodes for more transactions and more data.
  • #4: EC means: there is a time gap between the first write and the data being available on all replicas. MENTION: replication to other data centers.
  • #5: QUORUM means MAJORITY. MAJORITY of 3 is 2. AFTER: mention LOCAL_ONE, LOCAL_QUORUM, TWO, THREE, ALL, EACH_QUORUM.
  • #6: 1. DSE write path. 2. DSE node. 3. Memtable. 4. Commit Log. 5. Application wants to write some data. 6. ... goes to the memtable. 7. ... written to CL (node restart). --------- 8. (hint: SSTables). 9. Much data written over time --> memtable grows. 10. Memtable flushed to SSTable. 11. More SSTables (ANIMATED!).
  • #7: Take some similar-sized SSTables and compact them into one bigger one. That's STCS.
  • #8: STCS (size tiered compaction strategy): default; multiple, similar-sized SSTables compacted into one. LCS (leveled compaction strategy): many writes to the same partitions; works fine with SSDs. DTCS (date tiered compaction strategy): time series data; TTL'd data, never overwritten; old SSTables can just be dropped.
  • #9: MENTION: Keyspace, Table. Partition Key: determines the replica nodes. Clustering Key: identifies the CQL row in the partition. MENTION: size restrictions.
  • #10: 1) “Logical” means: entities; relations between entities. 2) When you know what you ask for, you know: your queries; the workflow of your application; the data you really need. 3) Combine conceptual model and queries: declare tables and their keys; add additional views to tables. 4) Depending on the workload: add bucketing (split partitions logically); choose the “right” compaction strategy; consider TTLs.
  • #13: Guide through some standard use cases
  • #16: Partition key not included --> does not know the nodes to ask.
  • #17: Needs two writes: to the "customers" table and to the "customers_by_email" table.
  • #18: (THE NAIVE, RELATIONAL WAY) Query all addresses; query address by user and type. Access by partition key --> fine.
  • #19: That's relational. That's a client-side join.
  • #20: EXPLAIN: UDTs. EXPLAIN: collections. MENTION: frozen. Just ONE read, not TWO reads as before.
  • #21: Registration pre-check: THE NAIVE WAY. CLASSICAL RACE CONDITION.
  • #22: LWT. MENTION: expensive (Paxos).
  • #23: Pre-checks w/ read not necessary.
  • #24: RECALL: the customers table. RECALL: the customers_by_email table.