C* Keys: Partitioning, Clustering, & CrossFit
Adam Hutson - Data Architect, DataScale Inc.
© DataStax, All Rights Reserved.
Who am I & What do we do?
Adam Hutson
Data Architect @ DataScale -> www.datascale.io
DataStax MVP for Apache Cassandra
DataScale provides hosted data platforms as a service
Offering Cassandra & Spark, with more to come
Currently hosted in Amazon & Azure
Overview
1 Why
2 Partition
3 Partition Key
4 Composite Partition Key
5 Clustering Columns
Why give this presentation?
Partitioning & Clustering should be the foundation.
Too often glossed over.
Has the biggest impact on the performance of the cluster.
Partition
Partition Explained
• Token values can range from -2^63 to 2^63-1.
• Nodes in the cluster/ring are assigned a single
token.
• A node is responsible for its own token value and for
the range extending back to the previous node’s token.
• A Partitioner decides where a partition key
maps onto the cluster/ring.
Node #3 is responsible for tokens
from -1844674407370955162
to -5534023222112865485
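One way to see the partitioner at work is CQL's built-in token() function, which returns the token computed for a row's partition key. The sketch below assumes the default Murmur3Partitioner; the table gym_member, its columns, and the sample values are hypothetical and are not from the original slides.

```cql
-- Hypothetical table for illustration only.
CREATE TABLE gym_member (
    member text PRIMARY KEY,   -- single-column partition key
    home_gym text
);

INSERT INTO gym_member (member, home_gym) VALUES ('adam', 'CrossFit Example');

-- token(member) shows where each partition key lands on the ring.
-- With Murmur3Partitioner the value falls between -2^63 and 2^63-1.
SELECT member, token(member) FROM gym_member;
```

In the single-token picture described here, the node whose range contains the returned value is the one responsible for that partition.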
Partition Key
Partition Key Explained
The Partition Key is:
• responsible for distribution of data amongst the nodes
• the first column defined in the PRIMARY KEY
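As a minimal sketch of these two points, the hypothetical table below (its name, columns, and sample value are assumptions, not taken from the slides) puts gym_name first in the PRIMARY KEY, so gym_name is the partition key and decides which node stores each row.

```cql
-- Hypothetical table for illustration only.
CREATE TABLE crossfit_gyms (
    gym_name text,
    city text,
    state_province text,
    PRIMARY KEY (gym_name)     -- first (and only) column: the partition key
);

-- A read that supplies the partition key can be routed straight to
-- the node(s) owning that partition.
SELECT * FROM crossfit_gyms WHERE gym_name = 'CrossFit Example';
```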
Composite Partition Key
Composite Partition Key Explained
Using multiple columns for the token hash value.
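In CQL, a composite partition key is written as an extra pair of parentheses inside the PRIMARY KEY. In this sketch (table name, column names, and sample values are hypothetical), the three location columns are hashed together into one token, and gym_name is left as a clustering column.

```cql
-- Hypothetical table for illustration only.
CREATE TABLE crossfit_gyms_by_location (
    country_code text,
    state_province text,
    city text,
    gym_name text,
    -- Inner parentheses mark the composite partition key.
    PRIMARY KEY ((country_code, state_province, city), gym_name)
);

-- Every partition key column must be supplied to locate the partition.
SELECT gym_name FROM crossfit_gyms_by_location
WHERE country_code = 'USA'
  AND state_province = 'VA'
  AND city = 'Arlington';
```

If any of the three columns is left out of the WHERE clause, the token cannot be computed, so the query is normally rejected unless filtering is explicitly allowed.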
Clustering Columns
Clustering Columns Explained
Clustering Columns are:
• responsible for sorting within the partition
• any column in the PRIMARY KEY after the partition key
(past the first column, when the partition key is a single column)
Clustering Columns Explained
Can be used for Hierarchical structured data.
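A sketch of the hierarchical case, using hypothetical table, column, and sample names: country_code is the partition key, and the remaining PRIMARY KEY columns are clustering columns that sort rows from the broadest level to the narrowest.

```cql
-- Hypothetical table for illustration only.
CREATE TABLE crossfit_gyms_by_hierarchy (
    country_code text,
    state_province text,
    city text,
    gym_name text,
    PRIMARY KEY (country_code, state_province, city, gym_name)
);

-- Clustering columns can be restricted left to right, so each query
-- drills one level deeper into the same partition.
SELECT * FROM crossfit_gyms_by_hierarchy
WHERE country_code = 'USA';

SELECT * FROM crossfit_gyms_by_hierarchy
WHERE country_code = 'USA' AND state_province = 'VA';

SELECT * FROM crossfit_gyms_by_hierarchy
WHERE country_code = 'USA' AND state_province = 'VA' AND city = 'Arlington';
```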
Clustering Columns Explained
Can be used for Time Series structured data.
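CREATE TABLE member_log (
    member text,               -- partition key: one partition per member
    workout_date timestamp,    -- clustering column: sorts rows within the partition
    workout_duration text,
    PRIMARY KEY (member, workout_date)
) WITH CLUSTERING ORDER BY (workout_date DESC);  -- newest workouts first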
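Against the member_log table, typical time series reads might look like the following sketch. The member name and dates are made-up sample values.

```cql
-- Latest five workouts: rows are already stored newest first,
-- so no ORDER BY is needed.
SELECT workout_date, workout_duration
FROM member_log
WHERE member = 'adam'
LIMIT 5;

-- Range slice on the clustering column within one partition.
SELECT workout_date, workout_duration
FROM member_log
WHERE member = 'adam'
  AND workout_date >= '2016-09-01 00:00:00+0000'
  AND workout_date <  '2016-10-01 00:00:00+0000';
```

Both reads touch a single partition, and the clustering order means the rows come back already sorted by workout_date.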
Thank You!
Questions?
Adam Hutson @AdamHutson
adam@datascale.io @DataScaleInc

