Migrating SQL Schemas
for ScyllaDB:
Data Modeling Best Practices
Pascal Desmarets
Founder & CEO
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Practices
Pascal Desmarets
■ Married, father of 2 boys in business school
■ Passionate about data, technology, and doing things right
■ Avid sailboat racer, preferably offshore
Founder & CEO
Why is Data Modeling a key success factor?
Data Modeling is a Key Success Factor
Data models and schemas are perhaps the
most important part of developing software,
because they have such a profound effect:
■ not only on how the software is written,
■ but also on how we think about the
problem that we are solving.
Martin Kleppmann,
Designing Data-Intensive Applications
Data Modeling for ScyllaDB
The ideal ScyllaDB application has the following characteristics
■ Writes exceed reads by a large margin
■ Data is rarely updated, and when updates are made, they are idempotent (the
result of a successfully performed operation is independent of the number of
times it is executed)
■ Read access is by a known primary key
■ Data can be partitioned via a key that allows the database to be spread evenly
across multiple nodes
■ There is no need for joins or aggregates
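The idempotency requirement above can be illustrated with a small sketch. This is plain Python, not ScyllaDB driver code: the dictionary `table` and the helpers `set_status` and `increment_views` are hypothetical stand-ins for a table and two kinds of writes. It only shows that a retried "set" leaves the same state, while a retried "increment" does not.

```python
# Hypothetical in-memory stand-in for a table keyed by primary key.
table = {}

def set_status(order_id, status):
    """Idempotent write: replaying it leaves the same final state."""
    table[order_id] = {"status": status}

def increment_views(counter, page_id):
    """Non-idempotent write: every replay changes the result."""
    counter[page_id] = counter.get(page_id, 0) + 1

# Simulate a client that sends each write three times (e.g. retries after timeouts).
views = {}
for _ in range(3):
    set_status("order-1", "shipped")
    increment_views(views, "page-1")

print(table["order-1"])  # the same row regardless of the number of retries
print(views["page-1"])   # grows with every retry
```

Writes of the first kind can be retried safely, which is why they suit a model where updates are rare and replayable.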
Excellent ScyllaDB Use Cases
■ Transaction logging: purchases, test scores, movies watched and movie latest location
■ Recommendation and personalization engines
■ Fraud detection
■ Tracking pretty much anything including order status, packages, etc
■ Storing time series data (as long as you do your own aggregates)
• Health tracker data
• Weather service history
• Internet of things status and event history
• Sensor data in general
■ Messaging systems: chats, collaboration, and instant messaging apps, etc
It may be misleading that…
■ ScyllaDB tables look like RDBMS tables
■ CQL looks like SQL
Differences between ScyllaDB and relational databases
■ Denormalization is expected
■ Writes are (almost) free
■ No DB-level joins
■ No referential integrity
■ Indexing useful in specific circumstances
Mindshift from application-agnostic to
application-specific modeling
[Figure: two modeling flows. Relational: Data → Data Model → Application. NoSQL: Application Design → Access patterns & Queries → Data Model → Data.]
ScyllaDB Data Model Principles (1 of 3)
■ Keyspace: container for tables in a ScyllaDB data model
■ Table: container for an ordered collection of rows
■ Rows: made of a primary key plus an ordered set of columns, themselves
made of name/value pairs.
■ No need to store a value for every column each time a new row is stored.
ScyllaDB Data Model Principles (2 of 3)
■ Primary key: a composite made of a partition key plus an optional set of
clustering columns.
• Partition key: responsible for data distribution across the nodes. It determines which node
will store a given row. It can be one or more columns.
• Clustering columns: responsible for sorting the rows within the partition. There can be zero or
more of them.
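A minimal sketch of how a partition key decides placement, under loud assumptions: the node list `NODES`, the helper names `token` and `node_for`, the use of MD5, and the modulo mapping are all illustrative choices, not a description of ScyllaDB's actual partitioner or token ring. The only property it demonstrates is that rows sharing a partition key always land on the same node.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster

def token(partition_key: str) -> int:
    """Hash the partition key to a stable integer token.
    Assumption: MD5 is used only to get a deterministic illustration."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(partition_key: str) -> str:
    """Map a token to a node with a simple modulo (a simplification of a token ring)."""
    return NODES[token(partition_key) % len(NODES)]

# (partition key, clustering value) pairs
rows = [
    ("sensor-1", "2022-02-09T10:00"),
    ("sensor-1", "2022-02-09T10:05"),
    ("sensor-2", "2022-02-09T10:00"),
]

# Group rows by the node their partition key maps to.
placement = {}
for sensor_id, ts in rows:
    placement.setdefault(node_for(sensor_id), []).append((sensor_id, ts))

print(placement)
```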
ScyllaDB Data Model Principles (3 of 3)
■ Data type: defined to constrain the values stored in a column. Data types include character and
numeric types, collections, and user-defined types. A column also has other attributes:
timestamps and time-to-live.
■ Secondary index: an index on any column that is not part of the primary key. Secondary indexes
are not recommended on columns with high cardinality or very low cardinality, or on columns that
are frequently updated or deleted.
■ Joins: cannot be performed at the database level. If there is need for a join, either it must be
performed at the application level, or preferably, the data model should be adapted to create a
denormalized table that represents the join results.
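The two options for joins can be contrasted in a short sketch. It uses plain Python dictionaries rather than a database, and the data and the names `owners`, `pets`, and `pets_by_owner` are hypothetical. The first function performs the join in the application at read time; the second materializes the join result at write time, keyed by the value the query filters on.

```python
# Hypothetical normalized data, as it might look after a straight SQL port.
owners = {1: {"name": "Ana"}, 2: {"name": "Ben"}}
pets = [
    {"pet_id": 10, "owner_id": 1, "name": "Rex"},
    {"pet_id": 11, "owner_id": 1, "name": "Tom"},
    {"pet_id": 12, "owner_id": 2, "name": "Kit"},
]

def pets_with_owner_app_join(owner_id):
    """Application-level join: an owner lookup plus a scan over pets."""
    owner = owners[owner_id]
    return [
        {"owner_name": owner["name"], "pet_name": p["name"]}
        for p in pets
        if p["owner_id"] == owner_id
    ]

def build_pets_by_owner():
    """Denormalized table: the join result is stored, keyed by owner_id."""
    table = {}
    for p in pets:
        row = {"owner_name": owners[p["owner_id"]]["name"], "pet_name": p["name"]}
        table.setdefault(p["owner_id"], []).append(row)
    return table

pets_by_owner = build_pets_by_owner()

# The read is now a single lookup by key, with no join work at query time.
print(pets_by_owner[1])
```

The cost of the second approach is duplicated data (the owner name is repeated per pet), which is the trade that denormalization accepts.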
Data modeling for ScyllaDB is a
balancing act
■ Two primary rules of data modeling in ScyllaDB:
• each partition should have roughly the same amount of data
• read operations should access as few partitions as possible, ideally only one
■ The two data modeling principles often conflict, therefore you have to find a
balance between the two based on domain understanding and business needs
■ Anticipate growth: a data model that makes sense at a particular
transaction volume may no longer make sense when that volume is multiplied 100x or 1000x
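The tension between the two rules can be made concrete with a small calculation. This is a sketch on an invented workload (one busy sensor and three quiet ones); the helper names `partition_sizes`, `largest`, and `partitions_touched` are hypothetical. It compares two candidate partition keys by the size of the largest partition and by how many partitions a "all readings for sensor s1" query has to touch.

```python
from collections import Counter

def partition_sizes(rows, key_fn):
    """Count rows per partition for a candidate partition key."""
    return Counter(key_fn(r) for r in rows)

def largest(sizes):
    """Row count of the biggest partition."""
    return max(sizes.values())

def partitions_touched(sizes, sensor):
    """Partitions a query for all rows of one sensor must read."""
    return sum(1 for key in sizes if key[0] == sensor)

# Hypothetical workload: s1 writes 40 rows per hour of day, the others 1 row per hour.
rows = [{"sensor": "s1", "hour": h % 24} for h in range(960)]
rows += [{"sensor": s, "hour": h} for s in ("s2", "s3", "s4") for h in range(24)]

by_sensor = partition_sizes(rows, lambda r: (r["sensor"],))
by_sensor_hour = partition_sizes(rows, lambda r: (r["sensor"], r["hour"]))

print(largest(by_sensor), partitions_touched(by_sensor, "s1"))
print(largest(by_sensor_hour), partitions_touched(by_sensor_hour, "s1"))
```

Adding the hour to the key shrinks the largest partition, but the same query now fans out across many partitions. Which side to favor depends on the access patterns and on expected growth.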
Data modeling in practice
5 steps to a data model
■ Step 1: Build the application workflow
■ Step 2: Model the queries required by the application
■ Step 3: Create the tables
■ Step 4: Get the primary key right
■ Step 5: Use data types effectively
■ Example derived from
https://care-pet.docs.scylladb.com/master/design_and_data_model.html
Step 1: Build the application workflow
Step 2a: Model the queries required by the application
Step 2b: identify attributes for each entity
Step 3: Create the tables
■ In ScyllaDB, tables can be grouped into two distinct categories:
• Tables with single-row partitions:
  • tables for which the primary key is also the partition key
  • used to store entities, and usually normalized
  • should be named after the entity for clarity (e.g., pet or owner)
• Tables with multi-row partitions:
  • tables with primary keys composed of partition and clustering keys
  • used to store relationships and related entities (remember: ScyllaDB doesn't support joins,
    so developers need to structure tables to support queries that relate to multiple data items)
  • should be given meaningful names so that people examining the schema can understand the
    purpose of each table (e.g., sensor, measurement)
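The two table categories above can be mimicked with ordinary Python containers. This is an analogy only, not CQL: `owner` and `measurement` borrow the example table names, while the helper functions and sample values are hypothetical. A single-row partition behaves like a dictionary entry; a multi-row partition behaves like a list kept sorted by the clustering key, which makes range reads inside one partition cheap.

```python
from bisect import insort

# Single-row partitions: primary key == partition key, one row per key.
owner = {}        # owner_id -> row

# Multi-row partitions: partition key -> rows kept sorted by clustering key (ts).
measurement = {}  # sensor_id -> sorted list of (ts, value)

def insert_owner(owner_id, name):
    owner[owner_id] = {"name": name}

def insert_measurement(sensor_id, ts, value):
    # insort keeps the list ordered by tuple comparison, i.e. by ts first.
    insort(measurement.setdefault(sensor_id, []), (ts, value))

def measurements_between(sensor_id, start, end):
    """Range query on the clustering key within a single partition."""
    return [(ts, v) for ts, v in measurement.get(sensor_id, []) if start <= ts <= end]

insert_owner("o1", "Ana")
insert_measurement("s1", 30, 38.6)
insert_measurement("s1", 10, 38.4)
insert_measurement("s1", 20, 38.5)

print(measurements_between("s1", 10, 20))
```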
Step 4: Get the primary key right
■ The primary key is made up of
• a partition key. For most applications, this should be a unique key (UUID or custom)
• followed by zero or more clustering columns that control how rows are laid out within a
ScyllaDB partition
■ Getting the primary key right for each table is one of the most crucial aspects
of designing a good data model
■ Remember the two primary rules of data modeling in ScyllaDB:
• each partition should have roughly the same amount of data
• read operations should access as few partitions as possible, ideally only one
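One reason the primary key matters so much is that a write with an existing primary key replaces the earlier row instead of being rejected. The sketch below imitates that behavior with a Python dictionary; the `insert` helper and the sample readings are hypothetical. It compares a key that is not unique per reading with one that is.

```python
def insert(table, primary_key, row):
    """Mimic upsert semantics: a write with an existing primary key replaces the row."""
    table[primary_key] = row

readings = [
    {"sensor": "s1", "ts": 100, "value": 1.0},
    {"sensor": "s1", "ts": 200, "value": 2.0},
    {"sensor": "s2", "ts": 100, "value": 3.0},
]

# Candidate A: primary key = (sensor). Not unique per reading.
by_sensor = {}
for r in readings:
    insert(by_sensor, (r["sensor"],), r)

# Candidate B: primary key = (sensor, ts). Unique per reading.
by_sensor_ts = {}
for r in readings:
    insert(by_sensor_ts, (r["sensor"], r["ts"]), r)

print(len(by_sensor), len(by_sensor_ts))
```

With candidate A the first reading of `s1` is silently lost; with candidate B all three readings survive.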
Step 5: Use data types effectively
■ String: ascii, text, varchar, inet
■ Numeric: int, bigint, smallint, tinyint, varint,
counter, decimal, double, float
■ UUIDs: uuid, timeuuid
■ Miscellaneous: Boolean, blob
■ Date/time: timestamp, date, time, duration
■ Geospatial
■ Collections: list, map, set, tuple, nested
■ User-Defined Types (UDT)
Collections
■ List: ordered collection of one or more elements
■ Set: unordered collection of one or more unique elements
■ Map: collection of arbitrary key-value pairs
■ Tuple: holds fixed-length sets of typed positional fields
■ Frozen: serialization of multiple components into a single value. Updates to
individual fields are not possible; the value is treated as a blob, which makes it
possible to nest collections
■ User-Defined Type: re-usable set of multiple fields of related information,
e.g. an address
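As a rough analogy only (Python, not CQL), a frozen value behaves like an immutable record: a single field cannot be changed in place, the whole value has to be rewritten, and because it is immutable it can be nested inside another collection. The `Address` class and its fields are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Address:  # hypothetical UDT-like value
    street: str
    city: str

home = Address("1 Main St", "Ghent")

try:
    home.city = "Brussels"      # in-place field update is rejected
    updated_in_place = True
except AttributeError:          # FrozenInstanceError is a subclass of AttributeError
    updated_in_place = False

# The whole value is rewritten instead.
moved = replace(home, city="Brussels")

# Being immutable and hashable, it can be stored inside a set.
addresses = {home, moved}

print(updated_in_place, moved)
```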
Best Practices
■ A single table per query
■ Use denormalization to avoid joins
■ Ensure that the choice of primary key guarantees uniqueness
■ Break up large partitions into buckets
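Bucketing can be sketched as a change to how the partition key is built. In the example below, the day-level bucket granularity and the helper names `bucketed_key` and `keys_for_range` are assumptions made for illustration; the right bucket size depends on write volume. A time bucket is appended to the natural key so that no single partition grows without bound, and a reader computes the list of buckets that cover the requested range.

```python
from datetime import date, datetime, timedelta

def bucketed_key(sensor_id, ts):
    """Partition key = (sensor_id, day bucket) instead of sensor_id alone."""
    return (sensor_id, ts.date().isoformat())

def keys_for_range(sensor_id, first_day, last_day):
    """Partitions a reader must query to cover an inclusive range of days."""
    days = (last_day - first_day).days + 1
    return [
        (sensor_id, (first_day + timedelta(days=i)).isoformat())
        for i in range(days)
    ]

key = bucketed_key("s1", datetime(2022, 2, 9, 14, 30))
keys = keys_for_range("s1", date(2022, 2, 8), date(2022, 2, 10))

print(key)
print(keys)
```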
Migrating relational database structures to ScyllaDB
[Figure: side-by-side panels labeled RDBMS and ScyllaDB]
Benefits of data modeling
■ While traditional data modeling may be perceived to get in
the way of development and take too much time…
■ Next-gen data modeling tools such as Hackolade are
recognized to:
• facilitate Agile development
• reduce development time
• increase application quality
• implement consistent definitions of data
• improve data quality
• enable better data governance and compliance
• facilitate documentation and communication
To leverage the dynamic schema of ScyllaDB, data
modeling turns out to be even more important than
with relational databases
Thank you!
Stay in touch
Pascal Desmarets
@Hackolade
pascal.desmarets@hackolade.com
