World’s Best Data Modeling Tool
for Apache Cassandra
© 2015. All Rights Reserved.
Artem Chebotko, Andrey Kashlev
1 Cassandra Data Modeling Methodology
2 The KDM Tool
3 Live Demo: IoT
4 Live Demo: Media Cataloguing
5 Future Work
Data Modeling Process
• Data requirements
• Application requirements
• Schema Design
• Optimization
Cassandra Data Modeling Methodology
Conceptual Data Model, Application Workflow → (Mapping) → Logical Data Model → (Optimization) → Physical Data Model
Methodology Models
Model | Representation
Conceptual Data Model | ERD
Application Workflow Model | Graph
Logical Data Model | Chebotko Diagram
Physical Data Model | Chebotko Diagram, CQL
Methodology Protocols
• Conceptual-to-logical mapping
– Mapping rules
– Mapping patterns
• Physical optimizations
– Partition size analysis
– Duplication factor analysis
– Keys, aggregation, transactions, …
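Partition size analysis can be illustrated with a small estimate of how many values one partition stores. The formula below is not on the slides; it is the value-count estimate as recalled from the methodology's published description, so treat it as an assumption. The function name, its parameters, and the sample numbers are all hypothetical.

```python
def values_per_partition(rows, columns, primary_key_columns, static_columns=0):
    """Estimate the number of values stored in one partition.

    Assumed formula (not taken from the slides):
        N_v = N_r * (N_c - N_pk - N_s) + N_s
    where N_r is rows per partition, N_c the total column count,
    N_pk the primary key column count and N_s the static column count.
    """
    regular_columns = columns - primary_key_columns - static_columns
    return rows * regular_columns + static_columns


# Hypothetical table: 5 columns, 4 of them in the primary key, no static
# columns, and 86,400 rows per partition (an invented figure).
estimate = values_per_partition(rows=86_400, columns=5, primary_key_columns=4)
print(estimate)
```

An estimate like this is only a first check; a partition that looks too large would then be split by adding another column to the partition key.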
Example
SELECT timestamp, value FROM …
WHERE location = ? AND parameter = ? AND timestamp > ?
ORDER BY timestamp DESC
Conceptual data model: Sensor records Measurement (1:n); attributes: id, location, timestamp, parameter, value
Mapping rules applied:
1 Mapping Entity and Relationship Types
2 Mapping Equality Search Attributes
3 Mapping Inequality Search Attributes
4 Mapping Ordering Attributes
5 Mapping Key Attributes
Resulting table:
sensor_data
location K
parameter K
timestamp C↓
id C↑
value
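The five mapping steps in this example can be sketched in Python as a function that derives the primary key of sensor_data from the query and then prints a CQL table definition. This is a minimal sketch, not the KDM implementation: it reads K as a partition key column and C↑ / C↓ as ascending / descending clustering columns, assumes that id is the key attribute still missing from the primary key, and uses invented column data types. The names AccessPattern, derive_primary_key, and to_cql are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Search and ordering attributes taken from one application query."""
    equality: list        # attributes compared with "="
    inequality: list      # attributes compared with ">", "<", ...
    ordering: dict        # attribute -> "ASC" or "DESC"
    key_attributes: list  # conceptual key attributes (assumed here)


def derive_primary_key(pattern):
    """Return (partition_key, clustering) following the mapping steps."""
    # Step 2: equality search attributes become the partition key.
    partition_key = list(pattern.equality)
    # Step 3: inequality search attributes become clustering columns,
    # ascending until an ordering rule says otherwise.
    clustering = {attr: "ASC" for attr in pattern.inequality}
    # Step 4: ordering attributes are clustering columns in the requested order.
    for attr, direction in pattern.ordering.items():
        clustering[attr] = direction
    # Step 5: key attributes not yet in the primary key are appended.
    for attr in pattern.key_attributes:
        if attr not in partition_key and attr not in clustering:
            clustering[attr] = "ASC"
    return partition_key, list(clustering.items())


def to_cql(table, columns, pattern):
    """Render a CREATE TABLE statement for the derived primary key."""
    partition_key, clustering = derive_primary_key(pattern)
    column_lines = [f"  {name} {cql_type}," for name, cql_type in columns.items()]
    key = f"(({', '.join(partition_key)})"
    key += "".join(f", {name}" for name, _ in clustering) + ")"
    order = ", ".join(f"{name} {direction}" for name, direction in clustering)
    return "\n".join(
        [f"CREATE TABLE {table} ("]
        + column_lines
        + [f"  PRIMARY KEY {key}", f") WITH CLUSTERING ORDER BY ({order});"]
    )


# The query from the example slide, expressed as an access pattern.
sensor_query = AccessPattern(
    equality=["location", "parameter"],
    inequality=["timestamp"],
    ordering={"timestamp": "DESC"},
    key_attributes=["id"],  # assumption: id is the remaining key attribute
)

# Column data types are assumptions; the slides do not state them.
SENSOR_COLUMNS = {
    "location": "TEXT",
    "parameter": "TEXT",
    "timestamp": "TIMESTAMP",
    "id": "UUID",
    "value": "FLOAT",
}

sensor_cql = to_cql("sensor_data", SENSOR_COLUMNS, sensor_query)
print(sensor_cql)
```

Keeping the key derivation separate from the CQL rendering mirrors the split between the logical and physical data models: the same derived key could be drawn as a diagram or emitted as a schema.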
Methodology Pros and Cons
• Pros: correctness, completeness
• Cons: complexity, time investment
Human Errors Happen …
Automation addresses:
• Complexity
• Time investment
• Human error
The KDM Tool
• Streamlines the methodology
• Guides the user
• Automates data modeling tasks:
– Conceptual-to-logical mapping
– Physical optimization
– CQL generation
KDM Automation Workflow
• Step 1 (Solution architect): Design Conceptual Data Model
• Step 2 (Solution architect): Specify Access Patterns
• Automated (KDM): Generate Logical Data Models
• Step 3 (Solution architect): Select Logical Data Model
• Automated (KDM): Generate Physical Data Model
• Step 4 (Solution architect): Configure Physical Data Model
• Automated (KDM): Generate Physical Schema
• Step 5 (Solution architect): Download CQL Script
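The division of labor in this workflow can be written down as an ordered list of stages, which makes it easy to see which stages the solution architect performs and which KDM automates. This is only an illustrative data structure; the names WORKFLOW, manual_steps, and automated_steps are hypothetical and do not come from the tool.

```python
# Each stage is (actor, task), in the order shown on the workflow slide.
WORKFLOW = [
    ("Solution architect", "Design conceptual data model"),
    ("Solution architect", "Specify access patterns"),
    ("KDM", "Generate logical data models"),
    ("Solution architect", "Select logical data model"),
    ("KDM", "Generate physical data model"),
    ("Solution architect", "Configure physical data model"),
    ("KDM", "Generate physical schema"),
    ("Solution architect", "Download CQL script"),
]

# Split the stages by who performs them.
manual_steps = [task for actor, task in WORKFLOW if actor == "Solution architect"]
automated_steps = [task for actor, task in WORKFLOW if actor == "KDM"]

for position, (actor, task) in enumerate(WORKFLOW, start=1):
    print(f"{position}. {task} ({actor})")
```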
Live Demo: IoT
Live Demo: Media Cataloguing
Summary
• KDM:
– automates most complex tasks
– eliminates human error
– simplifies data modeling
– guides the user
– is a general-purpose tool
How Can KDM Help You?
• build new data models
• verify existing data models
• teach/learn data modeling
Future Work
• Materialized views
• User Defined Types
• Analysis and physical optimization
• Support for application workflow design
• Support for Chebotko Diagrams
Sign up for KDM – it’s FREE!
• KDM: kdm.dataview.org
• Methodology: academy.datastax.com
• Planet Cassandra blog posts:
– KDM: An Automated Data Modeling Tool for Apache Cassandra, Pt. 1, Pt. 2
• Artem Chebotko, Andrey Kashlev, Shiyong Lu,
“A Big Data Modeling Methodology for Apache Cassandra”,
IEEE International Congress on Big Data, 2015.
Acknowledgements
• Andrey Kashlev would like to thank:
– Dr. Shiyong Lu
– Anthony Piazza
• Artem Chebotko would like to thank:
– Anthony Piazza
– Patrick McFadin
– Jonathan Ellis
– Tim Berglund
Thank you

Wayne State University & DataStax: World's best data modeling tool for Apache Cassandra
