Five Tips for Getting to Production with DataStax Enterprise Graph
© DataStax, All Rights Reserved.
DataStax Enterprise (DSE) Graph
A robust, scale-out graph database that focuses on storing, processing, and acting on highly connected and complex data relationships in real time.
Common DSE Graph Use Cases
• Customer 360
• Personalization
• Recommendations
• Fraud Detection
• Internet of Things
• Asset Management
• Data Integration
What is Customer 360 (C360)?
Integrating data silos and exploring neighborhoods to provide a personalized user experience in real time.
[Diagram: a 360° view of the Customer, connected to Location, Social, Orders, Account, Contact, Feedback, Devices, and Channels.]
1. Know Your Data Distributions
Ask Yourself...
1. What relationships exist currently or could possibly exist in the data?
2. Which of the identified relationships are important?
3. What is the distribution of those relationships?
[Chart: Email to Customer Distribution. One axis: Number of Customers, that is, Number of Edges (Degree). Other axis: Count of Emails.]
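As a rough illustration of question 3, the sketch below computes a degree distribution for email vertices from a list of edges. The edge data and the function name `degree_distribution` are hypothetical; in practice the edges would come from your own source systems rather than a literal list.

```python
from collections import Counter

# Hypothetical (email, customer) edges, invented for illustration only.
edges = [
    ("a@example.com", "c1"),
    ("b@example.com", "c2"),
    ("b@example.com", "c3"),
    ("shared@example.com", "c1"),
    ("shared@example.com", "c2"),
    ("shared@example.com", "c3"),
    ("shared@example.com", "c4"),
]

def degree_distribution(edge_list):
    """Map each degree to the number of email vertices that have that degree."""
    degree_per_email = Counter(email for email, _ in edge_list)
    return Counter(degree_per_email.values())

distribution = degree_distribution(edges)
for degree in sorted(distribution):
    print(f"degree {degree}: {distribution[degree]} email(s)")
```

A long tail in this distribution (a few emails with very high degree) is the kind of signal that points at potential supernodes.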
2. Know Your Access Patterns… As Much as Possible
Data Modeling
“The paradigm shift is that we write our data according to how we are going to read it.”
Nate McCall, on the journey of Apache Cassandra, during DataStax Accelerate
Relational vs. Cassandra vs. Graph Data Modeling
[Diagram: the order in which Data, Models, and Application drive one another in relational, Cassandra, and graph data modeling.]
Common C360 Questions
• Who is this customer?
• What is their name, location, gender, and age?
• What has this customer recently purchased online or in stores?
• What feedback have they left about those purchases?
• Who is this customer related to?
• How influential is this customer?
[Diagram: Customer connected to Location, Social, Orders, Account, Contact, Feedback, Devices, and Channels.]
[Slides: Common C360 Queries; Conceptual Data Model. Both repeat the questions above next to a diagram.]
Logical Data Model
• An entity with a single property and an average branching factor of one is a good indication that the entity should be a property rather than a vertex.
• An entity that has a high median branching factor should be considered for properties as opposed to vertices.
3. Optimize Query Performance
Understand Branching Factor
Traversal time is roughly proportional to the number of edges and vertices visited.
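To make the effect of branching factor concrete, here is a minimal sketch that counts the edges walked during a fixed-depth traversal. The helper names `count_visits` and `build_tree` and the tree-shaped test graph are assumptions made for illustration; they are not part of DSE Graph.

```python
def count_visits(adjacency, start, hops):
    """Walk `hops` steps out from `start`, counting every edge followed."""
    frontier = [start]
    edges_walked = 0
    for _ in range(hops):
        next_frontier = []
        for vertex in frontier:
            neighbors = adjacency.get(vertex, [])
            edges_walked += len(neighbors)
            next_frontier.extend(neighbors)
        frontier = next_frontier
    return edges_walked

def build_tree(branching_factor, depth):
    """Hypothetical test graph: a complete tree with a fixed branching factor."""
    adjacency, level, next_id = {}, [0], 1
    for _ in range(depth):
        new_level = []
        for vertex in level:
            children = list(range(next_id, next_id + branching_factor))
            next_id += branching_factor
            adjacency[vertex] = children
            new_level.extend(children)
        level = new_level
    return adjacency

for b in (2, 10):
    walked = count_visits(build_tree(b, 3), start=0, hops=3)
    print(f"branching factor {b}: {walked} edges walked in 3 hops")
```

With the same three hops, raising the branching factor from 2 to 10 raises the work from 14 edges to 1110, which is why the degree of the vertices along a path matters more than the number of hops alone.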
Filter Vertices out Along the Way
If you know which vertices you are not looking for, avoid walking to them.
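The sketch below contrasts filtering at the end of a walk with filtering as early as possible. The customer, order, and product data and both function names are hypothetical; the point is only the difference in edges walked.

```python
# Hypothetical graph: customer -> orders -> products.
orders_of = {"customer": ["o1", "o2", "o3", "o4"]}
order_channel = {"o1": "online", "o2": "store", "o3": "store", "o4": "store"}
products_of = {
    "o1": ["p1", "p2"],
    "o2": ["p3", "p4"],
    "o3": ["p5", "p6"],
    "o4": ["p7", "p8"],
}

def products_filter_late(channel):
    """Walk to every product, then discard the ones from unwanted orders."""
    walked, result = 0, []
    for order in orders_of["customer"]:
        walked += 1
        for product in products_of[order]:
            walked += 1
            if order_channel[order] == channel:
                result.append(product)
    return result, walked

def products_filter_early(channel):
    """Drop unwanted orders before walking on to their products."""
    walked, result = 0, []
    for order in orders_of["customer"]:
        walked += 1
        if order_channel[order] != channel:
            continue
        for product in products_of[order]:
            walked += 1
            result.append(product)
    return result, walked

print(products_filter_late("online"))
print(products_filter_early("online"))
```

Both functions return the same products, but the early filter walks half as many edges on this toy data; the gap widens as the unwanted branches grow.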
Pick the Best Starting Point
Consider where your traversal starts: do you walk along fewer edges when you start at the black vertex or the red vertex?
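A small sketch of the same question, assuming a hypothetical graph in which the black vertex has 1000 outgoing edges and the red vertex has only three incoming ones. The vertex names and both functions are invented for illustration.

```python
# Hypothetical graph: "black" fans out widely, "red" has few incoming edges.
out_edges = {"black": [f"v{i}" for i in range(1000)]}
out_edges.update({"v7": ["red"], "v42": ["red"], "v99": ["red"]})
in_edges = {"red": ["v7", "v42", "v99"]}
in_edges.update({f"v{i}": ["black"] for i in range(1000)})

def paths_from_black():
    """Start at black and walk forward, looking for red two hops away."""
    walked, middles = 0, []
    for mid in out_edges["black"]:
        walked += 1
        for target in out_edges.get(mid, []):
            walked += 1
            if target == "red":
                middles.append(mid)
    return middles, walked

def paths_from_red():
    """Start at red and walk backward, looking for black two hops away."""
    walked, middles = 0, []
    for mid in in_edges["red"]:
        walked += 1
        for source in in_edges.get(mid, []):
            walked += 1
            if source == "black":
                middles.append(mid)
    return middles, walked

print(paths_from_black()[1], "edges walked starting at black")
print(paths_from_red()[1], "edges walked starting at red")
```

Both directions find the same connecting vertices, but starting from the low-degree side walks far fewer edges in this example.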
Go Back to the Data Model
Can you optimize the path from black to red by adding a shortcut edge?
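One way to picture a shortcut edge is as a materialized two-hop path. The sketch below assumes a hypothetical customer -> order -> product model and a derived `purchased` edge; the names are illustrative and not taken from the slides.

```python
# Hypothetical path: customer -> order -> product.
placed = {"c1": ["o1", "o2"]}
contains = {"o1": ["p1"], "o2": ["p2", "p3"]}

def products_via_orders(customer):
    """Two-hop walk through the intermediate order vertices."""
    walked, products = 0, []
    for order in placed.get(customer, []):
        walked += 1
        for product in contains.get(order, []):
            walked += 1
            products.append(product)
    return products, walked

def build_shortcuts():
    """Materialize a direct customer -> product edge, for example at write time."""
    purchased = {}
    for customer, orders in placed.items():
        for order in orders:
            purchased.setdefault(customer, []).extend(contains.get(order, []))
    return purchased

purchased = build_shortcuts()

def products_via_shortcut(customer):
    """One-hop walk along the shortcut edges."""
    products = list(purchased.get(customer, []))
    return products, len(products)

print(products_via_orders("c1"))
print(products_via_shortcut("c1"))
```

The trade-off is that the shortcut has to be written and kept consistent whenever the underlying path changes, so it pays off mainly for read-heavy access patterns.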
4. Design a Supernode Strategy
What is a supernode?
A vertex with a disproportionately high number of connected edges.
Causes problems such as:
• performance issues
• stability issues
• issues with visualization
• partial or incorrect results
What should you do if you find a supernode?
[Decision flowchart. Questions: What does the distribution look like (right-skewed or left-skewed)? Is the data valid (yes or no)? Is your data the sole source of truth (yes or no)? Outcomes: try to model this vertex as a property; consider a supernode optimization strategy; validate and clean data on ingestion.]
Supernode Strategy: Add an Edge Index
Edge indices, also called vertex-centric indices, are local to a vertex and give the ability to find and traverse only the edges we need without scanning all edges.
To leverage the index, filter on the edge during the traversal.
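The following is an in-memory analogy for what a vertex-local edge index buys you, not DSE Graph's actual index API. The store and customer data, the `year` edge property, and both function names are assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical supernode: one store with 10,000 edges, each carrying a year.
edges = [("store", f"customer{i}", 2015 + i % 5) for i in range(10_000)]

def scan_edges(vertex, year):
    """Without an index: inspect every edge of the vertex."""
    inspected, hits = 0, []
    for source, target, edge_year in edges:
        if source != vertex:
            continue
        inspected += 1
        if edge_year == year:
            hits.append(target)
    return hits, inspected

# Vertex-local index: each vertex's edges grouped by the indexed edge property.
edge_index = defaultdict(lambda: defaultdict(list))
for source, target, edge_year in edges:
    edge_index[source][edge_year].append(target)

def lookup_edges(vertex, year):
    """With an index: touch only the edges that match the filter."""
    hits = edge_index[vertex][year]
    return hits, len(hits)

scanned_hits, scanned = scan_edges("store", 2019)
indexed_hits, touched = lookup_edges("store", 2019)
print(f"scan inspected {scanned} edges, index touched {touched}")
```

The index only helps when the traversal actually filters on the indexed edge property; a traversal that asks for all edges of the vertex still pays for all of them.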
Supernode Strategy: Get More Specific
Make your vertices more granular by including another field in the ID of the vertex.
[Diagram: a single coarse vertex vs. several more granular vertices.]
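As a sketch of how a more granular ID spreads edges out, the example below compares a location vertex keyed by country alone with one keyed by country and postal code. The customer records and the choice of postal code as the extra field are hypothetical.

```python
from collections import Counter

# Hypothetical customer records: (customer, country, postal_code).
customers = [(f"c{i}", "US", f"{10000 + i % 50}") for i in range(5000)]

# Coarse ID: one location vertex per country.
coarse = Counter(country for _, country, _ in customers)

# Granular ID: one location vertex per (country, postal_code) pair.
granular = Counter((country, postal) for _, country, postal in customers)

max_coarse = max(coarse.values())
max_granular = max(granular.values())
print(f"coarse: {len(coarse)} vertex, max degree {max_coarse}")
print(f"granular: {len(granular)} vertices, max degree {max_granular}")
```

The total number of edges is unchanged; what changes is that no single vertex carries all of them.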
Supernode Strategy: Traverse In, but not Out
If you have a known supernode, but the vertex is too complex to be a property, you can avoid performance issues by only traversing into the vertex to gather information.
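A minimal sketch of the difference between traversing into a supernode and traversing out of it, assuming a hypothetical country vertex attached to every customer. The data and function names are invented for illustration.

```python
# Hypothetical supernode: one country vertex linked to every customer.
lives_in = {f"c{i}": "US" for i in range(100_000)}
country_props = {"US": {"name": "United States", "currency": "USD"}}
residents = {"US": list(lives_in)}  # the supernode's own edge list

def country_of(customer):
    """Traverse in: follow one edge to the supernode and read its properties."""
    country = lives_in[customer]
    return country_props[country], 1  # one edge walked

def neighbours_via_country(customer):
    """Traverse out: continue through the supernode to everything attached to it."""
    country = lives_in[customer]
    others = [c for c in residents[country] if c != customer]
    return others, 1 + len(residents[country])

props, walked_in = country_of("c0")
others, walked_out = neighbours_via_country("c0")
print(f"in only: {walked_in} edge; in and out: {walked_out} edges")
```

Reading the supernode's properties costs one edge, while passing through it costs every edge it has, so queries are best written to stop at the supernode.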
5. Embrace a Multi-Model Approach
Using the Right Tool for the Problem
[Chart: DSE Core, DSE Search, DSE Graph, and DSE Analytics positioned by Query Complexity (Simple to Complex) and Query Latency (p99), with latency labels Offline, Fast, and Human Fast.]
Final Multi-Model Approach
[The same chart, with each C360 query placed on it:]
• Who is this customer?
• What has this customer recently purchased?
• Who is this customer related to?
• How influential is this customer?
DataStax Graph for Labs
• “Model Once” Support: Because solving complex graph problems requires more than just a graph database.
• Inherits DSE Core Benefits: Fast, scalable, and highly available for mission-critical applications on-prem and in the cloud.
• Built by the Experts: Designed and tested by the core contributors to Apache Cassandra and TinkerPop.
DOWNLOAD: Visit downloads.datastax.com/#labs to get the new Graph Engine.
Questions?
Thank you

Best Practices for Getting to Production with DataStax Enterprise Graph
