Neo4j, Inc. All rights reserved 2021
Graphs for Data Science and Machine Learning
Dr. Jim Webber
Chief Scientist, Neo4j
#neo4j @jimwebber
It’s Not What You Know
It’s Who You Know
It’s Who You Know And Where They Are
Network Structure is Highly Predictive
Higher Pay and More Promotions for:
• People Near Structural Holes
• Organizational Misfits
Sources: “Organizational Misfits and the Origins of Brokerage in Intrafirm Networks”, A. Kleinbaum; “Structural Holes and Good Ideas”, R. Burt
Photo by Helena Lopes on Unsplash
“Relationships are the strongest predictors of behavior.” (James Fowler)
But You Can’t Analyse What You Can’t See
● Most data science techniques ignore relationships
● It’s painful to manually engineer connected features from tabular data
● Graphs are built on relationships, so…
● You don’t have to guess at the correlations: with graphs, relationships are built in
Analytics & Data Science Interest Exploding in Neo4j Community
According to Gartner, “Graphs form the foundation of modern D&A, with capabilities to enhance and improve user collaboration, ML models and explainable AI. The recent Gartner AI in Organizations Survey demonstrates that graph techniques are increasingly prevalent as AI maturity grows, going from 13% adoption when AI maturity is lowest to 48% when maturity is highest.” (Top 10 Tech Trends in Data and Analytics, 16 Feb 2021)
Chart: AI Research Papers Featuring Graph (Source: Dimensions Knowledge System)
• 4x increase in traffic to the Neo4j GDS page in 2H 2020
• +4.8m views on the graph algorithms short video
• +193k downloads
Wait, what’s a graph?
• Node: represents an entity in the graph
• Relationship: connects nodes to each other
• Property: describes a node or relationship, e.g. name, age, weight, etc.
Example diagram: the nodes ANDRE (Name: “Andre”, Born: May 29, 1970, Twitter: “@dan”), MICA (Name: “Mica”, Born: Dec 5, 1975) and CAR (Brand: “Volvo”, Model: “V70”) are linked by LOVES, SISTER, BROTHER, OWNS and DRIVES relationships; the diagram also shows the property Since: Jan 10, 2011.
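The node/relationship/property model above can be sketched with plain Python data structures. This is a minimal illustration, not Neo4j's storage format: the classes `Node`, `Relationship` and `PropertyGraph` are hypothetical, and which node each relationship joins (including attaching `since` to `OWNS`) is an assumption, because the diagram's arrows did not survive extraction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """An entity in the graph: labels plus key/value properties."""
    node_id: int
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes; it can hold properties too."""
    start: int
    rel_type: str
    end: int
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, node_id: int, *labels: str, **props: Any) -> Node:
        node = Node(node_id, set(labels), dict(props))
        self.nodes[node_id] = node
        return node

    def relate(self, start: int, rel_type: str, end: int, **props: Any) -> Relationship:
        rel = Relationship(start, rel_type, end, dict(props))
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id: int, rel_type: Optional[str] = None) -> list[Relationship]:
        """Relationships that start at node_id, optionally filtered by type."""
        return [r for r in self.relationships
                if r.start == node_id and (rel_type is None or r.rel_type == rel_type)]


g = PropertyGraph()
g.add_node(1, "Person", name="Andre", born="1970-05-29", twitter="@dan")
g.add_node(2, "Person", name="Mica", born="1975-12-05")
g.add_node(3, "Car", brand="Volvo", model="V70")
# Hypothetical wiring: which node each arrow in the diagram joins is an assumption.
g.relate(1, "LOVES", 2)
g.relate(1, "OWNS", 3, since="2011-01-10")
g.relate(2, "DRIVES", 3)

for rel in g.outgoing(1):
    print(g.nodes[rel.start].properties["name"], rel.rel_type,
          g.nodes[rel.end].labels, rel.properties)
```

Keeping properties on relationships as well as nodes is what distinguishes a property graph from a plain adjacency list.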
Everything is Naturally Connected
• Networks of People: Employees, Customers, Suppliers, Partners, Influencers, etc. (e.g. Knows relationships)
• Transaction Networks: Risk management, Supply chain, Orders, Payments, etc. (e.g. Bought, Viewed and Returned relationships)
• Knowledge Networks: Enterprise content, Domain specific content, eCommerce content, etc. (e.g. Plays, Lives_in, In_sport, Likes, Fan_of and Plays_for relationships)
Graphs & Data Science
• Queries: find the patterns you know exist
• Machine Learning: uncover trends and make predictions
• Visualization: explore, collaborate, and explain
Together these make up Graph Data Science, spanning Analytics, Feature Engineering, and Data Exploration.
Graphs & Data Science
• Knowledge Graphs: find the patterns you’re looking for in connected data
• Graph Algorithms: use unsupervised machine learning techniques to identify associations, anomalies, and trends
• Graph Native Machine Learning: use embeddings to learn the features in your graph that you don’t even know are important yet, and train in-graph supervised ML models to predict links, labels, and missing data
Neo4j’s Graph Data Science Framework
• Neo4j Graph Data Science Library: scalable graph algorithms & analytics workspace
• Neo4j Database: native graph creation & persistence
• Neo4j Bloom: visual graph exploration & prototyping
The Neo4j GDS Library
~70 Robust Graph Algorithms & ML Methods
● Compute metrics about the topology and connectivity
● Build predictive models to enhance your graph
● Highly parallelized, scaling to tens of billions of nodes
Efficient & Flexible Analytics Workspace (layers: Mutable In-Memory Workspace, Computational Graph, Native Graph Store)
● Automatically reshapes transactional graphs into an in-memory analytics graph
● Optimized for global traversals and aggregation
● Create workflows and layer algorithms
● Store and manage predictive models in the model catalog
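To make "reshapes transactional graphs into an in-memory analytics graph" concrete, here is a sketch of one common layout suited to global traversals: compressed sparse row (CSR) adjacency, where external IDs become dense integers and each node's neighbours sit in one contiguous slice. This is an illustrative assumption, not a description of GDS internals; `project_csr` and `neighbours` are hypothetical helpers.

```python
from __future__ import annotations


def project_csr(node_ids: list[str], edges: list[tuple[str, str]]):
    """Project (source, target) pairs keyed by external IDs into CSR arrays.

    Returns (id_map, offsets, targets); the out-neighbours of dense node i are
    targets[offsets[i]:offsets[i + 1]].
    """
    # Map arbitrary external IDs to dense integers 0..n-1.
    id_map = {nid: i for i, nid in enumerate(node_ids)}
    n = len(node_ids)

    # Count out-degrees, then prefix-sum them into row offsets.
    degree = [0] * n
    for src, _ in edges:
        degree[id_map[src]] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + degree[i]

    # Place each edge's target into its source node's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy, so offsets is left intact
    for src, dst in edges:
        s = id_map[src]
        targets[cursor[s]] = id_map[dst]
        cursor[s] += 1
    return id_map, offsets, targets


def neighbours(offsets: list[int], targets: list[int], i: int) -> list[int]:
    return targets[offsets[i]:offsets[i + 1]]


id_map, offsets, targets = project_csr(
    ["alice", "bob", "carol"],
    [("alice", "bob"), ("alice", "carol"), ("bob", "carol")],
)
print(offsets, targets)
```

Two flat integer arrays are cache-friendly and easy to scan in parallel, which is why layouts like this favour whole-graph algorithms over single-record lookups.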
~70 Graph Data Science Techniques in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search
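Several of the pathfinding entries build on the same core routine. Below is a minimal single-source shortest path sketch using Dijkstra's algorithm with a binary heap, assuming non-negative weights and a plain adjacency dict; it is a textbook version, not the GDS procedure.

```python
from __future__ import annotations

import heapq
import math


def dijkstra(graph: dict[str, dict[str, float]], source: str):
    """Single-source shortest paths over non-negative edge weights.

    Returns (dist, prev): dist[v] is the shortest distance from source and
    prev[v] is v's predecessor on that path.
    """
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry: a shorter path to u was already settled
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def path_to(prev: dict[str, str], source: str, target: str) -> list[str]:
    """Walk predecessors back from target; returns [] if target is unreachable."""
    if target != source and target not in prev:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


roads = {
    "A": {"B": 4, "C": 1},
    "C": {"B": 2, "D": 5},
    "B": {"D": 1},
}
dist, prev = dijkstra(roads, "A")
print(dist["D"], path_to(prev, "A", "D"))
```

Re-pushing a node instead of decreasing its key keeps the code short; the stale-entry check discards the outdated copies.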
Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)
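PageRank, listed above, can be sketched as power iteration: each node repeatedly shares its score among its outgoing links, with a damping factor modelling random jumps. This is a minimal sketch over an in-memory adjacency dict. The damping value 0.85 is the conventional textbook choice, and the iteration and tolerance settings are arbitrary, not GDS defaults.

```python
from __future__ import annotations


def pagerank(out_links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-12) -> dict[str, float]:
    """PageRank by power iteration over an adjacency dict of outgoing links."""
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links spread their rank evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Because rank is conserved at every step, the scores always sum to 1; "c" ranks highest here because it receives links from both other nodes.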
Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation
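"Connected Components (Union Find)" refers to the disjoint-set data structure. A minimal sketch of weakly connected components (edge direction ignored) using union by size and path halving follows; `UnionFind` and `connected_components` are illustrative names, not GDS APIs.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.size: dict[str, int] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]


def connected_components(nodes: list[str], edges: list[tuple[str, str]]) -> dict[str, int]:
    """Weakly connected components: edge direction is ignored."""
    uf = UnionFind()
    for v in nodes:
        uf.find(v)  # registers isolated nodes too
    for a, b in edges:
        uf.union(a, b)
    # Relabel roots as consecutive component IDs in first-seen order.
    ids: dict[str, int] = {}
    return {v: ids.setdefault(uf.find(v), len(ids)) for v in nodes}


comp = connected_components(list("abcdef"), [("a", "b"), ("b", "c"), ("d", "e")])
print(comp)
```

Each union or find runs in near-constant amortised time, so one pass over the edge list is enough to label every component.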
Supervised Machine Learning
• Node Classification
• Link Prediction
… and more!
Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocation
• Same Community
• Total Neighbors
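The heuristic link-prediction scores listed above all derive from the neighbourhoods of a candidate pair. A minimal sketch, assuming an undirected graph held as neighbour sets; the function names and the `community` mapping are hypothetical and are not the GDS function signatures.

```python
from __future__ import annotations

import math


def neighbour_sets(edges):
    """Undirected adjacency: each edge makes its endpoints neighbours of each other."""
    nbrs: dict[str, set[str]] = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return nbrs


def link_scores(nbrs, a, b, community=None) -> dict[str, float]:
    """Topological link-prediction scores for a candidate pair (a, b), with a != b."""
    na, nb = nbrs.get(a, set()), nbrs.get(b, set())
    common = na & nb
    scores = {
        "common_neighbors": float(len(common)),
        "total_neighbors": float(len(na | nb)),
        "preferential_attachment": float(len(na) * len(nb)),
        # Shared neighbours that have few links of their own count for more.
        "adamic_adar": sum(1.0 / math.log(len(nbrs[u])) for u in common),
        "resource_allocation": sum(1.0 / len(nbrs[u]) for u in common),
    }
    if community is not None:
        same = a in community and b in community and community[a] == community[b]
        scores["same_community"] = 1.0 if same else 0.0
    return scores


edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
         ("bob", "dave"), ("carol", "dave"), ("dave", "erin")]
nbrs = neighbour_sets(edges)
communities = {"alice": 0, "bob": 0, "carol": 0, "dave": 0, "erin": 1}
print(link_scores(nbrs, "alice", "dave", communities))
```

Every shared neighbour of two distinct nodes has degree at least 2, so the logarithm in Adamic Adar is always positive.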
Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)
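For the similarity family, here is a brute-force sketch of neighbourhood-based Jaccard similarity with a per-node top-k, plus cosine similarity for numeric vectors. It assumes small inputs: it compares every pair, whereas the KNN and ANN entries above exist precisely to avoid that quadratic cost. `top_k_similar` and the `purchases` data are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    norms = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norms if norms else 0.0


def top_k_similar(neighbourhoods: dict[str, set], k: int = 1, cutoff: float = 0.0):
    """Exhaustive pairwise Jaccard, keeping each node's k best matches above cutoff."""
    matches: dict[str, list[tuple[float, str]]] = {n: [] for n in neighbourhoods}
    for u, v in combinations(neighbourhoods, 2):
        score = jaccard(neighbourhoods[u], neighbourhoods[v])
        if score > cutoff:
            matches[u].append((score, v))
            matches[v].append((score, u))
    # Highest score first; ties broken by node name for determinism.
    return {n: sorted(m, key=lambda t: (-t[0], t[1]))[:k] for n, m in matches.items()}


purchases = {
    "ann": {"tea", "milk", "bread"},
    "bob": {"tea", "milk"},
    "cat": {"coffee"},
}
print(top_k_similar(purchases, k=1))
```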
Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
Other Procedures
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algorithms)
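FastRP produces embeddings by repeatedly averaging a sparse random projection over each node's neighbourhood and summing the weighted iterations. The sketch below is a deliberately simplified FastRP-style version: it omits the normalisation steps of the published algorithm and of the GDS implementation, and its weights, dimension and seed are arbitrary illustrative choices.

```python
from __future__ import annotations

import math
import random


def fastrp_sketch(nbrs: dict[str, list[str]], dim: int = 8,
                  weights: tuple[float, ...] = (0.0, 1.0, 1.0),
                  seed: int = 42) -> dict[str, list[float]]:
    """Simplified FastRP-style embedding (no normalisation steps).

    Computes sum_i weights[i] * (D^-1 A)^(i+1) R, where R is a very sparse
    random projection: each entry is +sqrt(3) or -sqrt(3) with probability
    1/6 each, otherwise 0. Every neighbour must also appear as a key.
    """
    rng = random.Random(seed)
    s = math.sqrt(3.0)

    def sparse_entry() -> float:
        r = rng.random()
        return s if r < 1 / 6 else (-s if r < 1 / 3 else 0.0)

    nodes = sorted(nbrs)
    current = {v: [sparse_entry() for _ in range(dim)] for v in nodes}
    embedding = {v: [0.0] * dim for v in nodes}
    for w in weights:
        # One propagation step: each node takes the mean of its neighbours' vectors.
        current = {
            v: [sum(current[u][j] for u in nbrs[v]) / len(nbrs[v]) if nbrs[v] else 0.0
                for j in range(dim)]
            for v in nodes
        }
        for v in nodes:
            embedding[v] = [e + w * c for e, c in zip(embedding[v], current[v])]
    return embedding


square = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
emb = fastrp_sketch(square, dim=4)
for node, vec in emb.items():
    print(node, [round(x, 3) for x in vec])
```

In the 4-cycle example, "b" and "c" share exactly the same neighbours, so they receive identical embeddings. This is how structural similarity surfaces as vector similarity without anyone hand-engineering a feature.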
Our Special Sauce: The Graph Catalogue
A graph-specific analytics workspace that’s mutable, integrated with a native graph database (layers: Mutable In-Memory Workspace, Computational Graph, Native Graph Store)
• Neo4j automates data transformations
• Experiment with different data sets and data models
• Fast iterations & layering
• Production-ready features, parallelization & enterprise support
• Ability to persist and version data
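The "mutable workspace" and "layering" ideas can be illustrated with a toy catalog. Named in-memory graphs are projected once, one algorithm writes its result back into the projection (in the spirit of the `.mutate` procedures this deck mentions), and a second algorithm consumes that property without touching the database. Everything here (`GraphCatalog`, `degree_mutate`, `neighbour_mean_mutate`) is hypothetical and is not the GDS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryGraph:
    """A named projection: adjacency plus a mutable node-property store."""
    adjacency: dict[str, set[str]]
    node_props: dict[str, dict[str, float]] = field(default_factory=dict)


class GraphCatalog:
    """Toy catalog of named, mutable in-memory graphs (hypothetical API)."""

    def __init__(self) -> None:
        self._graphs: dict[str, InMemoryGraph] = {}

    def project(self, name: str, edges: list[tuple[str, str]]) -> InMemoryGraph:
        adj: dict[str, set[str]] = {}
        for a, b in edges:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        self._graphs[name] = InMemoryGraph(adj)
        return self._graphs[name]

    def get(self, name: str) -> InMemoryGraph:
        return self._graphs[name]

    def drop(self, name: str) -> None:
        del self._graphs[name]


def degree_mutate(g: InMemoryGraph, prop: str) -> None:
    """Write each node's degree back into the in-memory graph, not the database."""
    for v, ns in g.adjacency.items():
        g.node_props.setdefault(v, {})[prop] = float(len(ns))


def neighbour_mean_mutate(g: InMemoryGraph, source: str, prop: str) -> None:
    """Layered step: averages a property produced by an earlier algorithm."""
    for v, ns in g.adjacency.items():
        vals = [g.node_props[u][source] for u in ns]
        g.node_props[v][prop] = sum(vals) / len(vals) if vals else 0.0


catalog = GraphCatalog()
catalog.project("social", [("a", "b"), ("a", "c"), ("a", "d")])
g = catalog.get("social")
degree_mutate(g, "degree")
neighbour_mean_mutate(g, "degree", "neighbourDegree")
print(g.node_props)
```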
Right, so how will this improve my machine learning project?
Neo4j’s Graph Data Science Library
Unsupervised Graph Algorithms
• Clustering (Community Detection): Which parts of my graph are connected to each other?
• Association (Similarity): Which nodes are most similar?
• Centrality: How important is each node?
• Dimension Reduction, i.e. generalization (Embeddings)
• Pathfinding
Supervised Machine Learning
• Node Classification: What’s the label for this node?
• Link Prediction: Where will connections form next?
More algorithms than any other vendor, only in Neo4j
Better Predictions with Data You Already Have
● Traditional ML ignores network structure because it’s difficult to extract
● Add graph data to existing ML pipelines to increase accuracy, or
● Use relationships in the graph to unlock otherwise unattainable predictions
(Diagram: Machine Learning Pipeline)
Graphs & Supervised Machine Learning
• Predictions influenced by graph structure: traditional ML problems where relationships between your data points are important predictive features
• Predictions about graph structure: enhance your graph by predicting missing data or changes to your graph that will occur in the future
Graph Feature Engineering
Feature Engineering is how we combine and process data to create new,
more meaningful features. Graph algorithms and embeddings translate the
connections within your data into the rows and columns you need for ML.
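As a small illustration of "connections into rows and columns", the sketch below computes three classic structural features per node (degree, triangle count, local clustering coefficient) from an undirected edge list and emits them as a CSV table that any tabular ML tool could consume. It is a plain-Python example, not the GDS procedures of the same names.

```python
from __future__ import annotations

import csv
import sys
from itertools import combinations


def graph_features(edges):
    """One feature row per node: degree, triangle count, local clustering coefficient."""
    nbrs: dict[str, set[str]] = {}
    for a, b in edges:
        if a != b:  # ignore self-loops
            nbrs.setdefault(a, set()).add(b)
            nbrs.setdefault(b, set()).add(a)
    rows = []
    for v in sorted(nbrs):
        deg = len(nbrs[v])
        # A triangle through v is a pair of v's neighbours that are themselves linked.
        triangles = sum(1 for x, y in combinations(nbrs[v], 2) if y in nbrs[x])
        possible = deg * (deg - 1) / 2
        rows.append({
            "node": v,
            "degree": deg,
            "triangles": triangles,
            "clustering": triangles / possible if possible else 0.0,
        })
    return rows


rows = graph_features([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
writer = csv.DictWriter(sys.stdout, fieldnames=["node", "degree", "triangles", "clustering"])
writer.writeheader()
writer.writerows(rows)
```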
In-Graph Machine Learning
• Node classification: “What kind of node is this?”
• Link prediction: “Should there be a relationship between these nodes?”
  ○ Labeled data: pairs of nodes that are either linked or not
  ○ Features: pre-existing attributes, algorithm results (e.g. PageRank), embeddings
Node Classification in Neo4j
Node classification: predicting a node label or (categorical) property
1. Load your in-memory graph with labels & features
2. Use nodeClassification.train, specifying the property you want to predict and the features for making that prediction
Neo4j Automates the Tricky Parts:
1. Splits data for train & test
2. Builds logistic regression models using the training data & specified parameters to predict the correct label
3. Evaluates the accuracy of the models using the test data
4. Returns the best-performing model
The predictive model appears in the model catalog, ready to apply to new data.
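The automated steps above (split, train logistic regression per parameter setting, evaluate on held-out data, keep the best) can be sketched end to end in plain Python. This is a binary, gradient-descent version with hypothetical penalty values and already-standardised toy features; it is not GDS's implementation, which supports multi-class labels and its own configuration. In practice, choosing the model on a separate validation split rather than on the test split avoids an optimistic accuracy estimate.

```python
from __future__ import annotations

import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, penalty, epochs=500, lr=0.1):
    """Binary logistic regression fitted by batch gradient descent with an L2 penalty."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [penalty * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj / n
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(model, x) -> int:
    w, b = model
    return int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)


def accuracy(model, X, y) -> float:
    return sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)


def train_node_classifier(features, labels, penalties=(0.0, 0.1, 1.0),
                          test_fraction=0.25, seed=7):
    """Split nodes, train one model per penalty, keep the most accurate on the test split."""
    nodes = sorted(features)
    random.Random(seed).shuffle(nodes)
    cut = int(len(nodes) * (1 - test_fraction))
    train, test = nodes[:cut], nodes[cut:]
    X_tr, y_tr = [features[v] for v in train], [labels[v] for v in train]
    X_te, y_te = [features[v] for v in test], [labels[v] for v in test]
    candidates = []
    for p in penalties:
        model = train_logreg(X_tr, y_tr, p)
        candidates.append((accuracy(model, X_te, y_te), p, model))
    best_acc, best_penalty, best_model = max(candidates, key=lambda c: c[0])
    return best_model, best_penalty, best_acc


# Hypothetical, already-standardised node features (e.g. derived from graph algorithms).
features, labels = {}, {}
for k in range(10):
    features[f"low{k}"] = [-2.0 - 0.1 * k, -1.5 - 0.05 * k]
    labels[f"low{k}"] = 0
    features[f"high{k}"] = [2.0 + 0.1 * k, 1.5 + 0.05 * k]
    labels[f"high{k}"] = 1

model, penalty, acc = train_node_classifier(features, labels)
print(f"best penalty={penalty}, test accuracy={acc:.2f}")
```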
Link Prediction in Neo4j
Link prediction: predicting unobserved edges or relationships that will form in the future
1. Load your in-memory graph with labels & features
2. Split your graph into train & test sets with splitRelationships.mutate
3. Use linkPrediction.train
Neo4j Automates the Tricky Parts:
1. Builds logistic regression models using the training data & specified parameters to predict whether a relationship exists
2. Evaluates the accuracy of the models using the test data
3. Returns the best-performing model
The predictive model appears in the model catalog, ready to apply to new data.
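Splitting relationships for link prediction differs from splitting rows: held-out edges must be removed from the training graph, and the test set also needs negative examples (node pairs that are not linked). Below is a sketch under stated assumptions (undirected edges, uniform random hold-out, uniform negative sampling). `split_relationships` is a hypothetical helper and does not reproduce the behaviour or parameters of `splitRelationships.mutate`.

```python
from __future__ import annotations

import random


def split_relationships(edges, test_fraction=0.2, negative_ratio=1.0, seed=1):
    """Hold out a fraction of undirected edges as positive test examples and
    sample non-edges as negative test examples (one per positive by default).

    Returns (train_edges, test_positive, test_negative) as sorted node pairs.
    """
    rng = random.Random(seed)
    pairs = sorted({tuple(sorted(e)) for e in edges if e[0] != e[1]})
    rng.shuffle(pairs)
    n_test = max(1, int(len(pairs) * test_fraction))
    test_pos, train = pairs[:n_test], pairs[n_test:]

    nodes = sorted({v for p in pairs for v in p})
    existing = set(pairs)
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    wanted = min(int(len(test_pos) * negative_ratio), max_negatives)
    test_neg = set()
    while len(test_neg) < wanted:
        pair = tuple(sorted(rng.sample(nodes, 2)))
        if pair not in existing:  # negatives must be genuine non-edges
            test_neg.add(pair)
    return sorted(train), sorted(test_pos), sorted(test_neg)


ring = [(i, (i + 1) % 10) for i in range(10)]
train, test_pos, test_neg = split_relationships(ring)
print(len(train), test_pos, test_neg)
```

Negatives are checked against the full graph rather than the training graph, so a held-out positive can never reappear as a negative.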
Machine Learning Models in Neo4j
Train a model using your graph and apply it to new or unseen data.
Not a data model: a predictive model
Models live in the Neo4j model catalog, which records:
• Versioning information
• Input data
• Time stamps
• Model names
Trained models can be persisted to disk and shared with colleagues.
ML Models in the Analytics Workspace
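The catalog metadata listed above (names, versions, time stamps, input data) and persistence to disk can be illustrated with a small toy class that serialises versioned entries to JSON. `ModelCatalog`, `ModelEntry` and the file layout are hypothetical; this is not how Neo4j stores models.

```python
from __future__ import annotations

import json
import os
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ModelEntry:
    name: str
    version: int
    created: str               # ISO-8601 UTC timestamp
    input_features: list[str]  # which node properties the model expects
    params: dict               # the trained weights/parameters themselves


class ModelCatalog:
    """Toy model catalog: versioned entries that can be saved and reloaded."""

    def __init__(self) -> None:
        self.entries: dict[str, list[ModelEntry]] = {}

    def store(self, name: str, input_features, params: dict) -> ModelEntry:
        versions = self.entries.setdefault(name, [])
        entry = ModelEntry(name, len(versions) + 1,
                           datetime.now(timezone.utc).isoformat(),
                           list(input_features), params)
        versions.append(entry)
        return entry

    def latest(self, name: str) -> ModelEntry:
        return self.entries[name][-1]

    def save(self, path: str) -> None:
        data = {n: [asdict(e) for e in es] for n, es in self.entries.items()}
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)

    @classmethod
    def load(cls, path: str) -> "ModelCatalog":
        catalog = cls()
        with open(path, encoding="utf-8") as f:
            for name, entries in json.load(f).items():
                catalog.entries[name] = [ModelEntry(**e) for e in entries]
        return catalog


catalog = ModelCatalog()
catalog.store("churn-model", ["degree", "pageRank"], {"w": [0.4, 1.2], "b": -0.3})
catalog.store("churn-model", ["degree", "pageRank"], {"w": [0.5, 1.1], "b": -0.2})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "models.json")
    catalog.save(path)
    restored = ModelCatalog.load(path)
print(restored.latest("churn-model").version)
```

Recording the expected input features alongside the weights is what lets a colleague apply a shared model to new data without guessing how it was trained.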
Neo4j: The Only Completely In-Graph ML Workflow
• Graph-Native Feature Engineering: queries, algorithms, embeddings
• Train Predictive Model: 1. model type, 2. property selection, 3. train & test, 4. model selection
• Publish & Share: store model in database
• Apply Model to Existing / New Data: use predictions for decisions, or use predictions to enhance the graph
Neo4j is part of your data ecosystem
• Data sources: structured and unstructured data, ingested via Apache Hop
• Product components: Neo4j Bloom, Neo4j GDS Library, APOC, Drivers & APIs, with Visualize and Auto ML capabilities
• Use cases across data analytics and data management: Journey Analytics, Risk Analytics, Churn Analysis, What-if Analysis, Feature Engineering & ML, Fraud, Recommendations, Data Fabric, Data Compliance, Data Governance, Data Provenance, Data Lineage, Next Best Case, Ontologies
Real World Use Cases
Graph Data Science Spans Industries and Uses
• Personalized Recommendations
• Churn Prediction
• Market Segmentation
• Life Sciences
• Predictive Maintenance
• Cybersecurity
• Master Data Management
• Fraud Detection
Accelerate Innovation using Neo4j Graph Data Science
From Simple to Highly Sophisticated Data Science
Chart: analysis complexity plotted against analysis repeatability, moving from simple, ad hoc analytics to highly complex data science in full production. Examples shown:
• Analytics to improve reliability by predicting problems in a supply-chain knowledge graph
• FinServ customers: fraud detection with graph feature engineering + AutoML
• R&D: better health outcomes through machine learning on patient journeys
Boston Scientific: Finding At-Fault Components
• Challenge: Difficulty finding faulty components via ad hoc analytics on a vertically integrated supply chain
• Solution: Uses a knowledge graph to model and analyze their complex products
• Results:
  ○ Quickly pinpoint root causes of problems
  ○ Reduced query times from two minutes to seconds
  ○ Anti-recommendation using graph algorithms to identify and eliminate bad combinations of components
Top 10 Bank: Fighting Fraud
• Challenge: Graphs are an important predictive signal, but can be challenging to incorporate into production ML
• Solution: Use Neo4j for repeatable feature engineering and incorporate results into AutoML pipelines
• Results:
  ○ Identified millions of dollars in previously undetectable fraud
  ○ Enriched graph with the results of investigations to improve future predictions
AstraZeneca: Patient Journey
“We used graph algorithms to find patients that had specific journey types and patterns and then find others that are close and similar.” (Joseph Roemer, Global Commercial IT Insight & Analytics Sr. Director, AstraZeneca)
● Challenge: How best to intervene sooner for complex diseases that develop over years
● Solution: A Neo4j knowledge graph of 3 years of visits, tests, & diagnoses with tens of billions of records, using graph algorithms and machine learning together
● Results:
  ○ Identified journey archetypes and patterns using graph feature engineering as input to ML
  ○ Revealed journey similarities over time with community detection
  ○ Found influential touch-points in the journey using graph algorithms
Graph Data Science Answers the BIG Questions
Connected Data is Powerful
• What’s most important and influential in my business?
• What’s occurring that’s unusual?
• What’s going to happen next?
But traditional approaches to data make it impossible to reveal and effectively use those connections as data sizes become large: predictive signals get lost in big-data noise.
Graph Data Science uses Connections to Answer Critical Questions
Neo4j Graph Data Science
• 70 Graph Algorithms: more supported algorithms than any other vendor
• Graph-Native ML: only commercial offering with full graph ML workflows
• Humane Experience: automatic transformation from storage to analytics and visualization
• Scalable Data Science: algorithms running over tens of billions of nodes in production
• Extensible: integrate with other data sources and ML platforms
• Strongest Community: 220K+ practitioners, 72K+ meetups
Resources
Graph Resources
● Video: Advantages of Graph Technology
● Whitepaper: AI & Graph Technology: Enhancing AI with Context &
Connections
● Whitepaper: Financial Fraud Detection with Graph Data Science
● Case Study: Meredith Corporation
Neo4j Bookshelf
● Graph Databases For Dummies
● Graph Data Science For Dummies
● O’Reilly Graph Algorithms
● O’Reilly Knowledge Graphs
Thank you!
Come see us at Stand B400
Dr. Jim Webber
Chief Scientist, Neo4j
jim.webber@neo4j.com

More Related Content

What's hot (20)

PDF
Graphs for Finance - AML with Neo4j Graph Data Science
Neo4j
 
PDF
Workshop - Neo4j Graph Data Science
Neo4j
 
PDF
Workshop Tel Aviv - Graph Data Science
Neo4j
 
PDF
ntroducing to the Power of Graph Technology
Neo4j
 
PPTX
Easily Identify Sources of Supply Chain Gridlock
Neo4j
 
PDF
Knowledge Graph Embeddings for Recommender Systems
Enrico Palumbo
 
PPTX
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Neo4j
 
PPTX
Smarter Fraud Detection With Graph Data Science
Neo4j
 
PDF
Workshop Introduction to Neo4j
Neo4j
 
PDF
Graph Data Science at Scale
Neo4j
 
PDF
Graph Databases – Benefits and Risks
DATAVERSITY
 
PDF
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
Neo4j
 
PDF
Intro to Neo4j and Graph Databases
Neo4j
 
PDF
The Future of Data Science
DataWorks Summit
 
PDF
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
PDF
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
Neo4j
 
PPTX
Intro to Neo4j
Neo4j
 
PPTX
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Neo4j
 
PDF
Graph-Powered Machine Learning
Databricks
 
PPSX
Technip Energies Italy: Planning is a graph matter
Neo4j
 
Graphs for Finance - AML with Neo4j Graph Data Science
Neo4j
 
Workshop - Neo4j Graph Data Science
Neo4j
 
Workshop Tel Aviv - Graph Data Science
Neo4j
 
ntroducing to the Power of Graph Technology
Neo4j
 
Easily Identify Sources of Supply Chain Gridlock
Neo4j
 
Knowledge Graph Embeddings for Recommender Systems
Enrico Palumbo
 
Get Started with the Most Advanced Edition Yet of Neo4j Graph Data Science
Neo4j
 
Smarter Fraud Detection With Graph Data Science
Neo4j
 
Workshop Introduction to Neo4j
Neo4j
 
Graph Data Science at Scale
Neo4j
 
Graph Databases – Benefits and Risks
DATAVERSITY
 
The perfect couple: Uniting Large Language Models and Knowledge Graphs for En...
Neo4j
 
Intro to Neo4j and Graph Databases
Neo4j
 
The Future of Data Science
DataWorks Summit
 
Knowledge Graphs - The Power of Graph-Based Search
Neo4j
 
GSK: How Knowledge Graphs Improve Clinical Reporting Workflows
Neo4j
 
Intro to Neo4j
Neo4j
 
Volvo Cars - Retrieving Safety Insights using Graphs (GraphSummit Stockholm 2...
Neo4j
 
Graph-Powered Machine Learning
Databricks
 
Technip Energies Italy: Planning is a graph matter
Neo4j
 

Similar to Graphs for Data Science and Machine Learning (20)

PDF
Graph Data Science: The Secret to Accelerating Innovation with AI/ML
Neo4j
 
PDF
Graph Data Science with Neo4j: Nordics Webinar
Neo4j
 
PDF
GraphSummit Toronto: Leveraging Graphs for AI and ML
Neo4j
 
PPTX
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Neo4j
 
PDF
Einstieg in Neo4j Graph Data Science
Neo4j
 
PDF
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
PDF
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
Neo4j
 
PDF
GPT and Graph Data Science to power your Knowledge Graph
Neo4j
 
PDF
4. Document Discovery with Graph Data Science
Neo4j
 
PDF
Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
PPTX
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j
 
PDF
What Is GDS and Neo4j’s GDS Library
Neo4j
 
PDF
How Graph Technology is Changing AI
Databricks
 
PDF
Deeper Insights with Graph Data Science
Neo4j
 
PDF
GraphSummit Toronto: Keynote - Innovating with Graphs
Neo4j
 
PDF
Introduction to Neo4j
Neo4j
 
PPTX
The path to success with graph database and graph data science_ Neo4j GraphSu...
Neo4j
 
PDF
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
Ivan Zoratti
 
PDF
Neo4j Graph Data Science - Webinar
Neo4j
 
PDF
Neo4j : la voie du succès avec les bases de données de graphes et la Graph Da...
Neo4j
 
Graph Data Science: The Secret to Accelerating Innovation with AI/ML
Neo4j
 
Graph Data Science with Neo4j: Nordics Webinar
Neo4j
 
GraphSummit Toronto: Leveraging Graphs for AI and ML
Neo4j
 
Using Connected Data and Graph Technology to Enhance Machine Learning and Art...
Neo4j
 
Einstieg in Neo4j Graph Data Science
Neo4j
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
El camino hacia el éxito con las bases de datos de grafos, la ciencia de dato...
Neo4j
 
GPT and Graph Data Science to power your Knowledge Graph
Neo4j
 
4. Document Discovery with Graph Data Science
Neo4j
 
Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
Neo4j GraphSummit London - The Path To Success With Graph Database and Data S...
Neo4j
 
What Is GDS and Neo4j’s GDS Library
Neo4j
 
How Graph Technology is Changing AI
Databricks
 
Deeper Insights with Graph Data Science
Neo4j
 
GraphSummit Toronto: Keynote - Innovating with Graphs
Neo4j
 
Introduction to Neo4j
Neo4j
 
The path to success with graph database and graph data science_ Neo4j GraphSu...
Neo4j
 
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
Ivan Zoratti
 
Neo4j Graph Data Science - Webinar
Neo4j
 
Neo4j : la voie du succès avec les bases de données de graphes et la Graph Da...
Neo4j
 
Ad

More from Neo4j (20)

PDF
GraphSummit Singapore Master Deck - May 20, 2025
Neo4j
 
PPTX
Graphs & GraphRAG - Essential Ingredients for GenAI
Neo4j
 
PPTX
Neo4j Knowledge for Customer Experience.pptx
Neo4j
 
PPTX
GraphTalk New Zealand - The Art of The Possible.pptx
Neo4j
 
PDF
Neo4j: The Art of the Possible with Graph
Neo4j
 
PDF
Smarter Knowledge Graphs For Public Sector
Neo4j
 
PDF
GraphRAG and Knowledge Graphs Exploring AI's Future
Neo4j
 
PDF
Matinée GenAI & GraphRAG Paris - Décembre 24
Neo4j
 
PDF
ANZ Presentation: GraphSummit Melbourne 2024
Neo4j
 
PDF
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
Neo4j
 
PDF
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Neo4j
 
PDF
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Neo4j
 
PDF
Démonstration Digital Twin Building Wire Management
Neo4j
 
PDF
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Neo4j
 
PDF
Démonstration Supply Chain - GraphTalk Paris
Neo4j
 
PDF
The Art of Possible - GraphTalk Paris Opening Session
Neo4j
 
PPTX
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Neo4j
 
PDF
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Neo4j
 
PDF
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j
 
PDF
Neo4j: The Art of Possible with Graph Technology
Neo4j
 
GraphSummit Singapore Master Deck - May 20, 2025
Neo4j
 
Graphs & GraphRAG - Essential Ingredients for GenAI
Neo4j
 
Neo4j Knowledge for Customer Experience.pptx
Neo4j
 
GraphTalk New Zealand - The Art of The Possible.pptx
Neo4j
 
Neo4j: The Art of the Possible with Graph
Neo4j
 
Smarter Knowledge Graphs For Public Sector
Neo4j
 
GraphRAG and Knowledge Graphs Exploring AI's Future
Neo4j
 
Matinée GenAI & GraphRAG Paris - Décembre 24
Neo4j
 
ANZ Presentation: GraphSummit Melbourne 2024
Neo4j
 
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
Neo4j
 
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Neo4j
 
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Neo4j
 
Démonstration Digital Twin Building Wire Management
Neo4j
 
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Neo4j
 
Démonstration Supply Chain - GraphTalk Paris
Neo4j
 
The Art of Possible - GraphTalk Paris Opening Session
Neo4j
 
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Neo4j
 
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Neo4j
 
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j
 
Neo4j: The Art of Possible with Graph Technology
Neo4j
 
Ad

Recently uploaded (20)

PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PPTX
Digital Circuits, important subject in CS
contactparinay1
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PPTX
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
PDF
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
PPTX
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
Edge AI and Vision Alliance
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
PPTX
Seamless Tech Experiences Showcasing Cross-Platform App Design.pptx
presentifyai
 
PPTX
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
Digital Circuits, important subject in CS
contactparinay1
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
Agentforce World Tour Toronto '25 - MCP with MuleSoft
Alexandra N. Martinez
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
Edge AI and Vision Alliance
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit Team
 
Seamless Tech Experiences Showcasing Cross-Platform App Design.pptx
presentifyai
 
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 

Graphs for Data Science and Machine Learning

  • 1. Neo4j, Inc. All rights reserved 2021 Neo4j, Inc. All rights reserved 2021 1 Graphs for Data Science and Machine Learning Dr. Jim Webber Chief Scientist, Neo4j #neo4j @jimwebber
  • 2. Neo4j, Inc. All rights reserved 2021 It’s Not What You Know
  • 3. Neo4j, Inc. All rights reserved 2021 It’s Who You Know
  • 4. Neo4j, Inc. All rights reserved 2021 It’s Who You Know And Where They Are
  • 5. Neo4j, Inc. All rights reserved 2021 5 Higher Pay and More Promotions • People Near Structural Holes • Organizational Misfits Network Structure is Highly Predictive Photo by Helena Lopes on Unsplash “Organizational Misfits and the Origins of Brokerage in Intrafirm Networks” A. Kleinbaum “Structural Holes and Good Ideas” R. Burt
  • 6. Neo4j, Inc. All rights reserved 2021 Neo4j, Inc. All rights reserved 2021 6 Relationships are the strongest predictors of behavior But You Can’t Analyse What You Can’t See ● Most data science techniques ignore relationships ● It’s painful to manually engineer connected features from tabular data ● Graphs are built on relationships, so… ● You don’t have to guess at the correlations: with graphs, relationships are built in James Fowler
  • 7. Neo4j, Inc. All rights reserved 2021 7 7 Top 10 Tech Trends in Data and Analytics, 16 Feb 2021 According to Garner, “Graphs form the foundation of modern D&A, with capabilities to enhance and improve user collaboration, ML models and explainable AI. The recent Gartner AI in Organizations Survey demonstrates that graph techniques are increasingly prevalent as AI maturity grows, going from 13% adoption when AI maturity is lowest to 48% when maturity is highest.” AI Research Papers Featuring Graph Source: Dimensions Knowledge System 4x Increase in traffic to Neo4j GDS page in 2H- 2020 Analytics & Data Science Interest Exploding in Neo4j Community +4.8m Views on the graph algorithms short video +193k downloads
  • 8. Neo4j, Inc. All rights reserved 2021 8 Node Represents an entity in the graph Relationship Connect nodes to each other Property Describes a node or relationship: e.g. name, age, weight etc Wait, what’s a graph? MICA ANDRE Name: “Andre” Born: May 29, 1970 Twitter: “@dan” Name: “Mica” Born: Dec 5, 1975 CAR Brand “Volvo” Model: “V70” Since: Jan 10, 2011 LOVES SISTER BROTHER O W N S D R I V E S
  • 9. Neo4j, Inc. All rights reserved 2021 Networks of People Transaction Networks Bought B o u gh t V i e w e d R e t u r n e d Bought Knowledge Networks Pl ay s Lives_in In_sport Likes F a n _ o f Plays_for Risk management, Supply chain, Orders, Payments, etc. Employees, Customers, Suppliers, Partners, Influencers, etc. Enterprise content, Domain specific content, eCommerce content, etc K n o w s Knows Knows K n o w s 9 Everything is Naturally Connected
  • 10. Neo4j, Inc. All rights reserved 2021 10 Queries Find the patterns you know exist. Machine Learning Uncover trends and make predictions Visualization Explore, collaborate, and explain Graphs & Data Science Analytics Feature Engineering Data Exploration Graph Data Science Queries Machine Learning Visualization
  • 11. Neo4j, Inc. All rights reserved 2021 11 Graphs & Data Science Knowledge Graphs Graph Algorithms Graph Native Machine Learning Find the patterns you’re looking for in connected data Use unsupervised machine learning techniques to identify associations, anomalies, and trends. Use embeddings to learn the features in your graph that you don’t even know are important yet. Train in-graph supervised ML models to predict links, labels, and missing data.
  • 12. Neo4j, Inc. All rights reserved 2021 Neo4j’s Graph Data Science Framework Neo4j Graph Data Science Library Neo4j Database Neo4j Bloom Scalable Graph Algorithms & Analytics Workspace Native Graph Creation & Persistence Visual Graph Exploration & Prototyping
  • 13. Neo4j, Inc. All rights reserved 2021 ~70 Robust Graph Algorithms & ML methods ● Compute metrics about the topology and connectivity ● Build predictive models to enhance your graph ● Highly parallelized and scale to 10’s of billions of nodes 13 The Neo4j GDS Library Mutable In-Memory Workspace Computational Graph Native Graph Store Efficient & Flexible Analytics Workspace ● Automatically reshapes transactional graphs into an in-memory analytics graph ● Optimized for global traversals and aggregation ● Create workflows and layer algorithms ● Store and manage predictive models in the model catalog
  • 14. Neo4j, Inc. All rights reserved 2021 14 ~70 Graph Data Science Techniques in Neo4j Pathfinding & Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • A* Shortest Path • Yen’s K Shortest Path • Minimum Weight Spanning Tree • K-Spanning Tree (MST) • Random Walk • Breadth & Depth First Search Centrality & Importance • Degree Centrality • Closeness Centrality • Harmonic Centrality • Betweenness Centrality & Approx. • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Hyperlink Induced Topic Search (HITS) • Influence Maximization (Greedy, CELF) Community Detection • Triangle Count • Local Clustering Coefficient • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity • K-1 Coloring • Modularity Optimization • Speaker Listener Label Propagation Supervised Machine Learning • Node Classification • Link Prediction … and more! Heuristic Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors Similarity • Node Similarity • K-Nearest Neighbors (KNN) • Jaccard Similarity • Cosine Similarity • Pearson Similarity • Euclidean Distance • Approximate Nearest Neighbors (ANN) Graph Embeddings • Node2Vec • FastRP • FastRPExtended • GraphSAGE • Synthetic Graph Generation • Scale Properties • Collapse Paths • One Hot Encoding • Split Relationships • Graph Export • Pregel API (write your own algos)
  • 15. Our Special Sauce: The Graph Catalogue. A graph-specific, mutable analytics workspace integrated with a native graph database (diagram layers: Native Graph Store, Computational Graph, Mutable In-Memory Workspace) • Neo4j automates data transformations • Experiment with different data sets and data models • Fast iterations & layering • Production-ready features, parallelization & enterprise support • Ability to persist and version data
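"Fast iterations & layering" refers to running one algorithm, writing its result back into the in-memory graph, and feeding that result to the next algorithm. The sketch below illustrates the pattern under stated assumptions: the `catalog` dict, the `InMemoryGraph` class, and the `mutate_*` functions are hypothetical stand-ins, not GDS procedures. It computes weakly connected components first, then derives each node's component size from the stored property.

```python
from collections import Counter


class InMemoryGraph:
    """Hypothetical in-memory projection: undirected adjacency plus a node-property store."""

    def __init__(self, edges):
        self.adj = {}
        for a, b in edges:
            self.adj.setdefault(a, set()).add(b)
            self.adj.setdefault(b, set()).add(a)
        self.props = {v: {} for v in self.adj}


catalog = {}  # graph name -> InMemoryGraph (hypothetical stand-in for a graph catalog)


def mutate_wcc(g, prop="componentId"):
    # Union-find with path halving; the result is written into the projection only.
    parent = {v: v for v in g.adj}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for v, nbrs in g.adj.items():
        for w in nbrs:
            rv, rw = find(v), find(w)
            if rv != rw:
                parent[rv] = rw
    for v in g.adj:
        g.props[v][prop] = find(v)


def mutate_component_size(g, source="componentId", prop="componentSize"):
    # Layered step: consumes the property written by the previous algorithm.
    sizes = Counter(g.props[v][source] for v in g.adj)
    for v in g.adj:
        g.props[v][prop] = sizes[g.props[v][source]]


catalog["social"] = InMemoryGraph([("ann", "bob"), ("bob", "cat"), ("dan", "eve")])
mutate_wcc(catalog["social"])
mutate_component_size(catalog["social"])
print({v: p["componentSize"] for v, p in catalog["social"].props.items()})
```

Because intermediate results stay in memory, a different second step can be tried without recomputing the first.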
  • 16. Right, so how will this improve my machine learning project?
  • 17. Neo4j’s Graph Data Science Library. Unsupervised graph algorithms (Community Detection, Centrality, Embeddings, Similarity, Pathfinding) cover clustering, dimension reduction (generalization), and association, answering questions such as: Which parts of my graph are connected to each other? Which nodes are most similar? How important is each node? Supervised machine learning: Node Classification (What’s the label for this node?) and Link Prediction (Where will connections form next?). More algorithms than any other vendor. ONLY in Neo4j.
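"Which nodes are most similar?" is often answered by neighborhood overlap. A minimal sketch using Jaccard similarity (listed on slide 14), |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, is shown below; the purchase data and function name are hypothetical, and this brute-force pairwise loop is for illustration, not how a scalable library would compute it.

```python
from itertools import combinations


def jaccard_similarity(neighbors):
    """Pairwise Jaccard similarity of neighbor sets, keyed by sorted node pairs."""
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        union = neighbors[u] | neighbors[v]
        if union:  # skip pairs where both nodes have no neighbors
            scores[(u, v)] = len(neighbors[u] & neighbors[v]) / len(union)
    return scores


# Hypothetical bipartite data: customers and the products they bought.
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"mug"},
}
sims = jaccard_similarity(purchases)
print(sims)
```

Alice and Bob share two of the three products either of them bought, so their score is 2/3; Carol shares nothing with either.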
  • 18. Better Predictions with Data You Already Have ● Traditional ML ignores network structure because it’s difficult to extract ● Add graph data to existing ML pipelines to increase accuracy, or ● Use relationships in the graph to unlock otherwise unattainable predictions (diagram: Machine Learning Pipeline)
  • 19. Graphs & Supervised Machine Learning. Predictions influenced by graph structure: traditional ML problems where relationships between your data points are important predictive features. Predictions about graph structure: enhance your graph by predicting missing data or changes to your graph that will occur in the future.
  • 23. Graph Feature Engineering. Feature engineering is how we combine and process data to create new, more meaningful features. Graph algorithms and embeddings translate the connections within your data into the rows and columns you need for ML.
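To make "connections into rows and columns" concrete, the hedged sketch below turns an undirected edge list into one feature row per node: degree, triangle count, and local clustering coefficient (the latter two appear on slide 14). The function name and output shape are illustrative assumptions; the rows could be handed to any tabular ML library.

```python
from itertools import combinations


def graph_features(edges):
    """Turn an undirected edge list into one feature row per node."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    rows = []
    for v in sorted(adj):
        nbrs = adj[v]
        deg = len(nbrs)
        # Triangles through v = pairs of v's neighbors that are themselves linked.
        tri = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        possible = deg * (deg - 1) / 2
        clustering = tri / possible if possible else 0.0
        rows.append({"node": v, "degree": deg, "triangles": tri, "clustering": clustering})
    return rows


rows = graph_features([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
for row in rows:
    print(row)
```

Node `c` closes one triangle out of three possible neighbor pairs, so its clustering coefficient is 1/3, while `d`, with a single neighbor, gets 0.0.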
  • 24. In-Graph Machine Learning. Node classification: “What kind of node is this?” Link prediction: “Should there be a relationship between these nodes?” Labeled data: pairs of nodes that are either linked or not. Features: pre-existing attributes, algorithm results (e.g. PageRank), and embeddings.
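For link prediction, each labeled example is a node pair, so features are computed per pair. A minimal sketch of four heuristics from slide 14 (Common Neighbors, Adamic Adar, Resource Allocation, Preferential Attachment) follows, using their textbook definitions; the adjacency data is hypothetical, and GDS's own implementations may differ in details.

```python
import math


def pair_features(adj, u, v):
    """Textbook link-prediction heuristics for the pair (u, v) in an undirected graph."""
    common = adj[u] & adj[v]
    return {
        "common_neighbors": len(common),
        # A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
        "preferential_attachment": len(adj[u]) * len(adj[v]),
    }


# Hypothetical undirected graph stored as symmetric neighbor sets.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
features = pair_features(adj, "a", "b")
print(features)
```

Adamic Adar and Resource Allocation both down-weight shared neighbors that have many connections, so `d` (degree 3) contributes less than `c` (degree 2).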
  • 25. Node Classification in Neo4j. Node classification: predicting a node label or (categorical) property. 1. Load your in-memory graph with labels & features. 2. Use nodeClassification.train, specifying the property you want to predict and the features for making that prediction. Neo4j automates the tricky parts: 1. Splits the data into train & test sets 2. Builds logistic regression models using the training data & specified parameters to predict the correct label 3. Evaluates the accuracy of the models using the test data 4. Returns the best-performing model. The predictive model appears in the model catalog, ready to apply to new data.
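The four automated steps can be mirrored in plain Python to show what they involve. This is a sketch under stated assumptions, not the GDS procedure: binary labels only, a single hold-out split, batch gradient descent, and a small grid of L2 penalties as the "specified parameters". All names and data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, penalty, epochs=500, lr=0.1):
    """Binary logistic regression via batch gradient descent with an L2 penalty."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * (gj / n + penalty * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(model, X, y):
    w, b = model
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def node_classification_train(X, y, penalties, test_fraction=0.3, seed=42):
    """Hypothetical mirror of the slide: split, train per candidate, evaluate, keep the best."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # 1. split into train & test sets
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    best = None
    for p in penalties:
        # 2. train one model per candidate parameter setting
        model = train_logreg([X[i] for i in train], [y[i] for i in train], p)
        # 3. evaluate on the held-out test nodes
        score = accuracy(model, [X[i] for i in test], [y[i] for i in test])
        if best is None or score > best[1]:
            best = (p, score, model)
    return best  # 4. (penalty, test accuracy, model) of the best candidate


# Hypothetical single feature per node (e.g. a centered graph score) and a binary label.
X = [[-2.5 + 0.1 * k] for k in range(16)] + [[1.0 + 0.1 * k] for k in range(16)]
y = [0] * 16 + [1] * 16
penalty, test_accuracy, model = node_classification_train(X, y, penalties=[0.0, 0.01, 0.1])
print(penalty, test_accuracy)
```

A production workflow would typically use cross-validation and multi-class models rather than one split and a binary classifier; the single split keeps the sketch short.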
  • 26. Link Prediction in Neo4j. Link prediction: predicting unobserved relationships, or relationships that will form in the future. 1. Load your in-memory graph with labels & features. 2. Split your graph into train & test relationships with splitRelationships.mutate. 3. Use linkPrediction.train. Neo4j automates the tricky parts: 1. Builds logistic regression models using the training data & specified parameters to predict whether a relationship exists 2. Evaluates the accuracy of the models using the test data 3. Returns the best-performing model. The predictive model appears in the model catalog, ready to apply to new data.
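The relationship split is what makes link prediction testable: some existing relationships are hidden as positive test examples, and node pairs that are not connected serve as negatives. The sketch below is only loosely modeled on that idea; `split_relationships`, its parameters, and the one-negative-per-positive ratio are assumptions, not the signature of the GDS procedure.

```python
import random
from itertools import combinations


def split_relationships(nodes, edges, holdout_fraction=0.2, seed=7):
    """Hold out a fraction of undirected edges as positive test examples and
    sample the same number of unconnected node pairs as negative test examples."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]  # normalize undirected pairs
    shuffled = edges[:]
    rng.shuffle(shuffled)
    k = max(1, int(len(shuffled) * holdout_fraction))
    test_pos, train_edges = shuffled[:k], shuffled[k:]
    # Negatives come from pairs absent from the full graph, so none are real edges.
    existing = set(edges)
    candidates = [p for p in combinations(sorted(nodes), 2) if p not in existing]
    test_neg = rng.sample(candidates, min(k, len(candidates)))
    return train_edges, test_pos, test_neg


nodes = list(range(6))
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
train_edges, test_pos, test_neg = split_relationships(nodes, ring, holdout_fraction=0.5)
print(train_edges, test_pos, test_neg)
```

Features for training should then be computed on `train_edges` only; computing them on the full graph would leak the held-out relationships into the model.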
  • 27. Machine Learning Models in Neo4j. Train a model using your graph and apply it to new or unseen data. Not a data model: a predictive model. ML models in the analytics workspace live in the Neo4j model catalog, which records: • Versioning information • Input data • Timestamps • Model names. Trained models can be persisted to disk and shared with colleagues.
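A model catalog is essentially a versioned registry of trained models plus their metadata. The hedged sketch below keeps every version under a model name with its input feature properties, parameters, and a timestamp, and persists the catalog as JSON so it can be shared. `ModelEntry`, `ModelCatalog`, and the JSON layout are hypothetical and not how Neo4j stores its models.

```python
import json
import os
import tempfile
import time
from dataclasses import asdict, dataclass, field


@dataclass
class ModelEntry:
    name: str
    version: int
    model_type: str
    feature_properties: list  # input data the model was trained on
    params: dict  # learned weights or other trained state
    created_at: float = field(default_factory=time.time)  # timestamp


class ModelCatalog:
    """Hypothetical model catalog keyed by name, keeping every version."""

    def __init__(self):
        self._entries = {}

    def store(self, name, model_type, feature_properties, params):
        versions = self._entries.setdefault(name, [])
        entry = ModelEntry(name, len(versions) + 1, model_type,
                           list(feature_properties), dict(params))
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._entries[name][-1]

    def save(self, path):
        # Persist all versions as JSON so the catalog can be shared with colleagues.
        with open(path, "w") as f:
            json.dump({n: [asdict(e) for e in v] for n, v in self._entries.items()}, f)

    @classmethod
    def load(cls, path):
        catalog = cls()
        with open(path) as f:
            for name, versions in json.load(f).items():
                catalog._entries[name] = [ModelEntry(**e) for e in versions]
        return catalog


catalog = ModelCatalog()
catalog.store("churn-classifier", "nodeClassification", ["pageRank", "degree"],
              {"weights": [0.4, -1.2], "bias": 0.1})
v2 = catalog.store("churn-classifier", "nodeClassification", ["pageRank", "degree"],
                   {"weights": [0.5, -1.1], "bias": 0.0})
path = os.path.join(tempfile.mkdtemp(), "models.json")
catalog.save(path)
restored = ModelCatalog.load(path)
print(restored.latest("churn-classifier"))
```

Keeping old versions rather than overwriting them is what allows a retrained model to be compared against, or rolled back to, its predecessor.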
  • 28. Neo4j: The Only Completely In-Graph ML Workflow. Graph-native feature engineering (queries, algorithms, embeddings) → Train predictive model (1. model type, 2. property selection, 3. train & test, 4. model selection) → Apply model to existing/new data → Use predictions for decisions, or use predictions to enhance the graph. Store model in database; publish & share.
  • 29. Neo4j Is Part of Your Data Ecosystem. Data sources: structured, unstructured. Ingest: Apache Hop. Product components: Neo4j Bloom, Neo4j GDS Library, APOC, Drivers & APIs (connections: Visualize, AutoML). Use cases (data analytics and data management): Journey Analytics, Risk Analytics, Churn Analysis, What-if Analysis, Feature Engineering & ML, Fraud, Recommendations, Data Fabric, Data Compliance, Data Governance, Data Provenance, Data Lineage, Next Best Case, Ontologies
  • 30. Real-World Use Cases
  • 31. Graph Data Science Spans Industries and Uses: Personalized Recommendations, Churn Prediction, Market Segmentation, Life Sciences, Predictive Maintenance, Cybersecurity, Master Data Management, Fraud Detection
  • 32. Accelerate Innovation Using Neo4j Graph Data Science: From Simple to Highly Sophisticated Data Science. (Chart: analysis repeatability, from simple/ad hoc to full production, plotted against analysis complexity, from analytics to data science.)
  • 33. Accelerate Innovation Using Neo4j Graph Data Science: From Simple to Highly Sophisticated Data Science. Examples: R&D: better health outcomes through machine learning on patient journeys; fraud detection with graph feature engineering + AutoML; analytics to improve reliability by predicting problems in a supply-chain knowledge graph. (Chart: analysis repeatability, from simple/ad hoc to full production, plotted against analysis complexity, from analytics to data science; label: FinServ Customers.)
  • 34. Boston Scientific: Finding At-Fault Components • Challenge: Difficulty finding faulty components via ad hoc analytics on a vertically integrated supply chain • Solution: A knowledge graph to model and analyze their complex products • Results: ○ Quickly pinpoint root causes of problems ○ Reduced query times from two minutes to seconds ○ Anti-recommendation using graph algorithms to identify and eliminate bad combinations of components
  • 35. Top 10 Bank: Fighting Fraud • Challenge: Graphs are an important predictive signal, but can be challenging to incorporate into production ML • Solution: Use Neo4j for repeatable feature engineering and incorporate the results into AutoML pipelines • Results: ○ Identified millions of dollars in previously undetectable fraud ○ Enriched the graph with the results of investigations to improve future predictions
  • 36. AstraZeneca: Patient Journey. “We used graph algorithms to find patients that had specific journey types and patterns and then find others that are close and similar.” (Joseph Roemer, Sr. Director, Global Commercial IT Insight & Analytics, AstraZeneca) ● Challenge: How best to intervene sooner for complex diseases that develop over years ● Solution: A Neo4j knowledge graph of 3 years of visits, tests, & diagnoses with tens of billions of records, using graph algorithms and machine learning together ● Results: ○ Identified journey archetypes and patterns using graph feature engineering as input to ML ○ Revealed journey similarities over time with community detection ○ Found influential touch-points in the journey using graph algorithms
  • 37. Graph Data Science Answers the BIG Questions. Connected data is powerful: graph data science uses connections to answer critical questions. What’s most important and influential in my business? What’s occurring that’s unusual? What’s going to happen next? But traditional approaches to data make it impossible to reveal and effectively use those connections as data sizes become large: predictive signals get lost in big data noise.
  • 38. Neo4j Graph Data Science • 70 Graph Algorithms: more supported algorithms than any other vendor • Graph-Native ML: the only commercial offering with full graph ML workflows • Humane Experience: automatic transformation from storage to analytics and visualization • Scalable Data Science: algorithms running over tens of billions of nodes in production • Extensible: integrate with other data sources and ML platforms • Strongest Community: 220K+ practitioners, 72K+ meetups
  • 39. Resources. Graph Resources: ● Video: Advantages of Graph Technology ● Whitepaper: AI & Graph Technology: Enhancing AI with Context & Connections ● Whitepaper: Financial Fraud Detection with Graph Data Science ● Case Study: Meredith Corporation. Neo4j Bookshelf: ● Graph Databases For Dummies ● Graph Data Science For Dummies ● O’Reilly Graph Algorithms ● O’Reilly Knowledge Graphs
  • 40. Thank you! Come see us at Stand B400. Dr. Jim Webber, Chief Scientist, Neo4j