Improve ML Predictions using Graph
Algorithms
Mark Needham, Neo4j
Amy Hodler, Neo4j
May 2019
#Neo4j
#GraphAnalytics
• Graphs for Predictions
• Connected Features
• Link Prediction
• Neo4j + Spark Workflow
Amy E. Hodler
Graph Analytics & AI Program Manager, Neo4j
Amy.Hodler@neo4j.com
@amyhodler
neo4j.com/graph-algorithms-book
Chapter 8: Graph + ML
Spark & Neo4j
Mark Needham
Developer Relations Engineer, Neo4j
Mark.needham@neo4j.com
@markHneedham
What in Common is Predictive?
Relationships:
Strongest Predictors of Behavior!
“Increasingly we're learning that you can make
better predictions about people by getting all the
information from their friends and their friends’
friends than you can from the information you
have about the person themselves”
James Fowler David Burkus
James Fowler
Albert-Laszlo
Barabasi
Native Graph Platforms are Designed for Connected Data
TRADITIONAL
PLATFORMS
BIG DATA
TECHNOLOGY
Store and retrieve data Aggregate and filter data Connections in data
Real time storage & retrieval Real-Time Connected Insights
Long running queries
aggregation & filtering
“Our Neo4j solution is literally thousands of times faster
than the prior MySQL solution, with queries that require
10-100 times less code”
Volker Pacher, Senior Developer
Max # of hops ~3
Millions
Graph Database Surging in Popularity
Trends since Jan 2013
DB-Engines.com
Improving Machine Learning using Graph Algorithms
Graph Data Science Applications
• Current data science models ignore network structure & complex relationships
• Graphs add highly predictive features to existing ML models
• Otherwise unattainable predictions based on relationships
Novel & More Accurate Predictions
with the Data You Already Have
Machine Learning Pipeline
Connected Features
What Are Connected Features?
Connection-related metrics about our graph, such as the number of relationships going into or out of nodes, a count of potential triangles, or neighbors in common.
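A minimal sketch of two of these metrics, assuming a hypothetical toy graph held in plain Python rather than in Neo4j. The edge list and the `potential_triangles` helper are illustrative names, not part of any library.

```python
from collections import defaultdict

# Hypothetical toy graph: directed relationships as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("d", "a")]

in_degree = defaultdict(int)    # relationships going into a node
out_degree = defaultdict(int)   # relationships going out of a node
neighbors = defaultdict(set)    # undirected view of the same graph

for source, target in edges:
    out_degree[source] += 1
    in_degree[target] += 1
    neighbors[source].add(target)
    neighbors[target].add(source)


def potential_triangles(node):
    """Number of neighbor pairs around a node; each pair could close a triangle."""
    k = len(neighbors[node])
    return k * (k - 1) // 2


for node in sorted(neighbors):
    print(node, in_degree[node], out_degree[node], potential_triangles(node))
```

Each printed row is one node with its connected features, ready to be joined onto an existing feature table.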
Deriving Connected Features
Query (e.g. Cypher): Local Patterns
Real-time, local decisioning and pattern matching
You know what you’re looking for and making a decision
Graph Algorithms Libraries: Global Computation
Global analysis and iterations
You’re learning the overall structure of a network, updating data, and predicting
Connected Feature Engineering
Feature Engineering is how we combine and process the data to create new,
more meaningful features, such as clustering or connectivity metrics.
Add More Descriptive Features:
- Influence
- Relationships
- Communities
Extraction
Graph Feature Categories & Algorithms
• Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
• Centrality / Importance: Determines the importance of distinct nodes in the network
• Community Detection: Detects group clustering or partition options
• Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
• Similarity: Evaluates how alike nodes are
• Embeddings: Learned representations of connectivity or topology
Link Prediction
Can we infer new interactions in the future?
What unobserved facts are we missing?
+ 50 years of biomedical data
integrated in a knowledge
graph
Predicting new uses for drugs
by using the graph structure to
create features for link
prediction
Example: het.io
Methods for Link Prediction
Algorithm Measures
Run targeted algorithms and score
outcomes
Set a threshold value used to predict a
link between nodes
Machine Learning
Use the measures as features to train an
ML model
Community Detection, Link Prediction, Similarity, Centrality

1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0
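A small sketch of the first method (algorithm measure plus threshold), using the three example rows from the slide. The cut-off value of 2 is a hypothetical choice made only for illustration; in practice it would be tuned against labeled data.

```python
# Rows from the slide's example table:
# (1st node, 2nd node, common neighbors, preferential attachment, label)
rows = [
    (1, 2, 4, 15, 1),
    (3, 4, 7, 12, 1),
    (5, 6, 1, 1, 0),
]

# Hypothetical threshold, not taken from the talk.
COMMON_NEIGHBORS_THRESHOLD = 2


def predict_link(common_neighbors, threshold=COMMON_NEIGHBORS_THRESHOLD):
    """Predict a link (1) when the measure reaches the threshold, else no link (0)."""
    return 1 if common_neighbors >= threshold else 0


predictions = [predict_link(cn) for _, _, cn, _, _ in rows]
labels = [label for *_, label in rows]
print(predictions, labels)
```

The machine learning method keeps the same columns but lets a trained model, rather than a fixed threshold, decide how to weigh them.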
Example:
Predicting Collaboration
• Citation Network Dataset - Research Dataset
– “ArnetMiner: Extraction and Mining of Academic Social Networks”, by
J. Tang et al
– Used a subset with 52K papers, 80K authors, 140K author
relationships and 29K citation relationships
• Neo4j
– Create a co-authorship graph and connected feature engineering
• Spark and MLlib
– Train and test our model using a random forest classifier
Predicting Collaboration
with a Graph Enhanced ML Model
Our Link Prediction Workflow
Import Data
Create Co-Author
Graph
Extract Data &
Store as Graph
Explore, Clean,
Modify
Prepare for
Machine Learning
Train
Models
Evaluate Results
Productionize
Our Link Prediction Workflow
Identified sparse
feature areas
Feature
Engineering:
New graphy
features
Graph Algorithms Used for
Feature Engineering (few examples)
Common Neighbors counts the neighbors two nodes share; more shared neighbors
suggests the pair is more likely to connect (triadic closure)
Preferential Attachment scores a pair by multiplying the number of
neighbors each node has
Illustration be.amazd.com/link-prediction/
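A minimal sketch of these pair-level measures in plain Python, assuming a hypothetical co-authorship edge list. The function names mirror the measure names but are written here for illustration; they are not calls into the Neo4j graph algorithms library.

```python
from collections import defaultdict

# Hypothetical undirected co-authorship pairs.
coauthors = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("bob", "dan"), ("cat", "eve"),
]

neighbors = defaultdict(set)
for a, b in coauthors:
    neighbors[a].add(b)
    neighbors[b].add(a)


def common_neighbors(u, v):
    """Number of neighbors that u and v share."""
    return len(neighbors[u] & neighbors[v])


def preferential_attachment(u, v):
    """Product of the two nodes' neighbor counts."""
    return len(neighbors[u]) * len(neighbors[v])


def total_neighbors(u, v):
    """Number of distinct neighbors across both nodes."""
    return len(neighbors[u] | neighbors[v])


pair = ("ann", "dan")
print(common_neighbors(*pair), preferential_attachment(*pair), total_neighbors(*pair))
```

Scores like these become one row per candidate pair in the training table.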
Graph Algorithms Used for
Feature Engineering (few examples)
Triangle counting and clustering coefficients measure the
density of connections around nodes
Louvain Modularity identifies interacting communities and
hierarchies
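A short sketch of triangle counting and the local clustering coefficient on a hypothetical undirected toy graph. It uses the standard definition (closed neighbor pairs divided by possible neighbor pairs) and is not the library implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical undirected toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

neighbors = defaultdict(set)
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)


def triangle_count(node):
    """Count pairs of this node's neighbors that are themselves connected."""
    return sum(
        1
        for u, v in combinations(sorted(neighbors[node]), 2)
        if v in neighbors[u]
    )


def clustering_coefficient(node):
    """Fraction of possible neighbor pairs that actually form a triangle."""
    k = len(neighbors[node])
    if k < 2:
        return 0.0
    return 2 * triangle_count(node) / (k * (k - 1))


for node in sorted(neighbors):
    print(node, triangle_count(node), round(clustering_coefficient(node), 3))
```

For a pair of nodes, the minimum and maximum of these per-node values can then be used as features.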
Our Link Prediction Workflow
Train / Test Split
Resample:
Downsampled for
proportional
representation
Test/Train Split

1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0
2         12        3                 3                        0
4         9         4                 8                        1
7         10        12                36                       1
8         11        2                 3                        0

Rows from this single table are assigned to either Train or Test.
OMG I’m Good!
Data Leakage!
Graph metric computation for the train set
touches data from the test set.
Did you get really high accuracy on your first
run without tuning?
Train and Test Graphs: Time Based Split

Train (< 2006)
1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
1         2         4                 15                       1
3         4         7                 12                       1
5         6         1                 1                        0

Test (>= 2006)
1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
2         12        3                 3                        0
4         9         4                 8                        1
7         10        12                36                       1
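A minimal sketch of a time-based split, assuming each collaboration carries the year it first occurred. The records are hypothetical; the 2006 cut-off is the one shown on the slide.

```python
# Hypothetical records: (author 1, author 2, year of first collaboration).
collaborations = [
    ("ann", "bob", 2001),
    ("bob", "cat", 2004),
    ("ann", "cat", 2006),
    ("cat", "dan", 2009),
]

SPLIT_YEAR = 2006  # train on < 2006, test on >= 2006

train_edges = [(a, b) for a, b, year in collaborations if year < SPLIT_YEAR]
test_edges = [(a, b) for a, b, year in collaborations if year >= SPLIT_YEAR]

print(train_edges)
print(test_edges)
```

Graph features for the training pairs are then computed on a graph built only from `train_edges`, so that no relationship from the test period can influence them.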
Class Imbalance
Negative
Examples
Positive
Examples
There are significantly more negative examples than positive ones:
# negative examples = (# nodes)² - (# relationships) - (# nodes)
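A quick sketch of how large the imbalance gets, applying the formula above to the rounded dataset sizes quoted earlier (80K authors, 140K author relationships). The `downsample` helper is a hypothetical illustration of random downsampling, not the code used in the talk.

```python
import random


def negative_example_count(num_nodes, num_relationships):
    """Formula from the slide: (# nodes)^2 - (# relationships) - (# nodes)."""
    return num_nodes ** 2 - num_relationships - num_nodes


def downsample(negatives, positives, seed=42):
    """Randomly keep as many negative examples as there are positive ones."""
    rng = random.Random(seed)
    return rng.sample(negatives, len(positives))


# Rounded figures from the dataset slide, so the result is only approximate.
print(negative_example_count(80_000, 140_000))

# Tiny hypothetical example: 100 negative pairs, 3 positive pairs.
balanced_negatives = downsample(list(range(100)), ["p1", "p2", "p3"])
print(len(balanced_negatives))
```

With billions of candidate negatives against roughly 140K positives, downsampling the negatives keeps the classifier from simply learning the majority class.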
Class Imbalance
A model that predicts every pair of nodes is not linked can still score very high accuracy.
Our Link Prediction Workflow
Model Selection:
Random Forest
Ensemble
method
Picking a Classifier
Training Our Model
This is one decision tree in our
Random Forest used as a binary
classifier to learn how to classify a
pair: predicting either linked or not
linked.
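A deliberately simplified sketch of the idea: several small trees each vote on whether a pair is linked, and the majority wins. The three hand-written trees and their split values are hypothetical; a real random forest learns its trees from training data, and this is not Spark MLlib's implementation.

```python
def tree_a(features):
    # Hypothetical split on the number of shared co-authors.
    return 1 if features["commonAuthors"] >= 2 else 0


def tree_b(features):
    # Hypothetical split on preferential attachment.
    return 1 if features["prefAttachment"] >= 10 else 0


def tree_c(features):
    # Hypothetical split on total neighbors.
    return 1 if features["totalNeighbors"] >= 5 else 0


def forest_predict(features, trees=(tree_a, tree_b, tree_c)):
    """Return 1 (linked) when more than half of the trees vote for a link."""
    votes = sum(tree(features) for tree in trees)
    return 1 if votes * 2 > len(trees) else 0


pair = {"commonAuthors": 4, "prefAttachment": 15, "totalNeighbors": 3}
print(forest_predict(pair))
```

Combining many trees this way makes the prediction less sensitive to the quirks of any single tree.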
4 Models Trained with Multiple Graph Features
• Common Authors Model
  Graph Features: Common Authors
• “Graphy” Model
  Graph Features: Preferential Attachment, Total Neighbors
• Triangles Model
  Graph Features: Min & Max Triangles, Min & Max Clustering Coefficient
• Community Model
  Graph Features: Label Propagation, Louvain Modularity
Our Link Prediction Workflow
Precision,
Accuracy, Recall
ROC Curve &
AUC
Measures
• Accuracy: Proportion of total correct predictions.
  Beware of skewed data!
• Precision: Proportion of positive predictions that are correct.
  Low score = more false positives
• Recall / True Positive Rate: Proportion of actual positives that are correctly predicted.
  Low score = more false negatives
• False Positive Rate: Proportion of actual negatives that are incorrectly predicted as positive
• ROC Curve & AUC: X-Y chart plotting the two rates above (TPR against FPR), with the area under the curve
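A small sketch of these measures in plain Python, assuming binary labels where 1 means linked. The helper names are illustrative. The AUC helper uses the ranking interpretation of the area under the ROC curve (the chance that a random positive is scored above a random negative) and assumes at least one example of each class.

```python
def confusion_counts(labels, predictions):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(labels, predictions):
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),               # true positive rate
        "false_positive_rate": safe_div(fp, fp + tn),
    }


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Skewed hypothetical data: 2 linked pairs, 98 unlinked pairs.
labels = [1] * 2 + [0] * 98
always_not_linked = [0] * 100
print(metrics(labels, always_not_linked))
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

On the skewed example the always-negative model reaches 98% accuracy while its recall is zero, which is why accuracy alone is a poor guide for link prediction.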
Result: First Model ROC & AUC (Common Authors Model 1)
Problematic False Positives!
Result: All Models (Common Authors Model 1 through Community Model 4)
Iteration & Tuning: Feature Influence
For feature importance, the Spark
random forest averages the
reduction in impurity across all
trees in the forest
Feature rankings are in comparison
to the group of features evaluated
Also try PageRank!
Try removing different features
(LabelPropagation)
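A simplified sketch of the averaging idea, assuming we already have each tree's total impurity reduction per feature. The input numbers and feature names are hypothetical, and this illustrates the principle rather than Spark's actual code.

```python
def forest_feature_importance(per_tree_gains):
    """Normalize each tree's impurity reductions, accumulate them, then rescale to sum to 1."""
    totals = {}
    for gains in per_tree_gains:
        tree_total = sum(gains.values())
        if tree_total == 0:
            continue  # a tree with no splits contributes nothing
        for feature, gain in gains.items():
            totals[feature] = totals.get(feature, 0.0) + gain / tree_total
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {feature: value / overall for feature, value in totals.items()}


# Hypothetical impurity reductions for two trees.
trees = [
    {"commonAuthors": 3.0, "prefAttachment": 1.0},
    {"commonAuthors": 1.0, "prefAttachment": 1.0},
]
importance = forest_feature_importance(trees)
print(importance)
```

Because the scores are rescaled to sum to 1, each ranking only has meaning relative to the other features in the same model, which is why removing or adding a feature can reshuffle the order.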
Graph Machine Learning Workflow
Data aggregation
Create and store
graphs
Extract Data &
Store as Graph
Explore, Clean,
Modify
Prepare for
Machine Learning
Train
Models
Evaluate Results
Productionize
Identify
uninteresting
features
Cleanse (outliers+)
Feature
engineering/
extraction
Train / Test split
Resample for
meaningful
representation
(proportional, etc.)
Precision, accuracy,
recall
(ROC curve & AUC)
SME Review
Cross-validation
Model & variable
selection
Hyperparameter
tuning
Ensemble methods
Resources
• neo4j.com/sandbox
• neo4j.com/developer/graph-algorithms/
• community.neo4j.com
Data & Code:
• This example from O’Reilly book bit.ly/2FPgGVV (ML Folder)
Amy.Hodler@neo4j.com
@amyhodler
neo4j.com/graph-algorithms-book
Q&A/Extra Stuff to delete
Connected Feature Extraction
Feature Extraction is how we change the shape or format of the data
to be usable in a machine learning pipeline. For example, from a graph, we
extract the relevant subset of the data into a tabular format for model
building.
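A minimal sketch of that extraction step, assuming the per-pair graph features have already been computed. The values are the example rows used earlier in the deck; the column names are hypothetical.

```python
import csv
import io

# Per-pair features keyed by (1st node, 2nd node); values from the deck's example table.
pair_features = {
    (1, 2): {"commonNeighbors": 4, "prefAttachment": 15, "label": 1},
    (3, 4): {"commonNeighbors": 7, "prefAttachment": 12, "label": 1},
    (5, 6): {"commonNeighbors": 1, "prefAttachment": 1, "label": 0},
}

columns = ["node1", "node2", "commonNeighbors", "prefAttachment", "label"]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
for (node1, node2), features in pair_features.items():
    # Flatten each pair and its features into one tabular row.
    writer.writerow({"node1": node1, "node2": node2, **features})

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting table can be loaded by any tabular ML tool, which is the hand-off point between the graph and the model.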
Connected Feature Selection
Feature Selection is how we reduce the number of features used in a model
to a relevant subset. This can be done algorithmically or based on domain
expertise, but the objective is to maximize the predictive power of your
model while minimizing overfitting.
720+
7/10
12/2
5
8/10
53K+
100+
300+
450+
Adoption
Top Retail Firms
Top Financial Firms
Top Software Vendors
Customers Partners
• Creator of the Neo4j Graph Platform
• ~250 employees
• HQ in Silicon Valley, other offices include
London, Munich, Paris and Malmö Sweden
• $80M new funding led by Morgan Stanley &
One Peak. Total $160M from Fidelity,
Sunstone, Conor, Creandum, and
Greenbridge Capital
• Over 15M+ downloads & container pulls
• 325+ enterprise subscription customers
with over half with >$1B in revenue
Ecosystem
Startups in program
Enterprise customers
Partners
Meet up members
Events per year
Industry’s Largest Dedicated Investment in Graphs
Neo4j - The Graph Company
Helping The World To Make Sense of Data
ICIJ used Neo4j to uncover the
world’s largest journalistic leak to
date, The Panama Papers
NASA uses Neo4j for a “Lessons
Learned” database to improve
effectiveness in search missions in
space
Neo4j is used to graph the human
body, map correlations, identify cause
& effect and search for the cure for
cancer
SAVING DEMOCRACY
MISSION TO
MARS
CURING CANCER
Graph and ML Algorithms in Neo4j
• Parallel Breadth First Search & DFS
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev,
Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Similarity
neo4j.com/docs/
graph-algorithms/current/
Updated April 2019
Link
Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
Conceive
Code
Compute
Store
Non-Native Graph DB
Native Graph DB
RDBMS
Optimized for graph workloads
Connectedness Differentiates Neo4j
Neo4j is an enterprise-grade native graph platform that enables you to:
• Store, reveal and query data relationships
• Traverse and analyze any levels of depth in real-time
• Add context and connect new data on the fly
Who We Are: Leader in Graph Innovations
• Performance
• ACID Transactions
• Schema-free Agility
• Graph Algorithms
Designed, built and tested natively
for graphs from the start for:
• Developer Productivity
• Hardware Efficiency
• Global Scale
• Graph Adoption
Graph
Transactions
Graph
Analytics
Data Integration
Development
& Admin
Analytics
Tooling
Drivers & APIs Discovery & Visualization
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM Websphere commerce
• 300M pricing operations per day
• 10x transaction throughput on half the
hardware compared to Oracle
• Replaced Oracle database
• Large postal service with over 500k
employees
• Neo4j routes 7M+ packages daily at peak,
with peaks of 5,000+ routing operations per
second.
Handling Large Graph Work Loads for Enterprises
Real-time promotion
recommendations
Marriott’s Real-time
Pricing Engine
Handling Package
Routing in Real-Time
Recommendations, Dynamic Pricing, IoT-applications, Fraud Detection
Real-Time Transaction Applications
Generate and
Protect Revenue
Customer
Engagement
Metadata and Advanced Analytics
Data Lake
Integration
Knowledge
Graphs for AI
Risk
Mitigation
Generate
Actionable Insights
Network
Management
Supply Chain
Efficiency
Identity and Access
Management
Internal Business Processes
Improve Efficiency
and Cut Costs
Graph Use Cases by Value Proposition
Software
Financial Services
Telecom
Retail & Consumer Goods
Media & Entertainment
Other Industries
Airbus
Copyright © 2017 Neo4j, Inc. Company Confidential
Graph
Transactions
Graph
Analytics
Data Integration
Development
& Admin
Analytics
Tooling
Drivers & APIs Discovery & Visualization
Developers
Admins
Applications Business Users
Data Analysts
Data Scientists
Enterprise Data Hub
Native Graph Platform: Tools for Many Users
Collections-Focused
Multi-Model, Documents, Columns
& Simple Tables, Joins
Neo4j is designed for data relationships
Different Paradigms
NoSQL
Relational
DBMS
Neo4j Graph
Platform
Connections-Focused
Focused on
Data Relationships
Development Benefits
Easy model maintenance
Easy query
Deployment Benefits
Ultra high performance
Minimal resource usage
How Neo4j Fits — Common Architecture Patterns
From Disparate Silos
To Cross-Silo Connections
From Tabular Data
To Connected Data
From Data Lake Analytics
to Real-Time Operations
Cypher: Powerful & Expressive Query Language
MATCH (:Person { name:"Dan"} ) -[:MARRIED_TO]-> (spouse)
RETURN spouse
MARRIED_TO
Dan Ann
NODE RELATIONSHIP TYPE
LABEL PROPERTY VARIABLE
Neo4j Bloom
• High fidelity
• Scene navigation
• Property views
• Search suggestions
• Saved phrase history
• Property editor
• Schema perspectives
• Bloom chart type
• Visualize
• Communicate
• Discover
• Navigate
• Isolate
• Edit
• Share
Real-Time
Recommendations
Fraud
Detection
Network &
IT Operations
Master Data
Management
Knowledge
Graph
Identity & Access
Management
Common Graph Technology Use Cases
AirBnb
Graphs Drive Innovation
Context Paths
Auto-Graphs
Graph Layers
1st Graph
Cross-
Connect
Cross-tech applications
Internet of Things
operations
Transparent Neural
Networks
Blockchain-managed
systems
Adjacent graph layers
inspire new innovations
Metadata / Risk
Management
Knowledge Graphs
AI- Powered Customer
Experiences
Connect unlike objects
such as people to products,
locations
Mobile app explosion
Recommendation engines
Fraud detectors
Desire for more context to
follow connections
Connects like objects
People, computer
networks, telco, etc
Business Problem
• Find relationships between people, accounts, shell companies
and offshore accounts
• Journalists are non-technical
• Biggest “Snowden-Style” document leak ever; 11.5 million
documents, 2.6TB of data
Solution and Benefits
• Pulitzer Prize winning investigation resulted in robust
coverage of fraud and corruption
• PM of Iceland & Pakistan resigned, exposed Putin, Prime
Ministers, gangsters, celebrities (Messi)
• Led to assassination of journalist in Malta
Background
• International Consortium of Investigative Journalists (ICIJ),
small team of data journalists
• International investigative team specializing in cross-border
crime, corruption and accountability of power
• Works regularly with leaks and large datasets
ICIJ Panama Papers INVESTIGATIVE JOURNALISM
Fraud Detection / Knowledge Graph
Thomson Reuters Graph
• Data Fusion for Portfolio
Managers
• Graph layers
Background
• Personal shopping assistant
• Converses with buyer via text, picture and voice
to provide real-time recommendations
• Combines AI and natural language understanding
(NLU) in Neo4j Knowledge Graph
• First of many apps in eBay's AI Platform
Business Problem
• Improve personal context in online shopping
• Transform buyer-provided context into ideal
purchase recommendations over social platforms
• "Feels like talking to a friend"
Solution and Benefits
• 3 developers, 8M nodes, 20M relationships
• Needed high-performance traversals to respond
to live customer requests
• Easy to train new algorithms and grow model
• Generating revenue since launch
eBay for Google Assistant ONLINE RETAIL
Knowledge Graph powers Real-Time Recommendations
EE Customer since 2016 Q3
Background
• Over 7M citizens suffer from Diabetes
• Connecting over 400 researchers
• Incorporates over 50 databases, 100k’s of Excel
workbooks, 30 databases of biological samples
• Sought to examine disease from as many angles as
possible.
Business Problem
• Genes are connected by proteins or to metabolites,
and patients are connected with their diets, etc…
• Needed to improve the utilization of immensely
technical data
• Needed to cater to doctors and researchers with
simple navigation, communication and connections
of the graph.
Solution and Benefits
• Dr. Alexander Jarasch, Head of Bioinformatics and
Data Management
• Scientists can conduct parallel research without
asking the same questions or repeating tests
• Built views like a liver sample knowledge graph
DZD - German Center for Diabetes Research
Medical Genomic Research
EE Customer since 2016 Q4

More Related Content

What's hot (20)

PPTX
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Liad Magen
 
PDF
Graph Neural Network in practice
tuxette
 
PDF
Workshop - Neo4j Graph Data Science
Neo4j
 
PDF
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Neo4j
 
PDF
Graph based data models
Moumie Soulemane
 
PDF
Introduction to Knowledge Graphs and Semantic AI
Semantic Web Company
 
PDF
Graph neural networks overview
Rodion Kiryukhin
 
PPTX
Graph Neural Network - Introduction
Jungwon Kim
 
PPTX
Graph Analytics
Khalid Salama
 
PDF
Machine Learning Ml Overview Algorithms Use Cases And Applications
SlideTeam
 
PDF
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
PPTX
Community detection in social networks
Francisco Restivo
 
PDF
Network embedding
SOYEON KIM
 
PPTX
Smarter Fraud Detection With Graph Data Science
Neo4j
 
PPT
3 Centrality
Maksim Tsvetovat
 
PPTX
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
Simplilearn
 
PDF
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
Animesh Singh
 
PDF
Intro to Neo4j and Graph Databases
Neo4j
 
PDF
Webinar on Graph Neural Networks
LucaCrociani1
 
PDF
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Liad Magen
 
Graph Neural Network in practice
tuxette
 
Workshop - Neo4j Graph Data Science
Neo4j
 
Knowledge Graphs & Graph Data Science, More Context, Better Predictions - Neo...
Neo4j
 
Graph based data models
Moumie Soulemane
 
Introduction to Knowledge Graphs and Semantic AI
Semantic Web Company
 
Graph neural networks overview
Rodion Kiryukhin
 
Graph Neural Network - Introduction
Jungwon Kim
 
Graph Analytics
Khalid Salama
 
Machine Learning Ml Overview Algorithms Use Cases And Applications
SlideTeam
 
stackconf 2022: Introduction to Vector Search with Weaviate
NETWAYS
 
Community detection in social networks
Francisco Restivo
 
Network embedding
SOYEON KIM
 
Smarter Fraud Detection With Graph Data Science
Neo4j
 
3 Centrality
Maksim Tsvetovat
 
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
Simplilearn
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
Animesh Singh
 
Intro to Neo4j and Graph Databases
Neo4j
 
Webinar on Graph Neural Networks
LucaCrociani1
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Anoop Deoras
 

Similar to Improving Machine Learning using Graph Algorithms (20)

PDF
Improve ml predictions using graph algorithms (webinar july 23_19).pptx
Neo4j
 
PDF
Improve ML Predictions using Graph Analytics (today!)
Neo4j
 
PDF
Improve ML Predictions using Connected Feature Extraction
Databricks
 
PPTX
Graphs for Ai and ML
Neo4j
 
PDF
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Databricks
 
PDF
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
PPTX
Graph Analytics: Graph Algorithms Inside Neo4j
Neo4j
 
PDF
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
PPTX
GraphTour Boston - Graphs for AI and ML
Neo4j
 
PPTX
How Graphs are Changing AI
Neo4j
 
PDF
Leveraging Graphs for Better AI
Neo4j
 
PDF
Leveraging Graphs for Better AI
Neo4j
 
PDF
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
miyurud
 
PDF
Predicting Influence and Communities Using Graph Algorithms
Databricks
 
PDF
Spark Summit EU talk by Reza Karimi
Spark Summit
 
PDF
Graph Algorithms for Developers
Neo4j
 
PDF
GraphTour London 2020 - Graphs for AI, Amy Hodler
Neo4j
 
PDF
How Graphs Enhance AI
Neo4j
 
PDF
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
 
PDF
Knowledge graphs, meet Deep Learning
Connected Data World
 
Improve ml predictions using graph algorithms (webinar july 23_19).pptx
Neo4j
 
Improve ML Predictions using Graph Analytics (today!)
Neo4j
 
Improve ML Predictions using Connected Feature Extraction
Databricks
 
Graphs for Ai and ML
Neo4j
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Databricks
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
Graph Analytics: Graph Algorithms Inside Neo4j
Neo4j
 
3. Relationships Matter: Using Connected Data for Better Machine Learning
Neo4j
 
GraphTour Boston - Graphs for AI and ML
Neo4j
 
How Graphs are Changing AI
Neo4j
 
Leveraging Graphs for Better AI
Neo4j
 
Leveraging Graphs for Better AI
Neo4j
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
miyurud
 
Predicting Influence and Communities Using Graph Algorithms
Databricks
 
Spark Summit EU talk by Reza Karimi
Spark Summit
 
Graph Algorithms for Developers
Neo4j
 
GraphTour London 2020 - Graphs for AI, Amy Hodler
Neo4j
 
How Graphs Enhance AI
Neo4j
 
GraphTour 2020 - Graphs & AI: A Path for Data Science
Neo4j
 
Knowledge graphs, meet Deep Learning
Connected Data World
 
Ad

More from Neo4j (20)

PDF
GraphSummit Singapore Master Deck - May 20, 2025
Neo4j
 
PPTX
Graphs & GraphRAG - Essential Ingredients for GenAI
Neo4j
 
PPTX
Neo4j Knowledge for Customer Experience.pptx
Neo4j
 
PPTX
GraphTalk New Zealand - The Art of The Possible.pptx
Neo4j
 
PDF
Neo4j: The Art of the Possible with Graph
Neo4j
 
PDF
Smarter Knowledge Graphs For Public Sector
Neo4j
 
PDF
GraphRAG and Knowledge Graphs Exploring AI's Future
Neo4j
 
PDF
Matinée GenAI & GraphRAG Paris - Décembre 24
Neo4j
 
PDF
ANZ Presentation: GraphSummit Melbourne 2024
Neo4j
 
PDF
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
Neo4j
 
PDF
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Neo4j
 
PDF
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Neo4j
 
PDF
Démonstration Digital Twin Building Wire Management
Neo4j
 
PDF
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Neo4j
 
PDF
Démonstration Supply Chain - GraphTalk Paris
Neo4j
 
PDF
The Art of Possible - GraphTalk Paris Opening Session
Neo4j
 
PPTX
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Neo4j
 
PDF
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Neo4j
 
PDF
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j
 
PDF
Neo4j: The Art of Possible with Graph Technology
Neo4j
 
GraphSummit Singapore Master Deck - May 20, 2025
Neo4j
 
Graphs & GraphRAG - Essential Ingredients for GenAI
Neo4j
 
Neo4j Knowledge for Customer Experience.pptx
Neo4j
 
GraphTalk New Zealand - The Art of The Possible.pptx
Neo4j
 
Neo4j: The Art of the Possible with Graph
Neo4j
 
Smarter Knowledge Graphs For Public Sector
Neo4j
 
GraphRAG and Knowledge Graphs Exploring AI's Future
Neo4j
 
Matinée GenAI & GraphRAG Paris - Décembre 24
Neo4j
 
ANZ Presentation: GraphSummit Melbourne 2024
Neo4j
 
Google Cloud Presentation GraphSummit Melbourne 2024: Building Generative AI ...
Neo4j
 
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Neo4j
 
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Neo4j
 
Démonstration Digital Twin Building Wire Management
Neo4j
 
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Neo4j
 
Démonstration Supply Chain - GraphTalk Paris
Neo4j
 
The Art of Possible - GraphTalk Paris Opening Session
Neo4j
 
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Neo4j
 
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Neo4j
 
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j
 
Neo4j: The Art of Possible with Graph Technology
Neo4j
 
Ad

Recently uploaded (20)

PDF
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
PDF
Data Retrieval and Preparation Business Analytics.pdf
kayserrakib80
 
PDF
Optimizing Large Language Models with vLLM and Related Tools.pdf
Tamanna36
 
PDF
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
PDF
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 
PPT
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
PDF
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
PDF
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
PPTX
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
PDF
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
PPTX
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
PDF
A GraphRAG approach for Energy Efficiency Q&A
Marco Brambilla
 
PDF
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
PPTX
apidays Singapore 2025 - Generative AI Landscape Building a Modern Data Strat...
apidays
 
PDF
apidays Singapore 2025 - Streaming Lakehouse with Kafka, Flink and Iceberg by...
apidays
 
PPTX
Aict presentation on dpplppp sjdhfh.pptx
vabaso5932
 
PDF
apidays Singapore 2025 - The API Playbook for AI by Shin Wee Chuang (PAND AI)
apidays
 
PDF
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
PPTX
BinarySearchTree in datastructures in detail
kichokuttu
 
PDF
apidays Singapore 2025 - Surviving an interconnected world with API governanc...
apidays
 
OOPs with Java_unit2.pdf. sarthak bookkk
Sarthak964187
 
Data Retrieval and Preparation Business Analytics.pdf
kayserrakib80
 
Optimizing Large Language Models with vLLM and Related Tools.pdf
Tamanna36
 
apidays Singapore 2025 - From API Intelligence to API Governance by Harsha Ch...
apidays
 
NIS2 Compliance for MSPs: Roadmap, Benefits & Cybersecurity Trends (2025 Guide)
GRC Kompas
 
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
apidays Singapore 2025 - Trustworthy Generative AI: The Role of Observability...
apidays
 
OPPOTUS - Malaysias on Malaysia 1Q2025.pdf
Oppotus
 
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
apidays Singapore 2025 - Designing for Change, Julie Schiller (Google)
apidays
 
A GraphRAG approach for Energy Efficiency Q&A
Marco Brambilla
 
Product Management in HealthTech (Case Studies from SnappDoctor)
Hamed Shams
 
apidays Singapore 2025 - Generative AI Landscape Building a Modern Data Strat...
apidays
 
apidays Singapore 2025 - Streaming Lakehouse with Kafka, Flink and Iceberg by...
apidays
 
Aict presentation on dpplppp sjdhfh.pptx
vabaso5932
 
apidays Singapore 2025 - The API Playbook for AI by Shin Wee Chuang (PAND AI)
apidays
 
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
BinarySearchTree in datastructures in detail
kichokuttu
 
apidays Singapore 2025 - Surviving an interconnected world with API governanc...
apidays
 

Improving Machine Learning using Graph Algorithms

  • 1. Improve ML Predictions using Graph Algorithms Mark Needham, Neo4j Amy Hodler, Neo4j May 2019 #Neo4j #GraphAnalytics
  • 2. • Graphs for Predictions • Connected Features • Link Prediction • Neo4j + Spark Workflow Amy E. Hodler Graph Analytics & AI Program Manager, Neo4j [email protected] @amyhodler neo4j.com/ graph-algorithms-book Chapter 8: Graph + ML Spark & Neo4j Mark Needham Developer Relations Engineer, Neo4j [email protected] @markHneedham
  • 3. What in Common is Predictive?
  • 4. Relationships: Strongest Predictors of Behavior! “Increasingly we're learning that you can make better predictions about people by getting all the information from their friends and their friends’ friends than you can from the information you have about the person themselves” James Fowler David Burkus James Fowler Albert-Laszlo Barabasi
  • 5. Native Graph Platforms are Designed for Connected Data TRADITIONAL PLATFORMS BIG DATA TECHNOLOGY Store and retrieve data Aggregate and filter data Connections in data Real time storage & retrieval Real-Time Connected Insights Long running queries aggregation & filtering “Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code” Volker Pacher, Senior Developer Max # of hops ~3 Millions
  • 6. Graph Database Surging in Popularity Trends since Jan 2013 DB-Engines.com
  • 8. Graph Data Science Applications
  • 9. • Current data science models ignore network structure & complex relationships • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships Novel & More Accurate Predictions with the Data You Already Have Machine Learning Pipeline
  • 12. Connection-related metrics about our graph, such as the number of relationships going into or out of nodes, a count of potential triangles, or neighbors in common. 14c What Are Connected Features?
  • 13. Query (e.g. Cypher) Real-time, local decisioning and pattern matching Graph Algorithms Libraries Global analysis and iterations You know what you’re looking for and making a decision You’re learning the overall structure of a network, updating data, and predicting Local Patterns Global Computation Deriving Connected Features
  • 14. Connected Feature Engineering Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics. Add More Descriptive Features: - Influence - Relationships - Communities Extraction
  • 15. Graph Feature Categories & Algorithms
      - Pathfinding & Search: Finds the optimal paths or evaluates route availability and quality
      - Centrality / Importance: Determines the importance of distinct nodes in the network
      - Community Detection: Detects group clustering or partition options
      - Heuristic Link Prediction: Estimates the likelihood of nodes forming a relationship
      - Similarity: Evaluates how alike nodes are
      - Embeddings: Learned representations of connectivity or topology
  • 17. Can we infer new interactions in the future? What unobserved facts are we missing?
  • 18. + 50 years of biomedical data integrated in a knowledge graph Predicting new uses for drugs by using the graph structure to create features for link prediction Example: het.io
  • 20. Methods for Link Prediction
      - Algorithm Measures: Run targeted algorithms and score outcomes. Set a threshold value used to predict a link between nodes.
      - Machine Learning: Use the measures as features to train an ML model.
      - Community Detection, Link Prediction, Similarity, Centrality

      1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
      1         2         4                 15                       1
      3         4         7                 12                       1
      5         6         1                 1                        0
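The "Algorithm Measures" method above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the scores reuse the Common Neighbors column from the slide's table, while the `THRESHOLD` value and the `predict_links` helper are assumptions made for the example.

```python
# Pairs and common-neighbors scores taken from the slide's example table:
# (1st node, 2nd node, common neighbors).
scored_pairs = [(1, 2, 4), (3, 4, 7), (5, 6, 1)]

# Assumed cut-off for illustration; in practice it would be tuned on held-out data.
THRESHOLD = 2


def predict_links(pairs, threshold):
    """Return the node pairs whose score meets or exceeds the threshold."""
    return [(a, b) for a, b, score in pairs if score >= threshold]


predicted = predict_links(scored_pairs, THRESHOLD)
print(predicted)
```

With this assumed threshold, the two pairs labeled 1 in the table are predicted as links and the pair labeled 0 is not. The Machine Learning method instead passes the same scores to a classifier as features.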
  • 22. Predicting Collaboration with a Graph Enhanced ML Model
      • Citation Network Dataset - Research Dataset
        – “ArnetMiner: Extraction and Mining of Academic Social Networks”, by J. Tang et al
        – Used a subset with 52K papers, 80K authors, 140K author relationships and 29K citation relationships
      • Neo4j
        – Create a co-authorship graph and engineer connected features
      • Spark and MLlib
        – Train and test our model using a random forest classifier
  • 23. Our Link Prediction Workflow Import Data Create Co-Author Graph Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize
  • 25. Our Link Prediction Workflow Import Data Create Co-Author Graph Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize Identified sparse feature areas Feature Engineering: New graphy features
  • 26. Graph Algorithms Used for Feature Engineering (few examples). Preferential Attachment measures the closeness of nodes based on shared neighbors. Common Neighbors measures the number of possible neighbors (triadic closure). Illustration: be.amazd.com/link-prediction/
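As a rough illustration of these pairwise measures, the sketch below computes them on a tiny made-up co-authorship graph. It assumes the commonly used formulas: common neighbors as the size of the intersection of the two neighbor sets, preferential attachment as the product of the two nodes' degrees, and total neighbors as the size of the union. The graph, node names, and helper functions are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def common_neighbors(adj, a, b):
    """Number of neighbors shared by a and b."""
    return len(adj[a] & adj[b])


def preferential_attachment(adj, a, b):
    """Product of the two nodes' degrees (assumed formula)."""
    return len(adj[a]) * len(adj[b])


def total_neighbors(adj, a, b):
    """Number of distinct neighbors of either node."""
    return len(adj[a] | adj[b])


# Hypothetical co-authorship edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
adj = build_adjacency(edges)

features = {
    "common_neighbors": common_neighbors(adj, "A", "D"),
    "preferential_attachment": preferential_attachment(adj, "A", "D"),
    "total_neighbors": total_neighbors(adj, "A", "D"),
}
print(features)
```

Each candidate pair of authors then becomes one row of features, like the rows in the example tables on the surrounding slides.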
  • 27. Graph Algorithms Used for Feature Engineering (few examples). Triangle counting and clustering coefficients measure the density of connections around nodes. Louvain Modularity identifies interacting communities and hierarchies.
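A small sketch of triangle counting and the local clustering coefficient, assuming an undirected graph stored as a dictionary of neighbor sets. The example graph and function names are hypothetical; the coefficient uses the standard definition, the triangles through a node divided by the number of possible pairs of its neighbors.

```python
from itertools import combinations

# Hypothetical undirected graph: node -> set of neighbors.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def triangle_count(adj, node):
    """Count pairs of the node's neighbors that are themselves connected."""
    return sum(1 for u, v in combinations(sorted(adj[node]), 2) if v in adj[u])


def clustering_coefficient(adj, node):
    """Triangles through the node divided by the possible neighbor pairs."""
    degree = len(adj[node])
    if degree < 2:
        return 0.0  # No neighbor pairs, so no triangles are possible.
    return 2 * triangle_count(adj, node) / (degree * (degree - 1))


for node in sorted(adj):
    print(node, triangle_count(adj, node), clustering_coefficient(adj, node))
```

For a pair of nodes, per-node values such as these can be combined into pair features, for example the minimum and maximum across the two nodes.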
  • 28. Our Link Prediction Workflow Import Data Create Co-Author Graph Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize Identified sparse feature areas Feature Engineering: New graphy features Train / Test Split Resample: Downsampled for proportional representation
  • 30. Test/Train Split

      1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
      1         2         4                 15                       1
      3         4         7                 12                       1
      5         6         1                 1                        0
      2         12        3                 3                        0
      4         9         4                 8                        1
      7         10        12                36                       1
      8         11        2                 3                        0

  • 31. Test/Train Split: the same table, with its rows divided into Train and Test.
  • 32. OMG I’m Good! Data Leakage! Graph metric computation for the train set touches data from the test set. Did you get really high accuracy on your first run without tuning?
  • 33. Train and Test Graphs: Time Based Split

      Train (< 2006)
      1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
      1         2         4                 15                       1
      3         4         7                 12                       1
      5         6         1                 1                        0

      Test (>= 2006)
      1st Node  2nd Node  Common Neighbors  Preferential Attachment  label
      2         12        3                 3                        0
      4         9         4                 8                        1
      7         10        12                36                       1

  • 34. Train and Test Graphs: Time Based Split (the same Train and Test tables as the previous slide)
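A minimal sketch of a time-based split, assuming each co-authorship record carries the year of the collaboration. The records and the `time_based_split` helper are hypothetical; only the 2006 cut-off comes from the slide.

```python
# Hypothetical co-authorship records: (author_a, author_b, year of collaboration).
collaborations = [
    ("a1", "a2", 2003),
    ("a3", "a4", 2005),
    ("a5", "a6", 2006),
    ("a2", "a7", 2010),
]

SPLIT_YEAR = 2006  # Cut-off shown on the slide: train < 2006, test >= 2006.


def time_based_split(records, split_year):
    """Split records into an earlier training set and a later test set."""
    train = [r for r in records if r[2] < split_year]
    test = [r for r in records if r[2] >= split_year]
    return train, test


train, test = time_based_split(collaborations, SPLIT_YEAR)
print(len(train), len(test))
```

Graph features for the training rows would then be computed on a graph built only from the earlier relationships, which is what keeps test-period information out of the training features.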
  • 36. Class Imbalance: There are significantly more negative examples than positive ones: # negative examples = (# nodes)² - (# relationships) - (# nodes)
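To get a feel for the scale, the slide's formula can be applied to the dataset sizes quoted earlier in the deck (80K authors, 140K author relationships). The function name is hypothetical, and the rounded sizes are only used for illustration.

```python
def negative_example_count(num_nodes, num_relationships):
    """Apply the slide's formula: (# nodes)^2 - (# relationships) - (# nodes)."""
    return num_nodes ** 2 - num_relationships - num_nodes


# Rounded dataset sizes from the deck: 80K authors, 140K author relationships.
negatives = negative_example_count(80_000, 140_000)
print(f"{negatives:,}")  # 6,399,780,000
```

That is billions of negative pairs against roughly 140K positive ones.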
  • 37. Class Imbalance: A model could reach very high accuracy simply by predicting that every pair of nodes is not linked.
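The sketch below illustrates both points with made-up numbers: the accuracy of a model that always predicts "not linked", and random downsampling of the negative class so the two classes are equally represented. The counts, seed, and helper names are assumptions for the example, not the presenters' pipeline.

```python
import random


def all_negative_accuracy(num_positives, num_negatives):
    """Accuracy of a model that predicts 'not linked' for every pair."""
    return num_negatives / (num_positives + num_negatives)


def downsample_negatives(negatives, target_size, seed=42):
    """Randomly keep at most target_size negative examples (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(negatives, k=min(target_size, len(negatives)))


# Hypothetical imbalanced data: 10 linked pairs, 990 unlinked pairs.
positives = [("p", i) for i in range(10)]
negatives = [("n", i) for i in range(990)]

baseline = all_negative_accuracy(len(positives), len(negatives))
balanced_negatives = downsample_negatives(negatives, len(positives))
print(baseline, len(positives), len(balanced_negatives))
```

After downsampling, the always-negative model would score only 50% accuracy, so a high score has to come from actually separating the classes.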
  • 39. Our Link Prediction Workflow Import Data Create Co-Author Graph Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize Identified sparse feature areas Feature Engineering: New graphy features Train / Test Split Resample: Downsampled for proportional representation Model Selection: Random Forest Ensemble method
  • 41. Training Our Model This is one decision tree in our Random Forest used as a binary classifier to learn how to classify a pair: predicting either linked or not linked.
  • 42. 4 Models Trained with Multiple Graph Features
      - Common Authors Model. Graph Features: Common Authors
      - “Graphy” Model. Graph Features: Preferential Attachment, Total Neighbors
      - Triangles Model. Graph Features: Min & Max Triangles, Min & Max Clustering Coefficient
      - Community Model. Graph Features: Label Propagation, Louvain Modularity
  • 43. Our Link Prediction Workflow Import Data Create Co-Author Graph Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize Identified sparse feature areas Feature Engineering: New graphy features Train / Test Split Resample: Downsampled for proportional representation Precision, Accuracy, Recall ROC Curve & AUC Model Selection: Random Forest Ensemble method
  • 44. Measures
      - Accuracy: Proportion of total correct predictions. Beware of skewed data!
      - Precision: Proportion of positive predictions that are correct. Low score = more false positives
      - Recall / True Positive Rate: Proportion of actual positives that are correctly predicted. Low score = more false negatives
      - False Positive Rate: Proportion of actual negatives that are incorrectly predicted as positive
      - ROC Curve & AUC: X-Y chart plotting the above 2 metrics (TPR and FPR), with the area under the curve
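These measures follow directly from the four confusion-matrix counts. The sketch below uses made-up counts and a hypothetical helper name; it assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the slide's measures from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # Also the true positive rate.
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts for a balanced test set of 100 pairs.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

An ROC curve plots recall (TPR) against the false positive rate as the classifier's decision threshold is varied, and AUC is the area under that curve.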
  • 45. Result: First Model ROC & AUC Problematic False Positives! Common Authors Model 1
  • 46. Result: All Models Common Authors Model 1 Community Model 4
  • 47. Iteration & Tuning: Feature Influence. For feature importance, the Spark random forest averages the reduction in impurity across all trees in the forest. Feature rankings are relative to the group of features evaluated. Try removing different features (LabelPropagation). Also try PageRank!
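The averaging idea can be sketched as follows. This is a simplified illustration and not Spark's implementation: it assumes each tree already reports a per-feature impurity reduction, averages those values across trees, and normalizes the result to sum to 1. The feature names and numbers are made up.

```python
def average_feature_importance(per_tree_importances):
    """Average per-tree impurity reductions, then normalize so they sum to 1."""
    features = list(per_tree_importances[0])
    num_trees = len(per_tree_importances)
    averaged = {
        f: sum(tree[f] for tree in per_tree_importances) / num_trees
        for f in features
    }
    total = sum(averaged.values())
    return {f: value / total for f, value in averaged.items()}


# Hypothetical impurity reductions reported by two trees.
trees = [
    {"commonAuthors": 0.6, "prefAttachment": 0.3, "totalNeighbors": 0.1},
    {"commonAuthors": 0.4, "prefAttachment": 0.4, "totalNeighbors": 0.2},
]

importances = average_feature_importance(trees)
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

Because the scores are normalized over the features present, adding or removing a feature changes every other feature's score, which is why the rankings only make sense relative to the group evaluated.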
  • 48. Graph Machine Learning Workflow Data aggregation Create and store graphs Extract Data & Store as Graph Explore, Clean, Modify Prepare for Machine Learning Train Models Evaluate Results Productionize Identify uninteresting features Cleanse (outliers+) Feature engineering/ extraction Train / Test split Resample for meaningful representation (proportional, etc.) Precision, accuracy, recall (ROC curve & AUC) SME Review Cross-validation Model & variable selection Hyperparameter tuning Ensemble methods
  • 49. Resources • neo4j.com/sandbox • neo4j.com/developer/graph-algorithms/ • community.neo4j.com Data & Code: • This example from O’Reilly book bit.ly/2FPgGVV (ML Folder) Amy.Hodler@neo4j.com @amyhodler neo4j.com/graph-algorithms-book
  • 50. Q&A/Extra Stuff to delete
  • 51. Connected Feature Extraction. Feature Extraction is how we change the shape or format of the data to be usable in a machine learning pipeline. For example, from a graph, we extract the relevant subset of the data into a tabular format for model building.
  • 52. Connected Feature Selection Feature Selection is how we reduce the number of features used in a model to a relevant subset. This can be done algorithmically or based on domain expertise, but the objective is to maximize the predictive power of your model while minimizing overfitting.
  • 53. Neo4j - The Graph Company. Industry’s Largest Dedicated Investment in Graphs
      • Creator of the Neo4j Graph Platform
      • ~250 employees
      • HQ in Silicon Valley, other offices include London, Munich, Paris and Malmö, Sweden
      • $80M new funding led by Morgan Stanley & One Peak. Total $160M from Fidelity, Sunstone, Conor, Creandum, and Greenbridge Capital
      • 15M+ downloads & container pulls
      • 325+ enterprise subscription customers, over half with >$1B in revenue
      Adoption: Top Retail Firms, Top Financial Firms, Top Software Vendors, Customers, Partners
      Ecosystem: Startups in program, Enterprise customers, Partners, Meet up members, Events per year
      Figures shown: 720+, 7/10, 12/25, 8/10, 53K+, 100+, 300+, 450+
  • 54. Helping The World To Make Sense of Data
      - SAVING DEMOCRACY: ICIJ used Neo4j to uncover the world’s largest journalistic leak to date, The Panama Papers
      - MISSION TO MARS: NASA uses Neo4j for a “Lessons Learned” database to improve effectiveness in search missions in space
      - CURING CANCER: Neo4j is used to graph the human body, map correlations, identify cause & effect and search for the cure for cancer
      Strictly Confidential
  • 55. Graph and ML Algorithms in Neo4j (neo4j.com/docs/graph-algorithms/current/, updated April 2019)
      - Pathfinding & Search: Parallel Breadth First Search & DFS • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk
      - Centrality / Importance: Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality
      - Community Detection: Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity (1 Step & Multi-Step) • Balanced Triad (identification)
      - Similarity: Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity
      - Link Prediction: Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocation • Same Community • Total Neighbors
  • 56. Connectedness Differentiates Neo4j. Conceive, Code, Compute, Store. Non-Native Graph DB, Native Graph DB, RDBMS. Optimized for graph workloads
  • 57. Who We Are: Leader in Graph Innovations
      Neo4j is an enterprise-grade native graph platform that enables you to:
      • Store, reveal and query data relationships
      • Traverse and analyze any levels of depth in real-time
      • Add context and connect new data on the fly
      Designed, built and tested natively for graphs from the start for:
      • Performance • ACID Transactions • Schema-free Agility • Graph Algorithms • Developer Productivity • Hardware Efficiency • Global Scale • Graph Adoption
      Graph Transactions, Graph Analytics, Data Integration, Development & Admin, Analytics Tooling, Drivers & APIs, Discovery & Visualization
  • 58. Handling Large Graph Work Loads for Enterprises
      - Real-time promotion recommendations: Record “Cyber Monday” sales • About 35M daily transactions • Each transaction is 3-22 hops • Queries executed in 4ms or less • Replaced IBM Websphere commerce
      - Marriott’s Real-time Pricing Engine: 300M pricing operations per day • 10x transaction throughput on half the hardware compared to Oracle • Replaced Oracle database
      - Handling Package Routing in Real-Time: Large postal service with over 500k employees • Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second
  • 59. Graph Use Cases by Value Proposition. Recommendations Dynamic Pricing IoT-applications Fraud Detection Real-Time Transaction Applications Generate and Protect Revenue Customer Engagement Metadata and Advanced Analytics Data Lake Integration Knowledge Graphs for AI Risk Mitigation Generate Actionable Insights Network Management Supply Chain Efficiency Identity and Access Management Internal Business Processes Improve Efficiency and Cut Costs
  • 60. Software, Financial Services, Telecom, Retail & Consumer Goods, Media & Entertainment, Other Industries. Airbus. Copyright © 2017 Neo4j, Inc. Company Confidential
  • 61. Graph Transactions Graph Analytics Data Integration Development & Admin Analytics Tooling Drivers & APIs Discovery & Visualization Developers Admins Applications Business Users Data Analysts Data Scientists Enterprise Data Hub Native Graph Platform: Tools for Many Users
  • 62. Collections-Focused Multi-Model, Documents, Columns & Simple Tables, Joins Neo4j is designed for data relationships Different Paradigms NoSQL Relational DBMS Neo4j Graph Platform Connections-Focused Focused on Data Relationships Development Benefits Easy model maintenance Easy query Deployment Benefits Ultra high performance Minimal resource usage
  • 63. How Neo4j Fits — Common Architecture Patterns From Disparate Silos To Cross-Silo Connections From Tabular Data To Connected Data From Data Lake Analytics to Real-Time Operations
  • 64. Cypher: Powerful & Expressive Query Language
      MATCH (:Person {name: "Dan"})-[:MARRIED_TO]->(spouse) RETURN spouse
      Dan -[MARRIED_TO]-> Ann
      Labeled parts: NODE, RELATIONSHIP TYPE, LABEL, PROPERTY, VARIABLE
  • 65. Neo4j Bloom • High fidelity • Scene navigation • Property views • Search suggestions • Saved phrase history • Property editor • Schema perspectives • Bloom chart type • Visualize • Communicate • Discover • Navigate • Isolate • Edit • Share
  • 66. Common Graph Technology Use Cases: Real-Time Recommendations, Fraud Detection, Network & IT Operations, Master Data Management, Knowledge Graph, Identity & Access Management. AirBnb
  • 67. Graphs Drive Innovation. Context Paths, Auto-Graphs, Graph Layers, 1st Graph, Cross-Connect. Cross-tech applications Internet of Things operations Transparent Neural Networks Blockchain-managed systems Adjacent graph layers inspire new innovations Metadata / Risk Management Knowledge Graphs AI-Powered Customer Experiences Connect unlike objects such as people to products, locations Mobile app explosion Recommendation engines Fraud detectors Desire for more context to follow connections Connects like objects People, computer networks, telco, etc
  • 68. ICIJ Panama Papers. INVESTIGATIVE JOURNALISM. Fraud Detection / Knowledge Graph
      Background
      • International Consortium of Investigative Journalists (ICIJ), small team of data journalists
      • International investigative team specializing in cross-border crime, corruption and accountability of power
      • Works regularly with leaks and large datasets
      Business Problem
      • Find relationships between people, accounts, shell companies and offshore accounts
      • Journalists are non-technical
      • Biggest “Snowden-Style” document leak ever; 11.5 million documents, 2.6TB of data
      Solution and Benefits
      • Pulitzer Prize winning investigation resulted in robust coverage of fraud and corruption
      • PM of Iceland & Pakistan resigned, exposed Putin, Prime Ministers, gangsters, celebrities (Messi)
      • Led to assassination of journalist in Malta
  • 69. Thomson Reuters Graph • Data Fusion for Portfolio Managers • Graph layers
  • 70. eBay for Google Assistant. ONLINE RETAIL. Knowledge Graph powers Real-Time Recommendations. EE Customer since 2016 Q3
      Background
      • Personal shopping assistant
      • Converses with buyer via text, picture and voice to provide real-time recommendations
      • Combines AI and natural language understanding (NLU) in Neo4j Knowledge Graph
      • First of many apps in eBay's AI Platform
      Business Problem
      • Improve personal context in online shopping
      • Transform buyer-provided context into ideal purchase recommendations over social platforms
      • "Feels like talking to a friend"
      Solution and Benefits
      • 3 developers, 8M nodes, 20M relationships
      • Needed high-performance traversals to respond to live customer requests
      • Easy to train new algorithms and grow model
      • Generating revenue since launch
  • 71. DZD - German Center for Diabetes Research. Medical Genomic Research. EE Customer since 2016 Q4
      Background
      • Over 7M citizens suffer from diabetes
      • Connecting over 400 researchers
      • Incorporates over 50 databases, 100k’s of Excel workbooks, 30 databases of biological samples
      • Sought to examine disease from as many angles as possible
      Business Problem
      • Genes are connected by proteins or to metabolites, and patients are connected with their diets, etc.
      • Needed to improve the utilization of immensely technical data
      • Needed to cater to doctors and researchers with simple navigation, communication and connections of the graph
      Solution and Benefits
      • Dr. Alexander Jarasch, Head of Bioinformatics and Data Management
      • Scientists can conduct parallel research without asking the same questions or repeating tests
      • Built views like a liver sample knowledge graph