AI-Driven Drug and Vaccine Discovery: Knowledge Graph + Apache Spark
John Hunter
Lead Architect, Knowledge Graph, GSK
Agenda
§ Intro to drug R&D
§ Why a Knowledge Graph?
§ Knowledge Graph Data
§ Querying a Knowledge Graph
§ Bellman: an OSS SPARQL library
§ The future
Intro to drug & vaccine R&D
Drug and Vaccine R&D pipeline
• 12-15 years to bring a drug or vaccine to market
• Phases
    • Research ⬅ Focus of this talk
    • Pre-clinical
    • Clinical trials
    • Government agency review
    • Post-market monitoring
Typical R&D data use cases
Accelerate research and development with data at scale
• Discover
• Encode
• Predict
How to unify and manage siloed data for research?
Use a Knowledge Graph
Why a Knowledge Graph?
Connections are first-class constructs
Subject → Predicate → Object: Disease → is_associated_with → Gene
Data can be federated and is both human- and machine-readable
Figure: example knowledge graph. Node types: Compound, Disease, Gene, Symptom, Side Effect, Pharma class. Edge types: Upregulates, Downregulates, Causes, is_associated_with, Treats, Palliates, Presents, Includes.
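To make "connections as first-class data" concrete, here is a minimal Python sketch that stores edges as (subject, predicate, object) tuples and answers a two-hop question: which compounds treat or palliate a disease associated with a given gene? The entity identifiers and the choice of which node types each edge connects are hypothetical; the edge labels are lower-cased versions of those in the example graph.

```python
from collections import defaultdict

# Hypothetical triples: identifiers are made up for illustration; the edge
# labels are lower-cased versions of those in the example graph.
TRIPLES = [
    ("compound_A", "treats", "disease_1"),
    ("compound_B", "palliates", "disease_1"),
    ("disease_1", "is_associated_with", "gene_1"),
    ("compound_A", "upregulates", "gene_1"),
    ("disease_1", "presents", "symptom_1"),
    ("compound_A", "causes", "side_effect_1"),
]

def incoming_index(triples):
    """Map each object to the (predicate, subject) pairs that point at it."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[o].append((p, s))
    return index

def compounds_for_gene(gene, triples):
    """Two hops: gene <- is_associated_with - disease <- treats/palliates - compound."""
    incoming = incoming_index(triples)
    diseases = {s for p, s in incoming[gene] if p == "is_associated_with"}
    # A set removes compounds reached through more than one disease.
    return sorted(
        {s for d in diseases for p, s in incoming[d] if p in ("treats", "palliates")}
    )

if __name__ == "__main__":
    print(compounds_for_gene("gene_1", TRIPLES))  # ['compound_A', 'compound_B']
```

Because every edge is just another row, adding a new relationship type means adding triples rather than changing a schema.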
Why a Knowledge Graph on Spark?
• Over 500 billion connections, and growing
• High read/write throughput
• Mature tech
• Fantastic OSS and expert community
Knowledge Graph Data
RDF
Resource Description Framework
RDF Triples
N-Triples format
Subject Predicate Object
<Disease> <is_associated_with> <Gene> .
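A minimal Python sketch of reading N-Triples lines like the one above, assuming every term is an IRI in angle brackets and each statement ends with " .". Real N-Triples also allows literals and blank nodes, requires absolute IRIs, and defines escape sequences; none of that is handled here, so a production loader would use an existing RDF library instead. The name `parse_ntriples` is hypothetical.

```python
import re

# Simplified N-Triples statement: three IRIs in angle brackets, then " .".
# Literals, blank nodes, escapes and absolute-IRI checks are NOT handled.
TRIPLE_RE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

def parse_ntriples(text):
    """Yield (subject, predicate, object) tuples; skip blank lines and comments."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = TRIPLE_RE.match(stripped)
        if match is None:
            raise ValueError(f"line {lineno}: unsupported statement: {line!r}")
        yield match.groups()

if __name__ == "__main__":
    data = """
# one statement per line
<Disease> <is_associated_with> <Gene> .
"""
    print(list(parse_ntriples(data)))  # [('Disease', 'is_associated_with', 'Gene')]
```

The one-statement-per-line shape is what makes N-Triples convenient for Spark: a file can be split anywhere on a newline and each line parsed independently.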
RDF Quads
SPOG table
S : String | P : String | O : String | G : String
<disease_1> | <associated_with> | <gene1> | <graph1>
<disease_2> | <associated_with> | <gene2> | <graph2>
<disease_3> | <associated_with> | <gene3> | <graph2>
<disease_4> | <equivalent_to> | <disease_5> | <graph1>
…
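Adding a graph column turns each triple into a quad, so one table can hold many named graphs. The Python sketch below models a SPOG row as a `NamedTuple` and selects the triples that belong to a chosen set of graphs. The rows are illustrative, modelled on the SPOG example; `Quad` and `triples_in_graphs` are hypothetical names.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    """One SPOG row: subject, predicate, object and named graph, all strings."""
    s: str
    p: str
    o: str
    g: str

# Illustrative rows modelled on the SPOG example.
QUADS = [
    Quad("<disease_1>", "<associated_with>", "<gene1>", "<graph1>"),
    Quad("<disease_2>", "<associated_with>", "<gene2>", "<graph2>"),
    Quad("<disease_3>", "<associated_with>", "<gene3>", "<graph2>"),
    Quad("<disease_4>", "<equivalent_to>", "<disease_5>", "<graph1>"),
]

def triples_in_graphs(quads, graphs):
    """Keep rows whose G column is in `graphs`, then drop G to get plain triples.

    A set is returned, so a triple stored in several selected graphs appears once.
    """
    wanted = set(graphs)
    return {(q.s, q.p, q.o) for q in quads if q.g in wanted}

if __name__ == "__main__":
    for triple in sorted(triples_in_graphs(QUADS, ["<graph1>"])):
        print(triple)
```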
RDF data store
Data architecture
graph1/
    <associated_with>/
        Part001.parquet
        Part002.parquet
        …
        PartNNN.parquet
    <equivalent_to>/
        Part001.parquet
        Part002.parquet
        …
        PartNNN.parquet
graph2/
    …
…
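The layout above partitions the SPOG table first by graph and then by predicate, so a query that fixes a graph and a predicate only has to read one directory. The sketch below groups rows the same way in plain Python; because G and P are implied by the directory, each partition only needs to keep S and O. In Spark the comparable call is `df.write.partitionBy("g", "p").parquet(path)`, which writes Hive-style directories named `g=<value>/p=<value>` rather than the bare names shown above. The functions `partition_layout` and `partition_dirs` are hypothetical.

```python
from collections import defaultdict

def partition_layout(quads):
    """Group (s, p, o, g) rows into {graph: {predicate: [(s, o), ...]}}.

    G and P are encoded by the partition itself, so each partition
    only stores the subject and object columns.
    """
    layout = defaultdict(lambda: defaultdict(list))
    for s, p, o, g in quads:
        layout[g][p].append((s, o))
    return layout

def partition_dirs(layout):
    """List the graph/predicate directory prefixes the layout would produce."""
    return sorted(f"{g}/{p}/" for g, preds in layout.items() for p in preds)

if __name__ == "__main__":
    quads = [
        ("<disease_1>", "<associated_with>", "<gene1>", "graph1"),
        ("<disease_4>", "<equivalent_to>", "<disease_5>", "graph1"),
        ("<disease_3>", "<associated_with>", "<gene3>", "graph2"),
    ]
    for d in partition_dirs(partition_layout(quads)):
        print(d)
```

The trade-off is many small directories when a graph uses many predicates; the payoff is that predicate-bound triple patterns become directory lookups instead of full scans.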
RDF data store
Additional requirements
• Data abstraction
• ACID transactions: atomic writes, isolated reads
• Deduplication
• Incremental queries
• Candidate technologies: Apache Hudi, Delta Lake
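Deduplication and incremental queries are easier to reason about with a concrete model. The sketch below assumes an append-only table in which every quad carries a hypothetical integer commit `version`. It drops duplicate quads, returns the rows committed after a given version (an incremental query), and rebuilds the table as of a version (a point-in-time view). It only illustrates the semantics: Apache Hudi and Delta Lake implement these ideas, together with atomic commits, on files on disk, and their actual APIs are not shown here. Deletes and updates are ignored.

```python
from typing import NamedTuple

class VersionedQuad(NamedTuple):
    """A SPOG row plus the commit version that wrote it (a hypothetical column)."""
    s: str
    p: str
    o: str
    g: str
    version: int

def dedup(rows):
    """Keep the earliest committed copy of each (s, p, o, g) quad; drop later duplicates."""
    seen = set()
    kept = []
    # sorted() is stable, so rows with equal versions keep their input order.
    for row in sorted(rows, key=lambda r: r.version):
        key = (row.s, row.p, row.o, row.g)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

def changes_since(rows, version):
    """Incremental query: only the rows committed strictly after `version`."""
    return [r for r in rows if r.version > version]

def as_of(rows, version):
    """Point-in-time view: the rows that existed once `version` was committed."""
    return [r for r in rows if r.version <= version]

if __name__ == "__main__":
    log = [
        VersionedQuad("<disease_1>", "<associated_with>", "<gene1>", "<graph1>", 1),
        VersionedQuad("<disease_1>", "<associated_with>", "<gene1>", "<graph1>", 2),  # re-ingested duplicate
        VersionedQuad("<disease_3>", "<associated_with>", "<gene3>", "<graph2>", 3),
    ]
    print(len(dedup(log)))             # 2
    print(len(changes_since(log, 1)))  # 2
    print(len(as_of(log, 1)))          # 1
```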
Knowledge Graph Querying
SPARQL
SPARQL Protocol and RDF Query Language
SELECT ?s ?p ?o
FROM <graph1>
FROM <graph2>
WHERE {
  ?s ?p ?o
}
LIMIT 10
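The query above reads from two named graphs. In SPARQL, multiple FROM clauses build the default graph as the RDF merge of the listed graphs, and the pattern `?s ?p ?o` then matches every triple in it. The Python sketch below mimics that evaluation for a single triple pattern over a list of quads. SPARQL leaves result order unspecified without ORDER BY, so the sketch sorts triples only to make LIMIT deterministic; `match_pattern` and `select` are hypothetical names, and blank-node handling is ignored.

```python
from itertools import islice

def is_var(term):
    """Treat terms starting with '?' as SPARQL variables."""
    return term.startswith("?")

def match_pattern(pattern, triples):
    """Yield variable bindings for one triple pattern, e.g. ("?s", "?p", "?o")."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                if binding.setdefault(term, value) != value:
                    break  # same variable would need two different values
            elif term != value:
                break  # constant term does not match
        else:
            yield binding

def select(quads, from_graphs, pattern, limit):
    """Merge the FROM graphs into one default graph, match, then apply LIMIT."""
    default_graph = {(s, p, o) for s, p, o, g in quads if g in from_graphs}
    return list(islice(match_pattern(pattern, sorted(default_graph)), limit))

if __name__ == "__main__":
    quads = [
        ("<disease_1>", "<associated_with>", "<gene1>", "<graph1>"),
        ("<disease_2>", "<associated_with>", "<gene2>", "<graph2>"),
        ("<disease_3>", "<associated_with>", "<gene3>", "<graph3>"),
    ]
    for row in select(quads, {"<graph1>", "<graph2>"}, ("?s", "?p", "?o"), 10):
        print(row)
```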
SPARQL: build vs. buy
• Commercial offerings
• OSS offerings
    • Apache Jena
    • Blazegraph
    • SANSA Stack
SPARQL: Bellman Engine
• OSS project sponsored by GSK
• Co-developed by GSK & 47 Degrees
• Please star and contribute!
• https://github.com/gsk-aiops/bellman
SPARQL: Bellman architecture
• Parser
• Compiler
• Static analysis
• Optimizer
• Engine
• Formatter
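The slide lists Bellman's stages but not how its optimizer works. As a generic illustration, not Bellman's implementation, the sketch below shows one common optimizer heuristic for a basic graph pattern: evaluate the triple patterns with the most constant terms first, since they usually match fewer triples, then join the variable bindings pattern by pattern. A Spark-based engine would typically express each pattern as a DataFrame filter and each join as a DataFrame join; plain Python lists stand in for DataFrames here, and all function names are hypothetical.

```python
def is_var(term):
    """Treat terms starting with '?' as SPARQL variables."""
    return term.startswith("?")

def selectivity_order(patterns):
    """Heuristic: patterns with more constant (bound) terms are evaluated first.

    sorted() stays stable with reverse=True, so ties keep their query order.
    """
    return sorted(patterns, key=lambda pat: sum(not is_var(t) for t in pat), reverse=True)

def extend(binding, pattern, triple):
    """Return `binding` extended so `pattern` matches `triple`, or None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate_bgp(patterns, triples):
    """Nested-loop join of the triple patterns in a basic graph pattern."""
    bindings = [{}]
    for pattern in selectivity_order(patterns):
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := extend(b, pattern, t)) is not None
        ]
    return bindings

if __name__ == "__main__":
    triples = [
        ("<disease_1>", "<associated_with>", "<gene1>"),
        ("<disease_2>", "<associated_with>", "<gene2>"),
        ("<compound_A>", "<treats>", "<disease_1>"),
        ("<compound_B>", "<treats>", "<disease_2>"),
    ]
    # Roughly: SELECT * WHERE { ?c <treats> ?d . ?d <associated_with> <gene1> }
    query = [("?c", "<treats>", "?d"), ("?d", "<associated_with>", "<gene1>")]
    print(evaluate_bgp(query, triples))
```

Evaluating the gene-bound pattern first leaves one binding to join against, instead of every `<treats>` edge in the graph; with billions of edges that difference dominates query cost.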
Demo: Querying Knowledge Graphs using Bellman
• https://github.com/gsk-aiops/bellman-tools
Future: Bellman next steps
• Complete SPARQL language coverage
• Query optimization
• Optimization of data on disk
• Incremental / point-in-time queries
Thank you
Links:
https://github.com/gsk-aiops/bellman
https://github.com/gsk-aiops/bellman-tools
Connect on GitHub:
https://github.com/JNKHunter
