Donovan N. Chin & R. Aldrin Denny
   Traditional Drug Discovery (insert graph)
   In Silico Prediction of ADME (insert graph)
    ◦   Potency
    ◦   Absorption
    ◦   Lead
    ◦   Drug
    ◦   Toxicity
    ◦   Excretion
    ◦   Metabolism
    ◦   Distribution
   Target IVY(Brute force virtual screening of
    very large compound libraries) Lead
    Discovery IVY(Utilize predictive models
    from Biogen data for more efficient virtual
    screening) Lead Optimization candidate
   (insert graph)
    ◦   Potency
    ◦   Lead
    ◦   Drug
    ◦   Toxicity
    ◦   Excretion
    ◦   Metabolism
    ◦   Distribution
    ◦   Absorption
   Goal: Identify the crystallographic binding mode and
    rank-order ligands with respect to binding with the protein

   (insert graph)

   Receptor Docking

   Ligand Shape

   Generate plausible trial binding modes using
    docking function then Re-rank modes with
    scoring function
   (insert graph)
   341 Active
   47 Non-Active
   (insert graph)

   After filtering by Pharmacophore Feature
   (insert graph)
   (insert functions for)
    ◦   F_Score*
    ◦   D_Score
    ◦   G_Score
    ◦   PMF_Score
    ◦   Chem_Score
    ◦   ICM_Score*
   Cell Adhesion Assay (50% Serum)
    ◦ (insert graph)

   Biochemical Adhesion Assay
    ◦ (insert graph)

   Scoring Functions Are Poor More Often Than
    Not
   Receptor Site View
    ◦ Library Design
    ◦ FlexX Score
    ◦ Consensus Score ≥ 3 (e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds)
    ◦ Consensus = 5? If yes: substructure exists?
      If yes: Pharmacophore < 4.2 Å? If yes: Publish Hit Report
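The chain of yes/no checks on this slide can be read as a filter cascade. The sketch below is only one possible reading: the `Candidate` class, its field names, the sample compounds, and the rule that any "no" answer rejects the compound are all assumptions made for illustration, since the slide does not say what happens on the "no" branches. Only the thresholds (consensus ≥ 3, consensus = 5, 4.2 Å) come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one docked compound (not from the slides)."""
    name: str
    consensus_score: int           # number of scoring criteria satisfied
    has_substructure: bool         # required substructure present?
    pharmacophore_distance: float  # distance to pharmacophore, in angstroms


def is_reportable_hit(c, full_consensus=5, max_distance=4.2):
    """Apply the three yes/no checks in the order listed on the slide.

    Assumption: a "no" at any step rejects the compound.
    """
    if c.consensus_score != full_consensus:
        return False
    if not c.has_substructure:
        return False
    return c.pharmacophore_distance < max_distance


# Made-up example compounds.
candidates = [
    Candidate("cmpd_A", 5, True, 3.8),
    Candidate("cmpd_B", 5, True, 4.5),   # pharmacophore too far
    Candidate("cmpd_C", 4, True, 3.0),   # consensus below 5
    Candidate("cmpd_D", 5, False, 3.0),  # substructure missing
    Candidate("cmpd_E", 2, True, 3.0),   # fails the consensus >= 3 prefilter
]

# First gate: consensus score of at least 3.
prefiltered = [c for c in candidates if c.consensus_score >= 3]
# Remaining gates: full consensus, substructure, pharmacophore distance.
hits = [c.name for c in prefiltered if is_reportable_hit(c)]
print(hits)
```

Keeping each gate as a separate early return makes it easy to reorder or relax a single criterion without touching the others.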
   (insert graph)
   Goal: Predict hit/miss class based on presence of features
    (fingerprints)
   Method
    ◦ Given a set of N samples
    ◦ Given that some subset A of them are good ('active')
       Then we estimate for a new compound: P(good) ≈ A/N
    ◦ Given a set of binary features F
       For a given feature F (N and A now count only samples containing F):
            It appears in N samples
            It appears in A good samples
       Can we estimate: P(good | F) ≈ A/N?
            (Problem: the error gets worse as N becomes small)
    ◦ P'(good | F) = (A + P(good)·K)/(N + K)
       P'(good | F) → P(good) as N → 0
       P'(good | F) → A/N as N becomes large
    ◦ (If K = 1/P(good), this is the Laplacian correction)
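The corrected estimate P'(good | F) = (A + P(good)·K)/(N + K) is easy to check numerically. The following is a minimal sketch, assuming the default K = 1/P(good) given on the slide; the function name and the example counts are hypothetical and chosen only to show the two limits.

```python
from typing import Optional


def corrected_estimate(a: int, n: int, p_good: float,
                       k: Optional[float] = None) -> float:
    """Return P'(good | F) = (A + P(good)*K) / (N + K).

    a      -- number of good samples that contain feature F
    n      -- number of samples that contain feature F
    p_good -- baseline P(good) over the whole data set
    k      -- smoothing weight; defaults to 1/P(good) (Laplacian correction)
    """
    if k is None:
        k = 1.0 / p_good
    return (a + p_good * k) / (n + k)


baseline = 0.1  # hypothetical overall hit rate

# Feature never seen: the estimate falls back to the baseline.
unseen = corrected_estimate(0, 0, baseline)

# Feature seen once, in a good sample: raw A/N would claim 1.0.
rare = corrected_estimate(1, 1, baseline)

# Feature seen often: the estimate approaches the raw ratio A/N = 0.5.
common = corrected_estimate(500, 1000, baseline)

print(unseen, rare, common)
```

The rare-feature case shows why the correction matters: one lucky observation no longer produces a probability of 1.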
   Descriptors (insert)
   Advantages
    ◦ Can describe a huge number of features (up to 4 billion; MDL 1024;
      Leadscope 27,000)
    ◦ Contains tertiary and stereochemistry information
    ◦ Fast
   Classification Analysis

    ◦ Developing Non-Linear Scoring Functions to classify
      actives and non-actives

    ◦ (insert graphs)

    ◦ Cost function to minimize: Gini impurity at node N,
      i(N) = 1 − Σ_j P(ω_j)², summed over the classes ω_j
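As a small worked illustration of the Gini impurity, 1 minus the sum of squared class proportions, the sketch below computes it for a list of class labels and for a candidate split. The function names are hypothetical, and the 341 active / 47 non-active counts are borrowed from the earlier slide purely as sample input; the slides do not say that these counts describe the root node of the tree.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)


# Sample input only: class counts taken from the earlier slide.
node = ["active"] * 341 + ["non-active"] * 47
root_gini = gini_impurity(node)

# A perfect split separates the classes and drives the impurity to zero.
perfect = split_impurity(["active"] * 341, ["non-active"] * 47)

print(round(root_gini, 4), perfect)
```

A tree builder would evaluate `split_impurity` for each candidate split and keep the one with the lowest value.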
   Training Set Prediction Success

   (insert table)

   10-fold cross validation

   Randomly split training and test sets

   Significant Improvement in Separating Actives
    from Non-Actives
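The 10-fold cross-validation mentioned on this slide can be sketched with the standard library alone. This is an illustrative outline, not the procedure used by the authors' software: the function name, the fixed seed, and the sample size of 388 (341 + 47 from the earlier slide) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(388, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    # A model would be fitted on `train` and scored on `test` here.
    print(fold_number, len(train), len(test))
```

Averaging the per-fold test scores gives a less optimistic estimate than the training-set success rate alone.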
   (insert graph)

   Significant Improvement in Finding Hits Using
    New SF
   Optimal tree identified (insert graph)

   No random effects (insert graph)
   (insert cluster)

   Able to identify different molecular property
    criteria that lead to hits
   (insert graph)
   (insert graph)

   Size = magnitude of OBA

   OBA values cover range of descriptor space
   (insert graph)

   Choose 1D and 2D descriptors for ease of
    interpretation and lower “noise”
   Build Model (insert graphs) Apply Model
   Features found in high OBA

   Features found in low OBA

   It would be nice if CART offered a similar view
   Improved scoring functions for separating
    hits from non-hits in structure-based drug
    design developed with CART and Bayesian
    models

   Identified key differences in molecular
    physical properties that led to hits

   Built reasonably predictive OBA model
    (cannot expect method to extend to other
    systems given complexity of OBA, however)
   Biogen IDEC

   Modeling
    ◦   Rajiah Denny
    ◦   Claudio Chuaqui
    ◦   Juswinder Singh
    ◦   Herman van Vlijmen
    ◦   Norman Wang
    ◦   Anuj Patel
    ◦   Zhan Deng

   Chemistry
    ◦   Kevin Guckian
    ◦   Dan Scott
    ◦   Thomas Durand-Reville
    ◦   Pat Conlon
    ◦   Charlie Hammond
    ◦   Chuck Jewell

   Pharmacology
    ◦ Tonika Bonhert

Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models
