The utility of PubChem for selecting
drugs and probes as systems biology
perturbagens
Christopher Southan
Modelling Biological Systems Symposium, Imperial College London, 26 March 2019
https://sites.google.com/view/tw2informatics/home
Outline
• Chemical perturbation
• PubChem content and coverage
• Targets
• Drugs
• Probes
• Looking at BAY 7598
• Getting into PubChem
• Conclusions
Advantages of chemical perturbation of
biological systems
• Dose response, rapid onset, and time course measurement
• Reversible by washout > retest by pulsing
• Should be robustly reproducible
• Broad choice of controls (e.g. different binding sites, potencies, kinetics,
chemotypes, SAR including inactive analogues)
• In vitro and in cellulo target engagement/mechanistic validation
• Protein/Protein interaction inhibitors
• Option of multiple pathway/network intervention testing
• Omic profiling of effects (e.g. transcripts, imaging, metabolites, proteomics)
• Orthogonal hypothesis testing via CRISPR, RNAi, KO, antibody etc.
• Peptides, nucleotides, antibodies or endogenous metabolites can be tried
• Druggable genome target coverage increasing (e.g. PROTACs)
• Can use compounds with unknown mechanism (e.g. phenotypic screening hits)
• Not restricted to proteins (e.g. RNA-small molecule interactions)
• ~ 3 million bioactive compounds available in PubChem
• Analogue expansion by synthetic chemistry
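
The dose-response point in the list above is commonly quantified with a Hill curve. The following is a minimal sketch, assuming a four-parameter logistic model; the function name and the example parameter values are illustrative and are not taken from the talk.

```python
def hill_response(conc, bottom, top, ec50, hill_slope):
    """Four-parameter logistic (Hill) response at a given concentration.

    conc and ec50 must share the same units. A concentration of zero
    (e.g. after washout) returns the baseline response.
    """
    if conc < 0 or ec50 <= 0:
        raise ValueError("conc must be >= 0 and ec50 must be > 0")
    if conc == 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill_slope)


# Illustrative values only: response scaled 0-100, EC50 of 1.0 (arbitrary units).
half_max = hill_response(1.0, bottom=0.0, top=100.0, ec50=1.0, hill_slope=1.0)
washed_out = hill_response(0.0, bottom=0.0, top=100.0, ec50=1.0, hill_slope=1.0)
```

At a concentration equal to the EC50 the model returns the midpoint between baseline and maximum, which is a convenient sanity check when comparing perturbagens of different potency.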
The Triumvirate:
Substance, Compound, BioAssay
• SIDs can be
biomolecules (e.g. large
peptides and
antibodies), no images
• CIDs merge SMILES
strings < 1000 atoms,
to unique InChIs
• Average SID:CID ~ 2.6,
drugs ~50, aspirin (CID
2244) 307 plus 1563
mixture SIDs
• CIDs 4.7 % mixtures
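
The SID-to-CID relationship can be inspected through the PubChem PUG REST service. The sketch below builds the request URL for the substances linked to one compound, using aspirin, whose PubChem CID is 2244. It then parses a response body. The helper names are hypothetical, the sample payload is made up for illustration, and the response layout (`InformationList` > `Information` > `SID`) is an assumption that should be checked against a live response.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sids_for_cid_url(cid):
    """Build the PUG REST URL listing the SIDs linked to one CID."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/sids/JSON"


def extract_sids(payload):
    """Collect the unique SIDs from a parsed 'sids' response body.

    Assumes the layout {"InformationList": {"Information": [{"SID": [...]}]}}.
    """
    entries = payload.get("InformationList", {}).get("Information", [])
    sids = set()
    for entry in entries:
        sids.update(entry.get("SID", []))
    return sorted(sids)


aspirin_url = sids_for_cid_url(2244)

# Made-up payload for illustration; real SID lists are far longer.
sample_payload = {"InformationList": {"Information": [{"CID": 2244, "SID": [3, 1, 1]}]}}
sample_sids = extract_sids(sample_payload)
```

Fetching `aspirin_url` with any HTTP client and passing the decoded JSON to `extract_sids` would give the current SID count, which can differ from the figures quoted on the slide.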
PubChem CID growth 2005 - 2018
CID statistics overview
(March 2019)
Potential perturbagen coverage in PubChem
• 80 million rule-of-five chemical space
• 3.4 million tested > 1.25 million active in BioAssay
• 1.0 million assays > 240 million bioactivities
• Activities extracted from ~ 100K papers
• 12,000 protein targets
• ~ 8000 PDB ligands
• 12,000 with BioSystems pathway mappings
• 9500 drugs, ~ 1500 FDA approved
• BindingDB 20,500 cpds, 260 patents, 42,000 activities (just 2019)
• Activity data not in PubChem but in patents ~2.5 million
• Probes, full leads and development cpds unknown (not cleanly tagged)
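
The ratios implied by the counts above are easy to derive. This is a small worked calculation using only the rounded figures quoted on this slide; the variable names are illustrative.

```python
# Rounded counts as quoted on the slide (March 2019).
tested_compounds = 3.4e6
active_compounds = 1.25e6
assays = 1.0e6
bioactivities = 240e6
drugs = 9500
fda_approved = 1500

# Share of tested compounds with at least one active result.
active_fraction = active_compounds / tested_compounds

# Mean number of bioactivity records per assay.
activities_per_assay = bioactivities / assays

# Share of drug records that are FDA approved.
approved_fraction = fda_approved / drugs

print(f"active fraction: {active_fraction:.1%}")
print(f"bioactivities per assay: {activities_per_assay:.0f}")
print(f"FDA approved fraction of drugs: {approved_fraction:.1%}")
```

On these figures roughly 37% of tested compounds are flagged active, and each assay carries about 240 bioactivity records on average.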
Curated sources of human targets
• All four are PubChem
submitting sources
for linked chemistry
• Most extracted from
publications
• They all have UniProt
cross-references
• Numbers shown are
for Human Swiss-Prot
• Represent a maximal
and consensus
druggable genome
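
Because all the sources carry UniProt cross-references, the "maximal" and "consensus" target sets can be read as a union and an intersection of accession sets. The following is a minimal sketch under that reading; the source names and the membership of each set are placeholders, not data from the slide.

```python
def maximal_and_consensus(source_targets):
    """Return (union, intersection) of UniProt accession sets across sources."""
    sets = [set(accessions) for accessions in source_targets.values()]
    if not sets:
        return set(), set()
    maximal = set().union(*sets)          # targets listed by any source
    consensus = set.intersection(*sets)   # targets listed by every source
    return maximal, consensus


# Placeholder source names and hypothetical memberships, for illustration only.
example_sources = {
    "source_a": {"P39900", "P00533", "P35354"},
    "source_b": {"P39900", "P00533"},
    "source_c": {"P39900", "P35354"},
    "source_d": {"P39900"},
}
maximal, consensus = maximal_and_consensus(example_sources)
```

In this toy input the maximal set holds three accessions and the consensus set holds one, which mirrors how a union grows and an intersection shrinks as sources are added.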
Drugs vs leads vs probes vs tools: Drugs
• Approved drugs have accumulated data on systems perturbations and
molecular mechanisms of action
• However, the primary human target number is below 400 and little may
be known on secondary targets and (systems) polypharmacology
• Optimisation of ADMET drug parameters may reduce desirable
perturbagen properties such as potency, Kd in vitro and in cellulo
• However, the same ADMET optimisation means they can go straight into
cells, organoids, rodents, zebrafish or other model systems
• Chemical vendor and patent SAR coverage are usefully high for drugs
• Drugs are designed for patient efficacy but not to provide systems
biology mechanistic insights per se
• Past and present development compounds (~ 15,000 +) cover a much
wider range of targets but are difficult to select cleanly in PubChem
Selecting approved drugs from Guide to Pharmacology
9526 SIDs > 1509 approved drugs > mapping to 1321 CIDs
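
The drop from 1509 approved drug SIDs to 1321 CIDs (a difference of 188) is what one would expect if some substances have no compound record, for example antibodies, or if several substances share one CID. The talk does not break the difference down, so the sketch below only illustrates the collapse step on made-up identifiers; the function name is hypothetical.

```python
def collapse_sids(sid_to_cid):
    """Group SIDs by CID; SIDs mapped to None have no compound record."""
    by_cid = {}
    unmapped = []
    for sid, cid in sid_to_cid.items():
        if cid is None:
            unmapped.append(sid)
        else:
            by_cid.setdefault(cid, []).append(sid)
    return by_cid, unmapped


# Made-up identifiers: two SIDs share CID 10, one SID has no CID.
toy_mapping = {1: 10, 2: 10, 3: 20, 4: None}
by_cid, unmapped = collapse_sids(toy_mapping)

print(f"{len(toy_mapping)} SIDs > {len(by_cid)} CIDs, {len(unmapped)} without a CID")
```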
Old drugs
New drugs
https://cdsouthan.blogspot.com/2019/01/2018-approved-drugs-in-pubchem.html
Probes, repurposing and tools
• Probes are designed to modulate a target for mechanistic and system
insights rather than therapeutic utility in the first instance
• Probe guidelines published on potency, specificity and activity in cells
• Interesting differences between the NIH first generation probes circa
1990-2010 and the later generation from SGC and others in the last few years
• Repurposing compounds from AstraZeneca, Boehringer and other
companies also potentially useful as perturbagens
• Tool compound is a broader term for anything with useful specificity that
might not be strictly a drug or a probe (e.g. spider venoms against ion
channels)
Useful external probe source for
“mapping in” to PubChem
Tracking a probe example into PubChem
Not so easy to get the structure
The bad news: PubChem has it from a Bayer patent
but no BioAssay data and not named as BAY 7598
The good news: MMP12 is indexed in PubChem
with 7 Guide to Pharmacology curated inhibitors
You too can get
into PubChem
If your perturbation
experiments are
reproducible then put the
results into PubChem
BioAssay (to be FAIR)
https://cdsouthan.blogspot.com/2017/08/getting-into-pubchem-again.html
Conclusions
• As the de facto global nexus for bioactive chemistry PubChem has become
an essential resource for perturbagen choices and cross-checks
• Between drugs, development compounds, leads, probes and patent SAR
sets there is an increasingly wide chemical toolbox choice
• PubChem synergises with the stand-alone sources integrated into it such as
ChEMBL, Guide to Pharmacology, DrugBank, BindingDB and DrugCentral
• Note these may offer easier entry points but follow structures back into
PubChem for source checking, BioAssay assessment, neighbour assessment
and patent links
• UniProt offers an alternative entry point from the chemically modulatable
protein target side and their cross-reference intersecting is very powerful
• Despite their increasing importance collective indexing of probes is poor
• Like all such sources, PubChem has navigation quirks, caveats, submitter
quality issues, gotchas and cheminformatic challenges
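
Following a structure from another source back into PubChem can be done with an InChIKey lookup through PUG REST. This is a minimal sketch that validates the key format and builds the request URL. The helper name is hypothetical, and the example key is the standard InChIKey for aspirin rather than a compound discussed in the talk.

```python
import re

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen separated.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def cids_for_inchikey_url(inchikey):
    """Build the PUG REST URL that returns the CIDs matching an InChIKey."""
    key = inchikey.strip().upper()
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f"{PUG_REST}/compound/inchikey/{key}/cids/JSON"


aspirin_lookup = cids_for_inchikey_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
```

Keying on the InChIKey rather than a name avoids the synonym problem seen with BAY 7598, where the structure is present in PubChem but not under that name.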
Further PubChem tips and tricks
https://www.slideshare.net/cdsouthan/presentations https://cdsouthan.blogspot.com/

PubChem as a source of systems biology perturbagens
