AI Is A Viable Alternative To High Throughput Screening: A 318 Target Study
com/scientificreports
High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires
physical compounds, which limits coverage of accessible chemical space. Computational approaches
combined with vast on-demand chemical libraries can access far greater chemical space, provided that
the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse
virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our
AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic
area and protein class. We address historical limitations of computational screening by demonstrating
success for target proteins without known binders, high-quality X-ray crystal structures, or manual
cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel
drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical
results suggest that computational methods can substantially replace HTS as the first step of small-
molecule drug discovery.
Despite present interest in AI/ML and thirty years of case studies1–4, computational screening techniques
have achieved limited adoption within the pharmaceutical industry. A recent investigation into the origins
of 156 clinical candidates5 found that only 1% came from virtual screening; in contrast, over 90% of clinical
candidates were derived from patent busting or high throughput screening (HTS). Unfortunately, these sources
are increasingly challenged, given the pharmaceutical industry’s shift to novel target classes, such as proximity-
induced protein degradation6, protein–protein interactions7, and RNA targeting8.
Currently, HTS is the critical tool in drug discovery, providing most novel scaffolds of recent clinical
candidates5,9,10. These initial starting points crucially shape the course of downstream medicinal chemistry efforts,
as most drugs preserve at least 80% of the scaffold of the initially identified lead11. Despite these foundational
contributions, HTS suffers from practical limitations. Principally, HTS, like all physical experiments, requires that
the compounds exist. However, with the advent of synthesis-on-demand libraries, most commercially-available
molecules have yet to be synthesized. Still, they can be made and delivered for testing in a matter of weeks12–14.
These libraries comprise trillions of molecules14,15 that exemplify millions of otherwise-unavailable scaffolds12,
providing an opportunity to substantially expand the scope and diversity of available chemical space explored
in the standard drug discovery process.
Computational approaches unlock this opportunity by reversing the requirement to make molecules before
testing them. When computational experiments replace HTS as the primary screen, molecules are tested before
they are made, and the results from these experiments can inform which molecules are worth synthesizing.
Computational experiments further promise to improve upon HTS in terms of cost, speed, need to produce
significant quantities of protein16, effort of miniaturizing assay formats while maintaining experimental
integrity17–19, and reducing false-positive and false-negative rates16,20–23, including artifacts from aggregation,
covalent modification of the target, autofluorescence, or interactions with the reporter rather than the target20,24,25.
Historical computational techniques such as ligand-based QSAR26–28, structure-based docking29,30, and machine
learning31,32 purport to address these limitations of physical screening methods. Unfortunately, these techniques
have not replaced HTS; in fact, despite increasing interest in ML, the proportion of drugs discovered with
computational techniques has remained steady over the past decades5,10.
Because there will always be individual targets for which one screening technique can identify more hits than
another, the key question governing if computation is ready to be the default hit discovery technique is whether
computational screens can identify hits successfully across a broad range of diverse targets. Unfortunately, despite
excellent benchmark accuracies33–35, prospective discovery accuracy remains modest33,36,37. For example, Cerón-
Carrasco38 reported over 700 virtual screens against the SARS-CoV-2 main protease. However, when the author
sought to validate the computational predictions via physical experiments, the identified compounds were barely
active (800 µM). Computational approaches have also been limited by a need for extensive target-specific training
data31,39–41, a requirement for high-quality X-ray crystal structures42,43, dependence on human adjudication
(so-called ‘cherry-picking’)12, or a limited domain of applicability44–48. Even recent systems have demonstrated
utility only in identifying minor variants of known molecules for well-studied proteins with tens of thousands
of known binders in their training data49,50. Figure 1 exemplifies the striking similarities between recently
ML-developed compounds and their preceding published chemical matter. This is particularly concerning, as
a myopic focus on well-studied proteins has been identified as a cause of low productivity in pharmaceutical
discovery51.
A list of authors and their affiliations appears at the end of the paper.
Nevertheless, we have observed that deep learning approaches are not as limited as these historical examples
would imply. Using our AtomNet52–54 screening system, we have previously reported success in finding novel
scaffolds for targets without known ligands55–57, X-ray crystal structures56–60, or both56,57, as well as challenging
modulation via protein–protein interaction59,61 or allosteric binding60 (see Supplementary Table S1 for examples).
However, individual examples do not demonstrate the overall success of such deep learning systems. We therefore
report our internal discovery efforts against 22 targets of pharmaceutical interest. We then attempted to further
assess the generalizability and robustness of deep learning predictive systems by identifying bioactive molecules
for a diverse set of targets. We partnered with 482 academic labs and screening centers, from 257 different
academic institutions across 30 countries, through our academic collaboration program, the Artificial Intelligence
Molecular Screen (AIMS). This collaboration afforded an opportunity to prospectively evaluate the utility of
the AtomNet model as a primary screen across a broad range of diverse, challenging, and realistic targets.
In aggregate, we report successes and failures from 318 prospective experiments and evaluate our AtomNet
machine-learning technology’s ability to serve as a viable alternative to physical HTS campaigns.
Results
We investigated the ability of deep learning-based methods to identify novel bioactive chemotypes by applying
the AtomNet model to identify hits for 22 internal targets of pharmaceutical interest. We also explored the
breadth of applicability of this approach by attempting to identify drug-like hits in single-dose screens for 296
academic targets, of which 49 were followed up with dose–response experiments, and 21 were further validated
by exploring analogs of the initial hits. The average hit rate for our internal projects (6.7%) was comparable to
the hit rate for our academic collaborations (7.6%).
Figure 1. Pairs of representative compounds extracted from AI patents (right) and corresponding prior patents
(left) for clinical-stage programs (CDK7, refs. 92,93; A2Ar-antagonist, refs. 94,95; MALT1, refs. 96,97;
QPCTL, refs. 98,99; USP1, refs. 100,101; and 3CLpro, refs. 102,103). The identical atoms between the chemical
structures are highlighted in red.
is several thousand times larger than HTS libraries and even exceeds the size of most DELs without suffering
limitations of DNA-compatible chemistry16,23. Each screen requires over 40,000 CPUs, 3,500 GPUs, 150 TB of
main memory, and 55 TB of data transfers. We describe the protocol in detail in the Methods section; briefly,
we computationally scored each catalog compound after removing molecules that were prone to interfere with
the assays or were too similar to known binders of the target or its homologs. The neural network analyzes and
scores the 3D coordinates of each generated protein–ligand co-complex, producing a list of ligands ranked by
their predicted binding probability. Our workflow then clusters the top-ranked molecules to ensure diversity and
algorithmically selects the highest-scoring exemplars from each cluster. At no point are compounds manually
cherry-picked. The molecules were synthesized at Enamine (https://enamine.net) and quality controlled by
LC–MS to purity > 90%, in agreement with HTS standards63. Hits were further validated using NMR. We then
physically tested, on average, 440 compounds per target at reputable contract research organizations (CROs),
while attempting to mitigate assay interferences such as aggregation and oxidation with standard additives
(e.g., Tween-20, Triton-X 100, and dithiothreitol (DTT)). We describe the assay protocols in detail in the
Supplementary Data S1.
We describe the results of the 22 experiments in Table 1. In 91% of the experiments, we identified single-
dose (SD) hits that were reconfirmed in dose–response (DR) experiments. The average target DR hit rate was
6.7% compared to 8.8% from the SD screens. Only 16 of the 22 projects were structurally enabled with X-ray
crystallography; one used a cryo-EM structure, while five used homology models with an average sequence
identity of 42% to their template protein. The DR hit rate for the cryo-EM project was 10.56%, while the average
hit rate for the homology models was a similar 10.8%.
We then advanced 14 projects with at least one dose-responsive scaffold to a round of analog expansion. We
found new bioactive analogs in the SD screen for all projects, with an average hit rate of 29.8%. Further validation
with DR resulted in an average hit rate of 26% per project, which compares favorably with typical HTS hit rates
ranging from 0.001% to 0.151%64,65. We note that the size and chemical diversity within and between physical66
and virtual14 HTS libraries prevent an explicit evaluation of the methods over the same chemical space. The
most potent analogs ranged from single-digit nanomolar, against a kinase, to double-digit micromolar, against a
transcription factor (Supplementary Table S2). Additionally, we present two internal studies in detail. For Large
Tumor Suppressor Kinase 1 (LATS1), we identified potent compounds despite the lack of a crystal structure or
known active compounds. For ATP-driven chaperone Valosin Containing Protein (VCP) we identified novel
allosteric and orthosteric modulators.
Gene name | # of compounds tested | SD hit rate (%) | DR hit rate (%) | Potency range (IC50/Ki, µM) | # of analogs tested | SD analog hit rate (%) | DR analog hit rate (%) | Analog potency range (IC50/Ki, µM)
ASAH1 | 376 | 10.64 | 7.71 | 0.3–102 | – | – | – | –
AXL | 597 | 12.06 | 8.21 | 0.181–71 | 3200 | 35.59 | 33.56 | 0.079–86
BCL2 | 422 | 3.08 | 0.00 | – | – | – | – | –
CBLB | 422 | 1.66 | 0.00 | – | – | – | – | –
CDK5 | 786 | 10.69 | 10.43 | 0.049–79 | 587 | 47.53 | 43.61 | 0.43–76
CDK7 | 786 | 10.69 | 10.56 | 0.099–60 | 735 | 28.44 | 27.35 | 0.191–10
GFPT1 | 384 | 6.51 | 2.34 | 31–86 | 734 | 24.93 | 24.11 | 1–194
KCNT1 | 416 | 9.62 | 7.69 | 1.1–30 | – | – | – | –
KDM6A | 356 | 3.93 | 1.12 | 24–58 | – | – | – | –
LATS1 | 418 | 18.18 | 17.94 | 0.077–82 | 841 | 51.72 | 45.78 | 0.034–98
MC2R | 208 | 11.54 | 9.62 | 16–68 | 419 | 39.38 | 38.42 | 2.4–97
MDM4 | 422 | 2.37 | 0.47 | 5.9–29.8 | 192 | 18.23 | 18.23 | 4.4–90
NT5E | 335 | 1.49 | 0.30 | 176 | 221 | 9.95 | 1.81 | 8.3–65
PARG | 334 | 7.78 | 7.78 | 15–250 | – | – | – | –
PARP14 | 576 | 5.38 | 2.95 | 3–96 | 616 | 26.46 | 26.30 | 0.2–95
POLQ | 330 | 11.82 | 11.52 | 1.2–49 | 559 | 11.27 | 8.77 | 1.5–42
PPARA | 422 | 4.03 | 0.24 | 131 | 211 | 14.22 | 3.79 | 59–95
PPM1D | 530 | 11.89 | 6.98 | 4.5–98 | – | – | – | –
PRMT5 | 422 | 4.03 | 0.95 | 7.2–79 | 415 | 7.95 | 5.54 | 19–114
PRODH2 | 542 | 2.77 | 1.11 | 15–84 | – | – | – | –
TYK2 | 189 | 38.10 | 34.39 | 0.016–9 | 457 | 71.33 | 60.39 | 0.006–10
VCP | 416 | 4.81 | 4.81 | 2.4–64 | 738 | – | – | –
Table 1. Results from 22 Atomwise internal programs. SD and DR denote single-dose and dose–response,
respectively.
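The summary figures quoted in the text for the internal programs can be recomputed from the hit-rate columns of Table 1. The sketch below is only an illustration: the tuple layout and the names `TABLE_1` and `summarize` are hypothetical, and `None` stands for a "–" entry in the table.

```python
from statistics import mean

# (gene, SD hit rate %, DR hit rate %, SD analog hit rate %, DR analog hit rate %)
# Values transcribed from Table 1; None marks a "–" entry.
TABLE_1 = [
    ("ASAH1", 10.64, 7.71, None, None),
    ("AXL", 12.06, 8.21, 35.59, 33.56),
    ("BCL2", 3.08, 0.00, None, None),
    ("CBLB", 1.66, 0.00, None, None),
    ("CDK5", 10.69, 10.43, 47.53, 43.61),
    ("CDK7", 10.69, 10.56, 28.44, 27.35),
    ("GFPT1", 6.51, 2.34, 24.93, 24.11),
    ("KCNT1", 9.62, 7.69, None, None),
    ("KDM6A", 3.93, 1.12, None, None),
    ("LATS1", 18.18, 17.94, 51.72, 45.78),
    ("MC2R", 11.54, 9.62, 39.38, 38.42),
    ("MDM4", 2.37, 0.47, 18.23, 18.23),
    ("NT5E", 1.49, 0.30, 9.95, 1.81),
    ("PARG", 7.78, 7.78, None, None),
    ("PARP14", 5.38, 2.95, 26.46, 26.30),
    ("POLQ", 11.82, 11.52, 11.27, 8.77),
    ("PPARA", 4.03, 0.24, 14.22, 3.79),
    ("PPM1D", 11.89, 6.98, None, None),
    ("PRMT5", 4.03, 0.95, 7.95, 5.54),
    ("PRODH2", 2.77, 1.11, None, None),
    ("TYK2", 38.10, 34.39, 71.33, 60.39),
    ("VCP", 4.81, 4.81, None, None),
]


def summarize(rows):
    """Recompute the per-project averages quoted alongside Table 1."""
    sd = [r[1] for r in rows]
    dr = [r[2] for r in rows]
    analog_sd = [r[3] for r in rows if r[3] is not None]
    analog_dr = [r[4] for r in rows if r[4] is not None]
    confirmed = sum(1 for rate in dr if rate > 0)  # projects with a DR-confirmed hit
    return {
        "projects": len(rows),
        "mean_sd_hit_rate": round(mean(sd), 1),
        "mean_dr_hit_rate": round(mean(dr), 1),
        "pct_projects_with_dr_hit": round(100 * confirmed / len(rows)),
        "mean_analog_sd_hit_rate": round(mean(analog_sd), 1),
        "mean_analog_dr_hit_rate": round(mean(analog_dr), 1),
    }


SUMMARY = summarize(TABLE_1)
```

The analog averages are taken over the 13 projects that report analog hit rates; VCP lists analogs tested but no rates, so it is excluded from those two means.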
Academic validation
In addition to our internal discovery efforts, we performed virtual screens for 296 targets, comprising more than
20 billion individual neural network scores of generated protein–ligand co-complexes. We purchased, on average,
85 off-the-shelf commercially available compounds, quality controlled by NMR and LC–MS to > 90% purity63,
and plated in a single 96-well plate. The compounds were then physically screened for activity against the target
of interest in single-dose assays (see Supplementary Data S1 for assay protocols). As with HTS primary screens,
additional characterization studies are required to validate the initially identified hits, so in 49 projects we
performed dose–response studies and analog expansion. We present a summary of our results in Supplementary
Table S3.
Figure 2 illustrates the distributions of projects across therapeutic areas, protein families, and assay types.
Every major therapeutic area is represented, with the most frequent area being oncology, comprising 35%
of projects, followed by infectious diseases and neurology, comprising 27% and 9% of projects, respectively.
Breaking down the projects by protein families reveals that all major enzyme classes are represented, with
enzymes comprising 59% of the targets and membrane proteins such as GPCR, transporters, and ion channels,
representing 12% of the targets. Working on a large and diverse set of therapeutic targets requires a heterogeneous
collection of biological assays; 20% of the assays measured direct binding, whereas 56% and 20% were functional
and phenotypic, respectively.
In 215 projects, we identified at least one bioactive compound for the target in a biochemical or cell-based
assay. This 73% success rate substantially improves over the ∼50% success rate for HTS21,67. On average, we
screened 85 compounds per project and discovered 4.6 active hits, with an average hit rate of 5.5%. For the subset
of targets where we found any hits, the average was 6.4 hits per project, corresponding to an average hit rate of
7.6%, which again compares favorably with typical HTS hit rates. See Supplementary Material S1 for all assay
definitions and conditions. Supplementary Table S4 shows a representative bioactive compound from each of the
215 successful projects, and Supplementary Fig. S2 shows that the physicochemical properties of the identified
hits are largely druglike and Lipinski-compliant.
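The relationship between the all-project averages and the averages over successful projects quoted above can be checked with simple arithmetic. The sketch below assumes that projects without hits contribute zero hits and a zero hit rate, so an average over all 296 projects rescales by 296/215 when restricted to the 215 successful ones. The constant and function names are hypothetical, and because the inputs are the rounded values from the text, agreement is only expected to within rounding.

```python
TOTAL_PROJECTS = 296        # AIMS projects screened
SUCCESSFUL_PROJECTS = 215   # projects with at least one bioactive compound
MEAN_HITS_ALL = 4.6         # mean hits per project, all projects (rounded in the text)
MEAN_HIT_RATE_ALL = 5.5     # mean hit rate in percent, all projects (rounded in the text)


def success_rate_pct(successes: int, total: int) -> float:
    """Percentage of projects that produced at least one hit."""
    return 100.0 * successes / total


def mean_over_successes(mean_over_all: float, total: int, successes: int) -> float:
    """Rescale an all-project mean to the successful subset.

    Assumes every unsuccessful project contributes exactly zero to the mean.
    """
    return mean_over_all * total / successes


SUCCESS_RATE = success_rate_pct(SUCCESSFUL_PROJECTS, TOTAL_PROJECTS)
HITS_PER_SUCCESS = mean_over_successes(MEAN_HITS_ALL, TOTAL_PROJECTS, SUCCESSFUL_PROJECTS)
HIT_RATE_PER_SUCCESS = mean_over_successes(
    MEAN_HIT_RATE_ALL, TOTAL_PROJECTS, SUCCESSFUL_PROJECTS
)
```

Under these assumptions the success rate rounds to 73% and the rescaled hit rate to 7.6%, while the rescaled hit count comes out near 6.3, within rounding of the reported 6.4.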
The AtomNet technology robustly identified active molecules, even for targets that lacked prior on-target
bioactivity data. This ability to identify hits for previously undrugged targets is critical if machine learning-based
approaches are to replace HTS as the default primary screening approach. For 207 out of the 296 targets (70%),
the training data available for AtomNet models lacked a single active molecule for that target or any closely
related protein (i.e., proteins with sequence identity greater than 70%). We interpret this as evidence of the ability
of properly-architected machine learning systems to extrapolate to novel biological space. Figure 3A illustrates
Figure 2. The distributions of 296 AIMS projects across assay types used in the primary screen, research areas,
target classes, and further breakdown to enzyme classes when applicable.
Figure 3. (A) An illustration of the hit rate versus the number of training examples available to our model.
Each point represents a project, with the x-axis denoting the number of active molecules in our training for
the target protein or homologs and the y-axis denoting the hit rate of the project (the percentage of molecules
tested in the project that were active). The model shows no dependence on the availability of on-target training
examples. For 70% of the targets, the AtomNet model training data lacked any active molecules for that target or
any similar targets with greater than 70% sequence identity, yet the model achieved a hit rate of 5.3% compared
to 6.1% when on-target data was available. (B) The distribution of similarities between hits and their most-
similar bioactive compounds in our training data. Our screening protocol ensures that the compounds subjected
to physical testing are not similar to known active compounds or close homologs (< 0.5 Tanimoto similarity
using ECFP4, 1024 bits). Because 70% of the AIMS targets had no annotated bioactivities in our training
dataset, hits identified in these projects have a similarity value of zero.
the hit rate versus the number of training examples available to our model. Although previous computational
approaches typically require thousands of on-target training examples31,39,42, the lack of correlation between
training examples and hit rate (R² = 0.0021, p-value = 0.43) shows that our ML algorithm is agnostic to the
availability of such data. We achieved an average success rate of 75% and hit rates of 5.3% when no training data
was available, comparable to the 67% and 6.1% success and hit rates achieved when binding data was available
in the training set. Interestingly, we also do not see a significant increase in hit rate attributable to the proportion
of binding data available for a target (R² = 0.008, p-value = 0.39). This reflects the robustness of the screening
protocol and the chemical dissimilarity of scaffolds identified by AtomNet models to previously known bioactive
compounds.
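As a consistency check on the stratified figures above, the two strata (targets with and without on-target training data) can be pooled by weighting each with its number of targets. This is a sketch under stated assumptions: the count of 89 targets with training data is derived here as 296 − 207 rather than quoted from the text, and the dictionary layout and the `pooled` helper are hypothetical.

```python
# Per-stratum values quoted in the text; "n" for the second stratum is derived (296 - 207).
STRATA = {
    "no_training_data": {"n": 207, "success_pct": 75.0, "hit_rate_pct": 5.3},
    "with_training_data": {"n": 296 - 207, "success_pct": 67.0, "hit_rate_pct": 6.1},
}


def pooled(strata: dict, key: str) -> float:
    """Target-count-weighted average of a per-stratum percentage."""
    total = sum(s["n"] for s in strata.values())
    return sum(s["n"] * s[key] for s in strata.values()) / total


POOLED_SUCCESS = pooled(STRATA, "success_pct")
POOLED_HIT_RATE = pooled(STRATA, "hit_rate_pct")
```

The pooled values round to a 73% success rate and a 5.5% hit rate, matching the overall AIMS figures reported earlier.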
Next, we assessed the ability of the AtomNet models to identify novel scaffolds. This is a critical capability
for primary screens, as follow-up assays tend to work within the chemical space uncovered in the initial screen.
The task of novel scaffold identification appears in two distinct scenarios: (1) when no scaffold is known for the
target and we wish to identify the first scaffold, and (2) when some scaffolds are known but we wish to identify
dissimilar scaffolds because novel chemical matter can yield improved selectivity, toxicity, pharmacokinetics,
or patentability. Performance of AtomNet models for the first scenario, when no scaffolds for the target existed
in the AtomNet model training data, was evaluated on 70% of the targets, where the training data contained no
active molecules for the target or its homologs (vide supra). We achieved an average hit rate of 5.3% for targets
with no training data. For the second scenario, we analyzed the similarity of the identified hits to known bioactive
compounds in our training data (Fig. 3B). Our screening protocol ensures that the compounds subjected to
physical testing are not similar to known active compounds or close homologs (< 0.5 Tanimoto similarity using
ECFP4 fingerprints68, 1024 bits). We interpret this as evidence of the ability of properly-architected machine learning systems
to extrapolate to novel chemical space as well. For cases where training data was available (i.e., the Tanimoto
similarity is above zero), the similarity distribution is close to the one expected by random compound pairs69.
The novelty of the small-molecule structures is striking because target-specific machine-learning algorithms
tend to uncover highly similar analogs for known bioactive molecules50,70,71. The superior performance of the
AtomNet model is expected, considering the bias-variance tradeoff72 in machine learning algorithms. Because
the AtomNet convolutional neural network is a global model, concurrently trained on millions of bioactivities,
hundreds of thousands of small molecules, and thousands of protein binding sites, it can reduce both bias and
variance of the model compared to target-specific ones33. Specifically, our global model can benefit from multiple
levels of information captured in the structures of the small molecules, the sequences of the target proteins, and
the three-dimensional interactions between the two.
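The novelty criterion described above (less than 0.5 Tanimoto similarity to any known active) can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a set of "on" bit indices, standing in for the 1024-bit ECFP4 fingerprints named in the text, which would normally be produced by a cheminformatics toolkit. The function names are hypothetical and are not part of the AtomNet protocol.

```python
from typing import FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention here
    return len(fp_a & fp_b) / union


def max_similarity_to_known(candidate: Fingerprint,
                            known_actives: Iterable[Fingerprint]) -> float:
    """Similarity to the most-similar known active; 0.0 when none are known."""
    return max((tanimoto(candidate, k) for k in known_actives), default=0.0)


def is_novel(candidate: Fingerprint,
             known_actives: Iterable[Fingerprint],
             cutoff: float = 0.5) -> bool:
    """True when the candidate is below the similarity cutoff to every known active."""
    return max_similarity_to_known(candidate, known_actives) < cutoff
```

Returning 0.0 when no actives are known mirrors the convention in the Figure 3B caption, where hits for targets without annotated bioactivities are assigned a similarity of zero.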
AtomNet also successfully identified active molecules when there was no X-ray crystal structure of the
receptor. Figure 4A compares the hit rates obtained with 3-dimensional crystal structures, cryo-EM, and
homology modeling. We did not attempt to select targets based on the similarity to the template but rather
used the best template available. We observe no substantial difference in success rate between the three, in
Figure 4. Hit rates obtained for the 296 AIMS projects. (A) A comparison of hit rates using X-ray
crystallography, NMR, Cryo-EM, and homology for modeling the structure of the proteins. Each point
represents a project with the x-axis denoting the hit rate of the project (the percentage of molecules tested
in the project that were active). The number of projects of each type is given in parentheses. We observed
no substantial difference in success rate between the physical and the computationally inferred models. We
achieved average hit rates of 5.6%, 5.5%, and 5.1% for crystal structures, cryo-EM, and homology modeling,
respectively. The number of projects using NMR structures is too small to make statistically-robust claims. (B) A
comparison of hit rates observed for traditionally challenging target classes such as protein–protein interactions
(PPI) and allosteric binding. Of the 296 projects, 72 targeted PPIs and 58 allosteric binding sites. The average
hit rates were 6.4% and 5.8% for PPIs and allosteric binding, respectively. (C) Comparison of hit rates observed
for different target classes and (D) enzyme classes. No protein or enzyme class falls outside the domain of
applicability of the algorithm.
contrast to the common challenges in using homology models or low-precision structures for structure-based
discovery42,43,73. We achieved average hit rates of 5.6%, 5.5%, and 5.1% for crystal structures, cryo-EM, and
homology modeling. We also successfully identified active compounds in projects with NMR structures, but the
number of such targets is too small to make statistically-robust claims.
An interesting demonstration of the robustness of the AtomNet model to low data and poorly characterized
protein structure is its ability to identify novel hits for traditionally challenging target classes such as
protein–protein interaction (PPI) sites and allosteric binding sites (Fig. 4B). Of the 296 projects, 72 targeted
PPIs and 58 allosteric binding sites. We identified hits for 53 (74%) PPI sites and 46 (79%) allosteric sites, with
13 projects representing allosteric sites at PPI interfaces. The average hit rate was 6.4% and 5.8% for PPIs and
allosteric binding sites, respectively. The algorithm’s success in these target classes, which often suffer from poorly
characterized binding sites and a lack of bioactivity training data, is not surprising because Fig. 3A shows that
our model is largely not dependent on the availability of on-target training data.
Finally, we investigated whether the algorithm exhibits domain of applicability limitations regarding different
protein classes. Figures 4C and 4D illustrate the hit rate observed for each protein and enzyme class. No protein
or enzyme class falls outside the domain of applicability of the algorithm, demonstrating that machine learning-
based approaches are well-suited as a default technology for new scaffold identification. The hit rate for nuclear
receptors is an outlier, with seemingly better accuracy than other classes, but a single data point is not statistically
meaningful.
concentration dependent activity was qualitatively determined by testing at concentrations other than that for
the primary screen. The distribution of assay types and target classes for the projects selected for DR validation
also was similar to that of the AIMS projects (Supplementary Fig. S3).
We describe the results of the DR experiments in Supplementary Table S5. In 84% of the experiments, we
validated at least one SD hit and obtained a DR readout. The median activity for the total of 144 DR measurements
was 15.4 µM (which compares favorably with HTS25,74), of which 13% showed sub-µM potency. Overall, we
achieved an average of 2.8 hits per validation study, resulting in a hit rate of 51%. The false positive rate of 49%
observed in these experiments compares favorably with that of HTS, which can be as high as 95%20,75. This difference
in false positive rates may stem from the comparative ease and robustness of the low-throughput assay format
we employed versus high-throughput assays. Representative dose–response curves for each of the 49 projects are
shown in Supplementary Table S6.
Methods
Screening protocols
AIMS screening protocol
We began by evaluating screening libraries of millions of catalog compounds from commercial vendors MCule
(10 M)76 and Enamine in-stock (2.5 M)77. We then selected a drug-like subset via algorithmic filtering by applying
Eli Lilly medicinal chemistry filters78 and removing likely false positives, such as aggregators, autofluorescers,
and PAINS79 (see Supplementary Fig. S2 for the distributions of drug-like properties of the SD hits). The resulting library
was virtually screened against the target of interest, removing any molecules with greater than 0.5 Tanimoto
similarity in ECFP4 space to any known binders of the target and its homologs within 70% sequence identity. For
kinase targets, we extend the exclusion to the whole kinome. The binding site was defined using co-complexes,
mutagenesis studies, co-complexes of homologs, or by identifying potential sites using ICM Pocket Finder80
or Fpocket81. Some sites were orthosteric, others were allosteric, and some had as yet unestablished biological functions.
In 64 cases, we built homology models using the closest sequence, with an average sequence similarity of 54%.
We clustered the top 30,000 molecules using the Butina82 algorithm with a Tanimoto similarity cutoff of 0.35
in ECFP4 space, selecting the highest-scoring exemplars. Additional computed physico-chemical property
filters were applied as needed. At no point were compounds cherry-picked. We purchased, on average, 85
compounds, quality controlled by LC–MS to > 90% purity, generally dispensed as 10 mM DMSO stocks plated
in a single 96-well plate. In addition, two vials of DMSO-only negative controls were included before scrambling
the compound locations on the plate, by the supplier, for blinded experimental testing. To further control for
potential artifacts, we removed compounds that showed measurable activity toward more than one target from
the analysis.
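The clustering and exemplar-selection step described above can be sketched in a few lines. This is a simplified, pure-Python rendering of Butina-style clustering, not the production workflow. It assumes fingerprints are sets of on-bit indices (a stand-in for ECFP4), that two molecules are neighbors when their Tanimoto similarity is at least the 0.35 cutoff, and that one highest-scoring exemplar is taken per cluster. The function names, toy fingerprints, and scores are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def butina_clusters(fps, sim_cutoff=0.35):
    """Butina-style clustering: the molecule with the most neighbors seeds a cluster,
    claims its still-unassigned neighbors, and the process repeats."""
    n = len(fps)
    neighbors = [set() for _ in range(n)]
    for i in range(n):  # O(n^2) pairwise comparison; fine for a sketch
        for j in range(i):
            if tanimoto(fps[i], fps[j]) >= sim_cutoff:
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Visit molecules from most to fewest neighbors (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned, clusters = set(), []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + sorted(m for m in neighbors[centroid] if m not in assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def pick_exemplars(clusters, scores):
    """Select the highest-scoring molecule from each cluster."""
    return [max(cluster, key=lambda i: scores[i]) for cluster in clusters]


# Toy data: six hypothetical fingerprints and model scores.
FPS = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 6, 7}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({20, 21}),
]
SCORES = [0.90, 0.95, 0.80, 0.70, 0.60, 0.99]

CLUSTERS = butina_clusters(FPS)
EXEMPLARS = pick_exemplars(CLUSTERS, SCORES)
```

On the toy data, molecules 0 and 1 form one cluster, 3 and 4 another, and 2 and 5 remain singletons, so four exemplars are selected. For the top 30,000 molecules mentioned in the text, a toolkit implementation with precomputed fingerprints would be the practical choice.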
Data
All data generated or analyzed during this study are included in this published article (and its supplementary
information S1 files). Boxplot illustrations show the quartiles (Q1 and Q3) of the dataset, while the whiskers
extend to show the rest of the distribution, except for points determined to be “outliers” (beyond 1.5 × the
inter-quartile range, as implemented in the Seaborn and Matplotlib toolboxes90,91).
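The boxplot convention described above can be made concrete with a short sketch. It assumes that quartiles are computed by linear interpolation over the observed data (the "inclusive" method of Python's `statistics.quantiles`) and that whiskers stop at the most extreme data points still within 1.5 × IQR of the box. Plotting libraries may compute quartiles slightly differently, and the function name `boxplot_stats` is hypothetical.

```python
from statistics import quantiles


def boxplot_stats(values, whisker=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    data = sorted(values)
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical hit rates (%) with one extreme project.
STATS = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

For this toy input the box spans 3.25 to 7.75, the whiskers reach 1 and 9, and the value 50 is drawn as an outlier.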
Conclusion
HTS is the most widely-used tool for hit discovery for new targets. Unfortunately, all physical screening
methods share the critical limitation that a molecule must exist to be screened. Computational methods enable
a fundamental shift to a test-then-make paradigm. In this work, we report on 318 projects (22 internal projects
and 296 collaborations) where we used the AtomNet platform as the primary screening tool coupled with low-
throughput physical screens as validation. The AtomNet technology can identify bioactive scaffolds across a
wide range of proteins, even without known binders, X-ray structures, or manual cherry-picking of compounds.
Our empirical results suggest that machine learning approaches have reached a computational accuracy that can
replace HTS as the first step of small-molecule drug discovery.
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary
information files.
References
1. Kuntz, I. D. Structure-based strategies for drug design and discovery. Science 257, 1078–1082 (1992).
2. Bajorath, J. Integration of virtual and high-throughput screening. Nat. Rev. Drug Discov. 1, 882–894 (2002).
3. Walters, W. P., Stahl, M. T. & Murcko, M. A. Virtual screening—an overview. Drug Discov. Today 3, 160–178 (1998).
4. Ring, C. S. et al. Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc.
Natl. Acad. Sci. USA. 90, 3583–3587 (1993).
5. Brown, D. G. An analysis of successful hit-to-clinical candidate pairs. J. Med. Chem. https://doi.org/10.1021/acs.jmedchem.3c00521 (2023).
6. Békés, M., Langley, D. R. & Crews, C. M. PROTAC targeted protein degraders: The past is prologue. Nat. Rev. Drug Discov. 21,
181–200 (2022).
7. Lu, H. et al. Recent advances in the development of protein–protein interactions modulators: Mechanisms and clinical trials.
Signal Transduct. Target. Ther. 5, 1–23 (2020).
8. Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).
9. Brown, D. G. & Boström, J. Where do recent small molecule clinical development candidates come from?. J. Med. Chem. 61,
9442–9468 (2018).
10. Dragovich, P. S., Haap, W., Mulvihill, M. M., Plancher, J.-M. & Stepan, A. F. Small-molecule lead-finding trends across the Roche
and Genentech research organizations. J. Med. Chem. 65, 3606–3615 (2022).
11. Perola, E. An analysis of the binding efficiencies of drugs and their leads in successful drug discovery programs. J. Med. Chem.
53, 2986–2997 (2010).
12. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224 (2019).
13. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459
(2022).
14. Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of combinatorial fragment spaces and its application to ultralarge
make-on-demand compound catalogs. J. Chem. Inf. Model. 62, 553–566 (2022).
15. Neumann, A., Marrison, L. & Klein, R. Relevance of the trillion-sized chemical space “explore” as a source for drug discovery.
ACS Med. Chem. Lett. 14, 466–472 (2023).
16. Sunkari, Y. K., Siripuram, V. K., Nguyen, T.-L. & Flajolet, M. High-power screening (HPS) empowered by DNA-encoded libraries.
Trends Pharmacol. Sci. 43, 4–15 (2022).
17. Malo, N., Hanley, J. A., Cerquozzi, S., Pelletier, J. & Nadon, R. Statistical practice in high-throughput screening data analysis.
Nat. Biotechnol. 24, 167–175 (2006).
18. Iversen, P. W., Eastwood, B. J., Sittampalam, G. S. & Cox, K. L. A comparison of assay performance measures in screening assays:
Signal window, Z’ factor, and assay variability ratio. J. Biomol. Screen. 11, 247–252 (2006).
19. Zhang, J.-H., Chung, T. D. Y. & Oldenburg, K. R. A simple statistical parameter for use in evaluation and validation of high
throughput screening assays. J. Biomol. Screen. 4, 67–73 (1999).
20. Jadhav, A. et al. Quantitative analyses of aggregation, autofluorescence, and reactivity artifacts in a screen for inhibitors of a
thiol protease. J. Med. Chem. 53, 37–51 (2010).
21. Fox, S. et al. High-throughput screening: Update on practices and success. J. Biomol. Screen. 11, 864–869 (2006).
22. Owen, S. C., Doak, A. K., Wassam, P., Shoichet, M. S. & Shoichet, B. K. Colloidal aggregation affects the efficacy of anticancer
drugs in cell culture. ACS Chem. Biol. 7, 1429–1435 (2012).
23. Rössler, S. L., Grob, N. M., Buchwald, S. L. & Pentelute, B. L. Abiotic peptides as carriers of information for the encoding of
small-molecule library synthesis. Science 379, 939–945 (2023).
24. McGovern, S. L., Caselli, E., Grigorieff, N. & Shoichet, B. K. A common mechanism underlying promiscuous inhibitors from
virtual and high-throughput screening. J. Med. Chem. 45, 1712–1722 (2002).
25. Feng, B. Y., Shelat, A., Doman, T. N., Guy, R. K. & Shoichet, B. K. High-throughput assays for promiscuous inhibitors. Nat.
Chem. Biol. 1, 146–148 (2005).
26. Martin, E. J., Polyakov, V. R., Tian, L. & Perez, R. C. Profile-QSAR 2.0: Kinase virtual screening accuracy comparable to four-
concentration IC50s for realistically novel compounds. J. Chem. Inf. Model. 57, 2077–2088 (2017).
27. Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 462, 175–181 (2009).
28. Svetnik, V. et al. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem.
Inf. Comput. Sci. 43, 1947–1958 (2003).
29. Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods
and applications. Nat. Rev. Drug Discov. 3, 935–949 (2004).
30. Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
31. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity
relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
32. Sheridan, R. P. et al. Experimental error, kurtosis, activity cliffs, and methodology: What
limits the predictivity of QSAR models? J. Chem. Inf. Model. https://doi.org/10.1021/acs.jcim.9b01067 (2020).
33. Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem.
Inf. Model. 58, 916–932 (2018).
34. Chen, L. et al. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual
screening. PLOS ONE 14, e0220113 (2019).
35. Chuang, K. V. & Keiser, M. J. Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”.
Science 362, eaat8603 (2018).
36. Gaieb, Z. et al. D3R Grand Challenge 3: Blind prediction of protein–ligand poses and affinity rankings. J. Comput. Aided Mol.
Des. 33, 1–18 (2019).
37. Gabel, J., Desaphy, J. & Rognan, D. Beware of machine learning-based scoring functions on the danger of developing black
boxes. J. Chem. Inf. Model. 54, 2807–2815 (2014).
38. Cerón-Carrasco, J. P. When virtual screening yields inactive drugs: dealing with false theoretical friends. ChemMedChem 17,
e202200278 (2022).
39. McCloskey, K. et al. Machine learning on DNA-encoded libraries: A new paradigm for hit-finding. J. Med. Chem. 63, 8857–8866
(2020).
40. Wenzel, J., Matter, H. & Schmidt, F. Predictive multitask deep neural network models for ADME-Tox properties: Learning from
large data sets. J. Chem. Inf. Model. 59, 1253–1268 (2019).
41. Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520–1530 (2018).
42. Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem.
Inf. Model. 60, 5457–5474 (2020).
43. Bordogna, A., Pandini, A. & Bonati, L. Predicting the accuracy of protein–ligand docking on homology models. J. Comput.
Chem. 32, 81–98 (2011).
44. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688-702.e13 (2020).
45. Melo, M. C. R., Maasch, J. R. M. A. & de la Fuente-Nunez, C. Accelerating antibiotic discovery through artificial intelligence.
Commun. Biol. 4, 1–13 (2021).
46. Skinnider, M. A. et al. A deep generative model enables automated structure elucidation of novel psychoactive substances. Nat.
Mach. Intell. 3, 973–984 (2021).
47. Muegge, I. & Oloff, S. Advances in virtual screening. Drug Discov. Today Technol. 3, 405–411 (2006).
48. Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).
49. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–
1040 (2019).
50. Walters, W. P. & Murcko, M. Assessing the impact of generative AI on medicinal chemistry. Nat. Biotechnol. 38, 143–145 (2020).
51. Scannell, J. W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev.
Drug Discov. 11, 191 (2012).
52. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-
based Drug Discovery. ArXiv Prepr. ArXiv151002855 1–11 (2015).
53. Gniewek, P., Worley, B., Stafford, K., van den Bedem, H. & Anderson, B. Learning physics confers pose-sensitivity in structure-
based virtual screening. https://doi.org/10.48550/arXiv.2110.15459 (2021).
54. Stafford, K. A., Anderson, B. M., Sorenson, J. & van den Bedem, H. AtomNet PoseRanker: Enriching ligand pose quality for
dynamic proteins in virtual high-throughput screens. J. Chem. Inf. Model. 62, 1178–1189 (2022).
55. Hsieh, C.-H. et al. Miro1 marks Parkinson’s disease subset and Miro1 reducer rescues neuron loss in Parkinson’s models. Cell
Metab. 30, 1131-1140.e7 (2019).
56. Reidenbach, A. G. et al. Multimodal small-molecule screening for human prion protein binders. J. Biol. Chem. 295, 13516–13531
(2020).
57. Bon, C. et al. Discovery of novel trace amine-associated receptor 5 (TAAR5) antagonists using a deep convolutional neural
network. Int. J. Mol. Sci. 23, 3127 (2022).
58. Stecula, A., Hussain, M. S. & Viola, R. E. Discovery of novel inhibitors of a critical brain enzyme using a homology model and
a deep convolutional neural network. J. Med. Chem. 63, 8867–8875 (2020).
59. Su, S. et al. SPOP and OTUD7A control EWS–FLI1 protein stability to govern Ewing sarcoma growth. Adv. Sci. 8, 2004846
(2021).
60. Pedicone, C. et al. Discovery of a novel SHIP1 agonist that promotes degradation of lipid-laden phagocytic cargo by microglia.
iScience 25, 104170 (2022).
61. Huang, C. et al. Small molecules block the interaction between porcine reproductive and respiratory syndrome virus and CD163
receptor and the infection of pig cells. Virol. J. 17, 116 (2020).
62. Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681
(2020).
63. Dandapani, S., Rosse, G., Southall, N., Salvino, J. M. & Thomas, C. J. Selecting, acquiring, and using small molecule libraries for
high-throughput screening. Curr. Protoc. Chem. Biol. 4, 177–191 (2012).
64. Schuffenhauer, A. et al. Library design for fragment based screening. Curr. Top. Med. Chem. 5, 751–762 (2005).
65. Jacoby, E. et al. Key aspects of the Novartis compound collection enhancement project for the compilation of a comprehensive
chemogenomics drug discovery screening collection. Curr. Top. Med. Chem. 5, 397–411 (2005).
66. Petrova, T., Chuprina, A., Parkesh, R. & Pushechnikov, A. Structural enrichment of HTS compounds from available commercial
libraries. MedChemComm 3, 571–579 (2012).
67. Macarron, R. et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188–195 (2011).
68. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
69. Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J.
Cheminformatics 5, 26 (2013).
70. Ren, F. et al. AlphaFold accelerates artificial intelligence powered drug discovery: Efficient discovery of a novel cyclin-dependent
kinase 20 (CDK20) small molecule inhibitor (2022).
71. Assessing structural novelty of the first AI-designed drug candidates to go into human clinical trials. CAS
https://www.cas.org/resources/blog/ai-drug-candidates.
72. Kohavi, R. & Wolpert, D. Bias plus variance decomposition for zero-one loss functions. in Proceedings of the Thirteenth
International Conference on Machine Learning 275–283 (Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1996).
73. Ferrara, P. & Jacoby, E. Evaluation of the utility of homology models in high throughput docking. J. Mol. Model. 13, 897–905
(2007).
74. Walters, W. P. & Namchuk, M. Designing screens: How to make your hits a hit. Nat. Rev. Drug Discov. 2, 259–266 (2003).
75. Inglese, J. et al. High-throughput screening assays for the identification of chemical probes. Nat. Chem. Biol. 3, 466–479 (2007).
76. mcule database. https://mcule.com/database/.
77. Screening Collections - Enamine. https://enamine.net/compound-collections/screening-collection.
78. Bruns, R. F. & Watson, I. A. Rules for identifying potentially reactive or promiscuous compounds. J. Med. Chem. 55, 9763–9772
(2012).
79. Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening
libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).
80. Abagyan, R. & Kufareva, I. The flexible pocketome engine for structural chemogenomics. Methods Mol. Biol. Clifton NJ 575,
249–279 (2009).
81. Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: An open source platform for ligand pocket detection. BMC Bioinformatics
10, 168 (2009).
82. Butina, D. Unsupervised data base clustering based on Daylight’s fingerprint and Tanimoto similarity: A fast and automated way
to cluster small and large data sets. J. Chem. Inf. Comput. Sci. 39, 747–750 (1999).
83. RDKit: Open-Source Cheminformatics.
84. Rarey, M. & Dixon, J. S. Feature trees: A new molecular similarity measure based on tree matching. J. Comput. Aided Mol. Des.
12, 471–490 (1998).
85. Stafford, K., Anderson, B. M., Sorenson, J. & van den Bedem, H. AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic
Proteins in Virtual High Throughput Screens. https://doi.org/10.26434/chemrxiv-2021-t6xkj (2021).
86. Schroedl, S. Current methods and challenges for deep learning in drug discovery. Drug Discov. Today Technol. 32–33, 9–17
(2019).
87. Bender, A., Mussa, H. Y., Glen, R. C. & Reiling, S. Molecular similarity searching using atom environments, information-based
feature selection, and a Naïve Bayesian classifier. J. Chem. Inf. Comput. Sci. 44, 170–178 (2004).
88. Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient
optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
89. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. ArXiv14126980 Cs (2017).
90. Waskom, M. L. seaborn: Statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
91. Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
92. Marineau, J. J. et al. Discovery of SY-5609: A selective, noncovalent inhibitor of CDK7. J. Med. Chem. 65, 1458–1480 (2022).
93. Gu, X., Bai, H., Barbeau, O. R. & Besnard, J. Aromatic heterocyclic compound, and pharmaceutical composition and application
thereof. (2022).
94. Barbay, J. K., Chakravarty, D., Leonard, K., Shook, B. C. & Wang, A. Phenyl and heteroaryl substituted thieno[2,3-d]pyrimidines
and their use as adenosine A2a receptor antagonists (2010).
95. Bell, A. S., Schreyer, A. M. & Versluys, S. Pyrazolopyrimidine compounds as adenosine receptor antagonists (2019).
96. Soldermann, C. P. et al. Pyrazolo pyrimidine derivatives and their use as MALT1 inhibitors (2019).
97. Feng, S. et al. Tricyclic compounds useful in the treatment of cancer, autoimmune and inflammatory disorders (2023).
98. Heiser, U. & Sommer, R. Inhibitors of glutaminyl cyclase (2020).
99. Cheng, X., Liu, Y., Qin, L., Ren, F. & Wu, J. Beta-lactam derivatives for the treatment of diseases (2023).
100. Wylie, A. A. et al. Therapeutic combinations comprising ubiquitin-specific-processing protease 1 (USP1) inhibitors and poly
(ADP-ribose) polymerase (PARP) inhibitors (2021).
101. Wu, J., Qin, L. & Liu, J. Small molecule inhibitors of ubiquitin specific protease 1 (USP1) and uses thereof (2023).
102. John, S. E. S. & Mesecar, A. D. Broad-spectrum non-covalent coronavirus protease inhibitors (2017).
103. Zavoronkovs, A., Ivanenkov, Y. A. & Zagribelnyy, B. SARS-CoV-2 inhibitors having covalent modifications for treating coronavirus
infections (2021).
Acknowledgements
See Supplementary section S2.
Author contributions
All authors have contributed to the publication, being variously involved in technology development,
experimental protocol designs, experimental performance, data acquisition, statistical analysis, and manuscript
writing.
Competing interests
The authors affiliated with Atomwise declare the existence of a financial competing interest.
Additional information
Supplementary Information The online version contains supplementary material available at
https://doi.org/10.1038/s41598-024-54655-z.
Correspondence and requests for materials should be addressed to
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and
indicate if changes were made. The images or other third party material in this article are included in the article’s
Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included
in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy
of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Carlotta Bon91, Carly J. Chapman92, Carrie L. Partch93, Catherine T. Chaton94, Chang Huang65,
Chao‑Yie Yang95, Charlene M. Kahler38, Charles Karan27, Charles Keller96, Chelsea L. Dieck97,
Chen Huimei70, Chen Liu98, Cheryl Peltier77, Chinmay Kumar Mantri70,
Chinyere Maat Kemet55, Christa E. Müller99, Christian Weber100, Christina M. Zeina59,
Christine S. Muli101, Christophe Morisseau37, Cigdem Alkan33, Clara Reglero19, Cody A. Loy101,
Cornelia M. Wilson102, Courtney Myhr31, Cristina Arrigoni48, Cristina Paulino39,
César Santiago103, Dahai Luo22, Damon J. Tumes104, Daniel A. Keedy105, Daniel A. Lawrence57,
Daniel Chen106, Danny Manor71, Darci J. Trader101, David A. Hildeman52, David H. Drewry107,
David J. Dowling108, David J. Hosfield86, David M. Smith109, David Moreira110,
David P. Siderovski111, David Shum112, David T. Krist113, David W. H. Riches78,
Davide Maria Ferraris114, Deborah H. Anderson115, Deirdre R. Coombe116, Derek S. Welsbie35,
Di Hu71, Diana Ortiz117, Dina Alramadhani118, Dingqiang Zhang119, Dipayan Chaudhuri82,
Dirk J. Slotboom39, Donald R. Ronning120, Donghan Lee121, Dorian Dirksen122,
Douglas A. Shoue123, Douglas William Zochodne124, Durga Krishnamurthy52,
Dustin Duncan125, Dylan M. Glubb92, Edoardo Luigi Maria Gelardi126, Edward C. Hsiao127,
Edward G. Lynn128, Elany Barbosa Silva129, Elena Aguilera130, Elena Lenci50,
Elena Theres Abraham131, Eleonora Lama62, Eleonora Mameli45, Elisa Leung125, Ellie Giles102,
Emily M. Christensen132, Emily R. Mason133, Enrico Petretto70, Ephraim F. Trakhtenberg134,
Eric J. Rubin18, Erick Strauss135, Erik W. Thompson25, Erika Cione136, Erika Mathes Lisabeth137,
Erkang Fan138, Erna Geessien Kroon76, Eunji Jo112, Eva M. García‑Cuesta103,
Evgenia Glukhov35, Evripidis Gavathiotis21, Fang Yu139, Fei Xiang140, Fenfei Leng141,
Feng Wang142, Filippo Ingoglia82, Focco van den Akker71, Francesco Borriello143,
Franco J. Vizeacoumar144, Frank Luh145, Frederick S. Buckner138, Frederick S. Vizeacoumar53,
Fredj Ben Bdira146, Fredrik Svensson73, G. Marcela Rodriguez147, Gabriella Bognár81,
Gaia Lembo148, Gang Zhang149, Garrett Dempsey51, Gary Eitzen150, Gaétan Mayer151,
Geoffrey L. Greene86, George A. Garcia57, Gergely L. Lukacs152, Gergely Prikler81,
Gian Carlo G. Parico93, Gianni Colotti47, Gilles De Keulenaer153, Gino Cortopassi37,
Giovanni Roti60, Giulia Girolimetti62, Giuseppe Fiermonte154, Giuseppe Gasparre155,
Giuseppe Leuzzi19, Gopal Dahal156, Gracjan Michlewski157,158, Graeme L. Conn159,
Grant David Stuchbury85, Gregory R. Bowman160, Grzegorz Maria Popowicz161, Guido Veit152,
Guilherme Eduardo de Souza20, Gustav Akk162, Guy Caljon43, Guzmán Alvarez163,
Gwennan Rucinski164, Gyeongeun Lee112, Gökhan Cildir165, Hai Li27, Hairol E. Breton166,
Hamed Jafar‑Nejad167, Han Zhou168, Hannah P. Moore169, Hannah Tilford164, Haynes Yuan170,
Heesung Shim37, Heike Wulff37, Heinrich Hoppe75, Helena Chaytow45, Heng‑Keat Tam171,
Holly Van Remmen172, Hongyang Xu173, Hosana Maria Debonsi174, Howard B. Lieberman27,
Hoyoung Jung175, Hua‑Ying Fan176, Hui Feng55, Hui Zhou19, Hyeong Jun Kim177,
Iain R. Greig178, Ileana Caliandro179, Ileana Corvo180, Imanol Arozarena181,
Imran N. Mungrue182, Ingrid M. Verhamme183, Insaf Ahmed Qureshi184, Irina Lotsaris185,
Isin Cakir57, J. Jefferson P. Perry194, Jacek Kwiatkowski85, Jacob Boorman71, Jacob Ferreira187,
Jacob Fries188, Jadel Müller Kratz79, Jaden Miner82, Jair L. Siqueira‑Neto35,
James G. Granneman189, James Ng164, James Shorter160, Jan Hendrik Voss99,
Jan M. Gebauer131, Janelle Chuah109, Jarrod J. Mousa190, Jason T. Maynes191, Jay D. Evans192,
Jeffrey Dickhout193, Jeffrey P. MacKeigan137, Jennifer N. Jossart194, Jia Zhou33, Jiabei Lin160,
Jiake Xu195, Jianghai Wang145, Jiaqi Zhu196, Jiayu Liao194, Jingyi Xu194, Jinshi Zhao197,
Jiusheng Lin198, Jiyoun Lee199, Joana Reis48, Joerg Stetefeld77, John B. Bruning200,
John Burt Bruning80, John G. Coles201, John J. Tanner166, John M. Pascal29, Jonathan So59,
Jordan L. Pederick80, Jose A. Costoya110, Joseph B. Rayman19, Joseph J. Maciag52,
Joshua Alexander Nasburg37, Joshua J. Gruber202, Joshua M. Finkelstein55, Joshua Watkins164,
José Miguel Rodríguez‑Frade203, Juan Antonio Sanchez Arias204, Juan José Lasarte205,
Julen Oyarzabal204, Julian Milosavljevic88, Julie Cools153, Julien Lescar22,
Julijus Bogomolovas35, Jun Wang147, Jung‑Min Kee175, Jung‑Min Kee177, Junzhuo Liao206,
Jyothi C. Sistla118, Jônatas Santos Abrahão76, Kamakshi Sishtla207, Karol R. Francisco35,
Kasper B. Hansen208, Kathleen A. Molyneaux71, Kathryn A. Cunningham33, Katie R. Martin137,
Kavita Gadar209, Kayode K. Ojo138, Keith S. Wong125, Kelly L. Wentworth127, Kent Lai82,
Kevin A. Lobb75, Kevin M. Hopkins27, Keykavous Parang210, Khaled Machaca211, Kien Pham98,
Kim Ghilarducci212, Kim S. Sugamori125, Kirk James McManus77, Kirsikka Musta64,
Kiterie M. E. Faller45, Kiyo Nagamori96, Konrad J. Mostert135, Konstantin V. Korotkov94,
111University of North Texas Health Science Center at Fort Worth, Fort Worth, USA. 112Institut Pasteur Korea,
Seongnam, South Korea. 113Carle Illinois College of Medicine, Urbana, USA. 114Università del Piemonte Orientale,
Vercelli, Italy. 115Saskatchewan Cancer Agency, Saskatoon, Canada. 116Curtin University, Bentley, Australia.
117Oregon Health and Science University, Portland, USA. 118Virginia Commonwealth University, Richmond, USA.
119Tufts University, Medford, USA. 120University of Nebraska Medical Center, Omaha, USA. 121University of
Louisville, Louisville, USA. 122Dana Farber Cancer Institute, Boston, USA. 123University of Notre Dame, Notre
Dame, USA. 124University of Alberta, Edmonton, Canada. 125University of Toronto, Toronto, Canada. 126University
of Piemonte Orientale, Vercelli, Italy. 127University of California, San Francisco, San Francisco, USA. 128St. Joseph’s
Healthcare Hamilton, and Hamilton Center for Kidney Research, McMaster University, Hamilton, Canada.
129Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, USA.
130Universidad de La República, Montevideo, Uruguay. 131University of Cologne, Cologne, Germany. 132Johnson
University, Knoxville, USA. 133Indiana University, Bloomington, USA. 134School of Medicine, University of
Connecticut, Farmington, USA. 135Stellenbosch University, Stellenbosch, South Africa. 136University of Calabria,
Arcavacata, Italy. 137Michigan State University, East Lansing, USA. 138University of Washington, Washington, USA.
139Weill Cornell Medicine-Qatar, Ar‑Rayyan, Qatar. 140Gachon University, Seongnam, South Korea. 141Florida
International University, Miami, USA. 142California Institute of Technology, Pasadena, USA. 143Boston Children’s
Hospital, Boston, USA. 144Saskatchewan Cancer Agency and University of Saskatchewan, Saskatchewan, Canada.
145Sino-American Cancer Foundation, Covina, USA. 146Leiden University, Leiden, The Netherlands. 147Rutgers
University, Newark, USA. 148Core Research Laboratory, ISPRO, Florence, Italy. 149Caltech, Pasadena, USA.
150University of Alberta, Edmonton, Canada. 151Montreal Heart Institute and Université de Montréal, Montreal,
Canada. 152McGill University, Montreal, Canada. 153Antwerp University, Antwerp, Belgium. 154University of Bari
Aldo Moro, Bari, Italy. 155Alma Mater Studiorum-University of Bologna, Bologna, Italy. 156University of Toledo,
Toledo, USA. 157International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland. 158Infection
Medicine, University of Edinburgh The Chancellor’s Building, Edinburgh, UK. 159Emory University, Atlanta, USA.
160University of Pennsylvania, Philadelphia, USA. 161Helmholtz Zentrum München, Munich, Germany.
162Washington University School of Medicine, St. Louis, USA. 163CENUR Litoral Norte, Universidad de La República,
Montevideo, Uruguay. 164University of Southampton, Southampton, UK. 165Centre for Cancer Biology, University
of South Australia, Adelaide, Australia. 166University of Missouri, Columbia, USA. 167Baylor College of Medicine,
Houston, USA. 168Yale University, New Haven, USA. 169Reno School of Medicine, University of Nevada, Reno, USA.
170University of Manitoba and CancerCare Manitoba, Winnipeg, Canada. 171Goethe University Frankfurt, Frankfurt,
Germany. 172Oklahoma Medical Research Foundation/Oklahoma City VA Medical Center, Oklahoma City, USA.
173Oklahoma Medical Research Foundation, Oklahoma City, USA. 174Department of Biomolecular Sciences, School
of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil. 175Ulsan National
Institute of Science and Technology, Ulsan, South Korea. 176University of New Mexico Comprehensive Cancer
Center, Albuquerque, USA. 177Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea.
178University of Aberdeen, Aberdeen, UK. 179University of Turin, Turin, Italy. 180Universidad de La República, CenUR
LN, Montevideo, Uruguay. 181Navarrabiomed-IdiSNA, Pamplona, Spain. 182Independent, Los Angeles, USA.
183Vanderbilt University Medical Center, Nashville, USA. 184University of Hyderabad, Hyderabad, India.
185University of Sydney, Sydney, Australia. 186City of Hope Medical Center, Duarte, USA. 187Weill Cornell Medicine,
New York, NY 10065, USA. 188University of Toledo College of Medicine and Life Sciences, Toledo, USA. 189School of
Medicine, Wayne State University, Detroit, USA. 190University of Georgia, Athens, USA. 191The Hospital for Sick
Children, Toronto, Canada. 192United States Department of Agriculture, Agricultural Research Service (USDA-
ARS), Washington, DC, USA. 193McMaster University, Hamilton, Canada. 194University of California, Riverside,
Riverside, USA. 195The University of Western Australia, Perth, Australia. 196The University of Connecticut, Storrs,
USA. 197Duke University School of Medicine, Durham, USA. 198University of Nebraska-Lincoln, Lincoln, USA.
199Sungshin University, Seoul, South Korea. 200University of Adelaide, Adelaide, Australia. 201University of Toronto,
Toronto, Canada. 202University of Texas Southwestern Medical Center, Dallas, USA. 203Centro Nacional de
Biotecnologia/CSIC, Madrid, Spain. 204Centro de Investigación Médica Aplicada, Pamplona, Spain. 205Centro de
Investigación Médica Aplicada, Universidad de Navarra, Pamplona, Spain. 206University of Wisconsin-Madison,
Madison, USA. 207Indiana University School of Medicine, Indianapolis, USA. 208University of Montana, Missoula,
USA. 209Brunel University London, London, UK. 210Chapman University, Orange, USA. 211Weill Cornell Medicine
Qatar, Ar‑Rayyan, Qatar. 212Université du Québec À Montréal, Montreal, Canada. 213National Taiwan University,
Taipei, Taiwan. 214Rhodes College, Memphis, USA. 215Harvard School of Public Health, Boston, USA. 216University
of Central Florida, Orlando, USA. 217University of Rochester, Rochester, USA. 218George Mason University, Fairfax,
USA. 219University of Oulu, Oulu, Finland. 220Instituto Investigación Sanitaria La Fe, Valencia, Spain. 221Ludwig-
Maximilians-University, Munich, Germany. 222Brandeis University, Waltham, USA. 223Universidad de La República,
CENUR Litoral Norte, Montevideo, Uruguay. 224Arabian Gulf University, Manama, Bahrain. 225University of
Cambridge, Cambridge, UK. 226University of Magna Graecia, Catanzaro, Italy. 227Massachusetts General Hospital,
Boston, USA. 228University of Missouri-Columbia, Columbia, USA. 229Marquette University, Milwaukee, USA.
230University of North Dakota, Grand Forks, USA. 231Simon Fraser University, Burnaby, Canada. 232CancerCare
Manitoba Research Institute (CCMR), Winnipeg, Canada. 233The University of Queensland, Brisbane, Australia.
234University of Manitoba and CancerCare Manitoba Research Institute, Winnipeg, Canada. 235Sanford Burnham
Prebys, La Jolla, USA. 236University of Amsterdam, Amsterdam, The Netherlands. 237UConn Health, Farmington,
USA. 238The University of Texas MD Anderson Cancer Center, Houston, USA. 239Walter Sisulu University, Mthatha,
South Africa. 240Virginia Tech, Blacksburg, USA. 241University of Houston, Houston, USA. 242Rutgers University,
New Brunswick, USA. 243University of Nevada, Reno, USA. 244Universidad de León, León, Spain. 245School of
Medicine, Case Western Reserve University, Cleveland, USA. 246School of Medicine, UConn Health, Farmington,
USA. 247University of Rochester Medical Center, Rochester, USA. 248Chapman University School of Pharmacy,
Irvine, USA. 249University of Winnipeg/St. Boniface Research Centre, Winnipeg, Canada. 250Fred Hutchinson Cancer
Center, Seattle, USA. 251Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany.
252Technical University of Denmark, Kongens Lyngby, Denmark. 253The City College of New York, New York, USA.
254Children’s Cancer, Therapy Development Institute (Cc-TDI), Beaverton, USA. 255Soka University, Hachioji, Japan.
256Institute of Clinical Physiology, National Research Council, Pisa, Italy. 257Charles Sturt University, Bathurst,
Australia. 258Washington University, St Louis, USA. 259University of Dundee, Dundee, UK. 260Montreal Heart
Institute, Montreal, Canada. 261Columbia University Vagelos College of Physicians and Surgeons, Columbia, USA.
262Georgetown University, Washington, USA. 263Duke University, Durham, USA. 264Center for Applied Medical
Research, University of Navarra, Pamplona, Spain. 265King’s College London, London, UK. 266Precision Vaccines
Program, Division of Infectious Diseases, Boston Children’s Hospital, Boston, USA. 267Fred Hutchinson Cancer
Research Center, Seattle, USA. 268Louisiana State University, Baton Rouge, USA. 269Massey University, Palmerston
North, New Zealand. 270Wake Forest University School of Medicine, Winston‑Salem, USA. 271Central South
University, Changsha, China. 272SUNY Upstate Medical University, Syracuse, USA. 273University of Oxford, Oxford,
UK. 274Goethe-University, Frankfurt, Frankfurt, Germany. 275Institute for Cardiovascular Prevention (IPEK),
Ludwig-Maximilians-Universität München, Munich, Germany. 276Harvard T.H. Chan School of Public Health,
Boston, USA. 277School of Medicine, Boston University, Boston, USA. 278University of Texas Health Science Center
at San Antonio, San Antonio, USA. 279University of Helsinki, Helsinki, Finland. 280Wadsworth Center, NYSDOH,
Albany, USA. 281The University of Sydney, Sydney, Australia.