www.nature.com/scientificreports

AI is a viable alternative to high throughput screening: a 318-target study

The Atomwise AIMS Program 1**

High throughput screening (HTS) is routinely used to identify bioactive small molecules. This requires
physical compounds, which limits coverage of accessible chemical space. Computational approaches
combined with vast on-demand chemical libraries can access far greater chemical space, provided that
the predictive accuracy is sufficient to identify useful molecules. Through the largest and most diverse
virtual HTS campaign reported to date, comprising 318 individual projects, we demonstrate that our
AtomNet® convolutional neural network successfully finds novel hits across every major therapeutic
area and protein class. We address historical limitations of computational screening by demonstrating
success for target proteins without known binders, high-quality X-ray crystal structures, or manual
cherry-picking of compounds. We show that the molecules selected by the AtomNet® model are novel
drug-like scaffolds rather than minor modifications to known bioactive compounds. Our empirical
results suggest that computational methods can substantially replace HTS as the first step of small-
molecule drug discovery.

Despite present interest in AI/ML and thirty years of case studies1–4, computational screening techniques
have achieved limited adoption within the pharmaceutical industry. A recent investigation into the origins
of 156 clinical candidates5 found that only 1% came from virtual screening; in contrast, over 90% of clinical
candidates were derived from patent busting or high throughput screening (HTS). Unfortunately, these sources
are increasingly challenged, given the pharmaceutical industry’s shift to novel target classes, such as
proximity-induced protein degradation6, protein–protein interactions7, and RNA targeting8.
Currently, HTS is the critical tool in drug discovery, providing most novel scaffolds of recent clinical
candidates5,9,10. These initial starting points crucially shape the course of downstream medicinal chemistry efforts,
as most drugs preserve at least 80% of the scaffold of the initially identified lead11. Despite these foundational
contributions, HTS suffers from practical limitations. Principally, HTS, like all physical experiments, requires that
the compounds exist. However, with the advent of synthesis-on-demand libraries, most commercially-available
molecules have yet to be synthesized. Still, they can be made and delivered for testing in a matter of weeks12–14.
These libraries comprise trillions of molecules14,15 that exemplify millions of otherwise-unavailable scaffolds12,
providing an opportunity to substantially expand the scope and diversity of available chemical space explored
in the standard drug discovery process.
Computational approaches unlock this opportunity by reversing the requirement to make molecules before
testing them. When computational experiments replace HTS as the primary screen, molecules are tested before
they are made, and the results from these experiments can inform which molecules are worth synthesizing.
Computational experiments further promise to improve upon HTS in cost and speed, to remove the need to produce
significant quantities of protein16 and the effort of miniaturizing assay formats while maintaining experimental
integrity17–19, and to reduce false-positive and false-negative rates16,20–23, including artifacts from aggregation,
covalent modification of the target, autofluorescence, or interactions with the reporter rather than the target20,24,25.
Historical computational techniques such as ligand-based QSAR26–28, structure-based docking29,30, and machine
learning31,32 purport to address these limitations of physical screening methods. Unfortunately, these techniques
have not replaced HTS; in fact, despite increasing interest in ML, the proportion of drugs discovered with
computational techniques has remained steady over the past decades5,10.
Because there will always be individual targets for which one screening technique can identify more hits than
another, the key question in deciding whether computation is ready to be the default hit discovery technique is whether
computational screens can identify hits successfully across a broad range of diverse targets. Unfortunately, despite
excellent benchmark accuracies33–35, prospective discovery accuracy remains modest33,36,37. For example,
Cerón-Carrasco38 reported over 700 virtual screens against the SARS-CoV-2 main protease. However, when the author

*A list of authors and their affiliations appears at the end of the paper. *email: [email protected]

Scientific Reports | (2024) 14:7526 | https://doi.org/10.1038/s41598-024-54655-z


sought to validate the computational predictions via physical experiments, the identified compounds were barely
active (800 µM). Computational approaches have also been limited by a need for extensive target-specific training
data31,39–41, a requirement for high-quality X-ray crystal structures42,43, dependence on human adjudication
(so-called ‘cherry-picking’)12, or a limited domain of applicability44–48. Even recent systems have demonstrated
utility only in identifying minor variants of known molecules for well-studied proteins with tens of thousands
of known binders in their training data49,50. Figure 1 exemplifies the striking similarities between recently
ML-developed compounds and their preceding published chemical matter. This is particularly concerning, as
a myopic focus on well-studied proteins has been identified as a cause of low productivity in pharmaceutical
discovery51.
Nevertheless, we have observed that deep learning approaches are not as limited as these historical examples
would imply. Using our AtomNet52–54 screening system, we have previously reported success in finding novel
scaffolds for targets without known ligands55–57, X-ray crystal structures56–60, or both56,57, as well as challenging
modulation via protein–protein interaction59,61 or allosteric binding60 (see Supplementary Table S1 for examples).
However, individual examples do not demonstrate the overall success of such deep learning systems. We therefore
report our internal discovery efforts against 22 targets of pharmaceutical interest. We then attempted to further
assess the generalizability and robustness of deep learning predictive systems by identifying bioactive molecules
for a diverse set of targets. We partnered with 482 academic labs and screening centers, from 257 different
academic institutions across 30 countries, through our academic collaboration program, the Artificial Intelligence
Molecular Screen (AIMS). This collaboration afforded an opportunity to prospectively evaluate the utility of
the AtomNet model as a primary screen across a broad range of diverse, challenging, and realistic targets.
In aggregate, we report successes and failures from 318 prospective experiments and evaluate our AtomNet
machine-learning technology’s ability to serve as a viable alternative to physical HTS campaigns.

Results
We investigated the ability of deep learning-based methods to identify novel bioactive chemotypes by applying
the AtomNet model to identify hits for 22 internal targets of pharmaceutical interest. We also explored the
breadth of applicability of this approach by attempting to identify drug-like hits in single-dose screens for 296
academic targets, of which 49 were followed up with dose–response experiments, and 21 were further validated
by exploring analogs of the initial hits. The average hit rate for our internal projects (6.7%) was comparable to
the hit rate for our academic collaborations (7.6%).

Internal portfolio validation


As part of Atomwise’s internal drug discovery efforts, we used the AtomNet model instead of high-throughput or
DNA-encoded library (DEL) screening. We screened a 16-billion synthesis-on-demand chemical space62, which

Figure 1.  Pairs of representative compounds extracted from AI patents (right) and corresponding prior patents
(left) for clinical-stage programs: CDK7 (refs. 92,93), A2Ar-antagonist (refs. 94,95), MALT1 (refs. 96,97),
QPCTL (refs. 98,99), USP1 (refs. 100,101), and 3CLpro (refs. 102,103). The identical atoms between the chemical
structures are highlighted in red.


is several thousand times larger than HTS libraries and even exceeds the size of most DELs without suffering
limitations of DNA-compatible chemistry16,23. Each screen requires over 40,000 CPUs, 3,500 GPUs, 150 TB of
main memory, and 55 TB of data transfers. We describe the protocol in detail in the Methods section; briefly,
we computationally scored each catalog compound after removing molecules that were prone to interfere with
the assays or were too similar to known binders of the target or its homologs. The neural network analyzes and
scores the 3D coordinates of each generated protein–ligand co-complex, producing a list of ligands ranked by
their predicted binding probability. Our workflow then clusters the top-ranked molecules to ensure diversity and
algorithmically selects the highest-scoring exemplars from each cluster. At no point are compounds manually
cherry-picked. The molecules were synthesized at Enamine (https://enamine.net) and quality controlled by
LC–MS to purity > 90%, in agreement with HTS standards63. Hits were further validated using NMR. We then
physically tested, on average, 440 compounds per target at reputable contract research organizations (CROs),
while attempting to mitigate assay interferences such as aggregation and oxidation with standard additives
(e.g., Tween-20, Triton-X 100, and dithiothreitol (DTT)). We describe the assay protocols in detail in
Supplementary Data S1.
We describe the results of the 22 experiments in Table 1. In 91% of the experiments, we identified single-
dose (SD) hits that were reconfirmed in dose–response (DR) experiments. The average target DR hit rate was
6.7% compared to 8.8% from the SD screens. Only 16 of the 22 projects were structurally enabled with X-ray
crystallography; one used a cryo-EM structure, while five used homology models with an average sequence
identity of 42% to their template protein. The DR hit rate for the cryo-EM project was 10.56%, while the average
hit rate for the homology models was a similar 10.8%.
We then advanced 14 projects with at least one dose-responsive scaffold to a round of analog expansion. We
found new bioactive analogs in the SD screen for all projects, with an average hit rate of 29.8%. Further validation
with DR resulted in an average hit rate of 26% per project, which compares favorably with typical HTS hit rates
ranging from 0.151% to 0.001% (refs. 64,65). We note that the size and chemical diversity within and between physical66
and virtual14 HTS libraries prevent an explicit evaluation of the methods over the same chemical space. The
most potent analogs ranged from single-digit nanomolar, against a kinase, to double-digit micromolar, against a
transcription factor (Supplementary Table S2). Additionally, we present two internal studies in detail. For Large
Tumor Suppressor Kinase 1 (LATS1), we identified potent compounds despite the lack of a crystal structure or
known active compounds. For the ATP-driven chaperone Valosin Containing Protein (VCP), we identified novel
allosteric and orthosteric modulators.

Gene name   Compounds   SD hit     DR hit     Potency range     Analogs   SD analog      DR analog      Analog potency range
            tested      rate (%)   rate (%)   (IC50/Ki, µM)     tested    hit rate (%)   hit rate (%)   (IC50/Ki, µM)
ASAH1       376         10.64      7.71       0.3–102           –         –              –              –
AXL         597         12.06      8.21       0.181–71          3200      35.59          33.56          0.079–86
BCL2        422         3.08       0.00       –                 –         –              –              –
CBLB        422         1.66       0.00       –                 –         –              –              –
CDK5        786         10.69      10.43      0.049–79          587       47.53          43.61          0.43–76
CDK7        786         10.69      10.56      0.099–60          735       28.44          27.35          0.191–10
GFPT1       384         6.51       2.34       31–86             734       24.93          24.11          1–194
KCNT1       416         9.62       7.69       1.1–30            –         –              –              –
KDM6A       356         3.93       1.12       24–58             –         –              –              –
LATS1       418         18.18      17.94      0.077–82          841       51.72          45.78          0.034–98
MC2R        208         11.54      9.62       16–68             419       39.38          38.42          2.4–97
MDM4        422         2.37       0.47       5.9–29.8          192       18.23          18.23          4.4–90
NT5E        335         1.49       0.30       176               221       9.95           1.81           8.3–65
PARG        334         7.78       7.78       15–250            –         –              –              –
PARP14      576         5.38       2.95       3–96              616       26.46          26.30          0.2–95
POLQ        330         11.82      11.52      1.2–49            559       11.27          8.77           1.5–42
PPARA       422         4.03       0.24       131               211       14.22          3.79           59–95
PPM1D       530         11.89      6.98       4.5–98            –         –              –              –
PRMT5       422         4.03       0.95       7.2–79            415       7.95           5.54           19–114
PRODH2      542         2.77       1.11       15–84             –         –              –              –
TYK2        189         38.10      34.39      0.016–9           457       71.33          60.39          0.006–10
VCP         416         4.81       4.81       2.4–64            738       –              –              –

Table 1.  Results from 22 Atomwise internal programs. SD and DR denote single-dose and dose–response,
respectively.
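
As a quick consistency check on the averages quoted in the text, the single-dose and dose–response hit rates in Table 1 can be averaged directly. The sketch below only transcribes the two primary-screen hit-rate columns of Table 1; the variable names are illustrative and not part of the study's software.

```python
from statistics import mean

# (gene, SD hit rate %, DR hit rate %) transcribed from Table 1
table1 = [
    ("ASAH1", 10.64, 7.71), ("AXL", 12.06, 8.21), ("BCL2", 3.08, 0.00),
    ("CBLB", 1.66, 0.00), ("CDK5", 10.69, 10.43), ("CDK7", 10.69, 10.56),
    ("GFPT1", 6.51, 2.34), ("KCNT1", 9.62, 7.69), ("KDM6A", 3.93, 1.12),
    ("LATS1", 18.18, 17.94), ("MC2R", 11.54, 9.62), ("MDM4", 2.37, 0.47),
    ("NT5E", 1.49, 0.30), ("PARG", 7.78, 7.78), ("PARP14", 5.38, 2.95),
    ("POLQ", 11.82, 11.52), ("PPARA", 4.03, 0.24), ("PPM1D", 11.89, 6.98),
    ("PRMT5", 4.03, 0.95), ("PRODH2", 2.77, 1.11), ("TYK2", 38.10, 34.39),
    ("VCP", 4.81, 4.81),
]

# Unweighted per-target averages, as described in the text
sd_mean = mean(sd for _, sd, _ in table1)
dr_mean = mean(dr for _, _, dr in table1)

# Projects in which at least one SD hit was reconfirmed in dose-response
confirmed = sum(1 for _, _, dr in table1 if dr > 0)

print(f"mean SD hit rate: {sd_mean:.1f}%")
print(f"mean DR hit rate: {dr_mean:.1f}%")
print(f"projects with DR-confirmed hits: {confirmed}/{len(table1)}")
```

The unweighted means round to the 8.8% (SD) and 6.7% (DR) quoted above, and 20 of the 22 projects (91%) have a non-zero DR hit rate.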


Academic validation
In addition to our internal discovery efforts, we performed virtual screens for 296 targets, comprising more than
20 billion individual neural network scores of generated protein–ligand co-complexes. We purchased, on average,
85 off-the-shelf commercially available compounds, quality controlled by NMR and LC–MS to > 90% purity63,
and plated in a single 96-well plate. The compounds were then physically screened for activity against the target
of interest in single-dose assays (see Supplementary Data S1 for assay protocols). As with HTS primary screens,
additional characterization studies are required to validate the initially identified hits; in 49 projects, we
therefore performed dose–response studies and analog expansion. We present a summary of our results in Supplementary
Table S3.
Figure 2 illustrates the distributions of projects across therapeutic areas, protein families, and assay types.
Every major therapeutic area is represented, with the most frequent area being oncology, comprising 35%
of projects, followed by infectious diseases and neurology, comprising 27% and 9% of projects, respectively.
Breaking down the projects by protein families reveals that all major enzyme classes are represented, with
enzymes comprising 59% of the targets and membrane proteins, such as GPCRs, transporters, and ion channels,
representing 12% of the targets. Working on a large and diverse set of therapeutic targets requires a heterogeneous
collection of biological assays; 20% of the assays measured direct binding, whereas 56% and 20% were functional
and phenotypic, respectively.
In 215 projects, we identified at least one bioactive compound for the target in a biochemical or cell-based
assay. This 73% success rate substantially improves over the ∼50% success rate for HTS21,67. On average, we
screened 85 compounds per project and discovered 4.6 active hits, with an average hit rate of 5.5%. For the subset
of targets where we found any hits, the average was 6.4 hits per project. Thus, for this subset, we achieved an
average hit rate of 7.6%, which again compares favorably with typical HTS hit rates. See Supplementary Material S1 for all assay
definitions and conditions. Supplementary Table S4 shows a representative bioactive compound from each of the
215 successful projects, and Supplementary Fig. S2 shows that the physicochemical properties of the identified
hits are largely druglike and Lipinski-compliant.
The AtomNet technology robustly identified active molecules, even for targets that lacked prior on-target
bioactivity data. This ability to identify hits for previously undrugged targets is critical if machine learning-based
approaches are to replace HTS as the default primary screening approach. For 207 out of the 296 targets (70%),
the training data available for AtomNet models lacked a single active molecule for that target or any closely
related protein (i.e., proteins with sequence identity greater than 70%). We interpret this as evidence of the ability
of properly-architected machine learning systems to extrapolate to novel biological space. Figure 3A illustrates

Figure 2.  The distributions of 296 AIMS projects across assay types used in the primary screen, research areas,
target classes, and further breakdown to enzyme classes when applicable.


Figure 3.  (A) An illustration of the hit rate versus the number of training examples available to our model.
Each point represents a project, with the x-axis denoting the number of active molecules in our training data for
the target protein or its homologs and the y-axis denoting the hit rate of the project (the percentage of molecules
tested in the project that were active). The model shows no dependence on the availability of on-target training
examples. For 70% of the targets, the AtomNet model training data lacked any active molecules for that target or
any similar targets with greater than 70% sequence identity, yet the model achieved a hit rate of 5.3% compared
to 6.1% when on-target data was available. (B) The distribution of similarities between hits and their most-
similar bioactive compounds in our training data. Our screening protocol ensures that the compounds subjected
to physical testing are not similar to known active compounds or close homologs (< 0.5 Tanimoto similarity
using ECFP4, 1024 bits). Because 70% of the AIMS targets had no annotated bioactivities in our training
dataset, hits identified in these projects have a similarity value of zero.

the hit rate versus the number of training examples available to our model. Although previous computational
approaches typically require thousands of on-target training examples31,39,42, the lack of correlation between
training examples and hit rate (R² = 0.0021, p-value = 0.43) shows that our ML algorithm is agnostic to the
availability of such data. We achieved an average success rate of 75% and hit rates of 5.3% when no training data
was available, comparable to the 67% and 6.1% success and hit rates achieved when binding data was available
in the training set. Interestingly, we also do not see a significant increase in hit rate attributable to the proportion
of binding data available for a target (R² = 0.008, p-value = 0.39). This reflects the robustness of the screening
protocol and the chemical dissimilarity of scaffolds identified by AtomNet models to previously known bioactive
compounds.
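
For readers who want to reproduce this kind of check on their own screening data, the coefficient of determination of a one-variable linear fit is the squared Pearson correlation. The following is a minimal sketch using only the Python standard library; the function name and the example numbers are hypothetical and are not the study's per-project values.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a one-variable least-squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical illustration only: NOT the per-project data from this study.
n_training_actives = [0, 0, 12, 150, 900, 4000]
hit_rate_percent = [5.0, 7.1, 3.5, 6.2, 4.8, 5.9]

print(f"R^2 = {r_squared(n_training_actives, hit_rate_percent):.3f}")
```

An R² close to zero, as reported above, means that the number of on-target training examples explains almost none of the project-to-project variation in hit rate.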
Next, we assessed the ability of the AtomNet models to identify novel scaffolds. This is a critical capability
for primary screens, as follow-up assays tend to work within the chemical space uncovered in the initial screen.
The task of novel scaffold identification appears in two distinct scenarios: (1) when no scaffold is known for the
target and we wish to identify the first scaffold, and (2) when some scaffolds are known but we wish to identify
dissimilar scaffolds because novel chemical matter can yield improved selectivity, toxicity, pharmacokinetics,
or patentability. Performance of AtomNet models for the first scenario, when no scaffolds for the target existed
in the AtomNet model training data, was evaluated on 70% of the targets, where the training data contained no
active molecules for the target or its homologs (vide supra). We achieved an average hit rate of 5.3% for targets
with no training data. For the second scenario, we analyzed the similarity of the identified hits to known bioactive
compounds in our training data (Fig. 3B). Our screening protocol ensures that the compounds subjected to
physical testing are not similar to known active compounds or close homologs (< 0.5 Tanimoto similarity using
ECFP4 (ref. 68), 1024 bits). We interpret this as evidence of the ability of properly-architected machine learning systems
to extrapolate to novel chemical space as well. For cases where training data was available (i.e., the Tanimoto
similarity is above zero), the similarity distribution is close to the one expected by random compound pairs69.
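
The similarity filter described above can be sketched as follows. This is an illustration under stated assumptions, not the AtomNet implementation: fingerprints are represented as plain Python sets of on-bit indices (standing in for 1024-bit ECFP4 fingerprints), the function names are hypothetical, and the cutoff follows the Methods wording of removing molecules with greater than 0.5 similarity.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def max_similarity(candidate: frozenset, known_actives: list) -> float:
    """Similarity to the most similar known active (0.0 if none are known)."""
    return max((tanimoto(candidate, k) for k in known_actives), default=0.0)


def novelty_filter(candidates: dict, known_actives: list, cutoff: float = 0.5) -> dict:
    """Drop candidates whose nearest known active exceeds the similarity cutoff."""
    return {
        name: fp
        for name, fp in candidates.items()
        if max_similarity(fp, known_actives) <= cutoff
    }


# Toy fingerprints (assumed data, for illustration only)
known = [frozenset({1, 2, 3, 4})]
candidates = {
    "near_analog": frozenset({1, 2, 3, 5}),   # 3 shared bits of 5 in the union
    "novel_scaffold": frozenset({1, 7, 8, 9}),  # 1 shared bit of 7 in the union
}
kept = novelty_filter(candidates, known)
print(sorted(kept))
```

When no active molecules are known for a target, `max_similarity` returns zero, which mirrors the similarity value of zero assigned to hits from such projects in Fig. 3B.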
The novelty of the small-molecule structures is striking because target-specific machine-learning algorithms
tend to uncover highly similar analogs for known bioactive molecules50,70,71. The superior performance of the
AtomNet model is expected, considering the bias-variance tradeoff72 in machine learning algorithms. Because
the AtomNet convolutional neural network is a global model, concurrently trained on millions of bioactivities,
hundreds of thousands of small molecules, and thousands of protein binding sites, it can reduce both bias and
variance of the model compared to target-specific ones33. Specifically, our global model can benefit from multiple
levels of information captured in the structures of the small molecules, the sequences of the target proteins, and
the three-dimensional interactions between the two.
AtomNet also successfully identified active molecules when there was no X-ray crystal structure of the
receptor. Figure 4A compares the hit rates obtained with 3-dimensional crystal structures, cryo-EM, and
homology modeling. We did not attempt to select targets based on the similarity to the template but rather
used the best template available. We observe no substantial difference in success rate between the three, in


Figure 4.  Hit rates obtained for the 296 AIMS projects. (A) A comparison of hit rates using X-ray
crystallography, NMR, Cryo-EM, and homology for modeling the structure of the proteins. Each point
represents a project with the x-axis denoting the hit rate of the project (the percentage of molecules tested
in the project that were active). The number of projects of each type is given in parentheses. We observed
no substantial difference in success rate between the physical and the computationally inferred models. We
achieved average hit rates of 5.6%, 5.5%, and 5.1% for crystal structures, cryo-EM, and homology modeling,
respectively. The number of projects using NMR structures is too small to make statistically-robust claims. (B) A
comparison of hit rates observed for traditionally challenging target classes such as protein–protein interactions
(PPI) and allosteric binding. Of the 296 projects, 72 targeted PPIs and 58 allosteric binding sites. The average
hit rates were 6.4% and 5.8% for PPIs and allosteric binding, respectively. (C) Comparison of hit rates observed
for different target classes and (D) enzyme classes. No protein or enzyme class falls outside the domain of
applicability of the algorithm.

contrast to the common challenges in using homology models or low-precision structures for structure-based
discovery42,43,73. We achieved average hit rates of 5.6%, 5.5%, and 5.1% for crystal structures, cryo-EM, and
homology modeling, respectively. We also successfully identified active compounds in projects with NMR structures, but the
number of such targets is too small to make statistically-robust claims.
An interesting demonstration of the robustness of the AtomNet model to low data and poorly characterized
protein structure is its ability to identify novel hits for traditionally challenging target classes such as
protein–protein interaction (PPI) sites and allosteric binding sites (Fig. 4B). Of the 296 projects, 72 targeted
PPIs and 58 allosteric binding sites. We identified hits for 53 (74%) PPI sites and 46 (79%) allosteric sites, with
13 projects representing allosteric sites at PPI interfaces. The average hit rate was 6.4% and 5.8% for PPIs and
allosteric binding sites, respectively. The algorithm’s success in these target classes, which often suffer from poorly
characterized binding sites and a lack of bioactivity training data, is not surprising because Fig. 3A shows that
our model is largely not dependent on the availability of on-target training data.
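
The project-level percentages quoted in this section follow directly from the reported counts. The short check below recomputes them; it uses only counts stated in the text, and the labels and helper name are ours.

```python
# (description, successes, total, percentage quoted in the text)
reported = [
    ("AIMS projects with at least one hit", 215, 296, 73),
    ("targets without on-target training data", 207, 296, 70),
    ("PPI projects with hits", 53, 72, 74),
    ("allosteric-site projects with hits", 46, 58, 79),
]


def percent(successes: int, total: int) -> int:
    """Return successes/total as a percentage rounded to the nearest integer."""
    return round(100 * successes / total)


for label, k, n, quoted in reported:
    print(f"{label}: {k}/{n} = {percent(k, n)}% (text: {quoted}%)")
```

Each recomputed value matches the rounded percentage given in the text.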
Finally, we investigated whether the algorithm exhibits domain of applicability limitations regarding different
protein classes. Figures 4C and 4D illustrate the hit rate observed for each protein and enzyme class. No protein
or enzyme class falls outside the domain of applicability of the algorithm, demonstrating that machine learning-
based approaches are well-suited as a default technology for new scaffold identification. The hit rate for nuclear
receptors is an outlier, with seemingly better accuracy than other classes, but a single data point is not statistically
meaningful.

Dose–response validation studies


We performed additional validation studies for 49 AIMS projects with at least one reported hit. The objective
of the validation studies was to establish dose–response (DR) relationships for the single-dose (SD) hits. We
describe the protocol of the DR experiments in the Methods section. Briefly, we performed dose–response
measurements for the reported hits from the single-dose primary screens. DR was determined using the same
assay and screening protocol as the single-dose screens, at the same lab, and with the same personnel. Full
dose–response curves were obtained in most cases; however, in some instances a full curve was not obtained, or


concentration-dependent activity was qualitatively determined by testing at concentrations other than that of
the primary screen. The distribution of assay types and target classes for the projects selected for DR validation
was also similar to that of the AIMS projects (Supplementary Fig. S3).
We describe the results of the DR experiments in Supplementary Table S5. In 84% of the experiments, we
validated at least one SD hit and obtained a DR readout. The median activity for the total of 144 DR measurements
was 15.4 µM (which compares favorably with HTS25,74), of which 13% showed sub-µM potency. Overall, we
achieved an average of 2.8 hits per validation study, resulting in a hit rate of 51%. The false positive rate of 49%
observed in these experiments compares favorably with that of HTS, which can be as high as 95% (refs. 20,75). This difference
in false positive rates may stem from the comparative ease and robustness of the low-throughput assay format
we employed versus high-throughput assays. Representative dose–response curves for each of the 49 projects are
shown in Supplementary Table S6.

Analog validation studies


For a subset of 21 projects, we further validated hits with DR activity by testing analogs of the active compounds.
In those cases, we used the AtomNet platform to search a purchasable space for additional bioactive compounds
chemically analogous to the SD hits. We selected up to 35 additional compounds for testing, including the active
compounds from the SD screens.
We describe the results of the analoging experiments in Supplementary Table S7. We identified additional
analogs with DR readouts for 16 projects (76%). The median DR activity of the 154 validated analogs was 7.4 µM
compared to the median of 15.4 µM of the parent compound (Supplementary Fig. S4).

Methods
Screening protocols
AIMS screening protocol
We began by evaluating screening libraries of millions of catalog compounds from commercial vendors Mcule
(10 M)76 and Enamine in-stock (2.5 M)77. We then selected a drug-like subset via algorithmic filtering by applying
Eli Lilly medicinal chemistry filters78 and removing likely false positives, such as aggregators, autofluorescers,
and PAINS79 (see Supplementary Fig. S2 for the distributions of drug-like properties of the SD hits). The resulting library
was virtually screened against the target of interest, removing any molecules with greater than 0.5 Tanimoto
similarity in ECFP4 space to any known binders of the target and its homologs within 70% sequence identity. For
kinase targets, we extended the exclusion to the whole kinome. The binding site was defined using co-complexes,
mutagenesis studies, co-complexes of homologs, or by identifying potential sites using ICM Pocket Finder80
or Fpocket81. Some sites were orthosteric, while others were allosteric or had as-yet-unestablished biological functions.
In 64 cases, we built homology models using the closest sequence, with an average sequence similarity of 54%.
We clustered the top 30,000 molecules using the Butina82 algorithm with a Tanimoto similarity cutoff of 0.35
in ECFP4 space, selecting the highest-scoring exemplars. Additional computed physico-chemical property
filters were applied as needed. At no point were compounds cherry-picked. We purchased, on average, 85
compounds, quality controlled by LC–MS to > 90% purity, generally dispensed as 10 mM DMSO stocks plated
in a single 96-well plate. In addition, two vials of DMSO-only negative controls were included before the supplier
scrambled the compound locations on the plate for blinded experimental testing. To further control for
potential artifacts, we removed compounds that showed measurable activity toward more than one target from
the analysis.
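
The clustering and exemplar-selection step can be sketched as below. This is a simplified, pure-Python rendering of Taylor-Butina style clustering, not the production workflow: fingerprints are toy sets of on-bit indices, the scores are hypothetical, and we assume that two molecules count as neighbors when their Tanimoto similarity is at least the 0.35 cutoff.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def butina_clusters(fps: dict, sim_cutoff: float = 0.35) -> list:
    """Taylor-Butina style clustering; each cluster lists its centroid first."""
    names = list(fps)
    # Neighbor sets: molecules at or above the similarity cutoff
    neighbors = {
        n: {m for m in names if m != n and tanimoto(fps[n], fps[m]) >= sim_cutoff}
        for n in names
    }
    # Molecules with the most neighbors are considered as centroids first
    order = sorted(names, key=lambda n: len(neighbors[n]), reverse=True)
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        # A cluster is the centroid plus its not-yet-assigned neighbors
        members = [centroid] + sorted(neighbors[centroid] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


def pick_exemplars(clusters: list, scores: dict) -> list:
    """Return the highest-scoring member of each cluster."""
    return [max(cluster, key=lambda name: scores[name]) for cluster in clusters]


# Toy data (assumed, for illustration only)
fps = {
    "m1": frozenset({1, 2, 3, 4}), "m2": frozenset({1, 2, 3, 5}),
    "m3": frozenset({1, 2, 3, 6}), "m4": frozenset({10, 11, 12}),
    "m5": frozenset({10, 11, 13}),
}
scores = {"m1": 0.2, "m2": 0.9, "m3": 0.5, "m4": 0.7, "m5": 0.4}

clusters = butina_clusters(fps)
exemplars = pick_exemplars(clusters, scores)
print(clusters, exemplars)
```

Selecting one exemplar per cluster trades a few high-scoring near-duplicates for broader scaffold coverage, which is the stated purpose of the clustering step.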

Dose–response and analoging validation screening protocol


We considered advancing AIMS projects to additional validation studies based on the ability to reorder at least
some of the initial SD hits, the availability of chemical analogs in the screening library to the initial hits, the
capability to perform dose–response experiments, and the ability of the collaborators to perform additional
screens and return results promptly.
We performed two sets of experiments: DR validation of the SD hits from AIMS and analoging with DR
readouts. We performed DR measurements using the same assays and protocols as SD.
We performed an analoging round by identifying, for each AIMS hit, its 1000 nearest neighbors from
the Mcule library76, using molecular fingerprint similarity68. We augmented the set with additional analogs
using substructure83 or FTrees84 searches, if needed. We used an AtomNet regression model, trained to predict
quantitative bioactivities (e.g., IC50 or Ki), to score and rank the analogs. A set of 20–35 compounds from the
analog space of each initial hit was then selected for testing, based on similarity and top scores from the
AtomNet model.
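As a rough illustration of the analoging round, the sketch below retrieves the nearest neighbors of a hit by fingerprint similarity and then ranks them by a predicted activity. Everything here is an assumption made for the example: fingerprints are sets of on-bits, the `predicted` dictionary stands in for the output of a regression model (higher taken to mean more active, as with a pIC50-like value), and the function names and toy library are hypothetical.

```python
import heapq
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_analogs(hit_fp: Set[int], library: Dict[str, Set[int]], k: int) -> List[str]:
    """Return the k library compounds most similar to the hit, most similar first."""
    return heapq.nlargest(k, library, key=lambda cid: tanimoto(hit_fp, library[cid]))


def select_for_testing(analogs: List[str], predicted: Dict[str, float], n: int) -> List[str]:
    """Return the n analogs with the highest predicted activity."""
    return heapq.nlargest(n, analogs, key=lambda cid: predicted[cid])


# Toy data (hypothetical): one hit, a five-compound library, and stand-in model scores.
hit_fp = {1, 2, 3, 4}
library = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 3},
    "C": {1, 9},
    "D": {7, 8},
    "E": {1, 2, 3, 4, 5, 6},
}
predicted = {"A": 5.1, "B": 6.3, "C": 7.0, "D": 4.0, "E": 5.8}

analogs = nearest_analogs(hit_fp, library, k=3)
selected = select_for_testing(analogs, predicted, n=2)
print(analogs)   # ['A', 'B', 'E']
print(selected)  # ['B', 'E']
```

Compound C has the highest predicted activity but is never selected, because the similarity filter is applied first and restricts the ranking to the neighborhood of the hit.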

Internal portfolio screening protocol


We followed a protocol similar to the AIMS screen with a few deviations. First, we used the Enamine REAL
library of over 16 billion compounds62. Second, we used an ensemble of six AtomNet models for the screens.
Last, we selected an average of 440 compounds per project for testing.
The analoging protocol was similar to that of the AIMS validation studies, with the following deviations. First, we
used the Enamine REAL library for the analog search. Second, we selected an average of 676 analogs per project.
Third, the analog search protocol was more complex, pulling nearest neighbors based on maximum common
substructure and graph edit distance in addition to ECFP4-based similarity.

Scientific Reports | (2024) 14:7526 | https://doi.org/10.1038/s41598-024-54655-z

AtomNet® model architecture


We have published details of the AtomNet technology52,53,55,58,59,61,85,86 during the course of the AIMS program, and
we described the most recent version of the AtomNet model architecture in detail elsewhere53. We provide a brief
description below.
The AtomNet model is a graph convolutional network architecture with atoms represented as vertices and
pairwise, distance-dependent edges representing atom proximities. The input is a graph of features
characterizing the atom types and topologies of an ensemble of protein–ligand complexes. Receptor atoms farther
than 7 Å from all ligand atoms are excluded from the complexes, and each node in the graph is associated
with a feature vector representing the atom type using Sybyl typing87.
The network has five graph convolutional blocks. In the first two blocks, all ligand and receptor atoms within
5 Å of each other are considered, and 64 filters per block are used. In the third block, the
cutoff radius and the number of filters are increased to 7 Å and 128, respectively. In the last two blocks, only
ligand features are considered, without changing the cutoff or the number of filters. Finally, the ligand-only
layer is sum-pooled to feed a three-task layer on top of the network. That multi-task layer predicts three
endpoints: bioactivity, pose quality, and a physics-based docking score88.
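The two distance rules in this description (cropping the receptor to atoms near the ligand, and connecting atoms within a block-specific radius) can be sketched in a few lines. This is only a geometric illustration under stated assumptions, not the AtomNet implementation: atoms are bare 3D coordinates without type features, and the function names `crop_receptor` and `distance_edges` and the toy coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def crop_receptor(receptor: List[Coord], ligand: List[Coord], cutoff: float = 7.0) -> List[Coord]:
    """Keep receptor atoms lying within `cutoff` angstroms of at least one ligand atom."""
    return [r for r in receptor if any(math.dist(r, l) <= cutoff for l in ligand)]


def distance_edges(atoms: List[Coord], radius: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for every atom pair separated by at most `radius`."""
    edges = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i], atoms[j])
            if d <= radius:
                edges.append((i, j, d))
    return edges


# Toy complex (hypothetical coordinates, in angstroms).
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
receptor = [(4.0, 0.0, 0.0), (0.0, 6.0, 0.0), (12.0, 0.0, 0.0)]

pocket = crop_receptor(receptor, ligand, cutoff=7.0)  # drops the atom at x = 12
atoms = ligand + pocket
edges5 = distance_edges(atoms, radius=5.0)  # radius quoted for the first two blocks
edges7 = distance_edges(atoms, radius=7.0)  # radius quoted for the third block
print(len(pocket), len(edges5), len(edges7))  # 2 3 5
```

Widening the radius from 5 Å to 7 Å adds edges without adding nodes, so later blocks see longer-range contacts over the same set of atoms.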
We trained an ensemble of six models, splitting the training data into six cross-validation folds based on a
protein sequence similarity cutoff of 70%. Each model in the ensemble was then trained on a different fold for 10
epochs, using the Adam optimizer89 with a learning rate of 0.001; targets were sampled with replacement,
proportional to the number of active compounds associated with each target.
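Sampling targets with replacement in proportion to their number of actives can be sketched with the standard library alone. The target names, active counts, and the helper `sample_targets` below are hypothetical, and the sketch says nothing about how the actual training loop batches its data.

```python
import random
from collections import Counter
from typing import Dict, List


def sample_targets(actives_per_target: Dict[str, int], n_draws: int, seed: int = 0) -> List[str]:
    """Draw targets with replacement, weighted by their number of active compounds."""
    rng = random.Random(seed)
    targets = list(actives_per_target)
    weights = [actives_per_target[t] for t in targets]
    return rng.choices(targets, weights=weights, k=n_draws)


# Toy counts (hypothetical): target_A holds 90% of all actives.
actives_per_target = {"target_A": 900, "target_B": 90, "target_C": 10}

draws = sample_targets(actives_per_target, n_draws=10_000)
counts = Counter(draws)
frac_a = counts["target_A"] / len(draws)
print(counts)
print(f"fraction of draws from target_A: {frac_a:.3f}")  # expected to be close to 0.9
```

Under this scheme, data-rich targets are visited more often, while sparsely annotated targets still appear occasionally because sampling is with replacement.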

Data
All data generated or analyzed during this study are included in this published article (and its supplementary
information S1 files). Boxplot illustrations show the quartiles (Q1 and Q3) of the dataset, while the whiskers
extend to show the rest of the distribution, except for points that are determined to be “outliers” (beyond 1.5× the
inter-quartile range from the quartiles, as implemented in the Seaborn and Matplotlib toolboxes90,91).
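For readers who want to reproduce the boxplot convention by hand, the following sketch computes the quartiles, the 1.5 × IQR fences, the whisker ends, and the outliers for a small sample. It assumes linearly interpolated quartiles (the `inclusive` method of Python's `statistics.quantiles`), which may differ slightly from the plotting libraries' internals. The function name `boxplot_stats` and the data are hypothetical.

```python
import statistics
from typing import Dict, List


def boxplot_stats(values: List[float], whis: float = 1.5) -> Dict[str, object]:
    """Quartiles, whisker ends, and outliers under the whis * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme data points still inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["q1"], stats["q3"])  # 3.25 7.75
print(stats["whiskers"])         # (1, 9)
print(stats["outliers"])         # [30]
```

Here the IQR is 4.5, so the upper fence sits at 7.75 + 6.75 = 14.5; the value 30 lies beyond it and is drawn as an individual outlier rather than being covered by the whisker.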

Conclusion
HTS is the most widely used tool for hit discovery for new targets. Unfortunately, all physical screening
methods share the critical limitation that a molecule must exist to be screened. Computational methods enable
a fundamental shift to a test-then-make paradigm. In this work, we report on 318 projects (22 internal projects
and 296 collaborations) where we used the AtomNet platform as the primary screening tool coupled with low-
throughput physical screens as validation. The AtomNet technology can identify bioactive scaffolds across a
wide range of proteins, even without known binders, X-ray structures, or manual cherry-picking of compounds.
Our empirical results suggest that machine learning approaches have reached a computational accuracy that can
replace HTS as the first step of small-molecule drug discovery.

Data availability
All data generated or analyzed during this study are included in this published article and its supplementary
information files.

Received: 15 September 2023; Accepted: 15 February 2024

References
1. Kuntz, I. D. Structure-based strategies for drug design and discovery. Science 257, 1078–1082 (1992).
2. Bajorath, J. Integration of virtual and high-throughput screening. Nat. Rev. Drug Discov. 1, 882–894 (2002).
3. Walters, W. P., Stahl, M. T. & Murcko, M. A. Virtual screening—an overview. Drug Discov. Today 3, 160–178 (1998).
4. Ring, C. S. et al. Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc.
Natl. Acad. Sci. USA. 90, 3583–3587 (1993).
5. Brown, D. G. An analysis of successful hit-to-clinical candidate pairs. J. Med. Chem. https://doi.org/10.1021/acs.jmedchem.3c00521 (2023).
6. Békés, M., Langley, D. R. & Crews, C. M. PROTAC targeted protein degraders: The past is prologue. Nat. Rev. Drug Discov. 21,
181–200 (2022).
7. Lu, H. et al. Recent advances in the development of protein–protein interactions modulators: Mechanisms and clinical trials.
Signal Transduct. Target. Ther. 5, 1–23 (2020).
8. Childs-Disney, J. L. et al. Targeting RNA structures with small molecules. Nat. Rev. Drug Discov. 21, 736–762 (2022).
9. Brown, D. G. & Boström, J. Where do recent small molecule clinical development candidates come from?. J. Med. Chem. 61,
9442–9468 (2018).
10. Dragovich, P. S., Haap, W., Mulvihill, M. M., Plancher, J.-M. & Stepan, A. F. Small-molecule lead-finding trends across the Roche
and Genentech research organizations. J. Med. Chem. 65, 3606–3615 (2022).
11. Perola, E. An analysis of the binding efficiencies of drugs and their leads in successful drug discovery programs. J. Med. Chem.
53, 2986–2997 (2010).
12. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224 (2019).
13. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459
(2022).
14. Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of combinatorial fragment spaces and its application to ultralarge
make-on-demand compound catalogs. J. Chem. Inf. Model. 62, 553–566 (2022).
15. Neumann, A., Marrison, L. & Klein, R. Relevance of the trillion-sized chemical space “explore” as a source for drug discovery.
ACS Med. Chem. Lett. 14, 466–472 (2023).
16. Sunkari, Y. K., Siripuram, V. K., Nguyen, T.-L. & Flajolet, M. High-power screening (HPS) empowered by DNA-encoded libraries.
Trends Pharmacol. Sci. 43, 4–15 (2022).
17. Malo, N., Hanley, J. A., Cerquozzi, S., Pelletier, J. & Nadon, R. Statistical practice in high-throughput screening data analysis.
Nat. Biotechnol. 24, 167–175 (2006).
18. Iversen, P. W., Eastwood, B. J., Sittampalam, G. S. & Cox, K. L. A comparison of assay performance measures in screening assays:
Signal window, Z’ factor, and assay variability ratio. J. Biomol. Screen. 11, 247–252 (2006).
19. Zhang, J.-H., Chung, T. D. Y. & Oldenburg, K. R. A simple statistical parameter for use in evaluation and validation of high
throughput screening assays. J. Biomol. Screen. 4, 67–73 (1999).
20. Jadhav, A. et al. Quantitative analyses of aggregation, autofluorescence, and reactivity artifacts in a screen for inhibitors of a
thiol protease. J. Med. Chem. 53, 37–51 (2010).
21. Fox, S. et al. High-throughput screening: Update on practices and success. J. Biomol. Screen. 11, 864–869 (2006).
22. Owen, S. C., Doak, A. K., Wassam, P., Shoichet, M. S. & Shoichet, B. K. Colloidal aggregation affects the efficacy of anticancer
drugs in cell culture. ACS Chem. Biol. 7, 1429–1435 (2012).
23. Rössler, S. L., Grob, N. M., Buchwald, S. L. & Pentelute, B. L. Abiotic peptides as carriers of information for the encoding of
small-molecule library synthesis. Science 379, 939–945 (2023).
24. McGovern, S. L., Caselli, E., Grigorieff, N. & Shoichet, B. K. A common mechanism underlying promiscuous inhibitors from
virtual and high-throughput screening. J. Med. Chem. 45, 1712–1722 (2002).
25. Feng, B. Y., Shelat, A., Doman, T. N., Guy, R. K. & Shoichet, B. K. High-throughput assays for promiscuous inhibitors. Nat.
Chem. Biol. 1, 146–148 (2005).
26. Martin, E. J., Polyakov, V. R., Tian, L. & Perez, R. C. Profile-QSAR 2.0: Kinase virtual screening accuracy comparable to four-
concentration IC50s for realistically novel compounds. J. Chem. Inf. Model. 57, 2077–2088 (2017).
27. Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 462, 175–181 (2009).
28. Svetnik, V. et al. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem.
Inf. Comput. Sci. 43, 1947–1958 (2003).
29. Kitchen, D. B., Decornez, H., Furr, J. R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: methods
and applications. Nat. Rev. Drug Discov. 3, 935–949 (2004).
30. Shoichet, B. K. Virtual screening of chemical libraries. Nature 432, 862–865 (2004).
31. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity
relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
32. Sheridan, R. P. et al. Experimental error, kurtosis, activity cliffs, and methodology: What
limits the predictivity of QSAR models?. J. Chem. Inf. Model. https://doi.org/10.1021/acs.jcim.9b01067 (2020).
33. Wallach, I. & Heifets, A. Most ligand-based classification benchmarks reward memorization rather than generalization. J. Chem.
Inf. Model. 58, 916–932 (2018).
34. Chen, L. et al. Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual
screening. PLOS ONE 14, e0220113 (2019).
35. Chuang, K. V. & Keiser, M. J. Comment on “Predicting reaction performance in C–N cross-coupling using machine learning”.
Science 362, eaat8603 (2018).
36. Gaieb, Z. et al. D3R Grand Challenge 3: Blind prediction of protein–ligand poses and affinity rankings. J. Comput. Aided Mol.
Des. 33, 1–18 (2019).
37. Gabel, J., Desaphy, J. & Rognan, D. Beware of machine learning-based scoring functions on the danger of developing black
boxes. J. Chem. Inf. Model. 54, 2807–2815 (2014).
38. Cerón-Carrasco, J. P. When virtual screening yields inactive drugs: dealing with false theoretical friends. ChemMedChem 17,
e202200278 (2022).
39. McCloskey, K. et al. Machine learning on DNA-encoded libraries: A new paradigm for hit-finding. J. Med. Chem. 63, 8857–8866
(2020).
40. Wenzel, J., Matter, H. & Schmidt, F. Predictive multitask deep neural network models for ADME-Tox properties: Learning from
large data sets. J. Chem. Inf. Model. 59, 1253–1268 (2019).
41. Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520–1530 (2018).
42. Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem.
Inf. Model. 60, 5457–5474 (2020).
43. Bordogna, A., Pandini, A. & Bonati, L. Predicting the accuracy of protein–ligand docking on homology models. J. Comput.
Chem. 32, 81–98 (2011).
44. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688-702.e13 (2020).
45. Melo, M. C. R., Maasch, J. R. M. A. & de la Fuente-Nunez, C. Accelerating antibiotic discovery through artificial intelligence.
Commun. Biol. 4, 1–13 (2021).
46. Skinnider, M. A. et al. A deep generative model enables automated structure elucidation of novel psychoactive substances. Nat.
Mach. Intell. 3, 973–984 (2021).
47. Muegge, I. & Oloff, S. Advances in virtual screening. Drug Discov. Today Technol. 3, 405–411 (2006).
48. Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).
49. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–
1040 (2019).
50. Walters, W. P. & Murcko, M. Assessing the impact of generative AI on medicinal chemistry. Nat. Biotechnol. 38, 143–145 (2020).
51. Scannell, J. W., Blanckley, A., Boldon, H. & Warrington, B. Diagnosing the decline in pharmaceutical R&D efficiency. Nat. Rev.
Drug Discov. 11, 191 (2012).
52. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A deep convolutional neural network for bioactivity prediction in structure-
based drug discovery. arXiv preprint arXiv:1510.02855, 1–11 (2015).
53. Gniewek, P., Worley, B., Stafford, K., van den Bedem, H. & Anderson, B. Learning physics confers pose-sensitivity in structure-
based virtual screening. https://doi.org/10.48550/arXiv.2110.15459 (2021).
54. Stafford, K. A., Anderson, B. M., Sorenson, J. & van den Bedem, H. AtomNet PoseRanker: Enriching ligand pose quality for
dynamic proteins in virtual high-throughput screens. J. Chem. Inf. Model. 62, 1178–1189 (2022).
55. Hsieh, C.-H. et al. Miro1 marks Parkinson’s disease subset and Miro1 reducer rescues neuron loss in Parkinson’s models. Cell
Metab. 30, 1131-1140.e7 (2019).
56. Reidenbach, A. G. et al. Multimodal small-molecule screening for human prion protein binders. J. Biol. Chem. 295, 13516–13531
(2020).
57. Bon, C. et al. Discovery of novel trace amine-associated receptor 5 (TAAR5) antagonists using a deep convolutional neural
network. Int. J. Mol. Sci. 23, 3127 (2022).
58. Stecula, A., Hussain, M. S. & Viola, R. E. Discovery of novel inhibitors of a critical brain enzyme using a homology model and
a deep convolutional neural network. J. Med. Chem. 63, 8867–8875 (2020).
59. Su, S. et al. SPOP and OTUD7A control EWS–FLI1 protein stability to govern Ewing sarcoma growth. Adv. Sci. 8, 2004846
(2021).
60. Pedicone, C. et al. Discovery of a novel SHIP1 agonist that promotes degradation of lipid-laden phagocytic cargo by microglia.
iScience 25, 104170 (2022).
61. Huang, C. et al. Small molecules block the interaction between porcine reproductive and respiratory syndrome virus and CD163
receptor and the infection of pig cells. Virol. J. 17, 116 (2020).
62. Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23, 101681
(2020).
63. Dandapani, S., Rosse, G., Southall, N., Salvino, J. M. & Thomas, C. J. Selecting, acquiring, and using small molecule libraries for
high-throughput screening. Curr. Protoc. Chem. Biol. 4, 177–191 (2012).
64. Schuffenhauer, A. et al. Library design for fragment based screening. Curr. Top. Med. Chem. 5, 751–762 (2005).
65. Jacoby, E. et al. Key aspects of the Novartis compound collection enhancement project for the compilation of a comprehensive
chemogenomics drug discovery screening collection. Curr. Top. Med. Chem. 5, 397–411 (2005).
66. Petrova, T., Chuprina, A., Parkesh, R. & Pushechnikov, A. Structural enrichment of HTS compounds from available commercial
libraries. MedChemComm 3, 571–579 (2012).
67. Macarron, R. et al. Impact of high-throughput screening in biomedical research. Nat. Rev. Drug Discov. 10, 188–195 (2011).
68. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
69. Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J.
Cheminformatics 5, 26 (2013).
70. Ren, F. et al. AlphaFold accelerates artificial intelligence powered drug discovery: Efficient discovery of a novel cyclin-dependent
kinase 20 (CDK20) Small Molecule Inhibitor (2022).
71. Assessing structural novelty of the first AI-designed drug candidates to go into human clinical trials. CAS https://www.cas.org/resources/blog/ai-drug-candidates.
72. Kohavi, R. & Wolpert, D. Bias plus variance decomposition for zero-one loss functions. in Proceedings of the Thirteenth
International Conference on Machine Learning 275–283 (Morgan Kaufmann Publishers Inc., San
Francisco, CA, USA, 1996).
73. Ferrara, P. & Jacoby, E. Evaluation of the utility of homology models in high throughput docking. J. Mol. Model. 13, 897–905
(2007).
74. Walters, W. P. & Namchuk, M. Designing screens: How to make your hits a hit. Nat. Rev. Drug Discov. 2, 259–266 (2003).
75. Inglese, J. et al. High-throughput screening assays for the identification of chemical probes. Nat. Chem. Biol. 3, 466–479 (2007).
76. mcule database. https://mcule.com/database/.
77. Screening Collections - Enamine. https://enamine.net/compound-collections/screening-collection.
78. Bruns, R. F. & Watson, I. A. Rules for identifying potentially reactive or promiscuous compounds. J. Med. Chem. 55, 9763–9772
(2012).
79. Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening
libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).
80. Abagyan, R. & Kufareva, I. The flexible pocketome engine for structural chemogenomics. Methods Mol. Biol. Clifton NJ 575,
249–279 (2009).
81. Le Guilloux, V., Schmidtke, P. & Tuffery, P. Fpocket: An open source platform for ligand pocket detection. BMC Bioinformatics
10, 168 (2009).
82. Butina, D. Unsupervised data base clustering based on Daylight’s fingerprint and Tanimoto similarity: A fast and automated way
to cluster small and large data sets. J. Chem. Inf. Comput. Sci. 39, 747–750 (1999).
83. RDKit: Open-Source Cheminformatics.
84. Rarey, M. & Dixon, J. S. Feature trees: A new molecular similarity measure based on tree matching. J. Comput. Aided Mol. Des.
12, 471–490 (1998).
85. Stafford, K., Anderson, B. M., Sorenson, J. & van den Bedem, H. AtomNet PoseRanker: Enriching Ligand Pose Quality for Dynamic
Proteins in Virtual High Throughput Screens. https://doi.org/10.26434/chemrxiv-2021-t6xkj (2021).
86. Schroedl, S. Current methods and challenges for deep learning in drug discovery. Drug Discov. Today Technol. 32–33, 9–17
(2019).
87. Bender, A., Mussa, H. Y., Glen, R. C. & Reiling, S. Molecular similarity searching using atom environments, information-based
feature selection, and a Naïve Bayesian classifier. J. Chem. Inf. Comput. Sci. 44, 170–178 (2004).
88. Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient
optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).
89. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs] (2017).
90. Waskom, M. L. seaborn: Statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
91. Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
92. Marineau, J. J. et al. Discovery of SY-5609: A selective, noncovalent inhibitor of CDK7. J. Med. Chem. 65, 1458–1480 (2022).
93. Gu, X., Bai, H., Barbeau, O. R. & Besnard, J. Aromatic heterocyclic compound, and pharmaceutical composition and application
thereof (2022).
94. Barbay, J. K., Chakravarty, D., Leonard, K., Shook, B. C. & Wang, A. Phenyl and heteroaryl substituted thieno[2,3-d]Pyrimidines
and their use as adenosine A2a receptor antagonists (2010).
95. Bell, A. S., Schreyer, A. M. & Versluys, S. Pyrazolopyrimidine compounds as adenosine receptor antagonists (2019).
96. Soldermann, C. P. et al. Pyrazolo pyrimidine derivatives and their use as MALT1 inhibitors (2019).
97. Feng, S. et al. Tricyclic compounds useful in the treatment of cancer, autoimmune and inflammatory disorders (2023).
98. Heiser, U. & Sommer, R. Inhibitors of glutaminyl cyclase (2020).
99. Cheng, X., Liu, Y., Qin, L., Ren, F. & Wu, J. Beta-lactam derivatives for the treatment of diseases (2023).
100. Wylie, A. A. et al. Therapeutic combinations comprising ubiquitin-specific-processing protease 1 (usp1) inhibitors and poly
(adp-ribose) polymerase (parp) inhibitors (2021).
101. Wu, J., Qin, L. & Liu, J. Small molecule inhibitors of ubiquitin specific protease 1 (usp1) and uses thereof (2023).
102. John, S. E. S. & Mesecar, A. D. Broad-spectrum non-covalent coronavirus protease inhibitors (2017).
103. Zavoronkovs, A., Ivanenkov, Y. A. & Zagribelnyy, B. Sars-cov-2 inhibitors having covalent modifications for treating coronavirus
infections. (2021).

Acknowledgements
See Supplementary section S2.

Author contributions
All authors have contributed to the publication, being variously involved in technology development,
experimental protocol designs, experimental performance, data acquisition, statistical analysis, and manuscript
writing.

Competing interests
The authors affiliated with Atomwise declare the existence of a financial competing interest.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-024-54655-z.
Correspondence and requests for materials should be addressed to
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and
indicate if changes were made. The images or other third party material in this article are included in the article’s
Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included
in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy
of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2024, corrected publication 2024

The Atomwise AIMS Program


Izhar Wallach2, Denzil Bernard2, Kong Nguyen2, Gregory Ho2, Adrian Morrison2,
Adrian Stecula2, Andreana Rosnik2, Ann Marie O’Sullivan2, Aram Davtyan2, Ben Samudio2,
Bill Thomas2, Brad Worley2, Brittany Butler2, Christian Laggner2, Desiree Thayer2,
Ehsan Moharreri2, Greg Friedland2, Ha Truong2, Henry van den Bedem2, Ho Leung Ng2,
Kate Stafford2, Krishna Sarangapani2, Kyle Giesler2, Lien Ngo2, Michael Mysinger2,
Mostafa Ahmed2, Nicholas J. Anthis2, Niel Henriksen2, Pawel Gniewek2, Sam Eckert2,
Saulo de Oliveira2, Shabbir Suterwala2, Srimukh Veccham Krishna Prasad2,
Stefani Shek2, Stephanie Contreras2, Stephanie Hare2, Teresa Palazzo2, Terrence E. O’Brien2,
Tessa Van Grack2, Tiffany Williams2, Ting‑Rong Chern2, Victor Kenyon2, Andreia H. Lee3,
Andrew B. Cann4, Bastiaan Bergman5, Brandon M. Anderson6, Bryan D. Cox7,
Jeffrey M. Warrington8, Jon M. Sorenson9, Joshua M. Goldenberg10, Matthew A. Young11,
Nicholas DeHaan12, Ryan P. Pemberton13, Stefan Schroedl14, Tigran M. Abramyan11,15,
Tushita Gupta16, Venkatesh Mysore17, Adam G. Presser18, Adolfo A. Ferrando19,
Adriano D. Andricopulo20, Agnidipta Ghosh21, Aicha Gharbi Ayachi22, Aisha Mushtaq23,
Ala M. Shaqra24, Alan Kie Leong Toh25, Alan V. Smrcka26, Alberto Ciccia27,
Aldo Sena de Oliveira28, Aleksandr Sverzhinsky29, Alessandra Mara de Sousa30,
Alexander I. Agoulnik31, Alexander Kushnir32, Alexander N. Freiberg33,
Alexander V. Statsyuk34, Alexandre R. Gingras35, Alexei Degterev36, Alexey Tomilov37,
Alice Vrielink38, Alisa A. Garaeva39, Amanda Bryant‑Friedrich40, Amedeo Caflisch41,
Amit K. Patel35, Amith Vikram Rangarajan42, An Matheeussen43, Andrea Battistoni44,
Andrea Caporali45, Andrea Chini46, Andrea Ilari47, Andrea Mattevi48, Andrea Talbot Foote49,
Andrea Trabocchi50, Andreas Stahl51, Andrew B. Herr52, Andrew Berti40, Andrew Freywald53,
Andrew G. Reidenbach54, Andrew Lam55, Andrew R. Cuddihy56, Andrew White57,
Angelo Taglialatela19, Anil K. Ojha58, Ann M. Cathcart59, Anna A. L. Motyl45, Anna Borowska39,
Anna D’Antuono60, Anna K. H. Hirsch61, Anna Maria Porcelli62, Anna Minakova48,
Anna Montanaro60, Anna Müller41, Annarita Fiorillo63, Anniina Virtanen64,
Anthony J. O’Donoghue35, Antonio Del Rio Flores51, Antonio E. Garmendia65,
Antonio Pineda‑Lucena66, Antonito T. Panganiban67, Ariela Samantha38,
Arnab K. Chatterjee68, Arthur L. Haas69, Ashleigh S. Paparella21, Ashley L. St. John70,
Ashutosh Prince71, Assmaa ElSheikh72, Athena Marie Apfel57, Audrey Colomba73,
Austin O’Dea74, Bakary N’tji Diallo75, Beatriz Murta Rezende Moraes Ribeiro76,
Ben A. Bailey‑Elkin77, Benjamin L. Edelman78, Benjamin Liou52, Benjamin Perry79,
Benjamin Soon Kai Chua80, Benjámin Kováts81, Bernhard Englinger59, Bijina Balakrishnan82,
Bin Gong33, Bogos Agianian21, Brandon Pressly37, Brenda P. Medellin Salas83,
Brendan M. Duggan35, Brian V. Geisbrecht84, Brian W. Dymock85, Brianna C. Morten85,
Bruce D. Hammock37, Bruno Eduardo Fernandes Mota76, Bryan C. Dickinson86,
Cameron Fraser87, Camille Lempicki88, Carl D. Novina89, Carles Torner90, Carlo Ballatore35,
Carlotta Bon91, Carly J. Chapman92, Carrie L. Partch93, Catherine T. Chaton94, Chang Huang65,
Chao‑Yie Yang95, Charlene M. Kahler38, Charles Karan27, Charles Keller96, Chelsea L. Dieck97,
Chen Huimei70, Chen Liu98, Cheryl Peltier77, Chinmay Kumar Mantri70,
Chinyere Maat Kemet55, Christa E. Müller99, Christian Weber100, Christina M. Zeina59,
Christine S. Muli101, Christophe Morisseau37, Cigdem Alkan33, Clara Reglero19, Cody A. Loy101,
Cornelia M. Wilson102, Courtney Myhr31, Cristina Arrigoni48, Cristina Paulino39,
César Santiago103, Dahai Luo22, Damon J. Tumes104, Daniel A. Keedy105, Daniel A. Lawrence57,
Daniel Chen106, Danny Manor71, Darci J. Trader101, David A. Hildeman52, David H. Drewry107,
David J. Dowling108, David J. Hosfield86, David M. Smith109, David Moreira110,
David P. Siderovski111, David Shum112, David T. Krist113, David W. H. Riches78,
Davide Maria Ferraris114, Deborah H. Anderson115, Deirdre R. Coombe116, Derek S. Welsbie35,
Di Hu71, Diana Ortiz117, Dina Alramadhani118, Dingqiang Zhang119, Dipayan Chaudhuri82,
Dirk J. Slotboom39, Donald R. Ronning120, Donghan Lee121, Dorian Dirksen122,
Douglas A. Shoue123, Douglas William Zochodne124, Durga Krishnamurthy52,
Dustin Duncan125, Dylan M. Glubb92, Edoardo Luigi Maria Gelardi126, Edward C. Hsiao127,
Edward G. Lynn128, Elany Barbosa Silva129, Elena Aguilera130, Elena Lenci50,
Elena Theres Abraham131, Eleonora Lama62, Eleonora Mameli45, Elisa Leung125, Ellie Giles102,
Emily M. Christensen132, Emily R. Mason133, Enrico Petretto70, Ephraim F. Trakhtenberg134,
Eric J. Rubin18, Erick Strauss135, Erik W. Thompson25, Erika Cione136, Erika Mathes Lisabeth137,
Erkang Fan138, Erna Geessien Kroon76, Eunji Jo112, Eva M. García‑Cuesta103,
Evgenia Glukhov35, Evripidis Gavathiotis21, Fang Yu139, Fei Xiang140, Fenfei Leng141,
Feng Wang142, Filippo Ingoglia82, Focco van den Akker71, Francesco Borriello143,
Franco J. Vizeacoumar144, Frank Luh145, Frederick S. Buckner138, Frederick S. Vizeacoumar53,
Fredj Ben Bdira146, Fredrik Svensson73, G. Marcela Rodriguez147, Gabriella Bognár81,
Gaia Lembo148, Gang Zhang149, Garrett Dempsey51, Gary Eitzen150, Gaétan Mayer151,
Geoffrey L. Greene86, George A. Garcia57, Gergely L. Lukacs152, Gergely Prikler81,
Gian Carlo G. Parico93, Gianni Colotti47, Gilles De Keulenaer153, Gino Cortopassi37,
Giovanni Roti60, Giulia Girolimetti62, Giuseppe Fiermonte154, Giuseppe Gasparre155,
Giuseppe Leuzzi19, Gopal Dahal156, Gracjan Michlewski157,158, Graeme L. Conn159,
Grant David Stuchbury85, Gregory R. Bowman160, Grzegorz Maria Popowicz161, Guido Veit152,
Guilherme Eduardo de Souza20, Gustav Akk162, Guy Caljon43, Guzmán Alvarez163,
Gwennan Rucinski164, Gyeongeun Lee112, Gökhan Cildir165, Hai Li27, Hairol E. Breton166,
Hamed Jafar‑Nejad167, Han Zhou168, Hannah P. Moore169, Hannah Tilford164, Haynes Yuan170,
Heesung Shim37, Heike Wulff37, Heinrich Hoppe75, Helena Chaytow45, Heng‑Keat Tam171,
Holly Van Remmen172, Hongyang Xu173, Hosana Maria Debonsi174, Howard B. Lieberman27,
Hoyoung Jung175, Hua‑Ying Fan176, Hui Feng55, Hui Zhou19, Hyeong Jun Kim177,
Iain R. Greig178, Ileana Caliandro179, Ileana Corvo180, Imanol Arozarena181,
Imran N. Mungrue182, Ingrid M. Verhamme183, Insaf Ahmed Qureshi184, Irina Lotsaris185,
Isin Cakir57, J. Jefferson P. Perry194, Jacek Kwiatkowski85, Jacob Boorman71, Jacob Ferreira187,
Jacob Fries188, Jadel Müller Kratz79, Jaden Miner82, Jair L. Siqueira‑Neto35,
James G. Granneman189, James Ng164, James Shorter160, Jan Hendrik Voss99,
Jan M. Gebauer131, Janelle Chuah109, Jarrod J. Mousa190, Jason T. Maynes191, Jay D. Evans192,
Jeffrey Dickhout193, Jeffrey P. MacKeigan137, Jennifer N. Jossart194, Jia Zhou33, Jiabei Lin160,
Jiake Xu195, Jianghai Wang145, Jiaqi Zhu196, Jiayu Liao194, Jingyi Xu194, Jinshi Zhao197,
Jiusheng Lin198, Jiyoun Lee199, Joana Reis48, Joerg Stetefeld77, John B. Bruning200,
John Burt Bruning80, John G. Coles201, John J. Tanner166, John M. Pascal29, Jonathan So59,
Jordan L. Pederick80, Jose A. Costoya110, Joseph B. Rayman19, Joseph J. Maciag52,
Joshua Alexander Nasburg37, Joshua J. Gruber202, Joshua M. Finkelstein55, Joshua Watkins164,
José Miguel Rodríguez‑Frade203, Juan Antonio Sanchez Arias204, Juan José Lasarte205,
Julen Oyarzabal204, Julian Milosavljevic88, Julie Cools153, Julien Lescar22,
Julijus Bogomolovas35, Jun Wang147, Jung‑Min Kee175, Jung‑Min Kee177, Junzhuo Liao206,
Jyothi C. Sistla118, Jônatas Santos Abrahão76, Kamakshi Sishtla207, Karol R. Francisco35,
Kasper B. Hansen208, Kathleen A. Molyneaux71, Kathryn A. Cunningham33, Katie R. Martin137,
Kavita Gadar209, Kayode K. Ojo138, Keith S. Wong125, Kelly L. Wentworth127, Kent Lai82,
Kevin A. Lobb75, Kevin M. Hopkins27, Keykavous Parang210, Khaled Machaca211, Kien Pham98,
Kim Ghilarducci212, Kim S. Sugamori125, Kirk James McManus77, Kirsikka Musta64,
Kiterie M. E. Faller45, Kiyo Nagamori96, Konrad J. Mostert135, Konstantin V. Korotkov94,
Koting Liu213, Kristiana S. Smith214, Kristopher Sarosiek215, Kyle H. Rohde216,
Kyu Kwang Kim217, Kyung Hyeon Lee218, Lajos Pusztai98, Lari Lehtiö219, Larisa M. Haupt25,
Leah E. Cowen125, Lee J. Byrne102, Leila Su145, Leon Wert‑Lamas89,
Leonor Puchades‑Carrasco220, Lifeng Chen86, Linda H. Malkas186, Ling Zhuo221,
Lizbeth Hedstrom222, Loren D. Walensky59, Lorenzo Antonelli63,
Luisa Iommarini62, Luke Whitesell125, Lía M. Randall223, M. Dahmani Fathallah224,
Maira Harume Nagai197, Mairi Louise Kilkenny225, Manu Ben‑Johny19, Marc P. Lussier212,
Marc P. Windisch112, Marco Lolicato48, Marco Lucio Lolli179, Margot Vleminckx43,
Maria Cristina Caroleo226, Maria J. Macias90, Marilia Valli20, Marim M. Barghash125,
Mario Mellado203, Mark A. Tye227, Mark A. Wilson198, Mark Hannink228, Mark R. Ashton85,
Mark Vincent C. dela Cerna121, Marta Giorgis179, Martin K. Safo118, Martin St. Maurice229,
Mary Ann McDowell123, Marzia Pasquali82, Masfique Mehedi230,
Mateus Sá Magalhães Serafim76, Matthew B. Soellner57, Matthew G. Alteen231,
Matthew M. Champion123, Maxim Skorodinsky232, Megan L. O’Mara233, Mel Bedi40,
Menico Rizzi114, Michael Levin119, Michael Mowat234, Michael R. Jackson235, Mikell Paige218,
Minnatallah Al‑Yozbaki102, Miriam A. Giardini129, Mirko M. Maksimainen219,
Monica De Luise62, Muhammad Saddam Hussain207, Myron Christodoulides164,
Natalia Stec157, Natalia Zelinskaya159, Natascha Van Pelt43, Nathan M. Merrill57,
Nathanael Singh105, Neeltje A. Kootstra236, Neeraj Singh237, Neha S. Gandhi25, Nei‑Li Chan213,
Nguyen Mai Trinh22, Nicholas O. Schneider229, Nick Matovic85, Nicola Horstmann238,
Nicola Longo82, Nikhil Bharambe22, Nirvan Rouzbeh208, Niusha Mahmoodi21,
Njabulo Joyfull Gumede239, Noelle C. Anastasio33, Noureddine Ben Khalaf224,
Obdulia Rabal204, Olga Kandror215, Olivier Escaffre33, Olli Silvennoinen64,
Ozlem Tastan Bishop75, Pablo Iglesias110, Pablo Sobrado240, Patrick Chuong241,
Patrick O’Connell137, Pau Martin‑Malpartida90, Paul Mellor53, Paul V. Fish73,
Paulo Otávio Lourenço Moreira30, Pei Zhou197, Pengda Liu107, Pengpeng Wu242,
Percy Agogo‑Mawuli111, Peter L. Jones243, Peter Ngoi93, Peter Toogood57, Philbert Ip125,
Philipp von Hundelshausen100, Pil H. Lee57, Rachael B. Rowswell‑Turner217,
Rafael Balaña‑Fouce244, Rafael Eduardo Oliveira Rocha76, Rafael V. C. Guido20,
Rafaela Salgado Ferreira76, Rajendra K. Agrawal58, Rajesh K. Harijan21,
Rajesh Ramachandran245, Rajkumar Verma246, Rakesh K. Singh247, Rakesh Kumar Tiwari248,
Ralph Mazitschek227, Rama K. Koppisetti166, Remus T. Dame146, Renée N. Douville249,
Richard C. Austin193, Richard E. Taylor123, Richard G. Moore217, Richard H. Ebright147,
Richard M. Angell73, Riqiang Yan237, Rishabh Kejriwal65, Robert A. Batey125,
Robert Blelloch127, Robert J. Vandenberg185, Robert J. Hickey186, Robert J. Kelm Jr.49,
Robert J. Lake176, Robert K. Bradley250, Robert M. Blumenthal106, Roberto Solano46,
Robin Matthias Gierse251, Ronald E. Viola156, Ronan R. McCarthy209, Rosa Maria Reguera244,
Ruben Vazquez Uribe252, Rubens Lima do Monte‑Neto30, Ruggiero Gorgoglione154,
Ryan T. Cullinane222, Sachin Katyal170, Sakib Hossain105, Sameer Phadke57,
Samuel A. Shelburne238, Sandra E. Geden216, Sandra Johannsen61, Sarah Wazir219,
Scott Legare77, Scott M. Landfear117, Senthil K. Radhakrishnan118, Serena Ammendola44,
Sergei Dzhumaev253, Seung‑Yong Seo140, Shan Li142, Shan Zhou167, Shaoyou Chu133,
Shefali Chauhan254, Shinsaku Maruta255,256, Shireen R. Ashkar57, Show‑Ling Shyng117,
Silvestro G. Conticello148,256, Silvia Buroni48, Silvia Garavaglia114, Simon J. White65,
Siran Zhu157,158, Sofiya Tsimbalyuk257, Somaia Haque Chadni141, Soo Young Byun112,
Soonju Park112, Sophia Q. Xu258, Sourav Banerjee259, Stefan Zahler221, Stefano Espinoza91,
Stefano Gustincich91, Stefano Sainas179, Stephanie L. Celano137, Stephen J. Capuzzi107,
Stephen N. Waggoner52, Steve Poirier260, Steven H. Olson235, Steven O. Marx261,
Steven R. Van Doren166, Suryakala Sarilla183, Susann M. Brady‑Kalnay71, Sydney Dallman230,
Syeda Maryam Azeem105, Tadahisa Teramoto262, Tamar Mehlman105, Tarryn Swart75,
Tatjana Abaffy263, Tatos Akopian215, Teemu Haikarainen64, Teresa Lozano Moreda264,
Tetsuro Ikegami33, Thaiz Rodrigues Teixeira174, Thilina D. Jayasinghe120,
Thomas H. Gillingwater45, Thomas Kampourakis265, Timothy I. Richardson207,
Timothy J. Herdendorf84, Timothy J. Kotzé135, Timothy R. O’Meara266, Timothy W. Corson207,
Tobias Hermle88, Tomisin Happy Ogunwa255, Tong Lan86, Tong Su228, Toshihiro Banjo267,
Tracy A. O’Mara92, Tristan Chou42, Tsui‑Fen Chou142, Ulrich Baumann131, Umesh R. Desai118,
Vaibhav P. Pai119, Van Chi Thai38, Vasudha Tandon259, Versha Banerji77, Victoria L. Robinson65,
Vignesh Gunasekharan168, Vigneshwaran Namasivayam99, Vincent F. M. Segers43,
Vincent Maranda53, Vincenza Dolce136, Vinícius Gonçalves Maltarollo76,
Viola Camilla Scoffone48, Virgil A. Woods105, Virginia Paola Ronchi268, Vuong Van Hung Le269,
W. Brent Clayton101, W. Todd Lowther270, Walid A. Houry125, Wei Li271, Weiping Tang206,
Wenjun Zhang51, Wesley C. Van Voorhis138, William A. Donaldson229, William C. Hahn59,
William G. Kerr272, William H. Gerwick129, William J. Bradshaw273, Wuen Ee Foong274,
Xavier Blanchet275, Xiaoyang Wu86, Xin Lu123, Xin Qi245, Xin Xu84, Xinfang Yu167,
Xingping Qin276, Xingyou Wang222, Xinrui Yuan95, Xu Zhang277, Yan Jessie Zhang83,
Yanmei Hu147, Yasser Ali Aldhamen137, Yicheng Chen71, Yihe Li71, Ying Sun52, Yini Zhu123,
Yogesh K. Gupta278, Yolanda Pérez‑Pertejo244, Yong Li167, Young Tang65, Yuan He40,
Yuk‑Ching Tse‑Dinh141, Yulia A. Sidorova279, Yun Yen145, Yunlong Li280, Zachary J. Frangos281,
Zara Chung22, Zhengchen Su33, Zhenghe Wang71, Zhiguo Zhang27, Zhongle Liu125,
Zintis Inde215, Zoraima Artía163 & Abraham Heifets2
2Atomwise Inc., San Francisco, USA. 3Amgen, Thousand Oaks, USA. 4OpenAI, San Francisco, USA. 5Model
Medicines, La Jolla, USA. 6Atomic.AI, San Francisco, USA. 7Edifice Health, Inc., San Mateo, USA. 8METiS
Therapeutics, Cambridge, USA. 9Genentech, San Mateo, USA. 10US Navy Medical Service Corps Officer
(2300/1810D), San Mateo, USA. 11Totus Medicines, Inc., Emeryville, USA. 12Cytokinetics, Inc., South San Francisco,
USA. 13Nurix Therapeutics, San Francisco, USA. 14Amazon Alexa, Suite, USA. 15The University of North Carolina at
Chapel Hill Eshelman School of Pharmacy, Chapel Hill, USA. 16Refibered Inc., Cupertino, USA. 17NVIDIA, Santa
Clara, USA. 18Harvard TH Chan School of Public Health, Boston, USA. 19Columbia University, New York, USA.
20University of São Paulo, São Paulo, Brazil. 21Albert Einstein College of Medicine, Bronx, USA. 22Nanyang
Technological University, Singapore, Singapore. 23University of Washington, Seattle, USA. 24Chan Medical School,
University of Massachusetts, Worcester, USA. 25Queensland University of Technology, Brisbane, Australia.
26University of Michigan Medical School, Ann Arbor, USA. 27Columbia University Irving Medical Center, New York,
USA. 28Universidade Federal de Santa Catarina, Florianópolis, Brazil. 29Université de Montréal, Montreal, Canada.
30Instituto René Rachou-Fundação Oswaldo Cruz/Fiocruz Minas, Belo Horizonte, Brazil. 31Herbert Wertheim
College of Medicine, Biomolecular Science Institute, Florida International University, Miami, USA. 32NYU Langone
Health, New York, USA. 33The University of Texas Medical Branch at Galveston, Galveston, USA. 34University of
Houston, Galveston, USA. 35University of California, San Diego, USA. 36School of Medicine, Tufts University,
Medford, USA. 37University of California, Davis, Davis, USA. 38University of Western Australia, Crawley, Australia.
39University of Groningen, Groningen, The Netherlands. 40Wayne State University, Detroit, USA. 41University of
Zurich, Zürich, Switzerland. 42Stanford University, Stanford, USA. 43University of Antwerp, Antwerp, Belgium.
44University of Rome Tor Vergata, Rome, Italy. 45University of Edinburgh, Edinburgh, UK. 46Department of Plant
Molecular Genetics, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas (CNB-
CSIC), Madrid, Spain. 47CNR (Italian National Research Council), Rome, Italy. 48University of Pavia, Pavia, Italy.
49University of Vermont, Burlington, USA. 50University of Florence, Florence, Italy. 51University of California,
Berkeley, Berkeley, USA. 52Cincinnati Children’s Hospital Medical Center, Cincinnati, USA. 53University of
Saskatchewan, Saskatoon, Canada. 54Broad Institute of MIT and Harvard, Cambridge, USA. 55Boston University,
Boston, USA. 56CancerCare Manitoba Research Institute, Winnipeg, Canada. 57University of Michigan, Ann Arbor,
USA. 58Wadsworth Center, New York State Department of Health and University at Albany, Albany, USA. 59Dana-
Farber Cancer Institute, Boston, USA. 60University of Parma, Parma, Italy. 61Helmholtz Institute for Pharmaceutical
Research Saarland, Saarbrücken, Germany. 62University of Bologna, Bologna, Italy. 63Sapienza University of Rome,
Rome, Italy. 64Tampere University, Tampere, Finland. 65University of Connecticut, Storrs, USA. 66Centro de
Investigación Médica Aplicada, Universidad de Navarra, Pamplona, Spain. 67Tulane National Primate Research
Center, Tulane University, Covington, USA. 68Scripps Research, San Diego, USA. 69Louisiana State University
School of Medicine, New Orleans, USA. 70Duke-NUS Medical School, Singapore, Singapore. 71Case Western
Reserve University, Cleveland, USA. 72Oregon Health and Science University and Tanta University in Tanta, Tanta,
Egypt. 73University College London, London, UK. 74Saint Louis University, St. Louis, USA. 75Rhodes University,
Makhanda, South Africa. 76Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil. 77University of
Manitoba, Winnipeg, Canada. 78National Jewish Health, Denver, USA. 79Drugs for Neglected Diseases Initiative
(DNDi), Geneva, Switzerland. 80The University of Adelaide, Adelaide, Australia. 81Mcule, Budapest, Hungary.
82University of Utah, Salt Lake City, USA. 83The University of Texas at Austin, Austin, USA. 84Kansas State
University, Manhattan, USA. 85UniQuest Pty Ltd, St Lucia, Australia. 86University of Chicago, Chicago, USA.
87Harvard University, Cambridge, USA. 88University of Freiburg, Freiburg Im Breisgau, Germany. 89Dana-Farber
Cancer Institute and Harvard Medical School, Boston, USA. 90IRB Barcelona, Barcelona, Spain. 91Istituto Italiano Di
Tecnologia, Genoa, Italy. 92QIMR Berghofer Medical Research Institute, Herston, Australia. 93University of
California, Santa Cruz, Santa Cruz, USA. 94University of Kentucky, Lexington, USA. 95University of Tennessee
Health Science Center, Memphis, USA. 96Children’s Cancer Therapy Development Institute, Beaverton, USA.
97Columbia University Medical Center, New York, USA. 98Yale School of Medicine, New Haven, USA. 99University of
Bonn, Bonn, Germany. 100Ludwig-Maximilians-Universität München, Munich, Germany. 101Purdue University, West
Lafayette, USA. 102Canterbury Christ Church University, Canterbury, UK. 103National Centre for Biotechnology
(CNB-CSIC), Madrid, Spain. 104University of South Australia and SA Pathology, Adelaide, Australia. 105CUNY
Advanced Science Research Center, New York, USA. 106The University of Toledo, Toledo, USA. 107University of
North Carolina at Chapel Hill, Chapel Hill, USA. 108Boston Children’s Hospital and Harvard Medical School, Boston,
USA. 109West Virginia University, Morgantown, USA. 110Universidade de Santiago de Compostela, Santiago, Spain.
111University of North Texas Health Science Center at Fort Worth, Fort Worth, USA. 112Institut Pasteur Korea,
Seongnam, South Korea. 113Carle Illinois College of Medicine, Urbana, USA. 114Università del Piemonte Orientale,
Vercelli, Italy. 115Saskatchewan Cancer Agency, Saskatoon, Canada. 116Curtin University, Bentley, Australia.
117Oregon Health and Science University, Portland, USA. 118Virginia Commonwealth University, Richmond, USA.
119Tufts University, Medford, USA. 120University of Nebraska Medical Center, Omaha, USA. 121University of
Louisville, Louisville, USA. 122Dana Farber Cancer Institute, Boston, USA. 123University of Notre Dame, Notre
Dame, USA. 124University of Alberta, Edmonton, Canada. 125University of Toronto, Toronto, Canada. 126University
of Piemonte Orientale, Vercelli, Italy. 127University of California, San Francisco, San Francisco, USA. 128St. Joseph’s
Healthcare Hamilton, and Hamilton Center for Kidney Research, McMaster University, Hamilton, Canada.
129Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, USA.
130Universidad de La República, Montevideo, Uruguay. 131University of Cologne, Cologne, Germany. 132Johnson
University, Knoxville, USA. 133Indiana University, Bloomington, USA. 134School of Medicine, University of
Connecticut, Farmington, USA. 135Stellenbosch University, Stellenbosch, South Africa. 136University of Calabria,
Arcavacata, Italy. 137Michigan State University, East Lansing, USA. 138University of Washington, Washington, USA.
139Weill Cornell Medicine-Qatar, Ar‑Rayyan, Qatar. 140Gachon University, Seongnam, South Korea. 141Florida
International University, Miami, USA. 142California Institute of Technology, Pasadena, USA. 143Boston Children’s
Hospital, Boston, USA. 144Saskatchewan Cancer Agency and University of Saskatchewan, Saskatchewan, Canada.
145Sino-American Cancer Foundation, Covina, USA. 146Leiden University, Leiden, The Netherlands. 147Rutgers
University, Newark, USA. 148Core Research Laboratory, ISPRO, Florence, Italy. 149Caltech, Pasadena, USA.
150University of Alberta, Edmonton, USA. 151Montreal Heart Institute and Université de Montréal, Montreal,
Canada. 152McGill University, Montreal, Canada. 153Antwerp University, Antwerp, Belgium. 154University of Bari
Aldo Moro, Bari, Italy. 155Alma Mater Studiorum-University of Bologna, Bologna, Italy. 156University of Toledo,
Toledo, USA. 157International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland. 158Infection
Medicine, University of Edinburgh The Chancellor’s Building, Edinburgh, UK. 159Emory University, Atlanta, USA.
160University of Pennsylvania, Philadelphia, USA. 161Helmholtz Zentrum München, Munich, Germany.
162Washington University School of Medicine, St. Louis, USA. 163CENUR Litoral Norte, Universidad de La República,
Montevideo, Uruguay. 164University of Southampton, Southampton, UK. 165Centre for Cancer Biology, University
of South Australia, Adelaide, Australia. 166University of Missouri, Columbia, USA. 167Baylor College of Medicine,
Houston, USA. 168Yale University, New Haven, USA. 169Reno School of Medicine, University of Nevada, Reno, USA.
170University of Manitoba and CancerCare Manitoba, Winnipeg, Canada. 171Goethe University Frankfurt, Frankfurt,
Germany. 172Oklahoma Medical Research Foundation/Oklahoma City VA Medical Center, Oklahoma City, USA.
173Oklahoma Medical Research Foundation, Oklahoma City, USA. 174Department of Biomolecular Sciences, School
of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo, Ribeirão Preto, SP, Brazil. 175Ulsan National
Institute of Science and Technology, Ulsan, South Korea. 176University of New Mexico Comprehensive Cancer
Center, Albuquerque, USA. 177Ulsan National Institute of Science and Technology (UNIST), Ulsan, South Korea.
178University of Aberdeen, Aberdeen, UK. 179University of Turin, Turin, Italy. 180Universidad de La República, CenUR
LN, Montevideo, Uruguay. 181Navarrabiomed-IdiSNA, Pamplona, Spain. 182Independent, Los Angeles, USA.
183Vanderbilt University Medical Center, Nashville, USA. 184University of Hyderabad, Hyderabad, India.
185University of Sydney, Sydney, Australia. 186City of Hope Medical Center, Duarte, USA. 187Weill Cornell Medicine,
New York, NY 10065, USA. 188University of Toledo College of Medicine and Life Sciences, Toledo, USA. 189School of
Medicine, Wayne State University, Detroit, USA. 190University of Georgia, Athens, USA. 191The Hospital for Sick
Children, Toronto, Canada. 192United States Department of Agriculture, Agricultural Research Service (USDA-
ARS), Washington, DC, USA. 193McMaster University, Hamilton, Canada. 194University of California, Riverside,
Riverside, USA. 195The University of Western Australia, Perth, Australia. 196The University of Connecticut, Storrs,
USA. 197Duke University School of Medicine, Durham, USA. 198University of Nebraska-Lincoln, Lincoln, USA.
199Sungshin University, Seoul, South Korea. 200University of Adelaide, Adelaide, Australia. 201University Toronto,
Toronto, Canada. 202University of Texas Southwestern Medical Center, Dallas, USA. 203Centro Nacional de
Biotecnologia/CSIC, Madrid, Spain. 204Centro de Investigación Médica Aplicada, Pamplona, Spain. 205Centro de
Investigación Médica Aplicada, Universidad de Navarra, Pamplona, Spain. 206University of Wisconsin-Madison,
Madison, USA. 207Indiana University School of Medicine, Indianapolis, USA. 208University of Montana, Missoula,
USA. 209Brunel University London, London, UK. 210Chapman University, Orange, USA. 211Weill Cornell Medicine
Qatar, Ar‑Rayyan, Qatar. 212Université du Québec À Montréal, Montreal, Canada. 213National Taiwan University,
Taipei, Taiwan. 214Rhodes College, Memphis, USA. 215Harvard School of Public Health, Boston, USA. 216University
of Central Florida, Orlando, USA. 217University of Rochester, Rochester, USA. 218George Mason University, Fairfax,
USA. 219University of Oulu, Oulu, Finland. 220Instituto Investigación Sanitaria La Fe, Valencia, Spain. 221Ludwig-
Maximilians-University, Munich, Germany. 222Brandeis University, Waltham, USA. 223Universidad de La República,
CENUR Litoral Norte, Montevideo, Uruguay. 224Arabian Gulf University, Manama, Bahrain. 225University of
Cambridge, Cambridge, UK. 226University of Magna Graecia, Catanzaro, Italy. 227Massachusetts General Hospital,
Boston, USA. 228University of Missouri-Columbia, Columbia, USA. 229Marquette University, Milwaukee, USA.
230University of North Dakota, Grand Forks, USA. 231Simon Fraser University, Burnaby, Canada. 232CancerCare
Manitoba Research Institute (CCMR), Winnipeg, Canada. 233The University of Queensland, Brisbane, Australia.
234University of Manitoba and CancerCare Manitoba Research Institute, Winnipeg, Canada. 235Sanford Burnham
Prebys, La Jolla, USA. 236University of Amsterdam, Amsterdam, The Netherlands. 237UConn Health, Farmington,
USA. 238The University of Texas MD Anderson Cancer Center, Houston, USA. 239Walter Sisulu University, Mthatha,
South Africa. 240Virginia Tech, Blacksburg, USA. 241University of Houston, Houston, USA. 242Rutgers University,
New Brunswick, USA. 243University of Nevada, Reno, USA. 244Universidad de León, León, Spain. 245School of
Medicine, Case Western Reserve University, Cleveland, USA. 246School of Medicine, UConn Health, Farmington,
USA. 247University of Rochester Medical Center, Rochester, USA. 248Chapman University School of Pharmacy,
Irvine, USA. 249University of Winnipeg/St. Boniface Research Centre, Winnipeg, Canada. 250Fred Hutchinson Cancer
Center, Seattle, USA. 251Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany.
252Technical University of Denmark, Kongens Lyngby, Denmark. 253The City College of New York, New York, USA.
254Children’s Cancer, Therapy Development Institute (Cc-TDI), Beaverton, USA. 255Soka University, Hachioji, Japan.
256Institute of Clinical Physiology, National Research Council, Pisa, Italy. 257Charles Sturt University, Bathurst,
Australia. 258Washington University, St Louis, USA. 259University of Dundee, Dundee, UK. 260Montreal Heart
Institute, Montreal, Canada. 261Columbia University Vagelos College of Physicians and Surgeons, Columbia, USA.
262Georgetown University, Washington, USA. 263Duke University, Durham, USA. 264Center for Applied Medical
Research, University of Navarra, Pamplona, Spain. 265King’s College London, London, UK. 266Precision Vaccines
Program, Division of Infectious Diseases, Boston Children’s Hospital, Boston, USA. 267Fred Hutchinson Cancer
Research Center, Seattle, USA. 268Louisiana State University, Baton Rouge, USA. 269Massey University, Palmerston
North, New Zealand. 270Wake Forest University School of Medicine, Winston‑Salem, USA. 271Central South
University, Changsha, China. 272SUNY Upstate Medical University, Syracuse, USA. 273University of Oxford, Oxford,
UK. 274Goethe-University, Frankfurt, Frankfurt, Germany. 275Institute for Cardiovascular Prevention (IPEK),
Ludwig-Maximilians-Universität München, Munich, Germany. 276Harvard T.H. Chan School of Public Health,
Boston, USA. 277School of Medicine, Boston University, Boston, USA. 278University of Texas Health Science Center
at San Antonio, San Antonio, USA. 279University of Helsinki, Helsinki, Finland. 280Wadsworth Center, NYSDOH,
Albany, USA. 281The University of Sydney, Sydney, Australia.
