Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery
Current Medicinal Chemistry, XXXX, XX, XX-XX 1
REVIEW ARTICLE
1 Western Michigan University, 1903 West Michigan Ave, Kalamazoo, MI 49008, United States;
2 Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre/RS 90619-900, Brazil;
3 Specialization Program in Bioinformatics, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre/RS 90619-900, Brazil
Abstract: Background: CDK2 participates in the control of eukaryotic cell-cycle progression. Owing to the great interest in CDK2 for drug development and the relative ease of crystallizing this enzyme, over 400 structures of this protein target are available. These structural data are the basis for developing computational models to estimate CDK2-ligand binding affinity.
Objective: This work focuses on recent developments in the application of supervised machine learning to build scoring functions that predict the binding affinity of CDK2-ligand complexes.
Method: We employed the structures available at the Protein Data Bank and the ligand information accessed from BindingDB, Binding MOAD, and PDBbind. With these data, we evaluated the predictive performance of machine learning techniques combined with physical modeling to calculate binding affinity. We compared this hybrid methodology with the classical scoring functions available in docking programs.
Results: Our comparative analysis of previously published models covered a model that combines a mass-spring system with a cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes. This model outperformed the classical scoring functions available in AutoDock4 and AutoDock Vina.
Conclusion: All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions for calculating binding affinities. Specifically for CDK2, the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance for calculating protein-ligand binding affinity. These results find theoretical support in the concept of scoring function space.
Keywords: chemical space, physical modeling, CDK2, scoring function space, drug design, crystal structure, machine learning
ARTICLE HISTORY
Received: March 25, 2021
Revised: June 15, 2021
Accepted: June 22, 2021
DOI: 10.2174/0929867328666210806105810
the structural information generated by crystallography [5, 9].
Studying the physical basis of intermolecular interactions in protein systems, we know that the key aspects defining the molecular recognition process involve van der Waals contacts [10], electrostatic interactions [2, 11], hydrogen bonding [12], and entropy [13]. The most robust theoretical approach to calculating the energetics of intermolecular interactions is quantum mechanics [14-21], which evaluates intermolecular interactions with precision [22]. Quantum-mechanics approaches have been successful in drug discovery applications involving protein-ligand docking simulations and scoring function development [17].
Considering the potential of quantum mechanics for drug discovery, we may highlight a prospect for the future. Combining quantum computing methodologies with supervised machine learning software will generate fewer false-positive leads in docking screens for drug discovery [23]. As an alternative to quantum mechanics methods, we may approach the intermolecular interactions of a drug and a macromolecule through molecular dynamics simulations of protein-ligand complexes [24-29]. Besides quantum mechanics and molecular dynamics, we may also address protein-ligand interactions by training machine learning models targeted to specific protein systems.
In this scenario, the study of intermolecular interactions through the combination of protein-ligand docking simulations and machine learning methods to develop targeted scoring functions has shown the potential to generate robust computational models that predict binding affinity [30-32]. These approaches are also adequate for assessing the structural features responsible for the molecular recognition process. This type of integration of structural data and machine learning techniques has been successfully applied to a wide range of protein targets. Examples include cyclin-dependent kinases (EC 2.7.11.22) [33, 34], proteases [35-38], and, more recently, SARS-CoV-2 drug targets [39-43].
In recent years, machine learning methods have become available in Python and R libraries. As a result, we have witnessed a great number of computational tools devoted to generating models that calculate affinity based on the atomic coordinates of protein-ligand complexes. Among the recently published machine learning programs to estimate binding affinity or thermodynamic parameters, we may highlight the following computational tools: Statistical Analysis of Docking Results and Scoring Functions (SAnDReS) [44, 45], Tool to Analyze the Binding Affinity (Taba) [46, 47], Pafnucy [48], property-encoded shape distributions together with standard support vector machine (PESD-SVM) [49], Neural-Network-Based Scoring function (NNScore series) [50-52], and Random Forest Score (RF-Score series) [53-57].
In this review, we describe recent applications of machine learning methods to estimate the binding affinity of ligands against protein targets. These computational methods use experimental protein-ligand structures for which binding data are available. The synergy of crystal data and machine learning techniques paved the way to explore the scoring function space [9, 58, 59]. This space establishes a theoretical framework for addressing the challenging study of protein-drug interactions. A theoretical basis for creating targeted scoring functions is the solid foundation needed to strengthen the computational models designed for specific protein targets. With it, these models become much more than fortuitous statistical models that predict a biological response.
This scenario makes it clear that the study of complex systems found in cells targeted by drugs is amenable to an abstraction brought about by the concept of scoring function space [9, 58, 59]. Here, we focus on the application of machine learning methods to crystallographic structures of cyclin-dependent kinase 2 (CDK2). For this protein target, experimental information is available for both three-dimensional structures and binding affinity. This makes the system ideal for developing targeted scoring functions through machine learning techniques.
2. METHODS
The PDB has recently reached over 175,000 structures (search carried out on March 24, 2021). This amount of structural information adds to the experimental binding affinity data available at BindingDB [60, 61], Binding MOAD (Mother of All Databases) [62-64], and PDBbind [65, 66]. These three databases are integrated at the PDB, which allows us to perform searches to recover structures for which binding affinity or thermodynamic parameters are known.
To highlight the recent progress in the application of machine learning techniques, we describe computational models to predict the affinity of ligands for CDK2. To focus on previously published results for this protein, we provide an updated count of the structures for which experimental binding affinity data are available.
2.1. CDK2Ki Dataset
We considered only CDK2 crystal structures for which experimental inhibition constant (Ki) data are available. We updated a recently published CDK dataset [46] that contains not only CDK2 but also CDK9 structures, and we eliminated the CDK9 data (search carried out on March 24, 2021). We show the selected PDB access codes in Table 1. Supplementary material 1 brings the CDK2 inhibitor structures for all entries in the CDK2Ki dataset.
Table 1. PDB access codes of the CDK2Ki dataset.
Type of Dataset | PDB Access Codes
Training set | 1E1X, 1H1S, 1OGU, 1PXN, 1PXP, 2CLX, 2EXM, 2FVD, 3LFN, 4ACM, 4BCK, 4BCM, 4BCN, 4BCO, 4BCP, 4BCQ, 4EOP, 4FKO, 4NJ3, 5D1J
Test set | 1E1V, 1JSV, 1PXM, 1PXO, 1PYE, 1V1K, 2XMY, 2XNB, 3LFS
As previously highlighted, we find binding affinity data for CDK2 in BindingDB [57, 58], Binding MOAD (Mother of All Databases) [59-61], and PDBbind [62, 63]. IC50 data rely on a wide range of techniques to evaluate binding. Ki data, on the other hand, come from a smaller set of experimental techniques, but there is still no uniform method to measure ligand binding to CDK2 [46]. A potentially more reliable experimental approach would be to measure the energetics of CDK2-ligand interactions by isothermal titration calorimetry (ITC). Unfortunately, ITC data on the Gibbs free energy of binding for CDK2-ligand complexes are scarce [30, 31]. Due to these challenges, we focused our analysis on previously published machine learning modeling of Ki data. We eliminated repeated ligands. For CDK2-ligand complexes with more than one source of binding affinity, we chose the most recently published results.
2.2. Classical Scoring Functions
We calculated binding affinity from the atomic coordinates of the protein-ligand complexes in the CDK2Ki dataset. For this, we employed the classical scoring functions implemented in the docking programs AutoDock4 [67, 68] and AutoDock Vina (version 1.1.2) [69]. Ligand and protein atomic partial charges were assigned using the Partial Equalization of Orbital Electronegativities (PEOE) algorithm [70] in AutoDockTools4 [67] (version 1.5.6). No protein-ligand docking simulations were carried out; the binding affinity calculation was based exclusively on the atomic coordinates of the crystallographic structures.
2.3. Combining Physical Modeling with Machine Learning
The program Taba (version 1.0) estimates ligand binding affinity with an approach that models protein-ligand interactions as a mass-spring system [46, 47]. In this method, we consider that the key determinants of protein-ligand binding affinity are already registered in the three-dimensional structure of the complexes. We estimate the energy of the system using a polynomial equation. In this equation, each term (independent variable) describes an isolated mass-spring system, composed of a potential equation for a pair of atoms.
The Taba scoring function relies on simple modeling of protein-ligand energetics. We treat the protein-ligand interaction as a mass-spring system, as delineated in Fig. (1). The energetics of the intermolecular interactions are imprinted in the three-dimensional structure, and we model this complex network of interactions as independent mass-spring systems. In this representation, the energy of the protein-ligand complex is the sum of the contributions of each type of atom pair found in the structure. We express the potential energy of the protein system (V) by the following equation:
$$ V(x, y, z) = \sum_{i}\sum_{j} \omega_{i,j}\,\left(d_{i,j} - d_{0,i,j}\right)^{2} \qquad (1) $$
In Equation (1), $\omega_{i,j}$ is the weight of each independent variable. We determine these weights through machine learning techniques. The double summation in Equation (1) is taken over all protein atoms (i) and ligand atoms (j) inside a defined volume of the structure. The term $d_{0,i,j}$ is the average interatomic distance for a given pair of atoms i and j, calculated over all structures in the training set. The program Taba calculates the terms $\omega_{i,j}$ and $d_{0,i,j}$ taking all structures in the training set. The term $d_{i,j}$ is the Euclidean distance for a pair of atoms in one specific structure (not averaged over all structures) [46].
Taba reads the coordinates of all structures in a dataset and calculates the average distance between the atoms in the protein (P) and the ligand (L). In this approach, we have average distances for different types of atom pairs: one for carbon (P)-carbon (L), another for oxygen (P)-nitrogen (L), and so on. In each atom pair (PL), one atom comes from the ligand and the other from the protein. Taba considers these average interatomic distances as the equilibrium distances of
4 Current Medicinal Chemistry, XXXX, Vol. XX, No. XX Veit-Acosta and de Azevedo Junior
our mass-spring system and determines the relative weights of each energy term using supervised machine learning techniques [71]. In the final model developed using Taba, we keep only the most relevant energy terms. In this review, we describe the machine learning models [72] developed for the CDK2Ki dataset. Taba elegantly combines physical modeling with supervised machine learning techniques to address protein-ligand interactions. Fig. (2) outlines a schematic flowchart with the major steps of the Taba methodology [46].
computational tool [44, 45]. SAnDReS makes use of supervised machine learning methods available in the scikit-learn library [71] to generate polynomial empirical scoring functions that predict binding affinity [73-76]. These polynomial equations employ two kinds of input. The first is the energy terms calculated using the previously highlighted protein-ligand docking programs [67-69]. The second is the crystallographic coordinates of protein and ligand. SAnDReS applies the same k-fold cross-validation approach described for the program Taba.
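A minimal Python sketch of how Equation (1) could be evaluated is shown below. It is not Taba's code, and it rests on several assumptions. Atom-pair types are keyed by (protein element, ligand element), as described above. Each selected pair type contributes one feature, equal to the sum of (d_ij - d0)^2 over its pairs inside a distance cutoff. The weights are already fitted, since the text states only that they come from supervised machine learning. The 6.0 Å cutoff, the atom-tuple input format, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Assumed cutoff (in Å) standing in for the "defined volume" of Equation (1);
# the actual value used by Taba is not given in this section.
CUTOFF = 6.0


def pair_distances(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Group protein-ligand distances by (protein element, ligand element).

    Each atom is a (element_symbol, (x, y, z)) tuple with coordinates in Å,
    a hypothetical input format chosen for this sketch.
    """
    by_type = defaultdict(list)
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)  # Euclidean distance d_ij
            if d <= cutoff:
                by_type[(p_elem, l_elem)].append(d)
    return by_type


def equilibrium_distances(training_complexes, cutoff=CUTOFF):
    """d0 per atom-pair type: mean distance pooled over all training structures."""
    pooled = defaultdict(list)
    for protein_atoms, ligand_atoms in training_complexes:
        for pair_type, ds in pair_distances(protein_atoms, ligand_atoms, cutoff).items():
            pooled[pair_type].extend(ds)
    return {pair_type: sum(ds) / len(ds) for pair_type, ds in pooled.items()}


def spring_features(protein_atoms, ligand_atoms, d0, pair_types, cutoff=CUTOFF):
    """One feature per selected pair type: sum of (d_ij - d0)^2 over pairs of that type.

    A pair type absent from this complex contributes 0; a pair type present in
    the complex must also have a d0 value from the training set.
    """
    by_type = pair_distances(protein_atoms, ligand_atoms, cutoff)
    return [sum((d - d0[t]) ** 2 for d in by_type.get(t, [])) for t in pair_types]


def predicted_affinity(features, intercept, weights):
    """Linear scoring function: omega_0 + sum_k omega_k * feature_k."""
    return intercept + sum(w * f for w, f in zip(weights, features))


# Toy usage with made-up coordinates (not CDK2 data):
training = [
    ([("C", (0.0, 0.0, 0.0))], [("C", (4.0, 0.0, 0.0))]),  # C-C distance 4.0 Å
    ([("C", (0.0, 0.0, 0.0))], [("C", (0.0, 3.0, 0.0))]),  # C-C distance 3.0 Å
]
d0 = equilibrium_distances(training)
feats = spring_features(*training[0], d0, [("C", "C")])
print(predicted_affinity(feats, intercept=-6.0, weights=[2.0]))
```

Aggregating one feature per pair type keeps the number of weights small enough to fit from a few dozen structures. How Taba aggregates individual atom pairs internally is an assumption here.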
terms available in docking programs. Taba, on the other hand, employs physical modeling of the intermolecular interactions. SAnDReS and Taba offer the following supervised machine learning methods taken from scikit-learn [71]: Ridge, Lasso, Elastic Net, and Ordinary Linear Regression. For the first three methods, an additional cross-validated variant is available. Taken together, each program offers seven supervised machine learning techniques.
2.6. Statistical Analysis
We assessed the predictive power of the scoring functions by calculating the correlation coefficients [77], p-values, and root-mean-squared error (RMSE) between the experimental data and the predicted binding affinity. The predictions came from three sources:
- the classical scoring functions [67-71];
- the empirical polynomial scoring functions developed using SAnDReS [44, 45];
- the Taba mass-spring models [46, 47].
We generated the machine learning models using approximately 70% of the structures in the CDK2Ki dataset (training set) and ~30% of the dataset as a test set, as suggested by Cichero et al. 2010 [78].
Taba uses four significant figures to express the interatomic distances that model protein systems. With this number of significant figures, distances are given to 1/1000 of an Å, as adopted in the PDB to express atomic coordinates of macromolecular structures [6, 7]. For interatomic distances, the X-ray diffraction crystallographic resolution is not the error associated with the atomic coordinates. These errors are not necessarily in the range of 0.001 Å for the atomic coordinates and distances. Statistical analysis of the experimental X-ray diffraction data and the final structural model, such as the Luzzati plot, indicates an associated error of about 0.2 Å for a CDK2 structure with a resolution of 2.4 Å. For log(Ki), we adopted two significant figures, taking the experimental Ki data from the binding databases [60-66].
3. RESULTS AND DISCUSSION
3.1. Biological System
In this review, we focus on the application of machine learning techniques to predict the binding affinity of ligands against structures of CDK2. This protein is an interesting biological system for the development of scoring functions for two main reasons. Firstly, crystallographic structures with known binding affinity data are available. Secondly, CDK2 is important for drug discovery and development [79]. CDK2 is a target for the development of anticancer drugs [80-83]. A search on clinicaltrials.gov using CDK2 and cancer as keywords returned 11 trials, including six that are either recruiting or active (search carried out on March 24, 2021). Among the CDK inhibitors identified so far, we may highlight the FDA-approved drug palbociclib, which is used to treat postmenopausal women with breast cancer [84-91].
Our starting point is the CDK2Ki dataset [46], filtered in two ways. First, we removed the data related to CDK9. Second, we eliminated ligands whose binding affinity information from the PDB was inconsistent among PDBbind, BindingDB, and Binding MOAD. After this filtering, we ended up with 29 structures. These inconsistencies are related to different binding affinity values for the same ligand.
In the CDK2Ki dataset, all crystallographic structures have competitive inhibitors non-covalently bound to the ATP-binding pocket of CDK2, with resolution ranging from 1.55 to 2.8 Å. CDK2 has two domains: the N-terminal domain, composed of a distorted beta-sheet, and the C-terminal domain, made mostly of alpha-helical structures, as indicated in Fig. (3). The ATP-binding pocket of CDK2 lies between the two domains, and all competitive inhibitors bind to this pocket. We calculated the volume of this binding site using Molegro Virtual Docker (MVD) (version 6) [92-95] and a probe with a radius of 1.2 Å. The calculation indicated a volume of 201.728 Å³, which allows a wide range of different ligand structures to fit into this volume [82, 83].
In Fig. (4), we show the binding pocket of the structure of CDK2 in complex with roscovitine [96], where we highlight the two main residues of CDK2 participating in intermolecular interactions. Previously published intermolecular contact analyses examined the residues participating in interactions between inhibitors and the ATP-binding pocket. These analyses indicated the participation of the main-chain oxygen and nitrogen atoms of Leu 83 and Glu 81 of CDK2 in most complexes with highly specific CDK2 inhibitors [97-112].
We have 415 structures of CDK2 deposited in the PDB (search carried out using the UniProt Molecule Name cyclin-dependent kinase 2 on March 24, 2021). Among these structures, 212 entries have validated inhibitors bound to the ATP-binding pocket of CDK2. Most of these ligands have IC50 data, and a minority have Ki data. Analysis of the intermolecular interactions of these complex structures indicated that most of the inhibitors show intermolecular hydrogen bonds involving the main-chain nitrogen and oxygen of Leu 83 and Glu 81, forming a sequence of spots for the binding of
inhibitors named the molecular fork [82]. In Fig. (4), we see two of these intermolecular hydrogen bonds involving main-chain atoms of the residue Leu 83.
Fig. (3). Structure of human CDK2 in complex with the inhibitor roscovitine (PDB access code: 2A4L). Roscovitine is indicated with thick lines, and the ribbons represent the protein structure. We used the program MVD (version 6) [92-95] to generate this figure (A higher resolution / colour version of this figure is available in the electronic copy of the article).
Fig. (4). Intermolecular hydrogen bonds of the inhibitor roscovitine (RRC) with the residue Leu 83 of CDK2. Dashed lines indicate hydrogen bonds. On the right, we have the structure of the residue Glu 81, which participates in intermolecular hydrogen bonds in other CDK2-inhibitor structures. We used the program MVD (version 6) [92-95] to generate this figure (A higher resolution / colour version of this figure is available in the electronic copy of the article).
3.2. Binding Affinity with Classical Scoring Functions
AutoDock4 and AutoDock Vina have been successfully applied to different protein targets to identify potential hits. Their scoring functions are fully described in references [113-115]. On the other hand, the evaluation of binding affinity using the scoring functions available in these programs has not been reliable [46, 47, 116-118]. We applied AutoDock4 to the ligands of the CDK2Ki training set in their crystallographic positions. This generated a Spearman rank correlation coefficient (ρ) of 0.358 with a p-value of 0.12. Analysis of the predictive power of each energy term used in the AutoDock4 scoring function generated ρ ranging from -0.348 to 0.359, all with p-values > 0.1. Analysis of the predictive performance for the structures in the test set produced ρ ranging from -0.183 to 0.367, all with p-values > 0.1. Supplementary materials 2 and 3 bring the predicted and experimental binding affinities for all structures in the CDK2Ki training and test sets, respectively.
Assessment of the binding affinity of the structures in the training set using the full scoring function and the energy terms available in AutoDock Vina generated ρ ranging from -0.171 to 0.224, with p-values > 0.1. The highest correlation was obtained for the full scoring function of AutoDock Vina. Analysis of the correlation for the structures in the test set showed ρ ranging from -0.417 to 0.117, with p-values > 0.1. Supplementary materials 4 and 5 have the binding affinities calculated using AutoDock Vina for all structures in the CDK2Ki training and test sets, respectively.
The predictive performance of both classical scoring functions is poor. One possible reason for this failure is the methodology applied in the creation of these computational models. Most classical scoring functions use energy terms for van der Waals, electrostatic energy, hydrogen bonding, and solvation effects. They then determine the relative weight of each energy term by regression [1, 119-124]. Such an approach creates a model biased against structures not employed in the training set. These computational models tend to work for proteins present in the training set used to determine the relative weights of each energy term in the empirical scoring function. On the other hand, protein systems absent from the original training set, or poorly represented in it, may fall outside the scope of the classical scoring function [44, 45]. This leads to a low correlation with experimental data, as observed for the structures in the CDK2Ki dataset.
3.3. Binding Affinity with SAnDReS
SAnDReS aims to integrate all the steps necessary to create machine learning models in one suite of programs [44]. SAnDReS has been applied to a wide range of protein systems [125-143]. It has successfully generated machine learning models that outperform classical scoring functions in the prediction of binding affinity [45, 144-151].
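Section 2.6 evaluates every scoring function with Spearman's ρ, its p-value, and RMSE. The document does not state which software computed the p-values. The dependency-free sketch below therefore swaps in an exact permutation test, which can differ slightly from the t-distribution approximation used by common statistics libraries. Tied values receive average ranks, and all function names are hypothetical.

```python
import itertools
import math


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(experimental, predicted):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(experimental), average_ranks(predicted))


def permutation_p_value(experimental, predicted):
    """Exact two-sided p-value: share of all rank permutations with |rho| >= observed.

    Enumerates n! permutations, so it is only practical for small sets
    (9! = 362,880 for a nine-structure test set, which takes a few seconds).
    """
    rx, ry = average_ranks(experimental), average_ranks(predicted)
    observed = abs(pearson(rx, ry))
    hits = sum(
        1 for perm in itertools.permutations(ry)
        if abs(pearson(rx, perm)) >= observed - 1e-12  # tolerance for float round-off
    )
    return hits / math.factorial(len(ry))


def rmse(experimental, predicted):
    """Root-mean-squared error between experimental and predicted affinities."""
    n = len(experimental)
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(experimental, predicted)) / n)


# Toy values (not from the CDK2Ki dataset):
exp_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_vals = [1.5, 2.5, 3.5, 4.5, 5.5]
print(spearman_rho(exp_vals, pred_vals),
      permutation_p_value(exp_vals, pred_vals),
      rmse(exp_vals, pred_vals))
```

This toy example also shows why the section reports both statistics. A constant offset leaves ρ at 1.0 but still produces a non-zero RMSE. For real data, `scipy.stats.spearmanr` returns ρ together with an approximate p-value and would be the usual choice.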
We applied the machine learning methods of SAnDReS to the structures in the training set and the energy terms calculated using the program AutoDock4. This generated polynomial equations with three independent variables (features, in machine learning terminology), which gives more than six observations for each independent variable. These polynomial scoring equations are described elsewhere [125-143]. A ratio of five observations (structures) per independent variable (or descriptor) is the minimum required by the rule of thumb recommended for regression models [152, 153]. Among the seven machine learning methods available in SAnDReS, the cross-validated Elastic Net showed the best overall predictive performance. The model with the highest correlation has ρ = 0.319, a p-value of 0.17, and RMSE = 1.1, a correlation worse than that of the classical scoring functions. The correlation for the structures in the test set was also poor, with ρ = -0.183, p-value = 0.64, and RMSE = 1.7. Fig. (5a) shows the scatter plot for the test set structures. The polynomial equation for the predicted affinity (PA) generated using SAnDReS is shown in Equation (2):
PA(model 1) = −6.5 − 0.0416(Final Intermolecular Energy) + 0.0416(vdW + Hbond + Desolvation Energy) + 0.192(Final Total Internal Energy)   (2)
In Equation (2), the variables vdW and Hbond represent the van der Waals and hydrogen bond energy terms, respectively. A detailed description of each energy term is available elsewhere [67].
Using the same approach for the energy terms calculated using AutoDock Vina, the model with the highest correlation has ρ = 0.537, a p-value of 0.015, and RMSE = 1.0. The expression of this machine learning model is given in Equation (3):
PA(model 2) = −3.2 − 0.00288(Gauss2) + 0.00982(Repulsion) + 0.0220(Hydrophobic)   (3)
The energy terms in Equation (3) are described elsewhere [69]. Analysis of this polynomial model against the structures in the test set indicated ρ = 0.067, a p-value of 0.86, and RMSE = 2.0. Fig. (5b) shows the scatter plot for the test set structures. The model generated using the AutoDock Vina energy terms showed some promising results for the training set. However, evaluation against the test set showed poor predictive power.
3.4. Binding Affinity with Taba
Taba addresses protein-drug interactions as a mass-spring system and combines it with an integrated application of supervised machine learning techniques. The result is a targeted scoring function in which the independent variables are mass-spring micro-systems composed of pairs of atoms. A previously published study used this approach to generate a computational model calibrated for CDK structures [46, 47]. That model predicted binding affinity with superior performance compared with classical scoring functions.
In the present study, we focused on a previously described CDK machine learning model [46], filtered for CDK2 structures, to create a CDK2-targeted scoring function. In this model, we deleted the CDK9 data to keep only CDK2 structures. We used a polynomial equation with three independent variables, taking as energy terms the contributions of the following atom pairs: C-C, C-S, and O-O. Each independent variable is a mass-spring potential energy function. The equilibrium distance for each atom pair was calculated as the average distance over all structures in the training set. We tested the seven machine learning approaches available in Taba, and the cross-validated Elastic Net again showed the highest correlation between experimental and predicted affinities. The machine learning model determined using Taba has the following expression:
$$ \mathrm{PA}(\text{model 3}) = \omega_0 + \omega_1\,(d_{C,C} - d_{0,C,C})^{2} + \omega_2\,(d_{C,S} - d_{0,C,S})^{2} + \omega_3\,(d_{O,O} - d_{0,O,O})^{2} \qquad (4) $$
In Equation (4), the weights are $\omega_0$ = -6.6, $\omega_1$ = 0.132, $\omega_2$ = 0.461, and $\omega_3$ = 0.226. The equilibrium distances are $d_{0,C,C}$ = 4.078 Å, $d_{0,C,S}$ = 4.120 Å, and $d_{0,O,O}$ = 3.663 Å.
For the training set, the Taba model has a correlation ρ = 0.750 with a p-value = 0.0001 and RMSE = 5.1. For the test set, we have ρ = 0.817 with a p-value = 0.007 and RMSE = 5.7. Supplementary materials 6 and 7 bring the predicted binding affinities for the training and test sets, respectively. Fig. (5c) shows the scatter plot for the test set structures.
Considering the correlation, the Taba model showed the best predictive performance compared with the classical scoring functions and the SAnDReS machine learning models. Nevertheless, the RMSE values for the training and test sets are relatively high for the Taba model (over 5.0), whereas the RMSE values of the SAnDReS models were all below 2.1. This might be due to the simplicity of the mass-spring approach to protein-ligand interactions.
Taking the classical scoring functions and the three machine learning models together, we may say that these models show some potential. Each, however, fails in at least one key aspect of the statistical analysis of predictive performance. Using the SAnDReS suite of programs, we developed a novel scoring function (model 4). It takes as inputs the models generated using terms from AutoDock4 (model 1), AutoDock Vina (model 2), and Taba (model 3). Applying the cross-validated Elastic Net method to the previously generated models gives the following expression:
PA(model 4) = −0.36 + 0.770 PA(model 1) + 0.0931 PA(model 2) + 0.200 PA(model 3)   (5)
For the training set, the new machine learning model (model 4) has ρ = 0.750 with a p-value = 0.0001, and RMSE = 0.7. For the test set, we have ρ = 0.733 with a p-value = 0.03, and RMSE = 1.3. Fig. (5d) shows the scatter plot for the test set structures. Compared with model 3, the correlation for the training set is the same, and the correlation for the test set is worse. In terms of RMSE, however, model 4 shows a significant improvement in both the training and test sets. This improvement in model 4 is due to the addition of terms for electrostatics, desolvation, and hydrogen bonding, which are not present in model 3.
Fig. (5). Scatter plots for predicted and experimental binding affinities. a) PA generated using energy terms available in AutoDock4 with machine learning modeling performed with SAnDReS (model 1) (ρ = -0.183, p-value = 0.64, and RMSE = 1.7). b) PA generated using energy terms available in AutoDock Vina with regression modeling carried out with SAnDReS (model 2) (ρ = 0.067, p-value = 0.86, and RMSE = 2.0). c) Mass-spring model generated using Taba (model 3) (ρ = 0.817, p-value = 0.007, and RMSE = 5.7). d) Machine learning model combining the three previous models, built using SAnDReS (model 4) (ρ = 0.733, p-value = 0.03, and RMSE = 1.3). We used the cross-validated Elastic Net to generate all machine learning models. We used the program SAnDReS [44] to generate all plots in this figure.
These differences in the predictive performance of the machine learning models should always be considered in the context in which the models were applied. The limitations of the training sets for protein systems should also be kept in mind. We chose to focus on structures for which experimental data on both atomic coordinates and inhibition constants are available. This criterion limits the ratio of observations per independent variable but creates machine learning models strictly based on robust experimental information. Also, the ratio of observations per independent variable is poor from a machine learning point of view. Even so, we satisfy a well-established rule of thumb for modeling scoring functions that assess binding affinity based on atomic coordinates [152, 153]. In summary, considering RMSE and p-value, model 4 exhibits the best overall performance for the CDK2Ki dataset.
3.5. Scoring Function Space
The success of targeted scoring functions in predicting binding affinity established the basis for a mathematical abstraction for the development of computational models to address protein-ligand interactions [9, 58]. Taking a systems-level approach to this problem, we may investigate the relation involving the chemical [126, 154-161] and protein [162, 163] spaces. We can define a subset of the chemical space as the inhibitors of a specific enzyme and see this protein as an element of the protein space. In that view, this relation becomes a basis for searching the scoring function space. This mathematical space contains all potential computational models able to predict binding affinity taking as input the atomic coordinates of protein-ligand complexes. We apply machine learning methods to identify an adequate function to predict binding affinity for an element of the protein space, considering its relation with a subset of the chemical space.
Fig. (6) illustrates the relations involving the protein, chemical, and scoring function spaces. We consider CDK2 as an element of the protein space. We highlight a subset of the chemical space composed of CDK2 inhibitors. Then, we may use machine learning approaches to identify a model to predict binding affinity for this enzyme [9, 46-54]. This mathematical abstraction gives us a solid theoretical background to explain why the machine learning models developed using SAnDReS [44] and Taba [46] outperform classical scoring functions. Targeted scoring functions are the result of explorations of the scoring function space; thus, we define them for a single protein.
One way to think about this abstraction is to take the experimental binding and crystallographic data available for a given protein as a system. Through the application of machine learning methods, we then create a computational model tailored to this biological system. In doing so, we give up on finding a general scoring function for all proteins; we address this
Fig. (6). Schematic diagram illustrating the relations involving protein, chemical, and scoring function spaces. On the right, we take an element of the protein space, indicated by the green sphere. This element is CDK2. Then, we highlight a subset of the chemical space composed of CDK2 inhibitors. Finally, we apply machine learning techniques to explore the scoring function space to find an adequate model to predict the binding affinity. We used the program MVD (version 6) [92-95] to generate the chemical and protein spaces in this figure (a higher resolution/colour version of this figure is available in the electronic copy of the article).
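The linear combination in equation (5) and the ρ and RMSE values used to compare the models can be illustrated with a minimal Python sketch. It assumes that the predicted affinities (PA) of models 1-3 are already available as plain floats for each complex, and that ρ denotes Spearman's rank correlation (the reference list includes Zar's paper on its significance testing [77]). The function names are hypothetical; the sketch only applies the published coefficients, does not refit them, does not reproduce SAnDReS, and does not compute the p-values quoted in the text.

```python
import math

# Coefficients of equation (5): model 4 combines the predicted affinities
# (PA) of models 1, 2, and 3 linearly, plus an intercept.
INTERCEPT = -0.36
WEIGHTS = (0.770, 0.0931, 0.200)


def predict_model4(pa1, pa2, pa3):
    """Predicted affinity of model 4 for one complex, following equation (5)."""
    w1, w2, w3 = WEIGHTS
    return INTERCEPT + w1 * pa1 + w2 * pa2 + w3 * pa3


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted affinities."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the CDK2Ki dataset.
    model_preds = [(6.1, 5.8, 6.4), (7.3, 7.0, 6.9), (5.2, 5.5, 5.0), (8.0, 7.6, 7.8)]
    experimental = [6.0, 7.1, 5.4, 7.9]
    pa4 = [predict_model4(*p) for p in model_preds]
    print("rho =", round(spearman_rho(experimental, pa4), 3))
    print("RMSE =", round(rmse(experimental, pa4), 3))
```

If SciPy is available, `scipy.stats.spearmanr(observed, predicted)` returns the same coefficient together with a two-sided p-value, which is one way to obtain significance values like those quoted above; the hand-written version is kept here only to avoid a third-party dependency.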
https://doi.org/10.1002/jcc.21334 PMID: 19499576
[70] Gasteiger, J.; Marsili, M. Iterative partial equalization of orbital electronegativity - a rapid access to atomic charges. Tetrahedron, 1980, 36(22), 3219-3228. http://dx.doi.org/10.1016/0040-4020(80)80168-2
[71] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 2011, 12, 2825-2830.
[72] Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Series B Stat. Methodol., 2005, 67(2), 301-320. http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x
[73] de Azevedo, W.F.Jr.; Dias, R. Evaluation of ligand-binding affinity using polynomial empirical scoring functions. Bioorg. Med. Chem., 2008, 16(20), 9378-9382. http://dx.doi.org/10.1016/j.bmc.2008.08.014 PMID: 18829335
[74] Dias, R.; Timmers, L.F.; Caceres, R.A.; de Azevedo, W.F.Jr. Evaluation of molecular docking using polynomial empirical scoring functions. Curr. Drug Targets, 2008, 9(12), 1062-1070. http://dx.doi.org/10.2174/138945008786949450 PMID: 19128216
[75] Ducati, R.G.; Basso, L.A.; Santos, D.S.; de Azevedo, W.F.Jr. Crystallographic and docking studies of purine nucleoside phosphorylase from Mycobacterium tuberculosis. Bioorg. Med. Chem., 2010, 18(13), 4769-4774. http://dx.doi.org/10.1016/j.bmc.2010.05.009 PMID: 20570524
[76] de Azevedo, W.F.Jr.; Dias, R. Experimental approaches to evaluate the thermodynamics of protein-drug interactions. Curr. Drug Targets, 2008, 9(12), 1071-1076. http://dx.doi.org/10.2174/138945008786949441 PMID: 19128217
[77] Zar, J.H. Significance testing of the Spearman rank correlation coefficient. J. Am. Stat. Assoc., 1972, 67(339), 578-580. http://dx.doi.org/10.1080/01621459.1972.10481251
[78] Cichero, E.; Cesarini, S.; Mosti, L.; Fossa, P. CoMFA and CoMSIA analyses on 1,2,3,4-tetrahydropyrrolo[3,4-b]indole and benzimidazole derivatives as selective CB2 receptor agonists. J. Mol. Model., 2010, 16(9), 1481-1498. http://dx.doi.org/10.1007/s00894-010-0664-1 PMID: 20174844
[79] Wang, S.; Griffiths, G.; Midgley, C.A.; Barnett, A.L.; Cooper, M.; Grabarek, J.; Ingram, L.; Jackson, W.; Kontopidis, G.; McClue, S.J.; McInnes, C.; McLachlan, J.; Meades, C.; Mezna, M.; Stuart, I.; Thomas, M.P.; Zheleva, D.I.; Lane, D.P.; Jackson, R.C.; Glover, D.M.; Blake, D.G.; Fischer, P.M. Discovery and characterization of 2-anilino-4-(thiazol-5-yl)pyrimidine transcriptional CDK inhibitors as anticancer agents. Chem. Biol., 2010, 17(10), 1111-1121. http://dx.doi.org/10.1016/j.chembiol.2010.07.016 PMID: 21035734
[80] Tadesse, S.; Anshabo, A.T.; Portman, N.; Lim, E.; Tilley, W.; Caldon, C.E.; Wang, S. Targeting CDK2 in cancer: challenges and opportunities for therapy. Drug Discov. Today, 2020, 25(2), 406-413. http://dx.doi.org/10.1016/j.drudis.2019.12.001 PMID: 31839441
[81] Volkart, P.A.; Bitencourt-Ferreira, G.; Souto, A.A.; de Azevedo, W.F. Cyclin-dependent Kinase 2 in cellular senescence and cancer. A structural and functional review. Curr. Drug Targets, 2019, 20(7), 716-726. http://dx.doi.org/10.2174/1389450120666181204165344 PMID: 30516105
[82] Levin, N.M.B.; Pintro, V.O.; de Ávila, M.B.; de Mattos, B.B.; De Azevedo, W.F.Jr. Understanding the structural basis for inhibition of cyclin-dependent kinases. New pieces in the molecular puzzle. Curr. Drug Targets, 2017, 18(9), 1104-1111. http://dx.doi.org/10.2174/1389450118666161116130155 PMID: 27848884
[83] de Azevedo, W.F.Jr. Opinion paper: targeting multiple cyclin-dependent kinases (CDKs): a new strategy for molecular docking studies. Curr. Drug Targets, 2016, 17(1), 2. http://dx.doi.org/10.2174/138945011701151217100907 PMID: 26687602
[84] Pondé, N.; Wildiers, H.; Awada, A.; de Azambuja, E.; Deliens, C.; Lago, L.D. Targeted therapy for breast cancer in older patients. J. Geriatr. Oncol., 2020, 11(3), 380-388. http://dx.doi.org/10.1016/j.jgo.2019.05.012 PMID: 31171494
[85] Schoninger, S.F.; Blain, S.W. The ongoing search for biomarkers of CDK4/6 inhibitor responsiveness in breast cancer. Mol. Cancer Ther., 2020, 19(1), 3-12. http://dx.doi.org/10.1158/1535-7163.MCT-19-0253 PMID: 31909732
[86] Yuan, L.; Alexander, P.B.; Wang, X.F. Cellular senescence: from anti-cancer weapon to anti-aging target. Sci. China Life Sci., 2020, 63(3), 332-342. http://dx.doi.org/10.1007/s11427-019-1629-6 PMID: 32060861
[87] Frassoldati, A.; Biganzoli, L.; Bordonaro, R.; Cinieri, S.; Conte, P.; Laurentis, M.; Mastro, L.D.; Gori, S.; Lauria, R.; Marchetti, P.; Michelotti, A.; Montemurro, F.; Naso, G.; Pronzato, P.; Puglisi, F.; Tondini, C.A. Endocrine therapy for hormone receptor-positive, HER2-negative metastatic breast cancer: extending endocrine sensitivity. Future Oncol., 2020, 16(5), 129-145. http://dx.doi.org/10.2217/fon-2018-0942 PMID: 31849236
[88] Tamura, K. Differences of cyclin-dependent kinase 4/6 inhibitor, palbociclib and abemaciclib, in breast cancer. Jpn. J. Clin. Oncol., 2019, 49(11), 993-998. http://dx.doi.org/10.1093/jjco/hyz151 PMID: 31665472
[89] Rozeboom, B.; Dey, N.; De, P. ER+ metastatic breast cancer: past, present, and a prescription for an apoptosis-targeted future. Am. J. Cancer Res., 2019, 9(12), 2821-2831. PMID: 31911865
[90] Bonelli, M.; La Monica, S.; Fumarola, C.; Alfieri, R. Multiple effects of CDK4/6 inhibition in cancer: from cell cycle arrest to immunomodulation. Biochem. Pharmacol., 2019, 170, 113676. http://dx.doi.org/10.1016/j.bcp.2019.113676 PMID: 31647925
[91] Grizzi, G.; Ghidini, M.; Botticelli, A.; Tomasello, G.; Ghidini, A.; Grossi, F.; Fusco, N.; Cabiddu, M.; Savio, T.; Petrelli, F. Strategies for increasing the effectiveness of aromatase inhibitors in locally advanced breast cancer: an evidence-based review on current options. Cancer Manag. Res., 2020, 12, 675-686. http://dx.doi.org/10.2147/CMAR.S202965 PMID: 32099464
[92] Thomsen, R.; Christensen, M.H. MolDock: a new technique for high-accuracy molecular docking. J. Med. Chem., 2006, 49(11), 3315-3321. http://dx.doi.org/10.1021/jm051197e PMID: 16722650
[93] Heberlé, G.; de Azevedo, W.F.Jr. Bio-inspired algorithms applied to molecular docking simulations. Curr. Med. Chem., 2011, 18(9), 1339-1352. http://dx.doi.org/10.2174/092986711795029573 PMID: 21366530
[94] Bitencourt-Ferreira, G.; de Azevedo, W.F.Jr. Molegro virtual docker for docking. Methods Mol. Biol., 2019, 2053, 149-167. http://dx.doi.org/10.1007/978-1-4939-9752-7_10 PMID: 31452104
[95] de Azevedo, W.F.Jr. Moldock applied to structure-based virtual screening. Curr. Drug Targets, 2010, 11(3), 327-334. http://dx.doi.org/10.2174/138945010790711941 PMID: 20210757
[96] de Azevedo, W.F.; Leclerc, S.; Meijer, L.; Havlicek, L.; Strnad, M.; Kim, S.H. Inhibition of cyclin-dependent kinases by purine analogues: crystal structure of human cdk2 complexed with roscovitine. Eur. J. Biochem., 1997, 243(1-2), 518-526. http://dx.doi.org/10.1111/j.1432-1033.1997.0518a.x PMID: 9030780
[97] Krystof, V.; Cankar, P.; Frysová, I.; Slouka, J.; Kontopidis, G.; Dzubák, P.; Hajdúch, M.; Srovnal, J.; de Azevedo, W.F.Jr.; Orság, M.; Paprskárová, M.; Rolcík, J.; Látr, A.; Fischer, P.M.; Strnad, M. 4-arylazo-3,5-diamino-1H-pyrazole CDK inhibitors: SAR study, crystal structure in complex with CDK2, selectivity, and cellular effects. J. Med. Chem., 2006, 49(22), 6500-6509. http://dx.doi.org/10.1021/jm0605740 PMID: 17064068
[98] Canduri, F.; Perez, P.C.; Caceres, R.A.; de Azevedo, W.F.Jr. CDK9 a potential target for drug development. Med. Chem., 2008, 4(3), 210-218. http://dx.doi.org/10.2174/157340608784325205 PMID: 18473913
[99] Canduri, F.; de Azevedo, W.F.Jr. Structural basis for interaction of inhibitors with cyclin-dependent kinase 2. Curr. Comput. Aided Drug Des., 2005, 1(1), 53-64. http://dx.doi.org/10.2174/1573409052952233
[100] Canduri, F.; Uchoa, H.B.; de Azevedo, W.F.Jr. Molecular models of cyclin-dependent kinase 1 complexed with inhibitors. Biochem. Biophys. Res. Commun., 2004, 324(2), 661-666. http://dx.doi.org/10.1016/j.bbrc.2004.09.109 PMID: 15474478
[101] De Azevedo, W.F.Jr.; Mueller-Dieckmann, H.J.; Schulze-Gahmen, U.; Worland, P.J.; Sausville, E.; Kim, S.H. Structural basis for specificity and potency of a flavonoid inhibitor of human CDK2, a cell cycle kinase. Proc. Natl. Acad. Sci. USA, 1996, 93(7), 2735-2740. http://dx.doi.org/10.1073/pnas.93.7.2735 PMID: 8610110
[102] Kim, S.H.; Schulze-Gahmen, U.; Brandsen, J.; de Azevedo Júnior, W.F. Structural basis for chemical inhibition of CDK2. Prog. Cell Cycle Res., 1996, 2, 137-145. http://dx.doi.org/10.1007/978-1-4615-5873-6_14 PMID: 9552391
[103] Schulze-Gahmen, U.; De Bondt, H.L.; Kim, S.H. High-resolution crystal structures of human cyclin-dependent kinase 2 with and without ATP: bound waters and natural ligand as guides for inhibitor design. J. Med. Chem., 1996, 39(23), 4540-4546. http://dx.doi.org/10.1021/jm960402a PMID: 8917641
[104] Schulze-Gahmen, U.; Brandsen, J.; Jones, H.D.; Morgan, D.O.; Meijer, L.; Vesely, J.; Kim, S.H. Multiple modes of ligand recognition: crystal structures of cyclin-dependent protein kinase 2 in complex with ATP and two inhibitors, olomoucine and isopentenyladenine. Proteins, 1995, 22(4), 378-391. http://dx.doi.org/10.1002/prot.340220408 PMID: 7479711
[105] Oudah, K.H.; Najm, M.A.A.; Samir, N.; Serya, R.A.T.; Abouzid, K.A.M. Design, synthesis and molecular docking of novel pyrazolo[1,5-a][1,3,5]triazine derivatives as CDK2 inhibitors. Bioorg. Chem., 2019, 92, 103239. http://dx.doi.org/10.1016/j.bioorg.2019.103239 PMID: 31513938
[106] Ikwu, F.A.; Isyaku, Y.; Obadawo, B.S.; Lawal, H.A.; Ajibowu, S.A. In silico design and molecular docking study of CDK2 inhibitors with potent cytotoxic activity against HCT116 colorectal cancer cell line. J. Genet. Eng. Biotechnol., 2020, 18(1), 51. http://dx.doi.org/10.1186/s43141-020-00066-2 PMID: 32930901
[107] Teng, M.; Jiang, J.; He, Z.; Kwiatkowski, N.P.; Donovan, K.A.; Mills, C.E.; Victor, C.; Hatcher, J.M.; Fischer, E.S.; Sorger, P.K.; Zhang, T.; Gray, N.S. Development of CDK2 and CDK5 dual degrader TMX-2172. Angew. Chem. Int. Ed. Engl., 2020, 59(33), 13865-13870. http://dx.doi.org/10.1002/anie.202004087 PMID: 32415712
[108] Shawky, A.M.; Abourehab, M.A.S.; Abdalla, A.N.; Gouda, A.M. Optimization of pyrrolizine-based Schiff bases with 4-thiazolidinone motif: design, synthesis and investigation of cytotoxicity and anti-inflammatory potency. Eur. J. Med. Chem., 2020, 185, 111780. http://dx.doi.org/10.1016/j.ejmech.2019.111780 PMID: 31655429
[109] Viegas, D.J.; Edwards, T.G.; Bloom, D.C.; Abreu, P.A. Virtual screening identified compounds that bind to cyclin dependent kinase 2 and prevent herpes simplex virus type 1 replication and reactivation in neurons. Antiviral Res., 2019, 172, 104621. http://dx.doi.org/10.1016/j.antiviral.2019.104621 PMID: 31634495
[110] Zhu, J.; Wu, Y.; Xu, L.; Jin, J. Theoretical studies on the selectivity mechanisms of glycogen synthase kinase 3β (GSK3β) with pyrazine ATP-competitive inhibitors by 3D-QSAR, molecular docking, molecular dynamics simulation and free energy calculations. Curr. Comput. Aided Drug Des., 2020, 16(1), 17-30. http://dx.doi.org/10.2174/1573409915666190708102459 PMID: 31284868
[111] Fassio, A.V.; Santos, L.H.; Silveira, S.A.; Ferreira, R.S.; de Melo-Minardi, R.C. nAPOLI: a graph-based strategy to detect and visualize conserved protein-ligand interactions in large-scale. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 2020, 17(4), 1317-1328. https://doi.org/10.1109/TCBB.2019.2892099 PMID: 30629512
[112] Zhang, X.; Shi, G.; Wu, X.; Zhao, Y. Gypensapogenin H from hydrolyzate of total Gynostemma pentaphyllum saponins induces apoptosis in human breast carcinoma cells. Nat. Prod. Res., 2020, 34(11), 1642-1646. http://dx.doi.org/10.1080/14786419.2018.1525370 PMID: 30470142
[113] Lohning, A.E.; Levonis, S.M.; Williams-Noonan, B.; Schweiker, S.S. A practical guide to molecular docking and homology modelling for medicinal chemists. Curr. Top. Med. Chem., 2017, 17(18), 2023-2040. http://dx.doi.org/10.2174/1568026617666170130110827 PMID: 28137238
[114] Cardamone, F.; Pizzi, S.; Iacovelli, F.; Falconi, M.; Desideri, A. Virtual screening for the development of dual-inhibitors targeting topoisomerase IB and tyrosyl-DNA
phosphodiesterase 1. Curr. Drug Targets, 2017, 18(5), 544-555. http://dx.doi.org/10.2174/1389450116666150727114742 PMID: 26212266
[115] Biesiada, J.; Porollo, A.; Velayutham, P.; Kouril, M.; Meller, J. Survey of public domain software for docking simulations and virtual screening. Hum. Genomics, 2011, 5(5), 497-505. http://dx.doi.org/10.1186/1479-7364-5-5-497 PMID: 21807604
[116] Bitencourt-Ferreira, G.; Rizzotto, C.; de Azevedo, W.F.Jr. Machine learning-based scoring functions. Development and applications with SAnDReS. Curr. Med. Chem., 2021, 28(9), 1746-1756. http://dx.doi.org/10.2174/0929867327666200515101820 PMID: 32410551
[117] Fresnais, L.; Ballester, P.J. The impact of compound library size on the performance of scoring functions for structure-based virtual screening. Brief. Bioinform., 2021, 22(3), bbaa095. http://dx.doi.org/10.1093/bib/bbaa095 PMID: 32568385
[118] Ballester, P.J. Machine Learning for Molecular Modelling in Drug Design. Biomolecules, 2019, 9(6), 216. http://dx.doi.org/10.3390/biom9060216 PMID: 31167503
[119] Azevedo, L.S.; Moraes, F.P.; Xavier, M.M.; Pantoja, E.O.; Villavicencio, B.; Finck, J.A.; Proenca, A.M.; Rocha, K.B.; de Azevedo, W.F. Recent progress of molecular docking simulations applied to development of drugs. Curr. Bioinform., 2012, 7(4), 352-365. http://dx.doi.org/10.2174/157489312803901063
[120] Figueroa-Villar, J.D.; Petronilho, E.C.; Kuca, K.; Franca, T.C.C. Review about structure and evaluation of reactivators of acetylcholinesterase inhibited with neurotoxic organophosphorus compounds. Curr. Med. Chem., 2021, 28(7), 1422-1442. http://dx.doi.org/10.2174/0929867327666200425213215 PMID: 32334495
[121] Russo, S.; de Azevedo, W.F. Computational analysis of dipyrone metabolite 4-aminoantipyrine as a cannabinoid receptor 1 agonist. Curr. Med. Chem., 2020, 27(28), 4741-4749. http://dx.doi.org/10.2174/0929867326666190906155339 PMID: 31490743
[122] Scotti, M.T.; Monteiro, A.F.M.; de Oliveira Viana, J.; Mendonça, F.J.B.Jr.; Ishiki, H.M.; Tchouboun, E.N.; De Araújo, R.S.A.; Scotti, L. Recent theoretical studies concerning important tropical infections. Curr. Med. Chem., 2020, 27(5), 795-834. http://dx.doi.org/10.2174/0929867326666190711121418 PMID: 31296154
[123] Lungu, C.N.; Bratanovici, B.I.; Grigore, M.M.; Antoci, V.; Mangalagiu, I.I. Hybrid imidazole-pyridine derivatives: an approach to novel anticancer DNA intercalators. Curr. Med. Chem., 2020, 27(1), 154-169. http://dx.doi.org/10.2174/0929867326666181220094229 PMID: 30569842
[124] Halder, A.K.; Dias Soeiro Cordeiro, M.N. Advanced in silico methods for the development of anti-leishmaniasis and anti-trypanosomiasis agents. Curr. Med. Chem., 2020, 27(5), 697-718. http://dx.doi.org/10.2174/0929867325666181031093702 PMID: 30378482
[125] Zhu, Y.; Liang, M.; Li, H.; Ni, H.; Li, L.; Li, Q.; Jiang, Z. A mutant of Pseudoalteromonas carrageenovora arylsulfatase with enhanced enzyme activity and its potential application in improvement of the agar quality. Food Chem., 2020, 320, 126652. http://dx.doi.org/10.1016/j.foodchem.2020.126652 PMID: 32229399
[126] Taguchi, A.T.; Boyd, J.; Diehnelt, C.W.; Legutki, J.B.; Zhao, Z.G.; Woodbury, N.W. Comprehensive prediction of molecular recognition in a combinatorial chemical space using machine learning. ACS Comb. Sci., 2020, 22(10), 500-508. http://dx.doi.org/10.1021/acscombsci.0c00003 PMID: 32786325
[127] Jehangir, I.; Ahmad, S.F.; Jehangir, M.; Jamal, A.; Khan, M. Integration of bioinformatics and in vitro analysis reveal anti-leishmanial effects of azithromycin and nystatin. Curr. Bioinform., 2019, 14(5), 450-459. http://dx.doi.org/10.2174/1574893614666181217142344
[128] Lushington, G.H. Chemistry, Screening, and the democracy of publishing. Comb. Chem. High Throughput Screen., 2019, 22(5), 288-289. http://dx.doi.org/10.2174/1386207322999190715161959 PMID: 31446889
[129] Zhao, J.; Cao, Y.; Zhang, L. Exploring the computational methods for protein-ligand binding site prediction. Comput. Struct. Biotechnol. J., 2020, 18, 417-426. http://dx.doi.org/10.1016/j.csbj.2020.02.008 PMID: 32140203
[130] Zhang, W.; Li, W.; Zhang, J.; Wang, N. Data integration of hybrid microarray and single cell expression data to enhance gene network inference. Curr. Bioinform., 2019, 14(3), 255-268. http://dx.doi.org/10.2174/1574893614666190104142228
[131] Wu, Y.; Guo, Y.; Xiao, Y.; Lao, S. AAE-SC: a scRNA-Seq clustering framework based on adversarial autoencoder. IEEE Access, 2020, 8, 178962-178975. http://dx.doi.org/10.1109/ACCESS.2020.3027481
[132] Li, M.; Zhang, S.; Yang, B. Urea transporters identified as novel diuretic drug targets. Curr. Drug Targets, 2020, 21(3), 279-287. http://dx.doi.org/10.2174/1389450120666191129101915 PMID: 31782365
[133] Safarizadeh, H.; Garkani-Nejad, Z. Investigation of MI-2 analogues as MALT1 inhibitors to treat of diffuse large B-cell lymphoma through combined molecular dynamics simulation, molecular docking and QSAR techniques and design of new inhibitors. J. Mol. Struct., 2019, 1180, 708-722. http://dx.doi.org/10.1016/j.molstruc.2018.12.022
[134] Lawal, M.M.; Sanusi, Z.K.; Govender, T.; Maguire, G.E.M.; Honarparvar, B.; Kruger, H.G. From recognition to reaction mechanism: an overview on the interactions between HIV-1 protease and its natural targets. Curr. Med. Chem., 2020, 27(15), 2514-2549. http://dx.doi.org/10.2174/0929867325666181113122900 PMID: 30421668
[135] Sun, B.; Wang, W.; He, Z.; Zhang, M.; Kong, F.; Sain, M. Biopolymer substrates in buccal drug delivery: current status and future trend. Curr. Med. Chem., 2020, 27(10), 1661-1669. http://dx.doi.org/10.2174/0929867325666181001114750 PMID: 30277141
[136] Aleksandrov, A.; Myllykallio, H. Advances and challenges in drug design against tuberculosis: application of in silico approaches. Expert Opin. Drug Discov., 2019, 14(1), 35-46. http://dx.doi.org/10.1080/17460441.2019.1550482 PMID: 30477360
[137] Cavada, B.S.; Osterne, V.J.S.; Lossio, C.F.; Pinto-Junior, V.R.; Oliveira, M.V.; Silva, M.T.L.; Leal, R.B.; Nascimento, K.S. One century of ConA and 40 years of ConBr research: a structural review. Int. J. Biol. Macromol., 2019, 134, 901-911. http://dx.doi.org/10.1016/j.ijbiomac.2019.05.100 PMID: 31108148
[138] Jiang, M.; Li, Z.; Bian, Y.; Wei, Z. A novel protein descriptor for the prediction of drug binding sites. BMC Bioinformatics, 2019, 20(1), 478. http://dx.doi.org/10.1186/s12859-019-3058-0 PMID: 31533611
[139] Cavada, B.S.; Araripe, D.A.; Silva, I.B.; Pinto-Junior, V.R.; Osterne, V.J.S.; Neco, A.H.B.; Laranjeira, E.P.P.; Lossio, C.F.; Correia, J.L.A.; Pires, A.F.; Assreuy, A.M.S.; Nascimento, K.S. Structural studies and nociceptive activity of a native lectin from Platypodium elegans seeds (nPELa). Int. J. Biol. Macromol., 2018, 107(Pt A), 236-246. https://doi.org/10.1016/j.ijbiomac.2017.08.174 PMID: 28867234
[140] Abbasi, W.A.; Asif, A.; Ben-Hur, A.; Minhas, F.U.A.A. Learning protein binding affinity using privileged information. BMC Bioinformatics, 2018, 19(1), 425. http://dx.doi.org/10.1186/s12859-018-2448-z PMID: 30442086
[141] Ribeiro, F.F.; Mendonca Junior, F.J.B.; Ghasemi, J.B.; Ishiki, H.M.; Scotti, M.T.; Scotti, L. Docking of natural products against neurodegenerative diseases: general concepts. Comb. Chem. High Throughput Screen., 2018, 21(3), 152-160. http://dx.doi.org/10.2174/1386207321666180313130314 PMID: 29532756
[142] Lemos, A.; Melo, R.; Preto, A.J.; Almeida, J.G.; Moreira, I.S.; Dias Soeiro Cordeiro, M.N.D.S. In silico studies targeting G-protein coupled receptors for drug research against Parkinson’s disease. Curr. Neuropharmacol., 2018, 16(6), 786-848. http://dx.doi.org/10.2174/1570159X16666180308161642 PMID: 29521236
[143] Leal, R.B.; Pinto-Junior, V.R.; Osterne, V.J.S.; Wolin, I.A.V.; Nascimento, A.P.M.; Neco, A.H.B.; Araripe, D.A.; Welter, P.G.; Neto, C.C.; Correia, J.L.A.; Rocha, C.R.C.; Nascimento, K.S.; Cavada, B.S. Crystal structure of DlyL, a mannose-specific lectin from Dioclea lasiophylla Mart. Ex Benth seeds that display cytotoxic effects against C6 glioma cells. Int. J. Biol. Macromol., 2018, 114, 64-76. http://dx.doi.org/10.1016/j.ijbiomac.2018.03.080 PMID: 29559315
[144] de Ávila, M.B.; Bitencourt-Ferreira, G.; de Azevedo, W.F.Jr. Structural basis for inhibition of enoyl-[Acyl carrier protein] reductase (InhA) from Mycobacterium tuberculosis. Curr. Med. Chem., 2020, 27(5), 745-759. http://dx.doi.org/10.2174/0929867326666181203125229 PMID: 30501592
[145] Freitas, P.G.; Elias, T.C.; Pinto, I.A.; Costa, L.T.; de Carvalho, P.V.S.D.; Omote, D.Q.; Camps, I.; Ishikawa, T.; Arcuri, H.A.; Vinga, S.; Oliveira, A.L.; Junior, W.F.A.; da Silveira, N.J.F. Computational approach to the discovery of phytochemical molecules with therapeutic potential targets to the PKCZ protein. Lett. Drug Des. Discov., 2018, 15(5), 488-499. http://dx.doi.org/10.2174/1570180814666170810120150
[146] Russo, S.; de Azevedo, W.F. Advances in the understanding of the cannabinoid receptor 1 - focusing on the inverse agonists interactions. Curr. Med. Chem., 2019, 26(10), 1908-1919. http://dx.doi.org/10.2174/0929867325666180417165247 PMID: 29667549
[147] Wolin, I.A.V.; Heinrich, I.A.; Nascimento, A.P.M.; Welter, P.G.; Sosa, L.D.V.; De Paul, A.L.; Zanotto-Filho, A.; Nedel, C.B.; Lima, L.D.; Osterne, V.J.S.; Pinto-Junior, V.R.; Nascimento, K.S.; Cavada, B.S.; Leal, R.B. ConBr lectin modulates MAPKs and Akt pathways and triggers autophagic glioma cell death by a mechanism dependent upon caspase-8 activation. Biochimie, 2021, 180, 186-204. http://dx.doi.org/10.1016/j.biochi.2020.11.003 PMID: 33171216
[148] de Ávila, M.B.; de Azevedo, W.F.Jr. Development of machine learning models to predict inhibition of 3-dehydroquinate dehydratase. Chem. Biol. Drug Des., 2018, 92(2), 1468-1474. http://dx.doi.org/10.1111/cbdd.13312 PMID: 29676519
[149] Pinto-Junior, V.R.; Osterne, V.J.; Santiago, M.Q.; Correia, J.L.; Pereira-Junior, F.N.; Leal, R.B.; Pereira, M.G.; Chicas, L.S.; Nagano, C.S.; Rocha, B.A.; Silva-Filho, J.C.; Ferreira, W.P.; Rocha, C.R.; Nascimento, K.S.; Assreuy, A.M.; Cavada, B.S. Structural studies of a vasorelaxant lectin from Dioclea reflexa hook seeds: crystal structure, molecular docking and dynamics. Int. J. Biol. Macromol., 2017, 98, 12-23. http://dx.doi.org/10.1016/j.ijbiomac.2017.01.092 PMID: 28130130
[150] Bitencourt-Ferreira, G.; de Azevedo, W.F.Jr. Development of a machine-learning model to predict Gibbs free energy of binding for protein-ligand complexes. Biophys. Chem., 2018, 240, 63-69. http://dx.doi.org/10.1016/j.bpc.2018.05.010 PMID: 29906639
[151] Amaral, M.E.A.; Nery, L.R.; Leite, C.E.; de Azevedo, W.F.Jr.; Campos, M.M. Pre-clinical effects of metformin and aspirin on the cell lines of different breast cancer subtypes. Invest. New Drugs, 2018, 36(5), 782-796. http://dx.doi.org/10.1007/s10637-018-0568-y PMID: 29392539
[152] Borisa, A.; Bhatt, H. 3D-QSAR (CoMFA, CoMFA-RG, CoMSIA) and molecular docking study of thienopyrimidine and thienopyridine derivatives to explore structural requirements for aurora-B kinase inhibition. Eur. J. Pharm. Sci., 2015, 79, 1-12. http://dx.doi.org/10.1016/j.ejps.2015.08.017 PMID: 26343315
[153] Gramatica, P. On the development and validation of QSAR models. Methods Mol. Biol., 2013, 930, 499-526. http://dx.doi.org/10.1007/978-1-62703-059-5_21 PMID: 23086855
[154] Triggle, D.J. The chemist as astronaut: searching for biologically useful space in the chemical universe. Biochem. Pharmacol., 2009, 78(3), 217-223. http://dx.doi.org/10.1016/j.bcp.2009.02.015 PMID: 19481639
[155] Kell, D.B.; Samanta, S.; Swainston, N. Deep learning and generative methods in cheminformatics and chemical biology: navigating small molecule space intelligently. Biochem. J., 2020, 477(23), 4559-4580. http://dx.doi.org/10.1042/BCJ20200781 PMID: 33290527
[156] Johnson, E.O.; Hung, D.T. A point of inflection and reflection on systems chemical biology. ACS Chem. Biol., 2019, 14(12), 2497-2511. http://dx.doi.org/10.1021/acschembio.9b00714 PMID: 31613592
[157] Fotis, C.; Antoranz, A.; Hatziavramidis, D.; Sakellaropoulos, T.; Alexopoulos, L.G. Network-based technologies for
early drug discovery. Drug Discov. Today, 2018, 23(3), 626-635. http://dx.doi.org/10.1016/j.drudis.2017.12.001 PMID: 29294361
[158] Kirkpatrick, P.; Ellis, C. Chemical space. Nature, 2004, 432(7019), 823. http://dx.doi.org/10.1038/432823a
[159] Lipinski, C.; Hopkins, A. Navigating chemical space for biology and medicine. Nature, 2004, 432(7019), 855-861. http://dx.doi.org/10.1038/nature03193 PMID: 15602551
[160] Shoichet, B.K. Virtual screening of chemical libraries. Nature, 2004, 432(7019), 862-865. http://dx.doi.org/10.1038/nature03197 PMID: 15602552
[161] Stockwell, B.R. Exploring biology with small organic molecules. Nature, 2004, 432(7019), 846-854. http://dx.doi.org/10.1038/nature03196 PMID: 15602550
[162] Smith, J.M. Natural selection and the concept of a protein space. Nature, 1970, 225(5232), 563-564. http://dx.doi.org/10.1038/225563a0 PMID: 5411867
[163] Hou, J.; Jun, S.R.; Zhang, C.; Kim, S.H. Global mapping of the protein structure space and application in structure-based inference of protein function. Proc. Natl. Acad. Sci. USA, 2005, 102(10), 3651-3656. http://dx.doi.org/10.1073/pnas.0409772102 PMID: 15705717
[164] Singh, A.V.; Chandrasekar, V.; Janapareddy, P.; Mathews, D.E.; Laux, P.; Luch, A.; Yang, Y.; Garcia-Canibano, B.; Balakrishnan, S.; Abinahed, J.; Al Ansari, A.; Dakua, S.P. Emerging application of nanorobotics and artificial intelligence to cross the BBB: advances in design, controlled maneuvering, and targeting of the barriers. ACS Chem. Neurosci., 2021, 12(11), 1835-1853. http://dx.doi.org/10.1021/acschemneuro.1c00087 PMID: 34008957
[165] Singh, A.V.; Jahnke, T.; Wang, S.; Xiao, Y.; Alapan, Y.; Kharratian, S.; Onbasli, M.C.; Kozielski, K.; David, H.; Richter, G.; Bill, J.; Laux, P.; Luch, A.; Sitti, M. Anisotropic gold nanostructures: optimization via in silico modeling for hyperthermia. ACS Appl. Nano Mater., 2018, 1(11), 6205-6216. http://dx.doi.org/10.1021/acsanm.8b01406
DISCLAIMER: The above article has been published, as is, ahead-of-print, to provide early visibility but is not the final version. Major publication processes like copyediting, proofing, typesetting and further review are still to be done and may lead to changes in the final published version, if it is eventually published. All legal disclaimers that apply to the final published article pertain.