BIOMOLECULAR MODELING

SPRUCE

Reliable structure-based drug design begins with a structure that correctly represents the biology. SPRUCE transforms experimental data from PDB, mmCIF, or cryo-EM sources into fully prepared, biologically meaningful systems ready for structure-based drug discovery applications such as docking, binding free energy calculations, and molecular dynamics.

Errors introduced during structure preparation propagate through every downstream calculation. Misplaced hydrogens may corrupt docking scores and force field calculations. Incorrect protonation states distort binding pose predictions. Unmodeled loops leave binding site geometry incomplete. SPRUCE resolves these issues systematically, so your modeling results reflect real biology rather than preparation artifacts.

What SPRUCE Prepares and Why It Matters

Each preparation step addresses a specific source of error in downstream biophysical modeling.

Hydrogen bond network optimization.
Tautomer and protonation state enumeration.
Biological unit assembly.
Loop and missing residue modeling: SPRUCE employs a template-based loop modeling approach, leveraging structural homologs to produce physically reasonable conformations.
Side-chain remodeling and point mutations.
Pocket enumeration for apo structures.
Protein superposition: SPRUCE offers three complementary superposition methods (sequence-based, secondary structure-based, and active site shape-based). This range accommodates everything from closely related orthologs to distantly related targets with conserved binding site architecture.
Support for PDB and mmCIF files.

How SPRUCE Works

SPRUCE streamlines the preparation process by automatically breaking down the system into individual biological components, adding any missing protons or residues, and subsequently optimizing the hydrogen bond network for the entire system.

SPRUCE's structure preparation workflow enumerates biological units and alternate locations (if present), models missing residues and loops, and places and optimizes hydrogens while accounting for the likely tautomer states of bound heterogens (ligands and cofactors).
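To make two of these tasks concrete, the minimal sketch below scans fixed-column PDB ATOM records for alternate locations and for gaps in residue numbering that may point to unmodeled residues. It is an illustration only, not SPRUCE's implementation or the Spruce TK API: the function name `scan_pdb_atoms` and the sample records are hypothetical, and a numbering gap is merely a heuristic hint that residues are missing.

```python
from collections import defaultdict


def scan_pdb_atoms(pdb_text):
    """Return (altloc_residues, gaps) found in PDB ATOM/HETATM records.

    Hypothetical helper for illustration. It relies only on the fixed-column
    PDB layout: altLoc in column 17, resName in 18-20, chain ID in 22,
    and resSeq in 23-26 (1-based columns).
    """
    altloc_residues = set()
    residue_numbers = defaultdict(set)

    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        alt_loc = line[16]
        res_name = line[17:20].strip()
        chain_id = line[21]
        res_seq = int(line[22:26])
        if alt_loc != " ":
            altloc_residues.add((chain_id, res_seq, res_name))
        if line.startswith("ATOM"):
            # Only polymer atoms are used for the gap heuristic.
            residue_numbers[chain_id].add(res_seq)

    gaps = []
    for chain_id, numbers in sorted(residue_numbers.items()):
        ordered = sorted(numbers)
        for prev, nxt in zip(ordered, ordered[1:]):
            if nxt - prev > 1:
                gaps.append((chain_id, prev, nxt))
    return altloc_residues, gaps


# Made-up records: VAL 11 has two alternate locations (A and B),
# and residues 12-14 are absent from chain A.
SAMPLE = "\n".join([
    "ATOM      1  CA  SER A  10      11.000  12.000  13.000  1.00 20.00           C",
    "ATOM      2  CA AVAL A  11      14.000  12.500  13.200  0.60 22.00           C",
    "ATOM      3  CA BVAL A  11      14.200  12.300  13.400  0.40 22.00           C",
    "ATOM      4  CA  GLY A  15      20.000  15.000  16.000  1.00 25.00           C",
])

altlocs, gaps = scan_pdb_atoms(SAMPLE)
print("Residues with alternate locations:", sorted(altlocs))
print("Numbering gaps (chain, before, after):", gaps)
```

A real preparation workflow would go further than this check, for example by consulting the deposited sequence records rather than residue numbering alone.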

The dataset contains 4 design units (DUs) prepared by SPRUCE for PDB accession code 4ZJI. The structure contains 4 chains with 4 copies of the ligand bound, so SPRUCE created 4 DUs. Color coding in the Orion Molecular Design Platform makes it easier to pick the appropriate DUs. Note that the upper 2 DUs have an Iridium Score of HT (highly trustworthy), whereas the lower 2 have an Iridium Score of MT (moderately trustworthy).

Structure Quality Assessment

Not all experimentally determined structures are equally suitable for modeling. Resolution, refinement quality, ligand fit to electron density, and crystal contacts all affect whether a structure will produce reliable predictions. SPRUCE integrates the Iridium classification system to give scientists a transparent, reproducible basis for selecting which structures to use in a campaign.

Iridium-HT: Highly trustworthy, suitable for direct use.
Iridium-MT: Moderately trustworthy, flagged issues present.
Iridium-NT: Not trustworthy, use with caution.

Structures flagged by Iridium receive detailed annotations identifying the specific issue, whether it is poor electron density, poor ligand fit to the density, or problematic crystal contacts. This allows scientists to select the most suitable protein structures or design units for their drug discovery campaign.
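As a rough illustration of how the three categories above could drive structure selection, the sketch below filters and orders design units by Iridium category. The `DesignUnitRecord` class, the `select_design_units` function, and the example titles are hypothetical and are not part of SPRUCE or any OpenEye toolkit; only the HT, MT, and NT labels come from the classification described here.

```python
from dataclasses import dataclass

# Lower rank means more trustworthy; labels follow the Iridium categories.
IRIDIUM_RANK = {"HT": 0, "MT": 1, "NT": 2}


@dataclass(frozen=True)
class DesignUnitRecord:
    """Hypothetical summary of one prepared design unit (DU)."""
    title: str
    iridium_category: str  # "HT", "MT", or "NT"


def select_design_units(records, worst_allowed="MT"):
    """Keep DUs at or above a trust threshold, most trustworthy first.

    sorted() is stable, so DUs in the same category keep their input order.
    """
    limit = IRIDIUM_RANK[worst_allowed]
    kept = [r for r in records if IRIDIUM_RANK[r.iridium_category] <= limit]
    return sorted(kept, key=lambda r: IRIDIUM_RANK[r.iridium_category])


# Made-up DUs for demonstration only.
candidates = [
    DesignUnitRecord("du_chain_A", "MT"),
    DesignUnitRecord("du_chain_B", "HT"),
    DesignUnitRecord("du_chain_C", "NT"),
    DesignUnitRecord("du_chain_D", "HT"),
]

for du in select_design_units(candidates):
    print(du.title, du.iridium_category)
```

Passing `worst_allowed="HT"` would restrict a campaign to the most trustworthy design units only. In practice the flagged annotations deserve a look as well, since the category alone does not say which issue was found.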

Modeling

The Modeling suite of toolkits provides the core functionality underlying OpenEye's defining principle that shape & electrostatics are the two fundamental descriptors determining intermolecular interactions. Many of the toolkits in the Modeling suite are directly associated with specific OpenEye applications and can therefore be used to create new or extend existing functionality associated with those applications.

OEChem TK: Core chemistry handling and representation as well as molecule file I/O
OEDocking TK: Molecular docking and scoring
Omega TK: Conformer generation
Shape TK: 3D shape description, optimization, and overlap
SiteHopper TK: Rapid comparison of protein binding sites
Spicoli TK: Surface generation, manipulation, and interrogation
Spruce TK: Protein preparation and modeling
Szybki TK: General purpose optimization with MMFF94
Szmap TK: Understanding water interactions in a binding site
Zap TK: Calculation of Poisson-Boltzmann electrostatic potentials

Cheminformatics

The Cheminformatics suite of toolkits provides the core foundation upon which all of the OpenEye applications and remaining toolkits are built. The Cheminformatics suite is a collection of eight individual yet interdependent toolkits, listed below.

FastROCS TK: Real-time shape similarity for virtual screening, lead hopping, and shape clustering
OEChem TK: Core chemistry handling and representation as well as molecule file I/O
OEDepict TK: 2D molecule rendering and depiction
Grapheme™ TK: Advanced molecule rendering and report generation
GraphSim TK: 2D molecular similarity (e.g. fingerprints)
Lexichem TK: Name-to-structure, structure-to-name, and foreign language translation
Quacpac TK: Tautomer enumeration and charge assignment
MedChem TK: Matched molecular pair analysis, fragmentation utilities, and molecular complexity metrics

References

Essential considerations for using protein-ligand structures in drug discovery, G. L. Warren, T. D. Do, B. P. Kelley, A. Nicholls, S. D. Warren, Drug Discov. Today, 2012, 17, 1270-1281
Loopholes and missing links in protein modeling, K. A. Rossi, C. A. Weigelt, A. Nayeem, S. R. Krystek Jr., Protein Sci., 2007, 16, 1999-2012