Posts

Showing posts with the label ChEMBL

Drug Data in ChEMBL: a critical asset

Breaking news 📢 We are excited to announce our new journal article presenting the comprehensive drug data in ChEMBL. The paper describes the state-of-the-art processes used to curate and integrate high-quality data on drugs and clinical candidates. These curation processes have been developed over more than 15 years, and this is the first time they have been published. https://pubs.acs.org/doi/10.1021/acs.jmedchem.5c00920 Published as a 'Perspectives' article in the Journal of Medicinal Chemistry, the paper helps ChEMBL users understand the nature of the drug and clinical candidate data and the rationale that underlies curation decisions. Given the increasing reliance on high-quality data in computational drug discovery, AI and machine learning, the integrated nature of the drug data within the ChEMBL bioactivity resource is a critical asset. This is a bumper week for drug data in ChEMBL! On Monday, our latest ChEMBL 36 release included ...

Sequence similarity searches in ChEMBL

The ChEMBL database contains bioactivity data that links compounds to their biological targets. Most ChEMBL targets are proteins (~70% in version 27) and these are mapped to their UniProt accessions. On the ChEMBL interface, searches can be performed with either protein names or accessions... but did you know that protein similarity searches are also possible? Here’s an example using human Phospholipase DDHD2, a target not found in ChEMBL.

1. On the ChEMBL interface, click 'Enter a Sequence'.
2. Input the FASTA sequence corresponding to human Phospholipase DDHD2 and click 'Search in ChEMBL'.
3. Review the BLAST results, select targets of interest and browse bioactivity data.

The BLAST search identifies the mouse Phospholipase DDHD2 homologue alongside a small number of bioactivity data points and active compounds. ChE...

Data checks

ChEMBL contains a broad range of binding, functional and ADMET-type assays in formats ranging from in vitro single-protein assays to antiproliferative cell-based assays. Some variation is expected, even for very similar assays, since these are often performed by different groups and institutes. ChEMBL includes references for all bioactivity values so that full assay details can be reviewed if needed. However, there are a number of other data checks that can be used to identify potentially problematic results.

1) Data validity comments: The data validity column was first included in ChEMBL v15 and flags activities with potential validity issues, such as a non-standard unit for the activity type or values outside the expected range. Users can review flagged activities and decide how these should be handled. The data validity column can be viewed on the interface (click 'Show/Hide columns' and select 'data validity comments') and can be found in the activities ...
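As a rough illustration of how flagged activities might be handled once exported, the sketch below separates records on a data validity field. The records are invented toy data, and the key name `data_validity_comment` and the comment strings are assumptions for illustration rather than details taken from the post.

```python
# Toy activity records; IDs, values and comment strings are illustrative only.
activities = [
    {"activity_id": 1, "standard_type": "IC50", "standard_value": 12.0,
     "standard_units": "nM", "data_validity_comment": None},
    {"activity_id": 2, "standard_type": "IC50", "standard_value": 1.0e9,
     "standard_units": "nM", "data_validity_comment": "Outside typical range"},
    {"activity_id": 3, "standard_type": "IC50", "standard_value": 5.0,
     "standard_units": "mg", "data_validity_comment": "Non standard unit for type"},
]


def split_by_validity(records, field="data_validity_comment"):
    """Return (unflagged, flagged) lists, depending on whether the field is populated."""
    unflagged = [r for r in records if not r.get(field)]
    flagged = [r for r in records if r.get(field)]
    return unflagged, flagged


unflagged, flagged = split_by_validity(activities)

# Print flagged records so they can be reviewed rather than silently dropped.
for record in flagged:
    print(record["activity_id"], record["data_validity_comment"])
```

Keeping the flagged records in a separate list, instead of discarding them, leaves the decision on how to handle each one with the user.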

Molecule hierarchy

During drug development, active pharmaceutical ingredients are often formulated as salts to provide the final pharmaceutical product. ChEMBL includes parent molecules and their salts (approved and investigational) as well as other alternative forms such as hydrates and radioisotopes. These alternative forms are linked to their parent compound through the molecule hierarchy.

Using the molecule hierarchy

The molecule hierarchy can be used to retrieve and display connected compounds and to aggregate activity data that has been mapped to any member of a compound family. On the interface, related compounds are automatically displayed in the ‘Alternative forms’ section of the ChEMBL compound report card. Bioactivity data can easily be aggregated in the activity summary by using the 'Include/Exclude Alternative Forms' filter.

Finding the molecule hierarchy

On the interface, we include alternative forms as shown above. The downloaded database contains the molecule_hiera...
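The idea of aggregating activity data across a compound family can be sketched in a few lines. This is only a toy model: the identifiers, the `parent_of` mapping and the activity tuples are all hypothetical, and the sketch does not reproduce the actual database schema.

```python
from collections import defaultdict

# Hypothetical family mapping: every form points to its parent compound.
# A parent maps to itself. All identifiers are invented for illustration.
parent_of = {
    "PARENT_A": "PARENT_A",
    "SALT_A1": "PARENT_A",
    "HYDRATE_A2": "PARENT_A",
    "PARENT_B": "PARENT_B",
}

# Hypothetical activity records: (compound form, activity type, value).
activities = [
    ("PARENT_A", "IC50", 10.0),
    ("SALT_A1", "IC50", 14.0),
    ("HYDRATE_A2", "Ki", 3.0),
    ("PARENT_B", "IC50", 250.0),
]


def aggregate_by_parent(activities, parent_of):
    """Group activity records under the parent of the form they were measured on."""
    grouped = defaultdict(list)
    for compound, activity_type, value in activities:
        # Forms missing from the mapping are treated as their own parent.
        parent = parent_of.get(compound, compound)
        grouped[parent].append((compound, activity_type, value))
    return dict(grouped)


by_parent = aggregate_by_parent(activities, parent_of)
for parent, records in by_parent.items():
    print(parent, len(records))
```

Each grouped record keeps the original form, so it remains visible which salt or hydrate a value was measured on.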

Using ChEMBL activity comments

We’re sometimes asked what the ‘activity_comments’ in the ChEMBL database mean. In this blog post, we’ll use aspirin as an example to explain some of the more common activity comments. First, let’s review the bioactivity data included in ChEMBL. We extract bioactivity data directly from seven core medicinal chemistry journals. Some common activity types, such as IC50s, are standardised to allow broad comparisons across assays; the standardised data can be found in the standard_value, standard_relation and standard_units fields. Original data is retained in the database downloads in the value, relation and units fields. However, we extract all data from a publication, including non-numerical bioactivity and ADME data. In these cases, the activity comments may be populated during the ChEMBL extraction-curation process in order to capture the author's overall conclusions. Similarly, for depos...
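One way to picture how these fields relate is a small helper that prefers the standardised values, falls back to the original values, and uses the activity comment only when no numeric data is present. This is a sketch on invented records: the `type` and `activity_comment` keys and all values are assumptions for illustration, while the other keys mirror the field names mentioned above.

```python
def describe_activity(record):
    """Summarise a record: standardised data first, then original data, then the comment."""
    if record.get("standard_value") is not None:
        relation = record.get("standard_relation") or "="
        units = record.get("standard_units") or ""
        return f"{record['type']} {relation} {record['standard_value']} {units}".strip()
    if record.get("value") is not None:
        relation = record.get("relation") or "="
        units = record.get("units") or ""
        return f"{record['type']} {relation} {record['value']} {units}".strip()
    # No numeric data at all: the comment carries the author's conclusion.
    return record.get("activity_comment") or "no data"


# Invented example records, not real ChEMBL data.
records = [
    {"type": "IC50", "standard_value": 5000.0, "standard_relation": "=",
     "standard_units": "nM", "value": 5.0, "relation": "=", "units": "uM"},
    {"type": "Ki", "value": 5, "relation": "<", "units": "uM"},
    {"type": "Inhibition", "activity_comment": "Not Active"},
]

summaries = [describe_activity(r) for r in records]
for line in summaries:
    print(line)
```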

New ChEMBL Interface

We are pleased to announce that we have a beta version of a new ChEMBL interface that we would like you to try out. It can be found here. There are also a lot of additional features in the new interface, such as free text searching. You no longer need to specify that you want to search for a compound, target, document etc. As you type your search, suggestions will be made for you. You can also filter results to see just the subset of data you are interested in, using a number of different filtering options. However, the new interface still retains some of the old features such as compound and target report cards. More details on the new features can be found here and we have also updated our FAQs. But most of all we hope it is intuitive to use. It will replace the old interface soon, but before we retire the old one we would like some feedback on the new one. We will continue to evol...

LSH-based similarity search in MongoDB is faster than postgres cartridge.

TL;DR: In his excellent blog post, Matt Swain described the implementation of compound similarity searches in MongoDB. Unfortunately, Matt's approach had suboptimal (polynomial) time complexity with respect to decreasing similarity thresholds, which renders it unsuitable for production environments. In this article, we improve on the method by enhancing it with a Locality Sensitive Hashing (LSH) algorithm, which significantly reduces query time and outperforms the RDKit PostgreSQL cartridge.

myChEMBL 21 - NoSQL edition

Given that NoSQL technologies applied to computational chemistry and cheminformatics are gaining traction and popularity, we decided to include a taster in future myChEMBL releases. Two especially appealing technologies are Neo4j and MongoDB. The former is a graph database and the latter is a BSON document store. We would like to provide IPython notebook-based tutorials explaining how to use this software to deal with common cheminformat...
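To give a feel for how Locality Sensitive Hashing can narrow a similarity search, here is a minimal pure-Python sketch using MinHash with banding, one common LSH scheme for Tanimoto (Jaccard) similarity. It is not the implementation from the post and involves no MongoDB: the class name `LSHIndex`, the parameters and the toy fingerprints (sets of on-bit indices) are all hypothetical.

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # 2**31 - 1, larger than any bit index used here


def make_hash_params(n_hashes, seed=0):
    """Random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_hashes)]


def minhash(bits, params):
    """MinHash signature of a non-empty set of on-bit indices."""
    return tuple(min((a * x + b) % PRIME for x in bits) for a, b in params)


def tanimoto(a, b):
    """Exact Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class LSHIndex:
    """Toy LSH index: signatures are cut into bands and each band is a bucket key."""

    def __init__(self, n_bands=16, rows_per_band=4, seed=0):
        self.n_bands = n_bands
        self.rows = rows_per_band
        self.params = make_hash_params(n_bands * rows_per_band, seed)
        self.buckets = defaultdict(set)
        self.fingerprints = {}

    def _band_keys(self, bits):
        sig = minhash(bits, self.params)
        for i in range(self.n_bands):
            yield (i, sig[i * self.rows:(i + 1) * self.rows])

    def add(self, name, bits):
        bits = frozenset(bits)
        self.fingerprints[name] = bits
        for key in self._band_keys(bits):
            self.buckets[key].add(name)

    def query(self, bits, threshold):
        """Collect candidates sharing at least one band, then verify them exactly."""
        bits = frozenset(bits)
        candidates = set()
        for key in self._band_keys(bits):
            candidates |= self.buckets.get(key, set())
        hits = [(name, tanimoto(bits, self.fingerprints[name])) for name in candidates]
        return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


index = LSHIndex()
index.add("mol_a", {1, 5, 9, 20, 33, 47})
index.add("mol_b", {1, 5, 9, 20, 33, 48})
index.add("mol_c", {100, 200, 300})

hits = index.query({1, 5, 9, 20, 33, 47}, threshold=0.7)
print(hits)
```

Only molecules that share a bucket with the query are compared exactly, which is where the speed-up comes from. The trade-off is that LSH is approximate: a molecule above the threshold can be missed if it shares no band with the query, and the band and row counts tune that risk.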