Bioactivity descriptors for uncharacterized chemical compounds

The Structural Bioinformatics and Network Biology Group at the Institute for Research in Biomedicine (IRB Barcelona), known for developing the Chemical Checker (CC), has recently been working on a deep neural network methodology to infer bioactivity signatures for small-molecule ligands for which little experimental data is available.

The problem of missing data

For decades, scientists have tried to build on widely established chemical descriptors and fingerprints and to develop similar, enhanced concepts that capture biological properties such as ligand binding affinity, toxicology or cell sensitivity. Although these efforts have been successful, the scarcity of experimentally determined bioactivity data has meant that bioactivity descriptors could not be derived for most of the synthetically accessible chemical space.

Recently, however, the Structural Bioinformatics and Network Biology (SBNB) group, led by Patrick Aloy, published a paper presenting a new approach to predict bioactivity descriptors and use them to improve the identification of hit molecules against the Snail1 target.

A new tool to generate bioactivity descriptors

The methodology rests on the assumption that the different bioactivity spaces are not completely independent and can therefore be related to one another, for example through similarity calculations. Missing bioactivity features, such as cellular responses and clinical outcomes, are generated by a Siamese neural network that uses what the authors call a “signature dropout scheme” to fill the gaps in the experimental part of the Chemical Checker.
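To give a feel for what a signature dropout step could look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function name signature_dropout, the p_drop parameter, the zero-filling of hidden spaces and the rule of always keeping at least one space are all assumptions made for illustration. The only figures taken from the article are the 25 bioactivity spaces and the 128 dimensions per signature.

```python
import numpy as np

N_SPACES = 25   # bioactivity spaces per molecule (figure quoted in the article)
SIG_DIM = 128   # dimensions per signature (figure quoted in the article)


def signature_dropout(signatures, available, rng, p_drop=0.5):
    """Randomly hide some of the experimentally available spaces of one molecule.

    Hypothetical helper, for illustration only.
    signatures: (N_SPACES, SIG_DIM) array of per-space signatures
    available:  (N_SPACES,) boolean array, True where experimental data exists
    Returns the masked signatures and the boolean mask of spaces that were kept.
    """
    signatures = np.asarray(signatures, dtype=float)
    available = np.asarray(available, dtype=bool)

    # Each available space is dropped independently with probability p_drop.
    keep = available & (rng.random(available.shape[0]) >= p_drop)

    # Assumption: never hide everything, so the network always sees some input.
    if available.any() and not keep.any():
        keep[rng.choice(np.flatnonzero(available))] = True

    # Hidden or missing spaces are zero-filled (an assumption, not the paper's rule).
    masked = np.where(keep[:, None], signatures, 0.0)
    return masked, keep


rng = np.random.default_rng(0)
sigs = rng.normal(size=(N_SPACES, SIG_DIM))
avail = np.zeros(N_SPACES, dtype=bool)
avail[:5] = True  # pretend only the first five spaces have experimental data
masked, keep = signature_dropout(sigs, avail, rng, p_drop=0.9)
```

The idea behind training on inputs masked this way is that the model has to cope with molecules for which only a few spaces are known, which is the situation of most poorly characterized compounds.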

Aloy’s team was able to infer bioactivity signatures for the roughly 800,000 molecules available in the Chemical Checker, obtaining a complete set of 25 signatures, each 128-dimensional, for every molecule.

Together with the previously existing parts of the Chemical Checker, this completes an exhaustive effort to gather, harmonize and vectorize compound data, which has already been shown to improve the identification of hit molecules for the almost undruggable Snail1 transcription factor.

Additionally, CC signatures are compatible with other drug discovery toolkits, which makes it possible to apply this methodology to similarity searches, visualization of chemical space, clustering, prediction and other related tasks. Nostrum Biodiscovery routinely uses tools of this kind to augment its hit-finding capabilities.
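Because the signatures are plain numerical vectors, a similarity search can be sketched in a few lines of NumPy. The example below is only an illustration: top_k_similar is a hypothetical helper, cosine similarity is one common choice of metric rather than one the article prescribes, and the random vectors stand in for real CC signatures. It assumes that no vector is all zeros.

```python
import numpy as np


def top_k_similar(query, library, k=5):
    """Rank library signatures by cosine similarity to a query signature.

    Hypothetical helper, for illustration only.
    query:   (d,) signature vector of the molecule of interest
    library: (n, d) matrix with one signature per library molecule
    Returns the indices of the k most similar molecules and their scores.
    """
    query = np.asarray(query, dtype=float)
    library = np.asarray(library, dtype=float)

    # Normalise to unit length so that the dot product equals the cosine.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)

    scores = lib @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(1)
library = rng.normal(size=(1000, 128))  # placeholder data, not real signatures
query = library[42]                     # search with a molecule from the library
idx, sims = top_k_similar(query, library, k=3)
```

The same matrix of signatures could be passed to standard clustering or dimensionality-reduction routines for the other tasks mentioned above.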

Source of images: Chemical Checker and IRB Barcelona.
