Multi-View Mixture-of-Experts for Predicting Molecular Properties
Using SMILES, SELFIES, and Graph-Based Representations
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
2024-12-02
Eduardo Soares et al.; NeurIPS 2024
BACKGROUND: Graph Convolutional Networks (GCNs)
• Key Idea: Each node aggregates information from its neighborhood to obtain a
contextualized node embedding.
• Limitation: Most GNNs focus on homogeneous graphs.
(Figure: a GCN layer applies a neural transformation and aggregates the neighbors' information.)
Learning molecular structures through GNNs
• Inputs: molecules
• Outputs: a score for the specific prediction task
(Pipeline: Molecules → Graph Neural Networks → Pooling Function → Task Prediction, i.e., molecular representation learning.)
Molecular Graph Neural Network
• The node representation at the l-th layer of a GNN is formulated as:
h_v^(l) = COMBINE( h_v^(l-1), AGGREGATE( { h_u^(l-1) : u ∈ N(v) } ) )
• To obtain the graph-level representation h_G for a molecular graph G:
h_G = READOUT( { h_v^(L) : v ∈ G } )
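The two formulas above can be sketched in plain Python. This is a minimal illustrative sketch, not the paper's implementation: sum aggregation, a toy averaging COMBINE step, and a mean READOUT are all assumptions chosen for brevity.

```python
def gnn_layer(features, adjacency):
    """One message-passing layer.
    features: dict node -> embedding (list of floats)
    adjacency: dict node -> list of neighbor nodes
    AGGREGATE: sum of neighbor embeddings h_u^(l-1).
    COMBINE (toy): average of the node's own embedding and the aggregate."""
    new_features = {}
    for v, neighbors in adjacency.items():
        agg = [0.0] * len(features[v])
        for u in neighbors:
            agg = [a + x for a, x in zip(agg, features[u])]
        new_features[v] = [(s + a) / 2 for s, a in zip(features[v], agg)]
    return new_features

def readout(features):
    """Graph-level embedding h_G: mean over all node embeddings."""
    nodes = list(features.values())
    dim = len(nodes[0])
    return [sum(h[i] for h in nodes) / len(nodes) for i in range(dim)]
```

For example, on a two-node graph with one edge, one layer mixes each node's embedding with its neighbor's before the readout pools them.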
Idea
• In MoE architectures, multiple experts act as sub-networks, with a gating network
selectively activating only the most relevant experts for each input.
• MoL-MoE combines three experts:
• a large SMILES-based encoder-decoder
• a BART-based SELFIES encoder-decoder
• a graph-based molecular model (MHG-GNN)
Methodology
• Multi-View Mixture-of-Experts Layer
• Before the gating network is applied, a feature extraction module converts the raw
SMILES input into embeddings for the gating network.
• Each SMILES string is tokenized, and the tokens are mapped to fixed 768-dimensional
vectors; mean pooling then produces a single embedding for the molecule.
• Other feature extraction methods can also be used to improve the molecule's
representation.
• Let G(x) be the output of the gating network and E_i(x) the output of the i-th expert
network for a given SMILES input x; the layer's output is the gate-weighted combination
y = Σ_i G(x)_i · E_i(x).
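The feature-extraction and gating steps above can be sketched as follows. This is an illustrative sketch only: the softmax gate, the top-k routing (with k a free parameter), and the toy experts are assumptions, not the paper's exact design.

```python
import math

def mean_pool(token_vectors):
    """Collapse per-token embeddings into one molecule embedding (mean pooling)."""
    dim = len(token_vectors[0])
    return [sum(t[i] for t in token_vectors) / len(token_vectors) for i in range(dim)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(x, experts, gate_scores, k=2):
    """y = sum_i G(x)_i * E_i(x), restricted to the top-k experts.
    experts: list of callables mapping input x to an output vector.
    gate_scores: raw gating logits for this input (one per expert)."""
    weights = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)  # renormalize over the selected experts
    out = None
    for i in top:
        contrib = [weights[i] / norm * v for v in experts[i](x)]
        out = contrib if out is None else [o + c for o, c in zip(out, contrib)]
    return out
```

With k=1 this reduces to hard routing (only the highest-scoring expert fires); larger k blends several expert views of the same molecule.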
SMILES-based foundation model
• This paper utilized the SMI-TED289M foundation model as the SMILES encoder.
• SMI-TED289M is a large-scale, open-source encoder-decoder model pre-trained on a
curated dataset of 91 million SMILES samples from PubChem.
• All 91 million molecules curated from PubChem were used in the tokenization
process, resulting in a set of 4 billion molecular tokens.
SELFIES-based foundation model
• SELFIES-BART, the SELFIES-based foundation model, is an encoder-decoder architecture
derived from the BART (Bidirectional and Auto-Regressive Transformers) model.
• The paper first converts the input SMILES strings to SELFIES using the SELFIES API.
• In SELFIES, each atom or bond is represented by symbols enclosed in [ ], which are
tokenized using a word-level scheme where each bracketed symbol is
treated as a word.
• For example:
• SMILES: CCO -> SELFIES: [C][C][O]
• SMILES: C1=CC=CC=C1 -> SELFIES: [C][=C][C][=C][C][=C][Ring1]
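The word-level tokenization described above can be sketched with a small regular expression. This is illustrative only; in practice the SMILES-to-SELFIES conversion itself is performed with the SELFIES API, which this sketch does not reproduce.

```python
import re

def tokenize_selfies(selfies_string):
    """Word-level tokenization: each bracketed SELFIES symbol is one token."""
    tokens = re.findall(r"\[[^\]]*\]", selfies_string)
    # Sanity check: the whole string must consist of bracketed symbols
    assert "".join(tokens) == selfies_string, "unexpected characters outside [ ]"
    return tokens
```

For example, the ethanol SELFIES "[C][C][O]" yields the three tokens [C], [C], [O].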
Graph-based model for small molecules
• This paper employs MHG-GNN, an autoencoder that combines a GNN with the Molecular
Hypergraph Grammar (MHG) introduced for MHG-VAE.
• MHG-GNN receives a molecular structure represented as a graph.
• The encoder, a Graph Isomorphism Network (GIN) extended to additionally
consider edge features, encodes the graph into its corresponding latent vector.
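A GIN update extended with edge features can be sketched roughly as below. This is an illustrative sketch of the general GINE-style rule, h_v ← (1+ε)·h_v + Σ_u ReLU(h_u + e_uv), with the subsequent MLP omitted for brevity; it is not the paper's exact encoder.

```python
def gine_update(h_v, neighbor_feats, edge_feats, eps=0.0):
    """One GIN-with-edges update for a single node.
    h_v: the node's current embedding.
    neighbor_feats / edge_feats: aligned lists of neighbor and edge embeddings.
    Message: sum over neighbors of relu(h_u + e_uv).
    Output: (1 + eps) * h_v + message (GIN's follow-up MLP is omitted)."""
    dim = len(h_v)
    msg = [0.0] * dim
    for h_u, e in zip(neighbor_feats, edge_feats):
        for i in range(dim):
            msg[i] += max(0.0, h_u[i] + e[i])
    return [(1 + eps) * h_v[i] + msg[i] for i in range(dim)]
```

Adding the edge embedding inside the ReLU is what lets the encoder distinguish, say, a single from a double bond between the same pair of atoms.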
Experiments
• Evaluation uses a comprehensive set of 9 distinct benchmark datasets sourced from MoleculeNet.
Results and Discussion
• Results for classification tasks:
• The results indicate that MoL-MoE outperforms other leading methods such as
ChemBERTa, ChemBERTa-2, and Galactica (30B and 120B).
Results and Discussion
• Results for regression tasks:
• MoL-MoE shows robust performance across all tested regression benchmarks,
consistently outperforming other state-of-the-art methods.
CONCLUSION
• MoL-MoE is a Multi-View Mixture-of-Experts framework that integrates multiple latent
spaces from SMILES, SELFIES, and molecular graphs to predict molecular properties.
• MoL-MoE dynamically adjusts its focus on different molecular representations based on
the specific needs of each task.
• This indicates that the choice of representation is crucial for optimizing model
performance, especially for tasks with distinct characteristics or complexities. For
instance, some tasks may benefit more from SMILES or SELFIES, while others may
require a focus on molecular graphs.