Multi-View Mixture-of-Experts for Predicting Molecular Properties
Using SMILES, SELFIES, and Graph-Based Representations
Van Thuy Hoang
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: hoangvanthuy90@gmail.com
2024-12-02
Eduardo Soares et al.; NeurIPS 2024
BACKGROUND: Graph Convolutional Networks (GCNs)
• Key Idea: Each node aggregates information from its neighborhood to obtain a
contextualized node embedding.
• Limitation: Most GNNs focus on homogeneous graphs.
(Figure: a GCN layer applies a neural transformation and aggregates the neighbors' information.)
Learning molecular structures through GNNs
• Inputs: molecules
• Outputs: a score for the specific prediction task
(Pipeline: Molecules → Graph Neural Networks → Pooling Function → Task Prediction, i.e., molecular representation learning.)
Molecular Graph Neural Network
• The node representation at the l-th layer of a GNN is formulated as:
h_v^(l) = COMBINE( h_v^(l-1), AGGREGATE( { h_u^(l-1) : u ∈ N(v) } ) )
• To obtain the graph-level representation h_G for a molecular graph G:
h_G = READOUT( { h_v^(L) : v ∈ G } )
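The two formulas above can be sketched in plain Python. This is a minimal illustrative sketch, not the paper's implementation: sum aggregation, a toy averaging COMBINE step, and a mean READOUT are all assumptions chosen for brevity.

```python
def gnn_layer(features, adjacency):
    """One message-passing layer.
    features: dict node -> embedding (list of floats)
    adjacency: dict node -> list of neighbor nodes
    AGGREGATE: sum of neighbor embeddings h_u^(l-1).
    COMBINE (toy): average of the node's own embedding and the aggregate."""
    new_features = {}
    for v, neighbors in adjacency.items():
        agg = [0.0] * len(features[v])
        for u in neighbors:
            agg = [a + x for a, x in zip(agg, features[u])]
        new_features[v] = [(s + a) / 2 for s, a in zip(features[v], agg)]
    return new_features

def readout(features):
    """Graph-level embedding h_G: mean over all node embeddings."""
    nodes = list(features.values())
    dim = len(nodes[0])
    return [sum(h[i] for h in nodes) / len(nodes) for i in range(dim)]
```

For example, on a two-node graph with one edge, one layer mixes each node's embedding with its neighbor's before the readout pools them.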
Idea
• In MoE architectures, multiple experts act as sub-networks, with a gating network
selectively activating only the most relevant experts for each input.
• MoL-MoE combines three experts:
• a large SMILES-based encoder-decoder
• a BART-based SELFIES encoder-decoder
• a graph-based molecular model (MHG-GNN)
Methodology
• Multi-View Mixture-of-Experts Layer
• Before the gating network is applied, a feature extraction module converts the raw
SMILES input into embeddings for the gating network.
• Each SMILES string is tokenized, and the tokens are mapped to fixed 768-dimensional
vectors; mean pooling then produces a single embedding for the molecule.
• Other feature extraction methods can also be used to improve the molecule's
representation.
• Let G(x) be the output of the gating network and E_i(x) the output of the i-th expert
network for a given SMILES input x; the layer's output is the gate-weighted combination
y = Σ_i G(x)_i · E_i(x).
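The feature-extraction and gating steps above can be sketched as follows. This is an illustrative sketch only: the softmax gate, the top-k routing (with k a free parameter), and the toy experts are assumptions, not the paper's exact design.

```python
import math

def mean_pool(token_vectors):
    """Collapse per-token embeddings into one molecule embedding (mean pooling)."""
    dim = len(token_vectors[0])
    return [sum(t[i] for t in token_vectors) / len(token_vectors) for i in range(dim)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(x, experts, gate_scores, k=2):
    """y = sum_i G(x)_i * E_i(x), restricted to the top-k experts.
    experts: list of callables mapping input x to an output vector.
    gate_scores: raw gating logits for this input (one per expert)."""
    weights = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)  # renormalize over the selected experts
    out = None
    for i in top:
        contrib = [weights[i] / norm * v for v in experts[i](x)]
        out = contrib if out is None else [o + c for o, c in zip(out, contrib)]
    return out
```

With k=1 this reduces to hard routing (only the highest-scoring expert fires); larger k blends several expert views of the same molecule.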
SMILES-based foundation model
• This paper utilized the SMI-TED289M foundation model as the SMILES encoder.
• SMI-TED289M is a large-scale, open-source encoder-decoder model pre-trained on a
curated dataset of 91 million SMILES samples from PubChem.
• All 91 million molecules curated from PubChem were used in the tokenization
process, resulting in a set of 4 billion molecular tokens.
SELFIES-based foundation model
• SELFIES-BART, the SELFIES-based foundation model, is an encoder-decoder architecture
derived from the BART (Bidirectional and Auto-Regressive Transformers) model.
• The paper first converts the input SMILES strings to SELFIES using the SELFIES API.
• In SELFIES, each atom or bond is represented by symbols enclosed in [ ], which are
tokenized using a word-level scheme where each bracketed symbol is
treated as a word.
• For example:
• SMILES: CCO -> SELFIES: [C][C][O]
• SMILES: C1=CC=CC=C1 -> SELFIES: [C][=C][C][=C][C][=C][Ring1]
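The word-level tokenization described above can be sketched with a small regular expression. This is illustrative only; in practice the SMILES-to-SELFIES conversion itself is performed with the SELFIES API, which this sketch does not reproduce.

```python
import re

def tokenize_selfies(selfies_string):
    """Word-level tokenization: each bracketed SELFIES symbol is one token."""
    tokens = re.findall(r"\[[^\]]*\]", selfies_string)
    # Sanity check: the whole string must consist of bracketed symbols
    assert "".join(tokens) == selfies_string, "unexpected characters outside [ ]"
    return tokens
```

For example, the ethanol SELFIES "[C][C][O]" yields the three tokens [C], [C], [O].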
Graph-based model for small molecules
• This paper employs MHG-GNN, an autoencoder that combines a GNN with the Molecular
Hypergraph Grammar (MHG) introduced for MHG-VAE.
• MHG-GNN receives a molecular structure represented as a graph.
• The encoder, a Graph Isomorphism Network (GIN) extended to additionally
consider edge features, encodes the graph into its corresponding latent vector.
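A GIN update extended with edge features can be sketched roughly as below. This is an illustrative sketch of the general GINE-style rule, h_v ← (1+ε)·h_v + Σ_u ReLU(h_u + e_uv), with the subsequent MLP omitted for brevity; it is not the paper's exact encoder.

```python
def gine_update(h_v, neighbor_feats, edge_feats, eps=0.0):
    """One GIN-with-edges update for a single node.
    h_v: the node's current embedding.
    neighbor_feats / edge_feats: aligned lists of neighbor and edge embeddings.
    Message: sum over neighbors of relu(h_u + e_uv).
    Output: (1 + eps) * h_v + message (GIN's follow-up MLP is omitted)."""
    dim = len(h_v)
    msg = [0.0] * dim
    for h_u, e in zip(neighbor_feats, edge_feats):
        for i in range(dim):
            msg[i] += max(0.0, h_u[i] + e[i])
    return [(1 + eps) * h_v[i] + msg[i] for i in range(dim)]
```

Adding the edge embedding inside the ReLU is what lets the encoder distinguish, say, a single from a double bond between the same pair of atoms.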
Experiments
• Evaluation uses a comprehensive set of 9 distinct benchmark datasets sourced from MoleculeNet.
Results and Discussion
• Results for classification tasks:
• The results indicate that MoL-MoE outperforms other leading methods such as
ChemBERTa, ChemBERTa-2, and Galactica (30B and 120B).
Results and Discussion
• Results for regression tasks:
• MoL-MoE shows robust performance across all tested regression benchmarks,
consistently outperforming other state-of-the-art methods.
CONCLUSION
• MoL-MoE is a Multi-View Mixture-of-Experts framework that integrates multiple latent
spaces from SMILES, SELFIES, and molecular graphs to predict molecular properties.
• MoL-MoE dynamically adjusts its focus on different molecular representations based on
the specific needs of each task.
• This indicates that the choice of representation is crucial for optimizing model
performance, especially for tasks with distinct characteristics or complexities. For
instance, some tasks may benefit more from SMILES or SELFIES, while others may
require a focus on molecular graphs.