biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101
| Property | Value |
|---|---|
| Parameter Count | 84.6M |
| License | Apache 2.0 |
| Paper | Multi-view biomedical foundation models |
| Developer | IBM Research |
| Release Date | October 28, 2024 |
What is biomed.sm.mv-te-84m-MoleculeNet-ligand_scaffold-MUV-101?
This is a multimodal biomedical foundation model for small-molecule analysis and property prediction. It implements the MMELON (Multi-view Molecular Embedding with Late Fusion) approach, which combines three views of a molecule: a 2D image, a molecular graph, and a SMILES text representation. The model targets drug-like molecules with a molecular weight under 1000 Da.
Implementation Details
The architecture has four components: one encoder for each of the three molecular views, plus an aggregator that fuses their outputs:
- Image Representation: Uses RDKit to generate 2D molecular visualizations with data augmentation
- Graph Representation: Encodes molecules as undirected graphs with atom and bond properties
- Text Representation: Processes SMILES strings using a custom transformer-based tokenizer
- Attention-based Aggregator: Combines multiple views using learned weights
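To make the aggregation step concrete, here is a minimal sketch of attention-style late fusion in plain Python. It is not the model's actual implementation: the function names, the single `query` scoring vector, and the three-dimensional toy embeddings are all assumptions made for illustration. Each view embedding is scored against the query, the scores are normalized with a softmax, and the fused embedding is the weighted sum of the views.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_embeddings, query):
    """Fuse per-view embeddings into one vector using attention weights.

    view_embeddings: dict of view name -> embedding (lists of equal length).
    query: hypothetical scoring vector standing in for learned parameters.
    """
    names = list(view_embeddings)
    # Score each view by its dot product with the query vector.
    scores = [
        sum(q * x for q, x in zip(query, view_embeddings[name]))
        for name in names
    ]
    weights = softmax(scores)
    # The fused embedding is the weight-averaged combination of the views.
    fused = [
        sum(w * view_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return fused, dict(zip(names, weights))


# Toy embeddings; real view encoders would produce much larger vectors.
views = {
    "image": [0.2, 0.1, 0.7],
    "graph": [0.9, 0.3, 0.1],
    "text": [0.4, 0.8, 0.2],
}
query = [1.0, 0.5, -0.5]
fused, weights = aggregate_views(views, query)
```

Because the weights sum to 1, the fused vector stays within the range spanned by the individual view embeddings, and the per-view weights can be inspected to see which view dominated for a given molecule.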
Core Capabilities
- Molecular property prediction for both regression and classification tasks
- Chemical library similarity search using pre-trained embeddings
- Integration with protein embeddings for combined analysis
- Binding affinity, solubility, and toxicity predictions
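As an illustration of the similarity-search capability, the sketch below ranks a small library by cosine similarity to a query embedding. The embeddings are made-up placeholder vectors rather than model outputs, and the helper names `cosine_similarity` and `top_k_similar` are hypothetical; in practice the vectors would come from the pre-trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k_similar(query_embedding, library, k=2):
    """Return the k library entries most similar to the query, best first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Placeholder embeddings keyed by SMILES string (values are illustrative only).
library = {
    "CCO": [0.9, 0.1, 0.0],
    "CCN": [0.8, 0.2, 0.1],
    "c1ccccc1": [0.0, 0.2, 0.9],
}
hits = top_k_similar([1.0, 0.0, 0.0], library, k=2)
```

For a large chemical library, an exhaustive scan like this would typically be replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.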
Frequently Asked Questions
Q: What makes this model unique?
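The model's distinctive feature is its multi-view approach: it combines image, graph, and SMILES text representations of each molecule through an attention-based aggregator with learned weights, rather than relying on a single representation. This supports both regression and classification property-prediction tasks.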
Q: What are the recommended use cases?
The model is intended for drug discovery applications, including lead compound identification, molecular property prediction, and virtual screening of small molecules. It is designed for drug-like molecules under 1000 Da.