MoLFormer-XL-both-10pct
Property | Value |
---|---|
Parameter Count | 46.8M |
License | Apache 2.0 |
Paper | arXiv:2106.09553 |
Architecture | Linear Attention Transformer with Rotary Embeddings |
What is MoLFormer-XL-both-10pct?
MoLFormer-XL-both-10pct is a chemical language model that learns representations of molecules from their SMILES strings. MoLFormer models are pretrained on up to 1.1B molecules drawn from the ZINC and PubChem datasets. This variant was trained on 10% of both datasets. The model combines a linear attention mechanism with rotary position embeddings to learn molecular representations efficiently.
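To make the two architectural ingredients concrete, here is a minimal pure-Python sketch of rotary position embeddings applied to queries and keys, followed by non-causal linear attention. It assumes the elu(x)+1 feature map from Katharopoulos et al. (2020). MoLFormer's own implementation may use a different kernel, so the function names and the feature map are illustrative assumptions, not the model's actual code.

```python
import math


def rotary(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle pos * base**(-i/d).

    Hypothetical helper: a standard rotary position embedding (RoPE).
    """
    d = len(vec)
    out = list(vec)
    for i in range(0, d - 1, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out[i] = x0 * c - x1 * s
        out[i + 1] = x0 * s + x1 * c
    return out


def elu_plus_one(vec):
    """Positive feature map phi(x) = elu(x) + 1 (an assumed kernel choice)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def linear_attention(queries, keys, values):
    """Non-causal linear attention with rotary embeddings on queries and keys.

    Cost grows linearly with sequence length n (O(n * d * d_v)) rather than
    quadratically as with softmax attention.
    """
    # Rotate by position first, then map to positive features.
    q_feats = [elu_plus_one(rotary(q, p)) for p, q in enumerate(queries)]
    k_feats = [elu_plus_one(rotary(k, p)) for p, k in enumerate(keys)]
    d, dv = len(q_feats[0]), len(values[0])

    # Summarize all keys once: S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j).
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kf, v in zip(k_feats, values):
        for a in range(d):
            z[a] += kf[a]
            for b in range(dv):
                S[a][b] += kf[a] * v[b]

    # Each query reads the shared summary; dividing by phi(q).z makes weights sum to 1.
    outputs = []
    for qf in q_feats:
        denom = sum(qf[a] * z[a] for a in range(d))
        outputs.append([sum(qf[a] * S[a][b] for a in range(d)) / denom for b in range(dv)])
    return outputs
```

Because the key-value summary `S` and normalizer `z` are computed once and shared by every query, no n-by-n attention matrix is ever formed. Where the rotation is applied relative to the feature map is a design choice; this sketch rotates first and then applies the positive feature map, so every attention weight stays positive.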
Implementation Details
The model is a transformer encoder with 46.8M parameters stored as F32 tensors. It is implemented in PyTorch, and its weights can be loaded through the Hugging Face transformers library for feature extraction. Because the architecture is distributed as custom code, loading it requires `trust_remote_code=True`.
- Trained on canonicalized SMILES strings with isomeric information removed
- Supports molecules up to 202 tokens in length
- Trained using 16 NVIDIA V100 GPUs
- Pretrained with a self-supervised masked language modeling objective
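The preprocessing and pretraining steps in the list above can be sketched without dependencies: a regex-based SMILES tokenizer, the 202-token length check, and a masking step for masked language modeling. The regex is the widely used atom-level pattern from the Molecular Transformer; whether MoLFormer's tokenizer uses exactly this pattern, whether the 202-token limit counts special tokens, and the BERT-style 15% / 80-10-10 masking recipe are all assumptions. Canonicalization and stereo removal (typically done with a cheminformatics toolkit such as RDKit) are omitted.

```python
import random
import re

# Atom-level SMILES regex popularized by the Molecular Transformer (assumed here;
# MoLFormer's own tokenizer vocabulary may differ).
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

# Length limit from the list above; whether it includes special tokens is an assumption.
MAX_TOKENS = 202


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unrecognized characters."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def fits_context(smiles, max_tokens=MAX_TOKENS):
    """True if the tokenized SMILES is within the assumed length limit."""
    return len(tokenize(smiles)) <= max_tokens


def mask_tokens(tokens, rng, mask_prob=0.15, mask_token="<mask>", vocab=None):
    """BERT-style masking (assumed recipe).

    Each position is selected with probability mask_prob; a selected token becomes
    mask_token 80% of the time, a random vocabulary token 10%, and stays unchanged 10%.
    Returns (inputs, labels); labels is None where no prediction is required.
    """
    vocab = vocab or sorted(set(tokens))
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(mask_token)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            labels.append(None)
            inputs.append(tok)
    return inputs, labels
```

For example, `tokenize("ClCBr")` yields `["Cl", "C", "Br"]`, keeping two-letter halogens as single tokens, and bracketed atoms such as `[NH4+]` are also kept whole.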
Core Capabilities
- Feature extraction from SMILES strings
- Molecular property prediction through fine-tuning
- Molecular similarity measurements
- Visualization of the learned molecular embedding space
- Benchmarked on MoleculeNet tasks, with the paper reporting state-of-the-art results on several of them
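Feature extraction and similarity measurement can be combined as in the sketch below. The loading function follows the usual Hugging Face pattern and assumes the Hub id `ibm/MoLFormer-XL-both-10pct`. The `deterministic_eval=True` argument mirrors the model card's example and is assumed to make the linear-attention features reproducible at inference. Mean pooling over non-padding tokens and cosine similarity are illustrative choices, not a documented recipe. The helpers `embed_smiles`, `mean_pool`, and `cosine` are hypothetical names.

```python
import math

MODEL_ID = "ibm/MoLFormer-XL-both-10pct"  # assumed Hugging Face Hub id


def embed_smiles(smiles_list):
    """Return one mean-pooled embedding per SMILES string.

    Imports torch/transformers lazily; needs both installed plus Hub access.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Custom architecture code lives on the Hub, hence trust_remote_code=True.
    model = AutoModel.from_pretrained(MODEL_ID, deterministic_eval=True, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()
    inputs = tokenizer(smiles_list, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (batch, seq_len, hidden_dim)
    return mean_pool(hidden.tolist(), inputs["attention_mask"].tolist())


def mean_pool(hidden, mask):
    """Average token vectors per sequence, skipping padding positions (mask == 0)."""
    pooled = []
    for seq, m in zip(hidden, mask):
        kept = [vec for vec, keep in zip(seq, m) if keep]
        dim = len(seq[0])
        pooled.append([sum(v[i] for v in kept) / len(kept) for i in range(dim)])
    return pooled


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

A typical call would be `cosine(*embed_smiles(["CC(=O)Oc1ccccc1C(=O)O", "Cn1c(=O)c2c(ncn2C)n(C)c1=O"]))`, which downloads the model on first use. Keeping pooling and similarity as plain functions lets them be checked independently of the network-dependent loader.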
Frequently Asked Questions
Q: What makes this model unique?
MoLFormer-XL combines linear attention with rotary position embeddings, which lets it process chemical structures efficiently while reaching state-of-the-art performance on several molecular property prediction benchmarks. Reported results include ROC-AUC scores of 91.5% on BBBP, 81.3% on HIV, and 94.6% on ClinTox.
Q: What are the recommended use cases?
The model is primarily intended as a feature extractor, or to be fine-tuned for molecular property prediction tasks such as solubility or toxicity. Its frozen embeddings can also be used for similarity measurements and visualization. It is not intended for molecule generation, and it has not been tested on macromolecules (molecules larger than roughly 200 atoms).
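Full fine-tuning updates the whole network, but a lighter way to use frozen embeddings for property prediction is a linear probe. The sketch below fits a logistic-regression head by full-batch gradient descent on precomputed embedding vectors, for example pooled model outputs. The function names, learning rate, and epoch count are illustrative assumptions, and the toy vectors used to check it are synthetic rather than real MoLFormer outputs.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights on frozen embeddings.

    X is a list of embedding vectors and y a list of 0/1 labels
    (e.g. toxic / non-toxic). Returns (weights, bias).
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the mean binary cross-entropy w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability of the positive class for one embedding x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Because the encoder stays frozen, embeddings can be computed once and reused across many probes, which is much cheaper than fine-tuning every parameter.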