MoLFormer-XL-both-10pct

Property	Value
Parameter Count	46.8M
License	Apache 2.0
Paper	arXiv:2106.09553
Architecture	Linear Attention Transformer with Rotary Embeddings

What is MoLFormer-XL-both-10pct?

MoLFormer-XL-both-10pct is a chemical language model that learns molecular representations from SMILES strings. MoLFormer models are pretrained on up to 1.1B molecules drawn from ZINC and PubChem; this particular variant is trained on 10% of each dataset. It combines a linear attention mechanism with rotary positional embeddings so that attention cost grows linearly with sequence length.

Implementation Details

The model is a transformer encoder with 46.8M parameters stored as 32-bit floats (F32). It is implemented in PyTorch and supports feature extraction through the Hugging Face transformers library.
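
A minimal feature-extraction sketch follows. It assumes the checkpoint is published on the Hugging Face Hub as `ibm/MoLFormer-XL-both-10pct` and ships custom model code (hence `trust_remote_code=True`); neither is stated above. Treating `pooler_output` as the per-molecule embedding is likewise an assumption, and the helper names `batched` and `embed_smiles` are illustrative.

```python
from typing import Iterator, List, Sequence

MODEL_ID = "ibm/MoLFormer-XL-both-10pct"  # assumed Hub repository id


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_smiles(smiles: Sequence[str], batch_size: int = 32):
    """Return a torch.Tensor with one embedding row per SMILES string."""
    if not smiles:
        raise ValueError("need at least one SMILES string")

    # Heavy dependencies are imported lazily so `batched` stays usable without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    chunks = []
    with torch.no_grad():  # inference only, so no gradients are tracked
        for batch in batched(smiles, batch_size):
            inputs = tokenizer(batch, padding=True, return_tensors="pt")
            outputs = model(**inputs)
            # Assumption: the pooled output serves as the molecule embedding.
            chunks.append(outputs.pooler_output)
    return torch.cat(chunks, dim=0)
```

Calling `embed_smiles(["CCO", "c1ccccc1"])` would download the weights on first use and return a tensor with two rows; the embedding width depends on the checkpoint's configuration.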

Trained on canonicalized SMILES strings with isomeric information removed
Handles molecules up to 202 tokens long; longer molecules were dropped during training
Trained using 16 NVIDIA V100 GPUs
Pretrained with masked language modeling, a self-supervised objective
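
The length limit and masking objective in the list above can be sketched with the standard library alone. The regular expression is the atom-level SMILES tokenization pattern commonly used in chemical language modeling; that it matches this model's tokenizer exactly is an assumption. The 15% masking rate, the `<mask>` token string, and the choice not to count special tokens toward the 202-token limit are also assumptions, and the masking is simplified (every selected token is replaced by the mask token).

```python
import random
import re
from typing import List, Optional, Tuple

# Widely used atom-level SMILES pattern (assumed to approximate the model's tokenizer).
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\."
    r"|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MAX_TOKENS = 202        # length limit stated in the list above
MASK_TOKEN = "<mask>"   # hypothetical mask token string


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into atom-level tokens.

    Characters the pattern does not recognize are silently skipped.
    """
    return SMILES_PATTERN.findall(smiles)


def within_limit(smiles: str, max_tokens: int = MAX_TOKENS) -> bool:
    """Return True if the tokenized molecule fits the supported length."""
    return len(tokenize(smiles)) <= max_tokens


def mask_tokens(
    tokens: List[str],
    rate: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns the corrupted sequence and per-position labels: the original
    token at masked positions and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if rng.random() < rate:
            corrupted.append(MASK_TOKEN)
            labels.append(token)   # the model must recover this token
        else:
            corrupted.append(token)
            labels.append(None)    # position is ignored by the loss
    return corrupted, labels
```

During pretraining the model sees the corrupted sequence and is scored only on the masked positions, which is what makes the objective self-supervised: the labels come from the input itself.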

Core Capabilities

Feature extraction from SMILES strings
Molecular property prediction through fine-tuning
Molecular similarity measurements
Visualization of molecular embeddings
Strong reported results on MoleculeNet benchmark tasks
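
Similarity between molecules is typically measured on the extracted embeddings. The sketch below ranks candidates by cosine similarity to a query, using short made-up vectors in place of real model output. The function names and the choice of cosine similarity are illustrative assumptions, not something the model prescribes.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
toy_embeddings = {
    "CCO": [1.0, 0.0, 0.0],
    "CCCO": [0.9, 0.1, 0.0],
    "c1ccccc1": [0.0, 1.0, 0.0],
}
ranking = rank_by_similarity(toy_embeddings["CCO"], toy_embeddings)
```

With real embeddings the same ranking would normally be computed with vectorized tensor operations; the pure-Python version is only meant to make the arithmetic explicit.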

Frequently Asked Questions

Q: What makes this model unique?

MoLFormer-XL combines linear attention with rotary embeddings, which lets it process SMILES strings efficiently while scoring strongly on several molecular property prediction benchmarks. The model's reported ROC-AUC scores include 91.5 on BBBP, 81.3 on HIV, and 94.6 on ClinTox.
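
Assuming the benchmark figures above are ROC-AUC scores (the usual metric for MoleculeNet classification tasks), the following sketch shows how such a score is computed from predicted scores and binary labels. It uses the pairwise-ranking definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting one half. The labels and scores are made-up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative examples."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0    # positive correctly ranked above negative
            elif p == n:
                wins += 0.5    # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up predictions for five molecules (1 = active, 0 = inactive).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)  # 4 of 6 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one; benchmark tables often report the value multiplied by 100. The quadratic loop is fine for illustration, while larger evaluations would use a library routine such as scikit-learn's `roc_auc_score`.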

Q: What are the recommended use cases?

The model is primarily intended for feature extraction and for fine-tuning on molecular property prediction tasks such as solubility and toxicity. Its frozen embeddings can also be used for similarity measurements and visualization. It is not intended for molecule generation and has not been tested on macromolecules larger than about 200 atoms.
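
Before molecules are passed to the model, inputs can be brought into the form it was trained on and checked against the size guidance above. This sketch assumes RDKit is installed. `prepare_smiles` and its `max_heavy_atoms` threshold are hypothetical helpers, and counting heavy atoms (rather than all atoms) is only one possible reading of the 200-atom guidance.

```python
from typing import Optional


def prepare_smiles(smiles: str, max_heavy_atoms: int = 200) -> Optional[str]:
    """Return canonical, non-isomeric SMILES, or None if the input is unusable."""
    from rdkit import Chem  # imported lazily; requires the RDKit package

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit could not parse the string
        return None
    if mol.GetNumHeavyAtoms() > max_heavy_atoms:
        return None  # larger than the range the model is described as covering
    # Canonical atom ordering with stereochemistry stripped.
    return Chem.MolToSmiles(mol, canonical=True, isomericSmiles=False)
```

Two different spellings of the same molecule, such as `OCC` and `CCO`, map to the same output string, and stereo markers such as `@` are dropped, so the model sees inputs in a consistent form.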