ProstT5

Property      Value
License       MIT
Model Type    Encoder-decoder (T5)
Base Model    ProtT5-XL-U50
Author        Rostlab

What is ProstT5?

ProstT5 is a protein language model that translates between protein sequence and structure. It starts from ProtT5-XL-U50 and was fine-tuned on 17M proteins with high-quality structure predictions from AlphaFoldDB. Its distinguishing capability is translation between amino-acid sequences and 3Di-token sequences, the structural alphabet used by Foldseek to encode 3D structure as a one-dimensional string.

Implementation Details

The model is trained in two phases: it first learns to represent 3Di-tokens through span-denoising, and is then trained to translate in both directions between sequences and structures. A prefix token prepended to the input ("<AA2fold>" for amino acids to 3Di, "<fold2AA>" for 3Di to amino acids) indicates the translation direction. The model can be run in half precision on GPU, which reduces memory use and speeds up inference.
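The input convention can be sketched as a small formatting step. This is a minimal sketch, not the authors' code: the prefix strings "<AA2fold>" and "<fold2AA>", the upper-case/lower-case convention (amino acids upper-case, 3Di lower-case), and the mapping of rare residues to X are assumptions based on the published ProstT5 usage example and should be checked against the model card. The helper name `format_for_prostt5` is hypothetical.

```python
import re

AA2FOLD = "<AA2fold>"  # assumed prefix: input is amino acids
FOLD2AA = "<fold2AA>"  # assumed prefix: input is 3Di tokens


def format_for_prostt5(sequence: str) -> str:
    """Return a prefixed, space-separated input string (hypothetical helper).

    Assumes all upper-case input is an amino-acid sequence and all
    lower-case input is a 3Di string.
    """
    if sequence.isupper():
        # Map rare or ambiguous residues (U, Z, O, B) to X.
        tokens = re.sub(r"[UZOB]", "X", sequence)
        prefix = AA2FOLD
    elif sequence.islower():
        tokens = sequence
        prefix = FOLD2AA
    else:
        raise ValueError("expected all upper-case (amino acids) or all lower-case (3Di)")
    # One token per residue: separate every character with a space.
    return prefix + " " + " ".join(tokens)


if __name__ == "__main__":
    print(format_for_prostt5("PRTEINO"))
    print(format_for_prostt5("strct"))
```

Keeping the case check strict means a mixed-case string raises an error instead of silently receiving the wrong direction prefix.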

  • Supports both feature extraction and sequence-structure translation
  • Training used DeepSpeed stage-2 with gradient accumulation
  • Training also used mixed half-precision (bf16) and PyTorch 2.0's torchInductor compiler
  • Reported inference speed: ~0.1 s per protein for embeddings, 0.6-2.5 s per protein for translation
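Feature extraction with the encoder can be sketched as follows, assuming the Hugging Face `transformers` library and the `Rostlab/ProstT5` checkpoint name. The inputs are assumed to be already space-separated and to carry a single direction-prefix token, so position 0 of the encoder output belongs to the prefix rather than to a residue. The function names `residue_slice` and `embed` are hypothetical, and the sketch is illustrative rather than the authors' reference code.

```python
def residue_slice(seq_len: int) -> slice:
    """Positions of the residues in the encoder output.

    Position 0 holds the direction prefix, so residues occupy
    1..seq_len; later positions are end-of-sequence or padding.
    """
    return slice(1, seq_len + 1)


def embed(formatted, lengths, model_name="Rostlab/ProstT5"):
    """Return one (seq_len x hidden_size) tensor per input sequence.

    `formatted` holds prefixed, space-separated strings and `lengths`
    the residue count of each. Heavy imports stay inside the function
    so that `residue_slice` works without torch installed.
    """
    import torch
    from transformers import T5EncoderModel, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).to(device)
    # Half precision only on GPU; stay in full precision on CPU.
    model = model.half() if device.type == "cuda" else model.float()
    model.eval()

    batch = tokenizer(formatted, add_special_tokens=True,
                      padding="longest", return_tensors="pt").to(device)
    with torch.no_grad():
        out = model(batch.input_ids, attention_mask=batch.attention_mask)
    # Drop the prefix position and any padding for each sequence.
    return [out.last_hidden_state[i, residue_slice(n)]
            for i, n in enumerate(lengths)]
```

A per-protein embedding can then be taken as the mean over the residue dimension, for example `emb.mean(dim=0)` on each returned tensor.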

Core Capabilities

  • Sequence-to-structure translation ("folding"): amino acids to 3Di tokens, not 3D coordinates
  • Structure-to-sequence translation ("inverse folding"): 3Di tokens to amino acids
  • Feature extraction for both amino acid and 3Di sequences
  • Remote homology detection by searching predicted 3Di sequences with Foldseek
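The "folding" direction can be sketched with the standard `generate` API of Hugging Face `transformers`. The checkpoint name `Rostlab/ProstT5`, the beam size, and the length bounds (one output token per residue, plus one for end-of-sequence) are assumptions chosen for illustration, not the authors' recommended settings. The inputs are assumed to be space-separated amino acids carrying the amino-acid-to-3Di prefix, and the function names are hypothetical.

```python
def decoded_to_3di(decoded: str) -> str:
    """Collapse space-separated decoder output into a plain 3Di string."""
    return "".join(decoded.split())


def translate_to_3di(formatted, lengths, model_name="Rostlab/ProstT5"):
    """Translate prefixed amino-acid inputs into 3Di strings.

    `formatted` holds prefixed, space-separated sequences and `lengths`
    the residue count of each. Heavy imports stay inside the function.
    """
    import torch
    from transformers import AutoModelForSeq2SeqLM, T5Tokenizer

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
    # Half precision only on GPU; stay in full precision on CPU.
    model = model.half() if device.type == "cuda" else model.float()
    model.eval()

    batch = tokenizer(formatted, add_special_tokens=True,
                      padding="longest", return_tensors="pt").to(device)
    with torch.no_grad():
        generated = model.generate(
            batch.input_ids,
            attention_mask=batch.attention_mask,
            min_new_tokens=min(lengths),      # assumed: one token per residue
            max_new_tokens=max(lengths) + 1,  # leave room for end-of-sequence
            num_beams=3,                      # illustrative beam size
            early_stopping=True,
        )
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
    return [decoded_to_3di(d) for d in decoded]
```

The returned strings are lower-case 3Di sequences. Inverse folding would follow the same pattern with 3Di input and the opposite prefix.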

Frequently Asked Questions

Q: What makes this model unique?

Unlike sequence-only protein language models, ProstT5 translates in both directions between protein sequences and structures while still producing meaningful embeddings for both modalities.

Q: What are the recommended use cases?

The model is suited to predicting structure from sequence in the form of 3Di tokens, designing sequences under structural constraints (inverse folding), and generating protein embeddings for downstream tasks such as remote homology detection and protein function prediction.
