
Maintained by multimolecule

RiNALMo

Parameter Count: 651M
Model Type: BERT-style MLM
License: AGPL-3.0
Paper: arXiv:2403.00043
Architecture: 33 layers, hidden size 1280, 20 attention heads

What is RiNALMo?

RiNALMo is a pre-trained language model for non-coding RNA (ncRNA) sequence analysis. Built on a BERT-style encoder, it is trained with masked language modeling to learn RNA sequence patterns from a dataset of 36 million unique ncRNA sequences.

Implementation Details

The model uses a deep architecture with 33 layers, a hidden size of 1280, and 20 attention heads. It was trained on 7 NVIDIA A100 GPUs using a curated dataset combining the RNAcentral, Rfam, Ensembl Genome Browser, and Nucleotide databases.
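The listed parameter count is consistent with this layer budget. As a back-of-envelope check (assuming the conventional 4×d feed-forward intermediate size, and ignoring embeddings, biases, and layer norms, which are not stated here):

```python
# Rough transformer parameter estimate from the listed architecture.
layers, d = 33, 1280
attn = 4 * d * d             # Q, K, V, and output projections per layer
ffn = 2 * d * (4 * d)        # two linear maps with an assumed 4*d intermediate size
per_layer = attn + ffn       # = 12 * d^2 per layer
total = layers * per_layer
print(total)                 # 648806400, i.e. ~649M, close to the listed 651M
```

The small gap to 651M is plausibly the embedding table and other omitted terms.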

  • Pre-training masks 15% of tokens, with masked positions replaced by a mask token, a random token, or left unchanged
  • Implements sequence clustering for diverse batch sampling
  • Supports a maximum sequence length of 1022 tokens
  • Includes specialized preprocessing for RNA sequences (converting T to U)
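The masking and preprocessing steps above can be sketched with a minimal BERT-style routine. This is an illustration of the standard 80/10/10 replacement scheme, not RiNALMo's exact implementation; the `mask_tokens` helper and the character-level vocabulary are assumptions for the example:

```python
import random

def mask_tokens(tokens, vocab, mask_token="<mask>", mask_prob=0.15, seed=0):
    """BERT-style masking: ~15% of positions are selected; of those,
    80% become the mask token, 10% a random token, 10% stay unchanged."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # the model must predict the original here
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)
        else:
            labels.append(None)           # position not scored by the MLM loss
            masked.append(tok)
    return masked, labels

# RNA preprocessing: convert any DNA-style T to U before tokenizing.
seq = "ATGGCUACGTAGCUAGCUA".replace("T", "U")
tokens = list(seq)
masked, labels = mask_tokens(tokens, vocab=list("ACGU"))
```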

Core Capabilities

  • Masked language modeling for RNA sequences
  • Feature extraction for downstream tasks
  • Sequence-level classification and regression
  • Nucleotide-level prediction
  • Contact prediction for RNA structure analysis
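To illustrate how per-nucleotide embeddings can feed contact prediction, here is a generic bilinear pairwise head over token embeddings. This is a sketch only: the random `emb` matrix stands in for model outputs, and the bilinear head is a common pattern, not RiNALMo's actual prediction head:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 50, 1280
emb = rng.standard_normal((seq_len, d))   # stand-in for per-nucleotide embeddings

# Bilinear pairwise scoring: score(i, j) = emb[i] @ W @ emb[j].
W = rng.standard_normal((d, d)) / d
scores = emb @ W @ emb.T                  # (seq_len, seq_len) pairwise scores
scores = 0.5 * (scores + scores.T)        # contacts are symmetric, so symmetrize
probs = 1.0 / (1.0 + np.exp(-scores))     # per-pair contact probability
```

In practice the head's weights would be learned on labeled RNA structures rather than drawn at random.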

Frequently Asked Questions

Q: What makes this model unique?

RiNALMo stands out for its specialized focus on RNA sequences and its training on a large, diverse collection of RNA databases, which makes it particularly effective for RNA structure prediction tasks.

Q: What are the recommended use cases?

The model is ideal for RNA sequence analysis tasks, including structure prediction, sequence classification, and feature extraction. It can be fine-tuned for specific downstream tasks in RNA research and analysis.
