# hyenadna-small-32k-seqlen-hf
| Property | Value |
|---|---|
| Author | LongSafari |
| Maximum Sequence Length | 32,000 nucleotides |
| Model Hub | HuggingFace |
| Recommended GPU | T4 (for fine-tuning) |
## What is hyenadna-small-32k-seqlen-hf?
HyenaDNA-small is a genomic foundation model for DNA sequences up to 32,000 nucleotides in length. By replacing the attention mechanism of traditional Transformers with Hyena operators, it processes long DNA sequences efficiently while operating at single-nucleotide resolution.
## Implementation Details
The model implements a stack of Hyena operators as a subquadratic alternative to attention in Transformers. It uses a single character tokenizer with a primary vocabulary of 4 nucleotides plus special tokens, enabling true single nucleotide resolution analysis. The architecture was pretrained on the human reference genome (HG38) using next nucleotide prediction.
- Subquadratic computational complexity through Hyena operators
- Single nucleotide resolution tokenization
- Implicit long convolution for global receptive field
- Efficient memory usage compared to attention-based models
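The single-character tokenizer and the next-nucleotide pretraining objective described above can be sketched in a few lines. This is a hypothetical illustration (the vocabulary IDs and function names here are made up); the model's actual tokenizer ships with the HuggingFace checkpoint.

```python
# Illustrative sketch of single-character DNA tokenization and the
# next-nucleotide prediction framing; token IDs are made up, not the
# model's real vocabulary.

# Vocabulary: 4 nucleotides plus special tokens (IDs illustrative).
VOCAB = {"[PAD]": 0, "[UNK]": 1, "A": 2, "C": 3, "G": 4, "T": 5}

def tokenize(seq: str) -> list[int]:
    """Map each nucleotide to its own token ID (single-nucleotide resolution)."""
    return [VOCAB.get(base, VOCAB["[UNK]"]) for base in seq.upper()]

def next_nucleotide_pairs(ids: list[int]) -> tuple[list[int], list[int]]:
    """Next-nucleotide prediction: inputs are ids[:-1], targets are ids[1:]."""
    return ids[:-1], ids[1:]

ids = tokenize("acgtn")          # lowercase input, "n" falls back to [UNK]
inputs, targets = next_nucleotide_pairs(ids)
```

Because every nucleotide is its own token, no k-mer merging blurs the sequence, which is what makes true single-nucleotide resolution possible.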
## Core Capabilities
- DNA sequence analysis up to 32k nucleotides
- Regulatory element prediction
- Chromatin profile analysis
- Species classification
- Support for in-context learning with soft prompt tunable tokens
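Soft-prompt tuning, mentioned in the last capability above, prepends a small number of tunable embedding vectors to the frozen token embeddings so that only those vectors are trained for a downstream task. A minimal pure-Python sketch (dimensions and names are illustrative, not the model's API):

```python
# Illustrative soft-prompt sketch: tunable vectors are concatenated in
# front of the (frozen) token embeddings; only `soft_prompt` would
# receive gradients during tuning.
D = 4           # embedding dimension (illustrative; the real model's is larger)
N_PROMPT = 2    # number of tunable soft-prompt tokens

soft_prompt = [[0.0] * D for _ in range(N_PROMPT)]    # trainable parameters
token_embeddings = [[1.0] * D, [2.0] * D, [3.0] * D]  # frozen, from the model

def prepend_soft_prompt(prompt, embeddings):
    """Return the sequence the model actually consumes: prompt + tokens."""
    return prompt + embeddings

full_input = prepend_soft_prompt(soft_prompt, token_embeddings)
# The effective sequence length grows by N_PROMPT.
```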
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's use of Hyena operators instead of attention mechanisms allows it to process much longer DNA sequences more efficiently than traditional Transformer models, while maintaining high accuracy in genomic analysis tasks.
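A back-of-the-envelope comparison makes the efficiency claim concrete. Pairwise attention scales quadratically in sequence length L, while Hyena-style implicit long convolutions can be evaluated in roughly L·log₂L time via the FFT (constants ignored; this is a sketch, not a benchmark):

```python
import math

L = 32_000  # the model's maximum sequence length in nucleotides

attention_cost = L * L                 # pairwise attention: quadratic in L
subquadratic_cost = L * math.log2(L)   # FFT-based long convolution: ~ L log L

ratio = attention_cost / subquadratic_cost
# At L = 32,000 the quadratic term is roughly 2,100x the L log L term.
```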
**Q: What are the recommended use cases?**
The model is particularly well-suited for genomic sequence analysis tasks including regulatory element prediction, chromatin profile analysis, and species classification. It's ideal for applications requiring processing of DNA sequences up to 32,000 nucleotides in length.