# enformer-191k
| Property | Value |
|---|---|
| Author | EleutherAI |
| Architecture | Transformer-based |
| Paper | Effective gene expression prediction from sequence by integrating long-range interactions (Nature Methods, 2021) |
| Input Length | 196,608 base pairs (bp) |
| Performance | ~0.45 Pearson R on human data |
## What is enformer-191k?
Enformer-191k is a Transformer-based neural network that predicts gene expression from DNA sequence. It processes input windows of 196,608 base pairs, long enough to capture long-range interactions between distal regulatory elements that shorter-context models cannot see.
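As a minimal usage sketch, the snippet below loads the PyTorch port via the enformer-pytorch package and runs a forward pass on a random sequence. The checkpoint id `EleutherAI/enformer-191k` is assumed from this card's title, and the output head name and track count follow the original Enformer (5,313 human tracks over 896 bins); this variant may differ.

```python
import torch
from enformer_pytorch import from_pretrained

# Load the pretrained PyTorch port (checkpoint id assumed from this card).
model = from_pretrained('EleutherAI/enformer-191k')
model.eval()

# Enformer encodes DNA as integer tokens: A=0, C=1, G=2, T=3, N=4.
seq = torch.randint(0, 5, (1, 196_608))

with torch.no_grad():
    output = model(seq)

# Predictions are per-species dicts of (batch, bins, tracks) tensors;
# the human head covers 896 output bins of 128 bp each.
print(output['human'].shape)  # torch.Size([1, 896, 5313])
```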
## Implementation Details
This implementation is a PyTorch port of the original Enformer, adapted by Phil Wang (lucidrains/enformer-pytorch). The model was trained with a Poisson loss objective and incorporates shift augmentation, but no reverse-complement augmentation; both the loss and the augmentation are sketched after the feature list below. It predicts gene expression profiles over 896 output bins per input sequence. Key features:
- Long-range sequence processing (196,608 bp)
- Shift augmentation during training (no reverse-complement augmentation)
- Poisson loss objective function
- PyTorch-based architecture
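As an illustrative sketch of the training objective, the snippet below pairs a simple random-shift augmentation with PyTorch's built-in Poisson negative log-likelihood. The shift range, tensor shapes, and use of `PoissonNLLLoss` as written are assumptions, not the exact training configuration.

```python
import torch

def shift_augment(seq: torch.Tensor, max_shift: int = 3, pad_token: int = 4) -> torch.Tensor:
    """Randomly shift a (batch, length) token sequence, padding with N tokens.

    max_shift is an assumed value; the original training recipe may differ.
    """
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    if shift == 0:
        return seq
    pad = torch.full((seq.shape[0], abs(shift)), pad_token, dtype=seq.dtype)
    if shift > 0:
        return torch.cat([pad, seq[:, :-shift]], dim=1)
    return torch.cat([seq[:, -shift:], pad], dim=1)

# Poisson NLL matches the count-like nature of expression coverage tracks.
# Predictions here are assumed to be non-negative rates (log_input=False).
loss_fn = torch.nn.PoissonNLLLoss(log_input=False)

# Hypothetical shapes: batch of 1, 896 bins, 5313 human tracks.
pred = torch.rand(1, 896, 5313)      # stand-in for model(shift_augment(seq))['human']
target = torch.poisson(torch.ones(1, 896, 5313))
loss = loss_fn(pred, target)
```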
## Core Capabilities
- Accurate gene expression prediction from DNA sequences
- Processing of extensive genomic regions
- Integration of long-range interactions
- ~0.45 Pearson correlation with observed human data (computed as sketched below)
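The ~0.45 figure is a Pearson correlation between predicted and observed tracks. Below is a minimal sketch of how such a per-track correlation can be computed and averaged; the shapes and the average-across-tracks convention are assumptions about the evaluation protocol, not the exact procedure used to report the number above.

```python
import torch

def mean_pearson_r(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Average Pearson correlation across tracks.

    pred, target: (bins, tracks) tensors; the exact aggregation behind the
    reported ~0.45 on human data is an assumption here.
    """
    pred = pred - pred.mean(dim=0, keepdim=True)
    target = target - target.mean(dim=0, keepdim=True)
    cov = (pred * target).sum(dim=0)
    denom = pred.norm(dim=0) * target.norm(dim=0)
    return (cov / denom.clamp_min(1e-8)).mean()

# Hypothetical example with random data:
r = mean_pearson_r(torch.rand(896, 5313), torch.rand(896, 5313))
```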
## Frequently Asked Questions
Q: What makes this model unique?
The model's ability to process extremely long DNA sequences (196,608 bp) while maintaining prediction accuracy sets it apart from earlier approaches with smaller receptive fields. By integrating long-range interactions, it reaches ~0.45 Pearson correlation on human data.
Q: What are the recommended use cases?
This model is particularly suited for genomic research applications requiring gene expression prediction from DNA sequences, especially when long-range interactions are crucial. It's valuable for studying regulatory elements and their impact on gene expression across extended genomic regions.