Evo-1-8k-CRISPR
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| Context Length | 8,192 tokens |
| Base Architecture | StripedHyena |
| Training Data | OpenGenome (~300B tokens) |
| Developer | Arc Institute & TogetherAI |
| Paper | Sequence modeling and design from molecular to genome scale with Evo |
What is evo-1-8k-crispr?
Evo-1-8k-crispr is a biological foundation model for generating CRISPR-Cas systems. It is a fine-tuned version of the base Evo model, adapted to Cas9, Cas12, and Cas13 systems and operating at single-nucleotide resolution. It uses the StripedHyena architecture, whose compute and memory requirements scale near-linearly with sequence length.
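To give a feel for what near-linear scaling means in practice, the sketch below compares two rough cost proxies as the context grows: n² for full self-attention and n·log₂(n) for an FFT-based long convolution. These proxies ignore constant factors and the attention layers that StripedHyena still contains, and the longer length of 131,072 tokens is chosen only for illustration. The function names are hypothetical.

```python
import math

def attention_cost(n: int) -> float:
    """Rough proxy for full self-attention: pairwise interactions, ~n^2."""
    return float(n * n)

def fft_conv_cost(n: int) -> float:
    """Rough proxy for an FFT-based long convolution, ~n * log2(n)."""
    return n * math.log2(n)

def growth(cost, short: int, long: int) -> float:
    """How many times more expensive the longer sequence is under `cost`."""
    return cost(long) / cost(short)

SHORT = 8_192      # context length of this model
LONG = 131_072     # illustrative length, 16x longer

attn_growth = growth(attention_cost, SHORT, LONG)
conv_growth = growth(fft_conv_cost, SHORT, LONG)

print(f"16x longer input -> attention proxy grows {attn_growth:.0f}x")
print(f"16x longer input -> convolution proxy grows {conv_growth:.1f}x")
```

Under these assumptions, a 16x longer input costs 256x more for the quadratic proxy but only about 21x more for the near-linear one.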
Implementation Details
The model is built on the StripedHyena architecture, which combines multi-head attention with gated convolutions arranged in Hyena blocks. Compared with a traditional decoder-only Transformer, this hybrid design lowers the cost of processing long inputs, which is the main benefit for biological sequences.
- Uses mixed-precision computation, keeping the poles and residues in float32
- Supports efficient autoregressive generation of more than 500k tokens on a single 80GB GPU
- Offers multiple parametrization options for different workloads
- Enables 3x faster training and fine-tuning at long context lengths
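The page does not show how poles and residues enter the computation, so the following is only a toy illustration. It assumes a long-convolution filter in modal form, h[t] = Σₙ Rₙ·pₙᵗ, with real-valued poles (a simplification), and uses hypothetical helper names and made-up parameter values. It sketches two ideas from the list: rounding poles to half precision distorts a long filter far more than float32 does, and the same filter can be applied one token at a time with a fixed-size state, which is one way autoregressive generation can stay cheap.

```python
import numpy as np

def modal_filter(poles, residues, length, store_dtype):
    """Build h[t] = sum_n residues[n] * poles[n]**t for t in [0, length).

    Parameters are rounded to `store_dtype` (simulating storage precision),
    then evaluated in float64 so only the storage precision differs.
    """
    p = np.asarray(poles).astype(store_dtype).astype(np.float64)
    r = np.asarray(residues).astype(store_dtype).astype(np.float64)
    t = np.arange(length)[:, None]            # shape (length, 1)
    return (r[None, :] * p[None, :] ** t).sum(axis=1)

def apply_recurrent(poles, residues, inputs):
    """Apply the same filter token by token with a constant-size state."""
    state = np.zeros(len(poles))
    out = np.empty(len(inputs))
    for t, u in enumerate(inputs):
        state = poles * state + u             # x_t = p * x_{t-1} + u_t
        out[t] = residues @ state             # y_t = sum_n R_n * x_t[n]
    return out

# Made-up parameters for illustration only.
poles = np.array([0.999, 0.99, 0.9])
residues = np.array([0.5, 0.3, 0.2])
LENGTH = 8192

# 1) Effect of storage precision on the filter.
h_ref = modal_filter(poles, residues, LENGTH, np.float64)
err32 = np.abs(modal_filter(poles, residues, LENGTH, np.float32) - h_ref).max()
err16 = np.abs(modal_filter(poles, residues, LENGTH, np.float16) - h_ref).max()
print(f"max filter error  float32: {err32:.2e}  float16: {err16:.2e}")

# 2) Recurrent application reproduces the causal convolution.
rng = np.random.default_rng(0)
u = rng.standard_normal(256)
y_conv = np.convolve(u, h_ref[:256])[:256]    # causal convolution output
y_rec = apply_recurrent(poles, residues, u)
print("recurrent matches convolution:", np.allclose(y_conv, y_rec))
```

Poles close to 1 are raised to large powers, so a small rounding error in a pole is amplified over thousands of positions. That is why, in this sketch, the float16 error is much larger than the float32 error.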
Core Capabilities
- Generation of CRISPR-Cas systems (Cas9/12/13)
- Single-nucleotide, byte-level resolution sequence modeling
- Long-context processing with an 8,192-token window
- Efficient sequence generation and processing
- Robustness to training beyond the compute-optimal frontier
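As a rough illustration of byte-level, single-nucleotide modeling within an 8,192-token window, the sketch below maps each nucleotide character to one integer token (its ASCII byte value) and splits a long sequence into windows that fit the context length. The model's actual tokenizer may differ in detail. The helper names are hypothetical and the scheme is an assumption made for illustration.

```python
CONTEXT_LENGTH = 8192  # context length listed in the table above

def encode(sequence: str) -> list[int]:
    """One token per nucleotide: each character becomes its byte value."""
    return list(sequence.encode("ascii"))

def decode(token_ids: list[int]) -> str:
    """Inverse of encode: turn byte values back into a nucleotide string."""
    return bytes(token_ids).decode("ascii")

def split_into_windows(token_ids: list[int], size: int = CONTEXT_LENGTH) -> list[list[int]]:
    """Cut a long token list into consecutive chunks of at most `size` tokens."""
    return [token_ids[i:i + size] for i in range(0, len(token_ids), size)]

tokens = encode("ACGT")
print(tokens)            # one integer per nucleotide
print(decode(tokens))    # round-trips to the original string

long_sequence = "ACGT" * 5000                      # 20,000 nucleotides
windows = split_into_windows(encode(long_sequence))
print([len(w) for w in windows])                   # window sizes
```

Because every nucleotide is its own token, the number of tokens equals the number of bases, so an 8,192-token window covers 8,192 nucleotides.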
Frequently Asked Questions
Q: What makes this model unique?
This model pairs the StripedHyena architecture with fine-tuning on CRISPR-Cas systems. It models sequences at single-nucleotide resolution while keeping near-linear scaling in sequence length, which makes it well suited to genetic engineering applications.
Q: What are the recommended use cases?
The model is designed for generating and working with CRISPR-Cas systems. It suits genetic engineering applications, CRISPR design, and related biological sequence modeling tasks, particularly those involving Cas9, Cas12, and Cas13 systems.