DNABERT-2-117M

zhihan1996

DNABERT-2-117M is a transformer-based genome foundation model with 117M parameters, built for efficient analysis of DNA sequences from multiple species.

Property    Value
Author      zhihan1996
Paper       DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome
Downloads   84,873
Tags        Transformers, PyTorch, Biology, Medical

What is DNABERT-2-117M?

DNABERT-2-117M is a transformer-based genome foundation model for processing and analyzing DNA sequences across multiple species. It is built on the MosaicBERT architecture, an efficiency-oriented BERT implementation, and is intended as a general-purpose starting point for genomic sequence tasks rather than a model tied to a single organism.

Implementation Details

The model is loaded through the Hugging Face Transformers library with PyTorch. Because it ships custom model code, loading it requires allowing that code to run. The model outputs one 768-dimensional hidden state per token; a fixed-size embedding for a whole DNA sequence is obtained by pooling these token vectors, for example with mean or max pooling.

  • Seamless integration with HuggingFace Transformers ecosystem
  • Support for various DNA sequence lengths
  • 768-dimensional output embeddings
  • Multiple pooling strategies available
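
A minimal sketch of the workflow described above, assuming the `torch` and `transformers` packages are installed and that the checkpoint is published on the Hugging Face Hub as `zhihan1996/DNABERT-2-117M` (the repository ID is an assumption built from the author and model name on this page). `trust_remote_code=True` is passed because the model relies on custom code. The helper names `mean_pool`, `max_pool`, and `embed_dna` are illustrative, and the pooling is written in plain Python only to make the arithmetic explicit; in practice one would likely pool with tensor operations. Details may need adjusting for your library versions.

```python
from typing import List, Sequence

# Assumed Hub repository ID (author + model name from this page).
MODEL_ID = "zhihan1996/DNABERT-2-117M"


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the token vectors dimension by dimension."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def max_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Take the largest value in each dimension across all tokens."""
    return [max(column) for column in zip(*token_vectors)]


def embed_dna(sequence: str, pooling: str = "mean") -> List[float]:
    """Return one fixed-size embedding for a DNA string (hypothetical helper).

    Heavy imports are kept inside the function so the pooling helpers above
    can be used without torch/transformers installed.
    """
    if pooling not in ("mean", "max"):
        raise ValueError("pooling must be 'mean' or 'max'")

    import torch
    from transformers import AutoModel, AutoTokenizer

    # The model is reloaded on every call for brevity; cache it in real code.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    model.eval()

    input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        hidden_states = model(input_ids)[0]  # shape: (1, num_tokens, hidden_size)

    token_vectors = hidden_states[0].tolist()  # drop the batch dimension
    pool = mean_pool if pooling == "mean" else max_pool
    return pool(token_vectors)
```

Calling `embed_dna("ACGTAGCATCGGATCTATCTATCGACACTTGG")` downloads the weights on first use. Given the figures quoted above, the returned list should have 768 entries regardless of the input length.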

Core Capabilities

  • Multi-species genome analysis
  • DNA sequence embedding generation
  • Foundation model capabilities for transfer learning
  • Efficient processing of genomic data
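
One common way to use sequence embeddings such as those listed above is to compare two sequences by cosine similarity. The sketch below assumes the embeddings have already been computed and are available as plain lists of floats; `cosine_similarity` is an illustrative helper, not part of the model's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate that two embeddings point in nearly the same direction. Whether that corresponds to biological similarity depends on the task and should be checked empirically.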

Frequently Asked Questions

Q: What makes this model unique?

DNABERT-2-117M pairs an efficient MosaicBERT-based architecture with a focus on genomic data from multiple species. It produces DNA sequence embeddings at a modest computational cost for a model of its size (117M parameters), which makes it practical for genomic research.

Q: What are the recommended use cases?

The model is particularly well-suited for genomic research, DNA sequence analysis, multi-species genome studies, and medical applications requiring DNA sequence processing. It can be used as a foundation model for transfer learning in specific genomic tasks.
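
As a rough illustration of transfer learning with frozen embeddings, the sketch below fits a nearest-centroid classifier on precomputed sequence embeddings. This is a deliberately simple stand-in for fine-tuning, not the method used by the DNABERT-2 authors; the function names and any class labels are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def fit_centroids(
    embeddings: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Compute the mean embedding (centroid) of each class."""
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        groups[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def predict(centroids: Dict[Hashable, List[float]], embedding: Sequence[float]) -> Hashable:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))
```

In use, `embeddings` would hold one pooled vector per labeled DNA sequence, and `predict` would be applied to the embedding of an unlabeled sequence. A trained classification head or full fine-tuning would usually be expected to perform better than this baseline.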
