m2-bert-80M-32k-retrieval

togethercomputer

An 80M parameter BERT variant optimized for long-context retrieval (32k sequence length), featuring Monarch Mixer architecture for efficient processing.

| Property | Value |
|---|---|
| Model Size | 80M parameters |
| License | Apache-2.0 |
| Paper | Monarch Mixer Paper |
| Max Sequence Length | 32,768 tokens |

What is m2-bert-80M-32k-retrieval?

m2-bert-80M-32k-retrieval is a BERT variant built on the Monarch Mixer (M2) architecture and tuned for long-context retrieval. The 80M parameter model handles sequences of up to 32,768 tokens, making it well suited to applications that need to embed and compare long documents in a single pass.

Implementation Details

The model generates embeddings with a dimensionality of 768 and can be easily integrated using the Hugging Face transformers library or the Together API. It utilizes FlashFFTConv for efficient processing and requires trust_remote_code=True when loading.
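A minimal loading sketch using the Hugging Face transformers library is shown below. The `sentence_embedding` output key and the exact tokenizer settings follow common usage of this model's custom remote code and should be treated as assumptions; the import happens inside the function so the sketch can be defined without the library or a network connection.

```python
MODEL_ID = "togethercomputer/m2-bert-80M-32k-retrieval"
MAX_SEQ_LENGTH = 32768

def embed(texts):
    """Return 768-dimensional sentence embeddings for a list of strings.

    Deferred import: transformers (and the model download) are only
    needed when the function is actually called.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID, model_max_length=MAX_SEQ_LENGTH
    )
    # trust_remote_code=True is required: the repo ships the Monarch Mixer code.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, trust_remote_code=True
    )
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True,
        truncation=True, max_length=MAX_SEQ_LENGTH,
    )
    outputs = model(**inputs)
    # Assumed output key from the model's custom head; shape (len(texts), 768).
    return outputs["sentence_embedding"]
```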

  • Built on BERT architecture with Monarch Mixer modifications
  • Supports both PyTorch and Together API implementations
  • Generates 768-dimensional embeddings for retrieval tasks
  • Implements efficient sub-quadratic GEMM-based architecture
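For the Together API route, the sketch below assumes Together's OpenAI-compatible `/v1/embeddings` endpoint and a `TOGETHER_API_KEY` environment variable; check the current API documentation before relying on the exact URL or response shape. Only the standard library is used.

```python
import json
import os
import urllib.request

# Assumed endpoint: Together's OpenAI-compatible embeddings API.
API_URL = "https://api.together.xyz/v1/embeddings"

def together_embed(texts, api_key=None):
    """Return one embedding vector per input string (makes a network call)."""
    api_key = api_key or os.environ["TOGETHER_API_KEY"]
    payload = json.dumps({
        "model": "togethercomputer/m2-bert-80M-32k-retrieval",
        "input": texts,
    }).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response: {"data": [{"embedding": [...]}, ...]}.
    return [item["embedding"] for item in body["data"]]
```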

Core Capabilities

  • Long-sequence processing up to 32k tokens
  • Efficient retrieval-optimized embeddings
  • Sentence similarity tasks
  • Text classification capabilities
  • Optimized for English language processing

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to process extremely long sequences (32k tokens) while maintaining efficiency through the Monarch Mixer architecture. It's specifically optimized for retrieval tasks and uses sub-quadratic GEMM-based computations.

Q: What are the recommended use cases?

The model is ideal for long-document retrieval, sentence similarity tasks, and text classification applications that require processing of extensive contexts. It's particularly suitable for applications where traditional BERT models might struggle with sequence length limitations.
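The retrieval workflow above boils down to ranking documents by cosine similarity between their embeddings and a query embedding. The sketch below shows that ranking step with tiny stand-in vectors in place of the model's 768-dimensional output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors: doc 0 points the same way as the query, doc 1 is orthogonal.
order = rank([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])
# order == [0, 1]: the aligned document ranks first.
```

In practice the stand-in vectors would come from the model's embeddings, and a vector index would replace the brute-force scan for large corpora.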
