gemma-2B-10M

mustafaaljadery

Gemma 2B variant with 10M context length using recurrent local attention, optimized for <32GB memory usage. MIT-licensed with F32 tensor support.

  • Parameter Count: 2.51B parameters
  • License: MIT
  • Tensor Type: F32
  • Research Papers: InfiniAttention, Transformer-XL
  • Memory Usage: <32GB

What is gemma-2B-10M?

Gemma-2B-10M is a variant of the Gemma 2B model that extends the context length to 10 million tokens while keeping memory usage under 32GB. Developed by Mustafa Aljadery and collaborators, it uses recurrent local attention to achieve O(N) memory complexity in sequence length, making it unusually resource-efficient for its context size.

Implementation Details

The model addresses the KV-cache bottleneck of standard transformer architectures by combining local attention blocks with recurrence, an approach inspired by both InfiniAttention and Transformer-XL. This lets it process extremely long sequences while keeping memory requirements under 32GB.

  • Utilizes recurrent local attention for efficient memory scaling
  • Implements custom CUDA-optimized inference
  • Supports bfloat16 data type for inference
  • Early checkpoint trained for 200 steps
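As a rough illustration of the mechanism described above, the sketch below processes a long sequence in fixed-size blocks, where each block attends to itself plus the cached keys/values of the previous block (Transformer-XL-style memory). This is a minimal, single-head NumPy sketch, not the repository's actual CUDA implementation; it omits causal masking, multi-head projection, and any InfiniAttention-style gating, and all function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def recurrent_local_attention(x, block_size, wq, wk, wv):
    """Attend over a long sequence one block at a time.

    Each block attends to its own keys/values plus those cached from the
    previous block only, so peak working memory is O(block_size) rather
    than O(seq_len): the attention matrix per step is at most
    (block_size, 2 * block_size).
    """
    outputs = []
    prev_k = prev_v = None
    for start in range(0, x.shape[0], block_size):
        blk = x[start:start + block_size]
        q, k, v = blk @ wq, blk @ wk, blk @ wv
        if prev_k is not None:
            # Prepend the carried memory from the previous block.
            k_ctx = np.concatenate([prev_k, k])
            v_ctx = np.concatenate([prev_v, v])
        else:
            k_ctx, v_ctx = k, v
        scores = q @ k_ctx.T / np.sqrt(q.shape[-1])
        outputs.append(softmax(scores) @ v_ctx)
        prev_k, prev_v = k, v  # carry only the most recent block forward
    return np.concatenate(outputs)
```

Because only the previous block's keys and values are carried forward, total memory for the cached state is constant per step, which is what makes the overall scaling linear in sequence length.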

Core Capabilities

  • 10M token context length processing
  • Memory-efficient operation (<32GB)
  • Native CUDA inference optimization
  • Linear memory scaling with sequence length

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to handle a 10M-token context with linear memory scaling through recurrent local attention sets it apart from standard transformers, whose memory grows quadratically with sequence length when the full attention matrix is materialized.

Q: What are the recommended use cases?

This model is particularly suited for applications requiring processing of very long documents or contexts, such as document analysis, long-form content generation, and tasks requiring extensive context understanding while operating within reasonable memory constraints.
