GemmaX2-28-2B-v0.1
| Property | Value |
|---|---|
| Model Size | 2B parameters |
| Developer | Xiaomi |
| Paper | arXiv:2502.02481 |
| Model Hub | Hugging Face |
| Languages Supported | 28 languages |
What is GemmaX2-28-2B-v0.1?
GemmaX2-28-2B-v0.1 is a multilingual translation model built in two stages: continual pretraining of Gemma2-2B on 56 billion tokens of multilingual data, followed by supervised fine-tuning on high-quality translation instructions. It demonstrates that open large language models can deliver strong multilingual machine translation at a practical scale.
Implementation Details
The model is built on the Gemma2-2B architecture and enhanced through extensive pretraining on both monolingual and parallel data. It can be loaded with the Hugging Face Transformers library, so it integrates readily into existing NLP pipelines.
- Built on Gemma2-2B architecture
- Trained on 56 billion tokens of multilingual data
- Supports 28 different languages including major Asian, European, and Middle Eastern languages
- Fine-tuned specifically for translation tasks
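Loading the model with Transformers can be sketched as below. This is a minimal illustration, not official usage: the Hub model id and the instruction-style prompt template are assumptions made for the example, so check the model card on Hugging Face for the exact format.

```python
# Minimal sketch of translating with GemmaX2-28-2B-v0.1 via Transformers.
# ASSUMPTIONS: the Hub id and the prompt template below are illustrative,
# not taken from this card; verify them against the official model page.

def build_translation_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    """Build an instruction-style translation prompt (assumed template)."""
    return (
        f"Translate this from {src_lang} to {tgt_lang}:\n"
        f"{src_lang}: {text}\n"
        f"{tgt_lang}:"
    )

def translate(model, tokenizer, src_lang, tgt_lang, text, max_new_tokens=256):
    """Run greedy generation and return only the newly generated text."""
    prompt = build_translation_prompt(src_lang, tgt_lang, text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the translation remains.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

if __name__ == "__main__":
    # Heavy import kept local so the helpers above stay importable anywhere.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ModelSpace/GemmaX2-28-2B-v0.1"  # assumed Hub id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    print(translate(model, tokenizer, "English", "Chinese", "Hello, world!"))
```

The prompt names the source and target languages explicitly, which is the common pattern for translation-tuned LLMs: the model completes the line after the target-language tag.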
Core Capabilities
- High-quality translation between 28 languages
- Support for both high-resource and low-resource languages
- Efficient processing of translation tasks
- Integration with standard ML frameworks
Frequently Asked Questions
Q: What makes this model unique?
Its strength is broad language coverage at a small parameter count: 28 languages supported through a two-stage training recipe of continual pretraining followed by translation-focused supervised fine-tuning. It shows that open large language models can serve as practical multilingual translators.
Q: What are the recommended use cases?
The model is specifically designed for translation tasks between any of the 28 supported languages, making it ideal for applications requiring multilingual translation capabilities in a production environment.