# GemmaX2-28-2B-gguf
| Property | Value |
|---|---|
| Developer | Xiaomi (original model), Tonic (GGUF quantization) |
| License | Apache 2.0 |
| Supported Languages | 28 languages, including Arabic, English, and Chinese |
| Available Formats | f16, bf16, q8_0, tq1_0, tq2_0 |
| Base Model | GemmaX2-28-2B-v0.1 |
## What is GemmaX2-28-2B-gguf?
GemmaX2-28-2B-gguf is a collection of quantized variants of the GemmaX2-28-2B-v0.1 translation model. The original model was trained on 56 billion tokens across 28 languages; these GGUF variants quantize it at several precisions to enable efficient deployment across different computational environments. The formats range from high-precision 16-bit weights to highly compressed ternary encodings, offering flexibility in the trade-off between model size and translation quality.
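That trade-off is easy to quantify roughly: a GGUF file's size is approximately parameter count × bits per weight ÷ 8. The sketch below assumes a ~2.6B-parameter backbone and llama.cpp's usual bits-per-weight figures for these formats; both numbers are assumptions, not values from this card.

```python
# Back-of-the-envelope GGUF size estimate: params * bits_per_weight / 8.
# The parameter count (~2.6B) and the bits-per-weight figures below are
# assumptions based on llama.cpp's quantization scheme, not numbers
# taken from this model card.
PARAMS = 2.6e9

BITS_PER_WEIGHT = {
    "f16": 16.0,      # plain 16-bit floats
    "bf16": 16.0,
    "q8_0": 8.5,      # 8-bit blocks plus a per-block scale
    "tq2_0": 2.0625,  # ternary packing, ~2 bits/weight
    "tq1_0": 1.6875,  # ternary packing, ~1.7 bits/weight
}

for fmt, bpw in BITS_PER_WEIGHT.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{fmt:>6}: ~{gib:.1f} GiB (weights only)")

# Real files run somewhat larger: token embeddings and metadata are
# usually stored at higher precision than the quantized weight blocks.
```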
## Implementation Details
The model provides multiple quantization formats optimized for different use cases:
- f16/bf16: 16-bit formats offering near-original model quality (5-7GB)
- q8_0: 8-bit quantization balancing size and quality (3-4GB)
- tq1_0/tq2_0: Ternary quantization for minimal size (1-2GB)
- Compatible with llama.cpp and other GGUF-supporting frameworks
- Converted using convert_hf_to_gguf.py from the original Hugging Face model (see the sketch after this list)
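As a sketch of that conversion step, the loop below drives llama.cpp's convert_hf_to_gguf.py once per output type. It assumes a local llama.cpp checkout and a downloaded copy of the original model; the paths and output filenames are placeholders, not the repository's actual layout.

```python
# Hedged sketch of the conversion step, assuming a local llama.cpp
# checkout and a local copy of the original Hugging Face model.
# Paths are placeholders; --outtype accepts the formats listed above.
import subprocess

SRC = "./GemmaX2-28-2B-v0.1"  # hypothetical local path to the HF model

for outtype in ["f16", "bf16", "q8_0", "tq1_0", "tq2_0"]:
    subprocess.run(
        [
            "python", "llama.cpp/convert_hf_to_gguf.py", SRC,
            "--outtype", outtype,
            "--outfile", f"GemmaX2-28-2B-{outtype}.gguf",
        ],
        check=True,
    )
```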
## Core Capabilities
- Multilingual translation across 28 languages
- Efficient inference on resource-constrained devices (see the inference sketch after this list)
- Support for both online and offline translation
- Optimized for various hardware configurations
- Flexible deployment options through GGUF format
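To make the offline-inference claim concrete, here is a minimal sketch using the llama-cpp-python bindings (`pip install llama-cpp-python`). The GGUF filename and the prompt template are assumptions; the template follows the translation prompt documented for the original GemmaX2 model, so check the repository for the actual file names and format.

```python
# Minimal offline-translation sketch using the llama-cpp-python bindings.
# The filename and prompt template below are assumptions, not values
# confirmed by this model card.
from llama_cpp import Llama

llm = Llama(
    model_path="GemmaX2-28-2B-q8_0.gguf",  # hypothetical filename
    n_ctx=2048,      # context window
    n_gpu_layers=0,  # CPU-only; raise to offload layers to a GPU
)

prompt = (
    "Translate this from Chinese to English:\n"
    "Chinese: 我爱机器翻译\n"
    "English:"
)

out = llm(prompt, max_tokens=128, stop=["\n"])
print(out["choices"][0]["text"].strip())
```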
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its comprehensive quantization options, allowing users to choose the optimal balance between model size and translation quality. It maintains support for 28 languages while offering deployment flexibility through the GGUF format, making it suitable for various computational environments.
**Q: What are the recommended use cases?**
The model is ideal for real-time translation applications, offline translation on mobile or embedded devices, and benchmarking quantized LLM performance. The different quantization levels allow users to choose the appropriate version based on their hardware constraints and quality requirements.
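As a purely illustrative way to encode that choice, the hypothetical helper below picks the highest-precision variant that fits a memory budget. The size figures reuse the approximate ranges from Implementation Details; the headroom factor and thresholds are assumptions, not recommendations from the model card.

```python
# Hypothetical helper illustrating the size/quality trade-off: pick the
# highest-precision variant that fits a memory budget. Size figures reuse
# the approximate ranges listed above; the headroom factor is illustrative.
VARIANTS = [  # (format, approx. file size in GB), best quality first
    ("f16", 6.0),
    ("q8_0", 3.5),
    ("tq2_0", 2.0),
    ("tq1_0", 1.5),
]

def pick_variant(budget_gb: float, headroom: float = 1.2) -> str:
    """Return the best-quality format whose file, plus runtime headroom
    (KV cache, activations), fits within budget_gb of memory."""
    for fmt, size_gb in VARIANTS:
        if size_gb * headroom <= budget_gb:
            return fmt
    raise ValueError("No variant fits; reduce context size or add memory.")

print(pick_variant(8.0))  # -> f16
print(pick_variant(4.5))  # -> q8_0
print(pick_variant(2.0))  # -> tq1_0
```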