# miniG
| Property | Value |
|---|---|
| Base Model | THUDM/glm-4-9b-chat-1m |
| Parameters | 9B (LLM) + 5B (optional ViT) |
| Context Window | 1M tokens |
| Training Data | 120M synthetic entries |
| Model URL | https://huggingface.co/CausalLM/miniG |
## What is miniG?

miniG is a multimodal language model that combines long-context text processing with vision understanding. Initialized from THUDM/glm-4-9b-chat-1m and trained on a synthetic dataset of over 120 million entries, it pairs a 1M-token context window with an optional 5B-parameter Vision Transformer (ViT) for image input.
## Implementation Details

The model's training pipeline incorporates retrieval-augmented generation and knowledge-graph integration, with data synthesis conducted within clusters derived from a 20B-token pretraining corpus. For best output quality, the authors recommend standard implementations such as Hugging Face Transformers rather than accelerated inference kernels.
- Supports both text and image input modalities
- Specialized text-only weight version available
- Recommended inference parameters: top_p=0.8, temperature=0.3
- Performance optimized for the standard Hugging Face Transformers implementation
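The recommendations above can be sketched as a minimal Transformers inference script. This is an illustrative example, not an official snippet from the model card: the chat-template call follows the usual GLM-4 convention, and `max_new_tokens=512` is an arbitrary choice, while `top_p=0.8` and `temperature=0.3` are the values recommended above.

```python
# Minimal sketch: single-turn inference with miniG via Hugging Face Transformers,
# using the sampling parameters recommended in the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CausalLM/miniG"

# Recommended in the model card: top_p=0.8, temperature=0.3.
# max_new_tokens is an assumption for illustration.
GEN_KWARGS = dict(do_sample=True, top_p=0.8, temperature=0.3, max_new_tokens=512)


def chat(prompt: str) -> str:
    """Run one chat turn; models are loaded lazily inside the call."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, **GEN_KWARGS)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True)
```

With the 1M-token window, long documents can usually be passed in a single prompt rather than chunked.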
## Core Capabilities
- 1M token context window for extensive document processing
- Multimodal processing with both text and image understanding
- High-quality text generation with reduced hallucination
- Robust performance on complex tasks without benchmark-specific optimization
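For the multimodal path, a sketch of image-plus-text input is shown below. This assumes miniG's ViT variant follows the GLM-4V convention of attaching an `image` field to the user message in the chat template; the helper and function names are illustrative, not part of the model's documented API.

```python
# Hedged sketch of multimodal (image + text) inference, assuming the
# GLM-4V-style chat template where the user message carries an "image" field.
from typing import Any

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CausalLM/miniG"


def build_image_message(image: Any, text: str) -> list:
    """Build a single-turn message carrying both an image and a text query."""
    return [{"role": "user", "image": image, "content": text}]


def describe(image: Any, question: str = "Describe this image.") -> str:
    """Illustrative helper; loads the model lazily on first call."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto", trust_remote_code=True
    )
    inputs = tokenizer.apply_chat_template(
        build_image_message(image, question),
        add_generation_prompt=True,
        tokenize=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    # Recommended sampling parameters from the model card.
    output = model.generate(
        **inputs, do_sample=True, top_p=0.8, temperature=0.3, max_new_tokens=256
    )
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For text-only deployments, the specialized text-only weight version noted above avoids loading the ViT entirely.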
## Frequently Asked Questions

**Q: What makes this model unique?**
miniG stands out for its massive synthetic training dataset and its focus on real-world performance over benchmark scores. It also includes a special "alt" version trained with masked context to reduce overfitting and give a more objective measure of performance.
**Q: What are the recommended use cases?**

The model excels at tasks requiring deep understanding of both text and images, and its 1M-token context window makes it especially strong on long-form content. It is best suited to applications that need high-quality, reliable outputs rather than optimized benchmark scores.