miniG

CausalLM

miniG: A 9B-parameter multimodal LLM with a 1M-token context window, trained on 120M synthetic entries. Supports text and image input, with a focus on high-quality inference over benchmark performance.

Property         Value
Base Model       THUDM/glm-4-9b-chat-1m
Parameters       9B (LLM) + 5B (optional ViT)
Context Window   1M tokens
Training Data    120M synthetic entries
Model URL        https://huggingface.co/CausalLM/miniG

What is miniG?

miniG is an advanced multimodal language model that combines powerful text processing capabilities with vision understanding. Trained on a massive synthetic dataset of over 120 million entries, it represents a significant step forward in combining large context windows with efficient processing. The model was initialized from THUDM/glm-4-9b-chat-1m and includes an optional Vision Transformer (ViT) component.

Implementation Details

The training pipeline incorporates retrieval-augmented generation and knowledge graph integration, and uses a distinctive approach in which data synthesis is conducted within clusters derived from a 20B-token pretraining corpus. For optimal output quality, the authors recommend standard implementations such as Hugging Face Transformers rather than accelerated inference kernels.

  • Supports both text and image input modalities
  • Specialized text-only weight version available
  • Recommended inference parameters: top_p=0.8, temperature=0.3
  • Performance optimized for Transformers implementation
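The recommended settings above can be wired into a standard Transformers call. The following is a minimal sketch, not an official example: the model id comes from the card, the sampling values (top_p=0.8, temperature=0.3) are the recommended ones, while `max_new_tokens` and the prompt are illustrative assumptions. The download is guarded under `__main__` (and the `transformers` import kept inside the guard) so the constants can be inspected without pulling the ~9B weights.

```python
MODEL_ID = "CausalLM/miniG"

# Recommended inference parameters from the model card; max_new_tokens is
# an illustrative assumption, not specified by the card.
GENERATION_KWARGS = {
    "do_sample": True,
    "top_p": 0.8,
    "temperature": 0.3,
    "max_new_tokens": 512,
}

if __name__ == "__main__":
    # Heavy imports and the model download only happen when run directly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, device_map="auto"
    )
    inputs = tokenizer(
        "Summarize the key points of the document below.", return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, **GENERATION_KWARGS)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`trust_remote_code=True` is assumed because the GLM-4 base model ships custom modeling code; verify against the model card before relying on it.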

Core Capabilities

  • 1M token context window for extensive document processing
  • Multimodal processing with both text and image understanding
  • High-quality text generation with reduced hallucination
  • Robust performance on complex tasks without benchmark-specific optimization
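Even a 1M-token window can be exceeded by very large corpora, and some headroom must be kept for the generated response. The helper below is a hypothetical sketch of that budgeting step: the 1M window comes from the model card, while the function name and the reserve size are illustrative assumptions.

```python
def chunk_for_context(token_ids, context_window=1_000_000, reserve=4_096):
    """Split a list of token ids into chunks that fit the context window,
    reserving headroom for the generated response.

    Hypothetical helper: the 1M default reflects miniG's advertised window;
    the 4,096-token reserve is an illustrative assumption.
    """
    budget = context_window - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than the context window")
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]
```

For most documents this returns a single chunk; only inputs past roughly 996k tokens (at the default reserve) get split.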

Frequently Asked Questions

Q: What makes this model unique?

miniG stands out for its massive synthetic dataset training and focus on real-world performance over benchmark scores. It includes a special alt version trained with masked context to reduce overfitting and provide more objective performance.

Q: What are the recommended use cases?

The model excels in tasks requiring deep understanding of both text and images, with a particular strength in handling long-form content thanks to its 1M token context window. It's particularly suitable for applications requiring high-quality, reliable outputs rather than optimized benchmark performance.
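For mixed text-and-image requests, a chat turn typically carries the prompt plus an optional image. The sketch below shows one way to build such a turn; the `"image"` key is an assumption modeled on GLM-4V-style chat templates (miniG's base family), not a documented miniG schema, so check the model card before use.

```python
from typing import Any, Dict, Optional


def make_user_turn(text: str, image: Optional[Any] = None) -> Dict[str, Any]:
    """Build one user message for a multimodal chat template.

    Hypothetical helper: attaches a PIL image under an assumed "image" key
    (GLM-4V-style convention) only when vision input is actually used.
    """
    turn: Dict[str, Any] = {"role": "user", "content": text}
    if image is not None:
        turn["image"] = image
    return turn
```

A text-only request omits the key entirely, which also matches the text-only weight variant mentioned above.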
