miniG

CausalLM

miniG: A 9B-parameter multimodal LLM with a 1M-token context window, trained on 120M synthetic entries. Supports text and image input, with a focus on high-quality inference over benchmark performance.

Property         Value
Base Model       THUDM/glm-4-9b-chat-1m
Parameters       9B (LLM) + 5B (optional ViT)
Context Window   1M tokens
Training Data    120M synthetic entries
Model URL        https://huggingface.co/CausalLM/miniG

What is miniG?

miniG is an advanced multimodal language model that combines powerful text processing capabilities with vision understanding. Trained on a massive synthetic dataset of over 120 million entries, it represents a significant step forward in combining large context windows with efficient processing. The model was initialized from THUDM/glm-4-9b-chat-1m and includes an optional Vision Transformer (ViT) component.

Implementation Details

The training pipeline incorporates retrieval-augmented generation and knowledge graph integration, and uses a distinctive approach in which data synthesis is conducted within clusters derived from a 20B-token pretraining corpus. For optimal output quality, the authors recommend standard implementations such as Hugging Face Transformers rather than accelerated inference kernels.

  • Supports both text and image input modalities
  • Specialized text-only weight version available
  • Recommended inference parameters: top_p=0.8, temperature=0.3
  • Performance optimized for Transformers implementation
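The recommended settings above can be wired into a standard Transformers call. The following is a minimal sketch, not an official example: the model id comes from the card, the sampling values (top_p=0.8, temperature=0.3) are the recommended ones, while `max_new_tokens` and the prompt are illustrative assumptions. The download is guarded under `__main__` (and the `transformers` import kept inside the guard) so the constants can be inspected without pulling the ~9B weights.

```python
MODEL_ID = "CausalLM/miniG"

# Recommended inference parameters from the model card; max_new_tokens is
# an illustrative assumption, not specified by the card.
GENERATION_KWARGS = {
    "do_sample": True,
    "top_p": 0.8,
    "temperature": 0.3,
    "max_new_tokens": 512,
}

if __name__ == "__main__":
    # Heavy imports and the model download only happen when run directly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, device_map="auto"
    )
    inputs = tokenizer(
        "Summarize the key points of the document below.", return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, **GENERATION_KWARGS)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

`trust_remote_code=True` is assumed because the GLM-4 base model ships custom modeling code; verify against the model card before relying on it.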

Core Capabilities

  • 1M token context window for extensive document processing
  • Multimodal processing with both text and image understanding
  • High-quality text generation with reduced hallucination
  • Robust performance on complex tasks without benchmark-specific optimization
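Even a 1M-token window can be exceeded by very large corpora, and some headroom must be kept for the generated response. The helper below is a hypothetical sketch of that budgeting step: the 1M window comes from the model card, while the function name and the reserve size are illustrative assumptions.

```python
def chunk_for_context(token_ids, context_window=1_000_000, reserve=4_096):
    """Split a list of token ids into chunks that fit the context window,
    reserving headroom for the generated response.

    Hypothetical helper: the 1M default reflects miniG's advertised window;
    the 4,096-token reserve is an illustrative assumption.
    """
    budget = context_window - reserve
    if budget <= 0:
        raise ValueError("reserve must be smaller than the context window")
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]
```

For most documents this returns a single chunk; only inputs past roughly 996k tokens (at the default reserve) get split.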

Frequently Asked Questions

Q: What makes this model unique?

miniG stands out for its massive synthetic dataset training and focus on real-world performance over benchmark scores. It includes a special alt version trained with masked context to reduce overfitting and provide more objective performance.

Q: What are the recommended use cases?

The model excels in tasks requiring deep understanding of both text and images, with a particular strength in handling long-form content thanks to its 1M token context window. It's particularly suitable for applications requiring high-quality, reliable outputs rather than optimized benchmark performance.
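For mixed text-and-image requests, a chat turn typically carries the prompt plus an optional image. The sketch below shows one way to build such a turn; the `"image"` key is an assumption modeled on GLM-4V-style chat templates (miniG's base family), not a documented miniG schema, so check the model card before use.

```python
from typing import Any, Dict, Optional


def make_user_turn(text: str, image: Optional[Any] = None) -> Dict[str, Any]:
    """Build one user message for a multimodal chat template.

    Hypothetical helper: attaches a PIL image under an assumed "image" key
    (GLM-4V-style convention) only when vision input is actually used.
    """
    turn: Dict[str, Any] = {"role": "user", "content": text}
    if image is not None:
        turn["image"] = image
    return turn
```

A text-only request omits the key entirely, which also matches the text-only weight variant mentioned above.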
