MiniMind2

Maintained by: jingyaogong


Parameter Count: 26M–145M
Model Type: Language Model (Chinese)
Architecture: Transformer Decoder-Only
License: Apache-2.0
Training Time: ~2 hours on an NVIDIA RTX 3090
Model URL: https://huggingface.co/jingyaogong/MiniMind2

What is MiniMind2?

MiniMind2 is an ultra-lightweight Chinese language model series designed to make LLM training accessible to individual researchers and developers. With models ranging from just 26M to 145M parameters, it delivers solid conversational performance while being trainable from scratch on a single consumer GPU in about 2 hours.

Implementation Details

The model implements a Transformer decoder-only architecture with several optimizations including RMSNorm pre-normalization, SwiGLU activation, and rotary positional embeddings (RoPE). It features both dense and mixture-of-experts (MoE) variants, with the latter incorporating expert routing for improved efficiency.
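The first two optimizations can be sketched in a few lines. This is not MiniMind2's actual source, just an illustrative pure-Python rendering of the standard RMSNorm and SwiGLU formulas:

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by 1/RMS(x), then by a learned per-channel weight.
    Unlike LayerNorm, there is no mean subtraction and no bias term."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) * up, elementwise.
    In a real FFN block, `gate` and `up` are two linear projections of the input,
    and the gated result is projected back down to the model dimension."""
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]
```

With unit weights, the output of `rms_norm` always has an RMS of ~1, which is what stabilizes activations across layers.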

  • Custom tokenizer with a 6,400-token vocabulary
  • Implemented in PyTorch with minimal dependencies
  • Supports single/multi-GPU training via DDP and DeepSpeed
  • Complete training pipeline including pretrain, SFT, LoRA, and DPO
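Rotary positional embeddings, mentioned above, encode position by rotating pairs of query/key dimensions. Below is a generic sketch of the standard RoPE formulation (the base of 10000 is the common default, assumed here rather than taken from MiniMind2's config):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to one head vector x at position `pos`.
    Each consecutive pair (x[2i], x[2i+1]) is rotated by angle pos * base^(-2i/d),
    so relative positions show up naturally in query-key dot products."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s,
                    x[i] * s + x[i + 1] * c])
    return out
```

Because each pair is a pure rotation, the vector's norm is preserved; only its orientation encodes position.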

Core Capabilities

  • Basic conversation and knowledge-based Q&A
  • Chinese language understanding and generation
  • Limited English language capabilities
  • Support for custom domain adaptation via LoRA
  • Reasoning capabilities through optional R1 distillation
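LoRA domain adaptation, noted above, freezes the pretrained weight matrix W and trains only a low-rank update. A minimal sketch of the forward pass (the names, rank-1 shapes, and default `alpha` here are illustrative, not MiniMind2's API):

```python
def lora_forward(x, W, A, B, alpha=16.0, r=1):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trained.
    For simplicity r=1 here, so A is a row vector and B a column vector."""
    base = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]  # frozen W x
    z = sum(ai * xi for ai, xi in zip(A, x))                        # A x (scalar at r=1)
    return [y + (alpha / r) * bi * z for y, bi in zip(base, B)]
```

Since only A and B receive gradients, the trainable parameter count is a tiny fraction of the base model's, which is what makes per-domain adapters cheap to train and store.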

Frequently Asked Questions

Q: What makes this model unique?

MiniMind2's uniqueness lies in its extreme efficiency and accessibility. It demonstrates that meaningful language model capabilities can be achieved with minimal computational resources, making it possible for individual researchers to train models from scratch in just 2 hours for less than $0.50.

Q: What are the recommended use cases?

The model is ideal for research, educational purposes, and proof-of-concept deployments where resource constraints are significant. It's particularly suitable for learning LLM training fundamentals and experimenting with custom domain adaptation through LoRA fine-tuning.
