# MiniMind2
| Property | Value |
|---|---|
| Parameter Count | 26M-145M |
| Model Type | Language Model (Chinese) |
| Architecture | Transformer Decoder-Only |
| License | Apache-2.0 |
| Training Time | ~2 hours on a single NVIDIA 3090 |
| Model URL | https://huggingface.co/jingyaogong/MiniMind2 |
## What is MiniMind2?
MiniMind2 is an ultra-lightweight Chinese language model series designed to make LLM training accessible to individual researchers and developers. With models ranging from just 26M to 145M parameters, it delivers usable Chinese conversation for its size, and the smallest variant can be trained from scratch on a single consumer GPU in roughly 2 hours.
## Implementation Details
The model implements a Transformer decoder-only architecture with several optimizations, including RMSNorm pre-normalization, SwiGLU activation, and rotary positional embeddings (RoPE); a minimal sketch of these components follows the list below. Both dense and mixture-of-experts (MoE) variants are provided, with the latter routing each token to a subset of expert feed-forward networks to increase capacity at similar per-token compute.
- Custom tokenizer with a 6,400-token vocabulary
- Implemented in PyTorch with minimal dependencies
- Supports single/multi-GPU training via DDP and DeepSpeed
- Complete training pipeline covering pretraining, SFT, LoRA, and DPO
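
The following is a minimal, illustrative sketch of the components named above (RMSNorm pre-normalization, SwiGLU feed-forward, and RoPE) in plain PyTorch. It is not the repository's actual code; module names, shapes, and defaults are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: silu(x @ W_gate) * (x @ W_up), projected back down."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0):
    """Precompute the complex rotations used by RoPE."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)                   # (seq_len, head_dim/2)
    return torch.polar(torch.ones_like(freqs), freqs)  # unit complex numbers

def apply_rope(x, freqs_cis):
    """Rotate query/key tensors of shape (batch, seq, heads, head_dim)."""
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    x_rotated = x_complex * freqs_cis[:, None, :]      # broadcast over heads
    return torch.view_as_real(x_rotated).flatten(-2).type_as(x)
```

In the actual model, `apply_rope` would be applied to the query and key tensors inside each attention layer before the attention scores are computed, and RMSNorm would be applied before the attention and feed-forward sublayers.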
## Core Capabilities
- Basic conversation and knowledge-based Q&A (a minimal inference sketch follows this list)
- Chinese language understanding and generation
- Limited English language capabilities
- Support for custom domain adaptation via LoRA
- Optional reasoning capabilities via distillation on DeepSeek-R1-style reasoning data
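
Below is a hedged usage sketch for the conversational capabilities listed above. It assumes the Hugging Face checkpoint loads through the standard `transformers` auto classes with `trust_remote_code=True` and that the tokenizer ships a chat template; the repository's own inference scripts may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is loadable via the transformers auto classes
# with custom code enabled; check the repo for its official inference script.
repo = "jingyaogong/MiniMind2"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
model.eval()

# A short Chinese prompt ("Please introduce yourself."), formatted with the
# tokenizer's chat template if one is provided.
messages = [{"role": "user", "content": "请介绍一下你自己。"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
    )
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

If the checkpoint does not expose a chat template, the raw prompt string can be passed to the tokenizer directly instead.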
## Frequently Asked Questions
Q: What makes this model unique?
MiniMind2's uniqueness lies in its extreme efficiency and accessibility. It demonstrates that meaningful language model capabilities can be achieved with minimal computational resources: the smallest variant can be trained from scratch in about 2 hours for less than $0.50 in GPU rental cost.
Q: What are the recommended use cases?
The model is ideal for research, educational purposes, and proof-of-concept deployments where resource constraints are significant. It's particularly suitable for learning LLM training fundamentals and experimenting with custom domain adaptation through LoRA fine-tuning.
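
As a sketch of the LoRA-based domain adaptation mentioned above, the snippet below applies the `peft` library to the checkpoint. The target module names and hyperparameters are illustrative assumptions; the MiniMind repository ships its own LoRA training script, which may organize this differently.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumptions: the model loads via transformers with trust_remote_code, and its
# attention projections are named "q_proj"/"v_proj" (verify the actual module names).
model = AutoModelForCausalLM.from_pretrained(
    "jingyaogong/MiniMind2", trust_remote_code=True
)

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension of the adapters
    lora_alpha=16,                         # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # hypothetical projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the small adapter weights train
# The adapted model can then be fine-tuned on a domain-specific SFT dataset
# with an ordinary causal-language-modeling loop or the Trainer API.
```

Because only the low-rank adapter matrices are updated, memory and storage requirements stay small enough for a single consumer GPU.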