Nemotron-Mini-4B-Instruct

Maintained By
nvidia

Property        Value
Developer       NVIDIA
Model Size      4B parameters
Architecture    Transformer Decoder with GQA & RoPE
License         NVIDIA Community Model License
Research Paper  Link

What is Nemotron-Mini-4B-Instruct?

Nemotron-Mini-4B-Instruct is a small language model (SLM) developed by NVIDIA, specifically optimized through distillation, pruning, and quantization techniques. It's a fine-tuned version of Minitron-4B-Base, derived from the larger Nemotron-4 15B model. The model excels in roleplay, retrieval augmented generation (RAG), and function calling tasks while maintaining a compact size suitable for on-device deployment.

Implementation Details

The model features a sophisticated architecture with a 3072 model embedding size, 32 attention heads, and an MLP intermediate dimension of 9216. It implements Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE), supporting a context length of 4,096 tokens.
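As a rough illustration, the per-head dimension follows directly from the numbers above. Note that the KV-head count used for GQA is not stated in this section, so the value below is an illustrative assumption:

```python
# Architecture numbers stated above.
hidden_size = 3072   # model embedding size
num_heads = 32       # query attention heads
mlp_dim = 9216       # MLP intermediate dimension

# Each attention head covers hidden_size / num_heads dimensions.
head_dim = hidden_size // num_heads
print(head_dim)  # 96

# With Grouped-Query Attention, several query heads share one KV head.
# The KV-head count here is an assumption for illustration, not from the card.
num_kv_heads = 8
queries_per_kv_head = num_heads // num_kv_heads
print(queries_per_kv_head)  # 4
```

Sharing KV heads this way shrinks the KV cache, which matters for the on-device deployment scenarios this model targets.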

  • Custom prompt template required for optimal performance
  • Supports single-turn conversations and tool-use scenarios
  • Compatible with the Hugging Face Transformers library and its pipeline API
  • Has undergone comprehensive AI safety evaluation
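Since a custom prompt template is required, a minimal sketch of assembling a single-turn prompt is shown below. The `<extra_id_0>`/`<extra_id_1>` role markers follow the chat format published on the model card, but verify them against the card version you are using before relying on this exact string:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Nemotron-Mini chat format.

    The <extra_id_0>/<extra_id_1> role markers follow the template
    published on the model card; confirm them against the card you use.
    """
    return (
        f"<extra_id_0>System\n{system}\n\n"
        f"<extra_id_1>User\n{user}\n"
        f"<extra_id_1>Assistant\n"
    )

prompt = build_prompt("You are a helpful assistant.", "Who are you?")
```

The resulting string can be passed directly to a Transformers text-generation pipeline as the input text.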

Core Capabilities

  • Roleplaying and character interactions
  • Retrieval Augmented Generation (RAG)
  • Function calling
  • On-device deployment optimization
  • Commercial use readiness
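To make the function-calling capability concrete, here is a hedged sketch of the application-side half of that loop: parsing a tool call emitted by the model as JSON and dispatching it to a local function. The `<toolcall>` tag, the JSON payload shape, and the `get_weather` tool are all illustrative assumptions, not the model's documented output format:

```python
import json
import re

# Hypothetical tool registry; the name and signature are examples only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(model_output: str):
    """Extract a <toolcall>{...}</toolcall> span and invoke the named tool.

    Returns the tool result, or None if the output contains no tool call.
    The tag and payload shape are assumptions for illustration only.
    """
    match = re.search(r"<toolcall>(.*?)</toolcall>", model_output, re.DOTALL)
    if not match:
        return None
    call = json.loads(match.group(1))
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

reply = dispatch_tool_call(
    '<toolcall>{"name": "get_weather", "arguments": {"city": "Paris"}}</toolcall>'
)
```

In a real deployment the tool result would be fed back into the conversation as a new turn so the model can compose its final answer.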

Frequently Asked Questions

Q: What makes this model unique?

The model's distinguishing feature is its optimization for on-device deployment while maintaining high performance in specific tasks like roleplay and RAG. It achieves this through innovative compression techniques while preserving core functionalities of larger models.

Q: What are the recommended use cases?

The model is particularly well-suited for gaming applications (as demonstrated in NVIDIA ACE), interactive character roleplay, question-answering systems using RAG, and applications requiring function calling capabilities. Its optimization for on-device deployment makes it ideal for applications where low latency and local processing are priorities.

🍰 Interested in building your own agents?
PromptLayer provides Hugging Face integration tools to manage and monitor prompts with your whole team. Get started here.