Mistral-Large-Instruct-2411-2.75bpw-h6-exl2

Maintained by: wolfram

Property              Value
Base Model            Mistral-Large-Instruct-2411
Parameters            123B
License               Mistral Research License (MRL)
Context Length        32K tokens
Supported Languages   10+ languages

What is Mistral-Large-Instruct-2411-2.75bpw-h6-exl2?

This model is an EXL2-quantized version of Mistral-Large-Instruct-2411, sized to fit on systems with 48GB of VRAM while retaining the capabilities of the original 123B-parameter model. It offers state-of-the-art reasoning, knowledge, and coding performance, with improved long-context handling and function-calling abilities.

Implementation Details

The model uses EXL2 quantization at 2.75 bits per weight with a 6-bit output head (the "h6" in the name), enabling deployment on systems with limited VRAM while preserving most of the original model's performance. It supports the full 32K context window when paired with a Q4-quantized KV cache, making it suitable for extensive document processing and complex reasoning tasks.
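
As a rough sanity check on the 48GB figure (back-of-the-envelope arithmetic, not a number from the model card), the weights alone at 2.75 bits per weight come to roughly 42 GB, leaving a few GB of headroom for the Q4 KV cache and activations:

```python
# Back-of-the-envelope VRAM estimate (illustrative only; actual usage also
# depends on loader overhead, context length, and batch size).
params = 123e9          # 123B parameters
bits_per_weight = 2.75  # EXL2 quantization level

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"quantized weights: ~{weight_gb:.1f} GB")       # ~42.3 GB
print(f"headroom on a 48 GB card: ~{48 - weight_gb:.1f} GB")  # ~5.7 GB
```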

  • Runs on ExLlamaV2-based backends such as TabbyAPI and text-generation-webui (EXL2 is ExLlamaV2's quantization format; see the loading sketch below)
  • Supports multiple deployment configurations
  • Enhanced system prompt handling
  • Native function calling capabilities
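
As a concrete illustration, here is a minimal loading sketch using the ExLlamaV2 Python API with a Q4 cache and the full 32K window. The model path and prompt are placeholders, and the class names are assumed from recent exllamav2 releases, so check them against the version you install:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to the downloaded EXL2 quant
config = ExLlamaV2Config("/models/Mistral-Large-Instruct-2411-2.75bpw-h6-exl2")
config.max_seq_len = 32768  # full 32K context window

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # Q4-quantized KV cache saves VRAM
model.load_autosplit(cache)                  # auto-split layers across GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="[INST] Hello! [/INST]", max_new_tokens=128))
```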

Core Capabilities

  • Multi-lingual support for 10+ languages including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish
  • Proficient coding abilities across 80+ programming languages
  • Advanced mathematical and reasoning capabilities
  • Robust context adherence for RAG applications
  • Agent-centric design with native function calling (see the tool-call sketch below)
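
To make the function-calling capability concrete, here is a hedged sketch that renders a tool-call prompt with the Hugging Face chat template. The `get_weather` tool and its schema are invented for illustration, and it is assumed that the base repo's chat template accepts a `tools` argument, as recent transformers versions support:

```python
from transformers import AutoTokenizer

# The quant shares the base model's tokenizer and chat template
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2411")

# Hypothetical tool schema, purely for illustration
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# Render the full prompt string with the tool definitions included
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```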

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient quantization that enables running a 123B parameter model on systems with just 48GB VRAM, while maintaining high performance across multiple languages and tasks.

Q: What are the recommended use cases?

The model is specifically designed for research purposes under the Mistral Research License. It excels in coding tasks, mathematical reasoning, and multi-lingual applications, particularly where context length and memory efficiency are crucial.
