Mistral-Large-Instruct-2411-2.75bpw-h6-exl2

Maintained by: wolfram

Property              Value
Base Model            Mistral-Large-Instruct-2411
Parameters            123B
License               Mistral Research License (MRL)
Context Length        32K tokens
Supported Languages   10+ languages

What is Mistral-Large-Instruct-2411-2.75bpw-h6-exl2?

This model is an EXL2-quantized version of Mistral-Large-Instruct-2411, sized to fit on systems with 48GB of VRAM while retaining the capabilities of the original 123B-parameter model. It offers state-of-the-art reasoning, knowledge, and coding performance, with improved long-context handling and function-calling abilities.

Implementation Details

The model uses EXL2 quantization at 2.75 bits per weight with a 6-bit output head (the "h6" in the name), enabling deployment on systems with limited VRAM while preserving most of the original model's performance. It supports the full 32K context window when paired with a Q4-quantized KV cache, making it suitable for extensive document processing and complex reasoning tasks.
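
As a rough sanity check on the 48GB figure (back-of-the-envelope arithmetic, not a number from the model card), the weights alone at 2.75 bits per weight come to roughly 42 GB, leaving a few GB of headroom for the Q4 KV cache and activations:

```python
# Back-of-the-envelope VRAM estimate (illustrative only; actual usage also
# depends on loader overhead, context length, and batch size).
params = 123e9          # 123B parameters
bits_per_weight = 2.75  # EXL2 quantization level

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"quantized weights: ~{weight_gb:.1f} GB")       # ~42.3 GB
print(f"headroom on a 48 GB card: ~{48 - weight_gb:.1f} GB")  # ~5.7 GB
```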

  • Runs on ExLlamaV2-based backends such as TabbyAPI and text-generation-webui (EXL2 is ExLlamaV2's quantization format; see the loading sketch below)
  • Supports multiple deployment configurations
  • Enhanced system prompt handling
  • Native function calling capabilities
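
As a concrete illustration, here is a minimal loading sketch using the ExLlamaV2 Python API with a Q4 cache and the full 32K window. The model path and prompt are placeholders, and the class names are assumed from recent exllamav2 releases, so check them against the version you install:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Placeholder path to the downloaded EXL2 quant
config = ExLlamaV2Config("/models/Mistral-Large-Instruct-2411-2.75bpw-h6-exl2")
config.max_seq_len = 32768  # full 32K context window

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # Q4-quantized KV cache saves VRAM
model.load_autosplit(cache)                  # auto-split layers across GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="[INST] Hello! [/INST]", max_new_tokens=128))
```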

Core Capabilities

  • Multi-lingual support for 10+ languages including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish
  • Proficient coding abilities across 80+ programming languages
  • Advanced mathematical and reasoning capabilities
  • Robust context adherence for RAG applications
  • Agent-centric design with native function calling (see the tool-call sketch below)
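
To make the function-calling capability concrete, here is a hedged sketch that renders a tool-call prompt with the Hugging Face chat template. The `get_weather` tool and its schema are invented for illustration, and it is assumed that the base repo's chat template accepts a `tools` argument, as recent transformers versions support:

```python
from transformers import AutoTokenizer

# The quant shares the base model's tokenizer and chat template
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Large-Instruct-2411")

# Hypothetical tool schema, purely for illustration
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]

# Render the full prompt string with the tool definitions included
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
print(prompt)
```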

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient quantization that enables running a 123B parameter model on systems with just 48GB VRAM, while maintaining high performance across multiple languages and tasks.

Q: What are the recommended use cases?

The model is specifically designed for research purposes under the Mistral Research License. It excels in coding tasks, mathematical reasoning, and multi-lingual applications, particularly where context length and memory efficiency are crucial.
