Hermes-3-Llama-3.1-70B-FP8

NousResearch

Hermes-3 70B FP8 is a highly capable LLM using ChatML format, featuring function calling, JSON mode & advanced reasoning. Built on Llama 3.1.

Property	Value
Parameter Count	70.6B
Model Type	FP8 Quantized Language Model
Base Model	Meta-Llama-3.1-70B
License	Llama3
Paper	Technical Report

What is Hermes-3-Llama-3.1-70B-FP8?

Hermes-3-Llama-3.1-70B-FP8 is a NeuralMagic FP8 quantized version of the flagship Hermes 3 language model, specifically optimized for use with vLLM. This model represents the latest iteration in the Hermes series, featuring advanced capabilities in reasoning, roleplaying, and multi-turn conversations.

Implementation Details

The model uses ChatML as its prompt format, enabling structured multi-turn dialogue and system-level instructions. It supports both function calling and JSON mode for structured outputs, making it highly versatile for various applications.

Quantization: FP8 (E4M3) format for optimal performance
Architecture: Based on Llama 3.1 70B foundation model
Framework Compatibility: Optimized for vLLM deployment

Core Capabilities

Advanced agentic capabilities and improved reasoning
Enhanced roleplaying and multi-turn conversation handling
Powerful function calling with structured output capabilities
Long context coherence and improved code generation
User-aligned responses with strong steering capabilities

Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its FP8 quantization while maintaining high performance, competitive with Llama-3.1 Instruct models. It offers advanced function calling capabilities and structured output formats, making it particularly suitable for practical applications requiring precise control and structured responses.

Q: What are the recommended use cases?

The model excels in scenarios requiring structured dialogue, function calling, JSON outputs, and complex reasoning tasks. It's particularly well-suited for applications needing multi-turn conversations, roleplaying scenarios, and tasks requiring detailed technical responses or code generation.