Selene-1-Mini-Llama-3.1-8B
| Property | Value |
|---|---|
| Developer | AtlaAI |
| Base Model | Llama-3.1-8B |
| Context Length | 128K tokens |
| Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Paper | arXiv:2501.17195 |
What is Selene-1-Mini-Llama-3.1-8B?
Selene-1-Mini-Llama-3.1-8B is a state-of-the-art small language model-as-a-judge (SLMJ) designed specifically for evaluation tasks. Despite its compact size, it matches the performance of models roughly ten times larger and outperforms GPT-4o on evaluation benchmarks such as RewardBench, EvalBiasBench, and AutoJ.
Implementation Details
The model is post-trained from Llama-3.1-8B and optimized across a range of evaluation tasks and scoring criteria. It uses the Llama 3 conversation template, which must be applied correctly for best performance. The model can be deployed with the Hugging Face Transformers library on either CPU or GPU.
- Built on Llama-3.1-8B architecture
- Supports 128K context length for comprehensive evaluation
- Implements structured evaluation outputs
- Provides qualitative critiques with reasoning
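Because the model expects the Llama 3 conversation template, a single-turn evaluation request ends up in the shape sketched below. This is a minimal illustration; in practice `tokenizer.apply_chat_template` from Transformers builds the string for you, and the evaluation wording in the example is hypothetical, not a required rubric.

```python
def llama3_prompt(user_message: str) -> str:
    """Format a single-turn conversation in the Llama 3 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hypothetical evaluation request; the trailing assistant header cues
# the model to generate its judgment next.
prompt = llama3_prompt(
    "Evaluate the response below on a 1-5 scale for helpfulness ..."
)
```

Feeding a prompt without this template (or with a different chat format) typically degrades judge quality, which is why the model card stresses proper template application.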
Core Capabilities
- Absolute scoring evaluations (1-5 scale ratings)
- Binary classification tasks
- Pairwise preference analysis
- Multi-language support for major global languages
- RAG hallucination detection
- Structured evaluation outputs with reasoning
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for matching the performance of much larger models at only 8B parameters. It is purpose-built for evaluation tasks and achieves state-of-the-art results across multiple benchmarks.
Q: What are the recommended use cases?
The model is ideal for evaluation tasks including response quality assessment, harmlessness evaluation, logical consistency checking, and RAG hallucination detection. It can be used for both absolute scoring and comparative analysis of responses.
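The two main usage modes, absolute scoring and pairwise comparison, differ only in how the evaluation request is framed. The wording below is illustrative, not an official prompt template; adjust it to your rubric before sending it through the chat template.

```python
def absolute_prompt(instruction: str, response: str, rubric: str) -> str:
    """Build an absolute-scoring (1-5) evaluation request."""
    return (
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        f"Rubric: {rubric}\n"
        "Rate the response from 1 to 5 and explain your reasoning."
    )

def pairwise_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Build a pairwise-preference evaluation request."""
    return (
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better, A or B? Explain your reasoning."
    )
```

The same framing extends to binary tasks (e.g. harmlessness or RAG-hallucination checks) by asking for a yes/no verdict instead of a score.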