Selene-1-Mini-Llama-3.1-8B
| Property | Value |
|---|---|
| Developer | AtlaAI |
| Base Model | Llama-3.1-8B |
| Context Length | 128K tokens |
| Languages | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai |
| Paper | arXiv:2501.17195 |
What is Selene-1-Mini-Llama-3.1-8B?
Selene-1-Mini-Llama-3.1-8B is a state-of-the-art small language model-as-a-judge (SLMJ) designed specifically for evaluation tasks. Despite its compact size, it matches the performance of models roughly ten times larger and outperforms GPT-4o on evaluation benchmarks such as RewardBench, EvalBiasBench, and AutoJ.
Implementation Details
The model is post-trained from Llama-3.1-8B and optimized across a range of evaluation tasks and scoring criteria. It uses the Llama 3 conversation template, which must be applied correctly for best performance. The model can be deployed with the Hugging Face Transformers library on either CPU or GPU.
- Built on Llama-3.1-8B architecture
- Supports 128K context length for comprehensive evaluation
- Implements structured evaluation outputs
- Provides qualitative critiques with reasoning
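Because the model expects the Llama 3 conversation template, a single-turn evaluation request ends up in the shape sketched below. This is a minimal illustration; in practice `tokenizer.apply_chat_template` from Transformers builds the string for you, and the evaluation wording in the example is hypothetical, not a required rubric.

```python
def llama3_prompt(user_message: str) -> str:
    """Format a single-turn conversation in the Llama 3 chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Hypothetical evaluation request; the trailing assistant header cues
# the model to generate its judgment next.
prompt = llama3_prompt(
    "Evaluate the response below on a 1-5 scale for helpfulness ..."
)
```

Feeding a prompt without this template (or with a different chat format) typically degrades judge quality, which is why the model card stresses proper template application.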
Core Capabilities
- Absolute scoring evaluations (1-5 scale ratings)
- Binary classification tasks
- Pairwise preference analysis
- Multi-language support for major global languages
- RAG hallucination detection
- Structured evaluation outputs with reasoning
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for matching the performance of much larger models at only 8B parameters. It is purpose-built for evaluation tasks and achieves state-of-the-art results across multiple benchmarks.
Q: What are the recommended use cases?
The model is ideal for evaluation tasks including response quality assessment, harmlessness evaluation, logical consistency checking, and RAG hallucination detection. It can be used for both absolute scoring and comparative analysis of responses.
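The two main usage modes, absolute scoring and pairwise comparison, differ only in how the evaluation request is framed. The wording below is illustrative, not an official prompt template; adjust it to your rubric before sending it through the chat template.

```python
def absolute_prompt(instruction: str, response: str, rubric: str) -> str:
    """Build an absolute-scoring (1-5) evaluation request."""
    return (
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        f"Rubric: {rubric}\n"
        "Rate the response from 1 to 5 and explain your reasoning."
    )

def pairwise_prompt(instruction: str, response_a: str, response_b: str) -> str:
    """Build a pairwise-preference evaluation request."""
    return (
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better, A or B? Explain your reasoning."
    )
```

The same framing extends to binary tasks (e.g. harmlessness or RAG-hallucination checks) by asking for a yes/no verdict instead of a score.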