Flow-Judge-v0.1
Property | Value |
---|---|
Parameters | 3.8B |
Base Architecture | Phi-3.5-mini-instruct |
License | Apache 2.0 |
Context Length | 8192 tokens |
What is Flow-Judge-v0.1?
Flow-Judge-v0.1 is a language model specialized for evaluating the outputs of LLM systems. Built on the Phi-3.5-mini-instruct architecture, this 3.8B-parameter model supports evaluation against custom criteria while keeping a relatively small footprint. It scores outputs on several scales (pass/fail, 3-point Likert, and 5-point Likert) and returns structured, detailed feedback alongside each score.
Implementation Details
The model uses the Phi-3.5-mini-instruct architecture and supports optimization techniques including MQA and Flash Attention 2. It was trained on synthetic datasets and fine-tuned with RSLoRA (rank-stabilized LoRA), and it performs comparably to larger models on several benchmarks.
- Supports multiple scoring scales: pass/fail, 3-point Likert, and 5-point Likert
- Provides structured output with feedback and score tags
- Uses bfloat16 precision weights
- Available in AWQ and GGUF quantized versions
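Because the output is structured with feedback and score tags, it can be post-processed with a small parser. The following is a minimal sketch, assuming the tags are literally `<feedback>` and `<score>` and that the score is an integer; the `Judgement` type and `parse_judgement` helper are hypothetical names used for illustration, not part of any official library.

```python
import re
from typing import NamedTuple, Optional


class Judgement(NamedTuple):
    """Hypothetical container for one parsed evaluation."""
    feedback: str
    score: Optional[int]


# Assumed tag names; adjust these if the model's actual output differs.
_FEEDBACK_RE = re.compile(r"<feedback>(.*?)</feedback>", re.DOTALL)
_SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>")


def parse_judgement(text: str) -> Judgement:
    """Extract the feedback text and integer score from tagged judge output.

    Missing feedback becomes an empty string; a missing score becomes None.
    """
    feedback_match = _FEEDBACK_RE.search(text)
    score_match = _SCORE_RE.search(text)
    feedback = feedback_match.group(1).strip() if feedback_match else ""
    score = int(score_match.group(1)) if score_match else None
    return Judgement(feedback, score)


# Made-up sample output, for illustration only.
sample = (
    "<feedback>\nThe answer is grounded in the context.\n</feedback>\n"
    "<score>1</score>"
)
result = parse_judgement(sample)
print(result.feedback)
print(result.score)
```

Returning `None` for a missing score, rather than raising, lets a caller count malformed generations separately from low scores.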
Core Capabilities
- Customizable evaluation criteria and rubrics
- Detailed qualitative feedback generation
- High performance on held-out test sets (0.955 F1 score on Pass/Fail)
- Strong performance on specialized benchmarks like HaluEval and Covid-QA
- Modest hardware requirements (4 GB VRAM minimum)
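For reference, an F1 score on a pass/fail task can be computed by comparing the judge's labels against human labels. The sketch below assumes F1 is measured for the "pass" class (encoded as 1); the source does not say which F1 variant produced the 0.955 figure, and the label lists here are made-up illustration data, not benchmark results.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Compute F1 for the positive class from paired binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = pass, 0 = fail.
human_labels = [1, 1, 0, 1, 0, 0, 1, 0]
judge_labels = [1, 1, 0, 0, 0, 1, 1, 0]
print(f"{binary_f1(human_labels, judge_labels):.3f}")  # prints 0.750
```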
Frequently Asked Questions
Q: What makes this model unique?
Flow-Judge combines the efficiency of a smaller model (3.8B parameters) with sophisticated evaluation capabilities typically found in larger models. Its ability to provide structured feedback and support multiple scoring scales makes it particularly valuable for LLM system evaluation tasks.
Q: What are the recommended use cases?
The model is specifically designed for evaluating LLM system outputs across various domains. It's particularly well-suited for tasks requiring detailed feedback and scoring according to custom rubrics, making it valuable for development teams working on AI applications.
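One way to organize a custom rubric is as a small data structure that renders into the text of an evaluation prompt. This is a sketch under stated assumptions: the `Rubric` class, the `build_prompt` helper, and the prompt wording are all hypothetical and do not reproduce the model's official prompt template, which should be taken from the model's own documentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rubric:
    """Hypothetical rubric: one criterion plus a description per score."""
    criteria: str
    scale: Dict[int, str]

    def render(self) -> str:
        """Render the rubric as plain text, with scores in ascending order."""
        lines = [f"Evaluation criteria: {self.criteria}", "Scoring rubric:"]
        for score in sorted(self.scale):
            lines.append(f"- Score {score}: {self.scale[score]}")
        return "\n".join(lines)


# Example pass/fail rubric (illustrative wording only).
PASS_FAIL = Rubric(
    criteria="Is the response fully supported by the provided context?",
    scale={
        0: "The response contains claims not supported by the context.",
        1: "Every claim in the response is supported by the context.",
    },
)


def build_prompt(rubric: Rubric, user_input: str, response: str) -> str:
    """Assemble a judge prompt; the layout is an assumption, not the official format."""
    return (
        f"{rubric.render()}\n\n"
        f"Input:\n{user_input}\n\n"
        f"Response to evaluate:\n{response}\n"
    )


print(build_prompt(PASS_FAIL, "What is the capital of France?", "Paris."))
```

A 3-point or 5-point Likert rubric follows the same shape, with one `scale` entry per score.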