Flow-Judge-v0.1

Property	Value
Parameters	3.8B
Base Architecture	Phi-3.5-mini-instruct
License	Apache 2.0
Context Length	8192 tokens

What is Flow-Judge-v0.1?

Flow-Judge-v0.1 is a language model specialized for evaluating the outputs of LLM systems. Built on the Phi-3.5-mini architecture, this 3.8B-parameter model keeps a relatively small footprint while supporting customizable evaluations across different scoring scales and returning structured, detailed feedback.

Implementation Details

The model is built on the Phi-3.5-mini-instruct architecture and supports MQA and Flash Attention 2. It was trained on synthetic datasets and fine-tuned with rsLoRA (rank-stabilized LoRA), achieving performance comparable to larger models on various benchmarks.

Supports multiple scoring scales: pass/fail, 3-point Likert, and 5-point Likert
Provides structured output with feedback and score tags
Uses bfloat16 precision weights
Available in AWQ and GGUF quantized versions
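
Because the output is structured with feedback and score tags, it can be parsed mechanically. The sketch below is a minimal illustration, not the project's own parser: the exact tag names (`<feedback>`, `<score>`), the integer score format, and the helper name `parse_judge_output` are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class JudgeResult(NamedTuple):
    feedback: str
    score: int


# Assumed tag names: the text above only states that the output
# carries "feedback and score tags", not their exact spelling.
_FEEDBACK_RE = re.compile(r"<feedback>(.*?)</feedback>", re.DOTALL)
_SCORE_RE = re.compile(r"<score>\s*(\d+)\s*</score>")


def parse_judge_output(text: str) -> Optional[JudgeResult]:
    """Return the feedback text and integer score, or None if either tag is missing."""
    feedback = _FEEDBACK_RE.search(text)
    score = _SCORE_RE.search(text)
    if feedback is None or score is None:
        return None
    return JudgeResult(feedback.group(1).strip(), int(score.group(1)))


# Hypothetical model output used only to exercise the parser.
sample = (
    "<feedback>\nThe answer is grounded in the context.\n</feedback>\n"
    "<score>1</score>"
)
result = parse_judge_output(sample)
```

Returning `None` on malformed output, rather than raising, lets a caller decide whether to retry the generation or record the sample as unparseable.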

Core Capabilities

Customizable evaluation criteria and rubrics
Detailed qualitative feedback generation
High performance on held-out test sets (0.955 F1 score on Pass/Fail)
Strong performance on specialized benchmarks like HaluEval and Covid-QA
Modest hardware requirements (4 GB VRAM minimum)
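
For readers unfamiliar with the F1 figure quoted above, the following sketch shows how a binary F1 score can be computed when a judge's pass/fail verdicts are compared against reference labels. It is an illustration only: the text does not say which class was treated as positive or how the reported score was averaged, so the `positive=1` convention and the toy labels here are assumptions.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (not from any benchmark): 1 = pass, 0 = fail.
reference = [1, 1, 0, 1, 0, 1]
judge = [1, 1, 0, 0, 1, 1]
score = binary_f1(reference, judge)  # 3 TP, 1 FP, 1 FN -> 0.75
```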

Frequently Asked Questions

Q: What makes this model unique?

Flow-Judge combines the efficiency of a smaller model (3.8B parameters) with sophisticated evaluation capabilities typically found in larger models. Its ability to provide structured feedback and support multiple scoring scales makes it particularly valuable for LLM system evaluation tasks.

Q: What are the recommended use cases?

The model is specifically designed for evaluating LLM system outputs across various domains. It's particularly well-suited for tasks requiring detailed feedback and scoring according to custom rubrics, making it valuable for development teams working on AI applications.
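
One way to organize a custom rubric before sending it to a judge model is sketched below. This is a hypothetical data structure, not the model's official prompt format: the `Rubric` class, its `render` layout, and the 0/1 score range for pass/fail are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rubric:
    """A custom evaluation criterion plus a description for each allowed score."""
    criteria: str
    levels: Dict[int, str]  # score -> what that score means

    def validate_score(self, score: int) -> bool:
        """Check that a score returned by the judge is defined in this rubric."""
        return score in self.levels

    def render(self) -> str:
        """Render the rubric as plain text (hypothetical layout, not an official template)."""
        lines = [f"Criteria: {self.criteria}", "Rubric:"]
        lines += [f"- Score {s}: {d}" for s, d in sorted(self.levels.items())]
        return "\n".join(lines)


# Example pass/fail rubric; the 0/1 range is an assumed convention.
faithfulness = Rubric(
    criteria="Is the response supported by the provided context?",
    levels={
        0: "The response contains claims not supported by the context.",
        1: "Every claim in the response is supported by the context.",
    },
)
rubric_text = faithfulness.render()
```

A 3-point or 5-point rubric would use the same structure with more entries in `levels`, and `validate_score` can then reject out-of-range scores returned by the judge.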
