Bespoke-MiniCheck-7B

Property	Value
Parameter Count	7.74B
Model Type	Text Classification
Architecture	InternLM2-based
License	CC BY-NC 4.0
Paper	MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

What is Bespoke-MiniCheck-7B?

Bespoke-MiniCheck-7B is a fact-checking model developed by Bespoke Labs. Given a reference document and a sentence, it predicts whether the sentence is supported by the document and outputs a binary label: 1 (supported) or 0 (unsupported). The model is fine-tuned from internlm2_5-7b-chat on a curated dataset of 35K examples, and it reports state-of-the-art results on the LLM-AggreFact benchmark.
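
The input/output contract can be illustrated with a minimal sketch. It assumes that label 1 means "supported" and 0 means "unsupported". The `toy_support_prob` function is a hypothetical stand-in based on word overlap, not the model itself, and the 0.5 threshold is likewise an assumption chosen for illustration.

```python
import re


def toy_support_prob(document: str, claim: str) -> float:
    """Hypothetical stand-in for the model: share of claim words found in the document."""
    doc_words = set(re.findall(r"[a-z0-9]+", document.lower()))
    claim_words = re.findall(r"[a-z0-9]+", claim.lower())
    if not claim_words:
        return 0.0
    return sum(w in doc_words for w in claim_words) / len(claim_words)


def to_label(prob: float, threshold: float = 0.5) -> int:
    """Map a support probability to a binary label (assumed: 1 = supported, 0 = unsupported)."""
    return 1 if prob >= threshold else 0


document = "A group of students gather in the school library to study for their final exams."
claims = [
    "The students study in the library.",
    "The students are on vacation.",
]

# One (document, claim) pair in, one binary label out.
labels = [to_label(toy_support_prob(document, c)) for c in claims]
print(labels)  # [1, 0]
```

In a real setup the stand-in scorer would be replaced by a call to the model; only the shape of the interface is meant to carry over.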

Implementation Details

The training data combines 21K ANLI examples with 14K synthetic examples generated with Meta's Llama-3.1-405B-Instruct, for 35K examples in total. The synthetic portion includes both "claim-to-document" and "doc-to-claim" examples and is curated to keep quality high.

Supports input documents of up to 32K tokens
Supports automatic prefix caching for faster inference
Reaches throughput above 500 docs/min when served with vLLM
Uses the BF16 tensor type for efficient computation
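
Documents longer than the input window have to be split before scoring. The following is a rough sketch of one way to do that, not the model's actual implementation: it approximates tokens by whitespace-separated words, uses a hypothetical `scorer` callable in place of the model, and assumes the per-chunk scores are combined by taking the maximum.

```python
from typing import Callable, List


def chunk_tokens(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def score_long_document(
    document: str,
    claim: str,
    scorer: Callable[[str, str], float],
    chunk_size: int = 32_000,
) -> float:
    """Score a claim against each chunk and keep the best score (assumed aggregation)."""
    tokens = document.split()  # whitespace words as a crude proxy for model tokens
    chunks = chunk_tokens(tokens, chunk_size) or [[]]
    return max(scorer(" ".join(chunk), claim) for chunk in chunks)


def toy_scorer(chunk: str, claim: str) -> float:
    """Hypothetical scorer: 1.0 if the claim text appears verbatim in the chunk."""
    return 1.0 if claim.lower() in chunk.lower() else 0.0


doc = "alpha beta gamma delta epsilon zeta"
inside_one_chunk = score_long_document(doc, "gamma delta", toy_scorer, chunk_size=2)
across_boundary = score_long_document(doc, "beta gamma", toy_scorer, chunk_size=2)
print(inside_one_chunk, across_boundary)  # 1.0 0.0
```

The second call shows a limitation of chunking in this toy setup: evidence that straddles a chunk boundary is missed, which is one reason a large window such as 32K tokens is useful.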

Core Capabilities

Binary fact-checking classification (supported or unsupported)
Checks multi-sentence claims by breaking them into individual sentences
Processes long documents in chunks of configurable size
Reports state-of-the-art performance on the LLM-AggreFact benchmark
Supports batch processing of multiple documents and claims
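
Sentence breakdown and batch processing can be sketched together. This is an illustrative outline under stated assumptions rather than the model's own pipeline: the regex sentence splitter is deliberately naive, `toy_label` is a hypothetical stand-in for the model's 0/1 prediction, and the rule that a multi-sentence claim counts as supported only if every sentence is supported is an assumption.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(claim: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", claim.strip())
    return [p for p in parts if p]


def check_claim(document: str, claim: str, label_fn: Callable[[str, str], int]) -> int:
    """Return 1 only if every sentence of the claim is labeled supported (assumed rule)."""
    sentences = split_sentences(claim)
    return int(bool(sentences) and all(label_fn(document, s) == 1 for s in sentences))


def check_batch(pairs: List[Tuple[str, str]], label_fn: Callable[[str, str], int]) -> List[int]:
    """Check a batch of (document, claim) pairs, one label per pair."""
    return [check_claim(doc, claim, label_fn) for doc, claim in pairs]


def toy_label(document: str, sentence: str) -> int:
    """Hypothetical stand-in for the model: 1 if every sentence word occurs in the document."""
    doc_words = set(re.findall(r"[a-z0-9]+", document.lower()))
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return int(bool(words) and all(w in doc_words for w in words))


doc = "The library opens at nine. Students study there daily."
pairs = [
    (doc, "The library opens at nine. Students study daily."),
    (doc, "The library opens at nine. Students sleep there."),
]
results = check_batch(pairs, toy_label)
print(results)  # [1, 0]
```

The second claim fails because one of its two sentences is unsupported, even though the other is fully backed by the document.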

Frequently Asked Questions

Q: What makes this model unique?

The model reaches state-of-the-art fact-checking performance on the LLM-AggreFact benchmark despite its relatively small size (7.74B parameters), thanks to high-quality data curation and an efficient architecture. It is also notable for pairing that accuracy with high processing speed: more than 500 docs/min when served with vLLM.

Q: What are the recommended use cases?

The model is well suited to document-based fact verification and automated fact-checking systems. It is most useful when large volumes of claims must be checked against reference documents, with practical applications in content moderation, research verification, and content and information validation.