lettucedect-large-modernbert-en-v1
| Property | Value |
|---|---|
| Organization | KRLabsOrg |
| Architecture | ModernBERT Large |
| Task | Hallucination Detection |
| Context Length | 8192 tokens |
| Language | English |
| Paper | arXiv:2502.17125 |
What is lettucedect-large-modernbert-en-v1?
LettuceDetect is a transformer-based model for hallucination detection in Retrieval-Augmented Generation (RAG) applications. Built on the ModernBERT Large architecture, it analyzes context-answer pairs to identify claims in generated content that are not supported by the retrieved context. With an F1 score of 79.22% on the RAGTruth benchmark, it outperforms prompt-based baselines, including GPT-4, and is competitive with state-of-the-art models.
Implementation Details
The model leverages ModernBERT's extended context support of up to 8192 tokens, making it particularly effective for processing lengthy documents. It performs token-level classification to identify spans of text that aren't supported by the provided context, offering granular hallucination detection capabilities.
- Token-level classification architecture for precise hallucination detection
- Extended context support up to 8192 tokens
- Trained on the RAGTruth dataset
- Outperforms prompt-based methods and many encoder-based models
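To make the token-level approach concrete, here is a minimal sketch of the post-processing step such a detector implies: merging consecutive tokens flagged as unsupported into character-level spans with a confidence score. The token offsets and probabilities below are illustrative made-up inputs, not the library's actual data structures.

```python
def labels_to_spans(offsets, probs, threshold=0.5):
    """Merge consecutive tokens whose hallucination probability exceeds
    the threshold into (char_start, char_end, confidence) spans."""
    spans = []
    current = None  # [start, end, per-token probabilities]
    for (start, end), p in zip(offsets, probs):
        if p >= threshold:
            if current is None:
                current = [start, end, [p]]
            else:  # extend the open span to cover this token
                current[1] = end
                current[2].append(p)
        elif current is not None:
            spans.append((current[0], current[1], max(current[2])))
            current = None
    if current is not None:
        spans.append((current[0], current[1], max(current[2])))
    return spans

# Example: the answer claims "69 million", which the context does not support.
answer = "Paris has 69 million residents."
offsets = [(0, 5), (6, 9), (10, 12), (13, 20), (21, 30)]  # per-token char offsets
probs = [0.05, 0.10, 0.92, 0.88, 0.15]  # per-token hallucination probabilities
spans = labels_to_spans(offsets, probs)
print([(s, e, c, answer[s:e]) for s, e, c in spans])
```

Here the two high-probability tokens are merged into a single span covering "69 million", with the span confidence taken as the maximum token probability.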
Core Capabilities
- Span-level hallucination detection with confidence scores
- Processing of extensive context-answer pairs
- Integration-ready for RAG applications
- Support for multiple context documents
- Python API for easy implementation
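The Python API mentioned above can be sketched as follows. The import path, `HallucinationDetector` constructor, and `predict` signature follow the LettuceDetect project's README; exact names and output formats may differ across versions, so treat this as an assumption rather than a definitive interface. Running it requires `pip install lettucedetect` and downloads the model on first use, so the call is wrapped in a function here.

```python
# Illustrative inputs: the answer contradicts the retrieved context.
context = ["France is a country in Europe. The capital of France is Paris."]
question = "What is the capital of France?"
answer = "The capital of France is Lyon."

def detect_spans(context, question, answer):
    # Assumed API per the project's README (requires: pip install lettucedetect).
    from lettucedetect.models.inference import HallucinationDetector

    detector = HallucinationDetector(
        method="transformer",
        model_path="KRLabsOrg/lettucedect-large-modernbert-en-v1",
    )
    # output_format="spans" is expected to return character-offset spans
    # of unsupported text with confidence scores.
    return detector.predict(
        context=context,
        question=question,
        answer=answer,
        output_format="spans",
    )
```

For the inputs above, the detector would be expected to flag the unsupported span "Lyon" in the answer.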
Frequently Asked Questions
Q: What makes this model unique?
The combination of long-context support (8192 tokens) and token-level classification makes the model effective for detailed document analysis: rather than a single document-level verdict, it pinpoints the exact spans that lack support in the context. Its 79.22% F1 on RAGTruth surpasses prompt-based baselines while keeping inference times practical.
Q: What are the recommended use cases?
The model is well suited to RAG pipelines where generated content must be verified against source documents, for example enterprise question answering, content-generation verification, and other scenarios where detecting hallucinations is essential to maintaining information accuracy.