hallucination_evaluation_model

Maintained By
vectara

Hallucination Evaluation Model (HHEM-2.1-Open)

Property       Value
Parameters     110M
License        Apache 2.0
Base Model     google/flan-t5-base
Paper          RAGTruth Paper
Tensor Type    F32

What is hallucination_evaluation_model?

HHEM-2.1-Open is a model developed by Vectara for detecting hallucinations in large language model (LLM) outputs, particularly in Retrieval-Augmented Generation (RAG) applications. It evaluates the factual consistency of generated content against source material, producing a score between 0 and 1 for each premise-hypothesis pair: scores close to 1 indicate factual consistency, while scores close to 0 indicate likely hallucination.
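For concreteness, here is a minimal scoring sketch following the loading pattern shown on the model's Hugging Face page; the example pairs are illustrative, not taken from this page:

```python
from transformers import AutoModelForSequenceClassification

# Each pair is (premise = source text, hypothesis = generated text).
pairs = [
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "The capital of France is Berlin."),
]

# trust_remote_code=True loads the model's custom scoring head and its
# predict() helper, which accepts raw (premise, hypothesis) string pairs.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

scores = model.predict(pairs)  # one consistency score in [0, 1] per pair
print(scores)
```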

Implementation Details

Built on the FLAN-T5 architecture, HHEM-2.1-Open processes pairs of premise and hypothesis texts to determine factual consistency. The model requires less than 600 MB of RAM and processes a 2k-token input in roughly 1.5 seconds on standard CPU hardware (a quick way to check this on your own machine is sketched after the list below).

  • Unlimited context length capability
  • Outperforms GPT-3.5-Turbo and GPT-4 on hallucination detection benchmarks
  • Efficient resource utilization for production deployment
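A small timing run like the one below can sanity-check those latency figures on your own hardware. The premise text and its length are invented for illustration, and real numbers will depend on your CPU:

```python
import time
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# A long, repetitive premise standing in for a source document on the
# order of 2k tokens (purely illustrative; use your own documents).
premise = "The quarterly report covers revenue, costs, and headcount. " * 170
hypothesis = "The report discusses revenue and staffing levels."

start = time.perf_counter()
score = model.predict([(premise, hypothesis)])
elapsed = time.perf_counter() - start
print(f"score={float(score[0]):.3f}, latency={elapsed:.2f}s")
```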

Core Capabilities

  • Binary classification of hallucinated vs. consistent content (a thresholding sketch follows this list)
  • Asymmetric evaluation of premise-hypothesis pairs
  • High performance on RAGTruth-Summ (64.42% balanced accuracy) and RAGTruth-QA (74.28% balanced accuracy)
  • Efficient processing of long-form content
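The model's output is a continuous score, so the binary hallucinated-vs-consistent decision above requires choosing a cutoff. A 0.5 threshold is a common convention, but it is an assumption of this sketch rather than something this page specifies; tune it on your own validation data:

```python
def label_outputs(scores, threshold=0.5):
    """Map HHEM consistency scores to binary labels.

    The 0.5 cutoff is an assumed default, not an official recommendation;
    adjust it to trade precision against recall for your application.
    """
    return ["consistent" if float(s) >= threshold else "hallucinated" for s in scores]

# Usage (with `scores` from model.predict(pairs) as in the earlier sketch):
# labels = label_outputs(scores)
```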

Frequently Asked Questions

Q: What makes this model unique?

HHEM-2.1-Open stands out for handling inputs of effectively unlimited length, compared with its predecessor's 512-token limit, while outperforming much larger models such as GPT-4 on hallucination detection tasks.

Q: What are the recommended use cases?

The model is particularly suited to RAG applications where the factual consistency of LLM-generated summaries or answers must be verified against source documents. It is also a good fit for production environments where both resource efficiency and high accuracy matter.
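In a RAG pipeline, one straightforward pattern is to treat the retrieved passages as the premise and the LLM's answer as the hypothesis. Concatenating the passages into a single premise is an assumption of this sketch, not something this page prescribes; scoring the answer against each passage separately and aggregating is an equally valid choice:

```python
def check_rag_answer(model, retrieved_passages, llm_answer):
    """Score an LLM answer against its retrieved sources with HHEM.

    `model` is the HHEM model loaded earlier via
    AutoModelForSequenceClassification.from_pretrained(..., trust_remote_code=True).
    """
    premise = "\n\n".join(retrieved_passages)  # assumed convention: join all sources
    score = model.predict([(premise, llm_answer)])
    return float(score[0])  # closer to 1.0 means the answer is grounded in the sources
```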
