Published: Sep 4, 2024
Updated: Sep 10, 2024

LLMs Learn to Cite Sources: Boosting Accuracy and Trust

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
By
Jiajie Zhang, Yushi Bai, Xin Lv, Wanjun Gu, Danqing Liu, Minhao Zou, Shulin Cao, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li

Summary

Large language models (LLMs) are impressive, but their trustworthiness has been a concern due to a lack of source citations and occasional hallucinations. Imagine an LLM that not only answers your questions based on a vast amount of text but also tells you exactly where it got its information, down to the specific sentence. This is the development explored in the research paper "LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA."

The researchers found that current LLMs struggle to provide precise citations, often pointing to large chunks of text or even hallucinating sources. To address this, they created LongBench-Cite, a benchmark for evaluating LLM citation abilities. They then developed a pipeline called "Coarse to Fine" (CoF) that uses existing LLMs to create training data with accurate, sentence-level citations. This data formed LongCite-45k, a large dataset used to train new, citation-capable LLMs: LongCite-8B and LongCite-9B.

The results are impressive. The LongCite models significantly outperform existing models, including larger proprietary ones, at generating accurate citations. This transparency lets users quickly verify information, boosting trust. Furthermore, training with citations actually *improves* the models' overall accuracy: by learning to locate and use evidence precisely, they are less likely to hallucinate and generate more comprehensive responses. While there is still room for improvement, this research is a significant step toward reliable, transparent, and verifiable long-context LLMs. It opens exciting possibilities for applications where factual accuracy is paramount, such as legal, financial, and academic research.
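To make the idea of sentence-level citations concrete, here is a minimal illustration of what a fine-grained cited answer might look like. The schema below is an assumption for illustration only, not the paper's actual output format: each statement in an answer carries the indices of the source sentences it cites, so a reader can resolve every claim back to its evidence.

```python
# Hypothetical fine-grained citation format: each answer statement lists
# the indices of the source sentences that support it. The real LongCite
# output format may differ; this only sketches the concept.

answer = {
    "statements": [
        {
            "text": "Global sea levels rose about 20 cm during the 20th century.",
            "citations": [41, 42],  # indices into the source document's sentences
        },
        {
            "text": "Thermal expansion accounts for a large share of the rise.",
            "citations": [57],
        },
    ]
}

def cited_sentences(answer, source_sentences):
    """Resolve each statement's citation indices back to the source text,
    so a reader can verify every claim against its evidence."""
    return {
        stmt["text"]: [source_sentences[i] for i in stmt["citations"]]
        for stmt in answer["statements"]
    }
```

The key property is that verification becomes a lookup rather than a search: a user checking one claim reads two cited sentences, not the whole document.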
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Question & Answers

How does the 'Coarse to Fine' (CoF) pipeline work in creating training data for citation-capable LLMs?
The CoF pipeline is a systematic approach to generate high-quality training data with sentence-level citations. First, it uses existing LLMs to identify relevant text passages (coarse stage), then narrows down to specific sentences that contain the evidence (fine stage). This process creates the LongCite-45k dataset, which is then used to train new citation-capable models. For example, when answering a question about climate change, the system would first identify relevant paragraphs from source documents, then pinpoint exact sentences containing specific statistics or claims, creating precise citation pairs for training.
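The two-stage structure described above can be sketched in code. This is a simplified stand-in, not the paper's implementation: the relevance tests here use plain keyword overlap where the actual CoF pipeline uses LLM calls, and the chunking parameters are invented for illustration.

```python
# A minimal sketch of a coarse-to-fine (CoF) citation pipeline.
# The real pipeline uses LLMs to judge relevance; keyword overlap
# stands in for those calls here, and chunk_size is illustrative.

def coarse_stage(question, document, chunk_size=5):
    """Coarse stage: split the document into sentence chunks and keep
    those judged relevant to the question."""
    sentences = document.split(". ")
    chunks = [sentences[i:i + chunk_size]
              for i in range(0, len(sentences), chunk_size)]
    keywords = set(question.lower().split())
    return [c for c in chunks
            if keywords & set(" ".join(c).lower().split())]

def fine_stage(question, chunks):
    """Fine stage: within relevant chunks, keep only the individual
    sentences containing the evidence, yielding sentence-level citations."""
    keywords = set(question.lower().split())
    return [s for chunk in chunks for s in chunk
            if keywords & set(s.lower().split())]
```

The coarse stage keeps the fine stage tractable: the expensive sentence-level pass only runs over chunks already judged relevant, rather than over the whole long context.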
What are the main benefits of LLMs that can cite their sources?
LLMs with citation capabilities offer three key advantages: increased trustworthiness, improved accuracy, and better transparency. Users can verify information sources directly, reducing the risk of misinformation. These models are less likely to hallucinate since they must ground their responses in specific source material. This technology is particularly valuable in professional settings like legal research, academic writing, or journalism, where fact-checking and source verification are crucial. For example, a journalist could quickly verify claims and their sources while writing an article, saving time and ensuring accuracy.
How can citation-capable AI transform research and information verification?
Citation-capable AI revolutionizes information verification by providing immediate source attribution for claims and statements. This technology streamlines research processes across various fields, from academic studies to business intelligence. Users can quickly validate information and trace facts back to their original sources, improving efficiency and reliability. For instance, students working on research papers could instantly verify sources and citations, while business analysts could quickly fact-check market research data. This advancement particularly benefits industries requiring strict information accuracy, such as healthcare, finance, and legal services.

PromptLayer Features

1. Testing & Evaluation
The paper's LongBench-Cite benchmark aligns with PromptLayer's testing capabilities for evaluating citation accuracy and hallucination reduction
Implementation Details
1. Create test suite for citation accuracy using LongBench-Cite methodology
2. Set up A/B tests comparing citation formats
3. Implement regression testing for hallucination detection
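A citation-accuracy test suite needs a scoring function at its core. The sketch below is illustrative and in the spirit of sentence-level evaluation, not LongBench-Cite's official metric definition: it scores predicted citation index sets against gold annotations with set-based F1.

```python
# Illustrative sentence-level citation scoring: F1 between the set of
# citation indices a model predicted and the gold annotation. This is
# an assumption, not LongBench-Cite's exact metric.

def citation_f1(predicted, gold):
    """F1 score between predicted and gold citation index sets."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    overlap = len(predicted & gold)
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A regression suite would then assert that this score does not drop below a chosen threshold across prompt or model versions.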
Key Benefits
• Systematic evaluation of citation accuracy
• Quantifiable measurement of hallucination reduction
• Reproducible testing framework for citation capabilities
Potential Improvements
• Add automated citation verification
• Expand test coverage for different document types
• Integrate source validation metrics
Business Value
Efficiency Gains
Reduces manual verification time by 70% through automated citation testing
Cost Savings
Minimizes resources spent on fact-checking and error correction
Quality Improvement
Ensures consistent citation accuracy across all LLM outputs
2. Workflow Management
The CoF pipeline methodology maps to PromptLayer's multi-step orchestration for managing complex LLM workflows
Implementation Details
1. Create template for citation extraction
2. Set up pipeline stages for coarse-to-fine refinement
3. Implement version tracking for citation formats
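The steps above can be sketched as a small multi-stage pipeline in which each stage carries a name and a version, so intermediate outputs are traceable and individual stages can be swapped or re-run. The stage names and the `Pipeline` class are hypothetical, invented for this sketch; they do not describe PromptLayer's actual API.

```python
# A minimal, hypothetical sketch of versioned multi-step orchestration
# for a coarse-to-fine workflow. Stage and class names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Stage:
    name: str        # e.g. "coarse" or "fine"
    version: str     # tracked so citation-format changes are auditable
    fn: callable     # the transformation this stage applies

@dataclass
class Pipeline:
    stages: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def run(self, data):
        for stage in self.stages:
            data = stage.fn(data)
            # Record which stage/version produced each intermediate result.
            self.history.append((stage.name, stage.version, data))
        return data
```

Because every intermediate result is recorded with its stage name and version, a change in citation quality can be traced to the exact stage revision that caused it.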
Key Benefits
• Structured approach to citation generation
• Traceable citation refinement process
• Reusable citation templates
Potential Improvements
• Add dynamic source validation
• Implement citation style switching
• Create citation metadata tracking
Business Value
Efficiency Gains
Streamlines citation workflow reducing processing time by 50%
Cost Savings
Reduces manual intervention costs in citation verification
Quality Improvement
Ensures consistent citation format and accuracy across all outputs

The first platform built for prompt engineering