Published
Jun 26, 2024
Updated
Jun 26, 2024

Can AI Really Fact-Check? We Put Open-Source LLMs to the Test

FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning
By
Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

Summary

In today's digital age, misinformation spreads like wildfire across social media and the internet, making it harder than ever to distinguish fact from fiction. This poses a huge challenge for fact-checkers, who need to identify which claims are actually worth investigating. This is where AI could step in. Our team, FactFinders, explored how open-source Large Language Models (LLMs) can be used to automatically detect check-worthy claims in political speeches.

We dove into a dataset of over 24,000 sentences, many of them quite short and not very informative, and found that the data itself posed a challenge. So we developed a two-step "data pruning" process to filter out the noise and focus on the most informative parts. The first step retains sentences containing named entities, informative verbs, or sufficient length; the second applies a technique called Condensed Nearest Neighbour to balance the remaining data.

We tested eight different open-source LLMs, including Llama2, Llama3, Mistral, Mixtral, Phi-3, Falcon, and Gemma, fine-tuning each on the data with carefully crafted prompts to guide their analysis. Interestingly, our fine-tuned Llama2-7b model, trained with our data pruning method, showed the most promise in detecting check-worthy claims, even though it trained on a substantially smaller dataset: only 44% of the original data, which greatly reduced the time and resources needed for training. This suggests that bigger isn't always better when training these models, and underscores the importance of data quality. Our FactFinders submission using the fine-tuned Llama2-7b model took the top spot in the CheckThat! 2024 competition, showing that open-source LLMs can deliver accurate results in check-worthy claim detection.

While the LLMs showed great promise, we encountered a few challenges. Some models required significant computing resources to fine-tune, and there was inconsistency in predictions: models sometimes generated different results for the same input. This underscores the need for further research into efficient training methods and reliable prediction techniques. Our work is a significant step towards automating fact-checking, helping separate truth from falsehood in the ever-expanding ocean of information online. As AI models evolve, they hold even greater potential for this vital task, helping us navigate the world of information more effectively than ever before.
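To make the first pruning step concrete, here is a minimal sketch of the kind of filter described above. This is not the paper's exact implementation: real named-entity recognition and verb tagging (e.g. via an NLP library) are approximated with simple heuristics, and the verb list and length threshold are illustrative assumptions.

```python
import re

# Hypothetical stand-in for the paper's informative-verb list.
INFORMATIVE_VERBS = {"increased", "decreased", "signed", "voted", "spent", "cut"}

def has_entity_like_token(sentence: str) -> bool:
    """Crude proxy for NER: digits, or capitalized words past the
    sentence start, hint at a named entity or quantity."""
    tokens = sentence.split()
    if any(ch.isdigit() for ch in sentence):
        return True
    return any(t[0].isupper() for t in tokens[1:] if t)

def is_informative(sentence: str, min_tokens: int = 8) -> bool:
    """Keep a sentence if it has sufficient length, an informative
    verb, or an entity-like token -- mirroring the step-1 filter."""
    tokens = re.findall(r"[A-Za-z0-9%]+", sentence.lower())
    if len(tokens) >= min_tokens:
        return True
    if any(v in tokens for v in INFORMATIVE_VERBS):
        return True
    return has_entity_like_token(sentence)

sentences = [
    "Thank you for being here today.",
    "President Biden increased healthcare spending by 30%.",
    "Good morning.",
]
kept = [s for s in sentences if is_informative(s)]
```

Run on the toy list above, only the substantive claim survives; short pleasantries with no entities or informative verbs are pruned away.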
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Question & Answers

What is the two-step data pruning process developed by FactFinders, and how does it improve claim detection?
The two-step data pruning process is a filtering method that enhances the quality of training data for check-worthy claim detection. First, it retains sentences containing named entities, informative verbs, or sufficient length to ensure meaningful content. Second, it applies the Condensed Nearest Neighbour technique to balance the remaining data. This process reduced the training dataset to 44% of its original size while improving model performance. For example, when processing political speeches, the system would retain statements like 'President Biden increased healthcare spending by 30%' while filtering out generic phrases like 'Thank you for being here today.'
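The second step, Condensed Nearest Neighbour undersampling, can be sketched in a few lines. This is the classic CNN algorithm on toy numeric vectors, not the paper's exact pipeline: it keeps a "store" of samples that is sufficient to classify every training point correctly with 1-NN, discarding redundant majority-class examples.

```python
from math import dist

def condensed_nearest_neighbour(points, labels):
    """Minimal CNN undersampling sketch: grow a store until 1-NN on
    the store classifies every training point correctly."""
    # Seed the store with the first example of each class.
    store, seen = [], set()
    for i, y in enumerate(labels):
        if y not in seen:
            store.append(i)
            seen.add(y)
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in store:
                continue
            nearest = min(store, key=lambda j: dist(points[j], p))
            if labels[nearest] != labels[i]:
                store.append(i)  # misclassified -> must be kept
                changed = True
    return sorted(store)

# Toy data: three redundant majority samples near 0, two minority near 10.
points = [(0.0,), (0.1,), (0.2,), (9.9,), (10.0,)]
labels = [0, 0, 0, 1, 1]
kept = condensed_nearest_neighbour(points, labels)
```

On this toy example the store shrinks to one representative per cluster, mirroring how CNN trims redundant sentences from the majority class while keeping the decision boundary intact.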
How can AI fact-checking tools improve information quality on social media?
AI fact-checking tools can significantly enhance information quality on social media by automatically screening content for potentially false claims in real-time. These tools can help identify misleading posts, flag suspicious claims for human review, and provide users with reliable information sources. For instance, when users share news articles or make claims about current events, AI systems can quickly cross-reference these against verified databases and highlight potential misinformation. This technology benefits both social media platforms and users by creating a more trustworthy information environment and reducing the spread of fake news.
What are the main challenges in implementing AI fact-checking systems?
The primary challenges in implementing AI fact-checking systems include computing resource requirements, prediction inconsistency, and data quality issues. These systems often need substantial computational power for training and operation, which can be costly for organizations. Additionally, AI models may produce varying results for the same input, affecting reliability. Data quality is also crucial - systems need well-curated, accurate training data to function effectively. These challenges affect various sectors, from news organizations to social media platforms, making it important to balance automation with human oversight for optimal fact-checking results.

PromptLayer Features

  1. Testing & Evaluation
  The paper's systematic evaluation of multiple LLMs, and its need for consistent claim detection results, align with robust testing capabilities
Implementation Details
Set up A/B testing pipelines comparing different LLM responses, implement regression testing for consistency checks, create evaluation metrics for claim detection accuracy
Key Benefits
• Systematic comparison of model performances
• Early detection of prediction inconsistencies
• Reproducible evaluation framework
Potential Improvements
• Automated consistency checks across multiple runs
• Custom scoring metrics for fact-checking accuracy
• Integration with model versioning systems
Business Value
Efficiency Gains
Reduce manual testing effort by 70% through automated evaluation pipelines
Cost Savings
Lower computing costs by identifying optimal model configurations early
Quality Improvement
Increase prediction consistency by 40% through systematic testing
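The consistency checks described above might look something like the following sketch. The `generate` function is a hypothetical stand-in for a real LLM call (here stubbed with randomness to mimic the run-to-run variation the paper observed); the idea is simply to repeat the same input and measure how often runs agree.

```python
import random
from collections import Counter

def generate(claim: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call; real models can return
    different labels for the same input across runs."""
    rng = random.Random(hash(claim) + seed)
    return "check-worthy" if rng.random() > 0.2 else "not check-worthy"

def consistency(claim: str, runs: int = 10) -> float:
    """Fraction of runs that agree with the majority label."""
    votes = Counter(generate(claim, seed) for seed in range(runs))
    return votes.most_common(1)[0][1] / runs

score = consistency("The president increased healthcare spending by 30%.")
```

A score near 1.0 means the model labels the claim the same way on (almost) every run; lower scores flag inputs worth routing to human review or regression tests.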
  2. Prompt Management
  The research relied on carefully crafted prompts for fine-tuning models, which calls for version control and systematic prompt optimization
Implementation Details
Create versioned prompt templates, implement collaborative prompt refinement workflow, establish prompt performance tracking
Key Benefits
• Centralized prompt version control
• Collaborative prompt optimization
• Traceable prompt evolution
Potential Improvements
• Template-based prompt generation
• Automated prompt effectiveness scoring
• Integration with data pruning workflow
Business Value
Efficiency Gains
Reduce prompt development time by 50% through reusable templates
Cost Savings
Minimize redundant prompt testing through version control
Quality Improvement
Enhance prompt effectiveness by 30% through systematic optimization
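A versioned prompt template workflow could be sketched as below. This is a toy in-memory registry with made-up names (`PromptRegistry`, `publish`, `render`) purely for illustration; a production system such as PromptLayer adds persistent storage, performance tracking, and team collaboration on top of the same idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PromptVersion:
    """Minimal record for one version of a prompt template."""
    template: str
    version: int
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class PromptRegistry:
    """Toy registry illustrating versioning and traceability."""
    def __init__(self):
        self._versions: dict[str, list[PromptVersion]] = {}

    def publish(self, name: str, template: str) -> PromptVersion:
        history = self._versions.setdefault(name, [])
        pv = PromptVersion(template, version=len(history) + 1)
        history.append(pv)
        return pv

    def latest(self, name: str) -> PromptVersion:
        return self._versions[name][-1]

    def render(self, name: str, **kwargs) -> str:
        # Always render from the latest version, keeping history intact.
        return self.latest(name).template.format(**kwargs)

registry = PromptRegistry()
registry.publish("check_worthy", "Is the following claim check-worthy? {claim}")
registry.publish("check_worthy", "Classify as check-worthy or not: {claim}")
prompt = registry.render("check_worthy", claim="Taxes fell by 10%.")
```

Keeping every version lets a team compare claim-detection accuracy across prompt revisions and roll back if a new wording underperforms.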

The first platform built for prompt engineering