Published: May 27, 2024
Updated: May 27, 2024

Can AI Write Formal Proofs? Laurel Shows It's Possible

Laurel: Generating Dafny Assertions Using Large Language Models
By Eric Mugnier, Emmanuel Anaya Gonzalez, Ranjit Jhala, Nadia Polikarpova, and Yuanyuan Zhou

Summary

Formal verification, a powerful technique for ensuring software correctness, is gaining traction. Tools like Dafny automate parts of the verification process, but often require "helper assertions" to guide the underlying automated theorem provers. These assertions are tedious for engineers to craft. New research introduces Laurel, a tool that leverages Large Language Models (LLMs) like GPT-4 to automatically generate these helper assertions. Laurel analyzes the code and error messages from the verifier to pinpoint the likely location of a missing assertion and inserts a placeholder. It then selects similar assertions from the codebase as examples to guide the LLM. The results are impressive: Laurel successfully generates over 50% of the required assertions, significantly reducing the manual effort involved in formal verification. This opens exciting possibilities for further automating program verification and making it more accessible to developers.
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Questions & Answers

How does Laurel's technical process work to generate helper assertions for formal verification?
Laurel employs a two-step technical process to generate helper assertions. First, it analyzes the code and the verifier's error messages to identify where an assertion is needed and inserts a placeholder at that location. Then, it uses a context-aware approach, selecting similar existing assertions from the codebase as few-shot examples to guide the LLM (GPT-4) in generating a new, appropriate assertion. The process works like a smart autocomplete for verification logic, similar to code completion tools but specialized for formal proofs. For example, when verifying a sorting algorithm, Laurel might identify that a loop invariant is needed and generate an assertion about array ordering based on similar patterns in the codebase.
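The two steps above can be sketched in simplified form. The snippet below uses hypothetical names and is a crude stand-in for Laurel's actual placement heuristics and similarity metric: it parses a Dafny-style error location, inserts an assertion placeholder before the failing line, and ranks existing assertions by token overlap to pick few-shot examples for the prompt.

```python
import re
from collections import Counter

def placeholder_site(error_message: str) -> int:
    """Extract the failing line number from a simplified Dafny-style
    error such as 'foo.dfy(42,10): Error: assertion might not hold'."""
    m = re.search(r"\((\d+),\d+\)", error_message)
    if not m:
        raise ValueError("no source location in error message")
    return int(m.group(1))

def insert_placeholder(source_lines, line_no):
    """Insert an assertion placeholder just before the failing line
    (1-indexed); the LLM's suggestion is later spliced in here."""
    out = list(source_lines)
    out.insert(line_no - 1, "assert PLACEHOLDER;")
    return out

def similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two assertions; a crude
    stand-in for a real lemma-similarity metric."""
    ta, tb = Counter(a.split()), Counter(b.split())
    shared = sum((ta & tb).values())
    total = sum((ta | tb).values())
    return shared / total if total else 0.0

def pick_examples(failing_line: str, codebase_assertions, k=2):
    """Rank existing assertions by similarity to the failing context
    and keep the top k as few-shot examples for the LLM prompt."""
    return sorted(codebase_assertions,
                  key=lambda a: similarity(failing_line, a),
                  reverse=True)[:k]
```

In this sketch, the chosen examples and the placeholder-annotated source would be assembled into a prompt for the LLM; the real tool's prompt construction and retry logic are more involved.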
What are the main benefits of automated formal verification for software development?
Automated formal verification helps ensure software reliability by mathematically proving that code behaves as intended. It catches potential bugs and issues before deployment, reducing costly fixes and improving overall software quality. The main benefits include increased reliability in critical systems (like medical devices or financial software), reduced testing time compared to traditional methods, and earlier detection of potential issues in the development cycle. For example, a banking application could use formal verification to ensure that money transfers always maintain correct balance calculations, preventing expensive errors in production.
How is AI transforming software testing and verification?
AI is revolutionizing software testing and verification by automating previously manual processes and improving accuracy. Modern AI tools can automatically generate test cases, predict potential bugs, and even write verification proofs, significantly reducing the time and effort required for quality assurance. The benefits include faster development cycles, reduced human error, and more comprehensive testing coverage. For instance, AI can analyze patterns in code to automatically generate test scenarios that human testers might miss, or help identify potential security vulnerabilities before they become problems in production environments.

PromptLayer Features

  1. Prompt Management
  Laurel's use of similar assertions as examples to guide LLM responses parallels prompt versioning and template management
Implementation Details
Create versioned prompt templates for different types of assertions, store successful assertion examples as reusable components, implement similarity-based retrieval for relevant examples
Key Benefits
• Systematic organization of assertion templates
• Version control of successful prompting strategies
• Easy reuse of proven assertion patterns
Potential Improvements
• Add metadata tagging for assertion types
• Implement semantic search for similar assertions
• Create assertion template hierarchy system
Business Value
Efficiency Gains
Potential for roughly 50% reduction in manual assertion-writing time through template reuse, in line with Laurel's assertion generation rate
Cost Savings
Reduced engineering hours spent on verification tasks
Quality Improvement
More consistent and proven assertion patterns across projects
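A minimal sketch of the versioned template store described above, using hypothetical names rather than any real PromptLayer API:

```python
from dataclasses import dataclass, field

@dataclass
class TemplateStore:
    """Toy versioned store for assertion prompt templates.
    Each save appends a new version; old versions stay retrievable."""
    versions: dict = field(default_factory=dict)

    def save(self, name: str, template: str) -> int:
        history = self.versions.setdefault(name, [])
        history.append(template)
        return len(history)  # 1-based version number

    def get(self, name: str, version: int = -1) -> str:
        history = self.versions[name]
        return history[version - 1] if version > 0 else history[-1]

store = TemplateStore()
store.save("loop-invariant",
           "Given these examples: {examples}\nComplete: {placeholder}")
v = store.save("loop-invariant",
               "Examples:\n{examples}\nSuggest an assertion for: {placeholder}")
```

A production setup would add the metadata tagging and similarity-based retrieval mentioned above; this sketch only captures the version-control aspect.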
  2. Testing & Evaluation
  Laurel's evaluation of successful assertion generation rates mirrors the need for systematic prompt testing
Implementation Details
Set up batch testing of assertion generation, implement success rate tracking, create regression tests for assertion quality
Key Benefits
• Quantitative evaluation of assertion quality
• Early detection of degraded performance
• Continuous improvement of generation accuracy
Potential Improvements
• Implement automated assertion validation
• Add comparative testing between model versions
• Create benchmark datasets for assertion testing
Business Value
Efficiency Gains
Faster identification of optimal prompting strategies
Cost Savings
Reduced debugging time through better quality assurance
Quality Improvement
Higher success rate in automated assertion generation
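The success-rate tracking and regression testing described above can be sketched as follows. The helper names are hypothetical, and verification outcomes are assumed to arrive as booleans from a separate verifier run:

```python
def success_rate(results):
    """results: iterable of (assertion_id, verified) pairs,
    where verified is True if the generated assertion made
    the verifier succeed."""
    results = list(results)
    if not results:
        return 0.0
    return sum(ok for _, ok in results) / len(results)

def regressions(old, new):
    """Assertion IDs that verified in the old run but fail in the
    new one, flagging a drop in generation quality."""
    old_ok = {i for i, ok in old if ok}
    return sorted(i for i, ok in new if not ok and i in old_ok)
```

Running this over batches of generated assertions after each prompt or model change gives the quantitative signal needed to catch degraded performance early.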

The first platform built for prompt engineering