Published: May 27, 2024
Updated: May 27, 2024

Can AI Write Formal Proofs? Laurel Shows It's Possible

Laurel: Generating Dafny Assertions Using Large Language Models
By Eric Mugnier, Emmanuel Anaya Gonzalez, Ranjit Jhala, Nadia Polikarpova, and Yuanyuan Zhou

Summary

Formal verification, a powerful technique for ensuring software correctness, is gaining traction. Tools like Dafny automate parts of the verification process, but often require "helper assertions" to guide the underlying automated theorem provers. These assertions are tedious for engineers to craft. New research introduces Laurel, a tool that leverages Large Language Models (LLMs) like GPT-4 to automatically generate these helper assertions. Laurel analyzes the code and error messages from the verifier to pinpoint the likely location of a missing assertion and inserts a placeholder. It then selects similar assertions from the codebase as examples to guide the LLM. The results are impressive: Laurel successfully generates over 50% of the required assertions, significantly reducing the manual effort involved in formal verification. This opens exciting possibilities for further automating program verification and making it more accessible to developers.
🍰 Interested in building your own agents?
PromptLayer provides the tools to manage and monitor prompts with your whole team. Get started for free.

Questions & Answers

How does Laurel's technical process work to generate helper assertions for formal verification?
Laurel employs a two-step technical process to generate helper assertions. First, it analyzes the code and the verifier's error messages to identify where an assertion is needed and inserts a placeholder at that location. Then, it uses a context-aware approach, selecting similar existing assertions from the codebase as few-shot examples to guide the LLM (GPT-4) in generating a new, appropriate assertion. The process works like a smart autocomplete for verification logic, similar to code completion tools but specialized for formal proofs. For example, when verifying a sorting algorithm, Laurel might identify that a loop invariant is needed and generate an assertion about array ordering based on similar patterns in the codebase.
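The two steps above can be sketched in simplified form. The snippet below uses hypothetical names and is a crude stand-in for Laurel's actual placement heuristics and similarity metric: it parses a Dafny-style error location, inserts an assertion placeholder before the failing line, and ranks existing assertions by token overlap to pick few-shot examples for the prompt.

```python
import re
from collections import Counter

def placeholder_site(error_message: str) -> int:
    """Extract the failing line number from a simplified Dafny-style
    error such as 'foo.dfy(42,10): Error: assertion might not hold'."""
    m = re.search(r"\((\d+),\d+\)", error_message)
    if not m:
        raise ValueError("no source location in error message")
    return int(m.group(1))

def insert_placeholder(source_lines, line_no):
    """Insert an assertion placeholder just before the failing line
    (1-indexed); the LLM's suggestion is later spliced in here."""
    out = list(source_lines)
    out.insert(line_no - 1, "assert PLACEHOLDER;")
    return out

def similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two assertions; a crude
    stand-in for a real lemma-similarity metric."""
    ta, tb = Counter(a.split()), Counter(b.split())
    shared = sum((ta & tb).values())
    total = sum((ta | tb).values())
    return shared / total if total else 0.0

def pick_examples(failing_line: str, codebase_assertions, k=2):
    """Rank existing assertions by similarity to the failing context
    and keep the top k as few-shot examples for the LLM prompt."""
    return sorted(codebase_assertions,
                  key=lambda a: similarity(failing_line, a),
                  reverse=True)[:k]
```

In this sketch, the chosen examples and the placeholder-annotated source would be assembled into a prompt for the LLM; the real tool's prompt construction and retry logic are more involved.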
What are the main benefits of automated formal verification for software development?
Automated formal verification helps ensure software reliability by mathematically proving that code behaves as intended. It catches potential bugs and issues before deployment, reducing costly fixes and improving overall software quality. The main benefits include increased reliability in critical systems (like medical devices or financial software), reduced testing time compared to traditional methods, and earlier detection of potential issues in the development cycle. For example, a banking application could use formal verification to ensure that money transfers always maintain correct balance calculations, preventing expensive errors in production.
How is AI transforming software testing and verification?
AI is revolutionizing software testing and verification by automating previously manual processes and improving accuracy. Modern AI tools can automatically generate test cases, predict potential bugs, and even write verification proofs, significantly reducing the time and effort required for quality assurance. The benefits include faster development cycles, reduced human error, and more comprehensive testing coverage. For instance, AI can analyze patterns in code to automatically generate test scenarios that human testers might miss, or help identify potential security vulnerabilities before they become problems in production environments.

PromptLayer Features

  1. Prompt Management
  Laurel's use of similar assertions as examples to guide LLM responses parallels prompt versioning and template management
Implementation Details
Create versioned prompt templates for different types of assertions, store successful assertion examples as reusable components, implement similarity-based retrieval for relevant examples
Key Benefits
• Systematic organization of assertion templates
• Version control of successful prompting strategies
• Easy reuse of proven assertion patterns
Potential Improvements
• Add metadata tagging for assertion types
• Implement semantic search for similar assertions
• Create assertion template hierarchy system
Business Value
Efficiency Gains
Potential for roughly 50% reduction in manual assertion-writing time through template reuse, in line with Laurel's assertion generation rate
Cost Savings
Reduced engineering hours spent on verification tasks
Quality Improvement
More consistent and proven assertion patterns across projects
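A minimal sketch of the versioned template store described above, using hypothetical names rather than any real PromptLayer API:

```python
from dataclasses import dataclass, field

@dataclass
class TemplateStore:
    """Toy versioned store for assertion prompt templates.
    Each save appends a new version; old versions stay retrievable."""
    versions: dict = field(default_factory=dict)

    def save(self, name: str, template: str) -> int:
        history = self.versions.setdefault(name, [])
        history.append(template)
        return len(history)  # 1-based version number

    def get(self, name: str, version: int = -1) -> str:
        history = self.versions[name]
        return history[version - 1] if version > 0 else history[-1]

store = TemplateStore()
store.save("loop-invariant",
           "Given these examples: {examples}\nComplete: {placeholder}")
v = store.save("loop-invariant",
               "Examples:\n{examples}\nSuggest an assertion for: {placeholder}")
```

A production setup would add the metadata tagging and similarity-based retrieval mentioned above; this sketch only captures the version-control aspect.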
  2. Testing & Evaluation
  Laurel's evaluation of successful assertion generation rates mirrors the need for systematic prompt testing
Implementation Details
Set up batch testing of assertion generation, implement success rate tracking, create regression tests for assertion quality
Key Benefits
• Quantitative evaluation of assertion quality
• Early detection of degraded performance
• Continuous improvement of generation accuracy
Potential Improvements
• Implement automated assertion validation
• Add comparative testing between model versions
• Create benchmark datasets for assertion testing
Business Value
Efficiency Gains
Faster identification of optimal prompting strategies
Cost Savings
Reduced debugging time through better quality assurance
Quality Improvement
Higher success rate in automated assertion generation
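The success-rate tracking and regression testing described above can be sketched as follows. The helper names are hypothetical, and verification outcomes are assumed to arrive as booleans from a separate verifier run:

```python
def success_rate(results):
    """results: iterable of (assertion_id, verified) pairs,
    where verified is True if the generated assertion made
    the verifier succeed."""
    results = list(results)
    if not results:
        return 0.0
    return sum(ok for _, ok in results) / len(results)

def regressions(old, new):
    """Assertion IDs that verified in the old run but fail in the
    new one, flagging a drop in generation quality."""
    old_ok = {i for i, ok in old if ok}
    return sorted(i for i, ok in new if not ok and i in old_ok)
```

Running this over batches of generated assertions after each prompt or model change gives the quantitative signal needed to catch degraded performance early.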

The first platform built for prompt engineering