Published: Nov 1, 2024
Updated: Nov 1, 2024

Can AI Diagnose Illness Using Lab Results?

Evaluating the Impact of Lab Test Results on Large Language Models Generated Differential Diagnoses from Clinical Case Vignettes
By Balu Bhasuran, Qiao Jin, Yuzhang Xie, Carl Yang, Karim Hanna, Jennifer Costa, Cindy Shavor, Zhiyong Lu, Zhe He

Summary

Diagnosing illness is a complex puzzle: doctors must piece together symptoms, medical history, and lab results. Could AI help solve these puzzles? New research explores how large language models (LLMs) perform at generating differential diagnoses (lists of possible illnesses) when given clinical case information, with and without lab test results.

Researchers tested five LLMs (GPT-4, GPT-3.5, Llama-2, Claude-2, and Mixtral) on 50 real clinical cases. They found that providing lab data significantly boosted the models' ability to produce accurate diagnoses. GPT-4 performed best, achieving up to 80% “lenient accuracy” when given lab results. This means that while it didn't always rank the exact diagnosis first, it usually included the correct illness in its wider list of possibilities. The open-source LLMs also improved when given lab data, suggesting that this type of information is key for AI diagnostic tools. The models successfully interpreted common lab tests such as liver function panels and toxicology reports.

The research also highlights challenges. The models struggled more with complex cases requiring specialized tests, which indicates that human expertise is still essential. The study points to a future where AI acts as an assistant to doctors, helping them quickly narrow down possibilities and potentially improving diagnostic speed and accuracy. But further development and refinement are needed before these AI tools become a regular part of healthcare.

Questions & Answers

What specific performance improvements did the LLMs show when provided with lab data compared to without it?
The study found that access to lab data significantly improved diagnostic accuracy across all tested LLMs. GPT-4 performed best, reaching up to 80% lenient accuracy when lab results were included. Three mechanisms plausibly explain the gain: 1) standardized lab values add objective diagnostic indicators to the context, 2) the models can recognize patterns across several lab parameters at once, and 3) lab results can be integrated with the other clinical information. For example, when analyzing a liver function panel, a model could correlate elevated enzymes with specific conditions, much as a physician would use these markers to narrow down potential diagnoses.
How can AI assist doctors in making medical diagnoses in everyday practice?
AI can serve as a support tool for doctors by analyzing patient data and quickly suggesting possible diagnoses. The technology acts like an assistant that processes large amounts of information (symptoms, lab results, and medical history) to generate candidate diagnoses for the doctor to consider. This can help doctors work more efficiently by narrowing down possibilities and ensuring that important diagnostic options aren't overlooked. For instance, during a busy clinic day, AI could quickly analyze a patient's lab results and suggest relevant conditions to investigate, saving time while the doctor retains the final judgment.
What are the main benefits of combining AI with laboratory testing in healthcare?
Combining AI with laboratory testing offers several key advantages in healthcare delivery. First, it enhances diagnostic accuracy by providing objective data analysis alongside clinical observations. Second, it speeds up the diagnostic process by quickly processing complex lab results and suggesting relevant conditions. Third, it can help reduce human error by serving as a second set of 'eyes' on test results. For example, in a busy emergency department, AI could quickly flag concerning lab values and suggest potential diagnoses, helping healthcare providers make faster, more informed decisions about patient care.

PromptLayer Features

  1. Testing & Evaluation
     The paper's methodology of testing multiple LLMs across standardized clinical cases aligns with PromptLayer's batch testing and evaluation capabilities.
Implementation Details
Set up systematic batch tests comparing model responses with and without lab data, implement scoring metrics for diagnostic accuracy, and track performance across model versions
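The batch comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code and not a PromptLayer API: the case fields (`vignette`, `labs`, `diagnosis`), the `model` callable, and the function names are all assumptions. "Lenient" is read here, following the summary, as the reference diagnosis appearing anywhere in the returned list; the paper's exact scoring rules may differ.

```python
from typing import Callable, Dict, List

# Hypothetical case record: vignette text, lab text, and the reference diagnosis.
Case = Dict[str, str]


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are not counted as misses."""
    return " ".join(name.lower().split())


def score(predictions: List[str], truth: str) -> Dict[str, bool]:
    """Score one ranked differential: top-1 hit and lenient (anywhere in the list) hit."""
    ranked = [normalize(p) for p in predictions]
    target = normalize(truth)
    return {
        "top1": bool(ranked) and ranked[0] == target,
        "lenient": target in ranked,
    }


def evaluate(
    model: Callable[[str], List[str]], cases: List[Case]
) -> Dict[str, Dict[str, float]]:
    """Run every case with and without lab text and return accuracy per condition."""
    if not cases:
        raise ValueError("at least one case is required")
    results: Dict[str, Dict[str, float]] = {}
    for condition in ("without_labs", "with_labs"):
        hits = {"top1": 0, "lenient": 0}
        for case in cases:
            prompt = case["vignette"]
            if condition == "with_labs":
                prompt += "\nLab results: " + case["labs"]
            outcome = score(model(prompt), case["diagnosis"])
            for metric in hits:
                hits[metric] += outcome[metric]
        results[condition] = {m: hits[m] / len(cases) for m in hits}
    return results
```

Passing a different `model` callable per LLM or per model version gives one result table per run, which is one simple way to track performance over time. Exact string matching is a deliberate simplification; real diagnosis names would likely need synonym handling or clinician review.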
Key Benefits
• Standardized evaluation across multiple models
• Automated accuracy tracking and comparison
• Reproducible testing framework
Potential Improvements
• Add specialized medical accuracy metrics
• Implement domain-specific evaluation criteria
• Develop automated validation against medical databases
Business Value
Efficiency Gains
Reduces manual evaluation time by 70% through automated testing
Cost Savings
Decreases evaluation costs by standardizing testing protocols
Quality Improvement
Ensures consistent evaluation criteria across all diagnostic tests
  2. Prompt Management
     The study's use of structured clinical data and lab results requires careful prompt engineering and versioning.
Implementation Details
Create templated prompts for different types of lab data, maintain versions for different diagnostic scenarios, and implement access controls for medical data
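As a rough sketch of what templated, versioned prompts might look like, the example below uses only Python's standard `string.Template`. The registry layout, the template wording, and the names `TEMPLATES`, `latest_version`, and `render` are hypothetical; they are not taken from the paper or from PromptLayer's SDK.

```python
from string import Template
from typing import Dict, Optional, Tuple

# Hypothetical registry keyed by (template name, version); wording is illustrative only.
TEMPLATES: Dict[Tuple[str, int], Template] = {
    ("differential", 1): Template(
        "Case: $vignette\nList the most likely diagnoses, most likely first."
    ),
    ("differential", 2): Template(
        "Case: $vignette\nLab results ($panel): $labs\n"
        "List the most likely diagnoses, most likely first."
    ),
}


def latest_version(name: str) -> int:
    """Return the highest registered version number for a template name."""
    versions = [v for (n, v) in TEMPLATES if n == name]
    if not versions:
        raise KeyError(name)
    return max(versions)


def render(name: str, fields: Dict[str, str], version: Optional[int] = None) -> str:
    """Fill a template; a missing placeholder raises KeyError instead of passing silently."""
    chosen = latest_version(name) if version is None else version
    return TEMPLATES[(name, chosen)].substitute(fields)
```

Pinning `version=1` reproduces the no-lab condition, while the default picks the newest template. Using `substitute` rather than `safe_substitute` is a design choice: a prompt with an unfilled lab field fails loudly before it reaches a model.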
Key Benefits
• Consistent prompt structure across tests
• Version control for different data types
• Secure handling of medical information
Potential Improvements
• Add medical-specific prompt templates
• Implement field validation for lab data
• Create specialized medical prompt libraries
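Field validation for lab data could start as simply as the sketch below, which checks the unit and a loose plausibility window before a value is placed into a prompt. Everything here is an assumption made for illustration: the `LabValue` type, the `RULES` table, and especially the numeric bounds, which are placeholders for catching data-entry errors and are not clinical reference ranges.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LabValue:
    test: str
    value: float
    unit: str


# Placeholder rules: expected unit plus loose plausibility bounds per test.
# These numbers are illustrative only and are NOT clinical reference ranges.
RULES: Dict[str, Tuple[str, float, float]] = {
    "ALT": ("U/L", 0.0, 20000.0),
    "AST": ("U/L", 0.0, 20000.0),
}


def validate(lab: LabValue, rules: Dict[str, Tuple[str, float, float]] = RULES) -> List[str]:
    """Return a list of problems for one lab value; an empty list means it passed."""
    rule = rules.get(lab.test)
    if rule is None:
        return [f"unknown test: {lab.test}"]
    unit, low, high = rule
    problems = []
    if lab.unit != unit:
        problems.append(f"{lab.test}: expected unit {unit}, got {lab.unit}")
    # The chained comparison is False for NaN, so NaN values are flagged too.
    if not (low <= lab.value <= high):
        problems.append(f"{lab.test}: value {lab.value} outside bounds [{low}, {high}]")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a batch run report every malformed field in a case at once.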
Business Value
Efficiency Gains
Reduces prompt development time by 50% through reusable templates
Cost Savings
Minimizes errors and rework through standardized prompt management
Quality Improvement
Ensures consistent and accurate presentation of medical data
