en_core_med7_lg

Maintained By
kormilitzin

Property            Value
Author              Andrey Kormilitzin
License             MIT
spaCy Version       >=3.4.2,<3.5.0
Vector Dimensions   300
Accuracy (F-Score)  87.70%

What is en_core_med7_lg?

en_core_med7_lg is a spaCy model for clinical text analysis, developed by Andrey Kormilitzin. It ships with word vectors covering 514,157 unique keys and performs named entity recognition (NER) on medical text, with a reported F-score of 87.70%.

Implementation Details

The pipeline consists of two components: tok2vec and NER. It uses 300-dimensional word vectors and targets medication-related entities in medical text.

  • Pre-trained word vectors: 514,157 unique vectors
  • NER precision of 86.50% and recall of 88.93%
  • Specialized for medical domain terminology
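The reported precision, recall, and F-score can be cross-checked against each other. The sketch below assumes the F-score is the standard F1 (the harmonic mean of precision and recall); the page does not state which variant is used, and the function name is illustrative.

```python
# Precision and recall as reported for the NER component (in percent).
precision = 86.50
recall = 88.93


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition)."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 87.70%"
```

Under that assumption the result matches the 87.70% figure quoted for the model.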

Core Capabilities

  • Recognition of 7 medical entities: DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH
  • Advanced token vectorization for medical terminology
  • Robust performance on clinical text analysis
  • Compatible with spaCy >=3.4.2,<3.5.0
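One way to confirm that an installed copy matches this description is to inspect the loaded pipeline. The following is a minimal sketch, not an official check: it assumes the package is installed under the name en_core_med7_lg and that the components are registered as "tok2vec" and "ner". The helper names are hypothetical.

```python
# The seven entity labels listed for the model.
EXPECTED_LABELS = {
    "DOSAGE", "DRUG", "DURATION", "FORM", "FREQUENCY", "ROUTE", "STRENGTH",
}
# Assumed component names; the page only says "tok2vec and NER".
EXPECTED_COMPONENTS = ("tok2vec", "ner")


def missing_components(nlp) -> list:
    """Return expected component names that are absent from the pipeline."""
    return [name for name in EXPECTED_COMPONENTS if name not in nlp.pipe_names]


def missing_labels(nlp) -> set:
    """Return expected entity labels that the NER component does not expose."""
    ner = nlp.get_pipe("ner")
    return EXPECTED_LABELS - set(ner.labels)


def load_med7():
    """Load the model; spaCy is imported lazily so the helpers above stay importable."""
    import spacy

    return spacy.load("en_core_med7_lg")  # assumes the package is installed
```

With the model installed, missing_components(load_med7()) and missing_labels(load_med7()) should both come back empty if the pipeline matches the description above.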

Frequently Asked Questions

Q: What makes this model unique?

This model is tailored to medical text and focuses on extracting medication-related information. Its seven medication entity types and reported 87.70% F-score make it suited to healthcare applications.

Q: What are the recommended use cases?

The model is suited to processing clinical notes, medical records, and pharmaceutical documentation. It extracts medication details such as dosages, drug names, and administration instructions.
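As an illustration of this kind of extraction, the sketch below groups the entities spaCy finds in a document by label. It relies only on the standard doc.ents, ent.label_, and ent.text attributes. The function names and the sample sentence are hypothetical, and the loading step assumes the package is installed as en_core_med7_lg.

```python
from collections import defaultdict


def group_entities(doc) -> dict:
    """Map each entity label (e.g. DRUG, STRENGTH) to the matched text spans."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def extract_medication_details(text: str, nlp) -> dict:
    """Run a loaded spaCy pipeline over the text and group its entities by label."""
    return group_entities(nlp(text))


def demo() -> None:
    """Example run; requires spaCy and the model to be installed."""
    import spacy

    nlp = spacy.load("en_core_med7_lg")  # assumed package name
    note = "Patient was started on amoxicillin 500 mg capsules, orally, three times daily for 7 days."
    for label, spans in extract_medication_details(note, nlp).items():
        print(f"{label}: {spans}")
```

Calling demo() prints one line per entity label found in the note. The exact spans depend on the model's predictions, so no output is shown here.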
