en_core_med7_lg
Property | Value |
---|---|
Author | Andrey Kormilitzin |
License | MIT |
spaCy Version | >=3.4.2,<3.5.0 |
Vector Dimensions | 300 |
NER F-score | 87.70% |
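The spaCy Version row pins the package to the 3.4 series (>=3.4.2, <3.5.0). Below is a minimal sketch for checking an installed version string against that range. It assumes plain numeric `major.minor.patch` strings, so pre-release tags such as `rc1` are not handled. `parse_version` and `is_supported` are hypothetical helper names, and in practice `spacy.__version__` would be the value passed in.

```python
# Sketch: check a version string against the range in the table
# (>=3.4.2, <3.5.0). Assumes numeric "major.minor.patch" strings only.
MIN_VERSION = (3, 4, 2)  # inclusive lower bound
MAX_VERSION = (3, 5, 0)  # exclusive upper bound


def parse_version(version: str) -> tuple:
    """Turn '3.4.4' into (3, 4, 4); missing trailing parts default to 0."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(version: str) -> bool:
    """True if MIN_VERSION <= version < MAX_VERSION (tuple comparison)."""
    v = parse_version(version)
    return MIN_VERSION <= v < MAX_VERSION
```

Tuple comparison keeps the check dependency-free. A library such as `packaging` handles the full version grammar if pre-releases matter.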
What is en_core_med7_lg?
en_core_med7_lg is a spaCy pipeline for clinical natural language processing, developed by Andrey Kormilitzin. It ships with a word-vector table of 514,157 unique keys and performs named entity recognition (NER) on medication-related text, reaching an F-score of 87.70%.
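Below is a minimal usage sketch. It assumes the package is already installed under the name `en_core_med7_lg`; installation is not covered here. `spacy.load`, `Doc.ents`, and the span attributes `text`, `label_`, `start_char`, and `end_char` are standard spaCy API. `load_med7` and `extract_entities` are hypothetical helper names.

```python
from typing import List, Tuple


def load_med7():
    """Load the packaged pipeline (assumes en_core_med7_lg is installed)."""
    import spacy  # imported lazily so extract_entities works without spaCy

    return spacy.load("en_core_med7_lg")


def extract_entities(doc) -> List[Tuple[str, str, int, int]]:
    """Return (text, label, start_char, end_char) for each entity span in a Doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
```

With the model installed, `nlp = load_med7()` followed by `extract_entities(nlp(note_text))` returns the predicted spans. `nlp.pipe_names` lists the pipeline components, and `nlp.vocab.vectors.shape[1]` gives the vector width. Both are standard spaCy attributes for confirming the tok2vec/NER pipeline and the 300-dimensional vectors described under Implementation Details.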
Implementation Details
The model implements a two-component pipeline: a tok2vec component that computes token embeddings and an NER component that predicts entity spans. It uses 300-dimensional word vectors and is tuned for medical text, with a specific focus on medication-related entities.
- Pre-trained word vectors: 514,157 unique vectors
- NER precision of 86.50% and recall of 88.93%
- Specialized for medical domain terminology
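The headline F-score is the harmonic mean of the precision and recall quoted above, F1 = 2PR / (P + R). The short check below recomputes it from the card's rounded figures.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 / F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the card, expressed as fractions.
p, r = 0.8650, 0.8893
print(f"{f1_score(p, r):.2%}")  # → 87.70%
```

Because the harmonic mean is dominated by the smaller value, the F-score sits slightly below the arithmetic mean of precision and recall (87.715%).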
Core Capabilities
- Recognition of 7 medical entities: DOSAGE, DRUG, DURATION, FORM, FREQUENCY, ROUTE, STRENGTH
- Token vectorization (tok2vec) for medical terminology
- Designed for clinical text analysis
- Compatible with spaCy 3.4.x (>=3.4.2, <3.5.0)
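Downstream code often needs the entities organized by type. Below is a minimal sketch that buckets `(text, label)` pairs under the seven labels listed above. `MED7_LABELS` and `group_by_label` are hypothetical names, and the input would typically be built as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from typing import Dict, Iterable, List, Tuple

# The seven entity labels listed in the card.
MED7_LABELS = ("DOSAGE", "DRUG", "DURATION", "FORM", "FREQUENCY", "ROUTE", "STRENGTH")


def group_by_label(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Bucket (text, label) pairs by label, with an empty list for every Med7 label."""
    grouped: Dict[str, List[str]] = {label: [] for label in MED7_LABELS}
    for text, label in entities:
        # Unexpected labels get their own bucket instead of being dropped.
        grouped.setdefault(label, []).append(text)
    return grouped
```

Pre-seeding every label means callers can read `grouped["ROUTE"]` without first checking whether the key exists.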
Frequently Asked Questions
Q: What makes this model unique?
The model targets medication-related information extraction from medical text rather than general-purpose NER. Its seven medication-specific entity types and 87.70% F-score make it useful for healthcare applications that need structured medication data.
Q: What are the recommended use cases?
The model suits clinical notes, medical records, and pharmaceutical documentation. It extracts medication details such as drug names, dosages, strengths, and administration instructions (route, frequency, and duration).
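The pipeline described here only produces entity spans; it does not say which dosage or route belongs to which drug. Below is a minimal sketch of one possible post-processing step. It is a hypothetical nearest-preceding-drug heuristic, not part of Med7. It groups the extracted details into per-drug records, assuming `(text, label)` pairs in document order. `link_to_drugs` and `ATTRIBUTE_LABELS` are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple

# Non-DRUG labels from the card, treated here as attributes of a drug.
ATTRIBUTE_LABELS = {"DOSAGE", "DURATION", "FORM", "FREQUENCY", "ROUTE", "STRENGTH"}


def link_to_drugs(entities: Sequence[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Attach each attribute entity to the most recent DRUG mention.

    Hypothetical heuristic: entities must be in document order, and
    attributes that appear before any DRUG mention are ignored.
    """
    records: List[Dict[str, str]] = []
    for text, label in entities:
        if label == "DRUG":
            records.append({"DRUG": text})
        elif label in ATTRIBUTE_LABELS and records:
            # Keep only the first value seen for each attribute of the current drug.
            records[-1].setdefault(label, text)
    return records
```

The rule is deliberately simple. It misattributes details in phrasings where they precede the drug name (e.g. "twice daily aspirin"), so a real system would need a dedicated relation-extraction step.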