en_core_web_sm

Property	Value
License	MIT
Author	Explosion AI
spaCy Version	>=3.7.2,<3.8.0
Token Accuracy	99.86%

What is en_core_web_sm?

en_core_web_sm is a small English pipeline for spaCy, developed by Explosion AI and optimized for CPU. It covers tagging, dependency parsing, sentence segmentation, named entity recognition, and lemmatization while keeping a small footprint.

Implementation Details

The pipeline comprises seven components: tok2vec, tagger, parser, senter, ner, attribute_ruler, and lemmatizer. Its listed training sources include OntoNotes 5, ClearNLP, and WordNet 3.0. The reported evaluation scores are:

Named Entity Recognition (NER) with 84.56% F-score
Part-of-speech tagging with 97.25% accuracy
Dependency parsing with 91.75% unlabeled attachment score
Sentence segmentation with 90.59% F-score
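
To check which of the seven components are active in an installed copy, one can inspect the loaded pipeline. The following is a minimal sketch, assuming spaCy and the en_core_web_sm package are installed; load_pipeline and component_status are hypothetical helper names, not part of spaCy. It relies on nlp.pipe_names and nlp.disabled, which list the active and the loaded-but-disabled components. In recent releases senter is typically shipped disabled because the parser already sets sentence boundaries, but this is worth verifying on your own install.

```python
def load_pipeline(name="en_core_web_sm"):
    """Load the packaged pipeline.

    Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`
    have been run. spaCy is imported lazily so that component_status() can be
    used without it, for example with a stand-in object.
    """
    import spacy

    return spacy.load(name)


def component_status(nlp):
    """Return {component name: "enabled" | "disabled"} for a loaded pipeline."""
    # pipe_names holds the components that run when nlp(text) is called.
    status = {name: "enabled" for name in nlp.pipe_names}
    # disabled holds components that are loaded but skipped at runtime.
    for name in nlp.disabled:
        status[name] = "disabled"
    return status
```

Calling component_status(load_pipeline()) returns one entry per component. A disabled component can be switched back on with nlp.enable_pipe, for example nlp.enable_pipe("senter").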

Core Capabilities

18 distinct NER categories including PERSON, ORG, DATE, and more
Comprehensive token classification with 50+ POS tags
44 dependency parsing labels for detailed syntactic analysis
High-accuracy sentence boundary detection
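
The capabilities above map onto attributes of a processed document. The sketch below is illustrative rather than definitive: annotate and label_counts are hypothetical helper names, and the example sentence in the comments is made up. It uses doc.ents, doc.sents, the token attributes pos_, tag_, dep_, and head, and the labels property of the ner, tagger, and parser components, so the label counts quoted above can be checked against an installed model.

```python
def annotate(doc):
    """Collect entities, per-token tags, and sentences from a spaCy Doc."""
    return {
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # pos_ is the coarse universal tag, tag_ the fine-grained tag,
        # dep_ the dependency label linking the token to its head.
        "tokens": [
            (tok.text, tok.pos_, tok.tag_, tok.dep_, tok.head.text) for tok in doc
        ],
        "sentences": [sent.text for sent in doc.sents],
    }


def label_counts(nlp, components=("ner", "tagger", "parser")):
    """Count how many labels each named component can predict."""
    return {name: len(nlp.get_pipe(name).labels) for name in components}


# Usage, assuming spaCy and en_core_web_sm are installed:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(annotate(nlp("Apple opened an office in Berlin in 2019.")))
#   print(label_counts(nlp))
```

Both helpers only read attributes, so they work the same way on documents produced by other spaCy pipelines that include these components.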

Frequently Asked Questions

Q: What makes this model unique?

The model offers balanced accuracy across tagging, parsing, and named entity recognition while keeping a small footprint, which suits CPU-based applications that need fast processing.

Q: What are the recommended use cases?

This model is well suited to production environments that have limited computational resources but still need reliable English language processing, including named entity recognition, POS tagging, and dependency parsing.
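
When resources are tight and only some annotations are needed, one common approach is to disable unused components and stream texts in batches. The sketch below assumes an entity-extraction workload; load_ner_only and extract_entities are hypothetical helper names, and both the list of disabled components and the batch size of 64 are assumptions to adjust for your own task. It uses the disable argument of spacy.load and the nlp.pipe method.

```python
def load_ner_only(name="en_core_web_sm"):
    """Load the pipeline with components unrelated to NER switched off.

    Assumes spaCy and the model package are installed. spaCy is imported
    lazily so extract_entities() can be used with any compatible object.
    """
    import spacy

    # Assumption: these components are not needed for entity extraction.
    return spacy.load(
        name, disable=["tagger", "parser", "attribute_ruler", "lemmatizer"]
    )


def extract_entities(nlp, texts, batch_size=64):
    """Yield one list of (text, label) entity pairs per input text."""
    # nlp.pipe processes texts in batches, which is usually faster than
    # calling nlp(text) once per text.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]
```

Because extract_entities is a generator, results can be consumed one document at a time without holding the whole corpus in memory.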
