en_core_web_sm

Maintained by: spacy

Property          Value
License           MIT
Author            Explosion AI
spaCy Version     >=3.7.2,<3.8.0
Token Accuracy    99.86%

What is en_core_web_sm?

en_core_web_sm is a small English pipeline for spaCy, developed by Explosion AI and optimized for CPU use. It covers part-of-speech tagging, dependency parsing, sentence segmentation, named entity recognition, and lemmatization while keeping a small footprint.

Implementation Details

The pipeline comprises seven components: tok2vec, tagger, parser, senter, ner, attribute_ruler, and lemmatizer. It is trained on resources including OntoNotes 5, ClearNLP, and WordNet 3.0.
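
As a minimal sketch of how these components can be inspected, the helpers below load a pipeline and report which components it contains and which are active. This assumes spaCy and the en_core_web_sm package are installed; the names load_pipeline and describe_components are illustrative and not part of spaCy.

```python
def load_pipeline(name="en_core_web_sm"):
    """Load a packaged spaCy pipeline by name.

    Assumption: spaCy and the named package are installed locally.
    """
    import spacy  # imported lazily so describe_components works without spaCy

    return spacy.load(name)


def describe_components(nlp):
    """Return (component_name, is_active) pairs in pipeline order."""
    active = set(nlp.pipe_names)  # components that run when nlp(text) is called
    # component_names also includes components that are loaded but disabled
    return [(name, name in active) for name in nlp.component_names]
```

Calling describe_components(load_pipeline()) lists each component with its status. A component may be present but disabled by default, in which case nlp.enable_pipe can switch it on.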

Reported evaluation scores:

  • Named entity recognition (NER): 84.56% F-score
  • Part-of-speech tagging: 97.25% accuracy
  • Dependency parsing: 91.75% unlabeled attachment score (UAS)
  • Sentence segmentation: 90.59% F-score
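
For readers unfamiliar with these metrics, here is a small sketch of how an F-score and an unlabeled attachment score are computed. The inputs in the comments are toy values for illustration, not the model's evaluation data, and the function names are hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index matches the gold head.

    Dependency labels are ignored, which is what "unlabeled" means here.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy example: three of four predicted heads match the gold heads.
# unlabeled_attachment_score([2, 0, 2, 3], [2, 0, 2, 2])  -> 0.75
```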

Core Capabilities

  • 18 distinct NER categories including PERSON, ORG, DATE, and more
  • Fine-grained part-of-speech tagging with 50+ POS tags
  • 44 dependency parsing labels for detailed syntactic analysis
  • High-accuracy sentence boundary detection
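
These annotations are exposed as attributes on spaCy's Doc, Token, and Span objects. The sketch below collects them into plain tuples; token_rows and entity_rows are illustrative helper names, and the commented usage assumes spaCy and en_core_web_sm are installed.

```python
def token_rows(doc):
    """Return (text, coarse POS, fine-grained tag, dependency label, head text) per token."""
    return [(t.text, t.pos_, t.tag_, t.dep_, t.head.text) for t in doc]


def entity_rows(doc):
    """Return (text, label) for each named entity span in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Usage (requires spaCy and the model package):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp("Apple is looking at buying a U.K. startup.")
#   print(token_rows(doc))
#   print(entity_rows(doc))
#   print([sent.text for sent in doc.sents])  # sentence boundaries
```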

Frequently Asked Questions

Q: What makes this model unique?

The model's strength is balanced performance across tagging, parsing, and named entity recognition in a small package, which suits CPU-based applications that need fast processing.

Q: What are the recommended use cases?

This model is well suited to production environments where computational resources are limited but reliable English language processing is still required, including named entity recognition, POS tagging, and dependency parsing.
