en_core_web_sm

Compact English language model for NLP tasks with 97.25% tagging accuracy. Features tok2vec, tagger, parser & NER components. MIT licensed.

Property        Value
License         MIT
Author          Explosion AI
spaCy Version   >=3.7.2,<3.8.0
Token Accuracy  99.86%
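
The spaCy version range in the table can be checked before loading the model. The following is a minimal sketch using only the standard library; `parse_version` and `is_compatible` are hypothetical helper names, and the comparison assumes plain `major.minor.patch` version strings (pre-release suffixes are ignored rather than ordered properly).

```python
import re


def parse_version(version: str) -> tuple:
    """Turn '3.7.4' into (3, 7, 4). Anything after the patch number is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version: str, lower: str = "3.7.2", upper: str = "3.8.0") -> bool:
    """Return True if lower <= version < upper (defaults mirror '>=3.7.2,<3.8.0')."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

With spaCy installed, `is_compatible(spacy.__version__)` would report whether the installed release falls inside the range listed above.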

What is en_core_web_sm?

en_core_web_sm is a small English pipeline for spaCy, developed by Explosion AI and optimized for CPU use. It covers tokenization, part-of-speech tagging, dependency parsing, sentence segmentation, lemmatization, and named entity recognition while keeping a small footprint.

Implementation Details

The pipeline comprises seven components: tok2vec, tagger, parser, senter, ner, attribute_ruler, and lemmatizer. Its training sources include OntoNotes 5, ClearNLP, and WordNet 3.0. Reported evaluation scores:

  • Named Entity Recognition (NER) with 84.56% F-score
  • Part-of-speech tagging with 97.25% accuracy
  • Dependency parsing with 91.75% unlabeled attachment score
  • Sentence segmentation with 90.59% F-score
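
To see which of the components listed above are active in an installed copy, the pipeline can be inspected after loading. This is a sketch, not an official recipe: `load_model` and `describe_pipeline` are hypothetical helper names, and the code assumes the package has already been installed (for example with `python -m spacy download en_core_web_sm`). spaCy is imported lazily inside `load_model` so that the inspection helper can be used on any loaded pipeline object.

```python
def load_model(name: str = "en_core_web_sm"):
    """Load an installed spaCy pipeline by package name."""
    import spacy  # imported here so the helper below works without spaCy at import time

    return spacy.load(name)


def describe_pipeline(nlp) -> dict:
    """Report active and disabled component names for a loaded pipeline."""
    return {
        "enabled": list(nlp.pipe_names),   # components that run when nlp(text) is called
        "disabled": list(nlp.disabled),    # components shipped with the package but switched off
    }
```

Calling `describe_pipeline(load_model())` returns both lists. A component that appears under `disabled` may be switched on with `nlp.enable_pipe(...)`; whether `senter` is active by default can differ between releases, so it is worth checking rather than assuming.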

Core Capabilities

  • 18 distinct NER categories including PERSON, ORG, DATE, and more
  • Comprehensive token classification with 50+ POS tags
  • 44 dependency parsing labels for detailed syntactic analysis
  • High-accuracy sentence boundary detection
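
The capabilities above map onto attributes of a processed `Doc`. The sketch below collects them into plain Python structures; `summarize_doc` and `component_labels` are hypothetical helper names, and the code assumes `doc` comes from a pipeline in which the tagger, parser, and ner components are active (sentence boundaries in particular need the parser or senter).

```python
def summarize_doc(doc) -> dict:
    """Collect entities, per-token annotations, and sentences from a spaCy Doc."""
    return {
        # Named entities as (text, label) pairs, e.g. label "ORG" or "DATE"
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        # Coarse POS, fine-grained tag, dependency label, and syntactic head per token
        "tokens": [
            (tok.text, tok.pos_, tok.tag_, tok.dep_, tok.head.text) for tok in doc
        ],
        # Sentence boundaries as detected by the pipeline
        "sentences": [sent.text for sent in doc.sents],
    }


def component_labels(nlp, name: str) -> list:
    """Return the sorted label set of one component, e.g. 'ner', 'tagger', or 'parser'."""
    return sorted(nlp.get_pipe(name).labels)
```

Given `nlp = spacy.load("en_core_web_sm")`, `summarize_doc(nlp("Some English text."))` returns the annotations, and `len(component_labels(nlp, "ner"))` gives a way to compare the installed label inventory against the counts listed above.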

Frequently Asked Questions

Q: What makes this model unique?

The model's strength lies in its balanced performance across multiple NLP tasks while maintaining a small footprint, making it ideal for CPU-based applications requiring quick processing.

Q: What are the recommended use cases?

This model is well suited to production environments that have limited computational resources but still need reliable English language processing, including named entity recognition, POS tagging, and dependency parsing.
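
For resource-constrained deployments like these, one common pattern is to load only the components a task needs and to stream texts in batches. The following is a sketch under assumptions: `load_trimmed` and `extract_entities` are hypothetical helper names, the batch size of 64 is an arbitrary starting point, and which components are safe to exclude depends on the task.

```python
def load_trimmed(name: str = "en_core_web_sm", exclude=()):
    """Load a pipeline while leaving out the named components entirely."""
    import spacy  # lazy import: only needed when a real model is loaded

    return spacy.load(name, exclude=list(exclude))


def extract_entities(nlp, texts, batch_size: int = 64):
    """Stream texts through nlp.pipe and yield one list of (text, label) pairs per document."""
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield [(ent.text, ent.label_) for ent in doc.ents]
```

For an entity-only workload, something like `nlp = load_trimmed(exclude=["parser", "lemmatizer"])` followed by `list(extract_entities(nlp, texts))` avoids running components whose output is never read. Note that excluding the parser also removes parser-based sentence boundaries.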
