en_core_web_trf

Maintained By: spacy

  • License: MIT
  • Author: Explosion AI
  • Base Architecture: RoBERTa-base
  • spaCy Compatibility: ≥3.7.2, <3.8.0
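
The compatibility pin above matters in practice, since the package is versioned against spaCy itself. A minimal sketch of downloading the pipeline from Python using spaCy's standard CLI helper (assumes network access and a spaCy install in the supported range):

    from spacy.cli import download
    import spacy

    # Fetch the pipeline package if it is not already installed
    # (equivalent to: python -m spacy download en_core_web_trf)
    download("en_core_web_trf")

    # The installed spaCy version must fall in the supported range above
    print(spacy.__version__)  # expected: >=3.7.2, <3.8.0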

What is en_core_web_trf?

en_core_web_trf is an English transformer-based pipeline built on the RoBERTa architecture. It is spaCy's most accurate English pipeline, pairing strong benchmark scores on tagging, parsing, and entity recognition with the full set of linguistic annotations, at the cost of slower inference than spaCy's CNN-based English models.
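
Usage follows spaCy's standard load-and-call API; a minimal sketch (the example sentence is illustrative):

    import spacy

    # Load the installed transformer pipeline; this pulls in the
    # RoBERTa-base weights via spacy-transformers
    nlp = spacy.load("en_core_web_trf")

    # Run the full pipeline on a sentence
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")
    print([(ent.text, ent.label_) for ent in doc.ents])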

Implementation Details

The model is implemented as a transformer pipeline based on RoBERTa-base, using byte-level BPE tokenization with a vocabulary of 50,265 tokens. The pipeline comprises transformer, tagger, parser, named entity recognizer, attribute ruler, and lemmatizer components (see the sketch after the metrics below).

  • Transformer Configuration: 768-dimensional embeddings with a 144-token window
  • Named Entity Recognition F-score: 90.19%
  • Part-of-Speech Tagging Accuracy: 98.13%
  • Dependency Parsing (LAS): 93.91%
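
A short sketch of inspecting this composition at runtime (the component order shown in the comment is what the package is expected to report, not verified here):

    import spacy

    nlp = spacy.load("en_core_web_trf")

    # Component names in processing order; for this package the list is
    # expected to be: ['transformer', 'tagger', 'parser',
    #                  'attribute_ruler', 'lemmatizer', 'ner']
    print(nlp.pipe_names)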

Core Capabilities

  • Named Entity Recognition with 18 entity types
  • Part-of-Speech Tagging with 50+ tag classes
  • Dependency Parsing with 45 dependency labels
  • Sentence Boundary Detection (90.11% F-score)
  • Lemmatization and Attribute Assignment
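
All of these annotations are exposed through spaCy's standard Doc and Token attributes; a brief sketch (the example text is illustrative):

    import spacy

    nlp = spacy.load("en_core_web_trf")
    doc = nlp("Tim Cook said Apple will open a London office in 2025.")

    # Named entities: 18 OntoNotes types such as PERSON, ORG, GPE, DATE
    for ent in doc.ents:
        print(ent.text, ent.label_)

    # Fine-grained POS tags, dependency labels, and lemmas per token
    for token in doc:
        print(token.text, token.tag_, token.dep_, token.lemma_)

    # Sentence boundaries assigned by the parser
    for sent in doc.sents:
        print(sent.text)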

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its exceptional accuracy across multiple NLP tasks, particularly in POS tagging (98.13%) and NER (90.19% F-score). It's built on the robust RoBERTa architecture and trained on high-quality datasets including OntoNotes 5.

Q: What are the recommended use cases?

The model excels in production environments requiring high-accuracy language understanding, including document analysis, information extraction, and text analytics. It's particularly suitable for applications needing precise entity recognition, syntactic analysis, or detailed linguistic annotation.
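
For throughput-sensitive deployments like these, batching with nlp.pipe is the usual pattern; a sketch under the assumption that texts stands in for your document collection:

    import spacy

    # Optional: use a GPU if one is available; transformer pipelines
    # are substantially faster on GPU
    spacy.prefer_gpu()

    nlp = spacy.load("en_core_web_trf")

    texts = [
        "Acme Corp. acquired Widget Inc. for $2 million.",
        "The quarterly review is scheduled for Friday in Berlin.",
    ]

    # Stream documents through the pipeline in batches rather than
    # calling nlp() once per text; batch_size is a tunable starting point
    for doc in nlp.pipe(texts, batch_size=32):
        print([(ent.text, ent.label_) for ent in doc.ents])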
