sec-bert-num

Maintained by: nlpaueb

SEC-BERT-NUM

Property        Value
Parameters      110M
Architecture    12-layer, 768-hidden, 12-heads BERT
Training Data   260,773 SEC 10-K filings (1993-2019)
Paper           FiNER: Financial Numeric Entity Recognition for XBRL Tagging

What is SEC-BERT-NUM?

SEC-BERT-NUM is a BERT model pre-trained for natural language processing in the financial domain. Its distinguishing feature is that every numeric token is replaced with a [NUM] pseudo-token, so numeric expressions are handled uniformly and are never fragmented into subword pieces. The model was pre-trained on 260,773 SEC 10-K filings from 1993 to 2019, which makes it well suited to financial text analysis tasks.
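The [NUM] replacement described above can be sketched as a small pre-processing step. This is a minimal illustration, not the authors' pipeline: the regular expression, the function name `replace_numbers`, and the use of whitespace-separated tokens are all assumptions made for the example.

```python
import re

# Assumed definition of a numeric token: digits optionally mixed with commas
# and periods (e.g. "2019", "12.5", "3,400"), or a leading separator followed
# by digits (e.g. ".5"). The actual pattern used for the model may differ.
NUM_PATTERN = re.compile(r"\d+[\d,.]*|[,.]\d+")


def replace_numbers(tokens):
    """Return a copy of `tokens` with every numeric token replaced by [NUM]."""
    return ["[NUM]" if NUM_PATTERN.fullmatch(tok) else tok for tok in tokens]


# Hypothetical input sentence, already split into whitespace-separated tokens.
sentence = "Revenue rose 12.5 % to $ 3,400 million in 2019 ."
print(" ".join(replace_numbers(sentence.split())))
# prints: Revenue rose [NUM] % to $ [NUM] million in [NUM] .
```

Text passed to the model would need the same kind of replacement applied first, since the model only saw [NUM] in place of numbers during pre-training.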

Implementation Details

The model uses the BERT-BASE architecture with two domain-specific changes: a custom 30k subword vocabulary trained from scratch on financial documents, and the replacement of numeric tokens with [NUM]. Pre-training follows the same setup as BERT-BASE, with 1 million training steps.

  • Trained using Google's official BERT repository
  • Compatible with both PyTorch and TF2
  • Replaces numeric tokens with [NUM] in a pre-processing step
  • Trained on Google Cloud TPU v3-8

Core Capabilities

  • Financial text understanding grounded in SEC 10-K filings
  • Consistent handling of numerical expressions
  • Enhanced masked token prediction for financial contexts
  • Specialized vocabulary for financial domain

Frequently Asked Questions

Q: What makes this model unique?

SEC-BERT-NUM's distinctive feature is its uniform handling of numerical expressions through the [NUM] token, which helps maintain consistency in financial text processing and improves performance on financial NLP tasks.
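The fragmentation that [NUM] avoids can be illustrated with a toy greedy longest-match subword splitter in the style of WordPiece. This is a simplified sketch under stated assumptions: `TOY_VOCAB` is a hypothetical vocabulary, not the model's actual 30k vocabulary, and the function `wordpiece` is written for this example rather than taken from any library.

```python
def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one token into subword pieces."""
    pieces, start = [], 0
    while start < len(token):
        end = len(token)
        piece = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched, so the whole token is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical vocabulary, chosen only to make the behavior visible.
TOY_VOCAB = {"[NUM]", "[UNK]", "48", "##7", "##50", "##5", "##0"}

print(wordpiece("48750", TOY_VOCAB))  # prints: ['48', '##7', '##50']
print(wordpiece("[NUM]", TOY_VOCAB))  # prints: ['[NUM]']
```

With a raw number, the pieces depend on which digit sequences happen to be in the vocabulary, so two similar amounts can split very differently. With [NUM], every number maps to the same single token.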

Q: What are the recommended use cases?

The model is particularly well-suited for financial text analysis tasks, including financial numeric entity recognition, sentiment analysis of financial documents, and processing of SEC filings and other financial reports.
