sec-bert-shape

Maintained by: nlpaueb

SEC-BERT-SHAPE

Property        Value
Parameters      110M
Architecture    12-layer, 768-hidden, 12-heads
Training Data   260,773 10-K SEC filings (1993-2019)
Paper           FiNER: Financial Numeric Entity Recognition for XBRL Tagging

What is SEC-BERT-SHAPE?

SEC-BERT-SHAPE is a BERT model for financial text, built to handle the numerical expressions that are common in financial documents. It is part of the SEC-BERT family of models. Before tokenization, each number is replaced by a pseudo-token that encodes its shape (e.g., "53.2" becomes "[XX.X]"), so the digit count and the position of separators are kept while the specific digit values are dropped.
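The shape mapping can be sketched in a few lines of Python. This is a minimal illustration, not the maintainers' code: the function name `number_to_shape` and the regular expression used to decide what counts as a numeric token are assumptions made for this example.

```python
import re

# Assumed definition of a numeric token: digits with optional ',' or '.'
# separators, or a leading separator followed by digits (e.g. ".5").
NUMERIC = re.compile(r"(\d+[\d,.]*)|([,.]\d+)")


def number_to_shape(token: str) -> str:
    """Return a shape pseudo-token for numeric tokens; other tokens are unchanged."""
    if NUMERIC.fullmatch(token):
        # Every digit becomes 'X'; separators keep their position.
        return "[" + re.sub(r"\d", "X", token) + "]"
    return token


for tok in ["53.2", "40,200.5", "2019", "sales"]:
    print(tok, "->", number_to_shape(tok))
```

Two numbers with the same layout, such as "53.2" and "17.9", map to the same pseudo-token, which is what lets a fixed vocabulary cover them.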

Implementation Details

The model uses BERT's architecture and adds a preprocessing step for numerical values. It has a custom vocabulary of 30k subwords and was trained for 1 million steps on the SEC filings listed above, using a Google Cloud TPU v3-8.

  • Preserves numerical patterns through 214 predefined shape pseudo-tokens
  • Handles complex number formats including decimals and thousands separators
  • Trained specifically on financial documents for domain-specific understanding
  • Represents each number of a known shape as a single pseudo-token, so it is not fragmented into several subwords
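The points above can be sketched as a sentence-level preprocessing step. The sketch rests on several assumptions: `KNOWN_SHAPES` is a small hypothetical subset standing in for the 214 predefined shapes, `[NUM]` is an assumed generic token for numbers whose shape is not in the set, and the regex tokenizer is a stand-in for whatever word tokenizer is actually used.

```python
import re

# Hypothetical subset of shapes; the real model defines 214 of them.
KNOWN_SHAPES = {"[X]", "[XX]", "[X.X]", "[XX.X]", "[XXXX]", "[XX,XXX.X]"}
FALLBACK = "[NUM]"  # assumed generic token for shapes outside the known set

NUMERIC = re.compile(r"(\d+[\d,.]*)|([,.]\d+)")
# Stand-in tokenizer: numbers with separators | words | single punctuation marks.
TOKEN = re.compile(r"\d+(?:[,.]\d+)*|\w+|[^\w\s]")


def preprocess(text: str, known_shapes=KNOWN_SHAPES) -> str:
    """Replace each numeric token with its shape, or FALLBACK if the shape is unknown."""
    out = []
    for token in TOKEN.findall(text):
        if NUMERIC.fullmatch(token):
            shape = "[" + re.sub(r"\d", "X", token) + "]"
            out.append(shape if shape in known_shapes else FALLBACK)
        else:
            out.append(token)
    return " ".join(out)


sentence = "Total net sales decreased 2% or $5.4 billion during 2019 compared to 2018."
print(preprocess(sentence))
# Total net sales decreased [X] % or $ [X.X] billion during [XXXX] compared to [XXXX] .
```

In a real pipeline the set of known shapes would more plausibly be read from the model's tokenizer than hard-coded as it is here.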

Core Capabilities

  • Enhanced numerical understanding in financial contexts
  • Improved performance on financial text analysis tasks
  • Better handling of complex numerical expressions
  • Specialized for SEC filing analysis and financial document processing
  • Compatible with both PyTorch and TensorFlow 2
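A minimal loading sketch with the Hugging Face `transformers` library is shown below. The Hub identifier `nlpaueb/sec-bert-shape` is inferred from the maintainer and model name above, the helper `load_sec_bert_shape` and the `demo` function are hypothetical names for this example, and the TensorFlow branch assumes an installed `transformers` version that still ships the `TFAutoModel` class.

```python
MODEL_ID = "nlpaueb/sec-bert-shape"  # assumed Hugging Face Hub identifier


def load_sec_bert_shape(framework: str = "pt"):
    """Load tokenizer and encoder; 'pt' selects PyTorch, 'tf' selects TensorFlow 2."""
    if framework not in ("pt", "tf"):
        raise ValueError(f"unsupported framework: {framework!r}")
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer
    if framework == "pt":
        from transformers import AutoModel as ModelClass
    else:
        from transformers import TFAutoModel as ModelClass
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = ModelClass.from_pretrained(MODEL_ID)
    return tokenizer, model


def demo():
    """Encode one sentence whose numbers were already replaced by shape pseudo-tokens."""
    tokenizer, model = load_sec_bert_shape("pt")
    text = "Total net sales decreased [X] % or $ [X.X] billion ."
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)
```

Calling `demo()` downloads the weights on first use, so it needs network access and a PyTorch installation.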

Frequently Asked Questions

Q: What makes this model unique?

SEC-BERT-SHAPE's distinctive feature is its number-shape preprocessing, which replaces each numerical value with a standardized pattern that keeps its structure (digit count and separators). This is intended to help on financial text, where a number's format often carries information and where ordinary subword tokenization would split the number into several fragments.

Q: What are the recommended use cases?

The model is particularly suited for financial document analysis, SEC filing processing, numerical information extraction from financial texts, and other FinTech applications where precise handling of numerical data is essential.
