SEC-BERT-SHAPE

Property	Value
Parameters	110M
Architecture	12-layer, 768-hidden, 12-heads
Training Data	260,773 10-K SEC filings (1993-2019)
Paper	FiNER: Financial Numeric Entity Recognition for XBRL Tagging

What is SEC-BERT-SHAPE?

SEC-BERT-SHAPE is a BERT model for financial text analysis that focuses on how numerical expressions in financial documents are handled. It is part of the SEC-BERT family of models and replaces each number with a pseudo-token describing its shape (e.g., "53.2" becomes "[XX.X]"), so the model sees a number's digit count and punctuation rather than its exact value.
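The shape transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual preprocessing code: the regular expression for what counts as a number, the whitespace tokenization, and the function names `to_shape` and `preprocess` are all assumptions made for this example. It also does not restrict its output to the model's predefined set of shapes.

```python
import re

# Assumption: a token is numeric if it consists of digits optionally mixed
# with commas and periods ("53.2", "1,234.5"), or a separator followed by
# digits (".5").
NUMBER_RE = re.compile(r"(\d+[\d,.]*)|([,.]\d+)")


def to_shape(token: str) -> str:
    """Return the bracketed shape of a numeric token, or the token unchanged."""
    if NUMBER_RE.fullmatch(token):
        # Replace every digit with "X" and keep the separators in place.
        return "[" + re.sub(r"\d", "X", token) + "]"
    return token


def preprocess(text: str) -> str:
    """Replace each numeric token in a whitespace-separated text by its shape."""
    # Whitespace splitting is a simplification; a real pipeline would use a
    # proper tokenizer so that punctuation attached to numbers is split off.
    return " ".join(to_shape(tok) for tok in text.split())


print(preprocess("Revenue rose 53.2 percent to 1,234.5 million"))
```

With these assumptions, "53.2" maps to "[XX.X]" and "1,234.5" maps to "[X,XXX.X]", while non-numeric words pass through unchanged.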

Implementation Details

The model uses the standard BERT architecture and adds a preprocessing step for numerical values. It uses a custom vocabulary of 30k subwords and was trained for 1 million steps on SEC filings using a Google Cloud TPU v3-8.
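As a rough sanity check, the 30k-subword vocabulary and the 12-layer, 768-hidden architecture can be related to the listed 110M parameters with some arithmetic. The sketch below assumes standard BERT-base settings that the summary does not state (a feed-forward size of 3072, 512 position embeddings, 2 segment types, a pooler layer) and treats the vocabulary as exactly 30,000 entries; the function name `bert_param_count` is made up for this example.

```python
def bert_param_count(vocab_size, hidden=768, layers=12,
                     intermediate=3072, max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT encoder with a pooler layer."""
    layer_norm = 2 * hidden  # scale and bias vectors
    # Token, position, and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers of the feed-forward block, then LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count(vocab_size=30_000)
print(f"{total:,} parameters")
```

Under these assumptions the total comes to about 109 million, which is consistent with the rounded 110M figure. The number of attention heads does not appear in the calculation because splitting the hidden dimension across heads does not change the size of the projection matrices.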

Preserves numerical patterns through 214 predefined shape pseudo-tokens
Handles complex number formats including decimals and thousands separators
Trained specifically on financial documents for domain-specific understanding
Maps each recognized number to a single pseudo-token, avoiding fragmentation into subwords

Core Capabilities

Enhanced numerical understanding in financial contexts
Improved performance on financial text analysis tasks
Better handling of complex numerical expressions
Specialized for SEC filing analysis and financial document processing
Compatible with both PyTorch and TensorFlow 2

Frequently Asked Questions

Q: What makes this model unique?

SEC-BERT-SHAPE's distinctive feature is its number shape preservation: numerical values are converted into standardized shape patterns that keep their digit count and separators while discarding the exact value. This helps the model process financial data where number formats carry meaning.

Q: What are the recommended use cases?

The model is particularly suited for financial document analysis, SEC filing processing, numerical information extraction from financial texts, and other FinTech applications where precise handling of numerical data is essential.
