SEC-BERT-BASE

Property	Value
Parameters	110M
Architecture	12-layer, 768-hidden, 12-heads
Training Data	260,773 10-K SEC filings (1993-2019)
Paper	FiNER: Financial Numeric Entity Recognition for XBRL Tagging
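
The parameter count can be sanity-checked from the architecture row. The sketch below assumes the standard BERT-base layout (feed-forward size 3072, 512 positions, 2 segment types, a pooler layer) and a vocabulary of exactly 30,000 entries. None of these values appear in the table, so treat the result as an estimate rather than the model's exact size.

```python
def bert_param_count(vocab_size, hidden=768, layers=12,
                     intermediate=3072, max_positions=512, type_vocab=2):
    """Estimate the parameter count of a BERT-style encoder.

    Defaults other than hidden/layers are assumptions taken from the
    standard BERT-base configuration. The number of attention heads
    does not change the count, so it is not a parameter here.
    """
    layer_norm = 2 * hidden  # scale + bias
    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases) + LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up- and down-projection) + LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    # Dense layer applied to the [CLS] position.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count(vocab_size=30_000)  # "30k" taken as exactly 30,000
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to roughly 109M, consistent with the 110M figure conventionally quoted for BERT-base.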

What is sec-bert-base?

SEC-BERT-BASE is a BERT model pre-trained for natural language processing in the financial domain. It is part of the SEC-BERT family of models developed by the Natural Language Processing Group at AUEB (Athens University of Economics and Business). The model was pre-trained on 260,773 10-K filings submitted to the SEC between 1993 and 2019, so its vocabulary and representations reflect the terminology and phrasing of financial reports.

Implementation Details

The model uses a custom 30k-subword vocabulary built from scratch on financial documents. It keeps the BERT-base architecture, with the vocabulary and the pre-training corpus drawn from financial texts. Pre-training ran for 1 million steps with batches of 256 sequences and a learning rate of 1e-4 on a Google Cloud TPU v3-8.
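
As a rough sense of scale, the step count and batch size above imply the following training volume. The sequence length of 512 tokens is an assumption (the usual BERT maximum), so the token figure is an upper bound rather than a reported number.

```python
steps = 1_000_000     # training steps, from the text above
batch_size = 256      # sequences per batch, from the text above
max_seq_len = 512     # ASSUMPTION: BERT's usual maximum sequence length

# Number of sequences the model sees over the whole run.
sequences_seen = steps * batch_size
# Upper bound on tokens processed, if every sequence were full length.
max_tokens_seen = sequences_seen * max_seq_len

print(f"sequences seen: {sequences_seen:,}")
print(f"tokens seen (upper bound): {max_tokens_seen:,}")
```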

Custom financial vocabulary of 30k subwords
Pre-trained on 260,773 10-K filings
Compatible with both PyTorch and TensorFlow 2
Trained with masked language modeling objective
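
To illustrate the masked language modeling objective, here is a minimal sketch of BERT-style input corruption. The 15% selection rate and the 80/10/10 split (mask, random token, unchanged) come from the original BERT recipe and are assumed here; this page does not confirm the exact settings used for SEC-BERT-BASE. Selecting each token independently is a simplification, and the toy tokens and vocabulary are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Return (corrupted_inputs, labels) for one token sequence.

    labels[i] holds the original token where a prediction is required
    and None elsewhere. Rates are assumed BERT defaults.
    """
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return inputs, labels


# Toy example; a real pipeline would operate on subword ids instead.
tokens = "total net sales increased during the fiscal year".split()
toy_vocab = ["revenue", "loss", "income", "decreased"]
corrupted, targets = mask_tokens(tokens, toy_vocab, random.Random(0))
print(corrupted)
print(targets)
```

During pre-training the model is scored only on positions where the label is not None, which pushes it to reconstruct financial terms from their surrounding context.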

Core Capabilities

Stronger masked-token prediction on financial text than general-purpose BERT
Better coverage of financial terminology through the domain-specific vocabulary
Prediction of numeric values and their context in filings
Financial entity recognition, once fine-tuned on labeled data
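
One way to probe these capabilities is masked-token prediction with the Hugging Face transformers library. This is a minimal sketch: the Hub identifier `nlpaueb/sec-bert-base` is an assumption (it is not stated on this page), the helper names and example sentence are hypothetical, and the model is downloaded only when `predict_masked` is called.

```python
MODEL_ID = "nlpaueb/sec-bert-base"  # ASSUMPTION: Hugging Face Hub identifier
MASK_TOKEN = "[MASK]"               # BERT's standard mask token


def mask_word(sentence, target):
    """Replace the first occurrence of `target` with the mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} not found in sentence")
    return sentence.replace(target, MASK_TOKEN, 1)


def predict_masked(masked_sentence, top_k=5):
    """Return (token, score) pairs for the single [MASK] in the sentence.

    Requires the transformers package plus PyTorch or TensorFlow, and
    network access on first use to fetch the model weights.
    """
    from transformers import pipeline  # imported lazily on purpose

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    predictions = fill_mask(masked_sentence, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]


# Hypothetical example sentence in the style of a 10-K filing.
masked = mask_word("Total net sales increased 2% during 2019.", "increased")
print(masked)
# To run the model (downloads weights): print(predict_masked(masked))
```

Running the same sentence through a general-purpose BERT checkpoint gives a quick qualitative comparison of the two vocabularies and their predictions.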

Frequently Asked Questions

Q: What makes this model unique?

SEC-BERT-BASE is pre-trained on SEC filings rather than general text, with a vocabulary built from financial documents. This makes it more effective than general-purpose BERT models on financial NLP tasks, particularly at predicting financial terms in context.

Q: What are the recommended use cases?

The model suits financial text analysis tasks such as financial document parsing, numeric entity recognition, financial sentiment analysis, and automated analysis of financial reports. It is most useful for FinTech applications and financial research that depend on a close reading of SEC documents.
