sec-bert-base

Maintained by: nlpaueb

SEC-BERT-BASE

Property       Value
Parameters     110M
Architecture   12-layer, 768-hidden, 12-heads
Training Data  260,773 10-K SEC filings (1993-2019)
Paper          FiNER: Financial Numeric Entity Recognition for XBRL Tagging
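
As a rough sanity check, the parameter count can be derived from the architecture figures above. The sketch below assumes the standard BERT-base layout (feed-forward size 3072, 512 positions, 2 segment types, a pooler layer) and a vocabulary of exactly 30,000 entries; none of these values are stated on this page, and the function name is illustrative.

```python
def bert_param_count(vocab_size, hidden=768, layers=12,
                     intermediate=3072, max_positions=512, type_vocab=2):
    """Estimate the number of weights in a BERT encoder with a pooler.

    All defaults are the usual BERT-base values and are assumptions here.
    The number of attention heads does not change the parameter count.
    """
    layer_norm = 2 * hidden  # scale and bias vectors
    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (hidden -> intermediate -> hidden), then LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Assumed vocabulary size of exactly 30,000 subwords ("30k" on this page).
total = bert_param_count(vocab_size=30_000)
print(f"{total:,} parameters (about {total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes out at roughly 109M, consistent with the rounded 110M figure in the table.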

What is sec-bert-base?

SEC-BERT-BASE is a BERT model pre-trained for natural language processing in the financial domain. It is part of the SEC-BERT family of models developed by AUEB's Natural Language Processing Group. The model was pre-trained on 260,773 10-K filings submitted to the SEC between 1993 and 2019, so its vocabulary and representations reflect financial terminology and contexts.

Implementation Details

The model uses a custom 30k-subword vocabulary trained from scratch on financial documents. It follows the BERT-base architecture, with pre-training on financial text rather than general-domain text. Training ran for 1 million steps with batches of 256 sequences and a learning rate of 1e-4, on a Google Cloud TPU v3-8.
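
To put the training setup in perspective, the short calculation below works out how many sequences the model saw in total. The step count and batch size come from the paragraph above; the sequence length of 512 tokens is an assumption (BERT's usual maximum, not stated here), so the token figure is only an upper bound under that assumption.

```python
TRAINING_STEPS = 1_000_000   # from the text above
BATCH_SIZE = 256             # sequences per step, from the text above
ASSUMED_MAX_SEQ_LEN = 512    # assumption: BERT's usual maximum, not stated on this page
NUM_FILINGS = 260_773        # 10-K filings in the training corpus

# Total sequences processed over the whole run.
sequences_seen = TRAINING_STEPS * BATCH_SIZE

# Upper bound on tokens processed if every sequence were full length.
max_tokens_seen = sequences_seen * ASSUMED_MAX_SEQ_LEN

# Average number of training sequences drawn per filing.
sequences_per_filing = sequences_seen / NUM_FILINGS

print(f"Sequences seen: {sequences_seen:,}")
print(f"Token upper bound (assumed length {ASSUMED_MAX_SEQ_LEN}): {max_tokens_seen:,}")
print(f"Average sequences per filing: {sequences_per_filing:.0f}")
```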

  • Custom financial vocabulary of 30k subwords
  • Pre-trained on 260,773 10-K filings
  • Compatible with both PyTorch and TensorFlow 2
  • Trained with masked language modeling objective
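
Because the model is trained with a masked language modeling objective, one quick way to try it is the `fill-mask` pipeline from Hugging Face Transformers. The sketch below assumes the model is published on the Hub as `nlpaueb/sec-bert-base` (an id inferred from the maintainer and model names on this page), that the mask token is BERT's usual `[MASK]`, and that `transformers` plus a backend such as PyTorch are installed. The helper name `top_mask_predictions` is hypothetical.

```python
MODEL_ID = "nlpaueb/sec-bert-base"  # assumed Hub id (maintainer/model name)
MASK_TOKEN = "[MASK]"               # assumed mask token, as in standard BERT


def top_mask_predictions(text, top_k=5):
    """Return (token, score) pairs for the single masked position in `text`."""
    if text.count(MASK_TOKEN) != 1:
        raise ValueError(f"text must contain exactly one {MASK_TOKEN} token")

    # Imported lazily so that defining this helper does not require
    # transformers to be installed.
    from transformers import pipeline

    # Downloads the model and tokenizer on first use.
    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    predictions = fill_mask(text, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

Calling `top_mask_predictions("Total net sales [MASK] 2% during the year.")` (an illustrative sentence, not taken from the model's documentation) would return the five most likely fillers with their scores. Loading the pipeline once and reusing it would be preferable for repeated calls.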

Core Capabilities

  • Prediction of masked words in financial text
  • Representation of financial terminology through a domain-specific vocabulary
  • Prediction of numeric values and their surrounding context
  • Financial entity recognition after fine-tuning

Frequently Asked Questions

Q: What makes this model unique?

SEC-BERT-BASE differs from general-purpose BERT models in that both its vocabulary and its pre-training corpus come from SEC filings. This makes it better suited to financial NLP tasks, particularly predicting financial terminology and the contexts in which it appears.

Q: What are the recommended use cases?

The model suits financial text analysis tasks such as financial document parsing, numeric entity recognition, financial sentiment analysis, and automated analysis of financial reports. It is particularly useful for FinTech applications and financial research that depend on a detailed understanding of SEC documents.
