Maintained By: sychonix

BERT Base Uncased

Property           Value
Parameter Count    110M
Model Type         Transformer-based Language Model
Architecture       BERT Base (Bidirectional Encoder)
Paper              http://arxiv.org/abs/1810.04805
Training Data      BookCorpus + English Wikipedia

What is BERT Base Uncased?

BERT Base Uncased is a transformer-based language model pretrained on English text using masked language modeling (MLM) and next sentence prediction (NSP) objectives. This uncased variant treats "english" and "English" identically, making it suitable for applications where case sensitivity isn't crucial.
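
As a quick illustration, the sketch below uses the Hugging Face transformers library to show the uncased behaviour and the MLM head via the fill-mask pipeline; the example sentence is an illustrative choice, not part of this card:

```python
# Minimal sketch using the Hugging Face transformers library; the example
# sentence is illustrative, not taken from the model card.
from transformers import BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The uncased tokenizer lowercases its input, so both spellings map to the
# same WordPiece tokens.
print(tokenizer.tokenize("English"))  # ['english']
print(tokenizer.tokenize("english"))  # ['english']

# The fill-mask pipeline uses the MLM pretraining head to predict [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("Hello, I'm a [MASK] model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```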

Implementation Details

The model employs a bidirectional transformer architecture trained on BookCorpus (11,038 unpublished books) and English Wikipedia. Training was performed on 4 cloud TPUs in Pod configuration for one million steps with a batch size of 256. The model uses WordPiece tokenization with a 30,000-token vocabulary (a short tokenization and masking sketch follows the list below).

  • 15% of tokens are masked during training
  • Trained with Adam optimizer (learning rate 1e-4)
  • Includes warmup period of 10,000 steps
  • Sequence length: 128 tokens (90% of steps), 512 tokens (10% of steps)
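
A rough sketch of this setup, approximated with the transformers tokenizer and data collator rather than the original TPU pretraining pipeline, might look like this; the example sentence is illustrative:

```python
# Sketch of the WordPiece tokenization and 15% masking described above,
# approximated with the transformers data collator (the example sentence
# and use of the collator are assumptions, not the original TPU pipeline).
from transformers import BertTokenizerFast, DataCollatorForLanguageModeling

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
print(tokenizer.vocab_size)  # WordPiece vocabulary size (~30K tokens)

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # 15% of tokens are selected for masking
)

encoding = tokenizer("BERT was pretrained on BookCorpus and English Wikipedia.")
batch = collator([encoding])

# Labels of -100 are ignored by the MLM loss; the remaining positions
# correspond to tokens that were masked or replaced.
print(batch["input_ids"][0])
print(batch["labels"][0])
```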

Core Capabilities

  • Masked Language Modeling
  • Next Sentence Prediction
  • Feature Extraction for downstream tasks (see the sketch after this list)
  • Achieves a 79.6 average score on the GLUE benchmark
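
For feature extraction, a minimal sketch might look like the following; the input sentence and the choice of the [CLS] vector are illustrative assumptions:

```python
# Feature-extraction sketch: the encoder's hidden states can feed a
# downstream classifier. The input sentence and the use of the [CLS]
# vector are illustrative choices.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Feature extraction with BERT.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per input token.
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])

# The [CLS] vector is commonly used as a sentence-level representation.
cls_embedding = outputs.last_hidden_state[:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])
```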

Frequently Asked Questions

Q: What makes this model unique?

This model's bidirectional nature sets it apart from traditional left-to-right language models, allowing it to draw on context from both directions. Its masking strategy enables rich contextual learning without information leakage.

Q: What are the recommended use cases?

The model is best suited for sequence classification, token classification, and question answering tasks. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
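
A hedged sketch of a sequence-classification setup is shown below; the two-label sentiment framing and example input are hypothetical, and the head must still be fine-tuned on labelled data before its outputs mean anything:

```python
# Hypothetical fine-tuning setup for sequence classification; the two-label
# sentiment framing and example sentence are illustrative only.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g. positive / negative
)

inputs = tokenizer("A readable, well-paced film.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The classification head is freshly initialised, so these probabilities
# are not meaningful until the model is fine-tuned on labelled data.
print(torch.softmax(logits, dim=-1))
```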
