BERT Base Uncased
| Property | Value |
|---|---|
| Parameter Count | 110M |
| Model Type | Transformer-based Language Model |
| Architecture | BERT Base (Bidirectional Encoder) |
| Paper | http://arxiv.org/abs/1810.04805 |
| Training Data | BookCorpus + English Wikipedia |
What is BERT Base Uncased?
BERT Base Uncased is a transformer-based language model pretrained on English text using masked language modeling (MLM) and next sentence prediction (NSP) objectives. This uncased variant treats "english" and "English" identically, making it suitable for applications where case sensitivity isn't crucial.
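A quick way to see the uncased behaviour is to compare tokenizations of the two spellings. The sketch below assumes the Hugging Face transformers library and the bert-base-uncased checkpoint name; neither is prescribed by this card.

```python
# Minimal sketch of the uncased behaviour (assumes Hugging Face transformers
# and the "bert-base-uncased" checkpoint).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The uncased tokenizer lowercases its input, so both spellings map to the
# same WordPiece tokens and the same input IDs.
print(tokenizer.tokenize("English"))  # ['english']
print(tokenizer.tokenize("english"))  # ['english']
print(tokenizer("English")["input_ids"] == tokenizer("english")["input_ids"])  # True
```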
Implementation Details
The model employs a bidirectional transformer architecture pretrained on BookCorpus (a dataset of 11,038 unpublished books) and English Wikipedia. Training was performed on 4 cloud TPUs in Pod configuration for one million steps with a batch size of 256. The model uses WordPiece tokenization with a 30,000-token vocabulary; a short tokenization and masking sketch follows the list below.
- 15% of tokens are masked during training
- Trained with Adam optimizer (learning rate 1e-4)
- Learning rate warmup over the first 10,000 steps
- Sequence length: 128 tokens (90% of steps), 512 tokens (10% of steps)
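The tokenization and masking setup described above can be sketched with the Hugging Face transformers library (an assumption; the card does not prescribe a toolkit). The 80/10/10 replacement split within the masked 15% comes from the BERT paper, not this card.

```python
# Hedged sketch of the WordPiece vocabulary and 15% masking objective
# (assumes Hugging Face transformers, a PyTorch backend, and the
# "bert-base-uncased" checkpoint).
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.vocab_size)  # 30522 -- the ~30,000-token WordPiece vocabulary

# Mask 15% of tokens, mirroring the pretraining objective: 80% of selected
# tokens become [MASK], 10% a random token, 10% are left unchanged.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

batch = collator([tokenizer("bert was pretrained on bookcorpus and wikipedia")])
print(batch["input_ids"][0])  # some positions replaced by tokenizer.mask_token_id
print(batch["labels"][0])     # original IDs at masked positions, -100 elsewhere
```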
Core Capabilities
- Masked Language Modeling
- Next Sentence Prediction
- Feature Extraction for downstream tasks (see the sketch after this list)
- Achieves a 79.6 average score on the GLUE benchmark
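As a sketch of the feature-extraction use (assuming Hugging Face transformers, PyTorch, and the bert-base-uncased checkpoint), the pretrained encoder can be used to produce one contextual vector per WordPiece token:

```python
# Hedged feature-extraction sketch: run text through the encoder and take the
# final hidden states as features for a downstream task.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("feature extraction with bert", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token (including [CLS] and [SEP]).
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])
```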
Frequently Asked Questions
Q: What makes this model unique?
This model's bidirectional nature sets it apart from traditional left-to-right language models, allowing it to draw on context from both directions. Its masking strategy enables rich contextual learning without information leakage.
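A small fill-mask example makes the bidirectional point concrete: the model must use context on both sides of the mask to rank candidates. This assumes the transformers pipeline API and the bert-base-uncased checkpoint.

```python
# Hedged sketch of masked-token prediction via the fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# "france" only ranks highly because the model reads both the left context
# ("the capital of") and the right context ("is paris.").
for pred in unmasker("the capital of [MASK] is paris."):
    print(pred["token_str"], round(pred["score"], 3))
```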
Q: What are the recommended use cases?
The model is best suited for sequence classification, token classification, and question answering tasks. It's not recommended for text generation tasks, where models like GPT-2 would be more appropriate.
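For the classification use cases, a common starting point is to load the checkpoint with a task-specific head. The sketch below assumes Hugging Face transformers with a PyTorch backend; num_labels=2 is an illustrative choice, and the randomly initialised head still needs fine-tuning on labelled data before it is useful.

```python
# Hedged sketch: BERT Base Uncased with a sequence-classification head.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # hypothetical binary task
)

inputs = tokenizer("this model card is easy to follow", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```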