bert-base-uncased

google-bert

BERT base uncased (110M params) - Foundational transformer model for English language tasks with masked language modeling, trained on BookCorpus and Wikipedia.

Property          Value
Parameter Count   110M
License           Apache 2.0
Paper             BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018, arXiv:1810.04805)
Training Data     BookCorpus + Wikipedia
Architecture      Transformer-based

What is bert-base-uncased?

BERT base uncased is a foundational transformer model that revolutionized natural language processing. Developed by Google, this 110M-parameter model was pre-trained on BookCorpus and English Wikipedia, with all text lowercased so that "english" and "English" are treated identically. Its bidirectional training was a key innovation, and it combines two pre-training objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
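
With the Hugging Face transformers library installed, the MLM objective can be tried directly through the fill-mask pipeline. A minimal sketch (exact scores and rankings may vary across library versions):

```python
from transformers import pipeline

# Load the pre-trained model and its WordPiece tokenizer.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the token hidden behind [MASK] using context
# from both the left and the right of the masked position.
for prediction in unmasker("The capital of France is [MASK]."):
    print(f"{prediction['token_str']!r}  score={prediction['score']:.3f}")
```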

Implementation Details

The model employs a pre-training approach in which 15% of input tokens are selected for masking: of those, 80% are replaced by the [MASK] token, 10% by random tokens, and 10% are left unchanged. It processes sequences up to 512 tokens long and uses WordPiece tokenization with a 30,000-token vocabulary.
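
The 80/10/10 rule can be made concrete in code. The sketch below is an illustrative re-implementation of that masking scheme, not the original training code; mask_tokens is a hypothetical helper name:

```python
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def mask_tokens(token_ids, mask_prob=0.15):
    """Illustrative BERT-style masking: each selected position is
    80% -> [MASK], 10% -> a random token, 10% -> left unchanged."""
    labels = [-100] * len(token_ids)          # -100 is ignored by the MLM loss
    masked = list(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in tokenizer.all_special_ids:  # never mask [CLS] / [SEP]
            continue
        if random.random() < mask_prob:
            labels[i] = tok                   # predict the original token here
            roll = random.random()
            if roll < 0.8:
                masked[i] = tokenizer.mask_token_id
            elif roll < 0.9:
                masked[i] = random.randrange(tokenizer.vocab_size)
            # else: leave the token unchanged
    return masked, labels

ids = tokenizer("the quick brown fox jumps over the lazy dog")["input_ids"]
masked_ids, labels = mask_tokens(ids)
print(tokenizer.decode(masked_ids))
```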

  • Training utilized 4 cloud TPUs in Pod configuration
  • Trained for 1 million steps with a batch size of 256
  • Uses the Adam optimizer with a learning rate of 1e-4
  • Implements learning rate warmup followed by linear decay (sketched below)
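
The warmup-plus-linear-decay schedule from the last item is available in current versions of transformers as get_linear_schedule_with_warmup. A sketch only: the 10,000 warmup steps and 1 million total steps mirror the published recipe, but this is not the original TPU training setup:

```python
import torch
from transformers import AutoModelForMaskedLM, get_linear_schedule_with_warmup

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Ramp the learning rate up over the first 10,000 steps, then decay
# it linearly to zero over the remainder of training.
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10_000, num_training_steps=1_000_000
)
```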

Core Capabilities

  • Masked language modeling for bidirectional context understanding
  • Next sentence prediction for document-level comprehension
  • Feature extraction for downstream tasks (see the sketch after this list)
  • High performance on GLUE benchmark tasks
  • Efficient fine-tuning capabilities
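
For the feature-extraction capability noted above, a common pattern is to run text through the bare encoder and take the final hidden states, often the [CLS] vector, as features. A minimal sketch:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT makes a good sentence encoder.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token; the [CLS] vector (position 0)
# is often used as a pooled sentence-level representation.
token_embeddings = outputs.last_hidden_state   # shape: (1, seq_len, 768)
cls_embedding = token_embeddings[:, 0]         # shape: (1, 768)
print(cls_embedding.shape)
```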

Frequently Asked Questions

Q: What makes this model unique?

This model's bidirectional training architecture sets it apart, allowing it to understand context from both directions simultaneously, unlike traditional left-to-right language models. Its masked language modeling approach enables deep bidirectional representations.
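
One way to observe this is to mask a word whose only disambiguating clue appears after the mask, which a strictly left-to-right model could not exploit. A sketch; the top prediction is typically "dog", though outputs can vary:

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The clue ("barked") appears only *after* the mask, so the prediction
# must draw on right-hand context.
print(unmasker("The [MASK] barked at the mail carrier all morning.")[0]["token_str"])
```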

Q: What are the recommended use cases?

The model excels at tasks that require whole-sentence understanding, including sequence classification, token classification, and question answering. It is most effective when fine-tuned for a specific downstream task, but it is not recommended for text generation.
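
As a starting point for such fine-tuning, a classification head can be attached to the pre-trained encoder. The sketch below runs a single illustrative training step on a toy example; num_labels=2 and the label value are placeholder assumptions for a binary task:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Adds a randomly initialized classification head on top of the
# pre-trained encoder (hence the "weights not used" warning on load).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# One illustrative training step on a toy example.
batch = tokenizer(["a delightful little film"], return_tensors="pt")
labels = torch.tensor([1])  # hypothetical "positive" label

outputs = model(**batch, labels=labels)
outputs.loss.backward()
print(f"loss: {outputs.loss.item():.3f}")
```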

