bert-base-spanish-wwm-uncased

dccuchile

Pre-trained Spanish BERT model (BETO) trained with Whole Word Masking. It reports state-of-the-art results on Spanish NLP benchmarks and has more than 309K downloads.

Property          Value
Developer         dccuchile
License           CC BY 4.0 (with disclaimers)
Downloads         309,301
Vocabulary Size   31k BPE subwords
Training Steps    2M

What is bert-base-spanish-wwm-uncased?

BETO is a BERT model trained on a large Spanish corpus using the Whole Word Masking technique, in which all subword pieces of a selected word are masked together rather than independently. This uncased version is comparable in size to BERT-Base and is trained specifically for Spanish language tasks.
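
To make the Whole Word Masking idea concrete, here is a minimal sketch of the masking step. It assumes WordPiece-style tokens in which a leading `##` marks a continuation piece, and it simplifies BERT's actual procedure (which also caps the number of masked tokens and sometimes substitutes random or unchanged tokens). The function names and the `mask_prob` parameter are illustrative, not part of BETO's training code.

```python
import random


def group_whole_words(tokens):
    """Group token indices so that each group covers one whole word.

    Assumption: a token starting with "##" continues the previous word.
    """
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask whole words: every piece of a selected word is replaced together."""
    rng = random.Random(seed)
    masked = list(tokens)
    for group in group_whole_words(tokens):
        # One random draw per word, not per subword piece.
        if rng.random() < mask_prob:
            for i in group:
                masked[i] = mask_token
    return masked
```

With tokens such as `["el", "proces", "##amiento", "del", "lenguaje"]`, the pieces `proces` and `##amiento` are always masked or kept as a unit, so the model cannot recover a word from its own unmasked fragment.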

Implementation Details

The model uses a vocabulary of approximately 31,000 BPE subwords built with SentencePiece and was trained for 2 million steps. Weights are available for both PyTorch and TensorFlow, so the model can be used from either framework.

  • Trained with Whole Word Masking technique for better contextual understanding
  • Achieves state-of-the-art performance on multiple Spanish NLP benchmarks
  • Compatible with both PyTorch and TensorFlow frameworks
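
As a rough illustration of using the model from Python, the sketch below wraps the Hugging Face `transformers` fill-mask pipeline. It assumes the checkpoint is published on the Hugging Face Hub under the ID `dccuchile/bert-base-spanish-wwm-uncased` (developer name plus model name) and that `transformers` with a PyTorch backend is installed. The helper names `build_fill_mask` and `top_predictions` are hypothetical.

```python
# Assumed Hub identifier: developer name + model name from this page.
MODEL_ID = "dccuchile/bert-base-spanish-wwm-uncased"


def build_fill_mask(model_id=MODEL_ID):
    """Create a fill-mask pipeline; downloads the weights on first use."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask, text, k=3):
    """Return (token, score) pairs for the single [MASK] position in `text`."""
    return [(p["token_str"], p["score"]) for p in fill_mask(text, top_k=k)]
```

Calling `top_predictions(build_fill_mask(), "madrid es la [MASK] de españa.")` would return the three most likely fillers with their scores. The input is written in lowercase here because this is the uncased variant.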

Core Capabilities

  • Part-of-Speech Tagging (POS): 98.44% accuracy
  • Named Entity Recognition (NER): 82.67% accuracy
  • Document Classification (MLDoc): 96.12% accuracy
  • Natural Language Inference (XNLI): 80.15% accuracy
  • Paraphrase Identification (PAWS-X): 89.55% accuracy

Frequently Asked Questions

Q: What makes this model unique?

BETO is trained specifically on Spanish text with Whole Word Masking and generally outperforms multilingual BERT models on Spanish-specific tasks. This uncased version is most notable for its state-of-the-art result on MLDoc classification (96.12%), with competitive performance on the other benchmarks.

Q: What are the recommended use cases?

The model is well suited to Spanish language processing tasks such as text classification, named entity recognition, part-of-speech tagging, and natural language inference. It can serve as a starting point for fine-tuning in both academic research and production applications.
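
For the text classification use case, a fine-tuning setup might start from something like the sketch below. It assumes the Hub ID `dccuchile/bert-base-spanish-wwm-uncased` and the `transformers` Auto classes. The helper names and the example labels are hypothetical, and the classification head it creates is randomly initialized, so it must be trained before its predictions mean anything.

```python
# Assumed Hub identifier: developer name + model name from this page.
MODEL_ID = "dccuchile/bert-base-spanish-wwm-uncased"


def make_label_maps(labels):
    """Build the id<->label dictionaries stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


def build_classifier(labels, model_id=MODEL_ID):
    """Load the tokenizer and the encoder with a new classification head."""
    # Imported lazily so make_label_maps works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = make_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

For example, `build_classifier(["negativo", "positivo"])` would return a tokenizer and a two-class model ready to be passed to a training loop of your choice.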
