bert-base-spanish-wwm-uncased

Maintained By
dccuchile

BETO: Spanish BERT (Uncased)

Property          Value
Developer         dccuchile
License           CC BY 4.0 (with disclaimers)
Downloads         309,301
Vocabulary Size   31k BPE subwords
Training Steps    2M

What is bert-base-spanish-wwm-uncased?

BETO is a BERT model trained on a large Spanish corpus using the Whole Word Masking technique. This uncased version is comparable in size to BERT-Base and is intended specifically for Spanish language tasks.

Implementation Details

The model uses a vocabulary of approximately 31,000 BPE subwords constructed with SentencePiece and was trained for 2 million steps. Weights are available for both PyTorch and TensorFlow, so it can be used in either framework.
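As a minimal sketch of how the model might be loaded and queried, the snippet below uses the Hugging Face `transformers` library with PyTorch. The Hub identifier `dccuchile/bert-base-spanish-wwm-uncased` is assumed from the developer and model name shown on this page, and the helper names `load_beto` and `top_fill_mask` are hypothetical. Imports are done inside the functions so that nothing is downloaded until a function is actually called.

```python
# Hub identifier assumed from the developer and model name on this page.
MODEL_ID = "dccuchile/bert-base-spanish-wwm-uncased"


def load_beto(model_id: str = MODEL_ID):
    """Load the tokenizer and masked-LM weights (PyTorch) from the Hub."""
    # Imported lazily so that defining this function needs neither the
    # library nor a network connection.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model


def top_fill_mask(text: str, k: int = 5):
    """Return the k most likely fillers for the single [MASK] token in text."""
    import torch

    tokenizer, model = load_beto()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the mask token in the (single) input sequence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    top_ids = logits[0, mask_pos].topk(k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)


# Example call (downloads the weights on first use):
# top_fill_mask("santiago es la capital de [MASK] .")
```

After loading, `tokenizer.vocab_size` can be inspected to compare against the roughly 31k vocabulary stated above.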

  • Trained with Whole Word Masking, which masks all subwords of a selected word together
  • Reported as state-of-the-art on several Spanish NLP benchmarks (scores are listed under Core Capabilities)
  • Compatible with both PyTorch and TensorFlow frameworks
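To illustrate the idea behind Whole Word Masking, here is a simplified sketch, not the training code used for BETO. It assumes WordPiece-style tokens in which a `##` prefix marks a continuation of the previous word, and it uses a hypothetical token list. Real BERT pretraining also applies a masking budget and an 80/10/10 replacement rule, both omitted here.

```python
import random

MASK = "[MASK]"


def group_whole_words(tokens):
    """Group token indices so that each group covers one whole word.

    Assumes WordPiece-style markers: a token starting with "##" continues
    the word begun by the previous token.
    """
    groups = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def whole_word_mask(tokens, mask_prob=0.15, seed=None):
    """Mask whole words: all subwords of a selected word are replaced together."""
    rng = random.Random(seed)
    masked = list(tokens)
    for group in group_whole_words(tokens):
        # One random draw per word, not per subword.
        if rng.random() < mask_prob:
            for i in group:
                masked[i] = MASK
    return masked


# Hypothetical tokenization of "el desarrollador programa".
example_tokens = ["el", "desarroll", "##ador", "program", "##a"]
example_groups = group_whole_words(example_tokens)
```

The key design point is the single random draw per word group: a word is either fully masked or left intact, so the model cannot recover a masked subword simply from its visible neighbors within the same word.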

Core Capabilities

  • Part-of-Speech Tagging (POS): 98.44% accuracy
  • Named Entity Recognition (NER): 82.67% (this benchmark is conventionally scored with F1 rather than accuracy)
  • Document Classification (MLDoc): 96.12% accuracy
  • Natural Language Inference (XNLI): 80.15% accuracy
  • Paraphrase Identification (PAWS-X): 89.55% accuracy

Frequently Asked Questions

Q: What makes this model unique?

BETO is trained specifically on Spanish data with Whole Word Masking, and it outperforms multilingual BERT models on several Spanish tasks. This uncased version reports a state-of-the-art result on MLDoc document classification (96.12%) and competitive scores on the other benchmarks.

Q: What are the recommended use cases?

The model suits Spanish language processing tasks such as text classification, named entity recognition, part-of-speech tagging, and natural language inference. It can be used in both academic research and production applications that need Spanish language understanding.
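One possible way to adapt the model to these use cases is to pair the pretrained encoder with a task-specific head from the `transformers` library. The sketch below is illustrative only: the Hub identifier is assumed from the developer and model name on this page, the `TASK_HEADS` mapping and `load_for_task` helper are hypothetical, and the returned model still needs fine-tuning on labeled data because its head is freshly initialized.

```python
# Assumed Hub identifier, built from the developer and model name on this page.
MODEL_ID = "dccuchile/bert-base-spanish-wwm-uncased"

# Use case -> name of the transformers Auto class that provides a matching head.
TASK_HEADS = {
    "text-classification": "AutoModelForSequenceClassification",
    "named-entity-recognition": "AutoModelForTokenClassification",
    "part-of-speech-tagging": "AutoModelForTokenClassification",
    # NLI is sequence classification over a (premise, hypothesis) pair.
    "natural-language-inference": "AutoModelForSequenceClassification",
}


def load_for_task(task: str, num_labels: int, model_id: str = MODEL_ID):
    """Load the pretrained encoder with a freshly initialized head for `task`."""
    import transformers  # lazy import: nothing is downloaded until this runs

    head_cls = getattr(transformers, TASK_HEADS[task])
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    model = head_cls.from_pretrained(model_id, num_labels=num_labels)
    return tokenizer, model


# Example call (downloads the weights on first use):
# tokenizer, model = load_for_task("natural-language-inference", num_labels=3)
```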
