bert-base-spanish-wwm-cased

Maintained By
dccuchile

BETO: Spanish BERT (Cased Version)

Property    Value
Author      dccuchile
Downloads   55,540
License     CC BY 4.0 (with disclaimers)
Task Type   Fill-Mask, Masked Language Modeling

What is bert-base-spanish-wwm-cased?

BETO is a specialized BERT model trained specifically for Spanish language tasks using the Whole Word Masking technique. This cased version maintains the original case sensitivity of the text and has been trained on a comprehensive Spanish corpus for 2M steps. The model utilizes a vocabulary of approximately 31k BPE subwords constructed using SentencePiece.
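As a quick sketch of how the model can be queried for masked-token prediction through the Hugging Face transformers fill-mask pipeline (the example sentence is illustrative, not from the model card):

```python
from transformers import pipeline

# Load BETO (cased) as a fill-mask pipeline; weights download on first use.
fill_mask = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-cased")

# BETO uses the standard BERT [MASK] token.
predictions = fill_mask("Madrid es la [MASK] de España.")

# Each prediction carries the filled-in token, its score, and the completed sentence.
for p in predictions:
    print(f"{p['token_str']:>12}  {p['score']:.3f}")
```

By default the pipeline returns the top five candidate tokens ranked by probability.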

Implementation Details

The model follows the BERT-Base architecture, with pretrained weights available for both PyTorch and TensorFlow. It delivers strong results across Spanish language benchmarks, often surpassing the multilingual BERT model:

  • Achieves 98.97% accuracy on POS tagging tasks
  • Reaches 88.43% performance on Named Entity Recognition
  • Attains 82.01% accuracy on XNLI tasks
  • Shows 95.60% accuracy on MLDoc classification

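The benchmark scores above come from fine-tuning BETO with task-specific heads. A minimal sketch of attaching a token-classification head for NER follows; the label count is an assumption for illustration, not the exact configuration behind the reported score:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "dccuchile/bert-base-spanish-wwm-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# num_labels=9 assumes a BIO scheme over four entity types; the classification
# head is randomly initialized and would need fine-tuning before use.
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)

inputs = tokenizer("Gabriel García Márquez nació en Aracataca.", return_tensors="pt")
outputs = model(**inputs)

# Logits have shape (batch, sequence_length, num_labels).
print(outputs.logits.shape)
```

The same pattern applies to the other tasks by swapping in AutoModelForSequenceClassification for classification-style benchmarks such as XNLI and MLDoc.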
Core Capabilities

  • Part-of-Speech Tagging
  • Named Entity Recognition
  • Text Classification
  • Cross-lingual Natural Language Inference
  • Paraphrase Identification

Frequently Asked Questions

Q: What makes this model unique?

BETO is specifically optimized for Spanish language tasks, using Whole Word Masking and achieving state-of-the-art performance on various Spanish benchmarks. Unlike multilingual models, it focuses solely on Spanish, allowing for better language-specific representations.

Q: What are the recommended use cases?

The model is ideal for Spanish language processing tasks including text classification, named entity recognition, and part-of-speech tagging. It's particularly useful for applications requiring case-sensitive processing of Spanish text.
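Case sensitivity matters here because the cased vocabulary keeps distinct subwords for capitalized forms rather than lowercasing everything. One quick way to inspect this with the tokenizer (the exact subword splits shown are model-dependent, so none are asserted here):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")

# The cased tokenizer does not lowercase its input, so "México" and "méxico"
# can map to different subword sequences.
print(tokenizer.tokenize("México"))
print(tokenizer.tokenize("méxico"))
```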
