RoBERTalex

Property	Value
Architecture	RoBERTa base
Language	Spanish
Training Data	8.9GB Legal Domain Corpora
License	Apache 2.0
Paper	arXiv:2110.12201

What is RoBERTalex?

RoBERTalex is a Spanish language model based on the RoBERTa base architecture and trained on legal-domain text. It was developed by the Text Mining Unit at the Barcelona Supercomputing Center. The training corpus comprises 8.9GB of legal text, which targets the model at the vocabulary and phrasing of Spanish legal documents.

Implementation Details

The model uses byte-level BPE tokenization with a vocabulary of 50,262 tokens. Training ran on 2 computing nodes, each equipped with 4 NVIDIA V100 GPUs with 16GB of VRAM. The model follows RoBERTa's masked language modeling objective; reported evaluation results are listed under Core Capabilities.
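
To make the masked language modeling objective concrete, the following is a minimal sketch of how a RoBERTa-style training example is usually corrupted. It assumes the standard recipe (15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged). The rates actually used for RoBERTalex are not stated here, and the function name `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by RoBERTa-style tokenizers


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt one token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] is the original token at
    positions the model must predict, and None everywhere else.
    The 15% / 80-10-10 split is the standard recipe, assumed here.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= mask_prob:
            # Position not selected: keep the token, no prediction target.
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with <mask>
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(tok)                # 10%: keep the original token
    return corrupted, labels
```

Calling `mask_tokens("el contrato será nulo".split(), vocab=["ley", "juez", "parte"])` returns a corrupted copy of the sentence together with the prediction targets. Because the selection is random, each call can mask different positions, which is the idea behind dynamic masking.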

Trained on legal corpora preprocessed with sentence splitting and language detection
Preserves document boundaries during training
Uses RoBERTa base architecture with Spanish legal domain specialization
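
One way to picture document boundary preservation is a packing step that fills training sequences sentence by sentence but never lets a sequence span two documents. The sketch below is an illustration under that assumption, not the project's actual preprocessing code. The function name `pack_documents` and the `max_tokens=512` limit are hypothetical.

```python
def pack_documents(documents, max_tokens=512):
    """Pack tokenized sentences into training sequences.

    documents: list of documents; each document is a list of sentences;
    each sentence is a list of tokens.
    A sequence is closed when it would exceed max_tokens, and always
    at the end of a document, so no sequence crosses a document boundary.
    """
    sequences = []
    for doc in documents:
        current = []
        for sentence in doc:
            sentence = sentence[:max_tokens]  # truncate an overlong sentence
            if current and len(current) + len(sentence) > max_tokens:
                sequences.append(current)     # sequence is full: close it
                current = []
            current = current + sentence
        if current:
            sequences.append(current)         # flush at the document boundary
    return sequences
```

With a small limit the effect is easy to see: `pack_documents([[["a", "b"], ["c"]], [["d"]]], max_tokens=3)` packs the first document into one sequence and starts a new sequence for the second document, even though there was room left.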

Core Capabilities

Masked language modeling on Spanish legal text
98.71% F1 score on UD-POS tagging
83.23% F1 score on CoNLL-NERC
73.74% Combined score on STS tasks
Can be fine-tuned for downstream tasks such as Question Answering and Text Classification
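
For readers unfamiliar with the metric behind the tagging and NER figures above, F1 is the harmonic mean of precision and recall. The sketch below computes it from raw counts. The counts in the usage note are invented for illustration and are not RoBERTalex results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return F1, the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both zero or undefined.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a tagger that finds 8 correct entities, predicts 2 spurious ones, and misses 2 has precision 0.8 and recall 0.8, so `f1_score(8, 2, 2)` is approximately 0.8.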

Frequently Asked Questions

Q: What makes this model unique?

RoBERTalex is optimized for Spanish legal text: it was trained on an 8.9GB legal corpus, which makes it well suited to legal-domain applications. It also maintains strong performance on general language tasks.

Q: What are the recommended use cases?

The model can be used directly for masked language modeling and fine-tuned for downstream applications including Question Answering, Text Classification, and Named Entity Recognition in legal contexts. It is particularly suited to processing and analyzing Spanish legal documents.
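
As a sketch of the masked language modeling use case, the snippet below queries the model through the Hugging Face `transformers` fill-mask pipeline. It assumes the checkpoint is published on the Hugging Face Hub under the identifier `PlanTL-GOB-ES/RoBERTalex` and that the tokenizer's mask token is `<mask>`. Neither is stated in this document, so check both against the model page. The helper names `mask_word` and `top_predictions` are hypothetical.

```python
def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` in `sentence` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_sentence, model_id="PlanTL-GOB-ES/RoBERTalex", k=5):
    """Return the k most likely fillers for the masked position as (token, score) pairs.

    model_id is an assumed Hub identifier. The first call downloads the model weights.
    """
    from transformers import pipeline  # imported lazily; requires `pip install transformers`

    fill = pipeline("fill-mask", model=model_id)
    results = fill(masked_sentence, top_k=k)
    return [(r["token_str"].strip(), r["score"]) for r in results]
```

A typical call would be `top_predictions(mask_word("El contrato será nulo.", "contrato"))`, which asks the model which tokens best fit the masked slot in a legal sentence. Fine-tuning for classification, NER, or question answering follows the usual `transformers` workflow with the corresponding task head.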
