slovakbert

gerulata

SlovakBERT - A 125M parameter RoBERTa-based language model trained on 19.35GB of Slovak text data, optimized for masked language modeling tasks.

Property              Value
Parameter Count       125M
License               MIT
Architecture          RoBERTa
Paper                 arXiv:2109.15254
Training Data Size    19.35GB

What is SlovakBERT?

SlovakBERT is a state-of-the-art language model built specifically for the Slovak language, developed by Gerulata Technologies. It is a case-sensitive RoBERTa-based model trained on a diverse corpus of Slovak text, including Wikipedia, OpenSubtitles, OSCAR, and various web crawls, totaling 19.35GB of text data.

Implementation Details

The model was trained with fairseq on 4 Nvidia A100 GPUs for 300K steps, using a batch size of 512 and a sequence length of 512. Training used the Adam optimizer and 16-bit floating-point precision for efficiency.
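To put the training figures in perspective, the arithmetic below multiplies them out. It is only a rough sketch: it assumes the batch size of 512 counts sequences and that every sequence is filled to the full 512 tokens, so the token count is an upper bound rather than a reported number. The function name `training_volume` is made up for this illustration.

```python
# Back-of-the-envelope training volume from the figures quoted above.
# Assumption: "batch size of 512" counts sequences, and every sequence
# is padded or packed to the full 512-token length (an upper bound).

STEPS = 300_000      # training steps
BATCH_SIZE = 512     # sequences per step (assumed unit)
SEQ_LEN = 512        # tokens per sequence


def training_volume(steps: int, batch_size: int, seq_len: int) -> tuple[int, int]:
    """Return (sequences processed, maximum tokens processed)."""
    sequences = steps * batch_size
    tokens = sequences * seq_len
    return sequences, tokens


sequences, tokens = training_volume(STEPS, BATCH_SIZE, SEQ_LEN)
print(f"sequences seen: {sequences:,}")
print(f"tokens seen (upper bound): {tokens:,}")
```

Under these assumptions the run processes about 153.6M sequences, or at most roughly 78.6B tokens.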

  • Trained on 181.6M unique sentences across multiple datasets
  • Implements masked language modeling (MLM) objective
  • Supports both PyTorch and TensorFlow frameworks
  • Features special token handling for URLs and emails

Core Capabilities

  • Masked Language Modeling for Slovak text
  • Text embeddings generation
  • Fine-tuning capabilities for downstream tasks
  • Cross-framework compatibility (PyTorch/TensorFlow)
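The masked language modeling capability listed above can be tried with the Hugging Face `transformers` fill-mask pipeline. The sketch below is illustrative rather than official usage: the model ID `gerulata/slovakbert` is an assumption inferred from the author and model names on this page, `<mask>` is the usual RoBERTa mask token, and the helper names `mask_word` and `predict_masked` are invented for this example.

```python
def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(text: str, model_id: str = "gerulata/slovakbert", top_k: int = 5):
    """Return (token, score) pairs for the masked position.

    Requires `transformers` plus a backend such as PyTorch, and downloads
    the model on first use. The default model ID is an assumption.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill(text, top_k=top_k)]


# Build a masked Slovak sentence ("The children played on the playground.").
masked = mask_word("Deti sa hrali na ihrisku.", "hrali")
print(masked)
```

Calling `predict_masked(masked)` would then return the model's top candidate tokens for the masked position along with their scores.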

Frequently Asked Questions

Q: What makes this model unique?

SlovakBERT is built specifically for the Slovak language and trained on an extensive, diverse corpus of Slovak text. It is case-sensitive, so capitalization is preserved as a signal for the model.

Q: What are the recommended use cases?

The model is primarily designed for fine-tuning on downstream tasks in Slovak language processing. It excels at masked language modeling and can be effectively used for text embeddings, sentiment analysis, and other NLP tasks requiring deep understanding of Slovak language context.
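One common way to obtain text embeddings from an encoder like this is to mean-pool the final hidden states over non-padding tokens. The page does not say which pooling method is recommended, so treat the following as a sketch under that assumption. The model ID `gerulata/slovakbert` and the helper names `mean_pool` and `embed` are likewise assumptions made for illustration.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(texts, model_id="gerulata/slovakbert"):
    """Return one mean-pooled vector per input text as a torch tensor.

    Requires `torch` and `transformers`; downloads the model on first use.
    The default model ID is an assumption, not taken from this page.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, dim)


# Toy check of the pooling logic: the third (padding) vector is ignored.
toy = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
print(toy)
```

For tasks such as sentiment analysis, the model would normally be fine-tuned with a classification head rather than used through pooled embeddings alone.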
