mbert-swedish-distilled-cased
| Property | Value |
|---|---|
| Model Type | Distilled BERT |
| Architecture | 6-layer Transformer |
| Training Data | Swedish Culturomics Gigaword Corpus (2010-2015) |
| Model Link | Hugging Face |
What is mbert-swedish-distilled-cased?
mbert-swedish-distilled-cased is a compressed version of the multilingual BERT (mBERT) model, optimized for Swedish language processing. Developed as part of a Master's thesis project, it applies the LightMBERT distillation method to produce a more efficient 6-layer architecture while maintaining competitive performance with the original mBERT.
Implementation Details
The model was distilled using approximately 9 GB of tokenized Swedish text from the Culturomics Gigaword Corpus (2010-2015). Unlike the standard LightMBERT recipe, this version was trained without freezing the embedding layer, allowing the embeddings to adapt more freely to Swedish language patterns.
- 6-layer architecture (down from mBERT's 12 layers)
- Trained on high-quality Swedish corpus data
- Maintains original mBERT tokenizer
- Supports both masked language modeling and next sentence prediction (see the usage sketch after this list)
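Because the model keeps mBERT's cased tokenizer and masked-language-modeling head, it should load with the standard Hugging Face transformers fill-mask pipeline. A minimal sketch; the Hub identifier below is a placeholder, not the model's actual repository name (use the model link above).

```python
from transformers import pipeline

# Placeholder Hub id -- substitute the repository name from the model's
# Hugging Face page.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

# The distilled model reuses mBERT's cased tokenizer, so the standard
# fill-mask pipeline works unchanged.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Swedish example: "Stockholm is the capital of [MASK]."
for prediction in fill_mask("Stockholm är huvudstaden i [MASK]."):
    print(f"{prediction['token_str']:>15}  {prediction['score']:.3f}")
```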
Core Capabilities
- Achieves a 0.859 F1 score on the SUCX 3.0 dataset (compared to mBERT's 0.866)
- Performs well on cross-lingual tasks, reaching a 0.826 F1 score on English WikiANN
- Suitable for fine-tuning on downstream tasks (a fine-tuning setup sketch follows this list)
- Optimized for Swedish language processing while maintaining multilingual capabilities
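Fine-tuning for a downstream task such as Swedish NER typically starts by loading the distilled checkpoint behind a fresh task head. A minimal sketch using the transformers token-classification API, with a placeholder Hub identifier and an illustrative label scheme; the new classification head still has to be trained on labelled Swedish data (for example with the Trainer API or a plain PyTorch loop).

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Placeholder Hub id -- substitute the actual repository name.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

# Illustrative NER tag set; SUCX 3.0 and WikiANN each define their own labels.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# The 6-layer encoder is reused as-is; only the token-classification head is
# randomly initialised and needs to be trained on labelled Swedish data.
print(f"Parameters: {model.num_parameters():,}")
```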
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient 6-layer architecture specifically optimized for Swedish language processing, while maintaining performance comparable to the full mBERT model. It represents a successful application of knowledge distillation techniques to create a more resource-efficient model for Nordic language processing.
Q: What are the recommended use cases?
The model is primarily designed for fine-tuning on downstream tasks in Swedish language processing, such as named entity recognition, text classification, and other NLP tasks where Swedish language understanding is crucial. It also supports masked language modeling and next sentence prediction out of the box; a next-sentence-prediction sketch follows below.
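Since the card states that next sentence prediction is supported out of the box, the checkpoint can presumably also be loaded behind BERT's NSP head. A minimal sketch, assuming the pretrained NSP weights are retained in the published checkpoint and using a placeholder Hub identifier.

```python
import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

# Placeholder Hub id -- substitute the actual repository name.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = BertForNextSentencePrediction.from_pretrained(MODEL_ID)

# Swedish sentence pair: "He bought a new car." / "It was red and very fast."
first = "Han köpte en ny bil."
second = "Den var röd och mycket snabb."

inputs = tokenizer(first, second, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "second sentence follows the first", index 1 = "it does not".
probs = torch.softmax(logits, dim=-1)
print(f"P(next sentence) = {probs[0, 0].item():.3f}")
```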