mbert-swedish-distilled-cased
| Property | Value |
|---|---|
| Model Type | Distilled BERT |
| Architecture | 6-layer Transformer |
| Training Data | Swedish Culturomics Gigaword Corpus (2010-2015) |
| Model Link | Hugging Face |
What is mbert-swedish-distilled-cased?
mbert-swedish-distilled-cased is a compressed version of the multilingual BERT (mBERT) model, optimized for Swedish language processing. Developed as part of a Master's thesis project, it applies the LightMBERT distillation method to produce a more efficient 6-layer architecture while maintaining competitive performance with the original mBERT.
Implementation Details
The model was distilled using approximately 9 GB of tokenized Swedish text from the Culturomics Gigaword Corpus (2010-2015). Unlike the standard LightMBERT recipe, this version was trained without freezing the embedding layer, allowing the embeddings to adapt more freely to Swedish language patterns.
- 6-layer architecture (down from mBERT's 12 layers)
- Trained on high-quality Swedish corpus data
- Maintains original mBERT tokenizer
- Supports both masked language modeling and next sentence prediction (see the usage sketch after this list)
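Because the model keeps mBERT's cased tokenizer and masked-language-modeling head, it should load with the standard Hugging Face transformers fill-mask pipeline. A minimal sketch; the Hub identifier below is a placeholder, not the model's actual repository name (use the model link above).

```python
from transformers import pipeline

# Placeholder Hub id -- substitute the repository name from the model's
# Hugging Face page.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

# The distilled model reuses mBERT's cased tokenizer, so the standard
# fill-mask pipeline works unchanged.
fill_mask = pipeline("fill-mask", model=MODEL_ID)

# Swedish example: "Stockholm is the capital of [MASK]."
for prediction in fill_mask("Stockholm är huvudstaden i [MASK]."):
    print(f"{prediction['token_str']:>15}  {prediction['score']:.3f}")
```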
Core Capabilities
- Achieves a 0.859 F1 score on the SUCX 3.0 dataset (compared to mBERT's 0.866)
- Performs well on cross-lingual tasks, reaching a 0.826 F1 score on English WikiANN
- Suitable for fine-tuning on downstream tasks (a fine-tuning setup sketch follows this list)
- Optimized for Swedish language processing while maintaining multilingual capabilities
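Fine-tuning for a downstream task such as Swedish NER typically starts by loading the distilled checkpoint behind a fresh task head. A minimal sketch using the transformers token-classification API, with a placeholder Hub identifier and an illustrative label scheme; the new classification head still has to be trained on labelled Swedish data (for example with the Trainer API or a plain PyTorch loop).

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Placeholder Hub id -- substitute the actual repository name.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

# Illustrative NER tag set; SUCX 3.0 and WikiANN each define their own labels.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# The 6-layer encoder is reused as-is; only the token-classification head is
# randomly initialised and needs to be trained on labelled Swedish data.
print(f"Parameters: {model.num_parameters():,}")
```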
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient 6-layer architecture specifically optimized for Swedish language processing, while maintaining performance comparable to the full mBERT model. It represents a successful application of knowledge distillation techniques to create a more resource-efficient model for Nordic language processing.
Q: What are the recommended use cases?
The model is primarily designed for fine-tuning on downstream tasks in Swedish language processing, such as named entity recognition, text classification, and other NLP tasks where Swedish language understanding is crucial. It also supports masked language modeling and next sentence prediction out of the box; a next-sentence-prediction sketch follows below.
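Since the card states that next sentence prediction is supported out of the box, the checkpoint can presumably also be loaded behind BERT's NSP head. A minimal sketch, assuming the pretrained NSP weights are retained in the published checkpoint and using a placeholder Hub identifier.

```python
import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

# Placeholder Hub id -- substitute the actual repository name.
MODEL_ID = "your-namespace/mbert-swedish-distilled-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = BertForNextSentencePrediction.from_pretrained(MODEL_ID)

# Swedish sentence pair: "He bought a new car." / "It was red and very fast."
first = "Han köpte en ny bil."
second = "Den var röd och mycket snabb."

inputs = tokenizer(first, second, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = "second sentence follows the first", index 1 = "it does not".
probs = torch.softmax(logits, dim=-1)
print(f"P(next sentence) = {probs[0, 0].item():.3f}")
```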